Embucket provides a wire-compatible implementation of the Snowflake SQL REST API, so existing Snowflake clients, SDKs, and tools can connect without modification. This page describes which Snowflake features are supported and where Embucket’s behavior differs from Snowflake’s.

API Compatibility

Embucket implements the Snowflake SQL REST API v1 specification, providing compatibility with official Snowflake client libraries.

Supported Endpoints

  • Authentication: Session creation with JWT token-based authentication
  • Query Execution: Synchronous and asynchronous query submission
  • Query Cancellation: Abort running queries by request ID
  • Session Management: Create, refresh, and delete sessions

Authentication endpoint:
POST /session/v1/login-request
Content-Type: application/json

{
  "data": {
    "LOGIN_NAME": "embucket",
    "PASSWORD": "embucket",
    "ACCOUNT_NAME": "acc",
    "CLIENT_APP_ID": "MyApp",
    "CLIENT_APP_VERSION": "1.0.0",
    "CLIENT_ENVIRONMENT": {},
    "SESSION_PARAMETERS": {}
  }
}
Returns a JWT token valid for 3 days (configurable). The token contains session metadata, including username, database, schema, and warehouse context.

Query execution endpoint:
POST /queries/v1/query-request?requestId={uuid}
Authorization: Bearer {jwt_token}
Content-Type: application/json

{
  "sqlText": "SELECT * FROM customers LIMIT 10",
  "asyncExec": false,
  "querySubmissionTime": 1234567890
}
Returns results in Snowflake JSON format with rowset data and column metadata.

Implementation: See crates/api-snowflake-rest/src/server/router.rs for endpoint definitions.
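
The two requests above can be assembled with the Python standard library alone. The following is a minimal sketch, not Embucket's own client code: `BASE_URL` assumes the `localhost:3000` HTTP endpoint used in the Snowflake CLI example on this page, the helper names `build_login_request` and `build_query_request` are hypothetical, and the unit of `querySubmissionTime` (epoch milliseconds here) is an assumption because this page does not state it. The helpers only build the requests and do not send them.

```python
import json
import time
import urllib.request
import uuid
from typing import Optional

# Assumption: a local Embucket node reachable over plain HTTP on port 3000.
BASE_URL = "http://localhost:3000"


def build_login_request(user: str, password: str, account: str) -> urllib.request.Request:
    """Build (but do not send) the login request shown above."""
    body = {
        "data": {
            "LOGIN_NAME": user,
            "PASSWORD": password,
            "ACCOUNT_NAME": account,
            "CLIENT_APP_ID": "MyApp",
            "CLIENT_APP_VERSION": "1.0.0",
            "CLIENT_ENVIRONMENT": {},
            "SESSION_PARAMETERS": {},
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/session/v1/login-request",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(
    jwt_token: str,
    sql_text: str,
    async_exec: bool = False,
    submission_time: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) the query request shown above."""
    if submission_time is None:
        # Assumption: epoch milliseconds; the unit is not documented on this page.
        submission_time = int(time.time() * 1000)
    body = {
        "sqlText": sql_text,
        "asyncExec": async_exec,
        "querySubmissionTime": submission_time,
    }
    return urllib.request.Request(
        # A fresh UUID per request fills the requestId query parameter.
        f"{BASE_URL}/queries/v1/query-request?requestId={uuid.uuid4()}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Either request can be sent with `urllib.request.urlopen(request)`. Extracting the token from the login response is left out because the response layout is not described on this page.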

Authentication & Sessions

Embucket uses JWT tokens for authentication, with session metadata encoded in the token claims.

Session parameters:
  • Database: Current database context
  • Schema: Current schema context
  • Warehouse: Logical warehouse identifier (informational)
  • User: Authenticated username
  • Account: Account name (from login)
Session lifecycle:
  1. Login request creates new session with unique ID
  2. JWT token issued with session metadata
  3. Session expires after a period of inactivity (the timeout is configurable)
  4. Expired sessions automatically cleaned up
  5. Session can be explicitly deleted via /session endpoint
See crates/api-snowflake-rest/src/server/logic.rs:27 for login request handling.

Demo authentication: By default, Embucket accepts demo credentials:
  • Username: embucket (configurable via --auth-demo-user)
  • Password: embucket (configurable via --auth-demo-password)
Production deployments should configure custom credentials or integrate with external authentication systems.
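
Because session metadata travels in the JWT claims, it can be useful to inspect a token while debugging. The sketch below decodes the payload segment of any standard JWT without verifying the signature, so it is suitable for inspection only and never for authorization decisions. The function name `decode_jwt_claims` is hypothetical, and the claim names Embucket actually uses are not listed on this page, so the code returns whatever the token contains.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWTs strip base64url padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```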

SQL Dialect Compatibility

Embucket supports the Snowflake SQL dialect with extensive syntax compatibility for common operations.

Supported SQL Features

  • SELECT with joins, subqueries, CTEs
  • WHERE, GROUP BY, HAVING, ORDER BY
  • LIMIT, OFFSET, TOP N
  • UNION, INTERSECT, EXCEPT
  • Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
  • Aggregate functions (SUM, COUNT, AVG, MIN, MAX, etc.)
  • Lateral joins (LATERAL, TABLE())
  • Pivot and unpivot operations
  • Semi-structured data queries (GET, GET_PATH, array indexing)

SQL Parsing

Embucket uses a custom SQL parser built on sqlparser-rs with Snowflake dialect extensions.

Parser features:
  • Snowflake-specific syntax (e.g., FLATTEN, $1 positional parameters)
  • Case-insensitive identifiers with optional normalization
  • Identifier quoting rules (double quotes preserve case)
  • Session parameter substitution
  • Table function support (e.g., TABLE(FLATTEN(...)))
  • COPY INTO statement parsing
Query rewriting: Embucket applies several query rewrites for compatibility:
  1. Identifier case normalization: Converts unquoted identifiers to uppercase (Snowflake convention)
  2. Function translation: Maps Snowflake functions to DataFusion equivalents
  3. Syntax desugaring: Expands Snowflake-specific syntax to standard SQL
  4. Session context injection: Resolves session parameters in queries
See crates/executor/src/query.rs:140 for query parsing implementation.
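
The identifier case normalization rule listed above can be illustrated in a few lines. This is a simplified sketch of the Snowflake convention (unquoted identifiers fold to uppercase, double-quoted identifiers keep their case), not Embucket's parser code; the function name `normalize_identifier` is hypothetical.

```python
def normalize_identifier(ident: str) -> str:
    """Resolve one identifier the way the Snowflake convention describes."""
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted: drop the outer quotes, keep case, unescape doubled quotes.
        return ident[1:-1].replace('""', '"')
    # Unquoted: fold to uppercase.
    return ident.upper()
```

Under this rule `customers` and `Customers` both resolve to `CUSTOMERS`, while `"Customers"` stays `Customers` and therefore names a different object.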

Supported Data Types

Embucket supports most common Snowflake data types through Arrow/DataFusion type mapping:
| Snowflake Type | Embucket/Arrow Type | Notes |
| --- | --- | --- |
| NUMBER(p,s) | Decimal128(p,s) | Precision up to 38 |
| INT, INTEGER | Int64 | |
| BIGINT | Int64 | |
| SMALLINT | Int16 | |
| FLOAT, DOUBLE | Float64 | |
| VARCHAR, STRING, TEXT | Utf8 | |
| BINARY, VARBINARY | Binary | |
| BOOLEAN | Boolean | |
| DATE | Date32 | Days since epoch |
| TIME | Time64(Nanosecond) | |
| TIMESTAMP, TIMESTAMP_NTZ | Timestamp(Nanosecond) | |
| TIMESTAMP_LTZ | Timestamp(Nanosecond, tz) | With timezone |
| TIMESTAMP_TZ | Timestamp(Nanosecond, tz) | With timezone |
| VARIANT | Union or Utf8 | Semi-structured data |
| OBJECT | Struct | |
| ARRAY | List | |
Type conversion: Automatic type coercion follows Snowflake semantics where possible. Use CAST or TRY_CAST for explicit conversions.

Compatible Clients & Tools

Embucket works with standard Snowflake client tools:

  • Snowflake CLI: Official snow sql command-line tool
  • Python Connector: snowflake-connector-python library
  • JDBC/ODBC: Snowflake JDBC and ODBC drivers (API endpoints)
  • dbt: dbt-snowflake adapter for transformations

Configuration example (Snowflake CLI):
# ~/.snowflake/config.toml
[connections.embucket]
host = "localhost"
port = 3000
protocol = "http"
account = "acc.local"
user = "embucket"
password = "embucket"
database = "demo"
schema = "public"
warehouse = "em.wh"
See the Quickstart for detailed setup instructions.
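
The same settings can be passed to the Python connector. The sketch below mirrors the CLI configuration above and assumes that `snowflake-connector-python` is installed and that an Embucket node is listening on `localhost:3000`. The helper names are hypothetical, and this page does not confirm that these exact values work unchanged with the connector, so treat it as a starting point.

```python
def embucket_connection_params() -> dict:
    """Connection settings mirroring the Snowflake CLI example above."""
    return {
        "host": "localhost",
        "port": 3000,
        "protocol": "http",
        "account": "acc.local",
        "user": "embucket",
        "password": "embucket",
        "database": "demo",
        "schema": "public",
        "warehouse": "em.wh",
    }


def connect():
    """Open a connection; imports lazily so the params helper has no dependencies."""
    import snowflake.connector  # provided by snowflake-connector-python

    return snowflake.connector.connect(**embucket_connection_params())
```

Calling `connect().cursor().execute("SELECT 1")` is a quick way to confirm that login and query execution both work end to end.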

Key Differences from Snowflake

While Embucket provides broad compatibility, there are important differences:

Architecture Differences

No distributed query execution: Each Embucket node executes queries independently. For distributed processing across multiple nodes, use a separate query engine like Apache Spark or Trino with the same Iceberg tables.
  • Query-per-node model: Queries run entirely on one node vs. Snowflake’s distributed execution
  • No automatic clustering: Table optimization requires manual OPTIMIZE commands
  • Local caching only: Metadata and result caching is per-node, not global
  • Session affinity: Session state (temp tables, variables) exists only on the session node

Unsupported Features

Enterprise features:
  • Multi-cluster warehouses
  • Automatic query optimization and statistics
  • Query result caching across sessions
  • Data sharing and marketplaces
  • Replication and failover
  • Row-level security policies
  • Column-level security and masking
  • External functions (AWS Lambda, etc.)
Advanced SQL features:
  • User-defined functions (UDFs) in JavaScript/Java
  • Stored procedures
  • Tasks and streams (change data capture)
  • Snowpipe (continuous loading)
  • External tables (partial support)
  • Materialized views
  • Transient and temporary tables (limited)
Account and access management:
  • Role-based access control (RBAC)
  • User and role management commands
  • Resource monitors
  • Network policies

Behavior Differences

Transactions:
  • Embucket supports single-statement transactions via Iceberg
  • Multi-statement transactions have limited support
  • No BEGIN TRANSACTION / COMMIT / ROLLBACK (uses auto-commit)
Concurrency:
  • Configurable concurrent query limit (default: 100 per node)
  • No automatic query queueing: exceeding the limit returns an error
  • No warehouse scaling or query distribution
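
Since queries beyond the limit are rejected rather than queued, clients that fan out many queries may want to cap their own concurrency. The following is a minimal client-side sketch that uses a bounded thread pool; `run_throttled` is a hypothetical helper, and each job stands in for whatever callable submits one query.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def run_throttled(jobs: Iterable[Callable[[], T]], max_in_flight: int) -> List[T]:
    """Run jobs with at most max_in_flight executing at once; keep input order."""
    # The pool size is the cap: extra jobs wait in the pool's internal queue.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(job) for job in jobs]
        # result() re-raises any exception raised inside a job.
        return [future.result() for future in futures]
```

Keeping `max_in_flight` below the node's configured limit leaves headroom for other clients that share the same node.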
Performance:
  • Query performance depends on node resources (CPU, memory)
  • No automatic performance tuning or query optimization hints
  • Smaller data volumes recommended for single-node architecture

Migration Considerations

When migrating from Snowflake to Embucket:
  1. Assess SQL compatibility: Review queries for unsupported features (UDFs, stored procedures, etc.)
  2. Test workload: Run representative queries to validate correctness and performance
  3. Adjust client configuration: Update connection strings to point to Embucket endpoint
  4. Plan for architecture differences: Understand single-node execution model and concurrency limits
  5. Export to Iceberg format: Convert existing tables to Apache Iceberg if not already

Best practices:
  • Start with read-heavy analytical workloads
  • Use Embucket for development/testing environments
  • Validate query results against Snowflake before cutover
  • Monitor resource usage and adjust node sizing accordingly
  • Consider horizontal scaling for higher throughput needs
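
Validating results before cutover can be as simple as running the same query on both systems and comparing the returned rows. The sketch below compares two rowsets as multisets, so row order is ignored but duplicate rows still count. `rowsets_match` is a hypothetical helper; it assumes that row values are hashable and compares them exactly, so floating-point or timestamp columns may need rounding or normalization first.

```python
from collections import Counter
from typing import Iterable, Sequence


def rowsets_match(expected: Iterable[Sequence], actual: Iterable[Sequence]) -> bool:
    """Return True when both rowsets hold the same rows, ignoring order."""
    # Counter keeps multiplicities, so a duplicated row is not silently dropped.
    return Counter(map(tuple, expected)) == Counter(map(tuple, actual))
```

For queries with an ORDER BY clause, a plain list comparison is the stricter check because it also validates ordering.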

Troubleshooting

Common Issues

“Invalid authentication data” error:
  • Verify username/password matches configured demo credentials
  • Check account name format (include region if needed)
“Query timeout” errors:
  • Increase --query-timeout-secs CLI argument (default: 1200)
  • Optimize query or add indexes/partitioning to underlying tables
  • Consider breaking large queries into smaller pieces
“Query limit exceeded” errors:
  • Increase --max-concurrency-level CLI argument (default: 100)
  • Add more Embucket nodes behind load balancer
  • Review concurrent query patterns and add throttling
Unsupported syntax errors:
  • Check if query uses Snowflake-specific features not yet implemented
  • Rewrite using standard SQL or supported functions
  • File an issue on GitHub to request the missing feature

Next Steps

  • Architecture: Learn how Embucket’s architecture works
  • Iceberg Integration: Understand Apache Iceberg storage