Embucket provides wire-compatible Snowflake SQL REST API support, enabling existing Snowflake clients, SDKs, and tools to connect without modification. This page describes what Snowflake features are supported and key differences.
API Compatibility
Embucket implements the Snowflake SQL REST API v1 specification, providing compatibility with official Snowflake client libraries.
Supported Endpoints
- Authentication: Session creation with JWT token-based authentication
- Query Execution: Synchronous and asynchronous query submission
- Query Cancellation: Abort running queries by request ID
- Session Management: Create, refresh, and delete sessions
See crates/api-snowflake-rest/src/server/router.rs for endpoint definitions.
Authentication & Sessions
Embucket uses JWT tokens for authentication with session metadata encoded in claims.
Session parameters:
- Database: Current database context
- Schema: Current schema context
- Warehouse: Logical warehouse identifier (informational)
- User: Authenticated username
- Account: Account name (from login)
Session lifecycle:
- Login request creates new session with unique ID
- JWT token issued with session metadata
- Session expires after a configurable period of inactivity
- Expired sessions automatically cleaned up
- Session can be explicitly deleted via the /session endpoint
See crates/api-snowflake-rest/src/server/logic.rs:27 for login request handling.
Demo authentication:
By default, Embucket accepts demo credentials:
- Username: embucket (configurable via --auth-demo-user)
- Password: embucket (configurable via --auth-demo-password)
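As an illustration of logging in with the demo credentials, the following is a minimal sketch using snowflake-connector-python. Only the username and password come from this page. The host, port, protocol, account, database, schema, and warehouse values are placeholder assumptions for a local deployment, and the helper names are hypothetical.

```python
def build_connect_kwargs(host="localhost", port=3000,
                         user="embucket", password="embucket"):
    """Assemble keyword arguments for snowflake.connector.connect().

    Only the demo user/password are taken from the documentation.
    Every other value below is an assumed placeholder; replace it with
    the address and names used by your own deployment.
    """
    return {
        "host": host,             # assumed: where Embucket is listening
        "port": port,             # assumed port
        "protocol": "http",       # assumed: no TLS on a local test instance
        "account": "embucket",    # assumed account name
        "user": user,
        "password": password,
        "database": "embucket",   # assumed; sets the session's database context
        "schema": "public",       # assumed; sets the session's schema context
        "warehouse": "default",   # assumed; informational only in Embucket
    }


def run_query(sql, **overrides):
    """Open a session, run one statement, and return all rows."""
    # Imported lazily so this module loads even without the connector installed.
    import snowflake.connector

    conn = snowflake.connector.connect(**build_connect_kwargs(**overrides))
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()  # ends the session instead of waiting for expiry
```

With an instance running, `run_query("SELECT 1")` would exercise the login, query, and session-delete paths in one call.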
SQL Dialect Compatibility
Embucket supports the Snowflake SQL dialect with extensive syntax compatibility for common operations.
Supported SQL Features
- Data Query (DQL)
- Data Definition (DDL)
- Data Manipulation (DML)
- Functions
Data Query (DQL):
- SELECT with joins, subqueries, CTEs
- WHERE, GROUP BY, HAVING, ORDER BY
- LIMIT, OFFSET, TOP N
- UNION, INTERSECT, EXCEPT
- Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
- Aggregate functions (SUM, COUNT, AVG, MIN, MAX, etc.)
- Lateral joins (LATERAL, TABLE())
- Pivot and unpivot operations
- Semi-structured data queries (GET, GET_PATH, array indexing)
SQL Parsing
Embucket uses a custom SQL parser built on sqlparser-rs with Snowflake dialect extensions.
Parser features:
- Snowflake-specific syntax (e.g., FLATTEN, $1 positional parameters)
- Case-insensitive identifiers with optional normalization
- Identifier quoting rules (double quotes preserve case)
- Session parameter substitution
- Table function support (e.g., TABLE(FLATTEN(...)))
- COPY INTO statement parsing
Query preprocessing:
- Identifier case normalization: Converts unquoted identifiers to uppercase (Snowflake convention)
- Function translation: Maps Snowflake functions to DataFusion equivalents
- Syntax desugaring: Expands Snowflake-specific syntax to standard SQL
- Session context injection: Resolves session parameters in queries
See crates/executor/src/query.rs:140 for query parsing implementation.
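The identifier rules above (unquoted identifiers become uppercase, double-quoted identifiers keep their case) can be sketched in a few lines. This is an illustration of the Snowflake convention only, not Embucket's parser code, and the function names are hypothetical.

```python
def normalize_identifier(ident: str) -> str:
    """Apply Snowflake-style case rules to a single identifier.

    "Quoted" identifiers keep their exact case (a doubled quote inside
    stands for one literal quote); unquoted identifiers are uppercased.
    """
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1].replace('""', '"')
    return ident.upper()


def normalize_qualified_name(name: str) -> list[str]:
    """Split a dotted name on dots outside quotes, then normalize each part."""
    parts, current, in_quotes = [], [], False
    for ch in name:
        if ch == '"':
            in_quotes = not in_quotes   # a doubled quote toggles twice
            current.append(ch)
        elif ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [normalize_identifier(p) for p in parts]
```

Under these rules `orders` and `ORDERS` refer to the same table, while `"orders"` is a distinct, case-sensitive name.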
Supported Data Types
Embucket supports most common Snowflake data types through Arrow/DataFusion type mapping:
| Snowflake Type | Embucket/Arrow Type | Notes |
|---|---|---|
| NUMBER(p,s) | Decimal128(p,s) | Precision up to 38 |
| INT, INTEGER | Int64 | |
| BIGINT | Int64 | |
| SMALLINT | Int16 | |
| FLOAT, DOUBLE | Float64 | |
| VARCHAR, STRING, TEXT | Utf8 | |
| BINARY, VARBINARY | Binary | |
| BOOLEAN | Boolean | |
| DATE | Date32 | Days since epoch |
| TIME | Time64(Nanosecond) | |
| TIMESTAMP, TIMESTAMP_NTZ | Timestamp(Nanosecond) | |
| TIMESTAMP_LTZ | Timestamp(Nanosecond, tz) | With timezone |
| TIMESTAMP_TZ | Timestamp(Nanosecond, tz) | With timezone |
| VARIANT | Union or Utf8 | Semi-structured data |
| OBJECT | Struct | |
| ARRAY | List | |
Use CAST or TRY_CAST for explicit conversions.
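Two notes in the table reduce to simple arithmetic: a Date32 value is a count of days since 1970-01-01, and a Decimal128(p,s) value is an integer scaled by 10^s with at most 38 digits. The sketch below works through both in plain Python. It illustrates the Arrow representations only, the function names are hypothetical, and it is not how Embucket performs the conversion.

```python
from datetime import date
from decimal import Decimal, localcontext

EPOCH = date(1970, 1, 1)


def to_date32(d: date) -> int:
    """Days since the Unix epoch, the value Arrow's Date32 stores."""
    return (d - EPOCH).days


def to_decimal128_unscaled(value: str, precision: int, scale: int) -> int:
    """Unscaled integer for a NUMBER(p,s) value, i.e. value * 10**scale."""
    if not (1 <= precision <= 38) or scale < 0:
        raise ValueError("expected 1 <= precision <= 38 and scale >= 0")
    with localcontext() as ctx:
        ctx.prec = 80  # headroom so the multiplication below stays exact
        scaled = Decimal(value) * (Decimal(10) ** scale)
    if scaled != scaled.to_integral_value():
        raise ValueError("value has more fractional digits than the scale")
    unscaled = int(scaled)
    if abs(unscaled) >= 10 ** precision:
        raise ValueError("value does not fit in the declared precision")
    return unscaled
```

For example, `to_date32(date(2000, 1, 1))` gives 10957, and `to_decimal128_unscaled("123.45", 10, 2)` gives 12345.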
Compatible Clients & Tools
Embucket works with standard Snowflake client tools:
- Snowflake CLI: Official snow sql command-line tool
- Python Connector: snowflake-connector-python library
- JDBC/ODBC: Snowflake JDBC and ODBC drivers (API endpoints)
- dbt: dbt-snowflake adapter for transformations
Key Differences from Snowflake
While Embucket provides broad compatibility, there are important differences:
Architecture Differences
- Query-per-node model: Queries run entirely on one node vs. Snowflake’s distributed execution
- No automatic clustering: Table optimization requires manual OPTIMIZE commands
- Local caching only: Metadata and result caching is per-node, not global
- Session affinity: Session state (temp tables, variables) exists only on the session node
Unsupported Features
Enterprise features:
- Multi-cluster warehouses
- Automatic query optimization and statistics
- Query result caching across sessions
- Data sharing and marketplaces
- Replication and failover
- Row-level security policies
- Column-level security and masking
- External functions (AWS Lambda, etc.)
- User-defined functions (UDFs) in JavaScript/Java
- Stored procedures
- Tasks and streams (change data capture)
- Snowpipe (continuous loading)
- External tables (partial support)
- Materialized views
- Transient and temporary tables (limited)
- Role-based access control (RBAC)
- User and role management commands
- Resource monitors
- Network policies
Behavior Differences
Transactions:
- Embucket supports single-statement transactions via Iceberg
- Multi-statement transactions have limited support
- No BEGIN TRANSACTION/COMMIT/ROLLBACK (uses auto-commit)
Concurrency:
- Configurable concurrent query limit (default: 100 per node)
- No automatic query queueing: exceeding the limit returns an error
- No warehouse scaling or query distribution
Performance:
- Query performance depends on node resources (CPU, memory)
- No automatic performance tuning or query optimization hints
- Smaller data volumes recommended for single-node architecture
Migration Considerations
When migrating from Snowflake to Embucket:
Best practices:
- Start with read-heavy analytical workloads
- Use Embucket for development/testing environments
- Validate query results against Snowflake before cutover
- Monitor resource usage and adjust node sizing accordingly
- Consider horizontal scaling for higher throughput needs
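One way to validate query results before cutover is to run the same statement against both systems and compare the rows as multisets, so that row order does not matter. The sketch below assumes rows arrive as tuples of hashable values (as DB-API cursors typically return them). The function names and the float rounding tolerance are illustrative choices, not part of Embucket.

```python
from collections import Counter


def normalize_row(row, float_digits=9):
    """Round floats so tiny representation differences do not count as diffs."""
    return tuple(
        round(v, float_digits) if isinstance(v, float) else v for v in row
    )


def diff_result_sets(expected_rows, actual_rows, float_digits=9):
    """Compare two result sets ignoring row order.

    Returns (missing, unexpected): rows present only in expected_rows and
    rows present only in actual_rows, with duplicates counted.
    """
    expected = Counter(normalize_row(r, float_digits) for r in expected_rows)
    actual = Counter(normalize_row(r, float_digits) for r in actual_rows)
    missing = list((expected - actual).elements())
    unexpected = list((actual - expected).elements())
    return missing, unexpected
```

Two empty lists mean the result sets match. For queries with ORDER BY, compare the row lists directly as well, since this check deliberately ignores ordering.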
Troubleshooting
Common Issues
“Invalid authentication data” error:
- Verify username/password match the configured demo credentials
- Check account name format (include region if needed)
Query timeouts:
- Increase the --query-timeout-secs CLI argument (default: 1200)
- Optimize query or add indexes/partitioning to underlying tables
- Consider breaking large queries into smaller pieces
Concurrency limit exceeded:
- Increase the --max-concurrency-level CLI argument (default: 100)
- Add more Embucket nodes behind load balancer
- Review concurrent query patterns and add throttling
Unsupported SQL features:
- Check if query uses Snowflake-specific features not yet implemented
- Rewrite using standard SQL or supported functions
- File issue on GitHub for feature requests
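Because Embucket returns an error rather than queueing once the concurrency limit is reached, client-side throttling can keep the number of in-flight queries below that limit. The following is a minimal sketch using a semaphore. `execute` stands for any hypothetical callable that runs one SQL statement, and the class and function names are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class QueryThrottle:
    """Cap the number of queries a client has in flight at once."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, execute, sql):
        # Block until a slot is free, so excess work waits on the client
        # instead of being rejected by the server.
        with self._slots:
            return execute(sql)


def run_all(execute, statements, max_in_flight=10, workers=32):
    """Run statements from a thread pool while respecting the throttle."""
    throttle = QueryThrottle(max_in_flight)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda sql: throttle.run(execute, sql), statements))
```

A single `QueryThrottle` can be shared by several thread pools or request handlers in the same process, which is why the cap lives in the semaphore rather than in the pool size.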
Next Steps
- Architecture: Learn how Embucket’s architecture works
- Iceberg Integration: Understand Apache Iceberg storage