Overview
Embucket can be deployed as an AWS Lambda function, providing a serverless lakehouse solution. This deployment mode is ideal for:
- On-demand query processing
- Cost-effective workloads with intermittent usage
- Scalable query endpoints without managing infrastructure
- Integration with AWS services
Prerequisites
Install cargo-lambda
Install the cargo-lambda tool for building and deploying Rust Lambda functions:
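Before building, a quick check that the toolchain is on PATH can save a confusing error later. The sketch below is a minimal illustration, and the helper name missing_tools is hypothetical. It relies on cargo's convention of running "cargo <name>" as an executable called cargo-<name>, so finding cargo-lambda on PATH means the subcommand is installed.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH.

    cargo runs `cargo lambda` by looking for an executable named
    `cargo-lambda`, so checking for that binary confirms the subcommand.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools(["cargo", "cargo-lambda"])
    if absent:
        print("Missing tools:", ", ".join(absent))
    else:
        print("cargo and cargo-lambda are available")
```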
Prepare configuration file
Create a config/metastore.yaml file with your catalog configuration (see Configuration for details).
Building for Lambda
Build the Embucket Lambda binary for the ARM64 architecture (recommended for better price-performance):
The build produces a bootstrap binary in the target/lambda/embucket-lambda/ directory.
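The build step can be captured in a small script so the flags stay consistent across machines. This is a sketch, not the project's own build tooling: build_command and bootstrap_path are hypothetical helpers, --release and --arm64 are cargo-lambda build flags, and the optional --package selector (a standard cargo flag that cargo-lambda passes through) is only needed if the Lambda crate lives in a multi-crate workspace. The package name is an assumption.

```python
import shlex
from pathlib import Path


def build_command(package=None, arm64=True, release=True):
    """Assemble a cargo-lambda build invocation as an argument list."""
    cmd = ["cargo", "lambda", "build"]
    if release:
        cmd.append("--release")
    if arm64:
        # Cross-compile for aarch64 (Graviton) instead of x86_64.
        cmd.append("--arm64")
    if package:
        # Standard cargo flag; selects one crate in a multi-crate workspace.
        cmd += ["--package", package]
    return cmd


def bootstrap_path(binary="embucket-lambda", project_root="."):
    """Location of the bootstrap executable that cargo-lambda produces."""
    return Path(project_root) / "target" / "lambda" / binary / "bootstrap"


if __name__ == "__main__":
    print(shlex.join(build_command()))
    print(bootstrap_path())
```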
ARM64 (Graviton2) provides better price-performance than x86_64. Use the --arm64 flag for ARM builds.
Deployment
Basic Deployment
Deploy the function using cargo-lambda:
This deploys the function with:
- IAM Role: AWSLambdaBasicExecutionRole
- Memory: 1024 MB
- Timeout: 30 seconds
- Includes: config/ directory from project root
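The defaults above map onto cargo lambda deploy flags. A minimal sketch that assembles the command, assuming a recent cargo-lambda release in which deploy accepts --iam-role, --memory, --timeout, --include, --env-var and --enable-function-url (confirm with cargo lambda deploy --help for your version). deploy_command is a hypothetical helper, and the function name embucket-lambda is an assumption based on the build output directory.

```python
import shlex


def deploy_command(
    function_name="embucket-lambda",
    iam_role_arn=None,
    memory_mb=1024,
    timeout_secs=30,
    include=("config",),
    env=None,
    enable_function_url=False,
):
    """Assemble a cargo-lambda deploy invocation as an argument list."""
    cmd = ["cargo", "lambda", "deploy"]
    if iam_role_arn:
        # When omitted, cargo-lambda's default role handling applies.
        cmd += ["--iam-role", iam_role_arn]
    cmd += ["--memory", str(memory_mb), "--timeout", str(timeout_secs)]
    for path in include:
        # Adds extra files or directories (here config/) to the uploaded zip.
        cmd += ["--include", path]
    for key, value in sorted((env or {}).items()):
        # Each variable becomes a Lambda environment variable.
        cmd += ["--env-var", f"{key}={value}"]
    if enable_function_url:
        # Creates a public HTTPS endpoint for the function.
        cmd.append("--enable-function-url")
    cmd.append(function_name)
    return cmd


if __name__ == "__main__":
    print(shlex.join(deploy_command()))
```

For example, shlex.join(deploy_command(enable_function_url=True)) yields a command string ready for a shell, covering the Function URL variant as well.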
Deployment with Function URL
Enable a public HTTPS endpoint for your function:
Function URL Configuration
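Both access modes below come down to the function URL's auth type, AWS_IAM or NONE, paired with a resource-based policy statement that grants lambda:InvokeFunctionUrl under a matching lambda:FunctionUrlAuthType condition. The sketch builds such a statement and extracts the region from a function URL, whose host has the form <url-id>.lambda-url.<region>.on.aws. The helper names, account ID and function ARN are placeholders, and the statement is illustrative rather than the exact policy an Embucket deployment attaches.

```python
import json
import re

# Lambda function URL hosts look like <url-id>.lambda-url.<region>.on.aws
_FUNCTION_URL = re.compile(
    r"^(?:https://)?([a-z0-9]+)\.lambda-url\.([a-z0-9-]+)\.on\.aws/?$"
)


def region_from_function_url(url):
    """Extract the AWS region embedded in a Lambda function URL."""
    match = _FUNCTION_URL.match(url)
    if match is None:
        raise ValueError(f"not a Lambda function URL: {url}")
    return match.group(2)


def invoke_url_statement(auth_type, function_arn, principal):
    """Policy statement allowing `principal` to call the function URL.

    auth_type is "AWS_IAM" (callers sign requests with SigV4) or
    "NONE" (public access; principal is then normally "*").
    """
    if auth_type not in ("AWS_IAM", "NONE"):
        raise ValueError("auth_type must be AWS_IAM or NONE")
    return {
        "Effect": "Allow",
        "Principal": "*" if principal == "*" else {"AWS": principal},
        "Action": "lambda:InvokeFunctionUrl",
        "Resource": function_arn,
        "Condition": {
            "StringEquals": {"lambda:FunctionUrlAuthType": auth_type}
        },
    }


if __name__ == "__main__":
    arn = "arn:aws:lambda:us-east-2:123456789012:function:embucket-lambda"
    statement = invoke_url_statement("AWS_IAM", arn, "arn:aws:iam::123456789012:root")
    print(json.dumps(statement, indent=2))
```

With AWS_IAM, callers must also sign each request with SigV4; NONE leaves the URL open at the Lambda layer, which is why it suits development use.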
IAM Authentication
For production deployments, configure IAM authentication on the function URL:
Public Access
For development or public access:
IAM Role Requirements
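The permission sets this section covers can be expressed as IAM policy documents. The sketch below assumes one S3 bucket for Iceberg data and one S3 table bucket; bucket names, region and account ID are placeholders, and the s3tables:* wildcard is deliberately broad and should be narrowed for production. AWSLambdaBasicExecutionRole is the AWS-managed policy that lets the function write to CloudWatch Logs.

```python
import json

# AWS-managed policy granting CloudWatch Logs write access.
BASIC_EXECUTION_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
)


def lambda_trust_policy():
    """Trust policy letting the Lambda service assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_iceberg_policy(bucket):
    """Read/write access to Iceberg data and metadata files in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object operations apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


def s3_tables_policy(region, account_id, table_bucket):
    """Broad access to one S3 table bucket and its tables."""
    bucket_arn = f"arn:aws:s3tables:{region}:{account_id}:bucket/{table_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3tables:*",
                "Resource": [bucket_arn, f"{bucket_arn}/table/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_iceberg_policy("my-lakehouse-bucket"), indent=2))
```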
Your Lambda function needs appropriate IAM permissions to access AWS resources.
Basic Execution Role
Minimum permissions for Lambda execution:
S3 Access for Iceberg Tables
If using S3 for Iceberg table storage:
S3 Tables Access
For AWS S3 Table Buckets:
Configuration File Handling
The config/metastore.yaml file is packaged with your Lambda deployment.
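Inside the running function, packaged files live under the directory Lambda exposes through the LAMBDA_TASK_ROOT environment variable (/var/task). A sketch for locating the packaged metastore file, assuming the included config/ directory keeps its name at the root of the deployment zip; packaged_config_path is a hypothetical helper.

```python
import os
from pathlib import Path


def packaged_config_path(env=None, relative="config/metastore.yaml"):
    """Locate a file shipped inside the Lambda deployment package.

    Lambda extracts the package under LAMBDA_TASK_ROOT (/var/task).
    Assumes config/ sits at the root of the uploaded zip.
    """
    env = os.environ if env is None else env
    task_root = env.get("LAMBDA_TASK_ROOT", "/var/task")
    return Path(task_root) / relative


if __name__ == "__main__":
    print(packaged_config_path())
```

Passing an explicit env mapping makes the lookup easy to exercise outside Lambda.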
Example Configuration
config/metastore.yaml
Using IAM Roles (Recommended)
Use IAM roles instead of embedding credentials:
config/metastore.yaml
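Because config/metastore.yaml ships inside the deployment package, anyone with permission to download the function code can read it. A quick pre-deploy scan for AWS access key IDs helps catch credentials left in the file. The sketch matches the documented key ID shapes (AKIA for long-term keys, ASIA for temporary ones, followed by 16 uppercase letters or digits); it does not detect secret access keys, and lines_with_access_keys is a hypothetical helper.

```python
import re

# Long-term IAM user keys start with AKIA; temporary (STS) keys with ASIA.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def lines_with_access_keys(text):
    """Return 1-based line numbers that appear to contain an access key ID."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if ACCESS_KEY_ID.search(line)
    ]
```

Reading the file with open("config/metastore.yaml", encoding="utf-8") and failing the deploy when the returned list is non-empty is one way to wire this into a release script.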
Environment Variables
Configure Lambda function settings using environment variables:
Common Environment Variables
- Path to the metastore configuration file (packaged with the deployment)
- QUERY_TIMEOUT_SECS: query execution timeout in seconds (should be less than the Lambda timeout)
- Maximum concurrent queries per Lambda invocation
- Memory pool type: greedy or fair
- Logging level: off, info, debug, or trace
Performance Considerations
Memory Configuration
Allocate sufficient memory based on query complexity:
- Minimum: 1024 MB
- Recommended: 2048-4096 MB
- Large queries: 8192+ MB
Timeout Settings
Set an appropriate timeout (Lambda allows a maximum of 15 minutes):
Ensure QUERY_TIMEOUT_SECS is less than the Lambda timeout.
Cold Start
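The first invocation (cold start) may take 5-10 seconds:
- Use provisioned concurrency for latency-sensitive workloads
- SnapStart is limited to managed runtimes such as Java and does not apply to the custom runtime that a Rust bootstrap binary uses
- Lambda typically reuses warm execution environments for subsequent requests, so later queries avoid the startup cost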
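To see how often cold starts actually occur, the REPORT line Lambda writes at the end of each invocation can be inspected: it carries an Init Duration field only when the invocation initialized a new execution environment. A minimal parser, assuming the default plain-text log format (with JSON log format enabled, the same metric appears in a structured platform.report record instead); cold_start_ms is a hypothetical helper.

```python
import re

INIT_DURATION = re.compile(r"Init Duration: ([0-9.]+) ms")


def cold_start_ms(report_line):
    """Return the init duration in ms from a Lambda REPORT line, or None.

    Lambda only includes "Init Duration" when the invocation was served
    by a newly initialized execution environment (a cold start).
    """
    match = INIT_DURATION.search(report_line)
    return float(match.group(1)) if match else None
```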
Ephemeral Storage
Ephemeral storage (/tmp) defaults to 512 MB; increase it for large queries:
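The settings in this section can be checked before deploying. The bounds below are AWS Lambda's quotas (128 to 10,240 MB of memory, at most 900 seconds of timeout, 512 to 10,240 MB of ephemeral storage); the 1024 MB floor and the QUERY_TIMEOUT_SECS rule come from this page. check_lambda_settings is a hypothetical helper.

```python
def check_lambda_settings(memory_mb, timeout_secs, query_timeout_secs, ephemeral_mb=512):
    """Return a list of problems with the proposed settings (empty if none)."""
    problems = []
    # Lambda quota: 128-10240 MB; this page recommends at least 1024 MB.
    if not 128 <= memory_mb <= 10240:
        problems.append("memory must be between 128 and 10240 MB")
    elif memory_mb < 1024:
        problems.append("memory below the 1024 MB minimum suggested for Embucket")
    # Lambda quota: timeout of at most 15 minutes.
    if not 1 <= timeout_secs <= 900:
        problems.append("Lambda timeout must be between 1 and 900 seconds (15 minutes)")
    # Queries must finish (or time out) before Lambda kills the invocation.
    if query_timeout_secs >= timeout_secs:
        problems.append("QUERY_TIMEOUT_SECS must be less than the Lambda timeout")
    # Lambda quota: /tmp between 512 and 10240 MB.
    if not 512 <= ephemeral_mb <= 10240:
        problems.append("ephemeral storage must be between 512 and 10240 MB")
    return problems
```

For an existing function, ephemeral storage can be raised with the AWS CLI, for example aws lambda update-function-configuration --function-name embucket-lambda --ephemeral-storage Size=4096 (the function name is an assumption).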
Connecting with Snowflake CLI
Create a connection profile for your Lambda deployment:
- Connection name: lambda
- Account: acc.lambda
- User: embucket
- Password: embucket
- Role: em.role
- Warehouse: em.wh
- Database: demo
- Schema: public
- Host: https://7mh4xw9n2pqjvf5kzrbt8ycusg6dla3e.lambda-url.us-east-2.on.aws
- Region: us-east-2
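The same profile can be written directly into Snowflake CLI's config.toml instead of answering the snow connection add prompts. A sketch that renders a [connections.lambda] table from the values above; it assumes the config keys mirror the prompt names, and snow_connection_toml is a hypothetical helper. The demo password ends up in plain text, so restrict the file's permissions.

```python
def snow_connection_toml(name, settings):
    """Render one [connections.<name>] table for Snowflake CLI's config.toml."""
    lines = [f"[connections.{name}]"]
    for key, value in settings.items():
        # Minimal TOML basic-string escaping: backslashes, then double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key} = "{escaped}"')
    return "\n".join(lines) + "\n"


# Values from the connection profile listed above.
LAMBDA_PROFILE = {
    "account": "acc.lambda",
    "user": "embucket",
    "password": "embucket",
    "role": "em.role",
    "warehouse": "em.wh",
    "database": "demo",
    "schema": "public",
    "host": "https://7mh4xw9n2pqjvf5kzrbt8ycusg6dla3e.lambda-url.us-east-2.on.aws",
    "region": "us-east-2",
}

if __name__ == "__main__":
    print(snow_connection_toml("lambda", LAMBDA_PROFILE))
```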
Test the Connection
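A connection can be checked with snow connection test or with a trivial query through snow sql. The sketch lists both commands for the lambda profile; connection_test_commands is a hypothetical helper. Whether every check that snow connection test performs passes against Embucket is not confirmed here, so the plain SELECT 1 query is the simpler signal.

```python
import shlex


def connection_test_commands(connection="lambda"):
    """Snowflake CLI commands that exercise a named connection profile."""
    return [
        ["snow", "connection", "test", "--connection", connection],
        ["snow", "sql", "--connection", connection, "--query", "SELECT 1"],
    ]


if __name__ == "__main__":
    for cmd in connection_test_commands():
        print(shlex.join(cmd))
```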
Monitoring and Observability
CloudWatch Logs
Lambda automatically sends logs to CloudWatch Logs:
OpenTelemetry Integration
Embucket Lambda includes OpenTelemetry support. Configure the OTLP endpoint:
The Lambda version emits JSON-formatted logs optimized for CloudWatch Logs Insights.
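By default, Lambda writes to a log group named /aws/lambda/<function-name>, and AWS CLI v2's aws logs tail command can stream it. The sketch below builds that command and a starter Logs Insights query that uses only built-in fields. The OTLP variable is an assumption: OTEL_EXPORTER_OTLP_ENDPOINT is the standard OpenTelemetry SDK setting, but this page does not confirm that Embucket reads it, and the collector URL is a placeholder. The function name embucket-lambda is also assumed.

```python
import shlex


def log_group(function_name):
    """CloudWatch Logs group that Lambda writes to by default."""
    return f"/aws/lambda/{function_name}"


def tail_command(function_name, follow=True):
    """AWS CLI v2 command that streams the function's log events."""
    cmd = ["aws", "logs", "tail", log_group(function_name)]
    if follow:
        cmd.append("--follow")
    return cmd


# Logs Insights discovers fields in JSON log lines automatically;
# @timestamp and @message are built-in fields available for any log group.
RECENT_EVENTS_QUERY = "fields @timestamp, @message | sort @timestamp desc | limit 20"

# Assumption: standard OpenTelemetry variable; placeholder collector URL.
OTLP_ENV = {"OTEL_EXPORTER_OTLP_ENDPOINT": "https://otel-collector.example.com"}

if __name__ == "__main__":
    print(shlex.join(tail_command("embucket-lambda")))
    print(RECENT_EVENTS_QUERY)
```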
Next Steps
- Configuration: Explore all configuration options
- Docker Deployment: Deploy using Docker containers