Overview
Embucket can be configured using:
- Command-line flags: ./embucketd --flag value
- Environment variables: export VAR=value
- Configuration files: YAML files for metastore configuration
Server Configuration
Network Settings
Host address to bind the server to.
- Use 0.0.0.0 to accept connections from any network interface
- Use localhost or 127.0.0.1 for local-only access
- Specify an IP address to bind to a particular interface
Port number for the Snowflake-compatible API server.
Service idle timeout in seconds. Connections idle for longer than this duration may be closed.
Default: 18000 seconds (5 hours)
Metastore Configuration
Path to a YAML configuration file describing volumes, databases, schemas, and tables. See the Metastore Configuration File section below for the YAML format.
Metastore Configuration File
The metastore configuration file defines external catalogs and table locations.
S3 Tables (AWS S3 Table Buckets)
S3 with External Iceberg Tables
Query Execution Settings
Concurrency and Timeouts
Maximum number of queries that can run concurrently.
- Higher values increase throughput but require more memory
- Set based on available CPU cores and memory
Maximum duration in seconds a single query is allowed to run before being terminated.
Default: 1200 seconds (20 minutes)
Maximum number of concurrent requests to fetch table metadata.
- Increase for faster catalog scanning with many tables
- May increase load on catalog service
Memory and Disk Pools
Memory Pool Configuration
Memory pool allocation strategy:
- greedy: Allocates memory aggressively; a single query may use the entire available pool
- fair: Distributes memory more evenly across concurrent queries
Maximum memory pool size in megabytes for query execution.
- If not set, uses system-determined defaults
- Recommended: 50-70% of available system memory
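The 50-70% recommendation above can be turned into a concrete megabyte value with a little arithmetic. The following is a minimal sketch, assuming a Linux host where /proc/meminfo reports MemTotal in kB; the helper name pool_size_mb and the 60% figure are illustrative choices, not part of Embucket.

```shell
#!/bin/sh
# Sketch: derive a memory pool size (in MB) as a percentage of total RAM.
# pool_size_mb is a hypothetical helper, not an Embucket command.

# pool_size_mb TOTAL_KB PERCENT -> prints PERCENT% of TOTAL_KB in whole MB.
# Dividing by 1024 first keeps intermediate values small.
pool_size_mb() {
    echo $(( $1 / 1024 * $2 / 100 ))
}

# Worked case: a 16 GiB host (16777216 kB) at 60%.
pool_size_mb 16777216 60   # prints 9830

# On a live Linux host, read the total from /proc/meminfo instead.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "suggested pool size: $(pool_size_mb "$total_kb" 60) MB"
fi
```

The printed number is what would be supplied as the memory pool size in megabytes. Leave headroom for the operating system and any other processes on the host.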
Enable tracking of per-consumer (per-query) memory usage.
- Useful for debugging memory-intensive queries
- Adds slight overhead
Disk Pool Configuration
Maximum disk pool size in megabytes for query spilling.
- Used when queries exceed memory pool size
- Requires fast disk (SSD recommended)
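Before choosing a disk pool size, it can help to check how much space is actually free on the volume that will receive spill files. This is a minimal sketch using POSIX df. SPILL_DIR and free_mb are hypothetical names for illustration, and /tmp is an assumed location, since the spill directory is not specified here.

```shell
#!/bin/sh
# Sketch: report free space (in MB) on the filesystem that would hold spill files.
# SPILL_DIR is a hypothetical variable; it is not an Embucket setting.
SPILL_DIR="${SPILL_DIR:-/tmp}"

# free_mb DIR -> prints available space on DIR's filesystem in whole MB.
# df -P gives one line per filesystem; -k reports sizes in 1024-byte blocks.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

avail=$(free_mb "$SPILL_DIR")
echo "free space under $SPILL_DIR: ${avail} MB"
```

A disk pool limit set well below the reported figure leaves room for other data written to the same volume.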
SQL Configuration
Data serialization format for the Snowflake v1 API.
Default: json
SQL parser dialect to use for query parsing.
Options: snowflake, postgres, mysql, generic
Authentication
Username for demo authentication mode.
Password for demo authentication mode.
Secret key for JWT token signing. Values are hidden in logs for security.
Tracing and Monitoring
Logging Configuration
Default tracing/logging level.
Options:
- off: Disable all logging
- info: Standard operational logs
- debug: Detailed debugging information
- trace: Very verbose trace-level logs
This setting is overridden by the RUST_LOG environment variable if set.
Enable memory allocation tracing for debugging memory usage.
- Adds performance overhead
- Useful for identifying memory leaks or inefficient allocations
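Returning to the log-level setting: the RUST_LOG override noted earlier can be pictured as a simple precedence rule. The sketch below only models that rule in shell. CONFIGURED_LEVEL and effective_level are hypothetical stand-ins, not Embucket names, and treating an empty RUST_LOG as unset is an assumption.

```shell
#!/bin/sh
# Sketch of the precedence: RUST_LOG, when set, wins over the configured level.
# CONFIGURED_LEVEL stands in for whatever the tracing-level setting holds.

# effective_level CONFIGURED -> prints the level that would take effect.
effective_level() {
    if [ -n "${RUST_LOG:-}" ]; then
        echo "$RUST_LOG"
    else
        echo "$1"
    fi
}

CONFIGURED_LEVEL=info

unset RUST_LOG
effective_level "$CONFIGURED_LEVEL"   # prints: info

RUST_LOG=debug
export RUST_LOG
effective_level "$CONFIGURED_LEVEL"   # prints: debug
```

In practice this means a one-off verbose run can be requested by prefixing the command, for example RUST_LOG=debug ./embucketd, without changing the configured default.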
OpenTelemetry Configuration
OpenTelemetry OTLP exporter protocol.
Options:
- grpc: Use gRPC protocol (default)
- http/json: Use HTTP with JSON encoding
Tracing span processor type.
Options:
- batch-span-processor: Batch spans for efficient export
- batch-span-processor-experimental-async-runtime: Experimental async batch processor
OTLP Endpoint
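For illustration, here is a minimal sketch of pointing the exporter at a collector through the environment. OTEL_EXPORTER_OTLP_ENDPOINT is defined by the OpenTelemetry specification. The localhost address is an assumed local collector, not an Embucket default, and which OTLP variables Embucket reads beyond the endpoint is not stated here.

```shell
#!/bin/sh
# Standard OpenTelemetry exporter variable (from the OTLP exporter specification).
# The address is an assumption: a collector running on the same host.
# 4317 is the conventional OTLP/gRPC port; OTLP over HTTP conventionally uses 4318.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

echo "exporting traces to $OTEL_EXPORTER_OTLP_ENDPOINT"
```

The port should match the exporter protocol selected above, since a collector usually listens for gRPC and HTTP traffic on different ports.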
Configure the OpenTelemetry collector endpoint using the standard OTLP environment variables.
AWS SDK Timeouts
These settings control timeouts for AWS SDK operations (S3, S3 Tables, etc.).
Timeout for establishing AWS SDK connections in seconds.
Total timeout for AWS SDK operations in seconds.
Timeout for individual AWS SDK operation attempts in seconds (before retry).
Iceberg Timeouts
Timeout for Iceberg table operations (create, load metadata) in seconds.
Timeout for Iceberg catalog operations in seconds.
Object Store Timeouts
Timeout for object store operations (reads, writes) in seconds.
Timeout for establishing object store connections in seconds.
Configuration Examples
Development Configuration
Quick local development setup:
Production Configuration
Optimized production deployment:
Environment Variables Configuration
Using environment variables for cleaner configuration:
High-Performance Configuration
For compute-intensive workloads:
Docker Environment Variables
When running in Docker, these additional environment variables are commonly used:
Next Steps
- Docker: Deploy with Docker
- AWS Lambda: Serverless deployment
- Binary: Standalone binary setup