Embucket can be configured entirely through environment variables, making it ideal for containerized deployments. All CLI flags have corresponding environment variables.
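As a rough sketch of how this might look in practice, the snippet below loads settings from an env file and exports them before the server is started. The file contents are illustrative, the temporary file stands in for a real path (for example one mounted into a container), and the final `embucketd` call is left commented out so the snippet can be run on its own.

```shell
# Hypothetical env file; the temporary path and the values are illustrative only.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
BUCKET_HOST=0.0.0.0
BUCKET_PORT=8080
EOF

# While "set -a" is active, every variable that gets assigned is also exported.
set -a
. "$env_file"
set +a
rm -f "$env_file"

echo "Embucket would bind to ${BUCKET_HOST}:${BUCKET_PORT}"
# embucketd   # uncomment to start the server with these settings
```

Keeping the settings in one file makes it easier to reuse the same values across a local shell, Docker, and Kubernetes.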

Server Configuration

METASTORE_CONFIG
string
Path to YAML config file describing volumes/databases to seed the metastore.
export METASTORE_CONFIG="/etc/embucket/config.yaml"
BUCKET_HOST
string
default:"localhost"
Host address to bind the server to.
export BUCKET_HOST="0.0.0.0"
BUCKET_PORT
integer
default:"3000"
Port number to bind the server to.
export BUCKET_PORT="8080"
IDLE_TIMEOUT_SECONDS
integer
default:"18000"
Service idle timeout in seconds.
export IDLE_TIMEOUT_SECONDS="3600"

Query Execution

MAX_CONCURRENCY_LEVEL
integer
default:"8"
Maximum number of queries running simultaneously.
export MAX_CONCURRENCY_LEVEL="32"
QUERY_TIMEOUT_SECS
integer
default:"1200"
Maximum duration in seconds for a single query.
export QUERY_TIMEOUT_SECS="600"
MAX_CONCURRENT_TABLE_FETCHES
integer
default:"2"
Maximum number of concurrent requests for fetching table details.
export MAX_CONCURRENT_TABLE_FETCHES="4"

Memory and Resources

MEM_POOL_TYPE
enum
default:"greedy"
Memory pool type: greedy or fair.
export MEM_POOL_TYPE="fair"
MEM_POOL_SIZE_MB
integer
Maximum memory pool size in megabytes.
export MEM_POOL_SIZE_MB="8192"
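One way to choose a value is to derive it from the host's total memory. The helper below is only a sketch: `pool_size_mb` is a hypothetical function, the 50% share is an arbitrary example rather than an Embucket recommendation, and 1 MB is treated as 1024 kB.

```shell
# pool_size_mb TOTAL_KB PERCENT
# Prints PERCENT percent of TOTAL_KB, converted to whole megabytes.
pool_size_mb() {
  echo $(( $1 * $2 / 100 / 1024 ))
}

# Example: a host with 16 GiB of RAM (16777216 kB), half of it given to the pool.
MEM_POOL_SIZE_MB="$(pool_size_mb 16777216 50)"
export MEM_POOL_SIZE_MB
echo "MEM_POOL_SIZE_MB=${MEM_POOL_SIZE_MB}"
```

On Linux, the total in kB could be read with `awk '/MemTotal/ {print $2}' /proc/meminfo` instead of being hard-coded.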
MEM_ENABLE_TRACK_CONSUMERS_POOL
boolean
default:"false"
Enable per-consumer memory usage tracking.
export MEM_ENABLE_TRACK_CONSUMERS_POOL="true"
DISK_POOL_SIZE_MB
integer
Maximum disk pool size in megabytes for spilling.
export DISK_POOL_SIZE_MB="20480"
ALLOC_TRACING
boolean
default:"false"
Enable memory allocation tracing.
export ALLOC_TRACING="true"

Data Format

DATA_FORMAT
string
default:"json"
Data serialization format used by the Snowflake v1 API.
export DATA_FORMAT="json"
SQL_PARSER_DIALECT
string
default:"snowflake"
SQL parser dialect: snowflake, postgres, mysql, or generic.
export SQL_PARSER_DIALECT="postgres"

Authentication

AUTH_DEMO_USER
string
default:"embucket"
Username for demo authentication.
export AUTH_DEMO_USER="admin"
AUTH_DEMO_PASSWORD
string
default:"embucket"
Password for demo authentication.
export AUTH_DEMO_PASSWORD="secure_password"
JWT_SECRET
string
JWT secret for authentication. Automatically cleared after startup.
Keep this value secret in production environments.
export JWT_SECRET="your-secure-secret-key"

AWS SDK Configuration

AWS_SDK_CONNECT_TIMEOUT_SECS
integer
default:"3"
AWS SDK connection timeout in seconds.
export AWS_SDK_CONNECT_TIMEOUT_SECS="5"
AWS_SDK_OPERATION_TIMEOUT_SECS
integer
default:"30"
AWS SDK operation timeout in seconds.
export AWS_SDK_OPERATION_TIMEOUT_SECS="60"
AWS_SDK_OPERATION_ATTEMPT_TIMEOUT_SECS
integer
default:"10"
AWS SDK operation attempt timeout in seconds.
export AWS_SDK_OPERATION_ATTEMPT_TIMEOUT_SECS="15"
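The three timeouts are easier to reason about when they nest. In the AWS SDKs, the operation timeout generally bounds a whole call including retries, while the attempt timeout bounds a single try. Assuming Embucket passes these values to the SDK unchanged, a pre-flight check along the following lines could catch inconsistent settings. `check_timeouts` is a hypothetical helper, and Embucket itself may not enforce this ordering.

```shell
# Fall back to the defaults listed on this page when a variable is unset.
connect="${AWS_SDK_CONNECT_TIMEOUT_SECS:-3}"
attempt="${AWS_SDK_OPERATION_ATTEMPT_TIMEOUT_SECS:-10}"
operation="${AWS_SDK_OPERATION_TIMEOUT_SECS:-30}"

# check_timeouts CONNECT ATTEMPT OPERATION
# Succeeds only when CONNECT <= ATTEMPT <= OPERATION.
check_timeouts() {
  [ "$1" -le "$2" ] && [ "$2" -le "$3" ]
}

if check_timeouts "$connect" "$attempt" "$operation"; then
  echo "AWS SDK timeouts are consistent"
else
  echo "AWS SDK timeouts are inconsistent: ${connect}/${attempt}/${operation}" >&2
fi
```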

Iceberg Configuration

ICEBERG_CREATE_TABLE_TIMEOUT_SECS
integer
default:"30"
Iceberg table creation timeout in seconds.
export ICEBERG_CREATE_TABLE_TIMEOUT_SECS="60"
ICEBERG_CATALOG_TIMEOUT_SECS
integer
default:"10"
Iceberg catalog operation timeout in seconds.
export ICEBERG_CATALOG_TIMEOUT_SECS="20"

Object Store Configuration

OBJECT_STORE_TIMEOUT_SECS
integer
default:"10"
Object store operation timeout in seconds.
export OBJECT_STORE_TIMEOUT_SECS="30"
OBJECT_STORE_CONNECT_TIMEOUT_SECS
integer
default:"3"
Object store connection timeout in seconds.
export OBJECT_STORE_CONNECT_TIMEOUT_SECS="5"

Observability

TRACING_LEVEL
enum
default:"info"
Tracing level: off, info, debug, or trace. Can be overridden by RUST_LOG.
export TRACING_LEVEL="debug"
span_processor
enum
default:"batch-span-processor"
Tracing span processor type.
export span_processor="batch-span-processor"
OTEL_EXPORTER_OTLP_PROTOCOL
string
default:"grpc"
OpenTelemetry Exporter Protocol: grpc or http.
export OTEL_EXPORTER_OTLP_PROTOCOL="http"

Volume Bootstrap Variables

These variables allow you to bootstrap a volume at startup without a YAML configuration file.
VOLUME_TYPE
enum
Type of volume to bootstrap. Options: s3, s3tables (or s3_tables, s3-tables), memory.
export VOLUME_TYPE="s3"
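Since several spellings are accepted for the S3 Tables type, a deployment script may want to settle on one form before exporting it. The function below is a sketch of that idea; `normalize_volume_type` is a hypothetical helper and does not reflect how Embucket parses the value internally.

```shell
# normalize_volume_type VALUE
# Maps the accepted spellings to one canonical name and rejects anything else.
normalize_volume_type() {
  case "$1" in
    s3) echo "s3" ;;
    s3tables|s3_tables|s3-tables) echo "s3tables" ;;
    memory) echo "memory" ;;
    *) echo "unsupported VOLUME_TYPE: $1" >&2; return 1 ;;
  esac
}

VOLUME_TYPE="$(normalize_volume_type "s3-tables")"
export VOLUME_TYPE
echo "VOLUME_TYPE=${VOLUME_TYPE}"
```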
VOLUME_IDENT
string
default:"embucket"
Identifier name for the volume.
export VOLUME_IDENT="my_volume"
VOLUME_DATABASE
string
Optional database name to auto-create with this volume.
export VOLUME_DATABASE="my_database"

S3 Volume Variables

VOLUME_ACCESS_KEY
string
AWS access key ID for S3 volumes. Required for the s3 and s3tables volume types unless the AWS credential provider chain is used.
export VOLUME_ACCESS_KEY="AKIAIOSFODNN7EXAMPLE"
VOLUME_SECRET_KEY
string
AWS secret access key for S3 volumes. Required for the s3 and s3tables volume types unless the AWS credential provider chain is used.
export VOLUME_SECRET_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
VOLUME_AWS_SESSION_TOKEN
string
Optional AWS session token for temporary credentials.
export VOLUME_AWS_SESSION_TOKEN="temporary-token"

S3 Tables Volume Variables

VOLUME_ARN
string
Amazon S3 Tables bucket ARN. Required for the s3tables volume type. Format: arn:aws:s3tables:region:account-id:bucket/bucket-name
export VOLUME_ARN="arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket"
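A malformed ARN is easy to miss until startup, so a quick shape check beforehand can help. The pattern below is an approximation built from the format shown above: it assumes the standard `aws` partition, a 12-digit account ID, and a bucket name of lowercase letters, digits, and hyphens. `is_s3tables_arn` is a hypothetical helper, not part of Embucket.

```shell
# is_s3tables_arn ARN
# Succeeds when ARN looks like arn:aws:s3tables:region:account-id:bucket/bucket-name.
is_s3tables_arn() {
  printf '%s\n' "$1" |
    grep -Eq '^arn:aws:s3tables:[a-z0-9-]+:[0-9]{12}:bucket/[a-z0-9][a-z0-9-]*$'
}

arn="arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket"
if is_s3tables_arn "$arn"; then
  export VOLUME_ARN="$arn"
  echo "VOLUME_ARN accepted"
else
  echo "VOLUME_ARN does not match the expected format" >&2
fi
```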

Deployment Examples

Docker Compose

version: '3.8'
services:
  embucket:
    image: embucket/embucket:latest
    environment:
      BUCKET_HOST: "0.0.0.0"
      BUCKET_PORT: "3000"
      MAX_CONCURRENCY_LEVEL: "32"
      MEM_POOL_SIZE_MB: "8192"
      TRACING_LEVEL: "info"
      VOLUME_TYPE: "s3"
      VOLUME_ACCESS_KEY: "${AWS_ACCESS_KEY_ID}"
      VOLUME_SECRET_KEY: "${AWS_SECRET_ACCESS_KEY}"
    ports:
      - "3000:3000"

Kubernetes ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: embucket-config
data:
  BUCKET_HOST: "0.0.0.0"
  BUCKET_PORT: "3000"
  MAX_CONCURRENCY_LEVEL: "32"
  MEM_POOL_SIZE_MB: "8192"
  MEM_POOL_TYPE: "fair"
  QUERY_TIMEOUT_SECS: "3600"
  TRACING_LEVEL: "info"
  SQL_PARSER_DIALECT: "snowflake"

Bootstrap S3 Tables Volume

export VOLUME_TYPE="s3tables"
export VOLUME_IDENT="production_bucket"
export VOLUME_DATABASE="analytics"
export VOLUME_ARN="arn:aws:s3tables:us-east-1:123456789012:bucket/prod-tables"
export VOLUME_ACCESS_KEY="AKIAIOSFODNN7EXAMPLE"
export VOLUME_SECRET_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

embucketd

Use AWS Credential Provider Chain

When VOLUME_ACCESS_KEY and VOLUME_SECRET_KEY are not set for S3 Tables volumes, Embucket uses the AWS default credential provider chain, which tries the following sources in order:
  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  2. Shared config files (~/.aws/config, ~/.aws/credentials)
  3. Web Identity Tokens
  4. ECS (IAM Roles for Tasks) & General HTTP credentials
  5. EC2 IMDSv2
export VOLUME_TYPE="s3tables"
export VOLUME_ARN="arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket"
# Credentials will be resolved from AWS credential chain
embucketd
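Before starting the server, a script can report which of the two credential paths will apply. The sketch below treats one key without the other as a likely mistake. That is an assumption made for illustration, since this page does not say how Embucket behaves in that case, and `credential_mode` is a hypothetical helper.

```shell
# credential_mode ACCESS_KEY SECRET_KEY
# Prints "static" when both keys are set, "provider-chain" when neither is,
# and "incomplete" (with a non-zero status) when only one of them is set.
credential_mode() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "static"
  elif [ -z "$1" ] && [ -z "$2" ]; then
    echo "provider-chain"
  else
    echo "incomplete"
    return 1
  fi
}

if mode="$(credential_mode "${VOLUME_ACCESS_KEY:-}" "${VOLUME_SECRET_KEY:-}")"; then
  echo "credential mode: ${mode}"
else
  echo "set both VOLUME_ACCESS_KEY and VOLUME_SECRET_KEY, or neither" >&2
fi
```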