

Overview

Embucket can be configured using:
  • Command-line flags: ./embucketd --flag value
  • Environment variables: export VAR=value
  • Configuration files: YAML files for metastore configuration
All CLI flags have corresponding environment variable equivalents.
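As a sketch of that correspondence (confirm the exact names with ./embucketd --help, since some flags use a prefixed name, e.g. BUCKET_HOST for --host in the examples below), a flag generally maps to its environment variable by stripping the leading dashes, uppercasing, and replacing hyphens with underscores:

```shell
# Derive an environment-variable name from a CLI flag:
# strip leading dashes, uppercase, hyphens -> underscores.
flag_to_env() {
  echo "$1" | sed 's/^--//' | tr 'a-z-' 'A-Z_'
}

flag_to_env --query-timeout-secs   # prints QUERY_TIMEOUT_SECS
flag_to_env --mem-pool-size-mb     # prints MEM_POOL_SIZE_MB
```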

Server Configuration

Network Settings

--host
string
default:"localhost"
Host address to bind the server to.
  • Use 0.0.0.0 to accept connections from any network interface
  • Use localhost or 127.0.0.1 for local-only access
  • Specify a specific IP address to bind to a particular interface
Example:
./embucketd --host 0.0.0.0
--port
number
default:"3000"
Port number for the Snowflake-compatible API server.
Example:
./embucketd --port 8080
--timeout
number
default:"18000"
Service idle timeout in seconds. Connections idle for longer than this duration may be closed.
Default: 18000 seconds (5 hours)
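Example (an illustrative value shortening the idle timeout to one hour):
./embucketd --timeout 3600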

Metastore Configuration

--metastore-config
string
Path to YAML configuration file describing volumes, databases, schemas, and tables.
Example:
./embucketd --metastore-config /opt/embucket/config/metastore.yaml
See Metastore Configuration File section below for YAML format.

Metastore Configuration File

The metastore configuration file defines external catalogs and table locations.

S3 Tables (AWS S3 Table Buckets)

volumes:
  - ident: demo
    type: s3-tables
    database: demo
    credentials:
      credential_type: access_key
      aws-access-key-id: YOUR_ACCESS_KEY
      aws-secret-access-key: YOUR_SECRET_KEY
    arn: arn:aws:s3tables:us-east-2:123456789012:bucket/my-table-bucket

S3 with External Iceberg Tables

volumes:
  - ident: lakehouse
    type: s3
    region: us-east-2
    bucket: YOUR_BUCKET_NAME
    credentials:
      credential_type: access_key
      aws-access-key-id: YOUR_ACCESS_KEY
      aws-secret-access-key: YOUR_SECRET_KEY

databases:
  - ident: demo
    volume: lakehouse

schemas:
  - database: demo
    schema: tpch_10

tables:
  - database: demo
    schema: tpch_10
    table: customer
    metadata_location: s3://YOUR_BUCKET_NAME/tpch_10/customer/metadata/00001-eea1cccb-38a4-4fe2-8c95-c01dae9d0c60.metadata.json
  - database: demo
    schema: tpch_10
    table: lineitem
    metadata_location: s3://YOUR_BUCKET_NAME/tpch_10/lineitem/metadata/00001-d777220e-d508-4033-a229-8c4c8d8fe514.metadata.json
External Iceberg tables must reside in the bucket named by the volume definition.

Query Execution Settings

Concurrency and Timeouts

--max-concurrency-level
number
default:"8"
Maximum number of queries that can run concurrently.
  • Higher values increase throughput but require more memory
  • Set based on available CPU cores and memory
Example:
./embucketd --max-concurrency-level 16
--query-timeout-secs
number
default:"1200"
Maximum duration in seconds a single query is allowed to run before being terminated.
Default: 1200 seconds (20 minutes)
Example:
./embucketd --query-timeout-secs 600
--max-concurrent-table-fetches
number
default:"2"
Maximum number of concurrent requests to fetch table metadata.
  • Increase for faster catalog scanning with many tables
  • May increase load on catalog service
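Example (an illustrative value for catalogs with many tables):
./embucketd --max-concurrent-table-fetches 4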

Memory and Disk Pools

Memory Pool Configuration

--mem-pool-type
string
default:"greedy"
Memory pool allocation strategy:
  • greedy: Allocates memory aggressively, may use all available pool for a single query
  • fair: Distributes memory more evenly across concurrent queries
Example:
./embucketd --mem-pool-type fair
--mem-pool-size-mb
number
Maximum memory pool size in megabytes for query execution.
  • If not set, uses system-determined defaults
  • Recommended: 50-70% of available system memory
Example:
./embucketd --mem-pool-size-mb 8192
--mem-enable-track-consumers-pool
boolean
Enable tracking of per-consumer (per-query) memory usage.
  • Useful for debugging memory-intensive queries
  • Adds slight overhead
Example:
./embucketd --mem-enable-track-consumers-pool true

Disk Pool Configuration

--disk-pool-size-mb
number
Maximum disk pool size in megabytes for query spilling.
  • Used when queries exceed memory pool size
  • Requires fast disk (SSD recommended)
Example:
./embucketd --disk-pool-size-mb 20480

SQL Configuration

--data-format
string
default:"json"
Data serialization format for the Snowflake v1 API.
Default: json
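Example (explicitly selecting the default format):
./embucketd --data-format json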
--sql-parser-dialect
string
default:"snowflake"
SQL parser dialect to use for query parsing.
Options: snowflake, postgres, mysql, generic
Example:
./embucketd --sql-parser-dialect snowflake

Authentication

--auth-demo-user
string
default:"embucket"
Username for demo authentication mode.
Example:
./embucketd --auth-demo-user myuser
--auth-demo-password
string
default:"embucket"
Password for demo authentication mode.
Example:
./embucketd --auth-demo-password mypassword
--jwt-secret
string
Secret key for JWT token signing. Values are hidden in logs for security.
Always change this from the default in production deployments. Use a cryptographically secure random string.
Example:
export JWT_SECRET="your-secure-random-string-here"
./embucketd

Tracing and Monitoring

Logging Configuration

--tracing-level
string
default:"info"
Default tracing/logging level.
Options:
  • off: Disable all logging
  • info: Standard operational logs
  • debug: Detailed debugging information
  • trace: Very verbose trace-level logs
This setting is overridden by the RUST_LOG environment variable if set.
Example:
./embucketd --tracing-level debug
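Because RUST_LOG takes precedence when set, the same effect can be achieved through the environment (a sketch using the standard Rust log-filter syntax):
export RUST_LOG=debug
./embucketd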
--alloc-tracing
boolean
Enable memory allocation tracing for debugging memory usage.
  • Adds performance overhead
  • Useful for identifying memory leaks or inefficient allocations
Example:
./embucketd --alloc-tracing true

OpenTelemetry Configuration

--otel-exporter-otlp-protocol
string
default:"grpc"
OpenTelemetry OTLP exporter protocol.
Options:
  • grpc: Use gRPC protocol (default)
  • http/json: Use HTTP with JSON encoding
Example:
./embucketd --otel-exporter-otlp-protocol grpc
--tracing-span-processor
string
default:"batch-span-processor"
Tracing span processor type.
Options:
  • batch-span-processor: Batch spans for efficient export
  • batch-span-processor-experimental-async-runtime: Experimental async batch processor
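Example (explicitly selecting the default processor):
./embucketd --tracing-span-processor batch-span-processor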

OTLP Endpoint

Configure the OpenTelemetry collector endpoint using standard OTLP environment variables:
export OTEL_EXPORTER_OTLP_ENDPOINT=https://your-collector:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
./embucketd

AWS SDK Timeouts

These settings control timeouts for AWS SDK operations (S3, S3 Tables, etc.).
--aws-sdk-connect-timeout-secs
number
default:"3"
Timeout for establishing AWS SDK connections in seconds.
--aws-sdk-operation-timeout-secs
number
default:"30"
Total timeout for AWS SDK operations in seconds.
--aws-sdk-operation-attempt-timeout-secs
number
default:"10"
Timeout for individual AWS SDK operation attempts in seconds (before retry).
Example:
./embucketd \
  --aws-sdk-connect-timeout-secs 5 \
  --aws-sdk-operation-timeout-secs 60 \
  --aws-sdk-operation-attempt-timeout-secs 15

Iceberg Timeouts

--iceberg-table-timeout-secs
number
default:"30"
Timeout for Iceberg table operations (create, load metadata) in seconds.
--iceberg-catalog-timeout-secs
number
default:"10"
Timeout for Iceberg catalog operations in seconds.
Example:
./embucketd \
  --iceberg-table-timeout-secs 60 \
  --iceberg-catalog-timeout-secs 20

Object Store Timeouts

--object-store-timeout-secs
number
default:"10"
Timeout for object store operations (reads, writes) in seconds.
--object-store-connect-timeout-secs
number
default:"3"
Timeout for establishing object store connections in seconds.
Example:
./embucketd \
  --object-store-timeout-secs 30 \
  --object-store-connect-timeout-secs 5

Configuration Examples

Development Configuration

Quick local development setup:
./embucketd \
  --host localhost \
  --port 3000 \
  --tracing-level debug \
  --max-concurrency-level 4 \
  --query-timeout-secs 300

Production Configuration

Optimized production deployment:
./embucketd \
  --host 0.0.0.0 \
  --port 3000 \
  --metastore-config /opt/embucket/config/metastore.yaml \
  --max-concurrency-level 16 \
  --query-timeout-secs 1800 \
  --mem-pool-type greedy \
  --mem-pool-size-mb 16384 \
  --disk-pool-size-mb 51200 \
  --tracing-level info \
  --aws-sdk-operation-timeout-secs 60 \
  --object-store-timeout-secs 30

Environment Variables Configuration

Using environment variables for cleaner configuration:
# Create environment file
cat > embucket.env << EOF
BUCKET_HOST=0.0.0.0
BUCKET_PORT=3000
METASTORE_CONFIG=/opt/embucket/config/metastore.yaml
QUERY_TIMEOUT_SECS=1800
MAX_CONCURRENCY_LEVEL=16
MEM_POOL_TYPE=greedy
MEM_POOL_SIZE_MB=16384
DISK_POOL_SIZE_MB=51200
TRACING_LEVEL=info
JWT_SECRET=your-secure-secret-here
AWS_SDK_OPERATION_TIMEOUT_SECS=60
OBJECT_STORE_TIMEOUT_SECS=30
EOF

# Load and run
source embucket.env
./embucketd

High-Performance Configuration

For compute-intensive workloads:
./embucketd \
  --max-concurrency-level 32 \
  --mem-pool-type fair \
  --mem-pool-size-mb 32768 \
  --disk-pool-size-mb 102400 \
  --max-concurrent-table-fetches 8 \
  --query-timeout-secs 3600

Docker Environment Variables

When running in Docker, these additional environment variables are commonly used:
docker run -p 3000:3000 \
  -e BUCKET_HOST=0.0.0.0 \
  -e OBJECT_STORE_BACKEND=s3 \
  -e AWS_ACCESS_KEY_ID=your-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret \
  -e AWS_REGION=us-east-2 \
  -e S3_BUCKET=your-bucket \
  -e S3_ENDPOINT=http://minio:9000 \
  -e S3_ALLOW_HTTP=true \
  -e QUERY_TIMEOUT_SECS=1200 \
  -e MAX_CONCURRENCY_LEVEL=8 \
  embucket/embucket

Next Steps

  • Docker: Deploy with Docker
  • AWS Lambda: Serverless deployment
  • Binary: Standalone binary setup