Embucket provides extensive performance tuning options to optimize query execution for your workload. This guide covers key configuration parameters and best practices.
Memory Pool Configuration
Memory pools control how query execution allocates and manages memory. Embucket supports two memory pool types:
Pool Types
Greedy (Default)
Allows aggressive memory consumption up to the configured limit. Once the pool is full, all consumers are blocked until memory is freed.
Best for:
- Single-query workloads
- Development environments
- Simpler deployment scenarios
Fair
Enforces fair memory usage across all consumers with spill-based control. No single query dominates memory resources.
Best for:
- Concurrent workloads
- Production environments with multiple simultaneous queries
- Multi-tenant scenarios
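For illustration, selecting a pool type might look like this at startup; MEM_POOL_TYPE is a hypothetical variable name, not a confirmed Embucket setting:

```bash
# Hypothetical setting: choose the memory pool implementation.
# "greedy" (default) maximizes single-query throughput;
# "fair" prevents any one query from dominating memory.
export MEM_POOL_TYPE=fair
```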
Setting Memory Limits
Configure the maximum memory pool size in megabytes.
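A minimal example, assuming MEM_POOL_SIZE_MB is exposed as an environment variable (the value shown is illustrative):

```bash
# Cap the query memory pool at 8192 MB (8 GB).
export MEM_POOL_SIZE_MB=8192
```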
If MEM_POOL_SIZE_MB is not set, Embucket uses unlimited memory, which may lead to OOM conditions under heavy load.
Memory Consumer Tracking
Enable detailed per-consumer memory tracking for debugging and optimization. When enabled, the memory pool is wrapped in a TrackConsumersPool, which tracks the top 5 memory consumers. Enable this when troubleshooting memory issues.
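A sketch of what enabling this might look like; the flag name below is a hypothetical placeholder, not a confirmed setting:

```bash
# Hypothetical flag: wrap the memory pool in TrackConsumersPool
# so the top 5 consumers are reported when allocations fail.
export MEM_ENABLE_TRACK_CONSUMERS_POOL=true
```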
Disk Pool for Spilling
When queries exceed available memory, Embucket can spill intermediate results to disk. Spilling is handled by a DiskManager in OsTmpDirectory mode: Embucket automatically creates temporary files in the OS temp directory.
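A sketch of sizing the spill area; DISK_POOL_SIZE_MB is a hypothetical name based on the best-practice guidance later in this guide:

```bash
# Hypothetical setting: allow up to 16 GB of spill files
# in the OS temp directory (OsTmpDirectory mode).
export DISK_POOL_SIZE_MB=16384
```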
Concurrency Settings
Maximum Concurrent Queries
Limit the number of queries that can run simultaneously (see crates/embucketd/src/cli.rs:52-56).
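For example, assuming MAX_CONCURRENCY_LEVEL is set as an environment variable:

```bash
# Reject queries beyond 32 concurrent executions.
export MAX_CONCURRENCY_LEVEL=32
```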
When the concurrency limit is reached, new queries receive a “Concurrency limit reached” error immediately rather than queuing.
Query Timeout
Maximum duration a single query is allowed to run (see crates/embucketd/src/cli.rs:58-64).
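For example, assuming QUERY_TIMEOUT_SECS is set as an environment variable:

```bash
# Cancel queries that run longer than 10 minutes.
export QUERY_TIMEOUT_SECS=600
```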
Queries exceeding this timeout are automatically cancelled with a QueryTimeout error.
Table Fetch Parallelism
Control concurrent metadata requests when fetching table details (see crates/embucketd/src/cli.rs:166-170).
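For example, assuming MAX_CONCURRENT_TABLE_FETCHES is set as an environment variable:

```bash
# Fetch metadata for up to 16 tables in parallel.
export MAX_CONCURRENT_TABLE_FETCHES=16
```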
Increasing this value speeds up catalog operations but increases load on object stores and Iceberg catalogs.
AWS SDK Timeout Tuning
Connection Timeout
Maximum time to establish a connection to AWS services (see crates/embucketd/src/cli.rs:172-178).
Operation Timeout
Total time allowed for an AWS SDK operation to complete (see crates/embucketd/src/cli.rs:180-186).
Operation Attempt Timeout
Maximum time for a single attempt of an AWS SDK operation (see crates/embucketd/src/cli.rs:188-194).
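A sketch of how the three settings relate; the variable names below are hypothetical placeholders (check cli.rs for the actual flags), and the attempt timeout should not exceed the overall operation timeout:

```bash
# Hypothetical names for the three AWS SDK timeouts.
export AWS_SDK_CONNECT_TIMEOUT_SECS=10            # establish connection
export AWS_SDK_OPERATION_TIMEOUT_SECS=120         # whole operation, incl. retries
export AWS_SDK_OPERATION_ATTEMPT_TIMEOUT_SECS=30  # single attempt
```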
Iceberg Timeout Configuration
Table Operations
Timeout for Iceberg table operations (see crates/embucketd/src/cli.rs:196-202).
Catalog Operations
Timeout for Iceberg catalog operations (see crates/embucketd/src/cli.rs:204-210).
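A sketch of the two Iceberg timeouts side by side; the variable names are hypothetical placeholders (check cli.rs for the actual flags):

```bash
# Hypothetical names for the Iceberg timeouts.
export ICEBERG_TABLE_TIMEOUT_SECS=60    # table load/commit operations
export ICEBERG_CATALOG_TIMEOUT_SECS=30  # catalog requests
```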
Object Store Timeout Configuration
Read/Write Timeout
Timeout for object store read and write operations (see crates/embucketd/src/cli.rs:212-218).
Connect Timeout
Timeout for establishing connections to the object store (see crates/embucketd/src/cli.rs:220-226).
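A sketch of the two object store timeouts; the variable names are hypothetical placeholders (check cli.rs for the actual flags):

```bash
# Hypothetical names for the object store timeouts.
export OBJECT_STORE_TIMEOUT_SECS=60          # read/write operations
export OBJECT_STORE_CONNECT_TIMEOUT_SECS=10  # connection setup
```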
Best Practices for Production
Memory Configuration
- Use Fair memory pool for concurrent workloads
- Set MEM_POOL_SIZE_MB to 60-70% of available RAM
- Configure disk spilling at 2-3x memory size
- Enable consumer tracking only when debugging
Concurrency Tuning
- Set MAX_CONCURRENCY_LEVEL based on CPU cores (1-2x core count)
- Adjust QUERY_TIMEOUT_SECS for your longest queries
- Monitor query queue depth and adjust limits
Network Timeouts
- Increase AWS SDK timeouts for slow networks or large data
- Set Iceberg timeouts based on catalog responsiveness
- Configure object store timeouts for reliable reads
Table Metadata
- Increase MAX_CONCURRENT_TABLE_FETCHES for catalogs with many tables
- Balance between metadata fetch speed and catalog load
- Monitor catalog response times
Example Production Configuration
Here’s a recommended configuration for a production server with 32GB RAM and 16 cores:
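A sketch consistent with the guidance above (Fair pool, ~65% of RAM, 2-3x disk spill, 2x core count); variable names marked hypothetical are assumptions, and values should be tuned to your workload:

```bash
# Memory: Fair pool, 20 GB of the 32 GB RAM, ~2.4x disk spill headroom
export MEM_POOL_TYPE=fair        # hypothetical name
export MEM_POOL_SIZE_MB=20480
export DISK_POOL_SIZE_MB=49152   # hypothetical name

# Concurrency: 2x the 16 cores, 10-minute query ceiling
export MAX_CONCURRENCY_LEVEL=32
export QUERY_TIMEOUT_SECS=600

# Catalog metadata parallelism
export MAX_CONCURRENT_TABLE_FETCHES=16
```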
Monitoring Performance
After tuning, monitor these metrics:
- Query execution times
- Memory usage and spill frequency
- Concurrency limit rejections
- Timeout errors
- AWS SDK operation durations