Documentation Index
Fetch the complete documentation index at: https://mintlify.com/embucket/embucket/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Embucket can be deployed as a single, self-contained binary. This deployment method provides:- No dependencies or external runtime requirements
- Direct control over execution environment
- Minimal resource overhead
- Easy integration with existing infrastructure
Building from Source
Prerequisites
Build Process
Release builds are optimized and significantly faster than debug builds. Always use
--release for production deployments.System Requirements
Minimum Requirements
CPU
- Minimum: 2 CPU cores
- Recommended: 4+ cores
- x86_64 or ARM64 architecture
Memory
- Minimum: 2 GB RAM
- Recommended: 4-8 GB RAM
- More memory for larger queries
Disk
- Minimum: 1 GB for binary and metadata
- Recommended: 10+ GB for query spilling
- SSD recommended for better I/O
Network
- Low-latency connection to object storage
- Bandwidth depends on query workload
- For S3: Same region deployment recommended
Operating Systems
- Linux (Ubuntu 20.04+, Debian 11+, RHEL 8+, Fedora 35+)
- macOS (11.0+)
- Other Unix-like systems with Rust support
Running the Binary
Basic Usage
Start Embucket with default settings:localhost:3000 by default.
With Configuration File
Start with a metastore configuration:Custom Host and Port
Bind to a specific host and port:Using Environment Variables
Configure via environment variables:Binary Flags and Options
All command-line flags can be set via environment variables or CLI arguments.Server Configuration
Host to bind to
Port to bind to
Service idle timeout in seconds
Metastore Configuration
Path to YAML config describing volumes/databases to seed the metastore
Query Execution
Maximum number of running queries at the same time
Maximum duration in seconds a single query is allowed to run
The maximum number of concurrent requests to get table details
Memory and Disk Pools
Memory pool type for query execution:
greedy or fairMaximum memory pool size in megabytes
Wrap memory pool with TrackConsumersPool for tracking per-consumer memory usage
Maximum disk pool size in megabytes (for spilling)
Authentication
User for auth demo
Password for auth demo
JWT secret for auth (hidden in logs)
SQL Configuration
Data serialization format in Snowflake v1 API
SQL parser dialect:
snowflake, postgres, mysql, generic, etc.Tracing and Monitoring
Tracing level:
off, info, debug, or trace (overridden by RUST_LOG)Tracing span processor
OpenTelemetry Exporter Protocol:
grpc or http/jsonEnable memory tracing functionality
AWS SDK Timeouts
AWS SDK connect timeout in seconds
AWS SDK operation timeout in seconds
AWS SDK operation attempt timeout in seconds
Iceberg Timeouts
Iceberg create table timeout in seconds
Iceberg catalog timeout in seconds
Object Store Timeouts
Object store timeout in seconds
Object store connect timeout in seconds
Systemd Service
Run Embucket as a systemd service for automatic startup and management.Service File
Create/etc/systemd/system/embucket.service:
Environment File
Create/opt/embucket/embucket.env:
Installation
Service Management
Production Considerations
Reverse Proxy
Use nginx or similar for:
- TLS termination
- Load balancing
- Request rate limiting
- Access logging
Monitoring
Integrate with monitoring systems:
- Prometheus metrics (via OTLP)
- CloudWatch (if on AWS)
- Datadog, New Relic, etc.
High Availability
Deploy multiple instances:
- Place behind load balancer
- Share object storage backend
- Query-per-node architecture
Security
Security best practices:
- Change default credentials
- Use strong JWT secret
- Run as non-root user
- Enable firewall rules
Next Steps
Configuration
Explore all configuration options
Docker Deployment
Deploy using Docker containers