This guide helps you diagnose and resolve common issues when running Embucket.
Common Errors and Solutions
Concurrency Limit Errors
Error Message
The query is rejected because the number of running queries has reached the limit set by MAX_CONCURRENCY_LEVEL.
Reference: crates/executor/src/error.rs:19-23
Solutions:
Increase Concurrency Limit
Raise the maximum number of concurrent queries by increasing MAX_CONCURRENCY_LEVEL. Balance this against available CPU and memory resources.
Optimize Long-Running Queries
- Identify slow queries in traces
- Add appropriate filters and limits
- Consider breaking large queries into smaller batches
- Review table statistics and partition pruning
Implement Client-Side Queuing
Add retry logic with exponential backoff in your application when this error occurs.
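A minimal client-side sketch of that retry loop follows. It assumes your client surfaces the concurrency rejection as an exception; ConcurrencyLimitError and run_with_backoff are hypothetical names, not part of any Embucket client library. It uses "full jitter" (a random delay up to an exponentially growing cap) so that many clients rejected at the same moment do not all retry in lockstep.

```python
import random
import time


class ConcurrencyLimitError(Exception):
    """Hypothetical exception your client raises when Embucket rejects a query
    because MAX_CONCURRENCY_LEVEL has been reached."""


def run_with_backoff(submit, max_attempts=5, base_delay=0.5, max_delay=30.0,
                     sleep=time.sleep, rng=random.random):
    """Call submit() and retry on ConcurrencyLimitError with exponential backoff.

    Before retry n (0-based), sleep a random time in
    [0, min(max_delay, base_delay * 2**n)] ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return submit()
        except ConcurrencyLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())
```

Usage: wrap whatever call sends the query, for example run_with_backoff(lambda: cursor.execute(sql)), where cursor and sql come from your own client code. The sleep and rng parameters exist only so the delays can be checked deterministically.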
Query Timeout Errors
Error Message
The query ran longer than the limit set by QUERY_TIMEOUT_SECS.
Reference: crates/executor/src/error.rs:25-29
Solutions:
Optimize Query Performance
- Check the execution plan with EXPLAIN
- Verify partition pruning is working
- Add appropriate WHERE clauses
- Review join strategies
Connection Issues
Session Not Found
Error Message
Reference: crates/executor/src/error.rs:126-131
Solutions:
- Check the session lifecycle:
  - Sessions expire after inactivity (default: 5 hours)
  - Verify the client is creating sessions properly
  - Check that the session ID is being passed correctly
- Monitor session expiry.
- Increase the session timeout (if needed). Session timeout is controlled by SESSION_INACTIVITY_EXPIRATION_SECONDS (18000 seconds, or 5 hours). Reference: crates/executor/src/session.rs
AWS Connection Timeouts
Error Message
AWS SDK timeout errors when accessing S3 or Glue
Solutions:
- Increase timeouts
- Check the network
- Optimize object store settings
Reference: crates/embucketd/src/cli.rs:172-194
Query Failures
DataFusion Query Errors
Error Message
Reference: crates/executor/src/error.rs:133-140
Common causes:
Schema Mismatch
- Column names or types don’t match
- Case sensitivity issues
- Missing columns in SELECT
Check the actual column names and types with DESCRIBE TABLE.
Type Coercion
- Incompatible data types in operations
- Invalid CAST operations
- Type inference failures
Missing Table
- Table or view doesn’t exist
- Wrong database or schema context
- Catalog not registered
Check the current context with SELECT CURRENT_DATABASE(), CURRENT_SCHEMA().
Invalid SQL
- Syntax errors
- Unsupported SQL features
- Parser dialect mismatch
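For schema mismatches, it can help to compare the columns a query expects against the column names reported by DESCRIBE TABLE. The helper below is a hypothetical sketch: how you fetch the DESCRIBE TABLE output depends on your client, so it simply takes two lists of names. It separates columns that are missing entirely from those that exist only with different letter case, which matters because quoted identifiers are case-sensitive in standard SQL.

```python
def diff_columns(expected, actual):
    """Compare expected column names with those reported by DESCRIBE TABLE.

    Returns (missing, case_only): names absent entirely, and
    (expected_name, actual_name) pairs that differ only in letter case.
    """
    actual_set = set(actual)
    by_lower = {}
    for name in actual:
        by_lower.setdefault(name.lower(), name)

    missing, case_only = [], []
    for name in expected:
        if name in actual_set:
            continue  # exact match, nothing to report
        match = by_lower.get(name.lower())
        if match is not None:
            case_only.append((name, match))
        else:
            missing.append(name)
    return missing, case_only
```

For example, diff_columns(["id", "Amount", "region"], ["ID", "amount", "created_at"]) reports region as missing and flags id and Amount as case-only mismatches.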
Table or Database Not Found
Error Messages
Reference: crates/executor/src/error.rs:184-216
Troubleshooting steps:
- List available objects.
- Verify the current context with SELECT CURRENT_DATABASE(), CURRENT_SCHEMA().
- Use fully qualified names.
- Check catalog registration:
  - Verify METASTORE_CONFIG points to the correct file
  - Ensure all volumes and databases are defined
  - Check AWS credentials for Glue catalog access
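When generating queries programmatically, a small helper can build fully qualified database.schema.table references consistently. This is a hypothetical sketch that assumes Embucket accepts standard SQL double-quoted (delimited) identifiers; the names in the example are illustrative only. Because quoting preserves exact case, pass each part exactly as it is stored in the catalog.

```python
def qualified_name(database, schema, table):
    """Build a fully qualified database.schema.table reference.

    Each part is wrapped in double quotes (standard SQL delimited identifiers),
    and any embedded double quote is escaped by doubling it.
    """
    def quote(part):
        return '"' + part.replace('"', '""') + '"'

    return ".".join(quote(p) for p in (database, schema, table))
```

For example, qualified_name("demo", "public", "orders") returns "demo"."public"."orders" with each part quoted. If the object was created with unquoted identifiers, check the stored case first, since a quoted name with the wrong case will not resolve.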
Iceberg Catalog Errors
Error Message
Iceberg catalog timeout or connection errors
Verify Catalog Configuration
Check your metastore config file for correct catalog settings:
- Glue catalog region
- REST catalog endpoint
- Authentication credentials
Performance Problems
High Memory Usage
Symptoms:
- Out of memory errors
- Frequent disk spilling
- Slow query execution
Diagnostics:
- Enable memory tracking
- Check the memory configuration
- Monitor spilling
Reference: crates/executor/src/service.rs:247-263
Solutions:
- Increase the memory pool (MEM_POOL_SIZE_MB).
- Enable the fair memory pool. Reference: crates/executor/src/utils.rs:126-143
- Configure disk spilling (DISK_POOL_SIZE_MB).
- Optimize queries:
  - Add LIMIT clauses where appropriate
  - Use incremental processing for large datasets
  - Reduce the number of concurrent queries
Slow Query Execution
Diagnostics:
Analyze the query plan with EXPLAIN and look for:
- Table scans without filters
- Missing partition pruning
- Inefficient joins
| Issue | Cause | Solution |
|---|---|---|
| Full table scan | No partition pruning | Add partition column to WHERE clause |
| High network latency | Slow S3 reads | Increase object store timeouts, check region |
| Disk spilling | Insufficient memory | Increase MEM_POOL_SIZE_MB |
| CPU bottleneck | Too many concurrent queries | Reduce MAX_CONCURRENCY_LEVEL |
| Slow metadata | Many table fetches | Increase MAX_CONCURRENT_TABLE_FETCHES |
Memory and Resource Issues
Disk Manager Errors
Error Message
Reference: crates/executor/src/error.rs:55-59
Cause: Internal error with disk spilling configuration.
Solutions:
- Restart Embucket
- Check disk space in temp directory
- Verify DISK_POOL_SIZE_MB is set correctly
Query Cancelled
Error Message
Reference: crates/executor/src/error.rs:588-593
Causes:
- User initiated: the client called abort/cancel
- Timeout: the query exceeded QUERY_TIMEOUT_SECS
- System shutdown: Embucket is shutting down gracefully
Names to look for in traces:
- abort_cancelled_query: user abort
- query_timeout_received_do_abort: timeout
Reference: crates/executor/src/service.rs:629-642
Debug Logging
Enabling Debug Output
For comprehensive debugging, enable debug-level output.
Key Debug Targets
Query Execution
- Query submission and lifecycle
- Execution status changes
- Result handling
Session Management
- Session creation and deletion
- Session expiry
- Context management
Catalog Operations
- Table metadata fetches
- Catalog registration
- Database/schema lookups
DataFusion Internals
- Physical plan execution
- Memory pool operations
- Disk spilling
Reading Trace Spans
For the key spans to monitor, see crates/executor/src/service.rs:453-707.
Where to Get Help
GitHub Issues
Report bugs and request features
Documentation
Browse complete documentation
Discussions
Ask questions and share solutions
Slack Community
Join the Embucket community (link in GitHub README)
Diagnostic Checklist
When reporting issues, include:
- Embucket version (embucketd --version)
- Configuration (environment variables, sanitized)
- Error message and stack trace
- Query that caused the issue (if applicable)
- Relevant logs with debug enabled
- System resources (CPU, memory, disk)
- Deployment environment (Docker, Kubernetes, bare metal)
- Metastore configuration (sanitized)
- OpenTelemetry trace ID (if available)
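To collect the configuration item for a report without leaking credentials, a short script can gather the settings named in this guide and mask anything that looks secret. This is a sketch under two assumptions: that these settings are supplied as environment variables, and that a name-based heuristic (SECRET, TOKEN, PASSWORD, KEY, CREDENTIAL) catches your secrets. Review the output before sharing, and sanitize the metastore config file separately.

```python
import os
import re

# Settings named in this guide (assumed to be set as environment variables).
KNOWN_SETTINGS = (
    "MAX_CONCURRENCY_LEVEL",
    "QUERY_TIMEOUT_SECS",
    "MEM_POOL_SIZE_MB",
    "DISK_POOL_SIZE_MB",
    "MAX_CONCURRENT_TABLE_FETCHES",
    "METASTORE_CONFIG",
)

# Heuristic: any variable whose name contains one of these fragments is masked.
SECRET_PATTERN = re.compile(r"SECRET|TOKEN|PASSWORD|KEY|CREDENTIAL", re.IGNORECASE)


def sanitized_config(env=None, extra_prefixes=("AWS_",)):
    """Return relevant settings, sorted by name, with secret values redacted."""
    env = os.environ if env is None else env
    report = {}
    for name, value in env.items():
        if name not in KNOWN_SETTINGS and not name.startswith(extra_prefixes):
            continue  # unrelated variable, leave it out of the report
        report[name] = "<redacted>" if SECRET_PATTERN.search(name) else value
    return dict(sorted(report.items()))


if __name__ == "__main__":
    for key, value in sanitized_config().items():
        print(f"{key}={value}")
```

AWS variables are included through extra_prefixes because region and endpoint settings are often relevant to S3 and Glue issues, while keys and session tokens are masked by the name pattern.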