Overview
Embucket can connect to existing Apache Iceberg tables created by other tools like Spark, Trino, Dremio, or Flink. This enables seamless integration with your existing data lake infrastructure.

External Iceberg tables must be on the same storage volume (bucket) defined in your volume configuration. Embucket uses the volume credentials to access table metadata and data files.
When to Use External Iceberg
Use external Iceberg table configuration when:
- You have existing Iceberg tables from Spark, Trino, or other engines
- You need to share tables across multiple query engines
- You want explicit control over which tables are accessible
- Your tables are not in an S3 Tables catalog
- You need to query tables created by other systems
Configuration Structure
External Iceberg configuration requires defining the storage volume and explicitly registering each table.

Volume Configuration
S3 Volume Setup
Define an S3 volume that points to the bucket containing your Iceberg tables. The key settings are:
- Volume name: Unique identifier for the storage volume. Referenced by databases and tables.
- Type: Must be s3 for standard S3 storage.
- Region: AWS region where the S3 bucket is located (e.g., us-east-2, eu-west-1).
- Bucket: Name of the S3 bucket containing your Iceberg tables.
- Credentials: AWS credentials for accessing the S3 bucket. See Credentials below.
- Endpoint: Custom S3 endpoint URL for S3-compatible storage (MinIO, Ceph, etc.). Must start with http:// or https://.

Alternative Storage Types
Embucket also supports other volume types; see the Metastore Configuration reference for the full list.

Table Registration
Each external Iceberg table must be explicitly registered with its metadata location.

Required Fields
- Database: The database this table belongs to. Must match a database defined in the databases section.
- Schema: The schema this table belongs to. Must match a schema defined in the schemas section.
- Table name: The name of the table as it will appear in queries.
- Metadata location: Full S3 URI to the Iceberg metadata JSON file. This file contains the table schema, partition spec, and snapshot information.
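A sketch of what a single table registration might look like. Apart from metadata_location, which the documentation names explicitly, the key names and bucket/table names here are assumptions to be checked against the Metastore Configuration reference:

```yaml
tables:
  - database: analytics      # must match an entry in the databases section
    schema: events           # must match an entry in the schemas section
    table: page_views        # name used in queries
    metadata_location: s3://my-iceberg-bucket/warehouse/events/page_views/metadata/00002-1f2e3d4c.metadata.json
```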
Finding Metadata Locations
Iceberg metadata files are typically organized under each table's metadata/ directory. You can locate the current metadata file with the AWS CLI, from Spark, or by browsing the bucket manually.
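A typical table layout (the bucket and table paths here are hypothetical); the highest-numbered *.metadata.json file is usually the current one:

```
s3://my-iceberg-bucket/warehouse/events/page_views/
├── data/                            # Parquet data files
└── metadata/
    ├── 00000-<uuid>.metadata.json
    ├── 00001-<uuid>.metadata.json
    ├── 00002-<uuid>.metadata.json   # highest number: current metadata
    ├── snap-<snapshot-id>.avro      # manifest lists
    └── <uuid>-m0.avro               # manifest files
```

With the AWS CLI, `aws s3 ls s3://my-iceberg-bucket/warehouse/events/page_views/metadata/` lists these files; in Spark, Iceberg also exposes metadata tables such as `db.table.metadata_log_entries`.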
Credentials
Access Key Authentication
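The access-key form of the credentials block might look like the sketch below. The key names are assumptions to be checked against the Metastore Configuration reference, and the values are AWS's documented example credentials, not real ones:

```yaml
credentials:
  aws_access_key_id: AKIAIOSFODNN7EXAMPLE
  aws_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRCiCYEXAMPLEKEY
```

Avoid committing real keys to version control; prefer injecting them from the environment or a secrets manager.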
Required IAM Permissions
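Read access in practice usually means s3:GetObject on the table's objects plus s3:ListBucket on the bucket itself. A minimal policy sketch (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-iceberg-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-iceberg-bucket"
    }
  ]
}
```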
Your AWS credentials need read access to the S3 bucket.

Complete Example
Here’s a full configuration for a data lake with multiple tables (metastore.yaml):
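A hedged sketch of such a metastore.yaml. The databases, schemas, and metadata_location names are referenced by the documentation; the remaining key names, bucket, and table names are assumptions to be verified against the Metastore Configuration reference:

```yaml
volumes:
  - name: datalake
    type: s3
    region: us-east-2
    bucket: my-iceberg-bucket
    credentials:
      aws_access_key_id: AKIAIOSFODNN7EXAMPLE      # AWS's documented example key
      aws_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRCiCYEXAMPLEKEY

databases:
  - name: analytics
    volume: datalake

schemas:
  - name: events
    database: analytics

tables:
  - database: analytics
    schema: events
    table: page_views
    metadata_location: s3://my-iceberg-bucket/warehouse/events/page_views/metadata/00002-1f2e3d4c.metadata.json
  - database: analytics
    schema: events
    table: sessions
    metadata_location: s3://my-iceberg-bucket/warehouse/events/sessions/metadata/00005-9a8b7c6d.metadata.json
```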
Important Constraints
Multiple Volumes for Multiple Buckets
If you have tables in different buckets, define a separate volume for each bucket and have each database reference the volume for its bucket.

Querying External Tables
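A registered table is addressed by its three-part database.schema.table name; a hypothetical example:

```sql
-- Aggregate over an external Iceberg table (names are hypothetical)
SELECT event_type, COUNT(*) AS events
FROM analytics.events.page_views
GROUP BY event_type;
```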
Once registered, query external Iceberg tables using standard SQL.

Metadata Updates
Embucket reads the metadata file specified in the configuration. If the table is updated by another tool (Spark, Trino), you’ll need to update the
metadata_location in your configuration and restart Embucket to see changes.

Troubleshooting
Table Not Found
If queries fail with “table not found”:
- Verify the database, schema, and table names match your configuration exactly
- Check that schemas are defined before tables that use them
- Confirm the database references a valid volume
Invalid Metadata Location
If Embucket reports invalid metadata:
- Verify the S3 URI is correct and accessible
- Check that the metadata file exists at the specified location
- Ensure credentials have s3:GetObject permission
- Confirm the file is valid Iceberg metadata JSON
Metadata Parse Errors
If metadata parsing fails:
- Verify the metadata file is from a compatible Iceberg version
- Check that the JSON structure is valid
- Ensure the file isn’t corrupted or truncated
Next Steps
- Write SQL Queries: Learn Snowflake SQL syntax
- Metastore Configuration: Complete YAML schema reference