Embucket implements the Apache Iceberg REST Catalog API specification, allowing Iceberg clients to interact with table metadata stored in the metastore.

Base URL

http(s)://host:port/
Example:
http://localhost:3000/

Configuration Endpoint

GET /v1/config

Retrieve catalog configuration properties.
Query Parameters:
warehouse
string
Warehouse location or identifier to request from the service.
Description: All REST clients should first call this route to get catalog configuration properties from the server. Configuration consists of two sets of key/value pairs:
  • defaults - Properties used as default configuration; applied before client configuration
  • overrides - Properties used to override client configuration; applied after defaults and client configuration
Catalog configuration is constructed by:
  1. Setting the defaults
  2. Applying client-provided configuration
  3. Applying overrides
The final property set is used to configure the catalog.
Response:
{
  "overrides": {
    "warehouse": "s3://bucket/warehouse/"
  },
  "defaults": {
    "clients": "4"
  }
}
overrides
object
Properties that override client configuration.
defaults
object
Properties used as default configuration.
Example:
curl http://localhost:3000/v1/config
Example with Warehouse:
curl "http://localhost:3000/v1/config?warehouse=production"
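
The three-step construction described above (defaults, then client configuration, then overrides) can be sketched in Python. This is a minimal illustration, not the code of Embucket or of any Iceberg client; the function name merge_catalog_config and the client-side property values are assumptions made for the example.

```python
def merge_catalog_config(defaults: dict, client: dict, overrides: dict) -> dict:
    """Combine the three property sets in the documented order."""
    merged = dict(defaults)    # 1. start from the server-provided defaults
    merged.update(client)      # 2. client-provided configuration replaces defaults
    merged.update(overrides)   # 3. server-provided overrides replace everything else
    return merged


# Server response from GET /v1/config (values taken from the example response)
server = {
    "overrides": {"warehouse": "s3://bucket/warehouse/"},
    "defaults": {"clients": "4"},
}

# Hypothetical client-side settings, for illustration only
client = {"clients": "8", "warehouse": "s3://other-bucket/"}

final = merge_catalog_config(server["defaults"], client, server["overrides"])
print(final)  # {'clients': '8', 'warehouse': 's3://bucket/warehouse/'}
```

The client's clients value survives because only a default existed for it, while the client's warehouse is replaced by the server override.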

Common Catalog Properties

The following properties are commonly returned in catalog configuration:
warehouse
string
Base location for the warehouse (e.g., s3://bucket/warehouse/).
uri
string
Catalog URI for client connections.
clients
string
Number of client connections to use.
token
string
Authentication token for catalog operations (if required).

Schema Types

The Iceberg REST API uses the following schema types:

Schema

schema-id
integer
Unique identifier for the schema.
identifier-field-ids
array
Array of field IDs that make up the identifier.
type
string
required
Must be "struct".
fields
array
required
Array of struct fields.

StructField

id
integer
required
Unique field identifier.
name
string
required
Field name.
type
string
required
Field data type (primitive or complex).
required
boolean
required
Whether the field is required (non-nullable).
doc
string
Optional documentation for the field.
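
As a rough illustration of the required properties listed above, the following Python sketch checks a schema object before it is sent to the catalog. The function name validate_schema and the sample column names are assumptions; the check covers only the properties marked as required in this section and is not a full validator.

```python
def validate_schema(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if schema.get("type") != "struct":
        problems.append('schema "type" must be "struct"')
    fields = schema.get("fields")
    if not isinstance(fields, list):
        problems.append('schema "fields" must be an array')
        return problems
    for index, field in enumerate(fields):
        # Every StructField needs id, name, type, and required
        for key in ("id", "name", "type", "required"):
            if key not in field:
                problems.append(f'field {index} is missing "{key}"')
    return problems


# Hypothetical schema with two columns
users_schema = {
    "schema-id": 0,
    "type": "struct",
    "identifier-field-ids": [1],
    "fields": [
        {"id": 1, "name": "id", "type": "long", "required": True},
        {"id": 2, "name": "email", "type": "string", "required": False,
         "doc": "Contact address"},
    ],
}

print(validate_schema(users_schema))
```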

Primitive Types

Supported primitive types:
  • boolean
  • int
  • long
  • float
  • double
  • decimal(precision,scale) - Example: decimal(10,2)
  • date
  • time
  • timestamp
  • timestamptz
  • string
  • uuid
  • fixed[N] - Example: fixed[16]
  • binary
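
Most primitive types are plain names, but decimal and fixed carry parameters inside the type string. The sketch below shows one way to parse the strings listed above; parse_primitive and the shape of the returned dictionary are assumptions for this example, not part of any Iceberg library.

```python
import re

SIMPLE_TYPES = {
    "boolean", "int", "long", "float", "double", "date", "time",
    "timestamp", "timestamptz", "string", "uuid", "binary",
}
DECIMAL_RE = re.compile(r"decimal\((\d+),\s*(\d+)\)")
FIXED_RE = re.compile(r"fixed\[(\d+)\]")


def parse_primitive(type_name: str) -> dict:
    """Split a primitive type string into its kind and parameters."""
    if type_name in SIMPLE_TYPES:
        return {"kind": type_name}
    match = DECIMAL_RE.fullmatch(type_name)
    if match:
        return {"kind": "decimal",
                "precision": int(match.group(1)),
                "scale": int(match.group(2))}
    match = FIXED_RE.fullmatch(type_name)
    if match:
        return {"kind": "fixed", "length": int(match.group(1))}
    raise ValueError(f"not a recognized primitive type: {type_name!r}")


print(parse_primitive("decimal(10,2)"))
print(parse_primitive("fixed[16]"))
```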

Complex Types

List Type:
{
  "type": "list",
  "element-id": 1,
  "element": "string",
  "element-required": true
}
Map Type:
{
  "type": "map",
  "key-id": 1,
  "key": "string",
  "value-id": 2,
  "value": "int",
  "value-required": false
}
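
Complex types can nest, and each nested element, key, and value carries its own field ID. The following sketch walks a type definition and collects those IDs, which is useful when checking that IDs are unique within a schema. The helper name nested_field_ids is an assumption for this example.

```python
def nested_field_ids(type_def):
    """Yield every field ID declared inside a type definition."""
    if isinstance(type_def, str):
        return  # primitive types are plain strings and declare no IDs
    kind = type_def["type"]
    if kind == "list":
        yield type_def["element-id"]
        yield from nested_field_ids(type_def["element"])
    elif kind == "map":
        yield type_def["key-id"]
        yield from nested_field_ids(type_def["key"])
        yield type_def["value-id"]
        yield from nested_field_ids(type_def["value"])
    elif kind == "struct":
        for field in type_def["fields"]:
            yield field["id"]
            yield from nested_field_ids(field["type"])


# A map from string to list of int, built from the two shapes shown above
tags_type = {
    "type": "map",
    "key-id": 1,
    "key": "string",
    "value-id": 2,
    "value": {"type": "list", "element-id": 3, "element": "int",
              "element-required": True},
    "value-required": False,
}

print(list(nested_field_ids(tags_type)))  # [1, 2, 3]
```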

Partition Specification

spec-id
integer
Unique identifier for the partition spec.
fields
array
required
Array of partition fields.

PartitionField

field-id
integer
Unique field identifier.
source-id
integer
required
Source column ID from the schema.
name
string
required
Partition field name.
transform
string
required
Transform function applied to the source column.
Transform Functions:
  • identity - No transformation
  • year - Extract year from a date or timestamp
  • month - Extract month from a date or timestamp
  • day - Extract day from a date or timestamp
  • hour - Extract hour from a timestamp
  • bucket[N] - Hash bucket with N buckets (e.g., bucket[256])
  • truncate[W] - Truncate the value to width W (e.g., truncate[16] keeps the first 16 characters of a string)
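
To make the transforms concrete, here is a minimal Python sketch of what each one computes. It assumes the Iceberg table specification's convention that the temporal transforms return offsets from the Unix epoch (years since 1970, months since 1970-01, and so on) and that input timestamps are timezone-aware UTC values. The function name apply_transform is an assumption. bucket[N] is omitted because it relies on a 32-bit Murmur3 hash that would not fit in a short example.

```python
import re
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TRUNCATE_RE = re.compile(r"truncate\[(\d+)\]")


def apply_transform(transform: str, value):
    """Apply a partition transform to one source value."""
    if transform == "identity":
        return value
    if transform == "year":
        return value.year - 1970
    if transform == "month":
        return (value.year - 1970) * 12 + (value.month - 1)
    if transform == "day":
        return (value - EPOCH).days
    if transform == "hour":
        return int((value - EPOCH).total_seconds() // 3600)
    match = TRUNCATE_RE.fullmatch(transform)
    if match:
        width = int(match.group(1))
        if isinstance(value, str):
            return value[:width]            # keep the first W characters
        if isinstance(value, int):
            return value - (value % width)  # round down to a multiple of W
    raise ValueError(f"unsupported transform in this sketch: {transform!r}")


ts = datetime(2024, 3, 15, 10, 30, tzinfo=timezone.utc)
print(apply_transform("year", ts))    # 54
print(apply_transform("month", ts))   # 650
print(apply_transform("truncate[4]", "iceberg"))  # iceb
```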

Sort Order

order-id
integer
Unique identifier for the sort order.
fields
array
required
Array of sort fields.

SortField

source-id
integer
required
Source column ID from the schema.
transform
string
required
Transform function (same as partition transforms).
direction
enum
required
Sort direction: asc or desc.
null-order
enum
required
Null ordering: nulls-first or nulls-last.
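
The effect of direction and null-order can be illustrated on in-memory rows. This sketch assumes an identity transform and uses a hypothetical column_names mapping from source-id to column name; neither the function sort_rows nor the sample data comes from Embucket.

```python
def sort_rows(rows: list, sort_field: dict, column_names: dict) -> list:
    """Order rows by one SortField, honoring direction and null-order."""
    column = column_names[sort_field["source-id"]]
    nulls = [row for row in rows if row.get(column) is None]
    values = sorted(
        (row for row in rows if row.get(column) is not None),
        key=lambda row: row[column],
        reverse=sort_field["direction"] == "desc",
    )
    if sort_field["null-order"] == "nulls-first":
        return nulls + values
    return values + nulls


sort_field = {
    "source-id": 2,
    "transform": "identity",
    "direction": "desc",
    "null-order": "nulls-last",
}
rows = [{"age": 30}, {"age": None}, {"age": 25}]

print(sort_rows(rows, sort_field, {2: "age"}))
```

Nulls are placed independently of direction, so a descending sort with nulls-last still puts missing values at the end.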

Table Metadata

format-version
integer
required
Iceberg table format version (1 or 2).
table-uuid
string
required
Unique identifier for the table.
location
string
Base location for table data.
last-updated-ms
integer
Timestamp of last update in milliseconds.
properties
object
Table properties as key-value pairs.
schemas
array
Array of schema objects.
current-schema-id
integer
ID of the current schema.
partition-specs
array
Array of partition specifications.
default-spec-id
integer
ID of the default partition spec.
sort-orders
array
Array of sort orders.
default-sort-order-id
integer
ID of the default sort order.
snapshots
array
Array of table snapshots.
current-snapshot-id
integer
ID of the current snapshot.
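
Several metadata properties are pointers into sibling arrays: current-schema-id selects an entry in schemas, default-spec-id an entry in partition-specs, and so on. A minimal sketch of resolving the current schema follows; the function name current_schema and the sample metadata values (including the UUID and location) are made up for illustration.

```python
def current_schema(metadata: dict) -> dict:
    """Return the schema whose schema-id matches current-schema-id."""
    wanted = metadata["current-schema-id"]
    for schema in metadata.get("schemas", []):
        if schema.get("schema-id") == wanted:
            return schema
    raise LookupError(f"no schema with schema-id {wanted}")


# Hypothetical, heavily trimmed table metadata
metadata = {
    "format-version": 2,
    "table-uuid": "00000000-0000-0000-0000-000000000000",
    "location": "s3://my-bucket/warehouse/analytics/users",
    "current-schema-id": 1,
    "schemas": [
        {"schema-id": 0, "type": "struct", "fields": []},
        {"schema-id": 1, "type": "struct", "fields": [
            {"id": 1, "name": "id", "type": "long", "required": True},
        ]},
    ],
}

print(current_schema(metadata)["fields"][0]["name"])  # id
```

The same lookup pattern applies to partition-specs with default-spec-id and to sort-orders with default-sort-order-id.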

Snapshot

snapshot-id
integer
required
Unique snapshot identifier.
parent-snapshot-id
integer
ID of the parent snapshot.
timestamp-ms
integer
required
Snapshot timestamp in milliseconds.
manifest-list
string
required
Location of the snapshot’s manifest list file.
summary
object
required
Snapshot summary information.
summary.operation
enum
required
Snapshot operation type: append, replace, overwrite, or delete.
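
Because each snapshot records its parent-snapshot-id, a table's history can be reconstructed by following those links back from any snapshot. The sketch below does this over the snapshots array; snapshot_lineage and the sample snapshot values are assumptions for the example.

```python
def snapshot_lineage(snapshots: list, snapshot_id: int) -> list:
    """Return snapshot IDs from snapshot_id back to its oldest known ancestor."""
    by_id = {snap["snapshot-id"]: snap for snap in snapshots}
    lineage = []
    seen = set()
    current = by_id.get(snapshot_id)
    # Stop at a missing parent, or if malformed metadata contains a cycle
    while current is not None and current["snapshot-id"] not in seen:
        seen.add(current["snapshot-id"])
        lineage.append(current["snapshot-id"])
        current = by_id.get(current.get("parent-snapshot-id"))
    return lineage


# Hypothetical snapshots: 100 <- 200 <- 300
snapshots = [
    {"snapshot-id": 100, "timestamp-ms": 1700000000000,
     "manifest-list": "s3://my-bucket/warehouse/t/metadata/snap-100.avro",
     "summary": {"operation": "append"}},
    {"snapshot-id": 200, "parent-snapshot-id": 100, "timestamp-ms": 1700000100000,
     "manifest-list": "s3://my-bucket/warehouse/t/metadata/snap-200.avro",
     "summary": {"operation": "append"}},
    {"snapshot-id": 300, "parent-snapshot-id": 200, "timestamp-ms": 1700000200000,
     "manifest-list": "s3://my-bucket/warehouse/t/metadata/snap-300.avro",
     "summary": {"operation": "overwrite"}},
]

print(snapshot_lineage(snapshots, 300))  # [300, 200, 100]
```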

Snapshot References

type
enum
required
Reference type: tag or branch.
snapshot-id
integer
required
ID of the referenced snapshot.
max-ref-age-ms
integer
Maximum age of the reference in milliseconds.
max-snapshot-age-ms
integer
Maximum age of snapshots in milliseconds.
min-snapshots-to-keep
integer
Minimum number of snapshots to retain.

Usage Examples

Python (pyiceberg)

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "embucket",
    **{
        "uri": "http://localhost:3000",
        "warehouse": "s3://my-bucket/warehouse",
    }
)

# List namespaces
namespaces = catalog.list_namespaces()

# List tables
tables = catalog.list_tables("analytics")

# Load table
table = catalog.load_table("analytics.public.users")

Java (Iceberg)

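import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.rest.RESTCatalog;

Map<String, String> properties = new HashMap<>();
properties.put("uri", "http://localhost:3000");
properties.put("warehouse", "s3://my-bucket/warehouse");

// Declare the variable as RESTCatalog: listNamespaces() is defined on
// SupportsNamespaces, which the plain Catalog interface does not extend
RESTCatalog catalog = new RESTCatalog();
catalog.initialize("embucket", properties);

// List namespaces
List<Namespace> namespaces = catalog.listNamespaces();

// Load table
Table table = catalog.loadTable(TableIdentifier.of("analytics", "public", "users"));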

Spark Configuration

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.sql.catalog.embucket", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.embucket.catalog-impl", "org.apache.iceberg.rest.RESTCatalog") \
    .config("spark.sql.catalog.embucket.uri", "http://localhost:3000") \
    .config("spark.sql.catalog.embucket.warehouse", "s3://my-bucket/warehouse") \
    .getOrCreate()

# Query Iceberg table
df = spark.sql("SELECT * FROM embucket.analytics.public.users")
df.show()

Configuration Examples

S3 Warehouse

{
  "overrides": {
    "warehouse": "s3://my-data-lake/warehouse/",
    "s3.region": "us-west-2"
  },
  "defaults": {
    "clients": "8"
  }
}

File System Warehouse

{
  "overrides": {
    "warehouse": "file:///data/warehouse/"
  },
  "defaults": {
    "clients": "4"
  }
}

Error Handling

The API returns standard HTTP status codes:
  • 200 OK - Request succeeded
  • 400 Bad Request - Invalid request parameters
  • 401 Unauthorized - Authentication required
  • 404 Not Found - Resource not found
  • 409 Conflict - Resource conflict (e.g., table already exists)
  • 500 Internal Server Error - Server error
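
A client can translate these status codes into readable errors. The following sketch uses only the Python standard library and the /v1/config route described earlier; the names config_url, fetch_config, and STATUS_MEANINGS, the choice of RuntimeError, and the 20-second default timeout are assumptions for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Meanings copied from the status code list above
STATUS_MEANINGS = {
    400: "invalid request parameters",
    401: "authentication required",
    404: "resource not found",
    409: "resource conflict",
    500: "server error",
}


def config_url(base_url: str, warehouse: Optional[str] = None) -> str:
    """Build the URL for GET /v1/config, with an optional warehouse parameter."""
    url = base_url.rstrip("/") + "/v1/config"
    if warehouse is not None:
        url += "?" + urllib.parse.urlencode({"warehouse": warehouse})
    return url


def fetch_config(base_url: str, warehouse: Optional[str] = None,
                 timeout: float = 20.0) -> dict:
    """Call GET /v1/config and raise a descriptive error on a non-2xx status."""
    try:
        with urllib.request.urlopen(config_url(base_url, warehouse),
                                    timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        meaning = STATUS_MEANINGS.get(err.code, "unexpected status")
        raise RuntimeError(
            f"GET /v1/config failed: {err.code} ({meaning})"
        ) from err


print(config_url("http://localhost:3000/", "production"))
```

With a server running locally, fetch_config("http://localhost:3000") would return the parsed overrides and defaults. Connection failures surface as urllib.error.URLError and are deliberately left unhandled here.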

Timeouts

Configure catalog operation timeouts:
embucketd --iceberg-catalog-timeout-secs 20
Or via environment variable:
export ICEBERG_CATALOG_TIMEOUT_SECS=20
