Iceberg Catalog REST API

Embucket implements the Apache Iceberg REST Catalog API specification, allowing Iceberg clients to interact with table metadata stored in the metastore.

Base URL

http(s)://host:port/

Example:

http://localhost:3000/

Configuration Endpoint

GET /v1/config

Retrieve catalog configuration properties.

Query Parameters:

warehouse

string

Warehouse location or identifier to request from the service.

Description: All REST clients should first call this route to get catalog configuration properties from the server. Configuration consists of two sets of key/value pairs:

defaults - Properties used as default configuration; applied before client configuration
overrides - Properties used to override client configuration; applied after defaults and client configuration

Catalog configuration is constructed by:

Setting the defaults
Applying client-provided configuration
Applying overrides

The final property set is used to configure the catalog.

Response:

{
  "overrides": {
    "warehouse": "s3://bucket/warehouse/"
  },
  "defaults": {
    "clients": "4"
  }
}

overrides

object

Properties that override client configuration.

defaults

object

Properties used as default configuration.
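The three-step construction described above can be sketched in Python as a pair of dictionary updates. This is a minimal illustration, not Embucket's or any client's actual implementation; the function name `build_catalog_config` and the sample client settings are made up for the example.

```python
def build_catalog_config(defaults, client_config, overrides):
    """Merge properties in the order: defaults, client config, overrides."""
    merged = dict(defaults)        # 1. start from the server-provided defaults
    merged.update(client_config)   # 2. client settings replace defaults
    merged.update(overrides)       # 3. server overrides replace everything else
    return merged


# Response in the shape returned by GET /v1/config
response = {
    "overrides": {"warehouse": "s3://bucket/warehouse/"},
    "defaults": {"clients": "4"},
}

# Hypothetical client-side settings
client_config = {"clients": "8", "warehouse": "s3://other/"}

final = build_catalog_config(
    response["defaults"], client_config, response["overrides"]
)
print(final)
```

Here the client's `clients` value wins over the default, while the server's `warehouse` override wins over the client's value.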

Example:

curl http://localhost:3000/v1/config

Example with Warehouse:

curl "http://localhost:3000/v1/config?warehouse=production"
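The same requests can be made from Python with only the standard library. The sketch below assumes the server is reachable at the base URL you pass in; the helper names `config_url` and `fetch_config` and the 10-second default timeout are illustrative choices, not part of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def config_url(base_url, warehouse=None):
    """Build the /v1/config URL, adding the warehouse query parameter if given."""
    url = base_url.rstrip("/") + "/v1/config"
    if warehouse is not None:
        url += "?" + urlencode({"warehouse": warehouse})
    return url


def fetch_config(base_url, warehouse=None, timeout=10):
    """Call the configuration endpoint and return the decoded JSON body."""
    with urlopen(config_url(base_url, warehouse), timeout=timeout) as resp:
        return json.load(resp)


print(config_url("http://localhost:3000/"))
print(config_url("http://localhost:3000/", warehouse="production"))
```

`urlencode` percent-encodes the parameter, so a warehouse value such as an `s3://` location is safe to pass.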

Common Catalog Properties

The following properties are commonly returned in catalog configuration:

warehouse

string

Base location for the warehouse (e.g., s3://bucket/warehouse/).

uri

string

Catalog URI for client connections.

clients

string

Number of client connections to use.

token

string

Authentication token for catalog operations (if required).

Schema Types

The Iceberg REST API uses the following schema types:

Schema

schema-id

integer

Unique identifier for the schema.

identifier-field-ids

array

Array of field IDs that make up the identifier.

type

string

required

Must be "struct".

fields

array

required

Array of struct fields.

StructField

id

integer

required

Unique field identifier.

name

string

required

Field name.

type

string

required

Field data type (primitive or complex).

required

boolean

required

Whether the field is required (non-nullable).

doc

string

Optional documentation for the field.
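As an illustration of how Schema and StructField fit together, the sketch below builds a small schema as a Python dictionary and checks the required keys listed above. The field identifier key is written as `id`, following the Iceberg table spec. The column names and the `validate_schema` helper are hypothetical, and the checks are far from a complete validator.

```python
REQUIRED_FIELD_KEYS = {"id", "name", "type", "required"}

# Example schema; column names are made up for illustration
schema = {
    "type": "struct",
    "schema-id": 0,
    "identifier-field-ids": [1],
    "fields": [
        {"id": 1, "name": "user_id", "type": "long", "required": True},
        {"id": 2, "name": "email", "type": "string", "required": False,
         "doc": "Contact address"},
    ],
}


def validate_schema(schema):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if schema.get("type") != "struct":
        problems.append('schema "type" must be "struct"')
    fields = schema.get("fields")
    if not isinstance(fields, list):
        problems.append('schema "fields" must be an array')
        fields = []
    for field in fields:
        missing = REQUIRED_FIELD_KEYS - field.keys()
        if missing:
            problems.append(
                f"field {field.get('name')!r} is missing {sorted(missing)}"
            )
    known_ids = {field.get("id") for field in fields}
    for field_id in schema.get("identifier-field-ids", []):
        if field_id not in known_ids:
            problems.append(f"identifier field id {field_id} is not defined")
    return problems


print(validate_schema(schema))
```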

Primitive Types

Supported primitive types:

boolean
int
long
float
double
decimal(precision,scale) - Example: decimal(10,2)
date
time
timestamp
timestamptz
string
uuid
fixed[N] - Example: fixed[16]
binary
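Most primitive types are plain names, but `decimal` and `fixed` carry parameters inside the type string. A client that needs those parameters might parse them roughly as follows; the function name `parse_primitive` and the tuple return shape are choices made for this sketch only.

```python
import re

SIMPLE_TYPES = {
    "boolean", "int", "long", "float", "double", "date", "time",
    "timestamp", "timestamptz", "string", "uuid", "binary",
}
DECIMAL_RE = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")
FIXED_RE = re.compile(r"fixed\[(\d+)\]")


def parse_primitive(type_str):
    """Return (name,) for simple types, or (name, params...) for decimal/fixed."""
    if type_str in SIMPLE_TYPES:
        return (type_str,)
    match = DECIMAL_RE.fullmatch(type_str)
    if match:
        return ("decimal", int(match.group(1)), int(match.group(2)))
    match = FIXED_RE.fullmatch(type_str)
    if match:
        return ("fixed", int(match.group(1)))
    raise ValueError(f"not a primitive type: {type_str!r}")


print(parse_primitive("decimal(10,2)"))
print(parse_primitive("fixed[16]"))
```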

Complex Types

List Type:

{
  "type": "list",
  "element-id": 1,
  "element": "string",
  "element-required": true
}

Map Type:

{
  "type": "map",
  "key-id": 1,
  "key": "string",
  "value-id": 2,
  "value": "int",
  "value-required": false
}
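Complex types can nest: the `element`, `key`, and `value` entries may themselves be list, map, or struct definitions instead of primitive names. The sketch below walks such a definition and collects every field ID it uses, which is one way to check that IDs are unique. The `collect_ids` helper and the sample type are hypothetical.

```python
def collect_ids(type_node):
    """Yield every field ID that appears inside a type definition."""
    if isinstance(type_node, str):
        return  # primitive type name: no nested IDs
    kind = type_node["type"]
    if kind == "list":
        yield type_node["element-id"]
        yield from collect_ids(type_node["element"])
    elif kind == "map":
        yield type_node["key-id"]
        yield from collect_ids(type_node["key"])
        yield type_node["value-id"]
        yield from collect_ids(type_node["value"])
    elif kind == "struct":
        for field in type_node["fields"]:
            yield field["id"]
            yield from collect_ids(field["type"])


# A map from string keys to lists of ints (IDs chosen arbitrarily)
attributes = {
    "type": "map",
    "key-id": 4,
    "key": "string",
    "value-id": 5,
    "value": {
        "type": "list",
        "element-id": 6,
        "element": "int",
        "element-required": True,
    },
    "value-required": False,
}

ids = list(collect_ids(attributes))
print(ids)
```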

Partition Specification

spec-id

integer

Unique identifier for the partition spec.

fields

array

required

Array of partition fields.

PartitionField

field-id

integer

Unique field identifier.

source-id

integer

required

Source column ID from the schema.

name

string

required

Partition field name.

transform

string

required

Transform function applied to the source column.

Transform Functions:

identity - No transformation
year - Extract year from a date or timestamp
month - Extract month from a date or timestamp
day - Extract day from a date or timestamp
hour - Extract hour from a timestamp
bucket[N] - Hash bucket with N buckets (e.g., bucket[256])
truncate[W] - Truncate value to width W (e.g., truncate[16] keeps the first 16 characters of a string)
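To make the transform strings concrete, here is a rough Python sketch that parses a transform and applies it to a value. It follows the Iceberg table spec in counting time-based results from the 1970-01-01 UTC epoch (so `year` yields years since 1970 rather than the calendar year). It handles `truncate` for strings only and leaves `bucket` unimplemented because that transform depends on a specific hash function. The helper names are assumptions of this example.

```python
import re
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
PARAM_RE = re.compile(r"(bucket|truncate)\[(\d+)\]")
PLAIN = {"identity", "year", "month", "day", "hour"}


def parse_transform(text):
    """Split a transform string into (name, parameter or None)."""
    if text in PLAIN:
        return text, None
    match = PARAM_RE.fullmatch(text)
    if match:
        return match.group(1), int(match.group(2))
    raise ValueError(f"unknown transform: {text!r}")


def apply_transform(text, value):
    """Apply a transform to a string or a timezone-aware datetime."""
    name, param = parse_transform(text)
    if name == "identity":
        return value
    if name == "truncate":
        return value[:param]  # string case only in this sketch
    if name == "bucket":
        raise NotImplementedError("bucket requires the spec's hash function")
    value = value.astimezone(timezone.utc)
    if name == "year":
        return value.year - 1970
    if name == "month":
        return (value.year - 1970) * 12 + (value.month - 1)
    delta = value - EPOCH
    if name == "day":
        return delta.days
    return delta.days * 24 + delta.seconds // 3600  # hour


ts = datetime(2024, 3, 15, 13, 30, tzinfo=timezone.utc)
print(apply_transform("year", ts), apply_transform("month", ts))
print(apply_transform("truncate[4]", "iceberg"))
```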

Sort Order

order-id

integer

Unique identifier for the sort order.

fields

array

required

Array of sort fields.

SortField

source-id

integer

required

Source column ID from the schema.

transform

string

required

Transform function (same as partition transforms).

direction

enum

required

Sort direction: asc or desc.

null-order

enum

required

Null ordering: nulls-first or nulls-last.
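The effect of `direction` and `null-order` can be shown by ordering a few in-memory rows. The sketch below supports only the `identity` transform and uses a hypothetical `column_names` mapping from field ID to column name; it is meant to illustrate the semantics, not how any engine sorts data files.

```python
def apply_sort_order(rows, sort_order, column_names):
    """Return rows ordered by the sort fields, first field most significant."""
    ordered = list(rows)
    # Python's sort is stable, so apply the least significant field first.
    for field in reversed(sort_order["fields"]):
        if field["transform"] != "identity":
            raise NotImplementedError("only identity is handled in this sketch")
        column = column_names[field["source-id"]]
        nulls = [row for row in ordered if row[column] is None]
        values = sorted(
            (row for row in ordered if row[column] is not None),
            key=lambda row: row[column],
            reverse=(field["direction"] == "desc"),
        )
        if field["null-order"] == "nulls-first":
            ordered = nulls + values
        else:
            ordered = values + nulls
    return ordered


sort_order = {
    "order-id": 1,
    "fields": [
        {"source-id": 2, "transform": "identity",
         "direction": "desc", "null-order": "nulls-last"},
    ],
}
column_names = {1: "user_id", 2: "score"}  # hypothetical schema mapping
rows = [
    {"user_id": 1, "score": 10},
    {"user_id": 2, "score": None},
    {"user_id": 3, "score": 30},
]

result = apply_sort_order(rows, sort_order, column_names)
print([row["user_id"] for row in result])
```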

Table Metadata

format-version

integer

required

Iceberg table format version (1 or 2).

table-uuid

string

required

Unique identifier for the table.

location

string

Base location for table data.

last-updated-ms

integer

Timestamp of last update in milliseconds.

properties

object

Table properties as key-value pairs.

schemas

array

Array of schema objects.

current-schema-id

integer

ID of the current schema.

partition-specs

array

Array of partition specifications.

default-spec-id

integer

ID of the default partition spec.

sort-orders

array

Array of sort orders.

default-sort-order-id

integer

ID of the default sort order.

snapshots

array

Array of table snapshots.

current-snapshot-id

integer

ID of the current snapshot.
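The `current-*` and `default-*` fields are pointers into the arrays in the same document. A client might resolve them along these lines. The metadata values below are invented for the example, and treating a missing or unmatched `current-snapshot-id` as "no snapshot yet" is an assumption of this sketch.

```python
def current_schema(metadata):
    """Return the schema whose schema-id matches current-schema-id."""
    wanted = metadata["current-schema-id"]
    for schema in metadata["schemas"]:
        if schema["schema-id"] == wanted:
            return schema
    raise LookupError(f"schema {wanted} not found")


def current_snapshot(metadata):
    """Return the current snapshot, or None if no snapshot matches."""
    wanted = metadata.get("current-snapshot-id")
    for snapshot in metadata.get("snapshots", []):
        if snapshot["snapshot-id"] == wanted:
            return snapshot
    return None


# Trimmed, made-up table metadata
metadata = {
    "format-version": 2,
    "table-uuid": "9c12d441-03fe-4693-9a96-a0705ddf69c1",
    "location": "s3://my-bucket/warehouse/analytics/public/users",
    "current-schema-id": 1,
    "schemas": [
        {"type": "struct", "schema-id": 0, "fields": []},
        {"type": "struct", "schema-id": 1, "fields": []},
    ],
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "timestamp-ms": 1700000000000},
        {"snapshot-id": 2, "timestamp-ms": 1700000600000},
    ],
}

print(current_schema(metadata)["schema-id"])
print(current_snapshot(metadata)["snapshot-id"])
```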

Snapshot

snapshot-id

integer

required

Unique snapshot identifier.

parent-snapshot-id

integer

ID of the parent snapshot.

timestamp-ms

integer

required

Snapshot timestamp in milliseconds.

manifest-list

string

required

Location of the snapshot’s manifest list file.

summary

object

required

Snapshot summary information.

summary.operation

enum

required

Snapshot operation type: append, replace, overwrite, or delete.
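Because each snapshot records its parent, a table's history can be reconstructed by following `parent-snapshot-id` links. The sketch below does this for a list of snapshot dictionaries; the `ancestry` helper and the sample snapshots (including the manifest list paths) are hypothetical.

```python
def ancestry(snapshots, snapshot_id):
    """Return the chain of snapshots from snapshot_id back to the root."""
    by_id = {snap["snapshot-id"]: snap for snap in snapshots}
    chain = []
    # pop() removes visited entries, so a malformed cycle cannot loop forever
    while snapshot_id in by_id:
        snap = by_id.pop(snapshot_id)
        chain.append(snap)
        snapshot_id = snap.get("parent-snapshot-id")
    return chain


snapshots = [
    {"snapshot-id": 1, "timestamp-ms": 1700000000000,
     "manifest-list": "s3://my-bucket/meta/snap-1.avro",
     "summary": {"operation": "append"}},
    {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 1700000600000,
     "manifest-list": "s3://my-bucket/meta/snap-2.avro",
     "summary": {"operation": "overwrite"}},
    {"snapshot-id": 3, "parent-snapshot-id": 2, "timestamp-ms": 1700001200000,
     "manifest-list": "s3://my-bucket/meta/snap-3.avro",
     "summary": {"operation": "append"}},
]

history = ancestry(snapshots, 3)
print([(s["snapshot-id"], s["summary"]["operation"]) for s in history])
```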

Snapshot References

type

enum

required

Reference type: tag or branch.

snapshot-id

integer

required

ID of the referenced snapshot.

max-ref-age-ms

integer

Maximum age of the reference in milliseconds.

max-snapshot-age-ms

integer

Maximum age of snapshots in milliseconds.

min-snapshots-to-keep

integer

Minimum number of snapshots to retain.
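A small check over snapshot references might look like the following. It verifies the two required fields and, following the Iceberg table spec as understood here, flags `max-snapshot-age-ms` and `min-snapshots-to-keep` on tags, since those settings describe branch retention. Whether Embucket enforces the same rule is an assumption; the reference names and the `validate_ref` helper are made up.

```python
BRANCH_ONLY_KEYS = {"max-snapshot-age-ms", "min-snapshots-to-keep"}


def validate_ref(ref):
    """Return a list of problems found in one snapshot reference."""
    problems = []
    ref_type = ref.get("type")
    if ref_type not in ("tag", "branch"):
        problems.append('"type" must be "tag" or "branch"')
    if not isinstance(ref.get("snapshot-id"), int):
        problems.append('"snapshot-id" must be an integer')
    if ref_type == "tag":
        for key in sorted(BRANCH_ONLY_KEYS & ref.keys()):
            problems.append(f'"{key}" applies to branches only')
    return problems


# Hypothetical references keyed by name
refs = {
    "main": {"type": "branch", "snapshot-id": 3, "min-snapshots-to-keep": 5},
    "release-2024": {"type": "tag", "snapshot-id": 2,
                     "max-ref-age-ms": 31536000000},
}

for name, ref in refs.items():
    print(name, validate_ref(ref))
```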

Usage Examples

Python (pyiceberg)

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "embucket",
    **{
        "uri": "http://localhost:3000",
        "warehouse": "s3://my-bucket/warehouse",
    }
)

# List namespaces
namespaces = catalog.list_namespaces()

# List tables
tables = catalog.list_tables("analytics")

# Load table
table = catalog.load_table("analytics.public.users")

Java (Iceberg)

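import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.rest.RESTCatalog;

Map<String, String> properties = new HashMap<>();
properties.put("uri", "http://localhost:3000");
properties.put("warehouse", "s3://my-bucket/warehouse");

// Use the concrete RESTCatalog type: listNamespaces() is declared on
// SupportsNamespaces, which the plain Catalog interface does not include.
RESTCatalog catalog = new RESTCatalog();
catalog.initialize("embucket", properties);

// List namespaces
List<Namespace> namespaces = catalog.listNamespaces();

// Load table
Table table = catalog.loadTable(TableIdentifier.of("analytics", "public", "users"));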

Spark Configuration

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.sql.catalog.embucket", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.embucket.catalog-impl", "org.apache.iceberg.rest.RESTCatalog") \
    .config("spark.sql.catalog.embucket.uri", "http://localhost:3000") \
    .config("spark.sql.catalog.embucket.warehouse", "s3://my-bucket/warehouse") \
    .getOrCreate()

# Query Iceberg table
df = spark.sql("SELECT * FROM embucket.analytics.public.users")
df.show()

Configuration Examples

S3 Warehouse

{
  "overrides": {
    "warehouse": "s3://my-data-lake/warehouse/",
    "s3.region": "us-west-2"
  },
  "defaults": {
    "clients": "8"
  }
}

File System Warehouse

{
  "overrides": {
    "warehouse": "file:///data/warehouse/"
  },
  "defaults": {
    "clients": "4"
  }
}

Error Handling

The API returns standard HTTP status codes:

200 OK - Request succeeded
400 Bad Request - Invalid request parameters
401 Unauthorized - Authentication required
404 Not Found - Resource not found
409 Conflict - Resource conflict (e.g., table already exists)
500 Internal Server Error - Server error
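On the client side, these status codes are often translated into exceptions so callers can handle each case separately. The sketch below shows one possible mapping; the exception class names and the `check_status` helper are hypothetical, and treating every 2xx code as success is an assumption.

```python
class CatalogError(Exception):
    """Base error carrying the HTTP status returned by the catalog."""

    def __init__(self, status, message=""):
        super().__init__(f"{status}: {message}")
        self.status = status


class BadRequest(CatalogError):
    pass


class Unauthorized(CatalogError):
    pass


class NotFound(CatalogError):
    pass


class Conflict(CatalogError):
    pass


class ServerError(CatalogError):
    pass


ERRORS_BY_STATUS = {
    400: BadRequest,
    401: Unauthorized,
    404: NotFound,
    409: Conflict,
    500: ServerError,
}


def check_status(status, message=""):
    """Return quietly on success; raise a matching CatalogError otherwise."""
    if 200 <= status < 300:
        return
    error_type = ERRORS_BY_STATUS.get(status, CatalogError)
    raise error_type(status, message)


try:
    check_status(409, "table already exists")
except Conflict as exc:
    print("conflict:", exc)
```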

Timeouts

Configure catalog operation timeouts:

embucketd --iceberg-catalog-timeout-secs 20

Or via environment variable:

export ICEBERG_CATALOG_TIMEOUT_SECS=20
