Documentation Index

Fetch the complete documentation index at: https://mintlify.com/embucket/embucket/llms.txt

Use this file to discover all available pages before exploring further.

Getting Started

This guide covers everything you need to know to start developing Embucket, from setting up your environment to understanding our coding conventions.

Prerequisites

Rust Toolchain

Install Rust and Cargo from rustup.rs

Git

Version control system for cloning the repository

Build from Source

Follow these steps to build Embucket from source:
1. Clone the repository

git clone https://github.com/Embucket/embucket.git
cd embucket
2. Build the project

Build the project using Cargo:
cargo build
For a release build with optimizations:
cargo build --release
3. Run Embucket

Run the embucketd daemon:
./target/debug/embucketd
Or for release builds:
./target/release/embucketd

Running Tests

Always run tests before submitting a pull request to ensure your changes don’t break existing functionality.
Run the test suite:
# Run all tests
cargo test

# Run tests for a specific package
cargo test -p <package-name>

# Run tests with output
cargo test -- --nocapture

Building for AWS Lambda

Embucket can be deployed as an AWS Lambda function. To build for Lambda:
1. Install cargo-lambda

cargo install cargo-lambda
2. Build for Lambda

cargo lambda build --release -p embucket-lambda --arm64
3. Deploy to AWS

Make sure your configuration file exists in the config directory (e.g., config/metastore.yaml).
cargo lambda deploy --binary-name bootstrap embucket-lambda
To enable function URL access:
cargo lambda deploy --binary-name bootstrap embucket-lambda --enable-function-url
Read the cargo-lambda documentation for more details on configuring IAM roles and function URLs.

Coding Standards

Embucket follows strict coding conventions to ensure code quality and consistency across the codebase.

Design Conventions

Error Definitions

Define errors with display messages in dedicated errors.rs files. Avoid inlining error texts outside of those files.

Error Types

Define an Error enum and a Result<T> alias per crate, with public visibility.

API Error Handling

Implement the IntoResponse trait for the top-level error type in API crates.

Error Logging

Errors in logs and tracing spans/events should include an error stack trace.

Error Handling with Snafu

Embucket uses the Snafu error library for consistent error handling across the codebase, along with the error_stack_trace::debug proc macro for enabling error stack traces.

Error Construction and Propagation

Derive from Snafu and use the error_stack_trace::debug proc macro when defining error enums:
#[derive(Snafu)]
#[error_stack_trace::debug]
pub enum Error {
    // Error variants here
}
Restrict generated Snafu selectors’ visibility to crate level:
#[snafu(visibility(pub(crate)))]
Use Snafu’s helpers for implicit error conversions:
  • .context(...) - also supports chaining
  • .build()
  • .fail()
  • .into_error()
These are preferred over .map_err(...), which is considered an anti-pattern because it loses context and makes errors harder to trace.

Special Error Cases

When nesting a non-Snafu foreign error, rename its field from source to error and add #[snafu(source)]:
#[snafu(source)]
error: ObjectStoreError,
Avoid constructing errors manually except in rare edge cases (e.g., boxed, non-Snafu, or external errors). Always document the reason when doing so.

Error Propagation Best Practices

  1. Use the ? operator for implicit error conversions when propagating errors
  2. Avoid .map_err(...) - it’s generally considered an anti-pattern
  3. Chain context when errors pass through multiple layers to provide better error messages
  4. Include stack traces by using the error_stack_trace::debug proc macro

Project Architecture

Embucket is built on proven open source technologies:

Apache DataFusion

Powers the SQL execution engine

Apache Iceberg

Provides ACID table metadata and storage format

Key Features

  • Snowflake SQL dialect and API: Wire-compatible with existing Snowflake queries, dbt projects, and BI tools
  • Apache Iceberg storage: Data stays in Apache Iceberg format on object storage with no lock-in
  • Single binary deployment: Radical simplicity in deployment
  • Query-per-node: Each instance handles complete queries independently
  • Horizontal scaling: Add nodes for more throughput

Development Workflow

1. Discuss your changes

Before making significant changes, discuss them with the maintainers via GitHub Discussions or by opening an issue.
2. Create a feature branch

Create a new branch for your feature or bug fix:
git checkout -b issueID/feature-name
3. Make your changes

Follow the coding conventions outlined above. Write tests for new functionality.
4. Run tests and checks

Ensure all tests pass and code follows conventions:
cargo test
cargo clippy
cargo fmt -- --check
5. Submit a pull request

See the Contributing Guide for detailed PR submission instructions.

Getting Help

If you need help with development:

GitHub Discussions

Ask questions and get help from the community

Contributing Guide

Review the full contributing guidelines