Documentation Index

Fetch the complete documentation index at: https://mintlify.com/embucket/embucket/llms.txt

Use this file to discover all available pages before exploring further.

Getting Started

This guide covers everything you need to know to start developing Embucket, from setting up your environment to understanding our coding conventions.

Prerequisites

Rust Toolchain

Install Rust and Cargo from rustup.rs

Git

Version control system for cloning the repository

Build from Source

Follow these steps to build Embucket from source:
1. Clone the repository

git clone https://github.com/Embucket/embucket.git
cd embucket
2. Build the project

Build the project using Cargo:
cargo build
For a release build with optimizations:
cargo build --release
3. Run Embucket

Run the embucketd daemon:
./target/debug/embucketd
Or for release builds:
./target/release/embucketd

Running Tests

Always run tests before submitting a pull request to ensure your changes don’t break existing functionality.
Run the test suite:
# Run all tests
cargo test

# Run tests for a specific package
cargo test -p <package-name>

# Run tests with output
cargo test -- --nocapture

Building for AWS Lambda

Embucket can be deployed as an AWS Lambda function. To build for Lambda:
1. Install cargo-lambda

cargo install cargo-lambda
2. Build for Lambda

cargo lambda build --release -p embucket-lambda --arm64
3. Deploy to AWS

Make sure your configuration file exists in the config directory (e.g., config/metastore.yaml).
cargo lambda deploy --binary-name bootstrap embucket-lambda
To enable function URL access:
cargo lambda deploy --binary-name bootstrap embucket-lambda --enable-function-url
Read the cargo-lambda documentation for more details on configuring IAM roles and function URLs.

Coding Standards

Embucket follows strict coding conventions to ensure code quality and consistency across the codebase.

Design Conventions

Error Definitions

Define errors with display messages in dedicated errors.rs files. Avoid inlining error texts outside of those files.

Error Types

Define an Error enum and a Result<T> alias per crate, with public visibility.

API Error Handling

Implement the IntoResponse trait for the top-level error type in API crates.

Error Logging

Errors in logs and tracing spans/events should include an error stack trace.

Error Handling with Snafu

Embucket uses the Snafu error library for consistent error handling across the codebase, along with the error_stack_trace::debug proc macro for enabling error stack traces.

Error Construction and Propagation

Derive from Snafu and use the error_stack_trace::debug proc macro when defining error enums:
#[derive(Snafu)]
#[error_stack_trace::debug]
pub enum Error {
    // Error variants here
}
Restrict generated Snafu selectors’ visibility to crate level:
#[snafu(visibility(pub(crate)))]
Use Snafu’s helpers for implicit error conversions:
  • .context(...) - also supports chaining
  • .build()
  • .fail()
  • .into_error()
These are preferred over .map_err(...), which is considered an anti-pattern because it loses context and makes errors harder to trace.

Special Error Cases

When nesting a non-Snafu foreign error, rename its field from source to error and add #[snafu(source)]:
#[snafu(source)]
error: ObjectStoreError,
Avoid constructing errors manually except in rare edge cases (e.g., boxed, non-Snafu, or external errors). Always document the reason when doing so.

Error Propagation Best Practices

  1. Use the ? operator for implicit error conversions when propagating errors
  2. Avoid .map_err(...) - it’s generally considered an anti-pattern
  3. Chain context when errors pass through multiple layers to provide better error messages
  4. Include stack traces by using the error_stack_trace::debug proc macro

Project Architecture

Embucket is built on proven open source technologies:

Apache DataFusion

Powers the SQL execution engine

Apache Iceberg

Provides ACID table metadata and storage format

Key Features

  • Snowflake SQL dialect and API: Wire-compatible with existing Snowflake queries, dbt projects, and BI tools
  • Apache Iceberg storage: Data stays in Apache Iceberg format on object storage with no lock-in
  • Single binary deployment: Radical simplicity in deployment
  • Query-per-node: Each instance handles complete queries independently
  • Horizontal scaling: Add nodes for more throughput

Development Workflow

1. Discuss your changes

Before making significant changes, discuss them with the maintainers via GitHub Discussions or by opening an issue.
2. Create a feature branch

Create a new branch for your feature or bug fix:
git checkout -b issueID/feature-name
3. Make your changes

Follow the coding conventions outlined above. Write tests for new functionality.
4. Run tests and checks

Ensure all tests pass and code follows conventions:
cargo test
cargo clippy
cargo fmt -- --check
5. Submit a pull request

See the Contributing Guide for detailed PR submission instructions.

Getting Help

If you need help with development:

GitHub Discussions

Ask questions and get help from the community

Contributing Guide

Review the full contributing guidelines