

Embucket provides a wire-compatible Snowflake API, allowing you to run dbt models locally without connecting to a cloud Snowflake instance. This accelerates development workflows and reduces costs.

Why Use Embucket with dbt?

Faster Development

Run and test dbt models locally without network latency or cloud compute costs.

Cost Savings

Develop and test without consuming Snowflake credits during the development phase.

Offline Development

Work on your dbt models even without internet connectivity.

Consistent Environment

Ensure reproducible development environments across your team.

Prerequisites

  • dbt Core or dbt Cloud CLI installed
  • Embucket running locally or on your network
  • Existing dbt project with Snowflake models

Profile Configuration

Basic Setup

Add an Embucket target to your ~/.dbt/profiles.yml:
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: acc.local
      user: embucket
      password: embucket
      role: em.role
      database: analytics
      warehouse: em.wh
      schema: dev
      threads: 4
      
      # Embucket-specific settings
      host: localhost
      port: 3000
      protocol: http
      region: us-east-2
Use type: snowflake - Embucket is compatible with the dbt-snowflake adapter.

Multiple Environments

Configure both Embucket (local) and Snowflake (production) targets:
my_project:
  target: local
  outputs:
    local:
      type: snowflake
      account: acc.local
      user: embucket
      password: embucket
      database: analytics
      warehouse: em.wh
      schema: "{{ env_var('USER', 'dev') }}"
      threads: 4
      host: localhost
      port: 3000
      protocol: http
      region: us-east-2
      
    production:
      type: snowflake
      account: your_account
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics
      warehouse: compute_wh
      schema: public
      threads: 8
      region: us-east-1
Switch between environments:
# Use local Embucket
dbt run --target local

# Use Snowflake production
dbt run --target production

Running dbt Models

Standard Workflow

Run your dbt models against Embucket just like you would with Snowflake:
dbt run

Development Workflow Example

Typical development cycle with Embucket:
# 1. Start Embucket locally
docker run --name embucket --rm -p 3000:3000 embucket/embucket

# 2. Create development schema
dbt run-operation create_schema --args '{"schema_name": "dev_alice"}'

# 3. Run models incrementally during development
dbt run --select my_new_model+

# 4. Test your changes
dbt test --select my_new_model+

# 5. Preview results
snow sql -c local -q "SELECT * FROM analytics.dev_alice.my_new_model LIMIT 10;"

# 6. When ready, deploy to production Snowflake
dbt run --target production --select my_new_model+

Supported dbt Features

Embucket supports most dbt Core functionality:
  • SQL models (SELECT statements)
  • Incremental models
  • Ephemeral models
  • Table and view materializations
  • Custom materializations
  • Schema tests (unique, not_null, accepted_values, relationships)
  • Data tests (custom SQL tests)
  • Test severity levels
  • Jinja templating
  • Custom macros
  • dbt built-in macros
  • ref() and source() functions
  • CSV seed files
  • Seed configuration
  • Data type inference
  • Timestamp-based snapshots
  • Check-based snapshots
  • Snapshot configurations
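Several of these features typically appear together in a single model. Below is a minimal sketch of an incremental model that combines Jinja templating, ref(), window functions, and the is_incremental() macro; the model, table, and column names (stg_orders, order_id, and so on) are assumptions for illustration, not part of any real project:

```sql
-- models/marts/fct_orders.sql (illustrative; names are assumptions)
{{
  config(
    materialized='incremental',
    unique_key='order_id'
  )
}}

SELECT
  o.order_id,
  o.customer_id,
  o.order_date,
  SUM(o.amount) OVER (
    PARTITION BY o.customer_id
    ORDER BY o.order_date
  ) AS customer_running_total
FROM {{ ref('stg_orders') }} AS o

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what is
  -- already in the target table.
  WHERE o.order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
```

Because Embucket speaks the Snowflake dialect, the same model should compile and run unchanged against either target.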

Development Workflow Benefits

Rapid Iteration

Develop and test models quickly without waiting for cloud resources:
# Fast feedback loop
dbt run --select my_model && dbt test --select my_model
Local execution eliminates network latency and queue times.

Cost-Efficient Testing

Test complex transformations without incurring cloud compute costs:
# Test on sample data locally
dbt build --select tag:testing

# Deploy to production when confident
dbt build --target production --select tag:testing

Collaborative Development

Each developer can run their own Embucket instance:
# profiles.yml
schema: "dev_{{ env_var('USER') }}"
This prevents schema conflicts and enables parallel development.

Working with External Data

Embucket can connect to external Iceberg tables and S3 Table Buckets:

Configure External Catalog

Create config/metastore.yaml:
volumes:
  - ident: lakehouse
    type: s3
    region: us-east-2
    bucket: your-data-lake
    credentials:
      credential_type: access_key
      aws-access-key-id: YOUR_KEY
      aws-secret-access-key: YOUR_SECRET

databases:
  - ident: raw_data
    volume: lakehouse

schemas:
  - database: raw_data
    schema: staging

Start Embucket with Configuration

docker run --name embucket --rm -p 3000:3000 \
  -v $PWD/config:/app/config \
  embucket/embucket \
  ./embucketd --metastore-config config/metastore.yaml

Reference External Tables in dbt

Create sources in your dbt project:
# models/sources.yml
version: 2

sources:
  - name: raw
    database: raw_data
    schema: staging
    tables:
      - name: customers
      - name: orders
Reference in models:
-- models/staging/stg_customers.sql
SELECT
  customer_id,
  customer_name,
  created_at
FROM {{ source('raw', 'customers') }}

Limitations

While Embucket provides strong Snowflake compatibility, the following Snowflake features have limited or no support:
  • Snowflake-specific data types: VARIANT and GEOGRAPHY may have only partial support
  • Clustering keys: Syntax accepted but not enforced
  • Time Travel: AT and BEFORE clauses not supported
  • Tasks and Streams: Not implemented
  • External functions: UDFs calling external services
  • Stored procedures: JavaScript/Python procedures
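When a model depends on one of these features, target-conditional Jinja keeps a single codebase working against both backends. A sketch, assuming the local and production target names from the profiles above and a hypothetical events source with a VARIANT payload column:

```sql
-- models/staging/stg_events.sql (illustrative; source and columns are assumptions)
SELECT
  event_id,
{% if target.name == 'production' %}
  -- VARIANT path extraction is only guaranteed on real Snowflake.
  payload:device.os::string AS device_os
{% else %}
  -- Assumed fallback for Embucket, where VARIANT support may be partial.
  NULL AS device_os
{% endif %}
FROM {{ source('raw', 'events') }}
```

The condition is evaluated at compile time, so each target sees only the SQL it can execute.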

Feature Compatibility Matrix

| Feature             | Embucket Support | Notes                               |
| ------------------- | ---------------- | ----------------------------------- |
| SELECT queries      | Full             | Complete Snowflake SQL dialect      |
| JOINs               | Full             | All join types supported            |
| Window functions    | Full             | RANK, ROW_NUMBER, LAG, LEAD, etc.   |
| CTEs                | Full             | Common table expressions            |
| CREATE TABLE        | Full             | Standard table creation             |
| CREATE VIEW         | Full             | View definitions                    |
| Date/time functions | Full             | dateadd, datediff, date_trunc, etc. |
| Aggregate functions | Full             | SUM, AVG, COUNT, etc.               |
| String functions    | Full             | CONCAT, SUBSTR, REGEXP, etc.        |
| MERGE INTO          | Full             | Upsert operations                   |
| Transactions        | Partial          | Basic support                       |
| Query tags          | Accepted         | Tags stored but not used            |
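As an example of the fully supported features above working together, the following query combines a CTE, a date function, an aggregate, and a window function in one statement; the table name is an assumption for illustration:

```sql
-- Illustrative query (table name analytics.dev.orders is assumed)
WITH daily AS (
  SELECT
    customer_id,
    DATE_TRUNC('day', created_at) AS order_day,
    COUNT(*) AS orders
  FROM analytics.dev.orders
  GROUP BY customer_id, DATE_TRUNC('day', created_at)
)
SELECT
  customer_id,
  order_day,
  orders,
  -- Rank each customer's days from most to least recent
  ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY order_day DESC
  ) AS recency_rank
FROM daily
```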

Troubleshooting

Connection errors

Solution:
  1. Verify Embucket is running: curl http://localhost:3000/session/v1/login-request
  2. Check profile configuration in ~/.dbt/profiles.yml
  3. Test connection: dbt debug
  4. Ensure credentials match (default: embucket/embucket)
Queries fail or return unexpected results

Solution:
  1. Check if external catalog is configured correctly
  2. Verify table permissions in metastore config
  3. Run with verbose logging: dbt run --debug
  4. Compare query output between Embucket and Snowflake
Incremental models do not behave as expected

Solution:
  1. Embucket fully supports incremental models
  2. Ensure unique key is properly defined
  3. Check that the merge strategy matches your use case
  4. Verify the incremental logic in your model
A Snowflake feature appears unsupported

Solution:
  1. Check if the feature is in Embucket’s limitations list above
  2. Use dbt macros to adapt syntax: {% if target.name == 'local' %}
  3. File an issue on Embucket’s GitHub for missing features

Best Practices

1. Use Environment Variables

Store credentials in environment variables rather than hardcoding them:
password: "{{ env_var('DBT_PASSWORD', 'embucket') }}"
2. Separate Schemas

Use different schemas for each developer to avoid conflicts:
schema: "dev_{{ env_var('USER') }}"
3. Test Locally First

Always test changes on Embucket before deploying to production Snowflake.
4. Use dbt Cloud for CI/CD

Run Embucket locally for development, but use dbt Cloud or production Snowflake for automated testing and deployment.

Next Steps

Querying Guide

Learn about SQL syntax and query patterns

External Catalogs

Connect to data lakes and Iceberg tables

Snowflake CLI

Interact with Embucket via command line

Deployment

Deploy Embucket for your team