Table Materialization

Table Materialization

A table materialization rebuilds your model as a physical table in your data warehouse during each dbt run. Unlike views, tables store the actual data rather than just the query logic, using a CREATE TABLE AS statement.


How Table Materializations Work

When you materialize a model as a table, dbt executes the model's SQL query and stores the results as a table in your data warehouse. During each run, dbt:

  1. Runs DROP TABLE IF EXISTS on the existing table

  2. Executes CREATE TABLE AS with your model's SQL

  3. Applies any configured table properties (like indexes, distribution keys, etc.)

This means:

  • Data is physically stored in your warehouse

  • Queries against the table are faster but builds take longer

  • Data isn't automatically updated when source data changes

Under the hood, dbt executes a CREATE TABLE AS statement:

CREATE OR REPLACE TABLE "database"."schema"."customer_orders" AS (
    SELECT * FROM source_table WHERE condition = true
);

Tables are ideal when query performance is more important than build time or real-time data needs.


When to Use Table Materializations

Tables are particularly valuable for:

Use Case
Why Tables Work Well

Performance-critical models

Tables provide the fastest query performance, ideal for dashboards and reports

Complex transformations

Compute-intensive operations only need to run once during build

Frequently accessed data

Multiple users or systems can query without recomputing

Downstream dependencies

When many models reference this data, tables reduce overall processing


Configuring Table Materializations

Tables can be configured at both the model and project level.

Model-Level Configuration

-- In your model SQL file
{{
    config(
        materialized='table',
        sort='order_date',
        dist='customer_id'
    )
}}

SELECT
    customer_id,
    order_date,
    SUM(amount) as total_amount
FROM {{ ref('stg_orders') }}
GROUP BY 1, 2

Project-Level Configuration

# In your dbt_project.yml file
models:
  your_project:
    marts:
      +materialized: table

This sets all models in the marts/ directory to materialize as tables.


Performance Optimization

Different warehouses offer specific optimization options for tables:

Warehouse
Optimization Options

Snowflake

• Clustering keys • Search optimization • Automatic query optimization

BigQuery

• Partitioning • Clustering • Table expiration

Redshift

• Distribution keys • Sort keys • Table compression

To apply these optimizations, use the config() function with warehouse-specific parameters:

{{
    config(
        materialized='table',
        snowflake_cluster_by=['customer_id', 'order_date'],
        bigquery_partition_by={
            "field": "order_date",
            "data_type": "date"
        }
    )
}}

Advantages and Limitations

Advantages
Limitations

⚡ Fast query performance

🕒 Slower build times

📊 Efficient for BI tools

🔄 No automatic updates

💪 Great for complex queries

💾 Uses more storage

🔀 Ideal for multiple users

📈 Higher warehouse costs


When to Consider Other Materializations

While tables are powerful, consider alternatives when:

  • Data needs to be real-time (use views)

  • Table is very large and only needs incremental updates (use incremental)

  • Model is a simple intermediate transformation used by only one downstream model (use ephemeral)


Best Practices

  1. Materialization Strategy: Use tables for final reporting layers and complex transformations

  2. Build Frequency: Schedule table rebuilds based on source data update frequency

  3. Performance Tuning: Apply appropriate indexes, partitioning, or clustering for your warehouse

  4. Resource Management: Schedule builds during off-peak hours for large tables

  5. Monitoring: Track build times and storage usage to identify optimization opportunities

By using table materializations strategically, you can balance performance needs with resource utilization to create an efficient data transformation pipeline.

Last updated

Was this helpful?

OSZAR »