What is the difference between dbt build and dbt run?

dbt run executes only models. dbt build runs models, seeds, snapshots, AND tests together in DAG order — testing each node before moving downstream. It is the recommended command for production because it prevents bad data from propagating.

When should you use incremental materialization in dbt?

Use incremental when you have millions of rows and only new rows are appended each run. If a large percentage of rows update each run, a full table rebuild is more efficient. Incremental models use the is_incremental() macro and a unique_key for upserts.

What is the difference between ref() and source() in dbt?

ref() references another dbt model and tells dbt the dependency so it builds in the right order. source() references a raw table declared in sources.yml. Use ref() for dbt models and source() for raw/external tables loaded by other tools.

What does dbt deps do?

dbt deps installs packages listed in packages.yml into the dbt_packages/ directory. You must run dbt deps after adding or updating packages before the package macros become available in your project.

Can Python models reference other dbt models?

Yes. Python models reference both SQL and Python dbt models using dbt.ref('model_name') inside the model function. Python models can be materialized only as table or incremental, not as views or ephemeral, and they require platform support (for example Snowpark or PySpark).

dbt Analytics Engineering Certification: Developing dbt Models Study Guide (Domain 1 of 6)

What This Domain Covers

Domain 1 is the foundation of the exam. It tests your ability to build well-structured, performant dbt projects from scratch — converting business logic into maintainable SQL using dbt's core primitives.

🧱

Materializations

Table, view, incremental, ephemeral — when and why to use each

🔗

ref() & source()

Building clean DAGs and declaring raw dependencies

♻️

DRY Principles

Macros, packages, and modular SQL for reusable code

⚙️

Configuration

dbt_project.yml, sources YAML, grants, Python models

Topics on the Exam

Raw dependencies · Materializations · DRY / modularity · Business logic → SQL · dbt run / test / seed · DAGs & model order · dbt_project.yml · Configuring sources · dbt Packages · Git in dbt workflow · Python models · Grants configuration

🎯 Exam Tip

The most common Domain 1 question type involves identifying where ref() should replace hardcoded table names. dbt can only infer DAG order from ref() — hardcoded table names are invisible to the compiler. Always replace every FROM and JOIN clause that references another dbt model with {{ ref('model_name') }}.

Key Concepts Deep Dive

🧱

Materializations

table · view · incremental · ephemeral

High Frequency

view: default; re-runs its query every time it is selected from; no storage cost; use for lightweight staging models
table: fully rebuilds every run; faster to query; use for downstream-facing mart models
incremental: only processes new/changed rows; typically filters them with the is_incremental() macro; best for large, mostly append-only datasets; needs unique_key for upserts
ephemeral: not materialized at all; injected as a CTE into the models that reference it; use for intermediate logic you don't need to query directly; can slow compile times if overused

💡 Exam logic: incremental ≠ "always faster". If a large % of rows update each run, a full table rebuild may be more efficient. The exam tests this nuance.

🔗

ref() and source()

DAG dependencies and raw object declarations

High Frequency

{{ ref('model_name') }} — references another dbt model; dbt infers run order from these; use in every FROM/JOIN that references a dbt model
{{ source('source_name', 'table_name') }} — references a raw table declared in a sources.yml; enables source freshness checks
A source maps to one database + schema combination; tables are listed under it
Without ref(), dbt does not know the dependency and models may build in the wrong order

💡 If models build in the wrong order → check that all inter-model references use ref(), not hardcoded table names.
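
To make the point about hardcoded names concrete, here is a toy sketch in Python of how a dependency graph can be inferred from ref() calls alone. It is not dbt's parser: the model names, the SQL strings, and the regular expression are all illustrative assumptions.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model SQL keyed by model name (not from a real project).
# fct_orders joins to a hardcoded table name on purpose.
MODELS = {
    "stg_orders": "select * from {{ source('shop', 'orders') }}",
    "int_orders": "select * from {{ ref('stg_orders') }}",
    "fct_orders": (
        "select * from {{ ref('int_orders') }} "
        "join analytics.stg_customers using (customer_id)"
    ),
}

# Simplified pattern for {{ ref('name') }}; an assumption, not dbt's real parser.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql):
    """Return the set of model names referenced through ref() in one SQL string."""
    return set(REF_PATTERN.findall(sql))


def build_order(models):
    """Topologically sort models using only the dependencies visible via ref()."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(find_refs(MODELS["fct_orders"]))  # the hardcoded join is not listed
print(order)
```

In this sketch the hardcoded analytics.stg_customers never appears as a dependency of fct_orders, which is the same blind spot the tip above describes.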

♻️

DRY Principles & Modularity

Macros, packages, and reusable logic

Medium Frequency

DRY = Don't Repeat Yourself — avoid copy-pasting SQL; extract repeated logic into macros
Macros — Jinja-templated functions defined in macros/; called with {{ macro_name() }}
Packages — installable dbt projects (e.g. dbt_utils, dbt_expectations); declared in packages.yml; installed with dbt deps
Staging → Intermediate → Mart — the standard layer pattern; staging cleans raw data, intermediate joins/transforms, mart serves business logic

💡 The exam often asks: "What is the most appropriate way to avoid repeating this logic?" Answer: macros for SQL logic, packages for shared test suites.

⚙️

dbt_project.yml & Configuration

Project-wide settings and model configs

Medium Frequency

dbt_project.yml is the project config file — defines project name, model paths, and default/folder-level configurations
Configs cascade: model-level config overrides folder-level, which overrides project-level
Grants — the grants config gives specific roles SELECT access on materialized models; defined in dbt_project.yml or inline
Seeds — CSV files in the seeds/ folder; loaded with dbt seed; referenced in models with ref('seed_name')

💡 Config precedence (highest to lowest): in-model config block → .yml property file → dbt_project.yml folder path.
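
The precedence rule can be pictured as a layered merge in which the most specific layer is applied last. The following is a minimal Python sketch under that assumption; the function name and the sample config values are hypothetical, and it ignores the fact that dbt merges some configs (such as tags) instead of replacing them.

```python
def resolve_config(project_config, properties_config, in_model_config):
    """Merge three config layers so the most specific layer wins, key by key."""
    resolved = {}
    # Apply from lowest to highest precedence; later layers overwrite earlier ones.
    for layer in (project_config, properties_config, in_model_config):
        resolved.update(layer)
    return resolved


# Hypothetical values for one model.
project = {"materialized": "view", "schema": "staging"}  # dbt_project.yml folder path
properties = {"materialized": "table"}                   # .yml property file
in_model = {"materialized": "incremental", "unique_key": "order_id"}  # config() block

resolved = resolve_config(project, properties, in_model)
print(resolved)
```

The materialized key is set in all three layers and the in-model value survives, while schema is inherited from dbt_project.yml because no higher layer overrides it.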

🐍

Python Models

When SQL isn't enough

Lower Frequency

Python models live in models/ with .py extension; must define a model(dbt, session) function that returns a DataFrame
Platform support: Snowpark (Snowflake), PySpark (Databricks; BigQuery via Dataproc)
Use for ML transformations, statistical operations, or complex logic not feasible in SQL
Can reference dbt models with dbt.ref() and dbt.source() inside the function
Slower to run than SQL models — only use when SQL is genuinely insufficient

💡 Python models can be materialized only as table or incremental. They cannot be views or ephemeral.

🌿

Git in the dbt Workflow

Branching, pull requests, and syncing

Medium Frequency

git pull = git fetch + git merge; use it to bring the latest commits from the remote (for example from main) into your branch
Feature branches for development; PRs to merge into main; never commit directly to main
Merge conflicts can be resolved in the dbt Cloud IDE or locally in any editor
dbt Cloud IDE has a built-in git panel for committing, pushing, and creating PRs

💡 The exam may ask about keeping a feature branch in sync. git pull (not just git fetch) is the correct answer — it fetches AND merges.

Domain 1 Study Checklist

Check each item as you complete it. Track your readiness before the exam.

Concept · Understand all 4 materializations and their trade-offs
Know when to use view vs table vs incremental vs ephemeral

Practice · Replace all hardcoded table refs with ref() in a sample project
Build a simple 3-model DAG: staging → intermediate → mart

Concept · Declare a source in sources.yml and use source() in a model
Understand what a source maps to (database + schema + table)

Practice · Write an incremental model with is_incremental() macro
Implement a unique_key for upsert behavior

Concept · Understand config precedence: model > .yml > dbt_project.yml
Know how folder-level configs cascade to child models

Practice · Install dbt_utils package and use a macro in a model
Edit packages.yml, run dbt deps, call {{ dbt_utils.generate_surrogate_key([...]) }} with a list of column names

Concept · Write a custom macro and call it from two different models
Practice Jinja syntax: {% macro %}, {{ }}, {% if %}

Exam Prep · Know the dbt run selection flags (--select, --exclude) and graph operators (+, @)
Understand node selection syntax: model+, +model, @model

Practice · Create and load a seed file, reference it in a model
Run dbt seed and verify it appears in the warehouse

Concept · Configure grants for a model to give read access to a role
Understand the grants config key and how it maps to GRANT SELECT

Concept · Understand Python model requirements and limitations
Table or incremental only, platform-specific, uses dbt.ref() not ref()

Exam Prep · Practice git pull vs git fetch, and know the difference
git pull = fetch + merge; used to sync feature branch with main

Practice · Run dbt compile and inspect the compiled SQL
Find compiled code in target/compiled/ directory

Exam Prep · Review the official dbt project structure best practices
Read: "How we structure our dbt projects" (dbt Labs blog)

Quick Reference

Key syntax patterns for Domain 1 — study these cold.

Incremental Model

-- models/fct_orders.sql
{{ config(
  materialized='incremental',
  unique_key='order_id'
) }}

SELECT * FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
  WHERE updated_at > (
    SELECT MAX(updated_at) FROM {{ this }}
  )
{% endif %}
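
The behavior of the model above can be sketched as a toy simulation in Python. This is only an illustration of the first-run, filter, and upsert logic; the function name and row shapes are assumptions, and real warehouses implement the upsert with MERGE or delete+insert statements.

```python
def run_incremental(target, source_rows, unique_key, full_refresh=False):
    """Toy model of one run of an incremental model.

    target: dict mapping unique_key value -> row, or None if the table
            does not exist yet. Assumed non-empty when it exists.
    source_rows: list of dicts containing the unique key and 'updated_at'.
    """
    # Mirrors is_incremental(): false on the first run or on a full refresh.
    is_incremental = target is not None and not full_refresh

    if not is_incremental:
        # Full build: the WHERE clause is skipped and the table is replaced.
        return {row[unique_key]: row for row in source_rows}

    # Incremental run: keep only rows newer than the current high-water mark.
    high_water_mark = max(row["updated_at"] for row in target.values())
    new_rows = [r for r in source_rows if r["updated_at"] > high_water_mark]

    # Upsert: matching unique_key values are overwritten, others are inserted.
    merged = dict(target)
    for row in new_rows:
        merged[row[unique_key]] = row
    return merged


day1 = [
    {"order_id": 1, "status": "placed", "updated_at": 1},
    {"order_id": 2, "status": "placed", "updated_at": 1},
]
day2 = [
    {"order_id": 1, "status": "placed", "updated_at": 1},
    {"order_id": 2, "status": "shipped", "updated_at": 2},
    {"order_id": 3, "status": "placed", "updated_at": 2},
]

table = run_incremental(None, day1, "order_id")   # first run: full build
table = run_incremental(table, day2, "order_id")  # second run: upsert
print(table)
```

Without a unique_key the second run would append the updated order 2 as a new row instead of replacing the old one, which is why the key matters for upsert behavior.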

Source Declaration (sources.yml)

sources:
  - name: salesforce
    database: raw
    schema: salesforce
    tables:
      - name: accounts
      - name: opportunities

Grants Configuration

# dbt_project.yml
models:
  my_project:
    marts:
      +grants:
        select: ['reporter', 'bi_tool']

Python Model

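# models/my_python_model.py
def model(dbt, session):
    # Python models support the table and incremental materializations
    dbt.config(materialized="table")
    # dbt.ref() returns a DataFrame and registers the dependency in the DAG
    df = dbt.ref("stg_orders")
    # transform...
    return df  # the returned DataFrame is written to the warehouse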

Key Commands

dbt run                    # run all models
dbt run --select +model    # model + parents
dbt run --select model+    # model + children
dbt seed                   # load seed CSVs
dbt deps                   # install packages
dbt compile                # compile to SQL
dbt docs generate          # build docs site
dbt build                  # run + test + seed + snapshot

Node Selection Syntax

model_name        # exact model
+model_name       # model + all parents
model_name+       # model + all children
+model_name+      # parents + model + children
@model_name       # model + children + all parents of those children
tag:tag_name      # models with tag
path:models/mart  # models in folder

Domain 1 Study Plan

Suggested 2-week approach — adjust based on your experience level.

Week 1, Days 1–2 · Foundation

Complete the dbt Fundamentals course on learn.getdbt.com
Build a dbt project from scratch: connect to a warehouse, create 3 models
Practice the staging → intermediate → mart pattern

Week 1, Days 3–4 · Materializations

Create one model of each type: view, table, incremental, ephemeral
Write an incremental model with is_incremental() and a unique_key
Run dbt run --full-refresh on your incremental model and understand what it does

Week 1, Days 5–7 · Sources, Configs, Packages

Declare a source in sources.yml; run dbt source freshness
Install dbt_utils; use generate_surrogate_key or star macro
Set up folder-level materialization configs in dbt_project.yml

Week 2 · Advanced Topics & Exam Prep

Write a custom macro for a repeated calculation in your project
Configure grants for a mart folder; verify the SQL that dbt generates
Review Python model documentation; understand the model(dbt, session) signature
Do 20 practice questions specifically on Domain 1 topics

Common Exam Mistakes — Domain 1

These are the traps candidates fall into most often. Study the fix, not just the mistake.

Using hardcoded table names instead of ref()

The silent DAG breaker

What Goes Wrong

Models build in the wrong order. dbt can't infer dependencies from hardcoded names — the downstream model may try to query a table that hasn't been built yet.

The Fix

Replace every FROM schema.table and JOIN schema.table that references a dbt model with {{ ref('model_name') }}.

🛡️ Habit: After writing any new model, search for hardcoded database/schema references and replace them.

Choosing incremental when table is better

Misunderstanding incremental trade-offs

What Goes Wrong

If a large % of rows update each run, an incremental model will still scan most of the table for changes, making it slower than a full rebuild.

The Fix

Use incremental only when rows are mostly appended (not updated) and the dataset is large. For heavily-updated tables, use table materialization.

🛡️ Rule of thumb: incremental wins when <10% of rows change per run. If most rows update, use table.

Confusing source() with ref()

Using ref() for raw tables

What Goes Wrong

Using ref() for a raw source table that isn't a dbt model will fail at compile time. source() exists specifically for raw/external tables.

The Fix

Raw tables → declare in sources.yml and reference with {{ source('name','table') }}. dbt models → reference with {{ ref('model') }}.

🛡️ Memory trick: source() = "raw data I don't own", ref() = "models I built in dbt".

Overusing ephemeral models

When ephemeral becomes a performance problem

What Goes Wrong

Ephemeral models are injected as CTEs — if many downstream models reference them, the same CTE is duplicated in every compiled query, causing slow compile times and large SQL.

The Fix

Use ephemeral for light intermediate logic with only 1–2 consumers. For logic used by many models, use view or table instead.

🛡️ If you see "compile is slow" in exam scenarios, look for overused ephemeral models.
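
The duplication described above can be illustrated with a small Python sketch that inlines one ephemeral model into two consumers. It is a simplification of what compilation does: the model names and SQL are hypothetical, and the CTE naming prefix is an illustrative assumption.

```python
# Hypothetical ephemeral model and its already-compiled SQL.
EPHEMERAL_NAME = "int_clean_orders"
EPHEMERAL_SQL = "select order_id, amount from stg_orders where amount > 0"

# Two hypothetical downstream models that both reference it.
CONSUMERS = {
    "fct_orders": "select * from {{ ref('int_clean_orders') }}",
    "fct_revenue": "select sum(amount) as revenue from {{ ref('int_clean_orders') }}",
}


def compile_with_ephemeral(consumer_sql, ephemeral_name, ephemeral_sql):
    """Inline an ephemeral model as a CTE in front of one consumer's query."""
    cte_name = f"__dbt__cte__{ephemeral_name}"  # assumed naming convention
    placeholder = "{{ ref('" + ephemeral_name + "') }}"
    body = consumer_sql.replace(placeholder, cte_name)
    return f"with {cte_name} as (\n{ephemeral_sql}\n)\n{body}"


compiled = {
    name: compile_with_ephemeral(sql, EPHEMERAL_NAME, EPHEMERAL_SQL)
    for name, sql in CONSUMERS.items()
}
# Count how many times the ephemeral SQL was copied across compiled queries.
copies = sum(sql.count(EPHEMERAL_SQL) for sql in compiled.values())
print(compiled["fct_orders"])
print(copies)
```

Each consumer carries its own full copy of the ephemeral SQL, so the number of copies grows with the number of consumers. A view or table would be built once and referenced by name instead.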

Forgetting dbt deps before using a package

Package macros aren't available until installed

What Goes Wrong

Adding a package to packages.yml doesn't install it. Running dbt run before dbt deps fails, because the package's macros are not yet available to the project.

The Fix

After editing packages.yml, always run dbt deps first to install the packages into the dbt_packages/ directory.

🛡️ Workflow: edit packages.yml → dbt deps → dbt run. Never skip the middle step.

Frequently Asked Questions

What's the difference between dbt build and dbt run?

dbt run executes only models. dbt build runs models, seeds, snapshots, AND tests together in DAG order — it's the recommended command for production jobs because it tests each node immediately after building it, catching issues early.
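
As a rough illustration of why testing each node before moving downstream matters, here is a toy Python simulation of that behavior. It is a sketch under stated assumptions, not dbt's scheduler: the node names are hypothetical and a failing test is modeled as a simple set membership check.

```python
def simulate_build(order, parents, failing_tests):
    """Toy dbt build: run each node, test it, skip anything downstream of a failure.

    order: node names in DAG (topological) order.
    parents: dict mapping each node to the list of nodes it depends on.
    failing_tests: set of nodes whose tests fail after they are built.
    """
    status = {}
    for node in order:
        if any(status[parent] != "pass" for parent in parents[node]):
            # An upstream node failed or was skipped, so this one is not built.
            status[node] = "skipped"
        elif node in failing_tests:
            status[node] = "fail"
        else:
            status[node] = "pass"
    return status


# Hypothetical three-model chain.
BUILD_ORDER = ["stg_orders", "int_orders", "fct_orders"]
BUILD_PARENTS = {
    "stg_orders": [],
    "int_orders": ["stg_orders"],
    "fct_orders": ["int_orders"],
}

result = simulate_build(BUILD_ORDER, BUILD_PARENTS, failing_tests={"int_orders"})
print(result)
```

With dbt run followed by a separate dbt test, fct_orders would already have been rebuilt from the bad int_orders data by the time the test failure surfaced.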

Can an ephemeral model be tested?

Yes — you can define tests on ephemeral models in their .yml file. However, since ephemeral models don't exist as objects in the warehouse, dbt runs the tests by injecting the ephemeral CTE into the test query. This works but can be slower.

What happens if you run dbt run on a model that depends on an incremental model that hasn't been built yet?

dbt orders the run from ref() calls, so when both models are selected the incremental model is built first. On its first run (or with --full-refresh), the is_incremental() macro returns false, the WHERE clause is skipped, and dbt builds the full table. Subsequent runs apply the incremental logic.

How does dbt determine the schema where a model is materialized?

By default, dbt uses the target schema from your profiles.yml. You can override this with a custom schema macro or the schema config key. dbt appends the custom schema to the target schema by default (e.g., analytics_marketing), unless you override the generate_schema_name macro.
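
The default naming rule described above can be written out as a short Python function. This is a sketch of the rule only; the real logic lives in the generate_schema_name Jinja macro, and the argument names here are assumptions.

```python
def generate_schema_name(custom_schema_name, target_schema):
    """Python rendering of the default schema naming rule."""
    if custom_schema_name is None:
        # No custom schema configured: use the target schema from profiles.yml.
        return target_schema
    # Custom schema configured: append it to the target schema.
    return f"{target_schema}_{custom_schema_name.strip()}"


print(generate_schema_name(None, "analytics"))
print(generate_schema_name("marketing", "analytics"))
```

Overriding the macro changes this rule, for example to use the custom schema name on its own in production.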

What does the @ selector do vs the + selector?

+model selects the model and all its parents (upstream). model+ selects the model and all its children (downstream). @model selects the model, all its children, and all the parents of those children (which includes the model's own parents). This is useful in CI to ensure every dependency of a changed model's downstream consumers is also built.
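
The difference between the graph operators is easier to see on a small graph. The Python sketch below treats @ as the model, its descendants, and every ancestor of any of them, following the dbt documentation's description. The DAG and function names are hypothetical.

```python
# Hypothetical DAG: each model maps to the set of models it selects from.
PARENTS = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_orders": {"stg_orders"},
    "fct_orders": {"int_orders", "stg_customers"},
}


def ancestors(node, parents=PARENTS):
    """All upstream models of node (what a leading + adds)."""
    found = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def descendants(node, parents=PARENTS):
    """All downstream models of node (what a trailing + adds)."""
    return {name for name in parents if node in ancestors(name, parents)}


def at_operator(node, parents=PARENTS):
    """The model, its descendants, and every ancestor of any of them."""
    selected = {node} | descendants(node, parents)
    for name in list(selected):
        selected |= ancestors(name, parents)
    return selected


plus_both = {"int_orders"} | ancestors("int_orders") | descendants("int_orders")
print(sorted(plus_both))                   # +int_orders+
print(sorted(at_operator("int_orders")))   # @int_orders
```

In this example +int_orders+ leaves out stg_customers, so fct_orders could not be built in an empty CI schema. @int_orders includes it because it is a parent of a child.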

When should I use a seed vs a source?

Seeds are for small, static reference data you manage in your dbt project (e.g., country codes, mapping tables). They're stored as CSV files in your repo. Sources are for raw data that lives in your warehouse and is loaded by external tools — they're declared but not loaded by dbt.

Can Python models reference other Python models?

Yes — Python models can reference both SQL models and other Python models using dbt.ref('model_name') inside the model function. The dependency is tracked in the DAG just like SQL model references.

What's the purpose of the grants configuration?

The grants config automatically runs GRANT statements after a model is materialized, giving specified roles or users SELECT access. This removes the need to manually manage permissions and ensures access is consistent across rebuilds. It's set in dbt_project.yml or inline in a model config block.
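
To picture what running GRANT statements after materialization amounts to, here is a minimal Python sketch that expands a grants mapping into statements. The exact SQL dbt emits is adapter-specific, so the statement format, the function name, and the relation name are all assumptions for illustration.

```python
def grant_statements(relation, grants):
    """Expand a grants mapping into one GRANT statement per privilege and grantee."""
    statements = []
    for privilege, grantees in grants.items():
        for grantee in grantees:
            # Generic SQL shape; real adapters may differ in syntax.
            statements.append(f"grant {privilege} on {relation} to {grantee};")
    return statements


# Mirrors a grants config of select: ['reporter', 'bi_tool'].
statements = grant_statements(
    "analytics.fct_orders",  # hypothetical relation name
    {"select": ["reporter", "bi_tool"]},
)
for statement in statements:
    print(statement)
```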
