FlashGenius
dbt Certification · Domain 1 of 6

dbt Analytics Engineering Certification — Developing dbt Models

Master materializations, DAGs, DRY principles, sources, packages, and Python models — the largest domain on the exam.

~35% Exam Weight
12 Sub-topics
dbt 1.7 Version Tested
65 Qs Total Exam

What This Domain Covers

Domain 1 is the foundation of the exam. It tests your ability to build well-structured, performant dbt projects from scratch — converting business logic into maintainable SQL using dbt's core primitives.

🧱

Materializations

Table, view, incremental, ephemeral — when and why to use each

🔗

ref() & source()

Building clean DAGs and declaring raw dependencies

♻️

DRY Principles

Macros, packages, and modular SQL for reusable code

⚙️

Configuration

dbt_project.yml, sources YAML, grants, Python models

Topics on the Exam

  • Raw dependencies
  • Materializations
  • DRY / modularity
  • Business logic → SQL
  • dbt run / test / seed
  • DAGs & model order
  • dbt_project.yml
  • Configuring sources
  • dbt Packages
  • Git in dbt workflow
  • Python models
  • Grants configuration

🎯 Exam Tip

The most common Domain 1 question type involves identifying where ref() should replace hardcoded table names. dbt infers DAG order only from ref() and source() calls, so hardcoded table names are invisible to the compiler. Replace every FROM and JOIN clause that references another dbt model with {{ ref('model_name') }}.

Key Concepts Deep Dive


🧱
Materializations
table · view · incremental · ephemeral
High Frequency
  • view: the default; stores only the query, which re-runs each time the view is queried; no storage cost; use for lightweight staging models
  • table: fully rebuilt on every run; faster to query; use for downstream-facing mart models
  • incremental: processes only new/changed rows; typically filters them inside an is_incremental() block; best for large, mostly append-only datasets; needs unique_key for upserts
  • ephemeral: not materialized at all; injected as a CTE into the models that reference it; use for intermediate logic you don't need to query directly; can slow compile times and bloat compiled SQL if overused
💡 Exam logic: incremental ≠ "always faster". If a large % of rows update each run, a full table rebuild may be more efficient. The exam tests this nuance.
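To make the incremental behavior concrete, here is a minimal Python simulation, not dbt code. It assumes rows are plain dicts with hypothetical order_id and updated_at fields, and it mirrors the usual pattern: the first run loads everything, later runs keep only rows newer than the current high-water mark and upsert them on the unique key.

```python
def run_incremental(target, source, unique_key="order_id"):
    """Simulate one run of an incremental model over lists of dict rows.

    target is the existing table, or None/empty if it has never been built.
    """
    if not target:
        # First run: is_incremental() would be false, so every row is loaded.
        return [dict(row) for row in source]

    # Later runs: keep only source rows newer than what is already loaded.
    high_water = max(row["updated_at"] for row in target)
    new_rows = [row for row in source if row["updated_at"] > high_water]

    # Upsert on unique_key: replace matching rows, append the rest.
    merged = {row[unique_key]: row for row in target}
    for row in new_rows:
        merged[row[unique_key]] = dict(row)
    return list(merged.values())


# Hypothetical data: day 2 updates order 2 and adds order 3.
source_day1 = [
    {"order_id": 1, "status": "placed", "updated_at": 1},
    {"order_id": 2, "status": "placed", "updated_at": 1},
]
table = run_incremental(None, source_day1)

source_day2 = source_day1[:1] + [
    {"order_id": 2, "status": "shipped", "updated_at": 2},
    {"order_id": 3, "status": "placed", "updated_at": 2},
]
table = run_incremental(table, source_day2)
```

After the second run the table holds three rows, and order 2 carries its updated status. If nearly every row changed between runs, the filter would pass almost the whole source and the upsert would touch almost the whole table, which is the case where a plain table rebuild can be the better choice.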
🔗
ref() and source()
DAG dependencies and raw object declarations
High Frequency
  • {{ ref('model_name') }} — references another dbt model; dbt infers run order from these; use in every FROM/JOIN that references a dbt model
  • {{ source('source_name', 'table_name') }} — references a raw table declared in a sources.yml; enables source freshness checks
  • A source maps to one database + schema combination; tables are listed under it
  • Without ref(), dbt does not know the dependency and models may build in the wrong order
💡 If models build in the wrong order → check that all inter-model references use ref(), not hardcoded table names.
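The sketch below illustrates why hardcoded names break build order. It is a toy Python model of the idea, not dbt's actual parser: it assumes a hypothetical three-model project, extracts ref() calls with a regular expression, and sorts the models so parents come first.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical project; the SQL bodies are illustrative only.
MODELS = {
    "stg_orders": "select * from {{ source('shop', 'orders') }}",
    "int_orders": "select * from {{ ref('stg_orders') }}",
    "fct_orders": (
        "select * from {{ ref('int_orders') }} "
        "join analytics.stg_customers using (customer_id)"  # hardcoded name
    ),
}

REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the model names mentioned in ref() calls."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Topologically sort models so every parent precedes its children."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
fct_parents = find_refs(MODELS["fct_orders"])
```

Here fct_parents contains only int_orders. The hardcoded analytics.stg_customers never enters the graph, so nothing guarantees that table exists when fct_orders runs.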
♻️
DRY Principles & Modularity
Macros, packages, and reusable logic
Medium Frequency
  • DRY = Don't Repeat Yourself — avoid copy-pasting SQL; extract repeated logic into macros
  • Macros — Jinja-templated functions defined in macros/; called with {{ macro_name() }}
  • Packages — installable dbt projects (e.g. dbt_utils, dbt_expectations); declared in packages.yml; installed with dbt deps
  • Staging → Intermediate → Mart — the standard layer pattern; staging cleans raw data, intermediate joins/transforms, mart serves business logic
💡 The exam often asks: "What is the most appropriate way to avoid repeating this logic?" Answer: macros for SQL logic, packages for shared test suites.
⚙️
dbt_project.yml & Configuration
Project-wide settings and model configs
Medium Frequency
  • dbt_project.yml is the project config file — defines project name, model paths, and default/folder-level configurations
  • Configs cascade: model-level config overrides folder-level, which overrides project-level
  • Grants: the grants config gives specific roles or users privileges (most commonly select) on materialized models; defined in dbt_project.yml, in a .yml property file, or inline in a model config block
  • Seeds — CSV files in the seeds/ folder; loaded with dbt seed; referenced in models with ref('seed_name')
💡 Config precedence (highest to lowest): in-model config block → .yml property file → dbt_project.yml folder path.
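The precedence rule can be pictured as layered dictionaries where the most specific layer wins. This is a simplified Python sketch with hypothetical config values. It treats every key as a plain override, whereas real dbt merges some configs (for example tags) instead of replacing them.

```python
from collections import ChainMap


def resolve_config(in_model, properties_yml, project_yml):
    """Merge config layers; earlier arguments win on conflicting keys."""
    return dict(ChainMap(in_model, properties_yml, project_yml))


# Hypothetical settings for one model at each of the three levels.
project_yml = {"materialized": "view", "schema": "marts"}
properties_yml = {"materialized": "table"}
in_model = {"materialized": "incremental", "unique_key": "order_id"}

resolved = resolve_config(in_model, properties_yml, project_yml)
```

The resolved config takes materialized from the in-model block, while schema falls through from dbt_project.yml because no more specific layer sets it.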
🐍
Python Models
When SQL isn't enough
Lower Frequency
  • Python models live in models/ with .py extension; must define a model(dbt, session) function that returns a DataFrame
  • Platform support: Snowpark (Snowflake), PySpark (Databricks, BigQuery Spark)
  • Use for ML transformations, statistical operations, or complex logic not feasible in SQL
  • Can reference dbt models with dbt.ref() and dbt.source() inside the function
  • Slower to run than SQL models — only use when SQL is genuinely insufficient
💡 Python models support only the table and incremental materializations. They cannot be views or ephemeral.
🌿
Git in the dbt Workflow
Branching, pull requests, and syncing
Medium Frequency
  • git pull = git fetch + git merge — use to sync your branch with the head/main branch
  • Feature branches for development; PRs to merge into main; never commit directly to main
  • Merge conflicts can be resolved in the dbt Cloud IDE or locally in any editor
  • dbt Cloud IDE has a built-in git panel for committing, pushing, and creating PRs
💡 The exam may ask about keeping a feature branch in sync. git pull (not just git fetch) is the correct answer — it fetches AND merges.

Domain 1 Study Checklist

Check each item as you complete it. Track your readiness before the exam.

Concept

Understand all 4 materializations and their trade-offs

Know when to use view vs table vs incremental vs ephemeral

Practice

Replace all hardcoded table refs with ref() in a sample project

Build a simple 3-model DAG: staging → intermediate → mart

Concept

Declare a source in sources.yml and use source() in a model

Understand what a source maps to (database + schema + table)

Practice

Write an incremental model with is_incremental() macro

Implement a unique_key for upsert behavior

Concept

Understand config precedence: model > .yml > dbt_project.yml

Know how folder-level configs cascade to child models

Practice

Install dbt_utils package and use a macro in a model

Edit packages.yml, run dbt deps, call {{ dbt_utils.generate_surrogate_key([...]) }} with a list of column names

Concept

Write a custom macro and call it from two different models

Practice Jinja syntax: {% macro %}, {{ }}, {% if %}

Exam Prep

Know the dbt run selection flags (--select, --exclude) and graph operators (+, @)

Understand node selection syntax: model+, +model, @model

Practice

Create and load a seed file, reference it in a model

Run dbt seed and verify it appears in the warehouse

Concept

Configure grants for a model to give read access to a role

Understand the grants config key and how it maps to GRANT SELECT

Concept

Understand Python model requirements and limitations

Table or incremental only, platform-specific, uses dbt.ref() not ref()

Exam Prep

Practice git pull vs git fetch — know the difference

git pull = fetch + merge; used to sync feature branch with main

Practice

Run dbt compile and inspect the compiled SQL

Find compiled code in target/compiled/ directory

Exam Prep

Review the official dbt project structure best practices

Read: "How we structure our dbt projects" — dbt Labs blog

Quick Reference

Key syntax patterns for Domain 1 — study these cold.

Incremental Model

    -- models/fct_orders.sql
    {{ config(
        materialized='incremental',
        unique_key='order_id'
    ) }}

    SELECT *
    FROM {{ ref('stg_orders') }}
    {% if is_incremental() %}
    WHERE updated_at > (
        SELECT MAX(updated_at) FROM {{ this }}
    )
    {% endif %}

Source Declaration (sources.yml)

    sources:
      - name: salesforce
        database: raw
        schema: salesforce
        tables:
          - name: accounts
          - name: opportunities

Grants Configuration

    # dbt_project.yml
    models:
      my_project:
        marts:
          +grants:
            select: ['reporter', 'bi_tool']

Python Model

    # models/my_python_model.py
    def model(dbt, session):
        dbt.config(materialized="table")
        df = dbt.ref("stg_orders")
        # transform...
        return df

Key Commands

    dbt run                    # run all models
    dbt run --select +model    # model + parents
    dbt run --select model+    # model + children
    dbt seed                   # load seed CSVs
    dbt deps                   # install packages
    dbt compile                # compile to SQL
    dbt docs generate          # build docs site
    dbt build                  # run models, tests, seeds, and snapshots in DAG order

Node Selection Syntax

    model_name         # exact model
    +model_name        # model + all parents
    model_name+        # model + all children
    +model_name+       # parents + model + children
    @model_name        # model + all children + all parents of those children
    tag:tag_name       # models with tag
    path:models/mart   # models in folder
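The graph operators are easier to remember once you trace them on a small DAG. The following Python sketch is a toy selector, not dbt's implementation. It assumes a hypothetical four-model graph and handles only a bare name, a leading or trailing +, and a leading @.

```python
# Hypothetical DAG: each model maps to the set of its direct parents.
PARENTS = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_orders": {"stg_orders"},
    "fct_orders": {"int_orders", "stg_customers"},
}


def ancestors(node, parents=PARENTS):
    """All upstream models of node, at any depth."""
    found = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def descendants(node, parents=PARENTS):
    """All downstream models of node, at any depth."""
    return {other for other in parents if node in ancestors(other, parents)}


def select(selector, parents=PARENTS):
    """Resolve a selector such as '+model', 'model+', or '@model'."""
    if selector.startswith("@"):
        node = selector[1:]
        selected = {node} | descendants(node, parents)
        # Add every ancestor of the model and of each of its descendants.
        for member in list(selected):
            selected |= ancestors(member, parents)
        return selected
    node = selector.strip("+")
    selected = {node}
    if selector.startswith("+"):
        selected |= ancestors(node, parents)
    if selector.endswith("+"):
        selected |= descendants(node, parents)
    return selected
```

On this graph, select("+int_orders+") returns stg_orders, int_orders, and fct_orders, while select("@int_orders") also pulls in stg_customers because fct_orders needs it to build.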

Domain 1 Study Plan

Suggested 2-week approach — adjust based on your experience level.

Week 1, Days 1–2 · Foundation

  • Complete the dbt Fundamentals course on learn.getdbt.com
  • Build a dbt project from scratch: connect to a warehouse, create 3 models
  • Practice the staging → intermediate → mart pattern

Week 1, Days 3–4 · Materializations

  • Create one model of each type: view, table, incremental, ephemeral
  • Write an incremental model with is_incremental() and a unique_key
  • Run dbt run --full-refresh on your incremental model and understand what it does

Week 1, Days 5–7 · Sources, Configs, Packages

  • Declare a source in sources.yml; run dbt source freshness
  • Install dbt_utils; use generate_surrogate_key or star macro
  • Set up folder-level materialization configs in dbt_project.yml

Week 2 · Advanced Topics & Exam Prep

  • Write a custom macro for a repeated calculation in your project
  • Configure grants for a mart folder; verify the SQL that dbt generates
  • Review Python model documentation; understand the model(dbt, session) signature
  • Do 20 practice questions specifically on Domain 1 topics

Common Exam Mistakes — Domain 1

These are the traps candidates fall into most often. Study the fix, not just the mistake.

1

Using hardcoded table names instead of ref()

The silent DAG breaker

What Goes Wrong

Models build in the wrong order. dbt can't infer dependencies from hardcoded names — the downstream model may try to query a table that hasn't been built yet.

The Fix

Replace every FROM schema.table and JOIN schema.table that references a dbt model with {{ ref('model_name') }}.

🛡️ Habit: After writing any new model, search for hardcoded database/schema references and replace them.
2

Choosing incremental when table is better

Misunderstanding incremental trade-offs

What Goes Wrong

If a large % of rows update each run, the incremental filter and merge still have to touch most of the table, which can make the run slower than a full rebuild.

The Fix

Use incremental only when rows are mostly appended (not updated) and the dataset is large. For heavily-updated tables, use table materialization.

🛡️ Rule of thumb: incremental wins when <10% of rows change per run. If most rows update, use table.
3

Confusing source() with ref()

Using ref() for raw tables

What Goes Wrong

Using ref() for a raw source table that isn't a dbt model will fail at compile time. source() exists specifically for raw/external tables.

The Fix

Raw tables → declare in sources.yml and reference with {{ source('name','table') }}. dbt models → reference with {{ ref('model') }}.

🛡️ Memory trick: source() = "raw data I don't own", ref() = "models I built in dbt".
4

Overusing ephemeral models

When ephemeral becomes a performance problem

What Goes Wrong

Ephemeral models are injected as CTEs — if many downstream models reference them, the same CTE is duplicated in every compiled query, causing slow compile times and large SQL.

The Fix

Use ephemeral for light intermediate logic with only 1–2 consumers. For logic used by many models, use view or table instead.

🛡️ If you see "compile is slow" in exam scenarios, look for overused ephemeral models.
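The duplication is easy to see in a toy compiler. This Python sketch is a heavy simplification of what dbt does. It assumes hypothetical model names, handles only a single-quoted ref() call, and does not cope with consumers that already start with their own WITH clause.

```python
def compile_model(sql: str, ephemeral: dict[str, str]) -> str:
    """Inline each referenced ephemeral model as a CTE (simplified)."""
    ctes = []
    for name, body in ephemeral.items():
        placeholder = "{{ ref('" + name + "') }}"
        if placeholder in sql:
            alias = f"__dbt__cte__{name}"
            ctes.append(f"{alias} as (\n  {body}\n)")
            sql = sql.replace(placeholder, alias)
    if not ctes:
        return sql
    return "with " + ",\n".join(ctes) + "\n" + sql


# Hypothetical ephemeral model and two consumers.
EPHEMERAL = {"int_clean_orders": "select order_id, amount from upstream_relation"}
consumer_a = "select sum(amount) from {{ ref('int_clean_orders') }}"
consumer_b = "select count(*) from {{ ref('int_clean_orders') }}"

compiled_a = compile_model(consumer_a, EPHEMERAL)
compiled_b = compile_model(consumer_b, EPHEMERAL)
```

Both compiled queries carry a full copy of the ephemeral model's SQL. With many consumers, or with ephemeral models that reference other ephemeral models, the compiled SQL grows accordingly.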
5

Forgetting dbt deps before using a package

Package macros aren't available until installed

What Goes Wrong

Adding a package to packages.yml doesn't install it. Running dbt run before dbt deps will fail at compile time because the package and its macros are not yet available.

The Fix

After editing packages.yml, always run dbt deps first to install the packages into the dbt_packages/ directory.

🛡️ Workflow: edit packages.yml → dbt deps → dbt run. Never skip the middle step.

Frequently Asked Questions

What's the difference between dbt build and dbt run?
dbt run executes only models. dbt build runs models, seeds, snapshots, AND tests together in DAG order — it's the recommended command for production jobs because it tests each node immediately after building it, catching issues early.
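The "test each node right after building it" behavior can be sketched as a short Python simulation. This is not dbt code: it assumes a hypothetical three-model chain and collapses each node's build and tests into a single pass, fail, or skip result.

```python
from graphlib import TopologicalSorter


def build(parents, failing_tests=frozenset()):
    """Build then test each node in DAG order; skip children of failures."""
    results = {}
    for node in TopologicalSorter(parents).static_order():
        if any(results[parent] != "pass" for parent in parents.get(node, ())):
            results[node] = "skip"   # an upstream node failed or was skipped
        elif node in failing_tests:
            results[node] = "fail"   # built, but one of its tests failed
        else:
            results[node] = "pass"
    return results


# Hypothetical chain: stg_orders -> int_orders -> fct_orders.
PARENTS = {
    "stg_orders": set(),
    "int_orders": {"stg_orders"},
    "fct_orders": {"int_orders"},
}

results = build(PARENTS, failing_tests={"int_orders"})
```

A failing test on int_orders leaves fct_orders skipped, so bad data never reaches the mart. With dbt run followed by a separate dbt test, the mart would already have been rebuilt before the failure surfaced.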
Can an ephemeral model be tested?
Yes — you can define tests on ephemeral models in their .yml file. However, since ephemeral models don't exist as objects in the warehouse, dbt runs the tests by injecting the ephemeral CTE into the test query. This works but can be slower.
What happens if you run dbt run on a model that depends on an incremental model that hasn't been built yet?
As long as the incremental model is part of the same run, dbt builds it first because ref() places it upstream in the DAG. On the first run of an incremental model (or after --full-refresh), dbt builds the full table: is_incremental() returns false because the target table does not exist yet or is being rebuilt, so the WHERE clause is not applied. Subsequent runs use the incremental logic.
How does dbt determine the schema where a model is materialized?
By default, dbt uses the target schema from your profiles.yml. You can override this with a custom schema macro or the schema config key. dbt appends the custom schema to the target schema by default (e.g., analytics_marketing), unless you override the generate_schema_name macro.
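The default naming rule is small enough to restate as a Python function. This mirrors the behavior described above rather than reproducing dbt's Jinja macro. The parameter names are illustrative.

```python
def generate_schema_name(target_schema, custom_schema=None):
    """Mimic dbt's default schema naming: target alone, or target_custom."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


default_schema = generate_schema_name("analytics")
marketing_schema = generate_schema_name("analytics", "marketing")
```

With a target schema of analytics, a model configured with schema marketing lands in analytics_marketing, not in a schema named just marketing. Overriding the generate_schema_name macro is how projects change that.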
What does the @ selector do vs the + selector?
+model selects the model and all its parents (upstream). model+ selects the model and all its children (downstream). @model selects the model, all its children, AND all the parents of those children (which includes the model's own parents). This is useful in CI to ensure that every dependency of a changed model's downstream consumers is also built.
When should I use a seed vs a source?
Seeds are for small, static reference data you manage in your dbt project (e.g., country codes, mapping tables). They're stored as CSV files in your repo. Sources are for raw data that lives in your warehouse and is loaded by external tools — they're declared but not loaded by dbt.
Can Python models reference other Python models?
Yes — Python models can reference both SQL models and other Python models using dbt.ref('model_name') inside the model function. The dependency is tracked in the DAG just like SQL model references.
What's the purpose of the grants configuration?
The grants config automatically runs GRANT statements after a model is materialized, giving specified roles or users SELECT access. This removes the need to manually manage permissions and ensures access is consistent across rebuilds. It's set in dbt_project.yml or inline in a model config block.
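To picture how a grants config maps to SQL, here is a rough Python sketch. The statement format is a simplification, since the exact GRANT syntax dbt emits depends on the adapter. The relation and role names are hypothetical.

```python
def grant_statements(relation, grants):
    """Render one GRANT statement per privilege with at least one grantee."""
    return [
        f"grant {privilege} on {relation} to {', '.join(grantees)};"
        for privilege, grantees in grants.items()
        if grantees
    ]


# Mirrors a config such as: +grants: {select: ['reporter', 'bi_tool']}
statements = grant_statements(
    "analytics.fct_orders",
    {"select": ["reporter", "bi_tool"]},
)
```

Because the statements are derived from config each time the model is materialized, a table that is dropped and recreated on a full rebuild gets its permissions back without manual work.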