FlashGenius
dbt Certification · Domain 5 of 6

dbt Analytics Engineering Certification — Documentation & External Dependencies

Doc blocks, YAML descriptions, exposures, source freshness, and DAG lineage — how dbt projects communicate their data contracts to the world.

Combined Weight: ~15%
Combined: Domains 6+7
Question Style: YAML-Heavy
Study Gap: Often Skipped

What This Domain Covers

This combined domain covers how dbt projects communicate: internally through docs and descriptions, and externally through exposures that show where data is consumed. It also covers source freshness, which checks that raw data isn't stale.

📝

Descriptions

Model, source, and column descriptions in YAML that power the dbt docs site

📦

Doc Blocks

Reusable description text defined in .md files and referenced via doc()

🔌

Exposures

Declare downstream consumers of your data (dashboards, ML models, apps)

⏱️

Source Freshness

Check that raw source data was loaded within an expected window

🎯 Key Exam Insight

Documentation and exposures are often under-studied. The exam tests that you know: (1) how to use {{ doc('block_name') }} for reusable descriptions, (2) what exposures add to the DAG visualization, and (3) what dbt source freshness does and how to configure warn/error thresholds for it.

Key Concepts Deep Dive

📝
Model & Column Descriptions
YAML documentation for the dbt docs site
High Frequency
  • Descriptions are added in .yml files under description: at the model, source, or column level
  • Run dbt docs generate to build the static documentation site
  • Run dbt docs serve to view it locally in a browser
  • dbt Cloud can host the docs site and refreshes it when a job is set to generate docs on run
  • Descriptions support Markdown formatting for rich text
  • Column descriptions can include business definitions, data type notes, and accepted values
💡 dbt docs generate writes catalog.json and manifest.json, the artifacts that power the docs site. manifest.json is also the artifact that state-based selection compares against.
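The bullets above mention source-level descriptions and Markdown support. A minimal sketch of both follows, assuming a hypothetical file path, a hypothetical `id` column, and illustrative description text; the `salesforce` source and `accounts` table names are borrowed from the Quick Reference.

```yaml
# models/staging/_sources.yml (hypothetical path)
version: 2

sources:
  - name: salesforce
    description: "Raw CRM data landed by the ingestion tool."   # illustrative wording
    tables:
      - name: accounts
        # A folded block scalar (>) keeps long Markdown descriptions readable.
        description: >
          One row per account. Descriptions accept **Markdown**,
          so emphasis and `inline code` render in the docs site.
        columns:
          - name: id                                   # hypothetical column
            description: "Primary key of the account record."
```

After editing descriptions, rerun dbt docs generate so the docs site picks up the new text.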
📦
Doc Blocks
Reusable documentation via .md files
Medium Frequency
  • Doc blocks are defined in .md files inside the models/ directory (any subdirectory)
  • Syntax: {% docs block_name %} ... description text ... {% enddocs %}
  • Referenced in .yml with: description: "{{ doc('block_name') }}"
  • Enables DRY documentation — write a complex column definition once, use it across many models
  • Best practice: use doc blocks for columns that appear in multiple models (e.g., customer_id)
  • Doc blocks support full Markdown including tables, lists, and code snippets
💡 Use doc blocks when the same column definition appears in 3+ models. One update to the .md file propagates everywhere.
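As a sketch of the DRY pattern described above, the same doc block can back the customer_id column in two models. This assumes a doc block named `customer_id` already exists in a .md file; the model names follow the examples used elsewhere in this guide.

```yaml
# Assumes a .md file in the project defines:
#   {% docs customer_id %} ... {% enddocs %}
version: 2

models:
  - name: fct_orders
    columns:
      - name: customer_id
        description: "{{ doc('customer_id') }}"   # rendered from the doc block

  - name: dim_customers
    columns:
      - name: customer_id
        description: "{{ doc('customer_id') }}"   # same block, same text
```

Editing the block in the .md file changes both rendered descriptions the next time the docs are generated.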
🔌
Exposures
Declaring downstream consumers in the DAG
High Frequency
  • Exposures declare downstream consumers of dbt models: dashboards, ML models, APIs, applications
  • Defined in .yml files with type, owner, and depends_on (listing the dbt models they consume)
  • Types: dashboard, notebook, analysis, ml, application
  • Exposures appear in the DAG as leaf nodes — you can see "what uses this model?"
  • Allows impact analysis: if fct_orders changes, which dashboards are affected?
  • Exposures do NOT run any transformation — they are metadata only
  • Useful for scoping CI builds: select upstream models of a specific exposure
💡 Exposures complete the lineage picture: raw source → staging → mart → dashboard. Without exposures, the DAG ends at the last dbt model.
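To illustrate a non-dashboard consumer, here is a hedged sketch of an ml exposure. The exposure name, description, and owner details are hypothetical; the referenced models and source reuse names from elsewhere in this guide.

```yaml
version: 2

exposures:
  - name: churn_model                 # hypothetical exposure name
    type: ml
    maturity: low
    description: "Churn-propensity model trained on the order and customer marts."
    owner:
      name: Data Science Team         # hypothetical owner
      email: ds@co.com                # hypothetical address
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - source('salesforce', 'accounts')   # exposures may also depend on sources
```

Nothing here is executed by dbt. The entry only adds a leaf node to the lineage graph and a page in the docs site.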
⏱️
Source Freshness
Detecting stale raw data before it enters your pipeline
High Frequency
  • Defined in a sources .yml file under a freshness: block, either at the source level (applies to all its tables) or on an individual table
  • Requires a loaded_at_field — the timestamp column that indicates when a row was loaded
  • warn_after — duration after which dbt emits a warning (e.g., 12 hours)
  • error_after — duration after which dbt fails the freshness check (e.g., 24 hours)
  • Run with dbt source freshness — checks all sources with a freshness config
  • Can be integrated into jobs to halt pipelines if source data is stale
  • If a source has no freshness block, dbt won't check its freshness
💡 Source freshness prevents the "garbage in, garbage out" problem — don't transform stale data and send wrong numbers to dashboards.
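The thresholds above can be set once for a whole source and then tightened or switched off per table. The following is a sketch under that assumption; the `opportunities` and `country_codes` tables and all threshold values are hypothetical.

```yaml
version: 2

sources:
  - name: salesforce
    loaded_at_field: _loaded_at            # inherited by every table below
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: accounts                     # uses the source-level defaults
      - name: opportunities                # hypothetical table with a tighter window
        freshness:
          warn_after: {count: 1, period: hour}
          error_after: {count: 6, period: hour}
      - name: country_codes                # hypothetical static lookup table
        freshness: null                    # opts this table out of freshness checks
```

With this config, dbt source freshness would report on accounts and opportunities and leave country_codes out.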
🗺️
DAG Lineage & Macros for Documentation
Using macros to show model lineage in docs
Medium Frequency
  • The generate_schema_name and generate_database_name macros can be overridden to customize where models are materialized
  • Descriptions render with a limited Jinja context: doc() is available, but ref() generally is not, so point readers to related models by name or with a Markdown link
  • The dbt docs DAG view shows all model-to-model relationships built from ref() calls
  • Adding exposures extends the DAG to show consumption by downstream tools
  • Tags in model configs appear in the docs site and enable filtering
💡 The docs DAG is powered entirely by ref() and source() declarations — every hardcoded table reference is a gap in your lineage graph.
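For the tags bullet above, a minimal sketch of tagging a model from its properties file is shown here. The tag names are hypothetical.

```yaml
version: 2

models:
  - name: fct_orders
    description: "One row per order"
    config:
      tags: ["finance", "daily"]   # hypothetical tags
```

Tagged models can then be selected with the tag: method, for example dbt ls --select tag:finance.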

Domain 5 Study Checklist

Concept

Understand how to add descriptions to models, sources, and columns

In .yml under description: key at each level

Practice

Run dbt docs generate and dbt docs serve on your practice project

Explore the generated docs site — lineage, descriptions, test results

Concept

Know what doc blocks are and how to create them in .md files

{% docs block_name %} ... {% enddocs %} in a .md file inside models/

Practice

Create a doc block for a column and reference it with {{ doc('name') }}

Apply the same doc block to the same column in two different models

Concept

Understand what exposures are and the 5 exposure types

dashboard, notebook, analysis, ml, application — all metadata, no SQL

Practice

Write an exposure for a "Sales Dashboard" that depends on fct_orders

Define it in a .yml file with type: dashboard, owner, and depends_on

Exam Prep

Know that exposures show in the DAG but run no SQL

They extend lineage to downstream consumers — metadata only

Concept

Configure source freshness with warn_after and error_after

Requires loaded_at_field pointing to a timestamp column in the source table

Practice

Run dbt source freshness and interpret the output

See which sources pass, warn, or error based on configured thresholds

Exam Prep

Know that hardcoded table refs create gaps in the lineage DAG

Only ref() and source() calls are tracked — hardcoded names are invisible

Exam Prep

Understand how to select upstream models of an exposure

dbt build --select +exposure:exposure_name builds all upstream dependencies
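The same upstream selection can also be written as a YAML selector, which some teams prefer for CI jobs because the selection logic is versioned with the project. This is a sketch assuming a hypothetical selector name and the sales_dashboard exposure from the Quick Reference.

```yaml
# selectors.yml, placed at the project root next to dbt_project.yml
selectors:
  - name: sales_dashboard_upstream         # hypothetical selector name
    description: "Everything that feeds the sales_dashboard exposure"
    definition:
      method: exposure
      value: sales_dashboard
      parents: true                        # plays the role of the leading + on the CLI
```

It would be invoked with dbt build --selector sales_dashboard_upstream.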

Quick Reference

Descriptions in schema.yml

    models:
      - name: fct_orders
        description: "One row per order"
        columns:
          - name: order_id
            description: "Unique order identifier"
          - name: customer_id
            # Reference a doc block:
            description: "{{ doc('customer_id') }}"

Doc Block (.md file)

    -- models/docs.md
    {% docs customer_id %}
    The unique identifier for a customer.
    Maps to `customers.id` in the CRM.
    Used across all fact tables.
    {% enddocs %}

Exposure Definition

    exposures:
      - name: sales_dashboard
        type: dashboard
        maturity: high
        owner:
          name: Analytics Team
          email: analytics@co.com
        depends_on:
          - ref('fct_orders')
          - ref('dim_customers')
        url: https://bi.company.com/sales

Source Freshness Config

    sources:
      - name: salesforce
        tables:
          - name: accounts
            loaded_at_field: _loaded_at
            freshness:
              warn_after:
                count: 12
                period: hour
              error_after:
                count: 24
                period: hour

Docs & Exposure Commands

    dbt docs generate       # build docs site
    dbt docs serve          # view locally
    dbt source freshness    # check source age

    # Select upstream of exposure:
    dbt build \
      --select +exposure:sales_dashboard

Exposure Types Reference

    # Valid exposure types:
    type: dashboard     # BI tool
    type: notebook      # Jupyter etc
    type: analysis      # ad-hoc
    type: ml            # ML model
    type: application   # app/API

    # Maturity levels:
    maturity: low | medium | high

Domain 5 Study Plan

3–4 days of targeted study covers this domain well.

Day 1 · Documentation

  • Add descriptions to every model, source, and 3+ columns in your practice project
  • Run dbt docs generate and dbt docs serve — explore the DAG and documentation
  • Create a doc block .md file and reference it from two different model columns

Day 2 · Exposures

  • Write 2 exposures: one dashboard and one ML model depending on your mart models
  • Regenerate docs and find the exposures in the DAG visualization
  • Practice: dbt build --select +exposure:my_dashboard

Days 3–4 · Source Freshness & Exam Prep

  • Add a freshness block to a source with warn_after and error_after thresholds
  • Run dbt source freshness and review the output
  • Do 10–15 practice questions specifically on documentation and exposures

Common Exam Mistakes — Domain 5

1

Thinking exposures run SQL or transform data

Exposures are metadata only

What Goes Wrong

Candidates assume exposures do something at runtime — they don't. Exposures are pure metadata that extend the lineage graph in dbt docs. No SQL runs, nothing is materialized.

The Fix

Think of exposures as "documentation nodes" in the DAG. They exist only to show what consumes your data and to enable upstream selection (dbt build --select +exposure:name).

🛡️ Exposures = metadata. They extend lineage visualization and enable upstream selection — that's their entire value.
2

Forgetting loaded_at_field for source freshness

Freshness checks require a timestamp column

What Goes Wrong

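Configuring a freshness: block without a loaded_at_field leaves dbt with no column to check for recency, so it typically cannot calculate freshness for that table. Newer dbt versions can fall back to warehouse metadata on some adapters, but that behavior is adapter-dependent.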

The Fix

Always pair freshness: with loaded_at_field: pointing to the timestamp column in the source table that indicates when the row was loaded (e.g., _loaded_at, updated_at).

🛡️ freshness requires loaded_at_field. No timestamp column = no freshness check.
3

Not using doc blocks for shared column definitions

Copy-pasting descriptions instead of reusing them

What Goes Wrong

The same column (e.g., customer_id) appears in 10 models with slightly different descriptions — inconsistent, hard to maintain, and a sign of poor documentation hygiene.

The Fix

Create a doc block for customer_id in a .md file and reference it with {{ doc('customer_id') }} in every model's .yml. One change updates all 10 descriptions.

🛡️ If a column description appears in 3+ models, it should be a doc block.
4

Confusing dbt docs generate with dbt docs serve

Two separate steps with different purposes

What Goes Wrong

Candidates run dbt docs serve without first running dbt docs generate, or assume docs generate automatically opens a browser. These are two separate commands.

The Fix

Always run dbt docs generate first (creates catalog.json + manifest.json). Then run dbt docs serve to launch a local web server for viewing the docs. In dbt Cloud, generate runs as part of a job when the job's docs option is enabled.

🛡️ generate → creates the artifacts. serve → hosts them in a browser.
5

Leaving hardcoded table refs that break lineage

The DAG only shows what ref() and source() can see

What Goes Wrong

A model that uses hardcoded schema.table references won't show those upstream dependencies in the docs DAG. The lineage graph is incomplete, making impact analysis unreliable.

The Fix

Replace all hardcoded table references with ref() and source(). Complete lineage is a key benefit of dbt — don't sacrifice it for convenience.

🛡️ Complete lineage = every table reference uses ref() or source(). No hardcoded names.

Frequently Asked Questions

What files does dbt docs generate create?
dbt docs generate writes two artifacts to the target/ directory: manifest.json (the project's complete model graph, tests, and metadata) and catalog.json (column-level metadata fetched from the warehouse). It also copies the docs site's index.html there. Both artifacts are needed for the full docs site. The manifest.json is also used for state-based selection.
Can you define multiple doc blocks in a single .md file?
Yes. A single .md file can contain multiple doc blocks, each with a unique name. This is a common pattern: create a single docs.md file at the model folder level containing all column definitions for that domain.
How do exposures help with CI/CD pipelines?
Exposures enable upstream selection: dbt build --select +exposure:dashboard_name builds only the models that feed a specific dashboard. This is useful in CI to verify that all changes to models upstream of a critical exposure are valid before merging to production.
What happens if a source table doesn't have a freshness config?
Sources without a freshness block are silently skipped by dbt source freshness. The command only checks tables that have an explicit freshness: configuration with a loaded_at_field. Sources without this config are excluded from freshness reporting.
What is the maturity field in an exposure and does it affect anything?
The maturity field (low, medium, high) is informational metadata indicating how mature/stable the exposure is. It appears in the docs site to communicate to data consumers whether they should treat this exposure as experimental or production-ready. It does not affect how dbt builds models or runs tests.
Do exposures appear in the dbt docs DAG?
Yes. Exposures appear as leaf nodes (endpoint nodes) in the DAG visualization. They show what downstream tools consume your dbt models, completing the full data lineage picture from raw source → staging → mart → dashboard/ML model/application.