Domain 4 — What the Exam Tests
Domain 4 accounts for 21% of the SnowPro Core COF-C03 exam and covers everything that makes Snowflake fast, flexible, and production-ready. You need to understand all three caching layers, how to read the Query Profile in Snowsight, when to apply clustering keys versus Search Optimization Service, and the mechanics of Snowflake's transformation pipeline toolset — Streams, Tasks, Dynamic Tables, Stored Procedures, and UDFs. The exam frequently tests the boundary between caching types, the cost model for features like Automatic Clustering and Materialized Views, and the CDC pattern built with Streams and Tasks.
Core Concepts at a Glance
Eight foundational pillars of Domain 4 performance and transformation
Three Caching Layers
Result Cache (24h, cloud services), Metadata Cache (persistent, cloud services), and Local Disk Cache (VW lifetime, SSD). Knowing which layer applies to which scenario is a top exam question.
Query Profile
Visual DAG in Snowsight showing every operator node — TableScan, Join, Aggregate, Sort. Reveals partition pruning efficiency, spilling to local SSD or remote storage, and exploding joins.
Micro-Partitions & Clustering
Snowflake auto-manages columnar micro-partitions (50–500 MB compressed). Clustering keys reorder data for query-pattern-aligned pruning on large tables. Automatic Clustering keeps them fresh.
VW Sizing Strategy
Scale UP (bigger size) for complex queries and large aggregations. Scale OUT (multi-cluster) for high concurrency. Each size step doubles compute and cost — roughly linear for memory-bound work.
Materialized Views
Pre-computed, auto-refreshed SELECT results stored on disk (Enterprise+). Faster reads at query time. Limitations: no JOINs, no non-deterministic functions, no subqueries in definition.
Streams & Tasks
Streams track DML changes (INSERT/UPDATE/DELETE) since last consumption. Tasks schedule SQL or stored procedures. Combined = micro-batch CDC pipeline. Check stream data with SYSTEM$STREAM_HAS_DATA().
Dynamic Tables
Declarative transformation: define a SELECT query and a TARGET_LAG. Snowflake auto-refreshes when staleness threshold is reached. Simpler alternative to manual Streams + Tasks orchestration.
UDFs & Stored Procedures
Scalar UDFs return one value per row; UDTFs return a table. Stored procedures support full procedural logic (loops, conditionals) via Snowflake Scripting, Python, JavaScript, Java, or Scala.
Three Caching Layers Quick Reference
| Cache Type | Scope | Duration | What's Cached |
|---|---|---|---|
| Result Cache | Cloud Services Layer | 24 hours | Exact query results; no VW needed to reuse |
| Metadata Cache | Cloud Services Layer | Persistent | Object metadata, min/max stats per micro-partition |
| Local Disk Cache (Data Cache) | Virtual Warehouse | VW running | Decompressed micro-partition data on SSD |
Exam Quick Facts
| Item | Detail |
|---|---|
| Exam | SnowPro Core COF-C03 (launched Feb 16, 2026) |
| Domain 4 Weight | 21% of 100 questions |
| Duration | 115 minutes |
| Passing Score | 750 / 1000 |
| Cost | $175 |
Deep Dive — Domain 4
Detailed coverage of all exam topics with syntax, nuance, and edge cases
Three Caching Layers
Result Cache
The Result Cache stores the exact output of a query in the Cloud Services Layer for 24 hours. Snowflake returns cached results instantly — no Virtual Warehouse credits consumed. Reuse conditions must ALL be true:
- Exact same SQL text
- Same user role (security context)
- No underlying data changes in the referenced objects
- No non-deterministic functions (e.g.,
CURRENT_TIMESTAMP(),RANDOM())
Metadata Cache
Persistent cache in the Cloud Services Layer storing micro-partition statistics — min/max values per column, row counts, and null counts. This enables Snowflake to answer certain queries with zero warehouse credits:
SELECT COUNT(*) FROM my_table— no VW requiredSELECT MIN(col), MAX(col) FROM my_table— no VW required- Partition pruning decisions are also driven by metadata cache
Local Disk Cache (Data Cache)
Each Virtual Warehouse node caches decompressed micro-partition data on its local SSD during query execution. Subsequent queries on the same VW can reuse this cached data — reducing remote storage reads.
- Cache lifetime tied to VW uptime — lost when the VW suspends
- Key reason to avoid frequent suspend/resume for performance-sensitive workloads
- Automatically evicted using LRU policy when SSD fills up
- Larger VW sizes = more SSD capacity per cluster node
Query Profile & Performance Analysis
Reading the Query Profile
Access in Snowsight under Query History. The profile is a visual DAG (directed acyclic graph) where each node is an operator:
- TableScan: shows partitions scanned vs. total — low % scanned = good pruning
- Join: reveals join type (hash, merge, nested loop) and row counts
- Aggregate: grouping and aggregation node
- Sort: sorting operator — can spill to disk if result set is large
Spilling to Disk
When an operator (Sort, Aggregate, Join) runs out of in-memory space, it spills intermediate data first to local SSD, then to remote cloud storage. This is a major performance bottleneck.
- Visible in Query Profile as "Bytes spilled to local storage" or "Bytes spilled to remote storage"
- Fix: scale UP (larger VW size = more memory per node)
- Also optimize the query: break large aggregations, avoid unnecessary sorts, use clustering keys
- Remote storage spill is far worse than local SSD spill — indicates severely under-resourced VW
Exploding Joins & Partition Pruning
Exploding join: query produces far more rows than expected. Indicates missing or incorrect join predicates — a Cartesian or near-Cartesian join. Check join node's output row count vs. input.
Partition pruning: if % of partitions scanned is high on a large table, natural clustering doesn't match your filter columns. Consider adding a clustering key.
- Use
SYSTEM$CLUSTERING_INFORMATION('table', '(col1, col2)')'to assess clustering quality before committing - Pruning efficiency is the single biggest lever for reducing query cost on large tables
Micro-Partitions & Clustering Keys
Micro-Partition Fundamentals
Snowflake stores data in columnar micro-partitions of 50–500 MB (compressed). They are auto-created, immutable, and managed entirely by Snowflake. Each micro-partition stores metadata (min/max per column) enabling pruning without reading the data.
- Natural clustering = insertion order — works well if data is appended in filter-column order (e.g., event logs by date)
- Poor natural clustering = high % of partitions scanned for selective queries
Clustering Keys
Explicitly define columns to physically cluster data around. Best suited for:
- Very large tables (multi-TB)
- Tables frequently filtered on specific high-cardinality columns
- Tables with poor natural clustering relative to query patterns
-- Define clustering key ALTER TABLE my_table CLUSTER BY (region, order_date); -- Check clustering quality SELECT SYSTEM$CLUSTERING_INFORMATION('my_table', '(region, order_date)');
- Automatic Clustering: Snowflake background service continuously re-clusters the table — uses serverless credits billed separately
- Clustering depth < 1 = well-clustered; high depth = poor clustering, queries scan too many partitions
- Use
ALTER TABLE my_table SUSPEND RECLUSTER;to pause Automatic Clustering
Virtual Warehouse Sizing Strategy
Scale UP vs. Scale OUT
| Strategy | Mechanism | Use Case |
|---|---|---|
| Scale UP | Resize VW (XS → S → M → L → XL…) | Complex queries, large aggregations, joins on many partitions — single query performance |
| Scale OUT | Multi-cluster VW (additional clusters spin up) | High concurrency — many simultaneous users/queries hitting the same VW |
- Each size step doubles compute resources (and cost) — roughly linear improvement for memory-bound queries
- Queuing indicator: check
QUEUED_PROVISIONING_TIMEandQUEUED_OVERLOAD_TIMEinQUERY_HISTORY - High queuing = Scale OUT. High spill-to-disk / slow single queries = Scale UP
Materialized Views
How They Work
A Materialized View (MV) stores the pre-computed result of a SELECT query on disk. When the base table changes, Snowflake's background service automatically refreshes the MV using serverless credits.
- Enterprise edition or above required
- Queries can use the MV transparently — Snowflake may rewrite the query automatically
- Faster reads than standard views (no recomputation at query time)
Limitations (exam-tested):
- No JOINs in the MV definition
- No non-deterministic functions (e.g.,
CURRENT_TIMESTAMP) - No subqueries in the definition
- Only one base table allowed
Best use case: expensive aggregations (SUM, COUNT, AVG) over large tables, queried frequently with the same filters.
Search Optimization Service
Point Lookup Acceleration
The Search Optimization Service (SOS) is a serverless feature that builds a persistent search access path alongside the table data. It dramatically speeds up equality and IN predicate queries on large tables.
-- Enable Search Optimization ALTER TABLE my_table ADD SEARCH OPTIMIZATION; -- Disable ALTER TABLE my_table DROP SEARCH OPTIMIZATION;
- Best for: tables with billions of rows, ad-hoc point lookups, highly selective WHERE clauses with equality or IN predicates
- NOT for: range scans, full table scans, analytical aggregation workloads — use clustering keys instead
- Incurs additional storage cost for the search access path
- Serverless credits consumed to build and maintain the access path
Streams (Change Data Capture)
How Streams Work
A Stream tracks DML changes (INSERT, UPDATE, DELETE) on a source table since the last time the stream was consumed. It uses the table's Time Travel metadata to track an offset.
-- Create a standard stream CREATE STREAM my_stream ON TABLE source_table; -- Query changes SELECT * FROM my_stream;
Metadata columns added by streams:
METADATA$ACTION: INSERT or DELETE (updates = one DELETE row + one INSERT row)METADATA$ISUPDATE: TRUE if the row is part of an UPDATE operationMETADATA$ROW_ID: unique identifier for the row
Stream Types & Staleness
- Standard stream: captures inserts, updates (as delete+insert pair), and deletes
- Append-only stream: captures inserts only — more efficient for append-only source tables (event logs, Kafka landing tables)
- Insert-only stream: for external tables; inserts only
Staleness: a stream becomes STALE if not consumed within the source table's Time Travel retention window (default 1 day, max 90 days on Enterprise). Once stale, the stream must be recreated.
- Consuming a stream requires a DML statement inside a transaction — this advances the stream offset
- Check:
SYSTEM$STREAM_HAS_DATA('my_stream')— returns TRUE if unconsumed changes exist
Tasks
Scheduling SQL with Tasks
Tasks schedule a SQL statement or a stored procedure call on a fixed interval or cron schedule.
-- VW-based task (every 5 minutes) CREATE TASK my_task WAREHOUSE = my_wh SCHEDULE = '5 MINUTE' AS INSERT INTO target_table SELECT * FROM my_stream WHERE METADATA$ACTION = 'INSERT'; -- Tasks start SUSPENDED — must resume ALTER TASK my_task RESUME;
- Default state: SUSPENDED — you must explicitly
ALTER TASK ... RESUMEafter creation - Serverless tasks: omit the WAREHOUSE clause; Snowflake manages compute via
USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE - Task trees: child tasks triggered on parent completion using
AFTER parent_taskclause - Stream + Task pattern: task checks
SYSTEM$STREAM_HAS_DATA()before processing to avoid empty runs
Dynamic Tables
Declarative Pipelines
Dynamic Tables take a declarative approach: you define the transformation query and a TARGET_LAG (maximum staleness). Snowflake automatically determines when and how to refresh — no manual stream/task orchestration needed.
CREATE DYNAMIC TABLE customer_summary TARGET_LAG = '1 minute' WAREHOUSE = transform_wh AS SELECT customer_id, SUM(amount) AS total_spend FROM orders GROUP BY customer_id;
- TARGET_LAG: max acceptable staleness — e.g.,
'1 minute','1 hour', orDOWNSTREAM(inherit from downstream dynamic tables) - Snowflake chooses full or incremental refresh automatically
- Dynamic Tables can chain: output of one feeds the definition of another
- Simpler to manage than Streams + Tasks for multi-hop transformation pipelines
Stored Procedures & UDFs
Stored Procedures
Stored procedures execute procedural logic: loops, conditionals, multiple SQL statements, and DDL. Called with CALL my_proc(args);.
- Snowflake Scripting (SQL-based): recommended approach — native SQL with procedural constructs
- JavaScript: legacy option, still tested on exam
- Python, Java, Scala: available for complex logic and library access
- Can return a scalar value or a table (TABLE return type)
- Can execute DDL (CREATE TABLE, DROP TABLE) — unlike UDFs
- Run with caller's rights or owner's rights — controls which role's privileges apply
UDFs (User-Defined Functions)
Scalar UDFs: return one value per input row. Used inline in SQL:
SELECT my_udf(column_name) FROM my_table;
UDTFs (Table Functions): return a table (multiple rows) per input. Called with TABLE():
SELECT * FROM TABLE(my_udtf(arg1, arg2));
- Languages: SQL, JavaScript, Python, Java
- Secure UDFs: hide implementation from users who lack OWNERSHIP privilege — required for data sharing scenarios where the UDF logic is proprietary
- UDFs cannot execute DML or DDL — use stored procedures for that
Memory Hooks
Six high-yield mnemonics to lock in Domain 4 concepts before exam day
The Three Cache Layers
Result cache: 24h, cloud services, exact query results, no VW needed. Metadata cache: persistent, cloud services, min/max stats, enables zero-credit COUNT(*). Local Disk cache: VW lifetime only, SSD, lost on suspend.
Spilling to Disk
Spilling to local SSD is bad. Spilling to remote storage is worse. The fix is always Scale UP (bigger VW size = more memory). Scale OUT addresses concurrency, not memory pressure.
Clustering vs. Search Optimization
Clustering keys: range scans, large tables, date/region filters. Search Optimization Service: point lookups, equality and IN predicates, ad-hoc queries on billions of rows.
Stream + Task Pipeline
Stream + Task = micro-batch CDC pipeline (manual orchestration). Dynamic Table = declarative, set TARGET_LAG and walk away. Streams go stale if not consumed within the Time Travel window.
Stream Staleness Warning
Streams become STALE if not consumed within the source table's Time Travel retention window (default 1 day, up to 90 days on Enterprise). Once stale the stream is gone — you must recreate it and lose change history.
VW Sizing Decision Rule
Scale UP (larger VW): slow individual queries, spilling to disk, large aggregations and joins. Scale OUT (multi-cluster): queuing, many concurrent users, high QUEUED_OVERLOAD_TIME in QUERY_HISTORY.
Practice Quiz
10 exam-style questions · Domain 4
ALTER TASK my_task RESUME; to activate them.TARGET_LAG = DOWNSTREAM means the Dynamic Table does not have its own independent refresh schedule — it inherits and aligns its refresh timing from the Dynamic Tables that depend on it downstream, optimizing the overall pipeline refresh cadence.Flashcards
Click any card to reveal the answer. Click again to flip back.
Metadata Cache: Cloud Services, persistent, micro-partition stats (min/max, row count) — enables zero-credit COUNT(*).
Local Disk Cache: Virtual Warehouse, VW lifetime, decompressed micro-partition data on SSD — lost on suspend.
Fix: Scale UP (use a larger VW size to give operators more memory). Also consider optimizing the query or adding clustering keys to reduce data scanned.
• Very large tables (multi-TB)
• Tables with high % partitions scanned (poor pruning)
• Frequently filtered on specific columns (date, region, status)
• Range scan queries
Not for: point lookup / equality queries — use Search Optimization Service instead.
METADATA$ISUPDATE: TRUE if the row is part of an UPDATE DML operation
METADATA$ROW_ID: Unique identifier for the source row across stream records
Activate with:
ALTER TASK my_task RESUME;Common pattern: create the task, verify the definition, then RESUME. To stop:
ALTER TASK my_task SUSPEND;Examples:
'1 minute', '1 hour'DOWNSTREAM: inherit refresh schedule from dependent downstream Dynamic Tables — used to optimize multi-hop pipeline refresh cadence.• Equality and IN predicate queries
• Ad-hoc point lookups on billions of rows
• Highly selective WHERE clauses
Clustering Keys:
• Range scans (date ranges, region ranges)
• Analytical aggregation on large tables
• Regularly filtered columns with low-to-medium cardinality
• No JOINs in the definition
• No non-deterministic functions (CURRENT_TIMESTAMP, RANDOM)
• No subqueries in the definition
• Only one base table allowed
• Requires Enterprise edition or above
• Auto-refresh uses serverless credits
Study Advisor
Select a topic area to get targeted study guidance for Domain 4
Mastering Snowflake's Three Cache Layers
- Memorize the scope of each layer: Result and Metadata caches live in Cloud Services (no VW needed); Local Disk Cache lives in the VW
- Result cache is invalidated if the underlying data changes OR if a non-deterministic function is used — this is an exam favorite
- Metadata cache enables zero-credit COUNT(*), MIN, MAX — remember this for "which queries use no credits?" questions
- Local Disk Cache is lost on VW suspend — the exam may ask why frequent suspend/resume hurts performance
- Practice distinguishing result cache reuse conditions: same SQL, same role, no data change, no non-deterministic functions
- Know that the Local Disk Cache uses an LRU (Least Recently Used) eviction policy
- Larger VW size = more SSD per node = larger Local Disk Cache capacity
Reading the Query Profile Like an Expert
- Open the Query Profile in Snowsight — navigate to Query History, click a query ID, then the Profile tab
- Each node in the DAG is an operator — the numbers show rows processed and bytes transferred between nodes
- TableScan node: check "Partitions scanned" vs "Partitions total" — high % = poor pruning = consider clustering key
- Spill to local storage = VW is under-sized for memory; spill to remote storage = severely under-sized
- Exploding join: the Join node shows output rows far exceeding input rows — check join predicates for missing conditions
- Sort is expensive — always check if you can eliminate unnecessary ORDER BY or push it later in the pipeline
- Practice the fix decision tree: spilling → Scale UP; queuing → Scale OUT; poor pruning → clustering key or SOS
Clustering Keys, Automatic Clustering & Search Optimization
- Clustering key use case: large table, range scan queries, poor natural clustering — not for point lookups
- Search Optimization Service use case: point lookups (equality, IN) on large tables, ad-hoc queries — not for range scans
- Use
SYSTEM$CLUSTERING_INFORMATION('table', '(col)')before adding a clustering key — check average depth - Average depth < 1: well-clustered. High depth means data is scattered across many partitions for that key column
- Automatic Clustering: Snowflake continuously re-clusters in the background — uses serverless credits billed separately from VW credits
- You can suspend Automatic Clustering:
ALTER TABLE my_table SUSPEND RECLUSTER; - Materialized Views can also improve performance — but they have strict limitations (no JOINs, Enterprise only)
- Choose 1–3 clustering columns max; high-cardinality + query-common columns work best
Building CDC Pipelines with Streams, Tasks & Dynamic Tables
- Stream offset advances only when a DML statement consuming the stream commits inside a transaction
- Staleness rule: stream must be consumed within the source table's Time Travel window — default 1 day, Enterprise up to 90 days
- Append-only stream is more efficient than standard stream for insert-only sources (event logs, Kafka)
- Tasks are always created SUSPENDED — always remember to
ALTER TASK my_task RESUME; - Task tree pattern: parent task runs first, child tasks execute
AFTER parent_task - Common exam question: Stream + Task micro-batch pattern vs. Dynamic Table declarative pattern — know when each is preferred
- Dynamic Tables simplify multi-hop pipelines — define SELECT query and TARGET_LAG, let Snowflake manage refresh
- DOWNSTREAM target lag: useful when chaining Dynamic Tables to align refresh schedules automatically
- Serverless tasks: omit WAREHOUSE clause, set USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE instead
UDFs, UDTFs, and Stored Procedures
- Scalar UDF: returns one value per input row — used inline in SELECT, WHERE, GROUP BY
- UDTF (Table Function): returns a table (zero or more rows) per input — called with TABLE(my_udtf(args))
- Stored procedures: execute multiple SQL statements, loops, conditionals — called with CALL my_proc()
- Stored procedures can execute DDL (CREATE TABLE, DROP) — UDFs cannot
- Recommended SP language: Snowflake Scripting (SQL-based) — easier for SQL-fluent developers
- JavaScript stored procedures are legacy — still on exam but Snowflake Scripting is preferred
- Secure UDFs: hide implementation from users who lack OWNERSHIP — required for data sharing with proprietary logic
- Caller's rights vs. Owner's rights stored procedures: controls which role's privileges apply at execution time
- Know that Python UDFs support third-party libraries via Snowpark — referenced in IMPORTS clause
Study Resources
Official and community resources for SnowPro Core COF-C03 Domain 4
SnowPro Core Certification Page
Official Snowflake certification hub — exam guide, study resources, and registration.
learn.snowflake.com →Snowflake Query Profile Docs
Official guide to reading the Query Profile DAG — operator nodes, spill indicators, and performance troubleshooting.
docs.snowflake.com →Caching in Snowflake
Official documentation on Result Cache, Metadata Cache, and Local Disk Cache behavior and configuration.
docs.snowflake.com →Micro-Partitions & Clustering
Deep dive into micro-partition structure, clustering keys, Automatic Clustering, and SYSTEM$CLUSTERING_INFORMATION.
docs.snowflake.com →Streams & Tasks
Complete reference for stream types (standard, append-only), metadata columns, task scheduling, and serverless tasks.
docs.snowflake.com →Dynamic Tables
Official guide to Dynamic Table syntax, TARGET_LAG settings, DOWNSTREAM refresh, and pipeline patterns.
docs.snowflake.com →Stored Procedures & UDFs
Reference for Snowflake Scripting, JavaScript, Python procedures; scalar UDFs and UDTFs; secure UDFs.
docs.snowflake.com →FlashGenius Practice Tests
Full-length SnowPro Core COF-C03 practice exams with detailed explanations for every Domain 4 topic.
flashgenius.net →