FlashGenius Logo FlashGenius
Login Sign Up

SNOWPRO-CORE-C03 Practice Questions: Performance Optimization, Querying, and Transformation Domain

Test your SNOWPRO-CORE-C03 knowledge with 10 practice questions from the Performance Optimization, Querying, and Transformation domain. Includes detailed explanations and answers.

SNOWPRO-CORE-C03 Practice Questions

Master the Performance Optimization, Querying, and Transformation Domain

Test your knowledge in the Performance Optimization, Querying, and Transformation domain with these 10 practice questions. Each question is designed to help you prepare for the SNOWPRO-CORE-C03 certification exam with detailed explanations to reinforce your learning.

Question 1

An EVENT_LOG table stores 20 TB of application events in a VARIANT column DATA. Support engineers often run queries like: SELECT * FROM EVENT_LOG WHERE DATA:device_id = 'abc123' AND DATA:event_type = 'error'; Each query should return only tens of rows, but query profiles show that around 90% of micro-partitions are scanned, and runtimes are high even on a LARGE warehouse. Overall concurrency is low. The table is not clustered, and the team wants to accelerate these highly selective JSON-based lookups without major data model changes. What is the most appropriate optimization?

A) Enable search optimization service on the DATA:device_id and DATA:event_type fields.

B) Define a clustering key on the DATA:device_id field of EVENT_LOG.

C) Convert the VARIANT column into multiple relational columns and rebuild the table.

D) Increase the warehouse size to XLARGE to reduce scan time for each query.

Show Answer & Explanation

Correct Answer: A

Explanation:

Correct answer (A): The workload consists of highly selective filters on semi-structured JSON fields with poor pruning. Search optimization is designed for exactly this scenario: it builds data structures that support fast point lookups and very selective filters on high-cardinality or semi-structured columns, reducing the need to scan most micro-partitions.

Why the other options are wrong:
- Option B: Clustering may help pruning in some cases, but for highly selective queries on semi-structured fields, search optimization is more effective and explicitly built for this use case. Clustering also introduces ongoing reclustering overhead.
- Option C: Restructuring the table into relational columns is a major data model change, which the team wants to avoid. While it might help long term, Snowflake provides a more direct feature (search optimization) that addresses the current need.
- Option D: Scaling up the warehouse makes each scan faster but does not solve the fundamental issue that 90% of micro-partitions are scanned for very selective filters. It increases cost without fully addressing the performance problem.

Question 2

A data team maintains a curated CUSTOMER_METRICS table produced from several staging tables using a multi-step SQL transformation pipeline orchestrated by external tools. The pipeline joins, filters, and aggregates data and must keep CUSTOMER_METRICS reasonably up to date throughout the day. The team wants to simplify orchestration and let Snowflake maintain the transformed result over time, while still accepting that compute resources will be used for refreshes. Which Snowflake-native feature best fits this requirement?

A) Replace the pipeline with a single Dynamic Table that defines CUSTOMER_METRICS as a SQL query

B) Create a stream on CUSTOMER_METRICS and use a task to periodically rebuild it from scratch

C) Define a materialized view that directly stores CUSTOMER_METRICS based on the staging tables

D) Enable multi-cluster on the warehouse used by the existing external pipeline

Show Answer & Explanation

Correct Answer: A

Explanation:

Correct answer (A): Dynamic Tables let you declaratively define the transformation in SQL while Snowflake maintains the result over time and refreshes it according to defined policies. This reduces the need for complex external orchestration while still using compute resources behind the scenes for refresh operations, aligning well with the requirement to simplify orchestration of a curated CUSTOMER_METRICS table.

Why the other options are wrong:
- Option B: A stream on CUSTOMER_METRICS would track changes to the target, not simplify the upstream transformation logic. Using a task to periodically rebuild from scratch still leaves significant orchestration responsibilities with the team instead of letting Snowflake maintain the transformation declaratively.
- Option C: Materialized views are valuable for specific, repeated query patterns but are not described here as the main mechanism for replacing a multi-step transformation pipeline. They also incur maintenance and are better suited for single-query-like definitions rather than orchestrating an entire pipeline.
- Option D: Multi-cluster warehouses help with concurrency but do not simplify orchestration or maintenance of the transformation pipeline itself. The problem is not concurrency or warehouse capacity; it is the complexity of managing refresh logic over time.

Question 3

An analyst runs a complex reporting query against a SALES table and saves the SQL. A few minutes later, another user updates several rows in SALES. The analyst then re-runs the exact same saved query on the same warehouse and notices that it now takes much longer than before. What explains why the second execution did not benefit from the previous fast run?

A) The DML update invalidated the result cache for the SALES table, forcing the query to re-execute.

B) The metadata cache was cleared when the warehouse auto-suspended between the two runs.

C) The warehouse cache is only shared between different warehouses, not within the same warehouse.

D) The query text changed because the optimizer rewrote it, so the result cache could not be used.

Show Answer & Explanation

Correct Answer: A

Explanation:

Correct answer (A): Result cache entries are invalidated when DML operations modify the underlying data referenced by the cached query. Because rows in SALES were updated between runs, the cached result is no longer valid, and the query must be fully executed again, explaining the longer runtime.

Why the other options are wrong:
- Option B: Incorrect. The stem does not indicate that the warehouse suspended, and even if it did, result cache behavior depends on data changes, not strictly on warehouse suspend/resume. The key factor here is the update to SALES.
- Option C: Incorrect. Warehouse cache is local to a warehouse, not shared across warehouses, but that does not explain why the result cache was not used in this scenario.
- Option D: Incorrect. Optimizer rewrites do not typically cause the result cache to be bypassed when the query is logically identical. The primary reason here is the DML change to the underlying data.

Question 4

A data analyst notices the following behavior in Snowflake: - They run a long-running aggregation query against a sales table using role ANALYST and warehouse WH_BI. - A few minutes later, a colleague with role ANALYST_ADMIN runs a query that is textually identical (same SQL, same warehouse) and the sales table has not changed. - The colleague's query still takes the full execution time; it does not return instantly from cache. Why does the colleague's query not benefit from the result cache in this case?

A) Result cache is only reused within the same user session, regardless of role or warehouse

B) Result cache is invalidated automatically after the first reuse to avoid stale results

C) Result cache reuse requires the same role and session settings in addition to the same SQL and unchanged data

D) Result cache is disabled when using warehouses that have auto-suspend enabled

Show Answer & Explanation

Correct Answer: C

Explanation:

Correct answer (C): Result cache entries are reused only when the full query context matches: same SQL text, same warehouse, same role, and relevant session parameters, and the underlying data has not changed. Because the second user runs the query under a different role (ANALYST_ADMIN), the context differs, so Snowflake does not reuse the initial result cache entry and instead re-executes the query.

Why the other options are wrong:
- Option A: Result cache is not restricted to a single session; it can be reused across sessions and users as long as the SQL text, role, warehouse, and relevant session parameters match and underlying data is unchanged.
- Option B: Result cache entries can be reused multiple times for up to 24 hours as long as the context and data validity conditions are met. They are not invalidated after a single reuse.
- Option D: Auto-suspend affects when a warehouse is billed and when it starts or stops, but it has no direct impact on result cache behavior.

Question 5

An analytics team notices that the first large query of the morning against the SALES table is slower than similar queries executed later in the day. An engineer suggests keeping the reporting warehouse running overnight to "keep the cache warm" and avoid the slow first query. Finance wants to understand whether this is worth the extra cost. Which explanation about Snowflake's caching behavior is most accurate for this decision?

A) Metadata cache for micro-partitions is stored in the cloud services layer and is not tied to any specific warehouse, so keeping the warehouse running mainly preserves only its local data cache.

B) Result cache is cleared whenever a warehouse suspends, so keeping the warehouse running is essential to reuse query results.

C) All caches, including result, metadata, and warehouse caches, are reset when a warehouse is resized, so keeping it running at a constant size prevents any cache loss.

D) Warehouse cache stores both data pages and result sets, so it fully replaces the need for the result cache as long as the warehouse stays running.

Show Answer & Explanation

Correct Answer: A

Explanation:

Correct answer (A): Metadata cache, which supports pruning decisions, lives in the cloud services layer and does not depend on a specific warehouse. Keeping the warehouse running preserves only its local data cache. The team should weigh the benefit of avoiding a one-time warm-up of local data pages against the extra compute cost of not suspending.

Why the other options are wrong:
- Option B: Result cache is not tied to a specific warehouse and is not cleared simply because a warehouse suspends. This statement overstates the benefit of keeping the warehouse running.
- Option C: Metadata cache and result cache are not reset on warehouse resize; only the warehouse local cache (data pages) is lost. This answer incorrectly conflates different cache types.
- Option D: Warehouse cache holds data pages, not full query results, and does not replace the result cache. It helps repeated scans on the same warehouse but is not a substitute for the result cache behavior.

Question 6

A BI team runs hundreds of short dashboard queries every morning against the same set of core reporting tables. Users are complaining that queries sometimes sit in a queued state for several seconds during the busiest hour, even though individual query runtimes are acceptable once they start. The current warehouse is a single cluster X-LARGE. The architect wants to reduce queue times without significantly changing query code. What is the most appropriate Snowflake-native change to address this issue?

A) Increase the warehouse size from X-LARGE to 2X-LARGE while keeping a single cluster

B) Convert the warehouse to a multi-cluster X-LARGE warehouse with multiple clusters

C) Disable auto-suspend so the warehouse cache is always warm during peak periods

D) Add clustering keys to the core reporting tables to improve query speed

Show Answer & Explanation

Correct Answer: B

Explanation:

Correct answer (B): The symptom is query queueing during peak concurrent usage, not slow execution once queries run. Multi-cluster warehouses are specifically designed to address concurrency by adding more clusters that can run more queries in parallel. Keeping the same size (X-LARGE) but adding clusters improves throughput for many simultaneous dashboard queries without necessarily changing per-query latency.

Why the other options are wrong:
- Option A: Scaling up to a 2X-LARGE warehouse increases resources available to each query but does not directly increase the number of concurrent queries that can run without queueing on a single-cluster warehouse. This mainly helps individual heavy queries, not the concurrency bottleneck described.
- Option C: Auto-suspend affects cost and warm-up time, not how many queries can run concurrently once the warehouse is active. Disabling auto-suspend might reduce cold-start delays but does not address queueing under high concurrency.
- Option D: Clustering keys can help reduce bytes scanned for certain predicates, improving execution efficiency. However, the main problem here is queueing due to concurrency, not excessive scan time. Clustering would not prevent queries from being queued.

Question 7

A customer master table contains over 2 billion rows. An operations team frequently runs highly selective queries that look up individual customers by email address and return a single row. These queries often take several seconds because many micro-partitions are scanned even though only one row is returned. The table schema and workload pattern are stable, and these point lookups are mission critical. Which Snowflake feature is most appropriate to significantly reduce the response time for these point-lookup queries?

A) Create a clustering key on the email column

B) Enable search optimization on the email column

C) Create a materialized view that selects DISTINCT email and primary keys

D) Increase the size of the warehouse used for these queries

Show Answer & Explanation

Correct Answer: B

Explanation:

Correct answer (B): Search optimization is specifically designed to speed up highly selective lookups on large tables by creating additional data structures that allow Snowflake to locate matching rows with minimal scanning. Enabling search optimization on the email column is well-suited to stable, mission-critical point-lookups on a massive customer table.

Why the other options are wrong:
- Option A: A clustering key can improve pruning for certain range filters or ordered access patterns, but for random point-lookups on email in a very large table it may still require scanning many partitions and adds ongoing maintenance overhead.
- Option C: A materialized view of DISTINCT email and keys would still require lookups and joins to retrieve full row details and does not inherently provide the efficient index-like access that search optimization offers for these predicates.
- Option D: Scaling up the warehouse can reduce execution time slightly but does not solve the underlying issue of scanning many micro-partitions to find a single row, and it increases compute costs without the targeted benefit of search optimization.

Question 8

A VARIANT column in a Snowflake table stores JSON documents with a top-level field customer_id and a nested array of items. An analyst needs to filter rows where customer_id = 123 and compute a simple aggregate (COUNT(*)) over those rows. They do not need to explode the items array. What is the most performance-conscious way to access the JSON field in this query?

A) Use FLATTEN on the items array in a lateral join and then filter on VALUE:customer_id.

B) Use VARIANT path notation directly in the WHERE clause, such as data:customer_id = 123.

C) First unnest the items array into a separate table using FLATTEN, then join back to the original table to filter by customer_id.

D) Convert the VARIANT column to VARCHAR using TO_JSON and then use string functions to find customer_id = 123.

Show Answer & Explanation

Correct Answer: B

Explanation:

Correct answer (B): When only a top-level field is needed and there is no requirement to explode arrays, using VARIANT path notation (e.g., data:customer_id) in the filter is more efficient than FLATTEN. It avoids generating additional rows and operators, keeping the query simpler and more performant.

Why the other options are wrong:
- Option A: FLATTEN is useful when you need to explode arrays into separate rows for joins or aggregations on array elements. Here, only the top-level customer_id field is needed, so FLATTEN introduces unnecessary overhead.
- Option C: Creating a separate table via FLATTEN and then joining back adds unnecessary complexity and compute cost for a simple filter on a top-level field.
- Option D: Converting VARIANT to VARCHAR and using string operations is generally slower, less robust, and bypasses Snowflake’s native semi-structured data optimizations.

Question 9

A finance team uses a BI dashboard that repeatedly runs the same complex aggregation over a large transactions table to compute month-to-date and year-to-date metrics. The query is expensive and runs many times per hour by multiple users. The source data is updated throughout the day, and the dashboard can tolerate data being a few minutes behind the base table. Which Snowflake feature is the most appropriate to improve dashboard query performance in this scenario?

A) Create a standard view that defines the aggregation logic

B) Create a materialized view that precomputes the aggregation

C) Enable search optimization on the transactions table

D) Convert the aggregation into a Dynamic Table with a 1-hour freshness goal

Show Answer & Explanation

Correct Answer: B

Explanation:

Correct answer (B): Materialized views store precomputed results of an expensive query and are automatically maintained as the base table changes. For a complex aggregation that is run frequently by many users and can tolerate slight data latency, a materialized view is the most appropriate feature to significantly accelerate dashboard performance.

Why the other options are wrong:
- Option A: Standard views are just saved SQL definitions; they do not precompute or store results. Every query against the view still runs the full aggregation and will not materially improve performance.
- Option C: Search optimization targets highly selective lookups and certain semi-structured patterns. It does not optimize large aggregations over many rows, so it is not suitable here.
- Option D: Dynamic Tables can define incremental transformations with freshness goals, but the option specifies a 1-hour freshness target, which does not meet the 'few minutes' tolerance. For a single repeated aggregation, a materialized view is a more direct and standard pattern for accelerating BI queries.

Question 10

A CUSTOMER table contains 100 million rows. A support application frequently executes queries like: SELECT * FROM CUSTOMER WHERE EMAIL = :email_param; Each query returns at most one row but currently scans millions of rows, leading to noticeable latency. Concurrency is moderate, and the rest of the workload on this table is unchanged. The team wants to accelerate these point-lookups without changing the data model. What is the best Snowflake feature to apply?

A) Enable search optimization on the EMAIL column.

B) Create a materialized view that selects EMAIL and the most-used customer attributes.

C) Define a clustering key on the EMAIL column.

D) Resize the warehouse to a larger size to reduce scan time.

Show Answer & Explanation

Correct Answer: A

Explanation:

Correct answer (A): Search optimization is designed to speed up highly selective point lookups and filters, especially on high-cardinality columns. Enabling it on the EMAIL column creates additional data structures that allow Snowflake to return a tiny subset of rows quickly without scanning large portions of the table.

Why the other options are wrong:
- Option B: A materialized view containing the same columns does not inherently make lookups selective or avoid scanning many rows; it is more beneficial for repeated aggregations or complex queries than simple point lookups.
- Option C: Clustering can help pruning, but EMAIL-based lookups are highly selective on a high-cardinality column. Search optimization is better suited for this pattern and avoids the ongoing reclustering overhead.
- Option D: Scaling up the warehouse may reduce latency temporarily but does not address the underlying inefficiency of scanning many unnecessary rows for point lookups. It also increases cost without leveraging Snowflake's selective access features.

Ready to Accelerate Your SNOWPRO-CORE-C03 Preparation?

Join thousands of professionals who are advancing their careers through expert certification preparation with FlashGenius.

  • ✅ Unlimited practice questions across all SNOWPRO-CORE-C03 domains
  • ✅ Full-length exam simulations with real-time scoring
  • ✅ AI-powered performance tracking and weak area identification
  • ✅ Personalized study plans with adaptive learning
  • ✅ Mobile-friendly platform for studying anywhere, anytime
  • ✅ Expert explanations and study resources
Start Free Practice Now

Already have an account? Sign in here

About SNOWPRO-CORE-C03 Certification

The SNOWPRO-CORE-C03 certification validates your expertise in performance optimization, querying, and transformation and other critical domains. Our comprehensive practice questions are carefully crafted to mirror the actual exam experience and help you identify knowledge gaps before test day.