SnowPro Core COF-C03 · Domain 3: Data Loading, Unloading & Connectivity · 18% of Exam
Stages · File Formats · COPY INTO · Snowpipe · Unloading · JDBC/ODBC · Kafka · Spark Connector
Start Free Practice →18% of the COF-C03 exam — approximately 18 questions. Covers stages, file formats, bulk loading, continuous ingestion, unloading, and client connectivity.
@~, @%table, @stage)ON_ERROR options and defaultsFORCE paramVALIDATION_MODE purpose (validate without loading)STRIP_OUTER_ARRAY| Topic Area | Importance | Key Commands |
|---|---|---|
| Stages (internal & external) | High | PUT, GET, LIST, REMOVE, CREATE STAGE |
| File Formats | Medium | CREATE FILE FORMAT, inline in COPY |
| COPY INTO (bulk load) | High | COPY INTO table FROM @stage |
| Snowpipe (continuous) | High | CREATE PIPE, auto-ingest, REST API |
| Data Unloading | Medium | COPY INTO @stage FROM table |
| Connectivity (JDBC/ODBC/Connectors) | Medium | SnowSQL, Kafka, Spark, Python, Snowpark |
| Stage Type | Syntax | Shareable? | Best For |
|---|---|---|---|
| User Stage | @~ | No | Personal / one-off loads |
| Table Stage | @%tablename | No | Single-table loading |
| Named Internal Stage | @stage_name | Yes | Reusable, shared pipelines |
| External Stage (S3) | @ext_stage (s3://…) | Yes | Cloud storage integration |
| Attribute | COPY INTO | Snowpipe |
|---|---|---|
| Compute | Requires virtual warehouse | Serverless (Snowflake-managed) |
| Trigger | Manual SQL execution | Event notification or REST API |
| Latency | Batch (scheduled) | Near-real-time (seconds–minutes) |
| Use case | Scheduled ETL, large batches | Continuous / streaming ingest |
| Billing | VW credit consumption | Per-credit (serverless) |
Detailed explanations of every topic tested in Domain 3.
Internal stages store data inside Snowflake. Three types exist:
@~) — per-user, auto-created, not shareable. Use for personal or one-off loading.@%tablename) — per-table, auto-created, not shareable. Best for loading a single table.@stage_name) — explicitly created, shareable, reusable. Most flexible option.External stages reference cloud object storage. Snowflake reads/writes data there but does not own the storage.
URL='s3://bucket/path/' with AWS credentials or storage integrationazure:// URLgcs:// URLCREATE STAGE my_s3_stage
URL = 's3://mybucket/data/'
STORAGE_INTEGRATION = my_s3_int
FILE_FORMAT = (FORMAT_NAME = my_csv);
Authentication options: Storage Integration (preferred — no credentials in stage) or credentials directly (CREDENTIALS=(AWS_KEY_ID=... AWS_SECRET_KEY=...)).
PUT file://localpath @stage — upload local file to internal stage (SnowSQL/client only)GET @stage file://localpath — download from internal stage to local (SnowSQL only)LIST @stage — list files in stageREMOVE @stage/file — delete file from stageImportant: PUT and GET cannot be executed from the Snowsight web UI — only via SnowSQL or compatible clients.
Supported formats: CSV (most common), JSON, Avro, ORC, Parquet, XML
CREATE FILE FORMAT my_csv
TYPE = CSV
FIELD_DELIMITER = ','
SKIP_HEADER = 1
NULL_IF = ('NULL', 'null')
EMPTY_FIELD_AS_NULL = TRUE
COMPRESSION = AUTO;
STRIP_OUTER_ARRAY = TRUE — for JSON arrays: loads each array element as a separate rowCOMPRESSION = AUTO — Snowflake detects compression (gzip, bz2, etc.)COPY INTO my_table
FROM @my_stage/data/
FILE_FORMAT = (FORMAT_NAME = 'my_csv')
ON_ERROR = CONTINUE
PURGE = TRUE
FORCE = FALSE;
ABORT_STATEMENT (default) — rolls back the entire loadCONTINUE — skip bad rows, load the restSKIP_FILE — skip entire file containing errorsSKIP_FILE_<n> — skip file if it has more than n errorsFORCE = TRUE to reload.Snowflake tracks loaded files per table for 64 days. Files already successfully loaded are skipped by default. FORCE = TRUE overrides this. Requires a running virtual warehouse to execute.
CREATE PIPE my_pipe
AUTO_INGEST = TRUE
AS
COPY INTO my_table FROM @my_stage;
insertFiles endpointSYSTEM$PIPE_STATUS('my_pipe') — check pipe statusINFORMATION_SCHEMA.COPY_HISTORY — review load and error historyCOPY INTO @my_stage/output/
FROM my_table
FILE_FORMAT = (TYPE = CSV)
HEADER = TRUE
MAX_FILE_SIZE = 16777216
SINGLE = FALSE
OVERWRITE = TRUE;
data_0_0_0.csv.gzCommand-line client for executing SQL, PUT/GET, and scripting. The only way to run PUT/GET commands interactively.
Standard database connectivity for BI tools (Tableau, Power BI), ETL tools, and custom applications.
snowflake-connector-python — supports standard query execution, parameter binding, and async queries.
DataFrame API for Python, Java, and Scala. Runs code inside Snowflake (pushdown to engine). Used for ETL and transformations without moving data out of Snowflake.
Snowflake Connector for Apache Kafka — streams data from Kafka topics into Snowflake tables via Snowpipe. Ideal for real-time/streaming data pipelines.
Snowflake Connector for Apache Spark — enables reading from and writing to Snowflake from Spark workloads.
| Tool | Category | Connection Method |
|---|---|---|
| dbt | Transformation | JDBC / Python connector |
| Fivetran / Airbyte | ELT / Ingestion | Native Snowflake connector |
| Informatica / Talend | ETL | JDBC / ODBC |
| Matillion | ELT | Native Snowflake connector |
| Snowsight (Web UI) | Browser UI | Native — cannot run PUT/GET |
PURGE = TRUE or manually REMOVE @stage after loadingFLATTENSix memory anchors — study these until they're automatic.
Stage Syntax: @~ = user, @%table = table stage, @name = named stage (most flexible)
Load Method: Snowpipe = serverless + event-driven. COPY INTO = batch + needs a VW.
Load History: Load history = 64 days. FORCE=TRUE overrides it. Default FORCE=FALSE skips already-loaded files.
File Size: Optimal file size: 100–250 MB compressed. Too small = many files overhead. Too big = slow parallel.
PUT/GET: PUT needs SnowSQL (CLI). Web UI cannot PUT. GET also needs SnowSQL.
ON_ERROR: ABORT (rollback all), CONTINUE (skip rows), SKIP_FILE (skip whole file).
10 exam-style questions covering Domain 3. Select your answer, then click Submit Quiz to see results.
1. What is the correct syntax to reference a user stage in Snowflake?
2. Which compute model does Snowpipe use for data loading?
3. A COPY INTO command encounters 3 bad rows in a file. You want the load to continue and skip only those rows while loading the rest. Which ON_ERROR setting should you use?
4. What is the recommended compressed file size range for optimal parallel loading into Snowflake?
5. A data engineer wants to upload a local CSV file to a named internal stage. Which client/tool is required for this operation?
6. Which of the following are valid Snowpipe trigger mechanisms? (Choose the option that lists BOTH correct triggers)
7. How many days does Snowflake track load history per table to prevent duplicate file loading?
8. A COPY INTO command is not reloading files that were already loaded successfully. Which parameter forces Snowflake to reload them regardless of load history?
9. Which authentication method is preferred when creating an external stage pointing to S3, to avoid storing credentials in the stage definition?
10. What is the purpose of VALIDATION_MODE = RETURN_ERRORS in a COPY INTO statement?
Click any card to flip it. Master all 8 cards before exam day.
What are the 3 internal stage types in Snowflake and their syntax?
User Stage: @~ — per-user, auto-created, not shareable. One-off use.
Table Stage: @%tablename — per-table, auto-created, not shareable. Single-table loading.
Named Internal: @stage_name — explicitly created, shareable, most flexible.
How does Snowpipe differ from COPY INTO for data loading?
COPY INTO: Batch, requires virtual warehouse, manual SQL trigger, large scheduled loads.
Snowpipe: Continuous, serverless (Snowflake-managed compute), triggered by event or REST API, near-real-time (seconds–minutes latency).
What are the ON_ERROR options in COPY INTO and their behaviors?
ABORT_STATEMENT (default) — rolls back the entire load on any error.
CONTINUE — skip bad rows, load the rest of the data.
SKIP_FILE — skip the entire file that contains errors.
SKIP_FILE_<n> — skip file if it has more than n errors.
How do you load semi-structured JSON data (an array) so each element becomes a separate row?
Use VARIANT column type to load semi-structured data.
Add STRIP_OUTER_ARRAY = TRUE in the file format or COPY command — this removes the outer array brackets and loads each element as a separate row.
After loading, use FLATTEN or dot-notation to query VARIANT fields.
What is the Snowflake Kafka Connector used for?
The Snowflake Connector for Apache Kafka streams data from Kafka topics directly into Snowflake tables using Snowpipe under the hood.
Use it for real-time/streaming data pipelines where events are published to Kafka and need to land in Snowflake with low latency.
What is the recommended file size for loading data into Snowflake, and why?
100–250 MB compressed per file is optimal.
Too small: excessive file overhead, many small S3/GCS calls, poor parallel utilization.
Too large: cannot parallelize effectively across warehouse nodes.
Split large files into multiple 100–250 MB chunks before staging.
What does VALIDATION_MODE = RETURN_ERRORS do in a COPY INTO statement?
Validates files without loading any data.
Snowflake scans the files and returns error details (row numbers, error messages) as a result set — no rows are inserted into the target table.
Use it to test a file format or check data quality before committing to a real load.
What is the difference between Snowpipe auto-ingest and REST API triggering?
Auto-ingest (recommended): Cloud event notifications (S3 SQS, Azure Event Grid, GCS Pub/Sub) automatically trigger the pipe when new files arrive. Fully hands-off.
REST API: Programmatic call to the insertFiles endpoint triggers loading. Requires application code to call the API when files are ready.
Select a topic area, answer the self-assessment question, and get targeted study guidance.
How confident are you with Snowflake stage types and their syntax?
Start with the three-way split: @~ is your personal user stage (auto-created, not shared). @%tablename is the table stage (auto-created, per table). @stage_name is a named stage you create explicitly — and it's the only one that's shareable and reusable across pipelines. Memorize the memory hook: "~ = user, % = table, name = named." Study the Stages concept card and Flashcard 1 before attempting the quiz.
Quick clarification: Neither the user stage (@~) nor the table stage (@%table) are shareable — they're personal/private by design. Only named stages (created with CREATE STAGE) are shareable and reusable. External stages (S3, Azure, GCS) are always named stages. Review the Stage Syntax table in the Overview tab.
External stage auth: The preferred method is a Storage Integration object — it uses an IAM role trust policy so no credentials appear in the stage definition. Credentials can be specified directly with CREDENTIALS=(AWS_KEY_ID=...) but this is less secure. Study the external stage CREATE STAGE example in the Concepts tab. Know: Storage Integration = preferred, Credentials = direct but not recommended.
You're in great shape on stages. Key review points: @~ (user, not shareable), @%table (table, not shareable), @stage_name (named, shareable). PUT/GET only via SnowSQL. External stages use storage integration (preferred) or credentials. Spend 5 minutes on the flashcard and move to COPY INTO.
How well do you understand Snowflake file format options and semi-structured loading?
Supported formats: CSV (most common), JSON, Avro, ORC, Parquet, XML. Create them with CREATE FILE FORMAT or define inline in the COPY command. Key CSV options to know: FIELD_DELIMITER, SKIP_HEADER, NULL_IF, EMPTY_FIELD_AS_NULL, COMPRESSION. Review the File Formats concept card and the CREATE FILE FORMAT SQL example.
Semi-structured formats (Parquet, Avro, ORC, JSON): These are loaded into VARIANT columns. VARIANT is Snowflake's semi-structured data type that can hold JSON-like hierarchical data. After loading, use dot-notation (col:field::type) or FLATTEN to query. Don't try to load Parquet into a regular VARCHAR column — use VARIANT.
STRIP_OUTER_ARRAY = TRUE: When your JSON file is a top-level array (e.g., [{...},{...}]), this setting removes the outer brackets and loads each object as a separate row. Without it, the entire array would load as one VARIANT row. This is a common exam trick — know when to use it.
File format recap: 6 supported formats (CSV, JSON, Avro, ORC, Parquet, XML). Semi-structured → VARIANT. STRIP_OUTER_ARRAY = TRUE for JSON arrays. COMPRESSION = AUTO detects gzip/bz2. Named file formats are reusable objects; inline definitions are one-time. You're ready for this topic — focus your remaining time on COPY INTO and Snowpipe.
How well do you understand the COPY INTO command and its parameters?
COPY INTO basics: COPY INTO loads files from a stage into a table. Direction: FROM stage → INTO table. Requires a running virtual warehouse. You specify the stage path, file format (named or inline), and error handling. Study the full COPY INTO example in the Concepts tab, then practice the quiz questions on this topic.
ON_ERROR options (memorize all four):
• ABORT_STATEMENT (default) = roll back everything on any error
• CONTINUE = skip bad rows, load the rest
• SKIP_FILE = skip the entire file with any error
• SKIP_FILE_<n> = skip file only if it has more than n errors
Hook: "ABORT all, CONTINUE rows, SKIP_FILE whole file." Review Memory Hook 6.
FORCE vs PURGE:
• FORCE = TRUE: reloads files even if they're in the 64-day load history (overrides duplicate protection). Default = FALSE.
• PURGE = TRUE: deletes files from the stage after a successful load (cleanup). Default = FALSE.
These are independent parameters — you can use both together. VALIDATION_MODE = RETURN_ERRORS: validates without loading any data.
COPY INTO recap: Batch load from stage to table. ON_ERROR: ABORT (default), CONTINUE, SKIP_FILE. FORCE overrides 64-day load history. PURGE cleans stage after load. VALIDATION_MODE validates without loading. Requires virtual warehouse. You're solid — reinforce with Flashcard 3 (ON_ERROR) and quiz questions 3, 7, 8, 10.
How confident are you with Snowpipe continuous ingestion?
Snowpipe fundamentals: Snowpipe is Snowflake's continuous/automated data ingestion service. Key differentiators from COPY INTO: (1) Serverless — no virtual warehouse needed, billed per credit used; (2) Triggered automatically by new files; (3) Near-real-time (NOT exactly real-time). CREATE PIPE syntax: CREATE PIPE my_pipe AUTO_INGEST = TRUE AS COPY INTO my_table FROM @my_stage. Review Flashcard 2 and the Snowpipe concept card.
Trigger mechanisms:
• Auto-ingest (preferred): Set AUTO_INGEST = TRUE in pipe definition. Configure cloud event notifications (S3 → SQS, Azure → Event Grid, GCS → Pub/Sub). Files automatically trigger pipe when they arrive in stage. Zero application code needed.
• REST API: Call the insertFiles endpoint programmatically. Your app must detect new files and make the API call. More control, more code.
Snowpipe monitoring:
• SYSTEM$PIPE_STATUS('pipe_name') — returns pipe status (running, paused, etc.) and backlog info
• INFORMATION_SCHEMA.COPY_HISTORY — query to review load history and errors
• Error files go to an error queue for review
Know these for the exam — they're easy points if you memorize the function names.
Snowpipe recap: Serverless, event-driven, near-real-time (not exactly real-time). CREATE PIPE AS COPY INTO. Two triggers: auto-ingest (cloud events) and REST API. Monitor with SYSTEM$PIPE_STATUS. No VW needed — billed per serverless credit. Flashcard 2 and 8 cover the key distinctions. Quiz question 2 and 6 target this area.
How well do you know Snowflake's connectivity options and client tools?
Connector overview:
• SnowSQL: CLI for SQL + PUT/GET. Required for PUT command.
• JDBC/ODBC: standard DB connectivity for BI tools (Tableau, Power BI), ETL tools
• Python Connector: snowflake-connector-python for programmatic queries
• Snowpark: DataFrame API (Python/Java/Scala) running inside Snowflake
• Kafka Connector: streaming from Kafka topics via Snowpipe
• Spark Connector: read/write Snowflake from Spark
Review the Connectivity concept card for full details.
Kafka and Spark connectors:
• Kafka Connector: For streaming pipelines. Kafka topic → Snowflake table. Uses Snowpipe internally. Use when events are published to Kafka and need near-real-time landing in Snowflake.
• Spark Connector: For batch or Spark-based data engineering. Read/write Snowflake from Apache Spark jobs. Use when your pipeline already runs on Spark/EMR/Databricks.
Snowpark vs Python Connector:
• Python Connector: Sends SQL queries to Snowflake from your local Python environment. Data can move back to your app. Standard client-side execution.
• Snowpark: DataFrame API that PUSHES computation INTO Snowflake. Code runs inside Snowflake's engine (no data movement). Use for ETL/transformations at scale — it's like running PySpark but inside Snowflake. Supports Python, Java, and Scala.
Connectivity recap: SnowSQL = CLI, required for PUT/GET. JDBC/ODBC = BI tools/ETL. Python Connector = programmatic SQL from Python. Snowpark = DataFrame API, pushdown inside Snowflake. Kafka Connector = Kafka → Snowflake via Snowpipe. Spark Connector = Spark ↔ Snowflake. Snowsight = web UI, cannot PUT. Partner tools (dbt, Fivetran, Matillion) use JDBC or native connectors. Flashcard 5 covers Kafka specifically.
Official documentation and study links for Domain 3.
Official exam guide, objectives, and registration for COF-C03.
Snowflake Certification Page →Official documentation on stages, COPY INTO, and bulk loading best practices.
Snowflake Docs: Data Loading →Auto-ingest setup, REST API, monitoring, and error handling for continuous ingestion.
Snowflake Docs: Snowpipe →COPY INTO for unloading, file options, single vs. parallel output.
Snowflake Docs: Unloading →Complete reference for CSV, JSON, Parquet, Avro, ORC, XML format parameters.
CREATE FILE FORMAT Reference →Setup guide for streaming Kafka topics into Snowflake tables via Snowpipe.
Kafka Connector Docs →DataFrame API for Python, Java, and Scala with pushdown computation inside Snowflake.
Snowpark Docs →Complete COF-C03 question bank with 500+ practice questions, timed exams, and analytics.
Start Free Practice →