FlashGenius Logo FlashGenius
SnowPro Core COF-C03 · Page 3 of 5 · Domain 3

Data Loading, Unloading & Connectivity

SnowPro Core COF-C03 · Domain 3: Data Loading, Unloading & Connectivity · 18% of Exam

Stages · File Formats · COPY INTO · Snowpipe · Unloading · JDBC/ODBC · Kafka · Spark Connector

Start Free Practice →
100 Questions 115 Minutes Passing Score 750/1000 Exam Fee $175 Launched Feb 16, 2026

Domain 3 — Data Loading, Unloading & Connectivity

18% of the COF-C03 exam — approximately 18 questions. Covers stages, file formats, bulk loading, continuous ingestion, unloading, and client connectivity.

What You Must Know for This Domain

Loading Pathway

  • Identify correct stage type (@~, @%table, @stage)
  • Distinguish COPY INTO (batch) vs Snowpipe (continuous/serverless)
  • Know all ON_ERROR options and defaults
  • Know load history window (64 days) and FORCE param
  • Know VALIDATION_MODE purpose (validate without loading)

File Formats & Connectivity

  • Supported formats: CSV, JSON, Avro, ORC, Parquet, XML
  • Semi-structured data → VARIANT + STRIP_OUTER_ARRAY
  • PUT/GET require SnowSQL (not web UI)
  • Know when to use Kafka connector vs Spark connector vs Python connector
  • Optimal file size: 100–250 MB compressed

Domain Weight Breakdown

Topic AreaImportanceKey Commands
Stages (internal & external)HighPUT, GET, LIST, REMOVE, CREATE STAGE
File FormatsMediumCREATE FILE FORMAT, inline in COPY
COPY INTO (bulk load)HighCOPY INTO table FROM @stage
Snowpipe (continuous)HighCREATE PIPE, auto-ingest, REST API
Data UnloadingMediumCOPY INTO @stage FROM table
Connectivity (JDBC/ODBC/Connectors)MediumSnowSQL, Kafka, Spark, Python, Snowpark

Quick-Reference: Stage Syntax

Stage TypeSyntaxShareable?Best For
User Stage@~NoPersonal / one-off loads
Table Stage@%tablenameNoSingle-table loading
Named Internal Stage@stage_nameYesReusable, shared pipelines
External Stage (S3)@ext_stage (s3://…)YesCloud storage integration

COPY INTO vs Snowpipe

AttributeCOPY INTOSnowpipe
ComputeRequires virtual warehouseServerless (Snowflake-managed)
TriggerManual SQL executionEvent notification or REST API
LatencyBatch (scheduled)Near-real-time (seconds–minutes)
Use caseScheduled ETL, large batchesContinuous / streaming ingest
BillingVW credit consumptionPer-credit (serverless)

Core Concepts

Detailed explanations of every topic tested in Domain 3.

📦 Stages — Where Data Lives Before/After Loading

Internal Stages

Internal stages store data inside Snowflake. Three types exist:

External Stages

External stages reference cloud object storage. Snowflake reads/writes data there but does not own the storage.

CREATE STAGE my_s3_stage
  URL = 's3://mybucket/data/'
  STORAGE_INTEGRATION = my_s3_int
  FILE_FORMAT = (FORMAT_NAME = my_csv);

Authentication options: Storage Integration (preferred — no credentials in stage) or credentials directly (CREDENTIALS=(AWS_KEY_ID=... AWS_SECRET_KEY=...)).

Stage Commands

Important: PUT and GET cannot be executed from the Snowsight web UI — only via SnowSQL or compatible clients.

📋 File Formats

Supported formats: CSV (most common), JSON, Avro, ORC, Parquet, XML

CREATE FILE FORMAT my_csv
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = TRUE
  COMPRESSION = AUTO;

📥 COPY INTO — Bulk Loading

COPY INTO my_table
FROM @my_stage/data/
FILE_FORMAT = (FORMAT_NAME = 'my_csv')
ON_ERROR = CONTINUE
PURGE = TRUE
FORCE = FALSE;

Key Parameters

Load History

Snowflake tracks loaded files per table for 64 days. Files already successfully loaded are skipped by default. FORCE = TRUE overrides this. Requires a running virtual warehouse to execute.

🔄 Snowpipe — Continuous/Automated Loading

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table FROM @my_stage;

Key Characteristics

Two Trigger Mechanisms

Monitoring

📤 Data Unloading

COPY INTO @my_stage/output/
FROM my_table
FILE_FORMAT = (TYPE = CSV)
HEADER = TRUE
MAX_FILE_SIZE = 16777216
SINGLE = FALSE
OVERWRITE = TRUE;

🔌 Connectivity Options

SnowSQL (CLI)

Command-line client for executing SQL, PUT/GET, and scripting. The only way to run PUT/GET commands interactively.

JDBC / ODBC Drivers

Standard database connectivity for BI tools (Tableau, Power BI), ETL tools, and custom applications.

Python Connector

snowflake-connector-python — supports standard query execution, parameter binding, and async queries.

Snowpark

DataFrame API for Python, Java, and Scala. Runs code inside Snowflake (pushdown to engine). Used for ETL and transformations without moving data out of Snowflake.

Kafka Connector

Snowflake Connector for Apache Kafka — streams data from Kafka topics into Snowflake tables via Snowpipe. Ideal for real-time/streaming data pipelines.

Spark Connector

Snowflake Connector for Apache Spark — enables reading from and writing to Snowflake from Spark workloads.

Partner / Third-Party Tools

ToolCategoryConnection Method
dbtTransformationJDBC / Python connector
Fivetran / AirbyteELT / IngestionNative Snowflake connector
Informatica / TalendETLJDBC / ODBC
MatillionELTNative Snowflake connector
Snowsight (Web UI)Browser UINative — cannot run PUT/GET

Data Loading Best Practices

Memory Hooks

Six memory anchors — study these until they're automatic.

🎯

Stage Syntax: @~ = user, @%table = table stage, @name = named stage (most flexible)

Load Method: Snowpipe = serverless + event-driven. COPY INTO = batch + needs a VW.

📅

Load History: Load history = 64 days. FORCE=TRUE overrides it. Default FORCE=FALSE skips already-loaded files.

📏

File Size: Optimal file size: 100–250 MB compressed. Too small = many files overhead. Too big = slow parallel.

💻

PUT/GET: PUT needs SnowSQL (CLI). Web UI cannot PUT. GET also needs SnowSQL.

⚠️

ON_ERROR: ABORT (rollback all), CONTINUE (skip rows), SKIP_FILE (skip whole file).

Practice Quiz

10 exam-style questions covering Domain 3. Select your answer, then click Submit Quiz to see results.

1. What is the correct syntax to reference a user stage in Snowflake?

2. Which compute model does Snowpipe use for data loading?

3. A COPY INTO command encounters 3 bad rows in a file. You want the load to continue and skip only those rows while loading the rest. Which ON_ERROR setting should you use?

4. What is the recommended compressed file size range for optimal parallel loading into Snowflake?

5. A data engineer wants to upload a local CSV file to a named internal stage. Which client/tool is required for this operation?

6. Which of the following are valid Snowpipe trigger mechanisms? (Choose the option that lists BOTH correct triggers)

7. How many days does Snowflake track load history per table to prevent duplicate file loading?

8. A COPY INTO command is not reloading files that were already loaded successfully. Which parameter forces Snowflake to reload them regardless of load history?

9. Which authentication method is preferred when creating an external stage pointing to S3, to avoid storing credentials in the stage definition?

10. What is the purpose of VALIDATION_MODE = RETURN_ERRORS in a COPY INTO statement?

Flashcards

Click any card to flip it. Master all 8 cards before exam day.

Stages

What are the 3 internal stage types in Snowflake and their syntax?

Click to reveal →

User Stage: @~ — per-user, auto-created, not shareable. One-off use.

Table Stage: @%tablename — per-table, auto-created, not shareable. Single-table loading.

Named Internal: @stage_name — explicitly created, shareable, most flexible.

Loading Methods

How does Snowpipe differ from COPY INTO for data loading?

Click to reveal →

COPY INTO: Batch, requires virtual warehouse, manual SQL trigger, large scheduled loads.

Snowpipe: Continuous, serverless (Snowflake-managed compute), triggered by event or REST API, near-real-time (seconds–minutes latency).

COPY INTO

What are the ON_ERROR options in COPY INTO and their behaviors?

Click to reveal →

ABORT_STATEMENT (default) — rolls back the entire load on any error.

CONTINUE — skip bad rows, load the rest of the data.

SKIP_FILE — skip the entire file that contains errors.

SKIP_FILE_<n> — skip file if it has more than n errors.

File Formats

How do you load semi-structured JSON data (an array) so each element becomes a separate row?

Click to reveal →

Use VARIANT column type to load semi-structured data.

Add STRIP_OUTER_ARRAY = TRUE in the file format or COPY command — this removes the outer array brackets and loads each element as a separate row.

After loading, use FLATTEN or dot-notation to query VARIANT fields.

Connectivity

What is the Snowflake Kafka Connector used for?

Click to reveal →

The Snowflake Connector for Apache Kafka streams data from Kafka topics directly into Snowflake tables using Snowpipe under the hood.

Use it for real-time/streaming data pipelines where events are published to Kafka and need to land in Snowflake with low latency.

Best Practices

What is the recommended file size for loading data into Snowflake, and why?

Click to reveal →

100–250 MB compressed per file is optimal.

Too small: excessive file overhead, many small S3/GCS calls, poor parallel utilization.

Too large: cannot parallelize effectively across warehouse nodes.

Split large files into multiple 100–250 MB chunks before staging.

COPY INTO

What does VALIDATION_MODE = RETURN_ERRORS do in a COPY INTO statement?

Click to reveal →

Validates files without loading any data.

Snowflake scans the files and returns error details (row numbers, error messages) as a result set — no rows are inserted into the target table.

Use it to test a file format or check data quality before committing to a real load.

Snowpipe

What is the difference between Snowpipe auto-ingest and REST API triggering?

Click to reveal →

Auto-ingest (recommended): Cloud event notifications (S3 SQS, Azure Event Grid, GCS Pub/Sub) automatically trigger the pipe when new files arrive. Fully hands-off.

REST API: Programmatic call to the insertFiles endpoint triggers loading. Requires application code to call the API when files are ready.

Study Advisor

Select a topic area, answer the self-assessment question, and get targeted study guidance.

How confident are you with Snowflake stage types and their syntax?

Start with the three-way split: @~ is your personal user stage (auto-created, not shared). @%tablename is the table stage (auto-created, per table). @stage_name is a named stage you create explicitly — and it's the only one that's shareable and reusable across pipelines. Memorize the memory hook: "~ = user, % = table, name = named." Study the Stages concept card and Flashcard 1 before attempting the quiz.

Quick clarification: Neither the user stage (@~) nor the table stage (@%table) are shareable — they're personal/private by design. Only named stages (created with CREATE STAGE) are shareable and reusable. External stages (S3, Azure, GCS) are always named stages. Review the Stage Syntax table in the Overview tab.

External stage auth: The preferred method is a Storage Integration object — it uses an IAM role trust policy so no credentials appear in the stage definition. Credentials can be specified directly with CREDENTIALS=(AWS_KEY_ID=...) but this is less secure. Study the external stage CREATE STAGE example in the Concepts tab. Know: Storage Integration = preferred, Credentials = direct but not recommended.

You're in great shape on stages. Key review points: @~ (user, not shareable), @%table (table, not shareable), @stage_name (named, shareable). PUT/GET only via SnowSQL. External stages use storage integration (preferred) or credentials. Spend 5 minutes on the flashcard and move to COPY INTO.

How well do you understand Snowflake file format options and semi-structured loading?

Supported formats: CSV (most common), JSON, Avro, ORC, Parquet, XML. Create them with CREATE FILE FORMAT or define inline in the COPY command. Key CSV options to know: FIELD_DELIMITER, SKIP_HEADER, NULL_IF, EMPTY_FIELD_AS_NULL, COMPRESSION. Review the File Formats concept card and the CREATE FILE FORMAT SQL example.

Semi-structured formats (Parquet, Avro, ORC, JSON): These are loaded into VARIANT columns. VARIANT is Snowflake's semi-structured data type that can hold JSON-like hierarchical data. After loading, use dot-notation (col:field::type) or FLATTEN to query. Don't try to load Parquet into a regular VARCHAR column — use VARIANT.

STRIP_OUTER_ARRAY = TRUE: When your JSON file is a top-level array (e.g., [{...},{...}]), this setting removes the outer brackets and loads each object as a separate row. Without it, the entire array would load as one VARIANT row. This is a common exam trick — know when to use it.

File format recap: 6 supported formats (CSV, JSON, Avro, ORC, Parquet, XML). Semi-structured → VARIANT. STRIP_OUTER_ARRAY = TRUE for JSON arrays. COMPRESSION = AUTO detects gzip/bz2. Named file formats are reusable objects; inline definitions are one-time. You're ready for this topic — focus your remaining time on COPY INTO and Snowpipe.

How well do you understand the COPY INTO command and its parameters?

COPY INTO basics: COPY INTO loads files from a stage into a table. Direction: FROM stage → INTO table. Requires a running virtual warehouse. You specify the stage path, file format (named or inline), and error handling. Study the full COPY INTO example in the Concepts tab, then practice the quiz questions on this topic.

ON_ERROR options (memorize all four):
• ABORT_STATEMENT (default) = roll back everything on any error
• CONTINUE = skip bad rows, load the rest
• SKIP_FILE = skip the entire file with any error
• SKIP_FILE_<n> = skip file only if it has more than n errors
Hook: "ABORT all, CONTINUE rows, SKIP_FILE whole file." Review Memory Hook 6.

FORCE vs PURGE:
• FORCE = TRUE: reloads files even if they're in the 64-day load history (overrides duplicate protection). Default = FALSE.
• PURGE = TRUE: deletes files from the stage after a successful load (cleanup). Default = FALSE.
These are independent parameters — you can use both together. VALIDATION_MODE = RETURN_ERRORS: validates without loading any data.

COPY INTO recap: Batch load from stage to table. ON_ERROR: ABORT (default), CONTINUE, SKIP_FILE. FORCE overrides 64-day load history. PURGE cleans stage after load. VALIDATION_MODE validates without loading. Requires virtual warehouse. You're solid — reinforce with Flashcard 3 (ON_ERROR) and quiz questions 3, 7, 8, 10.

How confident are you with Snowpipe continuous ingestion?

Snowpipe fundamentals: Snowpipe is Snowflake's continuous/automated data ingestion service. Key differentiators from COPY INTO: (1) Serverless — no virtual warehouse needed, billed per credit used; (2) Triggered automatically by new files; (3) Near-real-time (NOT exactly real-time). CREATE PIPE syntax: CREATE PIPE my_pipe AUTO_INGEST = TRUE AS COPY INTO my_table FROM @my_stage. Review Flashcard 2 and the Snowpipe concept card.

Trigger mechanisms:
Auto-ingest (preferred): Set AUTO_INGEST = TRUE in pipe definition. Configure cloud event notifications (S3 → SQS, Azure → Event Grid, GCS → Pub/Sub). Files automatically trigger pipe when they arrive in stage. Zero application code needed.
REST API: Call the insertFiles endpoint programmatically. Your app must detect new files and make the API call. More control, more code.

Snowpipe monitoring:
• SYSTEM$PIPE_STATUS('pipe_name') — returns pipe status (running, paused, etc.) and backlog info
• INFORMATION_SCHEMA.COPY_HISTORY — query to review load history and errors
• Error files go to an error queue for review
Know these for the exam — they're easy points if you memorize the function names.

Snowpipe recap: Serverless, event-driven, near-real-time (not exactly real-time). CREATE PIPE AS COPY INTO. Two triggers: auto-ingest (cloud events) and REST API. Monitor with SYSTEM$PIPE_STATUS. No VW needed — billed per serverless credit. Flashcard 2 and 8 cover the key distinctions. Quiz question 2 and 6 target this area.

How well do you know Snowflake's connectivity options and client tools?

Connector overview:
• SnowSQL: CLI for SQL + PUT/GET. Required for PUT command.
• JDBC/ODBC: standard DB connectivity for BI tools (Tableau, Power BI), ETL tools
• Python Connector: snowflake-connector-python for programmatic queries
• Snowpark: DataFrame API (Python/Java/Scala) running inside Snowflake
• Kafka Connector: streaming from Kafka topics via Snowpipe
• Spark Connector: read/write Snowflake from Spark
Review the Connectivity concept card for full details.

Kafka and Spark connectors:
Kafka Connector: For streaming pipelines. Kafka topic → Snowflake table. Uses Snowpipe internally. Use when events are published to Kafka and need near-real-time landing in Snowflake.
Spark Connector: For batch or Spark-based data engineering. Read/write Snowflake from Apache Spark jobs. Use when your pipeline already runs on Spark/EMR/Databricks.

Snowpark vs Python Connector:
Python Connector: Sends SQL queries to Snowflake from your local Python environment. Data can move back to your app. Standard client-side execution.
Snowpark: DataFrame API that PUSHES computation INTO Snowflake. Code runs inside Snowflake's engine (no data movement). Use for ETL/transformations at scale — it's like running PySpark but inside Snowflake. Supports Python, Java, and Scala.

Connectivity recap: SnowSQL = CLI, required for PUT/GET. JDBC/ODBC = BI tools/ETL. Python Connector = programmatic SQL from Python. Snowpark = DataFrame API, pushdown inside Snowflake. Kafka Connector = Kafka → Snowflake via Snowpipe. Spark Connector = Spark ↔ Snowflake. Snowsight = web UI, cannot PUT. Partner tools (dbt, Fivetran, Matillion) use JDBC or native connectors. Flashcard 5 covers Kafka specifically.

Resources

Official documentation and study links for Domain 3.

SnowPro Core Certification

Official exam guide, objectives, and registration for COF-C03.

Snowflake Certification Page →

Loading Data into Snowflake

Official documentation on stages, COPY INTO, and bulk loading best practices.

Snowflake Docs: Data Loading →

Snowpipe Documentation

Auto-ingest setup, REST API, monitoring, and error handling for continuous ingestion.

Snowflake Docs: Snowpipe →

Data Unloading Guide

COPY INTO for unloading, file options, single vs. parallel output.

Snowflake Docs: Unloading →

File Format Options

Complete reference for CSV, JSON, Parquet, Avro, ORC, XML format parameters.

CREATE FILE FORMAT Reference →

Kafka Connector for Snowflake

Setup guide for streaming Kafka topics into Snowflake tables via Snowpipe.

Kafka Connector Docs →

Snowpark Developer Guide

DataFrame API for Python, Java, and Scala with pushdown computation inside Snowflake.

Snowpark Docs →

FlashGenius Full Practice

Complete COF-C03 question bank with 500+ practice questions, timed exams, and analytics.

Start Free Practice →

Ready to Pass the SnowPro Core COF-C03?

Access 500+ practice questions, full-length timed exams, and all 5 domain study pages.

Start Free on FlashGenius →