SnowPro Core COF-C03 · Data Loading, Unloading & Connectivity

Domain 3 — Data Loading, Unloading & Connectivity

18% of the COF-C03 exam — approximately 18 questions. Covers stages, file formats, bulk loading, continuous ingestion, unloading, and client connectivity.

What You Must Know for This Domain

Loading Pathway

Identify correct stage type (@~, @%table, @stage)
Distinguish COPY INTO (batch) vs Snowpipe (continuous/serverless)
Know all ON_ERROR options and defaults
Know load history window (64 days) and FORCE param
Know VALIDATION_MODE purpose (validate without loading)

File Formats & Connectivity

Supported formats: CSV, JSON, Avro, ORC, Parquet, XML
Semi-structured data → VARIANT + STRIP_OUTER_ARRAY
PUT/GET require SnowSQL (not web UI)
Know when to use Kafka connector vs Spark connector vs Python connector
Optimal file size: 100–250 MB compressed

Domain Weight Breakdown

Topic Area	Importance	Key Commands
Stages (internal & external)	High	`PUT`, `GET`, `LIST`, `REMOVE`, `CREATE STAGE`
File Formats	Medium	`CREATE FILE FORMAT`, inline in COPY
COPY INTO (bulk load)	High	`COPY INTO table FROM @stage`
Snowpipe (continuous)	High	`CREATE PIPE`, auto-ingest, REST API
Data Unloading	Medium	`COPY INTO @stage FROM table`
Connectivity (JDBC/ODBC/Connectors)	Medium	SnowSQL, Kafka, Spark, Python, Snowpark

Quick-Reference: Stage Syntax

Stage Type	Syntax	Shareable?	Best For
User Stage	`@~`	No	Personal / one-off loads
Table Stage	`@%tablename`	No	Single-table loading
Named Internal Stage	`@stage_name`	Yes	Reusable, shared pipelines
External Stage (S3)	`@ext_stage` (s3://…)	Yes	Cloud storage integration

COPY INTO vs Snowpipe

Attribute	COPY INTO	Snowpipe
Compute	Requires virtual warehouse	Serverless (Snowflake-managed)
Trigger	Manual SQL execution	Event notification or REST API
Latency	Batch (scheduled)	Near-real-time (seconds–minutes)
Use case	Scheduled ETL, large batches	Continuous / streaming ingest
Billing	VW credit consumption	Per-credit (serverless)

Core Concepts

Detailed explanations of every topic tested in Domain 3.

📦 Stages — Where Data Lives Before/After Loading

Internal Stages

Internal stages store data inside Snowflake. Three types exist:

User Stage (@~) — per-user, auto-created, not shareable. Use for personal or one-off loading.
Table Stage (@%tablename) — per-table, auto-created, not shareable. Best for loading a single table.
Named Internal Stage (@stage_name) — explicitly created, shareable, reusable. Most flexible option.

External Stages

External stages reference cloud object storage. Snowflake reads/writes data there but does not own the storage.

S3: URL='s3://bucket/path/' with AWS credentials or storage integration
Azure Blob Storage: azure:// URL
GCS: gcs:// URL

CREATE STAGE my_s3_stage
  URL = 's3://mybucket/data/'
  STORAGE_INTEGRATION = my_s3_int
  FILE_FORMAT = (FORMAT_NAME = my_csv);

Authentication options: Storage Integration (preferred — no credentials in stage) or credentials directly (CREDENTIALS=(AWS_KEY_ID=... AWS_SECRET_KEY=...)).

Stage Commands

PUT file://localpath @stage — upload local file to internal stage (SnowSQL/client only)
GET @stage file://localpath — download from internal stage to local (SnowSQL only)
LIST @stage — list files in stage
REMOVE @stage/file — delete file from stage

Important: PUT and GET cannot be executed from the Snowsight web UI — only via SnowSQL or compatible clients.

📋 File Formats

Supported formats: CSV (most common), JSON, Avro, ORC, Parquet, XML

CREATE FILE FORMAT my_csv
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = TRUE
  COMPRESSION = AUTO;

Can be defined as a named object or inline in the COPY command
Parquet / Avro / ORC: typically loaded into VARIANT columns for semi-structured data
STRIP_OUTER_ARRAY = TRUE — for JSON arrays: loads each array element as a separate row
COMPRESSION = AUTO — Snowflake detects compression (gzip, bz2, etc.)

📥 COPY INTO — Bulk Loading

COPY INTO my_table
FROM @my_stage/data/
FILE_FORMAT = (FORMAT_NAME = 'my_csv')
ON_ERROR = CONTINUE
PURGE = TRUE
FORCE = FALSE;

Key Parameters

ON_ERROR — behavior when errors are found:
- ABORT_STATEMENT (default) — rolls back the entire load
- CONTINUE — skip bad rows, load the rest
- SKIP_FILE — skip entire file containing errors
- SKIP_FILE_<n> — skip file if it has more than n errors
PURGE = TRUE — delete stage files after successful load
FORCE = FALSE (default) — skip files already loaded (uses load history). Set FORCE = TRUE to reload.
VALIDATION_MODE = RETURN_ERRORS — validate without actually loading data

Load History

Snowflake tracks loaded files per table for 64 days. Files already successfully loaded are skipped by default. FORCE = TRUE overrides this. Requires a running virtual warehouse to execute.

🔄 Snowpipe — Continuous/Automated Loading

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table FROM @my_stage;

Key Characteristics

Serverless — no virtual warehouse needed; uses Snowflake-managed compute, billed per credit
Near-real-time — seconds to minutes latency (NOT exactly real-time streaming)
Triggers on new files arriving in the stage

Two Trigger Mechanisms

Auto-ingest (recommended) — cloud event notifications (S3 SQS, Azure Event Grid, GCS Pub/Sub) trigger the pipe automatically
REST API — programmatic call to the insertFiles endpoint

Monitoring

SYSTEM$PIPE_STATUS('my_pipe') — check pipe status
INFORMATION_SCHEMA.COPY_HISTORY — review load and error history
Error files go to the error queue for review

📤 Data Unloading

COPY INTO @my_stage/output/
FROM my_table
FILE_FORMAT = (TYPE = CSV)
HEADER = TRUE
MAX_FILE_SIZE = 16777216
SINGLE = FALSE
OVERWRITE = TRUE;

Unloads data from a Snowflake table to an internal or external stage
SINGLE = FALSE (default) — multiple files in parallel (faster)
SINGLE = TRUE — single output file (slower, use for small exports)
MAX_FILE_SIZE — 16 MB default per file
HEADER = TRUE — include column headers (CSV)
File naming pattern: data_0_0_0.csv.gz

🔌 Connectivity Options

SnowSQL (CLI)

Command-line client for executing SQL, PUT/GET, and scripting. The only way to run PUT/GET commands interactively.

JDBC / ODBC Drivers

Standard database connectivity for BI tools (Tableau, Power BI), ETL tools, and custom applications.

Python Connector

snowflake-connector-python — supports standard query execution, parameter binding, and async queries.

Snowpark

DataFrame API for Python, Java, and Scala. Runs code inside Snowflake (pushdown to engine). Used for ETL and transformations without moving data out of Snowflake.

Kafka Connector

Snowflake Connector for Apache Kafka — streams data from Kafka topics into Snowflake tables via Snowpipe. Ideal for real-time/streaming data pipelines.

Spark Connector

Snowflake Connector for Apache Spark — enables reading from and writing to Snowflake from Spark workloads.

Partner / Third-Party Tools

Tool	Category	Connection Method
dbt	Transformation	JDBC / Python connector
Fivetran / Airbyte	ELT / Ingestion	Native Snowflake connector
Informatica / Talend	ETL	JDBC / ODBC
Matillion	ELT	Native Snowflake connector
Snowsight (Web UI)	Browser UI	Native — cannot run PUT/GET

Data Loading Best Practices

File size: 100–250 MB compressed per file — optimal for parallel loading
Split large files into multiple 100–250 MB chunks before staging
Dedicated loading warehouse: use a separate VW to avoid impacting query workloads
Stage cleanup: use PURGE = TRUE or manually REMOVE @stage after loading
Semi-structured: load JSON into VARIANT first, then query and transform with FLATTEN

Memory Hooks

Six memory anchors — study these until they're automatic.

🎯

Stage Syntax: @~ = user, @%table = table stage, @name = named stage (most flexible)

⚡

Load Method: Snowpipe = serverless + event-driven. COPY INTO = batch + needs a VW.

📅

Load History: Load history = 64 days. FORCE=TRUE overrides it. Default FORCE=FALSE skips already-loaded files.

📏

File Size: Optimal file size: 100–250 MB compressed. Too small = many files overhead. Too big = slow parallel.

💻

PUT/GET: PUT needs SnowSQL (CLI). Web UI cannot PUT. GET also needs SnowSQL.

⚠️

ON_ERROR: ABORT (rollback all), CONTINUE (skip rows), SKIP_FILE (skip whole file).

Practice Quiz

10 exam-style questions covering Domain 3. Select your answer, then click Submit Quiz to see results.

1. What is the correct syntax to reference a user stage in Snowflake?

@user_stage

@%username

@~

@$user

2. Which compute model does Snowpipe use for data loading?

A dedicated virtual warehouse

Serverless (Snowflake-managed compute)

Customer-managed EC2 instances

Multi-cluster warehouse

3. A COPY INTO command encounters 3 bad rows in a file. You want the load to continue and skip only those rows while loading the rest. Which ON_ERROR setting should you use?

ABORT_STATEMENT

SKIP_FILE

IGNORE_ERRORS

CONTINUE

4. What is the recommended compressed file size range for optimal parallel loading into Snowflake?

1–10 MB

10–50 MB

100–250 MB

1–5 GB

5. A data engineer wants to upload a local CSV file to a named internal stage. Which client/tool is required for this operation?

Snowsight (web UI)

SnowSQL (CLI)

Tableau

dbt

6. Which of the following are valid Snowpipe trigger mechanisms? (Choose the option that lists BOTH correct triggers)

CRON schedule and Lambda function

Manual SQL and Snowpark push

Virtual warehouse trigger and task

Cloud event notification (auto-ingest) and REST API (insertFiles)

7. How many days does Snowflake track load history per table to prevent duplicate file loading?

7 days

30 days

64 days

90 days

8. A COPY INTO command is not reloading files that were already loaded successfully. Which parameter forces Snowflake to reload them regardless of load history?

OVERRIDE = TRUE

FORCE = TRUE

RELOAD = TRUE

SKIP_HISTORY = TRUE

9. Which authentication method is preferred when creating an external stage pointing to S3, to avoid storing credentials in the stage definition?

AWS access keys stored in stage CREDENTIALS clause

IAM role embedded in stage URL

Environment variables set on the warehouse

Storage Integration object referenced in the stage definition

10. What is the purpose of VALIDATION_MODE = RETURN_ERRORS in a COPY INTO statement?

Loads data and logs errors to a separate error table

Skips all error rows and continues loading

Validates the files and returns error details without loading any data

Aborts the load on first error and returns a report

Flashcards

Click any card to flip it. Master all 8 cards before exam day.

Stages

What are the 3 internal stage types in Snowflake and their syntax?

Click to reveal →

User Stage: @~ — per-user, auto-created, not shareable. One-off use.

Table Stage: @%tablename — per-table, auto-created, not shareable. Single-table loading.

Named Internal: @stage_name — explicitly created, shareable, most flexible.

Loading Methods

How does Snowpipe differ from COPY INTO for data loading?

Click to reveal →

COPY INTO: Batch, requires virtual warehouse, manual SQL trigger, large scheduled loads.

Snowpipe: Continuous, serverless (Snowflake-managed compute), triggered by event or REST API, near-real-time (seconds–minutes latency).

COPY INTO

What are the ON_ERROR options in COPY INTO and their behaviors?

Click to reveal →

ABORT_STATEMENT (default) — rolls back the entire load on any error.

CONTINUE — skip bad rows, load the rest of the data.

SKIP_FILE — skip the entire file that contains errors.

SKIP_FILE_<n> — skip file if it has more than n errors.

File Formats

How do you load semi-structured JSON data (an array) so each element becomes a separate row?

Click to reveal →

Use VARIANT column type to load semi-structured data.

Add STRIP_OUTER_ARRAY = TRUE in the file format or COPY command — this removes the outer array brackets and loads each element as a separate row.

After loading, use FLATTEN or dot-notation to query VARIANT fields.

Connectivity

What is the Snowflake Kafka Connector used for?

Click to reveal →

The Snowflake Connector for Apache Kafka streams data from Kafka topics directly into Snowflake tables using Snowpipe under the hood.

Use it for real-time/streaming data pipelines where events are published to Kafka and need to land in Snowflake with low latency.

Best Practices

What is the recommended file size for loading data into Snowflake, and why?

Click to reveal →

100–250 MB compressed per file is optimal.

Too small: excessive file overhead, many small S3/GCS calls, poor parallel utilization.

Too large: cannot parallelize effectively across warehouse nodes.

Split large files into multiple 100–250 MB chunks before staging.

COPY INTO

What does VALIDATION_MODE = RETURN_ERRORS do in a COPY INTO statement?

Click to reveal →

Validates files without loading any data.

Snowflake scans the files and returns error details (row numbers, error messages) as a result set — no rows are inserted into the target table.

Use it to test a file format or check data quality before committing to a real load.

Snowpipe

What is the difference between Snowpipe auto-ingest and REST API triggering?

Click to reveal →

Auto-ingest (recommended): Cloud event notifications (S3 SQS, Azure Event Grid, GCS Pub/Sub) automatically trigger the pipe when new files arrive. Fully hands-off.

REST API: Programmatic call to the insertFiles endpoint triggers loading. Requires application code to call the API when files are ready.

Study Advisor

Select a topic area, answer the self-assessment question, and get targeted study guidance.

How confident are you with Snowflake stage types and their syntax?

Start with the three-way split: @~ is your personal user stage (auto-created, not shared). @%tablename is the table stage (auto-created, per table). @stage_name is a named stage you create explicitly — and it's the only one that's shareable and reusable across pipelines. Memorize the memory hook: "~ = user, % = table, name = named." Study the Stages concept card and Flashcard 1 before attempting the quiz.

Quick clarification: Neither the user stage (@~) nor the table stage (@%table) are shareable — they're personal/private by design. Only named stages (created with CREATE STAGE) are shareable and reusable. External stages (S3, Azure, GCS) are always named stages. Review the Stage Syntax table in the Overview tab.

External stage auth: The preferred method is a Storage Integration object — it uses an IAM role trust policy so no credentials appear in the stage definition. Credentials can be specified directly with CREDENTIALS=(AWS_KEY_ID=...) but this is less secure. Study the external stage CREATE STAGE example in the Concepts tab. Know: Storage Integration = preferred, Credentials = direct but not recommended.

You're in great shape on stages. Key review points: @~ (user, not shareable), @%table (table, not shareable), @stage_name (named, shareable). PUT/GET only via SnowSQL. External stages use storage integration (preferred) or credentials. Spend 5 minutes on the flashcard and move to COPY INTO.

How well do you understand Snowflake file format options and semi-structured loading?

Supported formats: CSV (most common), JSON, Avro, ORC, Parquet, XML. Create them with CREATE FILE FORMAT or define inline in the COPY command. Key CSV options to know: FIELD_DELIMITER, SKIP_HEADER, NULL_IF, EMPTY_FIELD_AS_NULL, COMPRESSION. Review the File Formats concept card and the CREATE FILE FORMAT SQL example.

Semi-structured formats (Parquet, Avro, ORC, JSON): These are loaded into VARIANT columns. VARIANT is Snowflake's semi-structured data type that can hold JSON-like hierarchical data. After loading, use dot-notation (col:field::type) or FLATTEN to query. Don't try to load Parquet into a regular VARCHAR column — use VARIANT.

STRIP_OUTER_ARRAY = TRUE: When your JSON file is a top-level array (e.g., [{...},{...}]), this setting removes the outer brackets and loads each object as a separate row. Without it, the entire array would load as one VARIANT row. This is a common exam trick — know when to use it.

File format recap: 6 supported formats (CSV, JSON, Avro, ORC, Parquet, XML). Semi-structured → VARIANT. STRIP_OUTER_ARRAY = TRUE for JSON arrays. COMPRESSION = AUTO detects gzip/bz2. Named file formats are reusable objects; inline definitions are one-time. You're ready for this topic — focus your remaining time on COPY INTO and Snowpipe.

How well do you understand the COPY INTO command and its parameters?

COPY INTO basics: COPY INTO loads files from a stage into a table. Direction: FROM stage → INTO table. Requires a running virtual warehouse. You specify the stage path, file format (named or inline), and error handling. Study the full COPY INTO example in the Concepts tab, then practice the quiz questions on this topic.

ON_ERROR options (memorize all four):
• ABORT_STATEMENT (default) = roll back everything on any error
• CONTINUE = skip bad rows, load the rest
• SKIP_FILE = skip the entire file with any error
• SKIP_FILE_<n> = skip file only if it has more than n errors
Hook: "ABORT all, CONTINUE rows, SKIP_FILE whole file." Review Memory Hook 6.

FORCE vs PURGE:
• FORCE = TRUE: reloads files even if they're in the 64-day load history (overrides duplicate protection). Default = FALSE.
• PURGE = TRUE: deletes files from the stage after a successful load (cleanup). Default = FALSE.
These are independent parameters — you can use both together. VALIDATION_MODE = RETURN_ERRORS: validates without loading any data.

COPY INTO recap: Batch load from stage to table. ON_ERROR: ABORT (default), CONTINUE, SKIP_FILE. FORCE overrides 64-day load history. PURGE cleans stage after load. VALIDATION_MODE validates without loading. Requires virtual warehouse. You're solid — reinforce with Flashcard 3 (ON_ERROR) and quiz questions 3, 7, 8, 10.

How confident are you with Snowpipe continuous ingestion?

Snowpipe fundamentals: Snowpipe is Snowflake's continuous/automated data ingestion service. Key differentiators from COPY INTO: (1) Serverless — no virtual warehouse needed, billed per credit used; (2) Triggered automatically by new files; (3) Near-real-time (NOT exactly real-time). CREATE PIPE syntax: CREATE PIPE my_pipe AUTO_INGEST = TRUE AS COPY INTO my_table FROM @my_stage. Review Flashcard 2 and the Snowpipe concept card.

Trigger mechanisms:
• Auto-ingest (preferred): Set AUTO_INGEST = TRUE in pipe definition. Configure cloud event notifications (S3 → SQS, Azure → Event Grid, GCS → Pub/Sub). Files automatically trigger pipe when they arrive in stage. Zero application code needed.
• REST API: Call the insertFiles endpoint programmatically. Your app must detect new files and make the API call. More control, more code.

Snowpipe monitoring:
• SYSTEM$PIPE_STATUS('pipe_name') — returns pipe status (running, paused, etc.) and backlog info
• INFORMATION_SCHEMA.COPY_HISTORY — query to review load history and errors
• Error files go to an error queue for review
Know these for the exam — they're easy points if you memorize the function names.

Snowpipe recap: Serverless, event-driven, near-real-time (not exactly real-time). CREATE PIPE AS COPY INTO. Two triggers: auto-ingest (cloud events) and REST API. Monitor with SYSTEM$PIPE_STATUS. No VW needed — billed per serverless credit. Flashcard 2 and 8 cover the key distinctions. Quiz question 2 and 6 target this area.

How well do you know Snowflake's connectivity options and client tools?

Connector overview:
• SnowSQL: CLI for SQL + PUT/GET. Required for PUT command.
• JDBC/ODBC: standard DB connectivity for BI tools (Tableau, Power BI), ETL tools
• Python Connector: snowflake-connector-python for programmatic queries
• Snowpark: DataFrame API (Python/Java/Scala) running inside Snowflake
• Kafka Connector: streaming from Kafka topics via Snowpipe
• Spark Connector: read/write Snowflake from Spark
Review the Connectivity concept card for full details.

Kafka and Spark connectors:
• Kafka Connector: For streaming pipelines. Kafka topic → Snowflake table. Uses Snowpipe internally. Use when events are published to Kafka and need near-real-time landing in Snowflake.
• Spark Connector: For batch or Spark-based data engineering. Read/write Snowflake from Apache Spark jobs. Use when your pipeline already runs on Spark/EMR/Databricks.

Snowpark vs Python Connector:
• Python Connector: Sends SQL queries to Snowflake from your local Python environment. Data can move back to your app. Standard client-side execution.
• Snowpark: DataFrame API that PUSHES computation INTO Snowflake. Code runs inside Snowflake's engine (no data movement). Use for ETL/transformations at scale — it's like running PySpark but inside Snowflake. Supports Python, Java, and Scala.

Connectivity recap: SnowSQL = CLI, required for PUT/GET. JDBC/ODBC = BI tools/ETL. Python Connector = programmatic SQL from Python. Snowpark = DataFrame API, pushdown inside Snowflake. Kafka Connector = Kafka → Snowflake via Snowpipe. Spark Connector = Spark ↔ Snowflake. Snowsight = web UI, cannot PUT. Partner tools (dbt, Fivetran, Matillion) use JDBC or native connectors. Flashcard 5 covers Kafka specifically.

Resources

Official documentation and study links for Domain 3.

SnowPro Core Certification

Official exam guide, objectives, and registration for COF-C03.

Snowflake Certification Page →

Loading Data into Snowflake

Official documentation on stages, COPY INTO, and bulk loading best practices.

Snowflake Docs: Data Loading →

Snowpipe Documentation

Auto-ingest setup, REST API, monitoring, and error handling for continuous ingestion.

Snowflake Docs: Snowpipe →

Data Unloading Guide

COPY INTO for unloading, file options, single vs. parallel output.

Snowflake Docs: Unloading →

File Format Options

Complete reference for CSV, JSON, Parquet, Avro, ORC, XML format parameters.

CREATE FILE FORMAT Reference →

Kafka Connector for Snowflake

Setup guide for streaming Kafka topics into Snowflake tables via Snowpipe.

Kafka Connector Docs →

Snowpark Developer Guide

DataFrame API for Python, Java, and Scala with pushdown computation inside Snowflake.

Snowpark Docs →

FlashGenius Full Practice

Complete COF-C03 question bank with 500+ practice questions, timed exams, and analytics.

Start Free Practice →

Data Loading, Unloading & Connectivity

Domain 3 — Data Loading, Unloading & Connectivity

What You Must Know for This Domain

Loading Pathway

File Formats & Connectivity

Domain Weight Breakdown

Quick-Reference: Stage Syntax

COPY INTO vs Snowpipe

Core Concepts

📦 Stages — Where Data Lives Before/After Loading

Internal Stages

External Stages

Stage Commands

📋 File Formats

📥 COPY INTO — Bulk Loading

Key Parameters

Load History

🔄 Snowpipe — Continuous/Automated Loading

Key Characteristics

Two Trigger Mechanisms

Monitoring

📤 Data Unloading

🔌 Connectivity Options

SnowSQL (CLI)

JDBC / ODBC Drivers

Python Connector

Snowpark

Kafka Connector

Spark Connector

Partner / Third-Party Tools

Data Loading Best Practices

Memory Hooks

Practice Quiz

Flashcards

Study Advisor

Resources

SnowPro Core Certification

Loading Data into Snowflake

Snowpipe Documentation

Data Unloading Guide

File Format Options

Kafka Connector for Snowflake

Snowpark Developer Guide

FlashGenius Full Practice

Ready to Pass the SnowPro Core COF-C03?