What You'll Master
AI, ML & Deep Learning Hierarchy
Definitions, relationships, and differences between AI (broadest), ML (learns from data), and Deep Learning (many-layered neural networks). All DL is ML; all ML is AI — but not vice versa.
Why AI Has Accelerated
The three pillars: Data (internet-scale datasets), Compute (GPU acceleration, NVIDIA CUDA), and Algorithms (transformer architecture, 2017). All three converged simultaneously.
Types of Machine Learning
Supervised (labeled data), Unsupervised (finds patterns), Reinforcement (reward signals). Plus transfer learning and self-supervised pretraining (how LLMs are built).
Deep Learning Fundamentals
Artificial neurons, layers (input/hidden/output), activation functions (ReLU, Sigmoid, Softmax), backpropagation, gradient descent, and key hyperparameters (epochs, batch size, learning rate).
Key AI Use Cases
Computer vision (medical imaging, autonomous vehicles), NLP (translation, chatbots), generative AI (text/image/code generation), healthcare, financial services, and autonomous systems.
Training vs. Inference
Training = learning weights (expensive, backward pass). Inference = generating predictions (fast, fixed weights). Different hardware requirements: H100/B200 for training; T4/L4/A10G for inference.
Exam Weight
| Domain | Coverage | Exam Questions (est.) |
|---|---|---|
| Essential AI Knowledge (this page) | ~38% | ~19 questions |
| AI Infrastructure | ~32% | ~16 questions |
| AI Operations & MLOps | ~30% | ~15 questions |
Total exam: 50 questions, 60 minutes, passing score ~70%.
Concept 1 — AI, ML, and Deep Learning: The Hierarchy
Artificial Intelligence (AI)
Broadest category. Any technique enabling machines to mimic human intelligence — includes rule-based systems, expert systems, and machine learning. Not all AI learns from data.
Machine Learning (ML)
Subset of AI. Algorithms that learn from data without explicit programming. Three main types: supervised, unsupervised, reinforcement learning. Model improves with experience.
Deep Learning (DL)
Subset of ML. Uses artificial neural networks with many layers (deep). Excels at unstructured data — images, text, speech. Requires large datasets and significant GPU compute.
Generative AI
Subset of deep learning. Models that generate new content — text, images, code, audio. Powered by foundation models (LLMs, diffusion models). Examples: GPT, Llama, DALL-E.
Foundation Models
Large pre-trained models (billions of parameters) trained on massive datasets. Fine-tuned for specific tasks. Examples: GPT family, Llama 3, DALL-E, Stable Diffusion, NVIDIA Nemotron.
Why the Hierarchy Matters
All DL is ML, and all ML is AI — but not all AI is ML, and not all ML is deep learning. On the exam, distinguish carefully when a question specifies which level of the hierarchy applies.
Concept 2 — Types of Machine Learning
Supervised Learning
Training on labeled data (input–output pairs). Model learns the mapping function. Examples: image classification, fraud detection, price prediction. Algorithms: linear regression, decision trees, neural networks.
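A minimal supervised-learning sketch (scikit-learn and the toy fraud data below are illustrative choices, not exam requirements):

```python
# Minimal supervised learning sketch: labeled (input, output) pairs,
# model learns the mapping. Data is toy/illustrative.
from sklearn.linear_model import LogisticRegression

# Features: [transaction_amount, hour_of_day]; labels: 1 = fraud, 0 = legit
X = [[20.0, 14], [9500.0, 3], [35.5, 10], [8700.0, 2], [12.0, 16], [9900.0, 4]]
y = [0, 1, 0, 1, 0, 1]  # the labels are what make this "supervised"

model = LogisticRegression()
model.fit(X, y)                      # learn the input -> label mapping
print(model.predict([[9000.0, 3]]))  # e.g. [1]: flagged as likely fraud
```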
Unsupervised Learning
Training on unlabeled data. Model finds hidden patterns or structure. Examples: customer segmentation, anomaly detection, recommendation engines. Algorithms: k-means, autoencoders, GANs.
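The same idea without labels, as a short sketch: k-means groups toy customer data into segments the model discovers on its own.

```python
# Minimal unsupervised learning sketch: no labels, the model finds structure.
import numpy as np
from sklearn.cluster import KMeans

# Customer features: [annual_spend, visits_per_month] -- no labels provided
X = np.array([[200, 1], [220, 2], [5000, 12], [4800, 10], [210, 1], [5100, 11]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # the model invents its own group assignments
print(labels)                   # e.g. [0 0 1 1 0 1]: low-spend vs high-spend segments
```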
Reinforcement Learning (RL)
Agent learns by interacting with an environment. Receives rewards or penalties. Optimizes long-term cumulative reward. Examples: game playing (AlphaGo), robotics, autonomous vehicles.
Self-Supervised / Semi-Supervised
Self-supervised: uses unlabeled data with structure (e.g., predict next token — how LLMs are pre-trained). Semi-supervised: small labeled + large unlabeled set. Foundation of modern AI stack.
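A toy illustration of the self-supervised next-token objective (real pipelines operate on integer token IDs from a tokenizer, not word strings):

```python
# Sketch of the self-supervised objective behind LLM pre-training:
# the "labels" are just the text shifted by one token -- no human labeling.
tokens = ["the", "gpu", "accelerates", "deep", "learning"]

inputs  = tokens[:-1]   # what the model sees
targets = tokens[1:]    # what it must predict (the next token)

for inp, tgt in zip(inputs, targets):
    print(f"given ...{inp!r} -> predict {tgt!r}")
```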
Transfer Learning
Take a pre-trained model and fine-tune on a smaller domain-specific dataset. Dramatically reduces data and compute needed. Foundation of the modern AI deployment stack.
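A transfer-learning sketch in PyTorch/torchvision (assumes torchvision >= 0.13 for the `weights=` API; the 5-class head is an invented example):

```python
# Transfer learning sketch: reuse pre-trained features, retrain only the head.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet pre-trained

for param in model.parameters():
    param.requires_grad = False          # freeze the pre-trained backbone

model.fc = nn.Linear(model.fc.in_features, 5)  # new head for a 5-class domain task
# Only model.fc's weights are updated during fine-tuning -- far less data
# and compute than training all ~11M parameters from scratch.
```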
Key Distinction (Exam Focus)
Supervised = you provide labels. Unsupervised = model finds structure. RL = model learns from reward signals. Know which to apply for a given scenario.
Concept 3 — Deep Learning: Neural Networks Fundamentals
Artificial Neuron
Mimics biological neuron. Takes weighted inputs, applies an activation function, produces output. Weight values are learned during training via backpropagation.
Layers
Input layer (data enters) → Hidden layers (learn representations) → Output layer (prediction/classification). Depth = number of hidden layers. More layers = deeper network.
Activation Functions
Introduce non-linearity. ReLU = max(0, x), most common. Sigmoid = output 0–1, used in binary classification. Softmax = multiclass probabilities. Without non-linear activations, stacked layers collapse into a single linear model.
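The three activation functions above as a minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)            # max(0, x): zeroes out negatives

def sigmoid(x):
    return 1 / (1 + np.exp(-x))        # squashes to (0, 1): binary classification

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract max for numerical stability
    return e / e.sum()                 # class probabilities that sum to 1

z = np.array([2.0, -1.0, 0.5])
print(relu(z), sigmoid(z), softmax(z), sep="\n")
```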
Backpropagation
Algorithm to compute gradients of loss w.r.t. weights. Flows the error signal from output back through layers using the chain rule of calculus. Core of deep learning training.
Gradient Descent
Optimization algorithm. Updates weights in the direction that reduces loss. Variants: SGD (stochastic), Mini-batch, Adam (most popular for deep learning — adaptive learning rates).
Key Hyperparameters
Epochs = full passes through dataset. Batch size = samples per gradient update. Learning rate = step size for weight updates. Tuning these is critical for training performance.
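A minimal PyTorch sketch tying backpropagation, gradient descent, and the three hyperparameters together (the random data and layer sizes are illustrative):

```python
import torch
import torch.nn as nn

X = torch.randn(640, 10)                  # 640 samples, 10 features (toy data)
y = torch.randn(640, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

epochs, batch_size, learning_rate = 5, 64, 1e-3     # the key hyperparameters
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):               # epoch = one full pass over the data
    for i in range(0, len(X), batch_size):
        xb, yb = X[i:i+batch_size], y[i:i+batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)     # forward pass
        loss.backward()                   # backpropagation: gradients via the chain rule
        optimizer.step()                  # gradient descent update (Adam variant)
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```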
Concept 4 — Why AI Has Accelerated: The Three Pillars
Data (Pillar 1)
Internet-scale data availability — text, images, video, sensor data. Labeling at scale. Digitization of industries. Key insight: more data → better models. Data is the fuel.
Compute (Pillar 2)
GPU-accelerated computing enabled practical deep learning. NVIDIA's CUDA platform unlocked parallel processing. Training compute for the largest models grew roughly 300,000× between 2012 and 2018 (OpenAI's "AI and Compute" analysis). Tensor Cores, NVLink, and HBM memory all contribute.
Algorithms (Pillar 3)
Transformer architecture (2017, "Attention Is All You Need"). Attention mechanisms, residual connections (ResNets), normalization techniques. These innovations made training large models tractable.
The Convergence Effect
All three improved simultaneously and reinforce each other: more compute enables larger models; larger models benefit from more data; better algorithms make compute more efficient.
Open Source & Ecosystem
PyTorch, TensorFlow, Hugging Face, and NVIDIA CUDA ecosystem democratized access. Pre-trained models via NGC and Hugging Face Hub reduced the barrier to entry dramatically.
Industry Adoption
Cloud providers (AWS, GCP, Azure) made GPU access easy via on-demand instances. Enterprises adopted AI for competitive advantage. Government investment accelerated research. Virtuous cycle of investment.
Concept 5 — Key AI Use Cases and Industries
Computer Vision
Image classification, object detection, facial recognition, medical imaging (cancer detection, X-ray analysis), autonomous vehicle perception, quality control in manufacturing.
Natural Language Processing (NLP)
Sentiment analysis, machine translation, chatbots, document summarization, code generation, search and information retrieval. Powered by transformer-based LLMs.
Generative AI Applications
Text generation (LLMs), image generation (Stable Diffusion, DALL-E), code generation (GitHub Copilot), synthetic data generation, drug discovery (molecular design).
Healthcare
Medical imaging analysis, drug discovery acceleration, genomics, clinical trial optimization, predictive patient monitoring, personalized medicine. High-impact, regulated domain.
Financial Services
Fraud detection, algorithmic trading, risk assessment, customer service automation, credit scoring, AML (anti-money laundering). Real-time inference is critical.
Autonomous Systems
Self-driving vehicles (NVIDIA DRIVE), robotics, drone navigation, industrial automation. Requires real-time inference at the edge. Combines CV, NLP, RL, and sensor fusion.
Concept 6 — Training vs. Inference: Key Differences
| Aspect | Training | Inference |
|---|---|---|
| Definition | Learning model weights from data | Running fixed model to generate predictions |
| Frequency | Once or periodically | Billions of times per day |
| Compute | Very high — forward + backward pass | Lower per sample; scales with request volume |
| Memory | Max HBM bandwidth required | Can quantize (FP8/INT8) to reduce footprint |
| Latency | Not latency-sensitive | Often real-time latency requirements |
| GPUs | H100, B200 (HBM, NVLink, high BF16/FP16 throughput) | T4, L4, A10G, Jetson (edge) |
| Batching | Large batches for efficiency | Batch size limited by latency constraints |
| Optimization | Hyperparameter tuning, regularization | Quantization, pruning, distillation, TensorRT |
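A toy sketch of the quantization entry in the table above: symmetric INT8 quantization of a weight vector in NumPy. Production tools such as TensorRT are far more sophisticated; this only shows the core round-and-rescale idea.

```python
import numpy as np

w = np.random.randn(4).astype(np.float32)        # FP32 weights (32 bits each)

scale = np.abs(w).max() / 127.0                  # symmetric quantization scale
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # 8 bits each
w_restored = w_int8.astype(np.float32) * scale   # dequantize at compute time

print(w)
print(w_restored)   # close to the original: small accuracy trade-off, 4x smaller
```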
Concept 7 — Transformer Architecture and LLMs
Transformer (2017)
Architecture that replaced RNNs for NLP. Based on self-attention mechanism. Enables parallel processing (vs sequential in RNNs). Paper: "Attention Is All You Need" — Vaswani et al.
Self-Attention
Each token attends to all other tokens in the sequence simultaneously. Captures long-range dependencies. Parallelizes well on GPUs (compute grows quadratically with sequence length). Core of transformer expressiveness and scalability.
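A minimal scaled dot-product self-attention sketch in NumPy (single head, no learned projections or masking):

```python
# Every token's output is a weighted mix of ALL tokens, computed in one
# matrix product -- no recurrence, hence the parallelism.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d = 4, 8                       # 4 tokens, 8-dim embeddings (toy sizes)
Q = np.random.randn(seq_len, d)         # queries
K = np.random.randn(seq_len, d)         # keys
V = np.random.randn(seq_len, d)         # values

scores = Q @ K.T / np.sqrt(d)           # every token scored against every token
weights = softmax(scores)               # attention weights; rows sum to 1
output = weights @ V                    # each row: attention-weighted mix of values
print(output.shape)                     # (4, 8): one contextual vector per token
```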
Encoder vs. Decoder
Encoder (BERT) — understanding/classification. Decoder (GPT) — text generation. Encoder-Decoder (T5, BART) — translation/summarization. Know which architecture fits which task.
LLM Scale
Measured in parameters (billions). GPT-3 = 175B. Modern models = hundreds of billions to trillions. Larger models generally better but require more compute and memory.
Prompt Engineering
Crafting inputs to guide LLM output. Zero-shot (no examples), few-shot (some examples), chain-of-thought (reasoning steps). Key skill for deploying LLMs effectively.
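An illustrative few-shot prompt assembled in Python; the sentiment task and example wording are invented:

```python
# Few-shot prompting: worked examples in the prompt steer the model
# without any weight updates.
examples = [
    ("The GPU shipment arrived early!", "positive"),
    ("Training crashed again after 40 hours.", "negative"),
]

query = "Inference latency dropped 3x after quantization."

prompt = "Classify the sentiment of each sentence.\n\n"
for text, label in examples:                      # few-shot: show the pattern
    prompt += f"Sentence: {text}\nSentiment: {label}\n\n"
prompt += f"Sentence: {query}\nSentiment:"        # model completes the pattern

print(prompt)
```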
Context Window
Maximum tokens the model can process at once. Limits document length. Modern models: 128K–1M+ tokens. Key constraint for enterprise RAG and document analysis use cases.
Concept 8 — Generative AI and Foundation Models
Foundation Model Workflow
Pre-train (internet-scale data, general purpose) → Fine-tune (domain-specific data, update weights) → Prompt / RAG (runtime customization). Dramatically cheaper than training from scratch.
Large Language Models (LLMs)
Text-based foundation models. Autoregressive generation (predict next token). Few-shot learners. Examples: Llama 3, Mistral, GPT-4, NVIDIA Nemotron. Run on NVIDIA H100/A100/L40S.
Diffusion Models
Generate images by learning to reverse a noising process. State of the art for image/video generation. Examples: Stable Diffusion, DALL-E 3, Sora. Require significant GPU compute for generation.
NVIDIA NIM (Inference Microservices)
Pre-packaged, optimized inference containers for foundation models. Drop-in API-compatible deployment. Runs on NVIDIA GPUs on-prem or cloud. Accelerates time-to-production for AI applications.
RAG (Retrieval-Augmented Generation)
Combine LLM with external knowledge retrieval (vector DB). At inference time, retrieve relevant chunks and inject into prompt. Reduces hallucinations, keeps knowledge current without retraining.
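A self-contained retrieval sketch of this flow: the `embed` function below is a bag-of-words placeholder standing in for a real embedding model, and the documents are invented.

```python
import numpy as np

def embed(text):
    # Placeholder bag-of-words embedding so the sketch is self-contained;
    # real systems call a neural embedding model.
    vocab = ["nvlink", "gpus", "hbm3", "memory", "inference", "interconnect", "links"]
    words = text.lower().replace("?", "").replace(".", "").split()
    v = np.array([float(w in words) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

docs = ["NVLink connects GPUs at high bandwidth.",
        "H100 uses HBM3 memory.",
        "L4 targets efficient inference."]
doc_vecs = np.stack([embed(d) for d in docs])     # offline: index the corpus

query = "Which interconnect links GPUs together?"
scores = doc_vecs @ embed(query)                  # cosine similarity (unit vectors)
best = docs[int(np.argmax(scores))]               # retrieve the top chunk

prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
print(prompt)                                     # LLM answers grounded in context
```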
Fine-tuning vs. Prompting vs. LoRA
Fine-tuning = update model weights on domain data (better performance, higher cost). Prompting = craft input (zero cost, less control). LoRA/PEFT = parameter-efficient fine-tuning (update small adapters, not full model).
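A sketch of the LoRA idea in PyTorch: freeze the pre-trained weight matrix and learn only a low-rank update B·A. The rank and layer sizes are illustrative, not taken from any particular paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # freeze pre-trained W (and bias)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no-op at start

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T   # W x + (B A) x

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 65,536 adapter weights vs ~16.8M frozen in the full matrix
```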
AI ⊃ ML ⊃ DL — Nested Circles
Picture three nested circles: AI is the biggest (any machine intelligence), ML sits inside it (learns from data), Deep Learning is inside that (many-layered neural networks). Every DL is ML, every ML is AI — but the reverse is never true.
3 Pillars of AI Growth — "DCA"
Data (internet scale) + Compute (GPUs) + Algorithms (Transformers) = the "DCA" explosion. All three converged after 2012. Each pillar reinforces the others — more compute enables larger models that need more data that reward better algorithms.
Training vs Inference — Learn Once, Run Billions
Training = learning (expensive, backward pass, done once). Inference = predicting (fast per query, happens billions of times). Train on H100s with HBM and NVLink; infer on T4s/L4s with quantization. Same model, different hardware story.
ML Types — SUR
Supervised (labeled data, predict output), Unsupervised (unlabeled, find patterns), Reinforcement (reward signals, learn by doing). For any scenario question: identify whether labels exist and whether there are reward signals.
Transformer = Attention = Parallel
"Attention Is All You Need" (2017). Every modern LLM = transformer decoder. Every token attends to every other token simultaneously — not sequentially like RNNs. That parallelism is why it scaled to billions of parameters.
Foundation Model Stack
Pre-train (massive data, general purpose) → Fine-tune (domain data, update weights) → Prompt/RAG (runtime customization, no weight update). NVIDIA NIM = optimized inference container that plugs into any step of this stack.
Flashcards
AI vs ML vs DL — one-line definition for each
AI = any technique that enables machines to mimic human intelligence.
ML = AI that learns from data.
DL = ML using deep neural networks.
DL ⊂ ML ⊂ AI. Every DL is ML; every ML is AI.
Three types of machine learning
Supervised — labeled data, learn the input→output mapping (classification, regression)
Unsupervised — unlabeled, find patterns (clustering, anomaly detection)
Reinforcement — reward/penalty signals, learn by acting (robotics, games)
What made transformers revolutionary?
Self-attention: every token attends to every other token simultaneously.
Enables parallel processing (vs sequential RNNs). Scales to billions of parameters.
Foundation of all LLMs. Paper: "Attention Is All You Need" (2017).
Training vs Inference — hardware implication
Training: maximum compute and memory bandwidth (HBM, NVLink) → H100, B200
Inference: latency/throughput + quantization (INT8/FP8) → T4, L4, A10G, or edge GPUs (Jetson)
What is a Foundation Model?
A large model (billions of parameters) pre-trained on massive, general-purpose datasets.
Adapted via fine-tuning (update weights on domain data) or prompting (craft input, no weight update).
Examples: Llama 3, GPT-4, NVIDIA Nemotron.
What is RAG?
Retrieval-Augmented Generation: retrieve relevant chunks from a vector DB at inference time and inject them into the prompt.
Reduces hallucinations. Keeps knowledge current without retraining. No weight updates needed.
Backpropagation in one sentence
Computes the gradient of the loss with respect to every weight by flowing the error signal backward through the layers via the chain rule.
Quantization — what and why?
Converting weights/activations to lower precision (e.g., FP32 → INT8 or FP8).
Reduces memory footprint, increases throughput, enables larger batch sizes.
Small accuracy trade-off. Key for inference optimization on T4/L4/edge GPUs.
Beginners
- Watch 3Blue1Brown's "Neural Networks" series on YouTube — best visual introduction to how neural networks learn
- Understand the AI/ML/DL hierarchy with concrete examples before moving to technical detail
- Focus on why GPUs matter for training: parallel matrix multiplications vs sequential CPU processing
- Learn the three ML types (SUR) with real-world examples: spam filter (supervised), customer grouping (unsupervised), game playing (reinforcement)
- Explore NVIDIA's free "Getting Started with AI" resources on NVIDIA Academy
Official & Core Resources
- NCA-AIIO Official Certification Page — NVIDIA's official exam page: blueprint, objectives, registration, and study resources
- NVIDIA NIM (Inference Microservices) — official product page for NIM: pre-optimized inference containers for foundation models
- NVIDIA NGC Model Catalog — browse pre-trained models, containers, and SDKs optimized for NVIDIA GPUs
- NVIDIA AI Infrastructure & Operations Fundamentals (NVIDIA Academy) — official self-paced course that maps directly to the NCA-AIIO exam objectives
Foundational Papers & Reading
- "Attention Is All You Need" — Vaswani et al., 2017. The original transformer paper; understanding the abstract and architecture diagram is sufficient for the exam. Available on arXiv.
- 3Blue1Brown — Neural Networks series (YouTube). The best visual explanation of neural networks, backpropagation, and gradient descent. Free, beginner-friendly, exam-relevant.
- fast.ai — Practical Deep Learning for Coders. Free course covering DL fundamentals from a practical perspective; excellent for reinforcing conceptual understanding.