
AI & ML Foundations

NCA-AIIO · Essential AI Knowledge · 38% of Exam

AI vs ML vs DL · Deep Learning · Training vs Inference · Transformers · LLMs · Use Cases

Topics: AI · Machine Learning · Deep Learning · Neural Networks · Training · Inference · Transformer · LLM · Computer Vision · NLP · Generative AI · Foundation Models
AI & ML Foundations is the core conceptual domain of the NCA-AIIO exam — covering 38% of all 50 questions. Mastering the AI/ML/DL hierarchy, understanding why AI has exploded in capability, and knowing the critical distinction between training and inference will underpin everything else in the exam.

What You'll Master

AI, ML & Deep Learning Hierarchy

Definitions, relationships, and differences between AI (broadest), ML (learns from data), and Deep Learning (many-layered neural networks). All DL is ML; all ML is AI — but not vice versa.

Why AI Has Accelerated

The three pillars: Data (internet-scale datasets), Compute (GPU acceleration, NVIDIA CUDA), and Algorithms (transformer architecture, 2017). All three converged simultaneously.

Types of Machine Learning

Supervised (labeled data), Unsupervised (finds patterns), Reinforcement (reward signals). Plus transfer learning and self-supervised pretraining (how LLMs are built).

Deep Learning Fundamentals

Artificial neurons, layers (input/hidden/output), activation functions (ReLU, Sigmoid, Softmax), backpropagation, gradient descent, and key hyperparameters (epochs, batch size, learning rate).

Key AI Use Cases

Computer vision (medical imaging, autonomous vehicles), NLP (translation, chatbots), generative AI (text/image/code generation), healthcare, financial services, and autonomous systems.

Training vs. Inference

Training = learning weights (expensive, backward pass). Inference = generating predictions (fast, fixed weights). Different hardware requirements: H100/B200 for training; T4/L4/A10G for inference.

Exam Weight

Domain | Coverage | Exam Questions (est.)
Essential AI Knowledge (this page) | 38% | ~19 questions
AI Infrastructure | ~32% | ~16 questions
AI Operations & MLOps | ~30% | ~15 questions
Total exam: 50 questions, 60 minutes, passing ~70%

Concept 1 — AI, ML, and Deep Learning: The Hierarchy

Artificial Intelligence (AI)

Broadest category. Any technique enabling machines to mimic human intelligence — includes rule-based systems, expert systems, and machine learning. Not all AI learns from data.

Machine Learning (ML)

Subset of AI. Algorithms that learn from data without explicit programming. Three main types: supervised, unsupervised, reinforcement learning. Model improves with experience.

Deep Learning (DL)

Subset of ML. Uses artificial neural networks with many layers (deep). Excels at unstructured data — images, text, speech. Requires large datasets and significant GPU compute.

Generative AI

Subset of deep learning. Models that generate new content — text, images, code, audio. Powered by foundation models (LLMs, diffusion models). Examples: GPT, Llama, DALL-E.

Foundation Models

Large pre-trained models (billions of parameters) trained on massive datasets. Fine-tuned for specific tasks. Examples: GPT family, Llama 3, DALL-E, Stable Diffusion, NVIDIA Nemotron.

Why the Hierarchy Matters

All DL is ML, and all ML is AI — but not all AI is ML, and not all ML is deep learning. On the exam, distinguish carefully when a question specifies which level of the hierarchy applies.

Concept 2 — Types of Machine Learning

Supervised Learning

Training on labeled data (input–output pairs). Model learns the mapping function. Examples: image classification, fraud detection, price prediction. Algorithms: linear regression, decision trees, neural networks.

Unsupervised Learning

Training on unlabeled data. Model finds hidden patterns or structure. Examples: customer segmentation, anomaly detection, recommendation engines. Algorithms: k-means, autoencoders, GANs.

Reinforcement Learning (RL)

Agent learns by interacting with an environment. Receives rewards or penalties. Optimizes long-term cumulative reward. Examples: game playing (AlphaGo), robotics, autonomous vehicles.

Self-Supervised / Semi-Supervised

Self-supervised: uses unlabeled data with structure (e.g., predict next token — how LLMs are pre-trained). Semi-supervised: small labeled + large unlabeled set. Foundation of modern AI stack.

Transfer Learning

Take a pre-trained model and fine-tune on a smaller domain-specific dataset. Dramatically reduces data and compute needed. Foundation of the modern AI deployment stack.

Key Distinction (Exam Focus)

Supervised = you provide labels. Unsupervised = model finds structure. RL = model learns from reward signals. Know which to apply for a given scenario.

Concept 3 — Deep Learning: Neural Networks Fundamentals

Artificial Neuron

Mimics biological neuron. Takes weighted inputs, applies an activation function, produces output. Weight values are learned during training via backpropagation.

Layers

Input layer (data enters) → Hidden layers (learn representations) → Output layer (prediction/classification). Depth = number of hidden layers. More layers = deeper network.

Activation Functions

Introduce non-linearity. ReLU = max(0, x), the most common. Sigmoid squashes output to 0–1, used in binary classification. Softmax produces multiclass probabilities. Without non-linear activations, stacked layers collapse to a single linear model.
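A minimal NumPy sketch of the three activation functions named above (the example logits are arbitrary):

```python
import numpy as np

def relu(x):
    # ReLU: zero out negatives, pass positives through unchanged.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squash any real number into (0, 1) -- binary classification.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Softmax: turn a vector of scores into probabilities that sum to 1.
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, -1.0])
probs = softmax(logits)  # probabilities over 3 classes, summing to 1
```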

Backpropagation

Algorithm to compute gradients of loss w.r.t. weights. Flows the error signal from output back through layers using the chain rule of calculus. Core of deep learning training.

Gradient Descent

Optimization algorithm. Updates weights in the direction that reduces loss. Variants: SGD (stochastic), Mini-batch, Adam (most popular for deep learning — adaptive learning rates).

Key Hyperparameters

Epochs = full passes through dataset. Batch size = samples per gradient update. Learning rate = step size for weight updates. Tuning these is critical for training performance.
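The pieces above (forward pass, gradients via the chain rule, and the three hyperparameters) fit together in a training loop. Here is a minimal NumPy sketch fitting y = 3x + 1 with mini-batch gradient descent; the data and hyperparameter values are toy choices for illustration:

```python
import numpy as np

# Fit y = 3x + 1 with a single weight and bias.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 3.0 * X + 1.0

w, b = 0.0, 0.0
learning_rate = 0.1   # step size for each weight update
batch_size = 32       # samples per gradient update
epochs = 50           # full passes through the dataset

for _ in range(epochs):
    perm = rng.permutation(len(X))            # shuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        err = (w * xb + b) - yb               # forward pass + error
        grad_w = 2.0 * np.mean(err * xb)      # dLoss/dw for MSE (the "backward pass")
        grad_b = 2.0 * np.mean(err)           # dLoss/db
        w -= learning_rate * grad_w           # gradient descent: step opposite the gradient
        b -= learning_rate * grad_b
# After training, w is close to 3 and b close to 1.
```

Raising the learning rate too far makes the loop diverge; shrinking the batch size makes each update noisier but more frequent — the same trade-offs apply to deep networks.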

Concept 4 — Why AI Has Accelerated: The Three Pillars

Data (Pillar 1)

Internet-scale data availability — text, images, video, sensor data. Labeling at scale. Digitization of industries. Key insight: more data → better models. Data is the fuel.

Compute (Pillar 2)

GPU-accelerated computing made deep learning practical. NVIDIA's CUDA platform unlocked parallel processing. Training compute for the largest models grew roughly 300,000× between 2012 and 2018. Tensor Cores, NVLink, and HBM memory all contribute.

Algorithms (Pillar 3)

Transformer architecture (2017, "Attention Is All You Need"). Attention mechanisms, residual connections (ResNets), normalization techniques. These innovations made training large models tractable.

The Convergence Effect

All three improved simultaneously and reinforce each other: more compute enables larger models; larger models benefit from more data; better algorithms make compute more efficient.

Open Source & Ecosystem

PyTorch, TensorFlow, Hugging Face, and NVIDIA CUDA ecosystem democratized access. Pre-trained models via NGC and Hugging Face Hub reduced the barrier to entry dramatically.

Industry Adoption

Cloud providers (AWS, GCP, Azure) made GPU access easy via on-demand instances. Enterprises adopted AI for competitive advantage. Government investment accelerated research. Virtuous cycle of investment.

Concept 5 — Key AI Use Cases and Industries

Computer Vision

Image classification, object detection, facial recognition, medical imaging (cancer detection, X-ray analysis), autonomous vehicle perception, quality control in manufacturing.

Natural Language Processing (NLP)

Sentiment analysis, machine translation, chatbots, document summarization, code generation, search and information retrieval. Powered by transformer-based LLMs.

Generative AI Applications

Text generation (LLMs), image generation (Stable Diffusion, DALL-E), code generation (GitHub Copilot), synthetic data generation, drug discovery (molecular design).

Healthcare

Medical imaging analysis, drug discovery acceleration, genomics, clinical trial optimization, predictive patient monitoring, personalized medicine. High-impact, regulated domain.

Financial Services

Fraud detection, algorithmic trading, risk assessment, customer service automation, credit scoring, AML (anti-money laundering). Real-time inference is critical.

Autonomous Systems

Self-driving vehicles (NVIDIA DRIVE), robotics, drone navigation, industrial automation. Requires real-time inference at the edge. Combines CV, NLP, RL, and sensor fusion.

Concept 6 — Training vs. Inference: Key Differences

Aspect | Training | Inference
Definition | Learning model weights from data | Running a fixed model to generate predictions
Frequency | Once or periodically | Billions of times per day
Compute | Very high — forward + backward pass | Lower per sample; scales with request volume
Memory | Max HBM bandwidth required | Can quantize (FP8/INT8) to reduce footprint
Latency | Not latency-sensitive | Often real-time latency requirements
GPUs | H100, B200 (HBM, NVLink, high BF16) | T4, L4, A10G, Jetson (edge)
Batching | Large batches for efficiency | Batch size limited by latency constraints
Optimization | Hyperparameter tuning, regularization | Quantization, pruning, distillation, TensorRT
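As a sketch of the quantization row above, here is symmetric INT8 quantization of one weight tensor in NumPy. This is a deliberately simplified scheme (production tools such as TensorRT add calibration and finer-grained scales), and the tensor shape and values are illustrative:

```python
import numpy as np

# Store int8 values plus a single FP32 scale instead of full FP32 weights.
weights = np.random.default_rng(0).normal(0.0, 0.02, size=(4, 4)).astype(np.float32)

scale = np.abs(weights).max() / 127.0              # map the largest |weight| to 127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale             # approximate FP32 reconstruction

max_err = np.abs(weights - dequant).max()          # the small accuracy trade-off
memory_saving = weights.nbytes / q.nbytes          # 4x smaller than FP32
```

The 4× memory reduction (and corresponding bandwidth saving) is why quantization appears under inference, not training: training needs the precision headroom of the backward pass, while a fixed model often tolerates the rounding error.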

Concept 7 — Transformer Architecture and LLMs

Transformer (2017)

Architecture that replaced RNNs for NLP. Based on self-attention mechanism. Enables parallel processing (vs sequential in RNNs). Paper: "Attention Is All You Need" — Vaswani et al.

Self-Attention

Each token attends to all other tokens in the sequence simultaneously. Captures long-range dependencies. Parallelizes well on GPUs. Core of transformer expressiveness and scalability.
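The mechanism fits in a few lines. Here is a minimal NumPy sketch of single-head scaled dot-product attention, with identity projections and made-up sizes, showing that every token scores against every other token in one matrix multiply:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Every token's query scores against every token's key -- all in parallel."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (seq, seq) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V, weights              # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
out, attn = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
```

Each row of `attn` is a probability distribution over the whole sequence, which is how long-range dependencies are captured without any recurrence.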

Encoder vs. Decoder

Encoder (BERT) — understanding/classification. Decoder (GPT) — text generation. Encoder-Decoder (T5, BART) — translation/summarization. Know which architecture fits which task.

LLM Scale

Measured in parameters (billions). GPT-3 = 175B. Modern models = hundreds of billions to trillions. Larger models generally better but require more compute and memory.
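Parameter counts translate directly into memory. At 2 bytes per parameter (FP16/BF16), a 175B-parameter model needs roughly 350 GB just to hold its weights, far more than any single GPU. A back-of-envelope calculation:

```python
# Memory just to hold model weights (ignores activations, optimizer state,
# and KV cache, which add substantially more).
def weight_memory_gb(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1e9

params = 175e9                          # a GPT-3-class model
fp16 = weight_memory_gb(params, 2)      # 350 GB in FP16/BF16
int8 = weight_memory_gb(params, 1)      # 175 GB after INT8 quantization
```

This arithmetic is why large models require multi-GPU serving (NVLink, tensor parallelism) or aggressive quantization.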

Prompt Engineering

Crafting inputs to guide LLM output. Zero-shot (no examples), few-shot (some examples), chain-of-thought (reasoning steps). Key skill for deploying LLMs effectively.

Context Window

Maximum tokens the model can process at once. Limits document length. Modern models: 128K–1M+ tokens. Key constraint for enterprise RAG and document analysis use cases.

Concept 8 — Generative AI and Foundation Models

Foundation Model Workflow

Pre-train (internet-scale data, general purpose) → Fine-tune (domain-specific data, update weights) → Prompt / RAG (runtime customization). Dramatically cheaper than training from scratch.

Large Language Models (LLMs)

Text-based foundation models. Autoregressive generation (predict next token). Few-shot learners. Examples: Llama 3, Mistral, GPT-4, NVIDIA Nemotron. Run on NVIDIA H100/A100/L40S.

Diffusion Models

Generate images by learning to reverse a noising process. State of the art for image/video generation. Examples: Stable Diffusion, DALL-E 3, Sora. Require significant GPU compute for generation.

NVIDIA NIM (Inference Microservices)

Pre-packaged, optimized inference containers for foundation models. Drop-in API-compatible deployment. Runs on NVIDIA GPUs on-prem or cloud. Accelerates time-to-production for AI applications.

RAG (Retrieval-Augmented Generation)

Combine LLM with external knowledge retrieval (vector DB). At inference time, retrieve relevant chunks and inject into prompt. Reduces hallucinations, keeps knowledge current without retraining.
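The retrieve-then-inject flow can be sketched end to end. In this toy version a bag-of-words "embedding" stands in for a real embedding model, and the documents, query, and prompt template are all made up for illustration (not any specific framework's API):

```python
import numpy as np

docs = [
    "The X200 router supports WPA3 and has four ethernet ports.",
    "Refunds are processed within five business days of a return.",
    "Our headquarters are located in Santa Clara, California.",
]

vocab = sorted({w for d in docs for w in d.lower().split()})

def embed(text):
    # Count vocabulary words in the text, then normalize to unit length
    # so the dot product below behaves like cosine similarity.
    words = text.lower().split()
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

doc_vecs = np.array([embed(d) for d in docs])   # the "vector DB"

query = "how long do refunds take"
scores = doc_vecs @ embed(query)        # similarity of the query to each chunk
best = int(np.argmax(scores))           # retrieve the most relevant chunk

# Inject the retrieved chunk into the prompt sent to the LLM.
prompt = f"Answer using only this context:\n{docs[best]}\n\nQuestion: {query}"
```

The model's weights never change: updating the knowledge base updates the answers, which is the whole point of RAG versus retraining.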

Fine-tuning vs. Prompting vs. LoRA

Fine-tuning = update model weights on domain data (better performance, higher cost). Prompting = craft input (zero cost, less control). LoRA/PEFT = parameter-efficient fine-tuning (update small adapters, not full model).
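A minimal NumPy sketch of the LoRA idea: freeze the pre-trained weight matrix and learn only two small low-rank matrices whose product is added to it. Shapes, scaling factor, and initialization follow the common convention, but all values here are illustrative:

```python
import numpy as np

d_out, d_in, r = 512, 512, 8                # r is the low "rank" of the update
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in)) * 0.02   # frozen pre-trained weights
A = rng.normal(size=(r, d_in)) * 0.02       # trainable adapter (Gaussian init)
B = np.zeros((d_out, r))                    # trainable adapter (zero init, so the
                                            # update starts at exactly zero)

def forward(x, alpha=16):
    # Effective weight is W + (alpha / r) * B @ A; W itself is never updated.
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size           # 262,144 weights updated by full fine-tuning
lora_params = A.size + B.size  # 8,192 adapter weights, about 3% of the matrix
```

Only A and B receive gradients during fine-tuning, which is why LoRA cuts trainable parameters (and optimizer memory) by orders of magnitude while leaving the base model shareable across tasks.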

Six visual memory anchors for the highest-yield concepts on the NCA-AIIO AI & ML Foundations domain. Each hook gives you a mental shortcut you can recall under exam pressure.
🎯

AI ⊃ ML ⊃ DL — Nested Circles

Picture three nested circles: AI is the biggest (any machine intelligence), ML sits inside it (learns from data), Deep Learning is inside that (many-layered neural networks). Every DL is ML, every ML is AI — but the reverse doesn't always hold.

📐

3 Pillars of AI Growth — "DCA"

Data (internet scale) + Compute (GPUs) + Algorithms (Transformers) = the "DCA" explosion. All three converged after 2012. Each pillar reinforces the others — more compute enables larger models that need more data that reward better algorithms.

🏋️

Training vs Inference — Learn Once, Run Billions

Training = learning (expensive, backward pass, done once). Inference = predicting (fast per query, happens billions of times). Train on H100s with HBM and NVLink; infer on T4s/L4s with quantization. Same model, different hardware story.

🔑

ML Types — SUR

Supervised (labeled data, predict output), Unsupervised (unlabeled, find patterns), Reinforcement (reward signals, learn by doing). For any scenario question: identify whether labels exist and whether there are reward signals.

Transformer = Attention = Parallel

"Attention Is All You Need" (2017). Every modern LLM = transformer decoder. Every token attends to every other token simultaneously — not sequentially like RNNs. That parallelism is why it scaled to billions of parameters.

🏗️

Foundation Model Stack

Pre-train (massive data, general purpose) → Fine-tune (domain data, update weights) → Prompt/RAG (runtime customization, no weight update). NVIDIA NIM = optimized inference container that plugs into any step of this stack.

10 exam-style questions covering AI & ML Foundations.
Question 1 of 10
Which of the following correctly describes the relationship between AI, ML, and Deep Learning?
Question 2 of 10
A model is trained on customer purchase histories (without labels) to discover natural groupings of customers with similar buying patterns. Which type of machine learning is this?
Question 3 of 10
Which three factors are most responsible for the rapid acceleration of AI capabilities in recent years?
Question 4 of 10
An organization trains a large language model from scratch on proprietary documents. Later, when a user submits a query, the model generates a response. Which phase uses the most total computational resources?
Question 5 of 10
What is the primary advantage of the transformer architecture over previous recurrent neural networks (RNNs) for NLP tasks?
Question 6 of 10
A company wants to deploy a large language model for customer service but needs to incorporate its proprietary product knowledge base without retraining the model. Which approach is MOST appropriate?
Question 7 of 10
A self-driving car system continuously receives sensor data and makes steering/braking decisions, receiving reward signals when it drives safely and penalties when it makes errors. Which ML paradigm does this represent?
Question 8 of 10
What is the key difference between training a model and running inference?
Question 9 of 10
Which statement BEST describes a foundation model?
Question 10 of 10
An activation function is used in neural networks primarily to:


Flashcard review — quick recall points for the domain's core concepts.

AI Hierarchy

AI vs ML vs DL — one-line definition for each

AI = machines mimicking intelligence.

ML = AI that learns from data.

DL = ML using deep neural networks.

DL ⊂ ML ⊂ AI. Every DL is ML; every ML is AI.
ML Types

Three types of machine learning

Supervised — labeled data, predict output (classification, regression)

Unsupervised — unlabeled, find patterns (clustering, anomaly detection)

Reinforcement — reward/penalty signals, learn by acting (robotics, games)
Transformers

What made transformers revolutionary?

Self-attention — every token attends to every other token simultaneously.

Enables parallel processing (vs sequential RNN). Scales to billions of parameters.

Foundation of all LLMs. Paper: "Attention Is All You Need" (2017).
Hardware

Training vs Inference — hardware implication

Training: max memory + FP16/BF16 throughput, backward pass → H100 / B200

Inference: latency/throughput + quantization (INT8/FP8) → T4, L4, A10G, or edge GPUs (Jetson)
Foundation Models

What is a Foundation Model?

A large model pre-trained on internet-scale data. General-purpose.

Adapted via fine-tuning (update weights on domain data) or prompting (craft input, no weight update).

Examples: Llama 3, GPT-4, NVIDIA Nemotron.
RAG

What is RAG?

Retrieval-Augmented Generation: at inference time, retrieve relevant chunks from a knowledge base (vector DB) and inject into the LLM prompt.

Reduces hallucinations. Keeps knowledge current without retraining. No weight updates needed.
Backprop

Backpropagation in one sentence

Compute the gradient of the loss with respect to each weight by applying the chain rule backward through the network, then update weights via gradient descent.
Optimization

Quantization — what and why?

Represent model weights/activations in lower precision: FP32 → FP16 → INT8 → FP8.

Reduces memory footprint, increases throughput, enables larger batch sizes.

Small accuracy trade-off. Key for inference optimization on T4/L4/edge GPUs.
Study recommendations for this domain, by experience level and exam timing.

Beginners

  • Watch 3Blue1Brown's "Neural Networks" series on YouTube — best visual introduction to how neural networks learn
  • Understand the AI/ML/DL hierarchy with concrete examples before moving to technical detail
  • Focus on why GPUs matter for training: parallel matrix multiplications vs sequential CPU processing
  • Learn the three ML types (SUR) with real-world examples: spam filter (supervised), customer grouping (unsupervised), game playing (reinforcement)
  • Explore NVIDIA's free "Getting Started with AI" resources on NVIDIA Academy

Disclaimer

Not affiliated with NVIDIA. NVIDIA® is a registered trademark of NVIDIA Corporation. This page is an independent study resource. Official certification information: nvidia.com/en-us/learn/certification/ai-infrastructure-operations-associate/
