NVIDIA NCP-AAI Exam Prep · Topic 1 of 5

Agent Architecture & Cognition

Master the design patterns behind production LLM agents — ReAct loops, reasoning and planning strategies, memory systems, tool calling, and multi-agent coordination for the NCP-AAI exam.

Start Free Practice →
Four Pillars of Agent Architecture & Cognition
Agent Architecture & Cognition covers 25% of the NCP-AAI exam (Architecture 15% + Cognition/Planning 10%). These four pillars cover every concept in those two combined domains.
Pillar 1 · Agent Foundations

Types, Loops & Tool Calling

AI agents perceive inputs, reason about them, and take actions toward a goal — unlike static LLMs that respond only once. The ReAct loop (Thought→Action→Observation) is the dominant production pattern. Tool calling lets LLMs invoke typed functions via structured JSON output.

3 ReAct loop steps · 5 agent type classes · JSON tool call format
Pillar 2 · Reasoning & Planning

CoT, ToT, MCTS & Task Decomposition

Chain-of-Thought generates a single linear reasoning path. Tree-of-Thought explores multiple branches and backtracks. MCTS adds probabilistic rollouts for deeper planning. Task decomposition breaks complex goals into subtask graphs that agents can execute in parallel.

CoT: simplest method · ToT: best for exploration · DAG: task plan structure
Pillar 3 · Memory Systems

In-Context, Episodic, Semantic & Procedural

Agents need more than a prompt window. Episodic memory stores past interaction histories. Semantic memory retrieves long-term facts from vector databases. Procedural memory encodes skills as callable tools. In-context memory is the fast but limited working memory of the LLM itself.

4 memory types · VDB: semantic store · 128K+ max in-context tokens
Pillar 4 · Multi-Agent Coordination

Orchestrator, Peer, Debate & Reflection

Complex tasks benefit from multiple specialized agents. An orchestrator decomposes tasks and assigns them to worker agents. Peer networks communicate directly. Debate patterns pit agents against each other to surface better answers. Reflection agents critique and revise outputs.

4 coordination patterns · 1 orchestrator per team · N parallel workers

The ReAct Loop — Foundation of Production LLM Agents

Thought: reason about what to do next
→ Action: invoke a tool or API call
→ Observation: process the tool result
→ Thought: reason again or conclude
→ Answer: final response to the user

Loop repeats until the agent reaches a stopping condition — a final answer or a maximum step limit. ReAct = Reason + Act.
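The loop is simple enough to sketch in a few lines. Below is a minimal, framework-agnostic sketch in Python; call_llm, the TOOLS registry, and the message format are hypothetical placeholders standing in for a real model endpoint and real tools.

    import json

    def call_llm(messages):
        # Placeholder for a real model call: a real implementation returns either
        # a tool call ({"type": "tool_call", "name": ..., "arguments": {...}})
        # or a final answer.
        return {"type": "final_answer", "content": "stub answer"}

    TOOLS = {
        "search_web": lambda query: f"stub results for {query!r}",
    }

    def react_loop(task, max_steps=5):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):                              # stopping condition: step limit
            step = call_llm(messages)                           # Thought (+ proposed Action)
            if step["type"] == "final_answer":                  # stopping condition: answer reached
                return step["content"]
            result = TOOLS[step["name"]](**step["arguments"])   # Action: host executes the tool
            messages.append({"role": "tool", "content": json.dumps(result)})  # Observation
        return "stopped: max steps exceeded"

    print(react_loop("Why was I charged $149 last month?"))

Note the two stopping conditions named above: a final answer, or the max-step limit.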

Why 25% of the exam? Agent Architecture & Cognition is the bedrock of the NCP-AAI exam. Every other domain — Development, Deployment, Monitoring — builds on top of understanding how agents reason, remember, plan, and coordinate. Mastering these four pillars unlocks the rest of the certification.
How Agent Architecture Works
From agent types through reasoning strategies to multi-agent coordination patterns.

Agent Type Taxonomy

Five Agent Classes — From Simple Reflexes to Continuous Learning

Reflex
Simple Reflex Agent
Condition-action rules only. No memory, no planning. Maps percept directly to action.
e.g. Rule-based chatbot
Model-Based
Model-Based Reflex Agent
Maintains internal state of the world. Handles partially observable environments.
e.g. Dialogue state tracker
Goal-Based
Goal-Based Agent
Searches for action sequences that achieve a specified goal. Plans ahead.
e.g. Task planning agent
Utility-Based
Utility-Based Agent
Maximizes a utility function when multiple goals conflict. Trades off objectives.
e.g. RLHF-trained LLM
Learning
Learning Agent
Improves its own performance from experience. Has a learning element and a critic.
e.g. RLHF + RAG agent
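To make the first two classes concrete, here is a toy Python sketch (the rules and state fields are hypothetical): the simple reflex agent maps a percept straight to an action, while the model-based agent also consults internal state.

    # Toy illustration of the first two agent classes; rules are hypothetical.

    # Simple reflex agent: condition-action rules only, no memory.
    REFLEX_RULES = {"greeting": "say_hello", "billing_question": "route_to_billing"}

    def simple_reflex_agent(percept):
        return REFLEX_RULES.get(percept, "fallback")

    # Model-based reflex agent: keeps internal state about the world/dialogue.
    class ModelBasedAgent:
        def __init__(self):
            self.state = {"user_authenticated": False}   # internal world model

        def act(self, percept):
            if percept == "billing_question" and not self.state["user_authenticated"]:
                return "ask_for_login"                   # decision depends on state, not just the percept
            return REFLEX_RULES.get(percept, "fallback")

    print(simple_reflex_agent("greeting"))               # say_hello
    print(ModelBasedAgent().act("billing_question"))     # ask_for_login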

Reasoning & Planning Strategies

Four Reasoning Approaches — When to Use Each

Chain-of-Thought
Linear Reasoning Path

Triggered by the prompt "Let's think step by step." Generates a single sequential reasoning chain before the final answer. Fast and reliable for well-defined problems.

✔ Math, structured Q&A
Tree-of-Thought
Branch & Backtrack

Explores multiple reasoning branches simultaneously. Evaluates intermediate steps. Backtracks from dead ends using BFS or DFS. Much stronger on open-ended planning tasks.

✔ Creative, multi-step planning
ReAct Loop
Grounded in Reality

Interleaves reasoning and acting. Each Observation from a real tool grounds the next Thought. Prevents hallucination by anchoring reasoning in live tool results.

✔ Any tool-using agent
MCTS
Monte Carlo Tree Search

Simulates many rollouts from each decision node, estimates value, and selects the highest-value branch. Optimal for game-like scenarios with many future steps.

✔ Long-horizon planning
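A rough Python sketch of the ToT idea, simplified here to a beam search: expand several candidate reasoning paths, score the intermediate steps, and prune weak branches. The proposal and scoring functions are stubs standing in for LLM calls.

    # Tree-of-Thought sketch (beam-search simplification): expand several partial
    # solutions per step, score them, keep the best few, and continue.
    # `propose_next_steps` and `score` are stubs standing in for LLM calls.

    def propose_next_steps(partial_solution):
        return [partial_solution + [f"step-{len(partial_solution) + 1}{tag}"] for tag in "abc"]

    def score(partial_solution):
        return -len([s for s in partial_solution if s.endswith("c")])  # toy heuristic

    def tree_of_thought(depth=3, beam_width=2):
        frontier = [[]]                                   # start from an empty reasoning path
        for _ in range(depth):
            candidates = [nxt for path in frontier for nxt in propose_next_steps(path)]
            candidates.sort(key=score, reverse=True)      # evaluate intermediate steps
            frontier = candidates[:beam_width]            # prune weak branches
        return frontier[0]

    print(tree_of_thought())

By contrast, CoT would commit to a single path with no scoring or pruning step.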

The Four Memory Types

How Agents Store and Retrieve Knowledge

In-Context Memory
Working Memory
The active prompt window. Fastest to access. Limited by the model's context length (8K–128K+ tokens). Lost between sessions unless explicitly saved.
Current conversation, recent tool results, system prompt
Episodic Memory
Past Experiences
Records of prior agent interactions and task outcomes. Retrieved by similarity to the current context. Enables agents to learn from past mistakes or successes.
Previous conversation summaries, past task logs
Semantic Memory
Long-Term Facts
Factual world knowledge stored in vector databases. Retrieved via embedding similarity (dense) or keyword search (sparse). Enables grounded, hallucination-resistant responses.
Product catalog, company policy, knowledge base
Procedural Memory
Skills & Tools
How-to knowledge encoded as callable tools, function definitions, or system prompt instructions. Tells the agent what it can do, not just what it knows.
Tool schemas, API specs, workflow templates
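A compact sketch of how the four stores can sit side by side in one agent. The class, field names, and the keyword-overlap retrieval below are illustrative placeholders, not a real vector database or framework API.

    # Four memory types in one agent. The "semantic" retrieval below is a
    # keyword-overlap placeholder standing in for embedding search in a vector DB.

    class AgentMemory:
        def __init__(self, tools):
            self.in_context = []          # working memory: the active prompt window
            self.episodic = []            # past interaction summaries
            self.semantic = []            # long-term facts (would live in a vector DB)
            self.procedural = tools       # tool schemas: what the agent can do

        def remember_fact(self, fact):
            self.semantic.append(fact)

        def recall(self, query, k=2):
            # Placeholder retrieval: rank facts by word overlap with the query.
            words = set(query.lower().split())
            ranked = sorted(self.semantic, key=lambda f: -len(words & set(f.lower().split())))
            return ranked[:k]

    memory = AgentMemory(tools={"run_tests": "runs the unit test suite"})
    memory.remember_fact("All services must log in JSON format")
    memory.remember_fact("The auth module had a circular import bug")
    print(memory.recall("logging format for services"))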

Tool Calling (Function Calling)

Tool calling allows an LLM to request execution of a predefined function by outputting structured JSON: {"name": "search_web", "arguments": {"query": "NVIDIA NCP-AAI exam date"}}. The host system executes the function and returns the result as an Observation. This enables agents to interact with APIs, databases, code interpreters, and external services in a controlled, type-safe way — without hallucinating API responses.

Tool Calling Flow

1
User sends a query
Agent receives the task and reasons about which tool(s) to use
2
LLM outputs structured tool call JSON
Specifies function name and typed arguments — not free text
3
Host system executes the function
Real API call, DB query, code run — result is injected back as Observation
4
Agent reasons over the Observation and responds
May trigger another tool call or produce the final answer
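Steps 2 and 3 are easy to sketch, assuming the model has already emitted the JSON tool call shown earlier; search_web here is a hypothetical stub rather than a real API.

    # Steps 2-3 of the flow: parse the model's structured tool call, execute
    # the real function, and package the result as an Observation.
    import json

    def search_web(query):
        # Hypothetical stub; a real tool would call a search API.
        return [{"title": "NCP-AAI exam overview", "url": "https://example.com"}]

    TOOL_REGISTRY = {"search_web": search_web}

    llm_output = '{"name": "search_web", "arguments": {"query": "NVIDIA NCP-AAI exam date"}}'

    call = json.loads(llm_output)                       # step 2: structured JSON, not free text
    tool = TOOL_REGISTRY[call["name"]]                  # look up the typed function
    result = tool(**call["arguments"])                  # step 3: host executes the real call
    observation = {"role": "tool", "name": call["name"], "content": json.dumps(result)}
    print(observation)                                  # step 4: fed back to the model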

Multi-Agent Coordination Patterns

Four Patterns for Coordinating Multiple Agents

Orchestrator-Worker
Hub & Spoke Control

One orchestrator agent decomposes the task and dispatches subtasks to specialized worker agents. Workers report results back; orchestrator aggregates.

Used in: LangGraph supervisor, AutoGen GroupChat
Peer-to-Peer
Decentralized Collaboration

Agents communicate directly with each other without a central coordinator. Each agent can request help from any peer. More resilient but harder to control.

Used in: CrewAI peer delegation, AutoGen two-agent chat
Debate / Critique
Adversarial Improvement

Two or more agents argue opposing positions, or one agent critiques another's output. The process surfaces errors and improves final answer quality.

Used in: Constitutional AI, debate prompting
Reflection
Self-Critique Loop

An agent reviews and scores its own output against a rubric, then generates an improved version. Can iterate multiple rounds until quality meets a threshold.

Used in: Reflexion framework, self-refine
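A minimal sketch of the orchestrator-worker pattern using plain Python threads. The worker functions are stubs; in a framework such as LangGraph or AutoGen each worker would be an LLM-backed agent, but the decompose / fan-out / aggregate shape is the same.

    # Orchestrator-worker sketch: decompose a task, run workers in parallel,
    # aggregate results. Worker bodies are stubs standing in for LLM agents.
    from concurrent.futures import ThreadPoolExecutor

    def search_worker(task):     return f"[search] headlines for {task}"
    def financials_worker(task): return f"[financials] key ratios for {task}"
    def sentiment_worker(task):  return f"[sentiment] social buzz for {task}"

    WORKERS = [search_worker, financials_worker, sentiment_worker]

    def orchestrator(task):
        # Decompose: here, simply fan the same task out to each specialist.
        with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
            results = list(pool.map(lambda w: w(task), WORKERS))
        # Aggregate worker outputs into one brief.
        return "\n".join(results)

    print(orchestrator("NVDA Q2 analysis"))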
Concept Comparison Table
All 22 key NCP-AAI Agent Architecture concepts, grouped by pillar.
Concept | Pillar | Key Detail | Exam Tip
AI Agent | Foundations | A system that perceives its environment, reasons about it, and takes actions to achieve a goal — repeatedly, autonomously | Distinguish from a single LLM call: agents loop, use tools, and have memory
ReAct Loop | Foundations | Thought → Action → Observation cycle that repeats until a final answer. ReAct = Reason + Act | Most common production agent pattern; grounds reasoning in live tool results
Tool Calling | Foundations | LLM outputs structured JSON specifying function name + typed arguments; host executes and returns result | Also called function calling; prevents hallucinating API responses
Simple Reflex Agent | Foundations | Condition-action rules only; no internal state or memory; fastest but brittle | Fails in partially observable environments
Goal-Based Agent | Foundations | Searches for action sequences that achieve a specified goal; plans ahead; most LLM agents are goal-based | Uses planning (CoT, ToT, MCTS) to find paths to the goal
Stopping Condition | Foundations | Rule that terminates the agent loop — final answer reached, max steps exceeded, or confidence threshold met | Always implement; infinite loops are a common production failure mode
Chain-of-Thought (CoT) | Reasoning | Generates a single linear reasoning chain before the final answer; triggered by "Let's think step by step" | Simple and reliable; underperforms on tasks requiring exploration or backtracking
Tree-of-Thought (ToT) | Reasoning | Explores multiple reasoning branches; evaluates intermediate steps; backtracks from dead ends with BFS/DFS | Significantly outperforms CoT on multi-step planning and creative tasks
MCTS | Reasoning | Monte Carlo Tree Search — simulates rollouts from each node, estimates value, selects highest-value branch | Best for long-horizon planning with many possible futures; computationally expensive
Task Decomposition | Reasoning | Breaking a complex goal into a DAG (directed acyclic graph) of simpler subtasks that can be parallelized | Enables parallel agent execution; LangGraph uses a DAG to express agent workflows
Self-Consistency | Reasoning | Generate N independent CoT reasoning paths; take the majority-vote answer across all paths | Improves reliability at the cost of N × inference time; reduces variance in answers
Reflexion | Reasoning | Agent critiques its own output in natural language, stores critique as episodic memory, retries with that context | Does not update model weights; improvement is in-context only
In-Context Memory | Memory | The active prompt window; fastest to access; limited by context length; lost between sessions | Working memory of the LLM; fills up with long agentic conversations
Episodic Memory | Memory | Records of past interactions and task outcomes; retrieved by similarity to current context | Enables learning from past errors; often stored in a key-value store with embeddings
Semantic Memory | Memory | Long-term factual knowledge in a vector database; retrieved via embedding similarity or BM25 | This is the RAG knowledge base; grounded recall vs hallucinated knowledge
Procedural Memory | Memory | Skills and tools the agent can invoke; encoded as function schemas, system prompts, or retrieval-augmented tools | Tells agent what it can do; separate from what it knows (semantic memory)
Memory Consolidation | Memory | Summarizing and compressing older in-context memory into external storage before the context window fills | Prevents context overflow; common in long-running agentic tasks
Orchestrator Agent | Multi-Agent | High-level agent that decomposes tasks and delegates subtasks to worker agents; aggregates results | LangGraph "supervisor" node; AutoGen GroupChatManager
Worker Agent | Multi-Agent | Specialized subagent that executes a specific task type (search, code, critique, summarize) | Each worker has a restricted tool set and system prompt focused on its specialty
Debate Pattern | Multi-Agent | Two+ agents argue opposing positions; process surfaces errors; final answer emerges from structured disagreement | Improves factual accuracy; used in Constitutional AI and adversarial prompting
Reflection Pattern | Multi-Agent | Agent (or separate critic agent) reviews output against a rubric, scores it, and produces an improved revision | Can be self-reflection (same agent) or cross-reflection (separate critic agent)
Shared State / Blackboard | Multi-Agent | Common data structure all agents can read/write to share intermediate results and context | LangGraph State object; prevents agents from duplicating work or contradicting each other
Real-World Examples
How agent architecture patterns apply in production agentic AI deployments.
Pillar 1 · Agent Foundations

Customer Support ReAct Agent with Tool Calling

A production support agent for a SaaS company that handles billing inquiries, ticket lookups, and policy questions using a ReAct loop over three tools.

  • User asks: "Why was I charged $149 last month when my plan is $99?"
  • Thought: "I need to look up this user's account and recent invoices."
  • Action: get_account(user_id="u_8821") → Observation: Plan upgraded to Team on Mar 15
  • Thought: "Upgrade explains the charge. Let me retrieve the invoice to confirm."
  • Action: get_invoice(invoice_id="inv_334") → Observation: $149 charge on Mar 16, prorated Team plan
  • Final Answer: Explains upgrade date, prorated charge, and offers to show plan comparison
Result: Zero hallucination — every claim is grounded in a real API response. Agent completes the full task in 2 ReAct iterations with no human handoff.
Pillar 2 · Reasoning & Planning

Research Agent Using Tree-of-Thought for Report Generation

A competitive intelligence agent that generates a structured market analysis report by exploring multiple research angles in parallel before committing to a structure.

  • Goal: "Write a 5-section market analysis for NVIDIA's AI chip competitive landscape"
  • ToT expands 3 possible report structures: by geography, by use case, or by competitor
  • Agent evaluates each branch: by-competitor structure scores highest on specificity
  • Sub-branches generated for each competitor: AMD, Intel, Google TPU, custom silicon
  • Weak sub-branches pruned; agent deep-dives on the 3 highest-scoring paths
  • Final report assembled from the best evaluated branches — not a single linear draft
Result: ToT produces a more comprehensive, better-structured report than CoT — at the cost of 3× inference calls. Appropriate for high-value, non-latency-critical tasks.
Pillar 3 · Memory Systems

Long-Running Coding Agent with All Four Memory Types

A developer assistant agent that works on a multi-week software project, demonstrating all four memory types in a single system.

  • In-context: Current conversation, open files, recent test output (within 128K window)
  • Episodic: Summaries of yesterday's debugging session — "auth module had a circular import bug" — retrieved at session start
  • Semantic: Company coding standards, API docs, and architecture decisions in a vector DB — retrieved when writing new code
  • Procedural: Tool schemas for run_tests(), lint_code(), git_commit(), open_file() — agent knows what it can do
  • Memory consolidation: At end of each session, agent writes a structured summary to episodic store before context resets
Result: Agent maintains coherent project context across sessions without requiring the full codebase in the prompt — each memory type serves a distinct function.
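The consolidation step from the last bullet can be sketched roughly as follows; the word-count budget and the truncation "summary" are placeholders for a real token budget and an LLM-written summary.

    # Memory consolidation sketch: when the working context grows too large,
    # compress the oldest messages into an episodic summary and drop them.

    def consolidate(in_context, episodic, budget_words=50):
        total = sum(len(m.split()) for m in in_context)
        while total > budget_words and len(in_context) > 1:
            oldest = in_context.pop(0)                            # drop the oldest message
            episodic.append("summary: " + " ".join(oldest.split()[:8]) + " ...")
            total = sum(len(m.split()) for m in in_context)
        return in_context, episodic

    context = [f"message {i}: " + "debugging the auth module " * 5 for i in range(6)]
    context, episodes = consolidate(context, episodic=[])
    print(len(context), "messages kept,", len(episodes), "episodic summaries written")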
Pillar 4 · Multi-Agent Coordination

Multi-Agent Research Pipeline (Orchestrator + 4 Workers)

A financial analysis system where an orchestrator decomposes a complex query and dispatches to specialized worker agents in parallel.

  • User query: "Give me a full investment analysis of NVIDIA (NVDA) for Q2 2026"
  • Orchestrator decomposes into 4 parallel subtasks and assigns each to a worker
  • Worker 1 (Search Agent): Retrieves latest news, earnings calls, analyst ratings
  • Worker 2 (Financials Agent): Pulls P/E, revenue growth, margins from financial DB
  • Worker 3 (Sentiment Agent): Analyzes social media and forum sentiment via NLP
  • Worker 4 (Critic Agent): Reviews other workers' outputs for inconsistencies or hallucinations
  • Orchestrator aggregates all results into a structured investment brief
Result: 4× faster than a single agent doing all steps sequentially. Critic agent catches two factual errors before the final report is delivered.
Practice Quiz
10 questions across all four pillars. Answer each question and read the explanation before moving on.
Agent Design Advisor
Answer a few questions to get a tailored agent architecture recommendation.
Memory Hooks — Flip Cards
Each card pairs a question with its answer. Perfect for exam-day review.
Agent Foundations

What does ReAct stand for?

And what are the three loop steps?

ReAct = Reason + Act

The three steps: Thought (reason about what to do) → Action (invoke a tool) → Observation (process the tool result). The loop repeats until the agent produces a final answer or hits a stopping condition.

Agent Foundations

Tool Calling vs Free-Text Output

Why does tool calling prevent hallucination?

Tool calling forces the LLM to output structured JSON (function name + typed arguments) rather than inventing a response. The host system executes the real function and returns the actual result as an Observation — grounding every subsequent answer in real data, not LLM imagination.

Reasoning & Planning

CoT vs ToT — Key Difference

When should you use each?

CoT = single linear path, fast, good for well-defined problems. Trigger: "Let's think step by step."
ToT = multiple branches, evaluates intermediate steps, backtracks from dead ends. Use when tasks require exploration, creativity, or multi-step planning where the path isn't obvious upfront.

Reasoning & Planning

Self-Consistency

How does it improve reliability?

Generate N independent CoT reasoning paths, then take the majority-vote answer across all N outputs. Reduces variance and improves accuracy at the cost of N × inference time. Works because diverse paths to the same answer increase confidence that the answer is correct.

Memory Systems

The Four Memory Types

In-context, episodic, semantic, procedural

In-context: Active prompt window — fast, limited
Episodic: Past interaction records — experience
Semantic: Facts in a vector DB — knowledge
Procedural: Tool schemas and skills — capabilities

Think: What I'm doing · What I've done · What I know · What I can do

Memory Systems

Memory Consolidation

Why is it needed in long-running agents?

As an agent converses, the in-context window fills up. Memory consolidation summarizes older context into external (episodic or semantic) storage before the window overflows, then injects only the relevant summary. Prevents context overflow without losing the agent's history entirely.

Multi-Agent

Orchestrator vs Worker Agent

What does each one do?

Orchestrator: Decomposes the high-level goal into subtasks, assigns them to workers, aggregates results. One per team. (LangGraph supervisor, AutoGen GroupChatManager)
Worker: Executes a specific subtask with a restricted tool set. Specialized. Runs in parallel with other workers.

Multi-Agent

Debate vs Reflection Pattern

How do they differ in improving output quality?

Debate: Two+ agents argue opposing positions — adversarial, surfaces factual errors through disagreement. Used in Constitutional AI.
Reflection: One agent (or a critic agent) reviews its own output against a rubric and iterates. Cooperative self-improvement. Used in the Reflexion framework.

NCP-AAI Exam Series · 5 Topics

Ready to Master the Full NCP-AAI Exam?

Continue with Agent Development & Frameworks, Knowledge Integration, NVIDIA Platform for Agents, and Evaluation & Safety — all five topics with quizzes and decision tools.