NVIDIA NCP-AAI Exam Prep · Topic 1 of 5

Agent Architecture & Cognition

Master the design patterns behind production LLM agents — ReAct loops, reasoning and planning strategies, memory systems, tool calling, and multi-agent coordination for the NCP-AAI exam.

Start Free Practice →
Four Pillars of Agent Architecture & Cognition
Agent Architecture & Cognition covers 25% of the NCP-AAI exam (Architecture 15% + Cognition/Planning 10%). These four pillars cover every concept in those two combined domains.
Pillar 1 · Agent Foundations

Types, Loops & Tool Calling

AI agents perceive inputs, reason about them, and take actions toward a goal — unlike static LLMs that respond only once. The ReAct loop (Thought→Action→Observation) is the dominant production pattern. Tool calling lets LLMs invoke typed functions via structured JSON output.

3 ReAct loop steps · 5 agent type classes · JSON tool call format
Pillar 2 · Reasoning & Planning

CoT, ToT, MCTS & Task Decomposition

Chain-of-Thought generates a single linear reasoning path. Tree-of-Thought explores multiple branches and backtracks. MCTS adds probabilistic rollouts for deeper planning. Task decomposition breaks complex goals into subtask graphs that agents can execute in parallel.

CoT: simplest method · ToT: best for exploration · DAG: task plan structure
Pillar 3 · Memory Systems

In-Context, Episodic, Semantic & Procedural

Agents need more than a prompt window. Episodic memory stores past interaction histories. Semantic memory retrieves long-term facts from vector databases. Procedural memory encodes skills as callable tools. In-context memory is the fast but limited working memory of the LLM itself.

4 memory types · VDB: semantic store · 128K+ max in-context tokens
Pillar 4 · Multi-Agent Coordination

Orchestrator, Peer, Debate & Reflection

Complex tasks benefit from multiple specialized agents. An orchestrator decomposes tasks and assigns them to worker agents. Peer networks communicate directly. Debate patterns pit agents against each other to surface better answers. Reflection agents critique and revise outputs.

4 coordination patterns · 1 orchestrator per team · N parallel workers

The ReAct Loop — Foundation of Production LLM Agents

Thought: reason about what to do next
→ Action: invoke a tool or API call
→ Observation: process the tool result
→ Thought: reason again or conclude
→ Answer: final response to the user

Loop repeats until the agent reaches a stopping condition — a final answer or a maximum step limit. ReAct = Reason + Act.
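The loop is simple enough to sketch in a few lines. Below is a minimal, framework-agnostic sketch in Python; call_llm, the TOOLS registry, and the message format are hypothetical placeholders standing in for a real model endpoint and real tools.

    import json

    def call_llm(messages):
        # Placeholder for a real model call: a real implementation returns either
        # a tool call ({"type": "tool_call", "name": ..., "arguments": {...}})
        # or a final answer.
        return {"type": "final_answer", "content": "stub answer"}

    TOOLS = {
        "search_web": lambda query: f"stub results for {query!r}",
    }

    def react_loop(task, max_steps=5):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):                              # stopping condition: step limit
            step = call_llm(messages)                           # Thought (+ proposed Action)
            if step["type"] == "final_answer":                  # stopping condition: answer reached
                return step["content"]
            result = TOOLS[step["name"]](**step["arguments"])   # Action: host executes the tool
            messages.append({"role": "tool", "content": json.dumps(result)})  # Observation
        return "stopped: max steps exceeded"

    print(react_loop("Why was I charged $149 last month?"))

Note the two stopping conditions named above: a final answer, or the max-step limit.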

Why 25% of the exam? Agent Architecture & Cognition is the bedrock of the NCP-AAI exam. Every other domain — Development, Deployment, Monitoring — builds on top of understanding how agents reason, remember, plan, and coordinate. Mastering these four pillars unlocks the rest of the certification.
How Agent Architecture Works
From agent types through reasoning strategies to multi-agent coordination patterns.

Agent Type Taxonomy

Five Agent Classes — From Simple Reflexes to Continuous Learning

Reflex
Simple Reflex Agent
Condition-action rules only. No memory, no planning. Maps percept directly to action.
e.g. Rule-based chatbot
Model-Based
Model-Based Reflex Agent
Maintains internal state of the world. Handles partially observable environments.
e.g. Dialogue state tracker
Goal-Based
Goal-Based Agent
Searches for action sequences that achieve a specified goal. Plans ahead.
e.g. Task planning agent
Utility-Based
Utility-Based Agent
Maximizes a utility function when multiple goals conflict. Trades off objectives.
e.g. RLHF-trained LLM
Learning
Learning Agent
Improves its own performance from experience. Has a learning element and a critic.
e.g. RLHF + RAG agent
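To make the first two classes concrete, here is a toy Python sketch (the rules and state fields are hypothetical): the simple reflex agent maps a percept straight to an action, while the model-based agent also consults internal state.

    # Toy illustration of the first two agent classes; rules are hypothetical.

    # Simple reflex agent: condition-action rules only, no memory.
    REFLEX_RULES = {"greeting": "say_hello", "billing_question": "route_to_billing"}

    def simple_reflex_agent(percept):
        return REFLEX_RULES.get(percept, "fallback")

    # Model-based reflex agent: keeps internal state about the world/dialogue.
    class ModelBasedAgent:
        def __init__(self):
            self.state = {"user_authenticated": False}   # internal world model

        def act(self, percept):
            if percept == "billing_question" and not self.state["user_authenticated"]:
                return "ask_for_login"                   # decision depends on state, not just the percept
            return REFLEX_RULES.get(percept, "fallback")

    print(simple_reflex_agent("greeting"))               # say_hello
    print(ModelBasedAgent().act("billing_question"))     # ask_for_login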

Reasoning & Planning Strategies

Four Reasoning Approaches — When to Use Each

Chain-of-Thought
Linear Reasoning Path

Triggered by the prompt "Let's think step by step." Generates a single sequential reasoning chain before the final answer. Fast and reliable for well-defined problems.

✔ Math, structured Q&A
Tree-of-Thought
Branch & Backtrack

Explores multiple reasoning branches simultaneously. Evaluates intermediate steps. Backtracks from dead ends using BFS or DFS. Much stronger on open-ended planning tasks.

✔ Creative, multi-step planning
ReAct Loop
Grounded in Reality

Interleaves reasoning and acting. Each Observation from a real tool grounds the next Thought. Prevents hallucination by anchoring reasoning in live tool results.

✔ Any tool-using agent
MCTS
Monte Carlo Tree Search

Simulates many rollouts from each decision node, estimates value, and selects the highest-value branch. Optimal for game-like scenarios with many future steps.

✔ Long-horizon planning
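A rough Python sketch of the ToT idea, simplified here to a beam search: expand several candidate reasoning paths, score the intermediate steps, and prune weak branches. The proposal and scoring functions are stubs standing in for LLM calls.

    # Tree-of-Thought sketch (beam-search simplification): expand several partial
    # solutions per step, score them, keep the best few, and continue.
    # `propose_next_steps` and `score` are stubs standing in for LLM calls.

    def propose_next_steps(partial_solution):
        return [partial_solution + [f"step-{len(partial_solution) + 1}{tag}"] for tag in "abc"]

    def score(partial_solution):
        return -len([s for s in partial_solution if s.endswith("c")])  # toy heuristic

    def tree_of_thought(depth=3, beam_width=2):
        frontier = [[]]                                   # start from an empty reasoning path
        for _ in range(depth):
            candidates = [nxt for path in frontier for nxt in propose_next_steps(path)]
            candidates.sort(key=score, reverse=True)      # evaluate intermediate steps
            frontier = candidates[:beam_width]            # prune weak branches
        return frontier[0]

    print(tree_of_thought())

By contrast, CoT would commit to a single path with no scoring or pruning step.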

The Four Memory Types

How Agents Store and Retrieve Knowledge

In-Context Memory
Working Memory
The active prompt window. Fastest to access. Limited by the model's context length (8K–128K+ tokens). Lost between sessions unless explicitly saved.
Current conversation, recent tool results, system prompt
Episodic Memory
Past Experiences
Records of prior agent interactions and task outcomes. Retrieved by similarity to the current context. Enables agents to learn from past mistakes or successes.
Previous conversation summaries, past task logs
Semantic Memory
Long-Term Facts
Factual world knowledge stored in vector databases. Retrieved via embedding similarity (dense) or keyword search (sparse). Enables grounded, hallucination-resistant responses.
Product catalog, company policy, knowledge base
Procedural Memory
Skills & Tools
How-to knowledge encoded as callable tools, function definitions, or system prompt instructions. Tells the agent what it can do, not just what it knows.
Tool schemas, API specs, workflow templates
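A compact sketch of how the four stores can sit side by side in one agent. The class, field names, and the keyword-overlap retrieval below are illustrative placeholders, not a real vector database or framework API.

    # Four memory types in one agent. The "semantic" retrieval below is a
    # keyword-overlap placeholder standing in for embedding search in a vector DB.

    class AgentMemory:
        def __init__(self, tools):
            self.in_context = []          # working memory: the active prompt window
            self.episodic = []            # past interaction summaries
            self.semantic = []            # long-term facts (would live in a vector DB)
            self.procedural = tools       # tool schemas: what the agent can do

        def remember_fact(self, fact):
            self.semantic.append(fact)

        def recall(self, query, k=2):
            # Placeholder retrieval: rank facts by word overlap with the query.
            words = set(query.lower().split())
            ranked = sorted(self.semantic, key=lambda f: -len(words & set(f.lower().split())))
            return ranked[:k]

    memory = AgentMemory(tools={"run_tests": "runs the unit test suite"})
    memory.remember_fact("All services must log in JSON format")
    memory.remember_fact("The auth module had a circular import bug")
    print(memory.recall("logging format for services"))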

Tool Calling (Function Calling)

Tool calling allows an LLM to request execution of a predefined function by outputting structured JSON: {"name": "search_web", "arguments": {"query": "NVIDIA NCP-AAI exam date"}}. The host system executes the function and returns the result as an Observation. This enables agents to interact with APIs, databases, code interpreters, and external services in a controlled, type-safe way — without hallucinating API responses.

Tool Calling Flow

1
User sends a query
Agent receives the task and reasons about which tool(s) to use
2
LLM outputs structured tool call JSON
Specifies function name and typed arguments — not free text
3
Host system executes the function
Real API call, DB query, code run — result is injected back as Observation
4
Agent reasons over the Observation and responds
May trigger another tool call or produce the final answer
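Steps 2 and 3 are easy to sketch, assuming the model has already emitted the JSON tool call shown earlier; search_web here is a hypothetical stub rather than a real API.

    # Steps 2-3 of the flow: parse the model's structured tool call, execute
    # the real function, and package the result as an Observation.
    import json

    def search_web(query):
        # Hypothetical stub; a real tool would call a search API.
        return [{"title": "NCP-AAI exam overview", "url": "https://example.com"}]

    TOOL_REGISTRY = {"search_web": search_web}

    llm_output = '{"name": "search_web", "arguments": {"query": "NVIDIA NCP-AAI exam date"}}'

    call = json.loads(llm_output)                       # step 2: structured JSON, not free text
    tool = TOOL_REGISTRY[call["name"]]                  # look up the typed function
    result = tool(**call["arguments"])                  # step 3: host executes the real call
    observation = {"role": "tool", "name": call["name"], "content": json.dumps(result)}
    print(observation)                                  # step 4: fed back to the model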

Multi-Agent Coordination Patterns

Four Patterns for Coordinating Multiple Agents

Orchestrator-Worker
Hub & Spoke Control

One orchestrator agent decomposes the task and dispatches subtasks to specialized worker agents. Workers report results back; orchestrator aggregates.

Used in: LangGraph supervisor, AutoGen GroupChat
Peer-to-Peer
Decentralized Collaboration

Agents communicate directly with each other without a central coordinator. Each agent can request help from any peer. More resilient but harder to control.

Used in: CrewAI peer delegation, AutoGen two-agent chat
Debate / Critique
Adversarial Improvement

Two or more agents argue opposing positions, or one agent critiques another's output. The process surfaces errors and improves final answer quality.

Used in: Constitutional AI, debate prompting
Reflection
Self-Critique Loop

An agent reviews and scores its own output against a rubric, then generates an improved version. Can iterate multiple rounds until quality meets a threshold.

Used in: Reflexion framework, self-refine
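A minimal sketch of the orchestrator-worker pattern using plain Python threads. The worker functions are stubs; in a framework such as LangGraph or AutoGen each worker would be an LLM-backed agent, but the decompose / fan-out / aggregate shape is the same.

    # Orchestrator-worker sketch: decompose a task, run workers in parallel,
    # aggregate results. Worker bodies are stubs standing in for LLM agents.
    from concurrent.futures import ThreadPoolExecutor

    def search_worker(task):     return f"[search] headlines for {task}"
    def financials_worker(task): return f"[financials] key ratios for {task}"
    def sentiment_worker(task):  return f"[sentiment] social buzz for {task}"

    WORKERS = [search_worker, financials_worker, sentiment_worker]

    def orchestrator(task):
        # Decompose: here, simply fan the same task out to each specialist.
        with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
            results = list(pool.map(lambda w: w(task), WORKERS))
        # Aggregate worker outputs into one brief.
        return "\n".join(results)

    print(orchestrator("NVDA Q2 analysis"))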
Concept Comparison Table
All 22 key NCP-AAI Agent Architecture concepts, grouped by pillar.
Concept | Pillar | Key Detail | Exam Tip
AI Agent | Foundations | A system that perceives its environment, reasons about it, and takes actions to achieve a goal — repeatedly, autonomously | Distinguish from a single LLM call: agents loop, use tools, and have memory
ReAct Loop | Foundations | Thought → Action → Observation cycle that repeats until a final answer. ReAct = Reason + Act | Most common production agent pattern; grounds reasoning in live tool results
Tool Calling | Foundations | LLM outputs structured JSON specifying function name + typed arguments; host executes and returns result | Also called function calling; prevents hallucinating API responses
Simple Reflex Agent | Foundations | Condition-action rules only; no internal state or memory; fastest but brittle | Fails in partially observable environments
Goal-Based Agent | Foundations | Searches for action sequences that achieve a specified goal; plans ahead; most LLM agents are goal-based | Uses planning (CoT, ToT, MCTS) to find paths to the goal
Stopping Condition | Foundations | Rule that terminates the agent loop — final answer reached, max steps exceeded, or confidence threshold met | Always implement; infinite loops are a common production failure mode
Chain-of-Thought (CoT) | Reasoning | Generates a single linear reasoning chain before the final answer; triggered by "Let's think step by step" | Simple and reliable; underperforms on tasks requiring exploration or backtracking
Tree-of-Thought (ToT) | Reasoning | Explores multiple reasoning branches; evaluates intermediate steps; backtracks from dead ends with BFS/DFS | Significantly outperforms CoT on multi-step planning and creative tasks
MCTS | Reasoning | Monte Carlo Tree Search — simulates rollouts from each node, estimates value, selects highest-value branch | Best for long-horizon planning with many possible futures; computationally expensive
Task Decomposition | Reasoning | Breaking a complex goal into a DAG (directed acyclic graph) of simpler subtasks that can be parallelized | Enables parallel agent execution; LangGraph uses a DAG to express agent workflows
Self-Consistency | Reasoning | Generate N independent CoT reasoning paths; take the majority-vote answer across all paths | Improves reliability at the cost of N × inference time; reduces variance in answers
Reflexion | Reasoning | Agent critiques its own output in natural language, stores critique as episodic memory, retries with that context | Does not update model weights; improvement is in-context only
In-Context Memory | Memory | The active prompt window; fastest to access; limited by context length; lost between sessions | Working memory of the LLM; fills up with long agentic conversations
Episodic Memory | Memory | Records of past interactions and task outcomes; retrieved by similarity to current context | Enables learning from past errors; often stored in a key-value store with embeddings
Semantic Memory | Memory | Long-term factual knowledge in a vector database; retrieved via embedding similarity or BM25 | This is the RAG knowledge base; grounded recall vs hallucinated knowledge
Procedural Memory | Memory | Skills and tools the agent can invoke; encoded as function schemas, system prompts, or retrieval-augmented tools | Tells agent what it can do; separate from what it knows (semantic memory)
Memory Consolidation | Memory | Summarizing and compressing older in-context memory into external storage before the context window fills | Prevents context overflow; common in long-running agentic tasks
Orchestrator Agent | Multi-Agent | High-level agent that decomposes tasks and delegates subtasks to worker agents; aggregates results | LangGraph "supervisor" node; AutoGen GroupChatManager
Worker Agent | Multi-Agent | Specialized subagent that executes a specific task type (search, code, critique, summarize) | Each worker has a restricted tool set and system prompt focused on its specialty
Debate Pattern | Multi-Agent | Two+ agents argue opposing positions; process surfaces errors; final answer emerges from structured disagreement | Improves factual accuracy; used in Constitutional AI and adversarial prompting
Reflection Pattern | Multi-Agent | Agent (or separate critic agent) reviews output against a rubric, scores it, and produces an improved revision | Can be self-reflection (same agent) or cross-reflection (separate critic agent)
Shared State / Blackboard | Multi-Agent | Common data structure all agents can read/write to share intermediate results and context | LangGraph State object; prevents agents from duplicating work or contradicting each other
Real-World Examples
How agent architecture patterns apply in production agentic AI deployments.
Pillar 1 · Agent Foundations

Customer Support ReAct Agent with Tool Calling

A production support agent for a SaaS company that handles billing inquiries, ticket lookups, and policy questions using a ReAct loop over three tools.

  • User asks: "Why was I charged $149 last month when my plan is $99?"
  • Thought: "I need to look up this user's account and recent invoices."
  • Action: get_account(user_id="u_8821") → Observation: Plan upgraded to Team on Mar 15
  • Thought: "Upgrade explains the charge. Let me retrieve the invoice to confirm."
  • Action: get_invoice(invoice_id="inv_334") → Observation: $149 charge on Mar 16, prorated Team plan
  • Final Answer: Explains upgrade date, prorated charge, and offers to show plan comparison
Result: Zero hallucination — every claim is grounded in a real API response. Agent completes the full task in 2 ReAct iterations with no human handoff.
Pillar 2 · Reasoning & Planning

Research Agent Using Tree-of-Thought for Report Generation

A competitive intelligence agent that generates a structured market analysis report by exploring multiple research angles in parallel before committing to a structure.

  • Goal: "Write a 5-section market analysis for NVIDIA's AI chip competitive landscape"
  • ToT expands 3 possible report structures: by geography, by use case, or by competitor
  • Agent evaluates each branch: by-competitor structure scores highest on specificity
  • Sub-branches generated for each competitor: AMD, Intel, Google TPU, custom silicon
  • Weak sub-branches pruned; agent deep-dives on the 3 highest-scoring paths
  • Final report assembled from the best evaluated branches — not a single linear draft
Result: ToT produces a more comprehensive, better-structured report than CoT — at the cost of 3× inference calls. Appropriate for high-value, non-latency-critical tasks.
Pillar 3 · Memory Systems

Long-Running Coding Agent with All Four Memory Types

A developer assistant agent that works on a multi-week software project, demonstrating all four memory types in a single system.

  • In-context: Current conversation, open files, recent test output (within 128K window)
  • Episodic: Summaries of yesterday's debugging session — "auth module had a circular import bug" — retrieved at session start
  • Semantic: Company coding standards, API docs, and architecture decisions in a vector DB — retrieved when writing new code
  • Procedural: Tool schemas for run_tests(), lint_code(), git_commit(), open_file() — agent knows what it can do
  • Memory consolidation: At end of each session, agent writes a structured summary to episodic store before context resets
Result: Agent maintains coherent project context across sessions without requiring the full codebase in the prompt — each memory type serves a distinct function.
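The consolidation step from the last bullet can be sketched roughly as follows; the word-count budget and the truncation "summary" are placeholders for a real token budget and an LLM-written summary.

    # Memory consolidation sketch: when the working context grows too large,
    # compress the oldest messages into an episodic summary and drop them.

    def consolidate(in_context, episodic, budget_words=50):
        total = sum(len(m.split()) for m in in_context)
        while total > budget_words and len(in_context) > 1:
            oldest = in_context.pop(0)                            # drop the oldest message
            episodic.append("summary: " + " ".join(oldest.split()[:8]) + " ...")
            total = sum(len(m.split()) for m in in_context)
        return in_context, episodic

    context = [f"message {i}: " + "debugging the auth module " * 5 for i in range(6)]
    context, episodes = consolidate(context, episodic=[])
    print(len(context), "messages kept,", len(episodes), "episodic summaries written")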
Pillar 4 · Multi-Agent Coordination

Multi-Agent Research Pipeline (Orchestrator + 4 Workers)

A financial analysis system where an orchestrator decomposes a complex query and dispatches to specialized worker agents in parallel.

  • User query: "Give me a full investment analysis of NVIDIA (NVDA) for Q2 2026"
  • Orchestrator decomposes into 4 parallel subtasks and assigns each to a worker
  • Worker 1 (Search Agent): Retrieves latest news, earnings calls, analyst ratings
  • Worker 2 (Financials Agent): Pulls P/E, revenue growth, margins from financial DB
  • Worker 3 (Sentiment Agent): Analyzes social media and forum sentiment via NLP
  • Worker 4 (Critic Agent): Reviews other workers' outputs for inconsistencies or hallucinations
  • Orchestrator aggregates all results into a structured investment brief
Result: 4× faster than a single agent doing all steps sequentially. Critic agent catches two factual errors before the final report is delivered.
Practice Quiz
10 questions across all four pillars. Answer each question and read the explanation before moving on.
Agent Design Advisor
Answer a few questions to get a tailored agent architecture recommendation.
Memory Hooks — Flip Cards
Each card pairs a question with its answer. Perfect for exam-day review.
Agent Foundations

What does ReAct stand for?

And what are the three loop steps?

ReAct = Reason + Act

The three steps: Thought (reason about what to do) → Action (invoke a tool) → Observation (process the tool result). The loop repeats until the agent produces a final answer or hits a stopping condition.

Agent Foundations

Tool Calling vs Free-Text Output

Why does tool calling prevent hallucination?

Tool calling forces the LLM to output structured JSON (function name + typed arguments) rather than inventing a response. The host system executes the real function and returns the actual result as an Observation — grounding every subsequent answer in real data, not LLM imagination.

Reasoning & Planning

CoT vs ToT — Key Difference

When should you use each?

CoT = single linear path, fast, good for well-defined problems. Trigger: "Let's think step by step."
ToT = multiple branches, evaluates intermediate steps, backtracks from dead ends. Use when tasks require exploration, creativity, or multi-step planning where the path isn't obvious upfront.

Reasoning & Planning

Self-Consistency

How does it improve reliability?

Generate N independent CoT reasoning paths, then take the majority-vote answer across all N outputs. Reduces variance and improves accuracy at the cost of N × inference time. Works because diverse paths to the same answer increase confidence that the answer is correct.

Memory Systems

The Four Memory Types

In-context, episodic, semantic, procedural

In-context: Active prompt window — fast, limited
Episodic: Past interaction records — experience
Semantic: Facts in a vector DB — knowledge
Procedural: Tool schemas and skills — capabilities

Think: What I'm doing · What I've done · What I know · What I can do

Memory Systems

Memory Consolidation

Why is it needed in long-running agents?

As an agent converses, the in-context window fills up. Memory consolidation summarizes older context into external (episodic or semantic) storage before the window overflows, then injects only the relevant summary. Prevents context overflow without losing the agent's history entirely.

Multi-Agent

Orchestrator vs Worker Agent

What does each one do?

Orchestrator: Decomposes the high-level goal into subtasks, assigns them to workers, aggregates results. One per team. (LangGraph supervisor, AutoGen GroupChatManager)
Worker: Executes a specific subtask with a restricted tool set. Specialized. Runs in parallel with other workers.

Multi-Agent

Debate vs Reflection Pattern

How do they differ in improving output quality?

Debate: Two+ agents argue opposing positions — adversarial, surfaces factual errors through disagreement. Used in Constitutional AI.
Reflection: One agent (or a critic agent) reviews its own output against a rubric and iterates. Cooperative self-improvement. Used in the Reflexion framework.

NCP-AAI Exam Series · 5 Topics

Ready to Master the Full NCP-AAI Exam?

Continue with Agent Development & Frameworks, Knowledge Integration, NVIDIA Platform for Agents, and Evaluation & Safety — all five topics with quizzes and decision tools.