AI-103: Plan & Manage Azure AI Solutions

#	Domain	Weight
1	🎯 Plan and manage an Azure AI solution (this page)	25–30%
2	Implement generative AI and agentic solutions	30–35%
3	Implement computer vision solutions	10–15%
4	Implement text analysis solutions	10–15%
5	Implement information extraction solutions	10–15%

🧠 Domain 1 Core Concepts

Comprehensive study material for all Domain 1 sub-topics

🏗️ Microsoft Azure AI Foundry Fundamentals

Azure AI Foundry Portal vs Azure AI Studio

Azure AI Foundry is the rebranded and expanded version of Azure AI Studio (renamed in late 2024). It is the unified platform for discovering, building, testing, and deploying AI models and solutions on Azure. The Foundry portal integrates model catalog browsing, project management, prompt flow authoring, evaluation, and monitoring into a single interface.

Exam tip: If a question references "Azure AI Studio," treat it as the earlier name for Azure AI Foundry. Exam questions will predominantly use "Azure AI Foundry" terminology.

Foundry Hubs and Projects

The Foundry organizational hierarchy has two main levels:

Hub	Project
Shared infrastructure layer — networking, security policy, connected resources, compute	Individual workspace where developers build, evaluate, and deploy specific AI solutions
Created once per environment (dev/prod), managed by AI platform team or admins	Created per use-case or application; scoped within a parent Hub
Holds connections: Azure OpenAI, AI Search, Storage, Key Vault	Inherits connections from Hub; can have project-specific connections
Analogous to a data center floor (shared infrastructure)	Analogous to a developer's desk on that floor
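The inheritance rule (a project sees the hub's connections plus its own) can be pictured as a layered lookup. The sketch below is only an illustrative model: the connection names and the `effective_connections` helper are hypothetical and are not Foundry SDK calls. Which entry wins when a name exists at both levels is an assumption of this sketch.

```python
from collections import ChainMap

# Hypothetical connection registries; names and targets are illustrative only.
hub_connections = {
    "aoai": "Azure OpenAI",
    "search": "Azure AI Search",
    "vault": "Azure Key Vault",
}
project_connections = {
    "custom_api": "Project-specific connection",
}


def effective_connections(project: dict, hub: dict) -> dict:
    """Return the connections a project can see: its own plus the hub's."""
    # ChainMap checks the project mapping first, then falls back to the hub.
    # Letting the project entry win on a name clash is an assumption here.
    return dict(ChainMap(project, hub))


if __name__ == "__main__":
    visible = effective_connections(project_connections, hub_connections)
    for name in sorted(visible):
        print(f"{name}: {visible[name]}")
```

A second project created under the same hub would start from the same three hub connections without redefining them.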

Connected Resources in a Foundry Hub

Azure OpenAI Service — LLM deployments, embeddings, DALL-E
Azure AI Search — vector index for RAG scenarios
Azure Blob Storage — ground truth datasets, evaluation data, artifacts
Azure Key Vault — secrets, API keys, connection strings for connected resources
Azure Container Registry — custom container images for compute
Application Insights — telemetry, traces, performance monitoring

Foundry Model Catalog

The model catalog aggregates models from multiple providers into a single discovery surface:

Azure OpenAI models (GPT-4o, o1, o3)
Hugging Face open-source models
Meta (Llama family)
Mistral models
Cohere models
Microsoft research models (Phi)

Models can be deployed via serverless API (pay-per-token, no compute provisioning) or as managed compute deployments (dedicated GPU instances with provisioned throughput).

Compute Options: Serverless vs Provisioned

Serverless API: No infrastructure to manage. Pay per token consumed. Best for variable or low-volume workloads. Model hosted by Microsoft. Shared capacity.

Provisioned Throughput: Reserved capacity with guaranteed tokens-per-minute (TPM). Best for production workloads needing predictable latency. Billed hourly regardless of usage.

🤖 Model Selection & Deployment

Key Azure OpenAI Models on AI-103

Model	Best For	Key Trait
GPT-4o	Multimodal (text + vision)	Balanced speed & capability; default for most production apps
GPT-4 Turbo	Long-context reasoning	128K context window; older but still widely used
GPT-4o-mini	Cost-sensitive tasks	Faster, cheaper; less capable than GPT-4o but good for classification/extraction
o1 / o3 series	Complex reasoning	"Thinking" models with internal chain-of-thought; slower, more expensive
text-embedding-3-large	RAG / vector search	High-dimensional embeddings for semantic similarity

Deployment Types

When deploying an Azure OpenAI model, you choose one of four deployment types:

Global Standard — Dynamically routes requests to datacenters with available capacity globally. Best throughput, lowest latency on average, but no data residency guarantee. Use for high-volume general applications.
Standard — Capacity in a specific Azure region. Data stays in that region. Suitable for compliance scenarios. Subject to quota limits per region.
Provisioned Throughput Units (PTU) — Reserved, predictable capacity measured in PTUs. Guaranteed TPM SLA. Billed hourly. Use for production workloads with strict latency/throughput requirements.
Batch — Asynchronous bulk processing. Submit large job files, results returned when complete. Up to 50% cost savings vs Standard. Use for offline data processing, evaluation jobs, bulk inference.

Exam trap: "Provisioned" and "Global Standard" are often confused. Provisioned = predictable SLA + hourly billing. Global Standard = best effort + pay-per-token. If a question asks for guaranteed throughput, choose Provisioned.
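The selection logic for the four deployment types can be written down as a small decision function. This is a study aid, not an Azure API: the function name, the flag names, and the order in which requirements are checked are all assumptions made for illustration.

```python
def choose_deployment_type(
    guaranteed_throughput: bool = False,
    data_residency: bool = False,
    offline_bulk: bool = False,
) -> str:
    """Map workload requirements to one of the four deployment types above."""
    # Offline, non-real-time work goes to Batch regardless of other needs.
    if offline_bulk:
        return "Batch"
    # A hard throughput/latency guarantee points to reserved capacity.
    if guaranteed_throughput:
        return "Provisioned (PTU)"
    # Keeping data in one region rules out global routing.
    if data_residency:
        return "Standard"
    # Otherwise take the pay-per-token option with global routing.
    return "Global Standard"


if __name__ == "__main__":
    print(choose_deployment_type(guaranteed_throughput=True))
    print(choose_deployment_type(data_residency=True))
```

Real scenarios can combine requirements (for example residency plus guaranteed throughput), so treat the fixed priority order here as a simplification.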

Tokens Per Minute (TPM) and Requests Per Minute (RPM)

TPM (Tokens Per Minute) — The total number of tokens (input + output) your deployment can process per minute. Determines how large and how many requests you can handle. Exceeding TPM results in HTTP 429 (rate limit) errors.

RPM (Requests Per Minute) — The maximum number of individual API calls per minute. Set automatically in proportion to your TPM allocation: for most models the ratio is 6 RPM per 1,000 TPM.

Example: 100,000 TPM → (100,000 / 1,000) × 6 = 600 RPM
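A short worked calculation helps when checking TPM and RPM figures. The sketch below assumes the ratio of 6 RPM per 1,000 TPM that Azure OpenAI documents for many standard deployments; the ratio differs for some models, so it is a parameter rather than a constant. The function names are hypothetical.

```python
RPM_PER_1000_TPM = 6  # assumed ratio; verify on the quota page for your model


def rpm_limit(tpm: int, rpm_per_1000_tpm: int = RPM_PER_1000_TPM) -> int:
    """Derive the RPM limit that accompanies a given TPM allocation."""
    return (tpm // 1000) * rpm_per_1000_tpm


def break_even_tokens_per_request(tpm: int) -> float:
    """Average request size at which the TPM and RPM limits run out together.

    Smaller requests hit RPM first; larger requests hit TPM first.
    Expects tpm >= 1000 so that the derived RPM limit is non-zero.
    """
    return tpm / rpm_limit(tpm)


if __name__ == "__main__":
    tpm = 100_000
    print(rpm_limit(tpm))
    print(round(break_even_tokens_per_request(tpm), 1))
```

With this ratio, requests averaging fewer than roughly 167 tokens are limited by RPM, and larger ones by TPM.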

Content Filter Configuration

Azure OpenAI includes built-in content filters that score input and output content across 4 harm categories:

Hate
Violence
Sexual
Self-Harm

Each category uses a severity scale from 0 (safe) to 7 (severe). You configure a threshold per category (Low=2, Medium=4, High=6). Content scoring at or above threshold is blocked.

Additional filter features: custom blocklists (block specific terms/phrases), jailbreak detection (prompt injection attempts), protected material detection (copyright content), groundedness detection (hallucination in RAG responses).
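The "at or above threshold is blocked" rule can be sketched in a few lines, assuming the Low=2, Medium=4, High=6 mapping stated in this section. The function name and the sample scores are hypothetical; in practice the service applies this logic, not client code.

```python
# Threshold names mapped to severity values, as given in this section.
THRESHOLDS = {"low": 2, "medium": 4, "high": 6}


def blocked_categories(severities: dict, thresholds: dict) -> list:
    """Return the categories whose severity is at or above their threshold."""
    return sorted(
        category
        for category, severity in severities.items()
        if severity >= THRESHOLDS[thresholds[category]]
    )


if __name__ == "__main__":
    # Hypothetical scores for one piece of content.
    scores = {"hate": 0, "violence": 4, "sexual": 2, "self_harm": 6}
    policy = {name: "medium" for name in scores}
    print(blocked_categories(scores, policy))
```

Each category is compared against its own threshold, so one category can be blocked while the others pass.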

Model Versioning and Auto-Update Policies

Azure OpenAI models have explicit versions (e.g., gpt-4o-2024-11-20). When creating a deployment you can choose:

Auto-update to default — The deployment moves to the new default version whenever Microsoft changes the default for that model. Recommended for non-sensitive applications.
Upgrade when expired — The deployment stays pinned to the chosen version and is upgraded automatically only when that version is retired.
No auto-upgrade — The deployment stays pinned to the chosen version and stops working when that version is retired, unless you update it manually first.

🔐 Identity, Security & Access Control

Managed Identity Types

System-Assigned Managed Identity	User-Assigned Managed Identity
Tied to a single Azure resource lifecycle. Deleted when the resource is deleted.	Standalone resource. Can be assigned to multiple Azure resources simultaneously.
Automatically created and managed by Azure	Created by the user, then attached to resources
Best for: simple single-service scenarios	Best for: shared identity across multiple services; reuse patterns

Why managed identity over API keys? Managed identities authenticate via Microsoft Entra ID (formerly Azure AD). No credentials stored in code or config files. Tokens are automatically rotated. Supports audit logging. This is the preferred production pattern.

Azure RBAC Roles for Azure OpenAI / Cognitive Services

Role	Permissions	When to Use
Cognitive Services OpenAI User	Call inference endpoints; cannot view keys or manage deployments	Application service accounts, end-user authenticated apps
Cognitive Services OpenAI Contributor	Everything the OpenAI User role allows, plus create/manage deployments and fine-tune models; cannot view or regenerate keys	Developers building and testing models
Cognitive Services Contributor	Full resource management except IAM	DevOps/MLOps engineers
Owner	All permissions including IAM role assignments	Admins only — follow least privilege

Key-Based Authentication vs Microsoft Entra ID Token

Key-based authentication: Sends the resource API key in a request header (api-key for Azure OpenAI, Ocp-Apim-Subscription-Key for other Azure AI services). Simple to implement, but it carries risks: keys can be leaked, there is no per-caller audit trail, and revocation requires key rotation, which impacts all callers.

Entra ID token authentication: Uses OAuth 2.0 bearer tokens issued by Microsoft Entra ID. Benefits: per-caller identity in audit logs, token expiry (short-lived), revoke by disabling identity, works with managed identities (no credential management).

Best practice rule: Always prefer Entra ID token-based authentication for production. Reserve key-based auth for development/testing or third-party integrations that cannot use managed identities.
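On the wire, the two authentication modes differ only in the request header. The sketch below builds headers for an Azure OpenAI REST call; the `build_auth_headers` helper is hypothetical, and the token value would in practice come from an identity library such as `DefaultAzureCredential` rather than a literal string.

```python
from typing import Optional


def build_auth_headers(
    api_key: Optional[str] = None,
    entra_token: Optional[str] = None,
) -> dict:
    """Return request headers for exactly one authentication mode."""
    if (api_key is None) == (entra_token is None):
        raise ValueError("Provide exactly one of api_key or entra_token")
    if entra_token is not None:
        # Entra ID: short-lived OAuth 2.0 bearer token tied to a caller identity.
        return {"Authorization": f"Bearer {entra_token}"}
    # Key-based: shared resource key sent in the api-key header.
    return {"api-key": api_key}


if __name__ == "__main__":
    print(build_auth_headers(entra_token="<token from identity library>"))
    print(build_auth_headers(api_key="<resource key>"))
```

Refusing to accept both credentials at once keeps a leftover key from silently riding along after a migration to token-based auth.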

Azure Key Vault Integration

Store all secrets (API keys, connection strings, third-party credentials) in Azure Key Vault. Foundry hubs support a connected Key Vault that stores secrets for all connected resources. Applications retrieve secrets at runtime using managed identity — never hardcoded values.

Key Vault access policies vs RBAC: Modern pattern uses Azure RBAC (Key Vault Secrets User role) rather than legacy access policies.

Microsoft Defender for Cloud

Defender for Cloud provides security posture management and threat protection for AI services. Key capabilities: configuration recommendations (e.g., "enable private endpoint"), threat detection alerts (e.g., unusual spike in token consumption indicating key compromise), regulatory compliance dashboards.

🌐 Networking & Private Endpoints

Public Access vs Private Endpoint

By default, Azure AI services accept connections from the public internet (over HTTPS). For production environments with strict data security requirements, you should disable public access and use private endpoints.

A private endpoint assigns a private IP address from your VNet to the Azure AI service, so traffic stays on the Azure backbone network and never crosses the public internet. The service appears as if it lives inside your VNet.

VNet Integration Components

Private Endpoint — Network interface in your VNet with a private IP. The AI service resolves to this private IP from within the VNet.
Private DNS Zone — Overrides public DNS resolution so that myaccount.openai.azure.com resolves to the private IP (not the public IP) from within the VNet. Zone: privatelink.openai.azure.com
VNet Service Endpoint — Older, less secure alternative. Traffic stays on Azure backbone but the resource still has a public IP. Private endpoints are preferred over service endpoints for AI services.
Network Security Groups (NSG) — Control inbound/outbound traffic at the subnet or network interface level. Not the primary mechanism for AI service access control (use private endpoints instead).
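The DNS side of a private endpoint is easier to remember with a concrete name. The sketch below derives the `privatelink` name that the public hostname is aliased to, and checks whether a resolved address is non-public. Both helper names are hypothetical, and the sample addresses are made up for illustration.

```python
import ipaddress


def privatelink_name(public_fqdn: str) -> str:
    """Insert the 'privatelink' label after the account name."""
    account, _, suffix = public_fqdn.partition(".")
    return f"{account}.privatelink.{suffix}"


def resolves_privately(resolved_ip: str) -> bool:
    """True when the resolved address is in a non-public range."""
    return ipaddress.ip_address(resolved_ip).is_private


if __name__ == "__main__":
    print(privatelink_name("myaccount.openai.azure.com"))
    # Hypothetical lookups: one from inside the VNet, one from outside.
    print(resolves_privately("10.0.1.5"))
    print(resolves_privately("20.42.0.1"))
```

If a lookup from inside the VNet still returns a public address, the usual suspect is a Private DNS Zone that is missing or not linked to the VNet.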

Outbound Firewall Rules for Foundry

Azure AI Foundry components (compute instances, managed online endpoints) need outbound access to specific Microsoft endpoints for model downloads, container registry pulls, and monitoring data. In locked-down VNets, you must add allowlist rules for:

*.blob.core.windows.net
*.azurecr.io
*.openai.azure.com
management.azure.com
login.microsoftonline.com
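An allowlist like this is matched by hostname pattern. The sketch below shows one way to test a destination against the five entries using shell-style wildcards; it only illustrates the matching idea and is not how Azure Firewall evaluates FQDN rules. The `is_allowed` helper and the sample hostnames are hypothetical.

```python
from fnmatch import fnmatchcase

# The five entries listed in this section.
ALLOWED_HOSTS = [
    "*.blob.core.windows.net",
    "*.azurecr.io",
    "*.openai.azure.com",
    "management.azure.com",
    "login.microsoftonline.com",
]


def is_allowed(host: str, patterns=ALLOWED_HOSTS) -> bool:
    """Return True when the host matches any allowlist pattern."""
    # Normalize case and a trailing dot before comparing.
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ("mystore.blob.core.windows.net", "example.com"):
        print(candidate, is_allowed(candidate))
```

Note that `*` here also matches dots, so multi-level subdomains pass; a stricter matcher would be needed if that is not desired.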

🔄 CI/CD & MLOps for AI

Azure DevOps Pipelines for Model Deployment

Model deployment to Azure AI Foundry can be automated using Azure DevOps pipelines. A typical pipeline includes stages: validate configuration → evaluate model → deploy to staging → run integration tests → promote to production.

# Example: Azure CLI task in DevOps pipeline
# OPENAI_RESOURCE and RG are pipeline variables set elsewhere.
# --sku-capacity is expressed in units of 1,000 TPM (100 = 100K TPM).
az cognitiveservices account deployment create \
  --name "$OPENAI_RESOURCE" \
  --resource-group "$RG" \
  --deployment-name "gpt4o-prod" \
  --model-name "gpt-4o" \
  --model-version "2024-11-20" \
  --model-format OpenAI \
  --sku-capacity 100 \
  --sku-name "Standard"

GitHub Actions with Azure Credentials

GitHub Actions workflows authenticate to Azure using either a Service Principal (client secret or certificate) or Workload Identity Federation (OIDC — preferred, no long-lived secrets). The azure/login action handles authentication, then subsequent actions deploy AI resources.

Prompt Flow as Pipeline Component

Azure AI Foundry Prompt Flow is a development framework for building LLM-based applications. In a CI/CD context, prompt flows can be:

Versioned as YAML files in Git
Evaluated automatically in pipeline gates using evaluation flows
Deployed as managed online endpoints via pipeline commands
Tested with ground truth datasets for quality regression detection

Model Evaluation in CI Gates

Before promoting a new model version or prompt configuration to production, evaluation gates check metrics such as:

Groundedness score (1–5)
Coherence score (1–5)
Relevance score (1–5)
Fluency score (1–5)
F1 (for extraction tasks)

If evaluation scores fall below defined thresholds, the pipeline fails the gate and blocks deployment. This is the primary mechanism to prevent quality regressions in AI application delivery.
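A gate of this kind reduces to comparing scores against minimums and returning a non-zero exit code on failure. The sketch below assumes the evaluation step has already produced a dictionary of metric scores; the function names, the threshold values, and the rule that a missing metric counts as a failure are all illustrative assumptions.

```python
def failed_metrics(scores: dict, thresholds: dict) -> dict:
    """Return {metric: (score, minimum)} for every metric below its minimum."""
    # A metric missing from the results is treated as 0.0 and therefore fails.
    return {
        name: (scores.get(name, 0.0), minimum)
        for name, minimum in thresholds.items()
        if scores.get(name, 0.0) < minimum
    }


def gate(scores: dict, thresholds: dict) -> int:
    """Print each failure and return a process exit code (0 = pass, 1 = fail)."""
    failures = failed_metrics(scores, thresholds)
    for name, (score, minimum) in failures.items():
        print(f"FAIL {name}: {score} < {minimum}")
    return 1 if failures else 0


if __name__ == "__main__":
    # Hypothetical thresholds and evaluation results.
    minimums = {"groundedness": 4.0, "relevance": 3.5}
    results = {"groundedness": 4.2, "relevance": 3.1}
    raise SystemExit(gate(results, minimums))
```

Because CI systems treat a non-zero exit code as a failed step, running this script as a pipeline task is enough to block the later deployment stages.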

Blue-Green Deployment for Models

Blue-green deployment maintains two identical production environments. The "blue" environment runs the current model; "green" is the new version. Traffic is switched to green only after validation. Rollback is instant — switch traffic back to blue. In Azure AI Foundry, implement using traffic splitting on managed online endpoints (e.g., 90% to blue, 10% to green during canary testing).
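The traffic shift itself is just a sequence of percentage splits. The sketch below generates such a sequence and a rollback split; the function names and step sizes are hypothetical, and applying each split to a managed online endpoint is left to whatever deployment tooling the pipeline uses.

```python
def canary_schedule(step: int = 10, start: int = 10) -> list:
    """Return blue/green traffic splits from the first canary slice to full cutover."""
    if not 0 < step <= 100 or not 0 < start <= 100:
        raise ValueError("step and start must be in the range 1..100")
    schedule = []
    green = start
    while green < 100:
        schedule.append({"blue": 100 - green, "green": green})
        green += step
    # Always finish with all traffic on green.
    schedule.append({"blue": 0, "green": 100})
    return schedule


def rollback() -> dict:
    """Split that sends all traffic back to the current (blue) deployment."""
    return {"blue": 100, "green": 0}


if __name__ == "__main__":
    for split in canary_schedule(step=45, start=10):
        print(split)
```

A pipeline would typically run its validation checks between steps and apply `rollback()` as soon as one of them fails.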

Infrastructure as Code (IaC) for AI Resources

Use Bicep or ARM templates to provision Azure AI Foundry hubs, projects, Azure OpenAI accounts, and deployments repeatably across environments. Key resources to define in IaC:

Microsoft.CognitiveServices/accounts
Microsoft.CognitiveServices/accounts/deployments
Microsoft.MachineLearningServices/workspaces (Foundry hub)
privateEndpoints
diagnosticSettings

📈 Monitoring & Observability

Azure Monitor + Diagnostic Settings

Enable Diagnostic Settings on Azure OpenAI and AI Foundry resources to route logs and metrics to:

Log Analytics Workspace — Query with KQL; long-term retention; alerts
Storage Account — Archival for compliance
Event Hub — Stream to SIEM tools (Microsoft Sentinel, Splunk)

Key log categories to enable: RequestResponse, Audit, ChatCompletions

Log Analytics for AI Services

In Log Analytics, Azure OpenAI logs appear in the AzureDiagnostics table (legacy) or resource-specific tables. Key KQL queries for exam scenarios:

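// Token usage over time
// RequestResponse rows in AzureDiagnostics keep request details as a JSON string
// (properties_s), which cannot be summed directly. This version reads token metrics
// instead and assumes the diagnostic setting also routes AllMetrics to the workspace.
AzureMetrics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where MetricName in ("ProcessedPromptTokens", "GeneratedTokens")
| summarize TotalTokens = sum(Total) by bin(TimeGenerated, 1h)
| order by TimeGenerated desc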

Application Insights for AI Apps

Application Insights tracks end-to-end request traces for AI applications built on top of Azure OpenAI. Key telemetry: request latency (dependency tracking to OpenAI endpoint), token usage per request, failure rates, dependency failures.

Azure OpenAI calls show up in Application Insights once the application is instrumented (for example with the Azure Monitor OpenTelemetry distro) and configured with a connection string. Use distributed tracing to correlate frontend requests → your app → OpenAI API calls.

Azure AI Foundry Tracing

Azure AI Foundry provides built-in tracing for prompt flows and agent runs. Tracing captures:

Traces — End-to-end execution of a flow or agent invocation
Spans — Individual operations within a trace (LLM call, tool call, retrieval step)
Input/output at each span for debugging
Latency per span to identify bottlenecks
Token counts per LLM call span

OpenTelemetry standard: Foundry tracing is built on OpenTelemetry. Traces can be exported to Application Insights or any OTLP-compatible backend. Look for questions about "spans" or "traces" in monitoring context.
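The trace/span relationship is a simple containment structure. The sketch below models it with plain dataclasses to show how per-span latency and token counts answer the typical debugging questions; the class names and sample values are hypothetical and do not mirror the OpenTelemetry or Foundry SDK types.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One operation inside a trace (LLM call, tool call, retrieval step)."""
    name: str
    latency_ms: float
    tokens: int = 0  # in this sketch only LLM-call spans carry a token count


@dataclass
class Trace:
    """End-to-end execution of one flow or agent invocation."""
    trace_id: str
    spans: list = field(default_factory=list)

    def slowest_span(self) -> Span:
        """Return the span with the highest latency (the bottleneck)."""
        return max(self.spans, key=lambda span: span.latency_ms)

    def total_tokens(self) -> int:
        """Sum token counts across all spans in the trace."""
        return sum(span.tokens for span in self.spans)


if __name__ == "__main__":
    trace = Trace("demo", [
        Span("retrieval", 120.0),
        Span("llm_call", 850.0, tokens=1200),
        Span("tool_call", 60.0),
    ])
    print(trace.slowest_span().name, trace.total_tokens())
```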

Alerts on Key Metrics

Quota utilization alert — Trigger when TPM usage exceeds 80% of quota (prevents unexpected throttling)
Error rate alert — Alert on HTTP 4xx/5xx rates from AI service diagnostic logs
Latency alert — Alert when P95 response time exceeds SLO threshold
Content filter alert — Track blocked requests rate; spike may indicate misuse or misconfiguration
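Two of these alert conditions are easy to express as arithmetic. The sketch below evaluates the quota-utilization and P95-latency rules over sample data, using the nearest-rank method for the percentile. In Azure these rules would be configured as Azure Monitor alerts; the function names and numbers here are hypothetical.

```python
def p95(values) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # ceil(0.95 * n) computed with integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def fired_alerts(tpm_used, tpm_quota, latencies_ms, slo_ms, quota_ratio=0.8) -> list:
    """Return the names of the alert rules whose condition is met."""
    fired = []
    if tpm_used > quota_ratio * tpm_quota:
        fired.append("quota")
    if p95(latencies_ms) > slo_ms:
        fired.append("latency")
    return fired


if __name__ == "__main__":
    # Hypothetical minute: 85K of 100K TPM used, one slow request out of twenty.
    print(fired_alerts(85_000, 100_000, [100] * 19 + [900], slo_ms=500))
```

A single slow request out of twenty does not move the nearest-rank P95, which is why percentile alerts are less noisy than alerts on the maximum.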

Cost Analysis and Budget Alerts

Use Azure Cost Management to analyze AI service spending. Tag Azure OpenAI deployments with project/team/environment tags to enable granular cost attribution. Set Azure Budgets with alert thresholds (e.g., alerts at 80% and 100% of monthly budget). Note: budgets don't automatically throttle or stop Azure services; they only alert. Actual throttling requires quota management at the service level.

⚖️ Responsible AI

Microsoft's 6 Responsible AI Principles

Principle	Definition	Example Violation
Fairness	AI systems should treat all people equitably, without bias based on protected characteristics	Loan model approves applications at lower rates for one ethnic group
Reliability & Safety	AI should behave as intended, be safe to use, and handle unexpected inputs gracefully	Autonomous agent crashes production system during an edge case
Privacy & Security	AI should respect data privacy, handle personal data responsibly, and resist attacks	LLM reveals PII from training data in outputs
Inclusiveness	AI should empower and benefit all people, including those with disabilities and from all backgrounds	Voice assistant only works well for native English speakers
Transparency	AI systems and their limitations should be understandable; users should know when they're interacting with AI	Chatbot claims to be human without disclosure
Accountability	People and organizations should be accountable for AI systems and their impacts	No process to audit or appeal AI-driven decisions

Azure AI Content Safety Service

A dedicated Azure AI service (separate from Azure OpenAI content filters) for analyzing text and images for harmful content. Used for user-generated content moderation, custom model output validation, and multilingual content analysis.

Harm categories and severity scale:

Categories: Hate, Violence, Sexual, Self-Harm
Severity: 0 (safe) → 2 (low) → 4 (medium) → 6 (high) → 7 (severe)
Configure thresholds per category based on use-case risk (children's app = threshold 0, general audience = threshold 2 or 4)

Advanced Content Safety Features

Groundedness detection — Validates whether AI-generated responses are factually supported by the provided context/documents. Detects hallucination in RAG scenarios. Returns a groundedness score and identifies ungrounded claims.
Protected material detection — Identifies whether AI output contains copyrighted text (song lyrics, news articles, book excerpts). Helps organizations avoid copyright liability.
Prompt injection (jailbreak) detection — Detects attempts by users to override AI system instructions via crafted prompts. Identifies both direct jailbreaks ("Ignore previous instructions...") and indirect injections (malicious content in retrieved documents).

Responsible AI Impact Assessment

Before deploying AI systems that could significantly impact people, Microsoft recommends completing a Responsible AI Impact Assessment. This structured document captures: intended use cases, potential harms and affected groups, mitigation measures implemented, monitoring plan post-deployment, and escalation processes. Required for high-risk AI applications; strongly recommended for all customer-facing AI.

💰 Quota & Cost Management

TPM Quota by Model Tier

Each Azure subscription gets a default TPM quota per model per region. Quota is model-specific (GPT-4o has different quota limits than GPT-4o-mini). Default quotas are typically 30K–240K TPM depending on subscription type and region.

Quota is regional: A 100K TPM quota for GPT-4o in East US is separate from a 100K TPM quota in West Europe. Quota cannot be automatically moved between regions. A Global Standard deployment draws from a global pool rather than a regional pool.

Quota Increase Requests

If default quota is insufficient, submit a quota increase request through the Azure portal (Subscription > Usage + Quotas > Request Increase). For Azure OpenAI, quota increases go through a separate review process and may not be immediate. Best practice: submit requests 2–4 weeks before planned production launch.

Cost Per 1M Tokens (Approximate)

Model	Input (per 1M tokens)	Output (per 1M tokens)
GPT-4o	~$2.50	~$10.00
GPT-4o-mini	~$0.15	~$0.60
o1	~$15.00	~$60.00
text-embedding-3-large	~$0.13	N/A

* Prices are approximate and subject to change. Exam tests concepts, not exact prices.
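A rough cost estimate follows directly from the table: multiply token counts by the per-million price for each direction. The sketch below uses the approximate figures listed above, so its output is only as current as that table; the `estimate_cost` helper is hypothetical.

```python
# Approximate (input, output) USD prices per 1M tokens, copied from the table above.
PRICES_PER_1M = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "text-embedding-3-large": (0.13, 0.0),  # embeddings have no output price
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Return the estimated USD cost for the given token volumes."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical month: 1M input tokens and 1M output tokens on each chat model.
    for name in ("gpt-4o", "gpt-4o-mini", "o1"):
        print(name, estimate_cost(name, 1_000_000, 1_000_000))
```

Output tokens cost several times more than input tokens for the chat models, so response length often dominates the bill.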

Azure Budgets and Spending Controls

Azure Budget — Set a monthly spend limit; configure alert thresholds (50%, 80%, 100% of budget). Alerts send email notifications. Does NOT automatically stop services.
Spending Limit — Available for trial/MSDN subscriptions only. Automatically disables services when limit is hit. Not available for Pay-As-You-Go or EA subscriptions.
Cost Allocation with Tags — Tag Azure OpenAI resources with project, environment, team for granular cost reporting and chargeback.

Reservation Pricing for Provisioned Deployments

Provisioned Throughput Units (PTUs) can be purchased as reservations (1-year or 3-year) for significant discounts vs pay-as-you-go hourly rates. Reservations commit to a fixed PTU count for the reservation term. Best for stable, high-volume production workloads with predictable usage patterns.

🪝 Memory Hooks & Mnemonics

Memorable patterns to lock in Domain 1 concepts for exam day

🔤 "FRPITA" — Microsoft's 6 Responsible AI Principles

F · R · P · I · T · A

Fairness — treat all people equitably, no bias
Reliability & Safety — behave as intended, handle edge cases safely
Privacy & Security — respect personal data, resist attacks
Inclusiveness — empower everyone, including underserved groups
Transparency — understandable, disclose AI involvement
Accountability — humans are responsible for AI impacts

Memory sentence: "Fairly Reliable Products Include Transparent Accountability"

🛡️ "HVSS" — Content Safety Harm Categories

H · V · S · S

Hate — discriminatory, hateful content targeting groups
Violence — depictions or promotion of physical harm
Sexual — sexually explicit or inappropriate content
Self-Harm — content promoting or depicting self-injury or suicide

Severity scale: 0 (safe) → 2 (low) → 4 (medium) → 6 (high) → 7 (severe). Each category is evaluated independently.

🔑 "Key or Token?" — Auth Decision Rule

KEY = Legacy/Simple | TOKEN = Production/Preferred

Use Key-based auth when: Rapid prototyping, third-party tools that can't do OAuth, simple local testing.

Use Entra ID Token auth when: Production environment, needs audit trail (who called what), managed identity available, revocability required without impacting other callers.

Think: A key fits any lock — simple but risky if copied. A token is like a badge with your photo — tied to your identity, expires, and can be deactivated without changing the lock.

🏢 Hub vs Project — The Office Building Analogy

HUB = Building Floor | PROJECT = Your Desk

Hub = The shared data center floor. Has shared utilities: networking (VNet), security policies, connected resources (Azure OpenAI, Storage, Key Vault). Built once by admins.
Project = Your individual workspace/desk on that floor. Where you build, test, and deploy your specific AI solution. Inherits the hub's infrastructure.

Multiple projects can share one hub. One project belongs to exactly one hub. Projects can have project-specific connected resources in addition to hub-level ones.

🚀 Deployment Types Rhyme

Global = Go-Fast | Standard = Steady | Provisioned = Predictable | Batch = Bulk

Global Standard — Go-Fast: routes to datacenters with available capacity globally, best throughput, no data residency guarantee
Standard — Steady: regional deployment, data stays local, subject to per-region quota
Provisioned (PTU) — Predictable: reserved capacity, guaranteed TPM SLA, hourly billing even if idle
Batch — Bulk: async job processing, cheapest per-token, not for real-time use

⚡ TPM vs RPM — Size vs Speed

TPM = Token Size Budget | RPM = Request Count Budget

TPM (Tokens Per Minute) — How BIG your combined requests can be per minute. Large documents = burns TPM fast. Think: "how much data can I pump through"
RPM (Requests Per Minute) — How MANY individual API calls per minute. Lots of small short requests = burns RPM. Think: "how many times can I knock on the door"

You can hit the RPM limit with tiny requests, OR hit the TPM limit with a few large ones. Rate limiting (HTTP 429) triggers when either limit is exceeded.
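Clients usually respond to HTTP 429 by waiting and retrying. The sketch below shows exponential backoff with full jitter around a generic `send` callable that returns a status code. It is not tied to any SDK: `send`, `backoff_delays`, and `call_with_retry` are hypothetical names, and if the response carries a Retry-After header, honoring that value is usually the better choice.

```python
import random
import time


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0, rng=None) -> list:
    """Full-jitter delays: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** attempt)) for attempt in range(max_retries)]


def call_with_retry(send, max_retries: int = 5, sleep=time.sleep) -> int:
    """Call send() until it returns a non-429 status or retries run out."""
    for delay in backoff_delays(max_retries):
        status = send()
        if status != 429:
            return status
        sleep(delay)  # wait before the next attempt
    # Final attempt after the last wait; its status is returned as-is.
    return send()


if __name__ == "__main__":
    # Simulated service: throttles twice, then succeeds.
    responses = iter([429, 429, 200])
    print(call_with_retry(lambda: next(responses), sleep=lambda seconds: None))
```

Jitter spreads retries from many clients over time so they do not all hit the limit again in the same instant.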

🚪 "VIPs Use Private Entrances" — Private Endpoints

Production AI = Private Endpoint. Always.

Public internet access = using the front door (anyone can see you). Private endpoint = using the VIP back entrance through Azure's backbone (private, invisible to internet).

Private endpoint assigns a private IP in your VNet
Pair with Private DNS Zone so DNS resolves to private IP (not public)
Disable public network access on the Azure AI resource
Service endpoints are NOT as secure as private endpoints (resource still has public IP)

👥 RBAC Minimum Privilege Ladder

OpenAI User → OpenAI Contributor → Cognitive Services Contributor → Owner

App Service / Function App → "Cognitive Services OpenAI User" — just call the API, nothing else
Developer → "Cognitive Services OpenAI Contributor" — create deployments, run evals, fine-tune
DevOps Engineer → "Cognitive Services Contributor" — full resource management
Admin only → "Owner" — assign IAM roles; restrict to 1–2 people per resource

Exam trick: If a question says "the application should call Azure OpenAI," the answer is almost always "Cognitive Services OpenAI User" — not Contributor or Owner.

📊 Evaluation Metrics for CI Gates — "GCRF"

G · C · R · F = Groundedness, Coherence, Relevance, Fluency

Groundedness — Is the response supported by the retrieved context? (Anti-hallucination metric)
Coherence — Does the response flow logically and make sense?
Relevance — Does the response address what the user actually asked?
Fluency — Is the language natural and grammatically correct?

All scored 1–5 in Azure AI Foundry's built-in evaluation. Groundedness is most critical for RAG-based applications.

💡 Budget Alert vs Spending Limit — The Alarm vs Fuse Analogy

Budget Alert = Smoke Alarm | Spending Limit = Circuit Breaker

Azure Budget Alert — Like a smoke alarm: it warns you (sends email) but does NOT stop the fire. Services keep running even if you exceed the budget.
Azure Spending Limit — Like a circuit breaker: automatically cuts off services when limit is hit. BUT only available for free trial and MSDN subscriptions — NOT for production Pay-As-You-Go or Enterprise Agreement accounts.

📊 Personalized Study Advisor

Pick the Domain 1 study plan below that matches your background

🔄 Transition Path: AI-102 → AI-103

You already know Azure AI concepts. Focus on what's new or changed in AI-103 vs AI-102.

Master the Azure AI Foundry Hub/Project Model

AI-102 uses "Cognitive Services resources" and "Azure AI Studio projects." AI-103 uses the new Foundry hub + project hierarchy. Understand what moves from resource-level to hub-level (networking, security) vs what's project-level (deployments, flows).

HIGH PRIORITY · New in AI-103

Learn the 4 Deployment Types

AI-102 focused on provisioning Azure OpenAI resources. AI-103 tests knowledge of Global Standard vs Standard vs Provisioned (PTU) vs Batch deployment types and when to choose each. This is heavily tested.

HIGH PRIORITY

Study Foundry Tracing (Traces & Spans)

AI-103 introduces Azure AI Foundry's built-in tracing based on OpenTelemetry. Understand the difference between traces (end-to-end) and spans (individual operations), and how this differs from Application Insights monitoring you knew from AI-102.

HIGH PRIORITY · New in AI-103

Review CI/CD and MLOps Patterns

AI-103 has much stronger emphasis on DevOps integration: GitHub Actions, Azure DevOps pipelines, IaC with Bicep/ARM, blue-green deployment for models, and evaluation gates in CI pipelines. AI-102 barely touched this.

HIGH PRIORITY

Review Responsible AI Updates

The 6 principles are the same as AI-102 but AI-103 adds groundedness detection, protected material detection, and prompt injection detection from Azure AI Content Safety. Understand the distinction between Content Safety service vs Azure OpenAI built-in filters.

MED PRIORITY

Refresh Quota and Cost Management

TPM/RPM concepts carry over from AI-102 but you now need to know provisioned throughput reservations, PTU pricing model, and the distinction between budget alerts (no enforcement) vs spending limits (enforcement, limited subscription types).

LOW PRIORITY

🌱 Foundational Study Path: New to Azure AI

Build your knowledge from the ground up with this structured sequence.

Start with Azure AI Foundry Portal Orientation

Spend time actually exploring the Azure AI Foundry portal (free trial available). Understand the hub/project hierarchy, model catalog, and connected resources before attempting any exam questions. Hands-on beats reading for this topic.

HIGH PRIORITY · Start Here

Learn Azure Identity and RBAC Fundamentals

Understand Microsoft Entra ID (formerly Azure AD), managed identities (system vs user-assigned), and Azure RBAC roles. This foundational knowledge is required for AI-103's security topics. Complete Microsoft Learn's "Describe Azure identity, access, and security" module first.

HIGH PRIORITY

Study Model Deployment Types Thoroughly

The 4 deployment types (Global Standard, Standard, Provisioned, Batch) are heavily tested. Create a comparison table mapping each deployment type to: pricing model, throughput guarantee, data residency, and use case. Memorize these before the exam.

HIGH PRIORITY

Master Private Endpoints and VNet Integration

Networking concepts trip up many candidates. Understand the difference between service endpoints and private endpoints, why private endpoints are preferred for AI services, and the role of Private DNS zones in making private endpoints work correctly.

HIGH PRIORITY

Memorize the 6 Responsible AI Principles

Use the FRPITA mnemonic from the Memory Hooks section. Practice identifying which principle is violated in given scenarios. Fairness (demographic bias), Reliability (edge case failures), Privacy (data leakage), Inclusiveness (accessibility gaps), Transparency (AI disclosure), Accountability (no audit process).

MED PRIORITY

Study Monitoring: Azure Monitor vs App Insights vs Foundry Tracing

Learn which tool answers which question: Azure Monitor/Log Analytics for infrastructure metrics and audit logs; Application Insights for request tracing in your app code; Foundry Tracing for debugging prompt flows and agent runs at the span level.

MED PRIORITY

Review Cost Management Basics

Understand TPM vs RPM: they cap throughput, while billing follows tokens consumed (or PTU hours for provisioned deployments). Know that Azure Budgets alert but don't enforce for production subscriptions. Practice calculating rough cost estimates using token pricing for given model/volume scenarios.

LOW PRIORITY

💻 Developer-Focused Study Path

You know how to code. Focus on Azure-specific patterns and operational concerns.

Managed Identity Patterns — Replace Secrets in Your Code

If you're currently using API keys in environment variables, understand why managed identities are superior for Azure deployments. Practice: App Service → System-assigned managed identity → Cognitive Services OpenAI User role → no key needed. Learn the DefaultAzureCredential class in the Azure Identity SDK.

HIGH PRIORITY · Dev Focus

CI/CD Integration: GitHub Actions + Azure DevOps

You already know CI/CD — apply it to AI model deployments. Learn Workload Identity Federation for GitHub Actions (OIDC, no service principal secrets). Study the Azure CLI commands for deploying Azure OpenAI model deployments. Understand evaluation gates: how to fail a pipeline when groundedness drops below threshold.

HIGH PRIORITY · Dev Focus

Infrastructure as Code for AI Resources

Learn Bicep/ARM templates for AI resources: Microsoft.CognitiveServices/accounts (Azure OpenAI), deployments sub-resource, private endpoints, diagnostic settings. Practice deploying a full Foundry hub+project stack with IaC. This is heavily relevant for MLOps scenarios in the exam.

HIGH PRIORITY

Azure AI Foundry SDK and Prompt Flow

Understand Prompt Flow as a DAG-based workflow for LLM applications. Learn how to version prompt flows in Git (YAML), run evaluation flows in code, and deploy flows as endpoints. The azure-ai-projects Python SDK is the primary interface for Foundry operations.

HIGH PRIORITY
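To make the DAG idea concrete, here is a conceptual sketch using only the Python standard library. It is not the Prompt Flow runtime or the azure-ai-projects SDK; the node names and the stubbed LLM call are hypothetical. Each node runs only after the nodes it depends on have produced their outputs.

```python
from graphlib import TopologicalSorter


def run_flow(nodes, dependencies, inputs):
    """Execute node functions in dependency order and return all results.

    nodes: name -> function taking the results dict so far
    dependencies: name -> set of upstream names (other nodes or flow inputs)
    """
    results = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        if name in nodes:  # flow inputs are already in results
            results[name] = nodes[name](results)
    return results


# Hypothetical three-node RAG flow with a stubbed LLM step.
nodes = {
    "retrieve": lambda r: [f"doc about {r['question']}"],
    "build_prompt": lambda r: f"Context: {r['retrieve']}\nQuestion: {r['question']}",
    "answer": lambda r: f"(stub LLM reply to a {len(r['build_prompt'])}-char prompt)",
}
dependencies = {
    "retrieve": {"question"},
    "build_prompt": {"retrieve", "question"},
    "answer": {"build_prompt"},
}
results = run_flow(nodes, dependencies, {"question": "quota limits"})
print(results["answer"])
```

Because the graph is declared as data, it can be stored in a text file and versioned in Git, which is the same reason Prompt Flow keeps its flow definition in YAML.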

Study RBAC Least Privilege for Applications

Your apps need the minimum required role. Application calling Azure OpenAI = Cognitive Services OpenAI User. Pipeline deploying a model = Cognitive Services Contributor. Never give your application Owner rights. Map each of your use cases to the correct RBAC role.

MED PRIORITY
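The mapping exercise above can be written down as a small lookup that a review script checks role assignments against. The use-case keys are hypothetical labels; the role names are the ones given in this section.

```python
# Minimum role per use case, as stated in this study path.
MINIMUM_ROLE = {
    "app_calls_azure_openai": "Cognitive Services OpenAI User",
    "pipeline_deploys_model": "Cognitive Services Contributor",
}


def review_assignment(use_case, assigned_role):
    """Return (ok, message) comparing an assigned role with the minimum role."""
    expected = MINIMUM_ROLE[use_case]
    if assigned_role == expected:
        return True, f"{use_case}: '{assigned_role}' matches the minimum role"
    if assigned_role == "Owner":
        return False, f"{use_case}: 'Owner' is too broad; use '{expected}'"
    return False, f"{use_case}: expected '{expected}', found '{assigned_role}'"


ok, message = review_assignment("app_calls_azure_openai", "Owner")
print(message)
```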

Distributed Tracing with OpenTelemetry and Foundry

You likely know OpenTelemetry. Azure AI Foundry tracing is built on OpenTelemetry. Understand traces (one end-to-end flow execution) vs spans (the individual LLM calls, tool calls, and retrieval steps inside it). Know that Foundry traces can be exported to Application Insights through an OpenTelemetry exporter. This is useful for debugging agent reasoning chains.

MED PRIORITY
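The trace/span relationship can be pictured as a tree: the trace is the root, and each LLM call, tool call, or retrieval step is a child span. The sketch below models that with plain dataclasses rather than the OpenTelemetry SDK; the span names and durations are made up.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    name: str
    kind: str  # e.g. "flow", "llm", "tool", "retrieval"
    duration_ms: float
    children: List["Span"] = field(default_factory=list)


def walk(span: Span) -> Iterator[Span]:
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def slowest_leaf(root: Span) -> Span:
    """Return the leaf span with the longest duration."""
    leaves = [s for s in walk(root) if not s.children]
    return max(leaves, key=lambda s: s.duration_ms)


# One trace (the flow execution) containing three spans.
trace = Span("answer_question", "flow", 1840.0, [
    Span("search_index", "retrieval", 220.0),
    Span("lookup_order", "tool", 310.0),
    Span("chat_completion", "llm", 1250.0),
])
slowest = slowest_leaf(trace)
print(f"Slowest step: {slowest.name} ({slowest.kind}, {slowest.duration_ms} ms)")
```

Span-level detail is what lets you tell whether an agent run was slow because of retrieval, a tool, or the model call itself.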

Content Safety API Integration

Azure AI Content Safety has its own SDK. As a developer, understand how to integrate it: analyze text before sending it to the LLM (input moderation), analyze the LLM's output before returning it to users (output moderation), and use groundedness detection in RAG pipelines. Know the API response structure, including the severity score returned for each harm category.

LOW PRIORITY
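The input/output moderation pattern can be sketched without calling the service. The sample response below follows the per-category severity shape described above (a categoriesAnalysis list of category/severity pairs); treat the exact field names, the severity threshold, and the stub functions as assumptions to verify against the current API reference.

```python
def flagged_categories(analysis, max_severity=2):
    """Return {category: severity} for categories above the allowed severity."""
    return {
        item["category"]: item["severity"]
        for item in analysis.get("categoriesAnalysis", [])
        if item["severity"] > max_severity
    }


def guarded_completion(user_text, analyze, call_llm, max_severity=2):
    """Moderate the input, call the model, then moderate the output."""
    if flagged_categories(analyze(user_text), max_severity):
        return "Request blocked by input moderation."
    reply = call_llm(user_text)
    if flagged_categories(analyze(reply), max_severity):
        return "Response withheld by output moderation."
    return reply


# Sample response in the assumed shape.
sample = {
    "categoriesAnalysis": [
        {"category": "Hate", "severity": 0},
        {"category": "SelfHarm", "severity": 0},
        {"category": "Sexual", "severity": 0},
        {"category": "Violence", "severity": 4},
    ]
}
flagged = flagged_categories(sample)

# Stubs standing in for the Content Safety client and the LLM call.
safe = {"categoriesAnalysis": [{"category": "Violence", "severity": 0}]}
reply = guarded_completion("hello", lambda text: safe, lambda text: "hi there")
```

Passing the analyzer and the model call in as functions keeps the moderation logic testable without network access.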
