FlashGenius
AI-103 · Microsoft Azure AI

Text Analysis & Speech Solutions

Domain 4 of 5 — 10–15% of Exam

Azure AI Language • Azure AI Speech • Translator • Content Safety

AI-103: Text Analysis & Speech Solutions

Domain 4 covers how to implement text analysis and speech processing solutions using Azure AI services. This domain represents 10–15% of the AI-103 exam — roughly 7–11 questions. You'll need to know when and how to apply Azure AI Language, Azure AI Speech, Azure Translator, and Content Safety.

Azure AI Language • Azure AI Speech • Azure AI Translator • Azure OpenAI (text tasks) • Content Safety

Exam Domain Weights

Domain | Topic | Weight
Domain 1 | Design and plan AI solutions on Azure | 10–15%
Domain 2 | Implement computer vision solutions | 10–15%
Domain 3 | Implement natural language processing solutions | 25–30%
Domain 4 | Implement text analysis solutions ← This Page | 10–15%
Domain 5 | Implement generative AI solutions | 30–35%

Key Services in This Domain

🔤 Azure AI Language

  • Named Entity Recognition (NER)
  • Sentiment Analysis & Opinion Mining
  • Key Phrase Extraction
  • PII Detection & Redaction
  • Text Classification (single/multi-label)
  • Summarization (extractive & abstractive)
  • Question Answering (QnA)
  • CLU (Conversational Language Understanding)
  • Entity Linking, Language Detection

🎙️ Azure AI Speech

  • Speech-to-Text (real-time & batch)
  • Text-to-Speech (prebuilt + Custom Neural Voice)
  • Speech Translation (STT + translate + TTS)
  • Custom Speech (domain vocabulary)
  • Diarization (multi-speaker labeling)
  • SSML for voice control
  • Speaker Recognition (verify/identify)
  • Keyword Recognition
  • Intent Recognition (Speech + CLU)

🌐 Azure AI Translator

  • Text translation (100+ languages)
  • Document Translation (async batch)
  • Custom Translator (domain corpus)
  • Transliteration (script conversion)
  • Language auto-detection
  • Neural Machine Translation (NMT)

🛡️ Content Safety

  • Harmful content categories: Hate, Violence, Self-harm, Sexual
  • Severity scoring: 0–7
  • Prompt Shield (jailbreak + indirect injection)
  • Groundedness detection
  • Protected material detection
  • Custom blocklists

⚡ Exam Quick Facts

Duration: 100 minutes
Passing Score: 700 / 1000
This Domain: 10–15%
Platform: Microsoft Foundry

Core Concepts

Master the Azure AI services you need for Domain 4. Each section covers the key features, decision points, and exam-relevant details.

Azure AI Language Service

Named Entity Recognition (NER) & Custom NER

Prebuilt NER categories: Person, Location, Organization, DateTime, Quantity, URL, IP Address, Email

Custom NER: Train the model with your own labeled entities. Evaluation uses precision (of what the model predicted, how much was correct), recall (of all actual entities, how many were found), and F1 score (harmonic mean of precision and recall).

  • Label training documents with entity spans in Language Studio
  • Requires minimum data per entity type; more labeled data = better performance
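The three evaluation metrics can be worked through by hand on a small example. The sketch below assumes exact-match scoring on (start, end, label) spans; the helper name `span_scores` and the sample spans are hypothetical, and the service's own evaluation may count matches differently.

```python
def span_scores(predicted, actual):
    """Return (precision, recall, f1) for two collections of entity spans.

    Each span is a hashable tuple such as (start, end, label). A prediction
    counts as correct only when it matches an actual span exactly.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical labeled data: (start offset, end offset, entity type).
actual = [(0, 5, "Drug"), (10, 18, "Dosage"), (25, 31, "Drug"), (40, 47, "Symptom")]
predicted = [(0, 5, "Drug"), (10, 18, "Dosage"), (33, 38, "Drug")]

precision, recall, f1 = span_scores(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.67 recall=0.50 f1=0.57
```

Here two of three predictions are correct (precision 2/3) but only two of four true entities were found (recall 1/2), so the model is missing entities more than it is inventing them.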

Key Phrase Extraction

Identifies the main topics or talking points from unstructured text. Returns a list of phrases that represent the key ideas. No training required — fully prebuilt.

Sentiment Analysis & Opinion Mining

Document-level sentiment: Classifies the entire text as positive, neutral, or negative (with confidence scores for each).

Opinion Mining (aspect-based sentiment): Goes deeper — identifies specific aspects (e.g., "coffee", "service") and the sentiment expressed toward each. Example: "The coffee was great but the service was slow" → coffee=positive, service=negative.
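A minimal sketch of a request body that switches opinion mining on. The field names (`kind`, `parameters.opinionMining`, `analysisInput.documents`) follow the Language `analyze-text` REST shape as understood here and should be checked against the current API reference; the helper name `build_sentiment_request` is hypothetical, and no network call is made.

```python
import json


def build_sentiment_request(texts, language="en", opinion_mining=True):
    """Build a JSON-serializable body for a sentiment call with opinion mining."""
    documents = [
        {"id": str(index), "language": language, "text": text}
        for index, text in enumerate(texts, start=1)
    ]
    return {
        "kind": "SentimentAnalysis",
        # Assumed flag name: asks for aspect-level targets and assessments
        # in addition to document-level and sentence-level sentiment.
        "parameters": {"opinionMining": opinion_mining},
        "analysisInput": {"documents": documents},
    }


body = build_sentiment_request(["The coffee was great but the service was slow"])
print(json.dumps(body, indent=2))
```

With the flag off, the same request would return only document-level and sentence-level sentiment, which is the distinction exam scenarios tend to probe.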

Entity Linking

Disambiguates recognized entities to known entries in a knowledge base (Wikipedia). Example: "Mercury" in an astronomy context links to the planet, not the element or the god. Returns a data source (Wikipedia URL) and confidence score.

Language Detection

Identifies the language of input text and returns a confidence score (0.0–1.0). Handles mixed-language content by returning the dominant language. Use when source language is unknown before translation.

PII (Personally Identifiable Information) Detection

Categories detected: Name, Phone number, Social Security Number (SSN), Credit card number, Email address, IP address, date of birth, passport number, and more.

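Redaction: The API returns a redacted copy of the text with each PII span masked (for example, replaced by a category label such as "[PERSON]") for safe downstream processing. To target PHI (Protected Health Information), set the request's domain parameter to phi rather than calling a separate endpoint.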
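To make the redaction step concrete, the sketch below applies category-label or mask-character redaction locally from a list of detected entities. The entity dictionaries (`category`, `offset`, `length`) are a hypothetical stand-in modeled on what a PII response typically carries, and the `redact` helper is illustrative, not part of any SDK. Offsets here are plain Python string indexes.

```python
def redact(text, entities, mask_with_label=True, mask_char="*"):
    """Replace each detected PII span with a category label or mask characters.

    `entities` is a list of dicts with "category", "offset", and "length".
    """
    redacted = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = entity["offset"]
        end = start + entity["length"]
        if mask_with_label:
            replacement = f"[{entity['category'].upper()}]"
        else:
            replacement = mask_char * entity["length"]
        redacted = redacted[:start] + replacement + redacted[end:]
    return redacted


text = "Call Ana Silva at 555-0100."
entities = [
    {"category": "Person", "offset": 5, "length": 9},
    {"category": "PhoneNumber", "offset": 18, "length": 8},
]

print(redact(text, entities))                         # Call [PERSON] at [PHONENUMBER].
print(redact(text, entities, mask_with_label=False))  # Call ********* at ********.
```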

Text Classification

Single-label classification: Each document gets exactly one category. Used for simple categorization (e.g., "sports", "politics", "tech").

Multi-label classification: A document can belong to multiple categories simultaneously. Used when content spans multiple topics.

Both require custom training with labeled examples in Language Studio.

Summarization

Extractive summarization: Selects and returns the most important existing sentences from the source document. Output sentences are verbatim from input.

Abstractive summarization: Generates a new summary in the model's own words. May not use exact source sentences.

Conversation summarization: Summarizes multi-turn dialogue — returns issue, resolution, and chapter structure.

Question Answering (QnA)

Build FAQ-style knowledge bases from documents, URLs, or manually entered Q&A pairs. Returns answers with a confidence score (0–1). Supports follow-up prompts for multi-turn conversations. Hosted in Azure AI Language (replaces QnA Maker).

Conversational Language Understanding (CLU)

The modern replacement for LUIS. Understands natural language input by predicting intents (what the user wants) and extracting entities (key data). Training utterances teach the model variations of each intent.

  • Intents: BookFlight, CancelOrder, GetWeather
  • Entities: prebuilt (datetime, number) or custom (FlightDestination)
  • Integrates with Speech SDK for voice-driven applications
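A sketch of how an application might act on a CLU prediction. The dictionary below is a hypothetical, trimmed-down prediction (`topIntent`, `intents`, `entities`) modeled on the general shape of a CLU result; the `route_prediction` helper and the 0.7 confidence cutoff are assumptions for illustration.

```python
def route_prediction(prediction, min_confidence=0.7):
    """Return (intent, entities) or ("None", {}) when confidence is too low."""
    top_intent = prediction["topIntent"]
    scores = {item["category"]: item["confidenceScore"] for item in prediction["intents"]}
    if scores.get(top_intent, 0.0) < min_confidence:
        # Fall back rather than act on a weak guess.
        return "None", {}
    entities = {}
    for entity in prediction["entities"]:
        entities.setdefault(entity["category"], []).append(entity["text"])
    return top_intent, entities


# Hypothetical prediction for the utterance "Book me a flight to Paris".
prediction = {
    "topIntent": "BookFlight",
    "intents": [
        {"category": "BookFlight", "confidenceScore": 0.92},
        {"category": "GetWeather", "confidenceScore": 0.05},
    ],
    "entities": [
        {"category": "FlightDestination", "text": "Paris", "confidenceScore": 0.88},
    ],
}

intent, found = route_prediction(prediction)
print(intent, found)  # BookFlight {'FlightDestination': ['Paris']}
```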

Azure OpenAI for Text Tasks

When to Use Azure OpenAI (GPT-4o) vs Azure AI Language

Factor | Azure AI Language | GPT-4o (Azure OpenAI)
Output structure | Consistent, typed JSON | Structured outputs via JSON Schema
Latency | Lower | Higher
Cost | Lower | Higher (token-based)
Training data needed | More labeled examples | Few-shot or zero-shot
Complex reasoning | Limited | Chain-of-thought capable
Compliance/audit | Easier (deterministic) | More variable outputs

Use Language service for: PII detection at scale, structured NER, compliance scenarios, cost-sensitive applications.

Use GPT-4o for: nuanced sentiment with reasoning, complex entity extraction with context, when you have few labeled examples.

Azure AI Translator

Translation Modes

  • Text Translation: Synchronous API call, source language auto-detected or specified, target language required. Supports 100+ languages.
  • Document Translation: Asynchronous batch translation of complete documents (PDF, DOCX, PPTX, etc.). Use for large-scale document processing. Results stored in Azure Blob Storage.
  • Custom Translator: Train a domain-specific translation model with parallel corpus (source + target sentence pairs). Use when standard NMT produces poor results for specialized vocabulary (legal, medical, technical).
  • Transliteration: Converts text from one script to another without changing the language. Example: Arabic text → Latin characters. Not a translation — pronunciation stays the same.
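The difference between the translation and transliteration calls shows up in the request URL. The sketch below only builds URLs and bodies for the Translator v3 REST API (global endpoint, `api-version=3.0`) as understood here; the helper names are hypothetical, authentication headers are omitted, and nothing is sent over the network.

```python
from urllib.parse import urlencode

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def translate_url(to_languages, from_language=None):
    """URL for text translation; leaving out `from` requests auto-detection."""
    params = [("api-version", "3.0")]
    if from_language:
        params.append(("from", from_language))
    params.extend(("to", language) for language in to_languages)
    return f"{TRANSLATOR_ENDPOINT}/translate?{urlencode(params)}"


def transliterate_url(language, from_script, to_script):
    """URL for script conversion; the language itself does not change."""
    params = {
        "api-version": "3.0",
        "language": language,
        "fromScript": from_script,
        "toScript": to_script,
    }
    return f"{TRANSLATOR_ENDPOINT}/transliterate?{urlencode(params)}"


def text_body(texts):
    """Both calls take a JSON array of objects with a `Text` field."""
    return [{"Text": text} for text in texts]


print(translate_url(["fr"]))                      # target required, source auto-detected
print(transliterate_url("ar", "Arab", "Latn"))    # Arabic script to Latin script
print(text_body(["مرحبا"]))
```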

Azure AI Speech Service

Speech-to-Text (STT)

  • Real-time recognition: Continuous (ongoing stream, e.g., live call) or single utterance (one phrase, then stops). Use Speech SDK.
  • Batch transcription: Asynchronous — submit audio files (WAV, MP3, OGG) to REST API, poll for results. Best for large volumes of recorded audio.
  • Custom Speech: Fine-tune the acoustic and language models with domain-specific audio data and pronunciation dictionaries. Use when standard STT misrecognizes industry terms.
  • Diarization: Labels who spoke each segment in multi-speaker audio. Returns speaker IDs (Speaker 1, Speaker 2...) with timestamps.
  • Word-level timestamps: Returns the start/end time of each recognized word.
  • Language identification: Detect which language is being spoken before or during transcription.
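A sketch of a batch transcription job definition with diarization and word-level timestamps turned on. The property names (`contentUrls`, `diarizationEnabled`, `wordLevelTimestampsEnabled`) reflect the batch transcription REST API as understood here and may differ between API versions; the helper name and the example URL are hypothetical.

```python
import json


def build_batch_transcription(content_urls, locale="en-US", display_name="Meeting batch"):
    """Build the JSON body that would be POSTed to create a transcription job."""
    return {
        "contentUrls": list(content_urls),   # audio files reachable by the service
        "locale": locale,
        "displayName": display_name,
        "properties": {
            "diarizationEnabled": True,           # label segments by speaker
            "wordLevelTimestampsEnabled": True,   # start/end time per word
        },
    }


job = build_batch_transcription(["https://example.com/audio/meeting1.wav"])
print(json.dumps(job, indent=2))
```

Because the job is asynchronous, creating it returns immediately; the transcript is fetched later once the job status reports completion.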

Text-to-Speech (TTS)

  • Prebuilt neural voices: Hundreds of voices across languages — no training required.
  • Custom Neural Voice: Create a unique AI voice from voice talent recordings. Requires Microsoft approval (limited access program). Voice talent must give explicit written consent.
  • SSML (Speech Synthesis Markup Language): XML-based markup to control speech rate, pitch, emphasis, pauses (breaks), volume, and pronunciation. Wrap text in <speak> and use tags like <prosody rate="slow">, <break time="500ms"/>, <emphasis level="strong">.
  • Audio formats: WAV (uncompressed), MP3 (compressed), OGG. Choose based on quality vs. file size needs.
  • Real-time vs batch synthesis: Real-time for interactive apps, batch for pre-generating large audio libraries.
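The SSML tags named above can be assembled into a complete document like this. It is a minimal sketch: the `build_ssml` helper is hypothetical, the voice name `en-US-JennyNeural` is used as an example prebuilt voice, and not every voice honors every tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"


def build_ssml(slow_text, emphasized_text, voice="en-US-JennyNeural", pause_ms=500):
    """Return an SSML string: slow speech, a pause, then emphasized speech."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NAMESPACE}" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f'<prosody rate="slow">{escape(slow_text)}</prosody>'
        f'<break time="{int(pause_ms)}ms"/>'
        f'<emphasis level="strong">{escape(emphasized_text)}</emphasis>'
        "</voice></speak>"
    )


ssml = build_ssml("Your order has shipped.", "Thanks & goodbye!")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed XML
print(ssml)
```

Escaping the text matters because characters such as `&` would otherwise make the XML invalid and the synthesis request would be rejected.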

Speech Translation, Intent Recognition, Speaker Recognition

  • Speech Translation: Pipeline of STT → translation → TTS. Speak in one language, get speech output in another. Single SDK call.
  • Keyword Recognition: Detect specific wake words or trigger phrases locally (on-device). Low-latency, always-listening capability.
  • Intent Recognition: Combines Speech SDK (STT) with CLU to understand natural language voice commands. One round trip: audio → text → intent + entities.
  • Speaker Verification: Confirm that a speaker is who they claim to be (1:1 comparison against enrolled voiceprint).
  • Speaker Identification: Determine which of several enrolled speakers is talking (1:N comparison against a group).

Content Safety for Text

Harm Categories & Prompt Shield

  • Harm categories: Hate, Violence, Self-harm, Sexual. Each scored 0–7 (0 = safe, 7 = severe).
  • Prompt Shield — User jailbreak detection: Detects when a user message attempts to override model safety guidelines or extract unsafe behaviors ("ignore previous instructions...").
  • Prompt Shield — Indirect injection detection: Detects malicious instructions embedded in documents fed to the model (the document tells the model to do something harmful). Two separate shields, each independently configurable.
  • Groundedness detection: Verifies that a model's response is factually supported by the provided grounding documents. Helps detect hallucination in RAG systems.
  • Protected material detection: Detects if model output contains copyrighted text (song lyrics, news articles, books).
  • Custom blocklists: Define your own banned words, phrases, or regex patterns specific to your use case.
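Severity scores only become a moderation decision once the application applies thresholds. The sketch below assumes a response shaped like `{"categoriesAnalysis": [{"category": ..., "severity": ...}], "blocklistsMatch": [...]}`, which is how the text analysis result is understood here; the `blocked_categories` helper and the threshold values are a hypothetical policy, not service defaults.

```python
def blocked_categories(response, thresholds, default_threshold=4):
    """Return the categories whose severity meets or exceeds its threshold."""
    blocked = []
    for item in response.get("categoriesAnalysis", []):
        limit = thresholds.get(item["category"], default_threshold)
        if item["severity"] >= limit:
            blocked.append(item["category"])
    if response.get("blocklistsMatch"):
        # Any custom blocklist hit is treated as a block in this sketch.
        blocked.append("Blocklist")
    return blocked


# Hypothetical analysis result for one user message.
response = {
    "blocklistsMatch": [],
    "categoriesAnalysis": [
        {"category": "Hate", "severity": 0},
        {"category": "SelfHarm", "severity": 0},
        {"category": "Sexual", "severity": 0},
        {"category": "Violence", "severity": 4},
    ],
}

# A stricter limit for Violence than the default of 4.
print(blocked_categories(response, thresholds={"Violence": 2}))  # ['Violence']
```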

Decision Table: Text Task → Service

Task | Recommended Service / Feature
Extract named entities from text | Azure AI Language: NER
Detect sentiment + specific aspect opinions | Azure AI Language: Sentiment + Opinion Mining
Transcribe meeting audio with speaker labels | Speech Service: Batch Transcription + Diarization
Translate 10,000 documents to French | Azure Translator: Document Translation (async batch)
Build FAQ chatbot from existing docs | Azure AI Language: Question Answering
Detect PII in medical records | Azure AI Language: PII Detection (domain set to phi)
Build voice assistant with intent understanding | Speech SDK (STT) + CLU + TTS
Complex nuanced text reasoning | Azure OpenAI: GPT-4o
Detect harmful / unsafe text | Azure Content Safety
Convert Arabic script to Latin characters | Azure Translator: Transliteration
Detect jailbreak in user chat message | Content Safety: Prompt Shield (user)
Summarize using exact source sentences | Azure AI Language: Extractive Summarization
Generate fluent paraphrased summary | Azure AI Language: Abstractive Summarization
Understand spoken commands ("book me a flight") | Speech SDK Intent Recognition (STT + CLU)
Translate technical documents with jargon | Azure Translator: Custom Translator

Memory Hooks

High-impact mnemonics and mental models to anchor exam concepts. These are the patterns that stick when exam pressure is high.

🧠
Language Service Tasks
"NERVES" for Language Tasks
NER • Entity Linking • Redaction (PII) • Verification (sentiment) • Extraction (key phrases) • Summarization. Six core Azure AI Language capabilities in one word.
🎙️
STT Modes
Real-time = live phone call; Batch = recorded voicemail
Real-time recognition streams audio as it's spoken — like a live call center agent. Batch transcription processes stored audio files asynchronously — like transcribing all yesterday's voicemails overnight.
✏️
Summarization Types
Extractive = highlight pen; Abstractive = your own words
Extractive summary: you take a yellow highlighter and mark sentences that already exist. Abstractive summary: you close the book and write what you remember in your own words. Same distinction in the Azure Language API.
🔄
CLU vs LUIS
CLU is LUIS 2.0 — same concept, newer service
CLU (Conversational Language Understanding) replaces LUIS with the same intent/entity model. If an exam scenario describes "LUIS" functionality, the answer is CLU. LUIS is retired; all new development uses CLU.
🎤
Custom Neural Voice
Must apply — Microsoft doesn't let anyone clone voices freely
Custom Neural Voice requires submitting an application to Microsoft and getting approved. Voice talent must give explicit written consent. This is gated to prevent unauthorized voice cloning.
🗣️
SSML Mnemonic
"Slow Speech Makes Listeners" — SSML
SSML controls Speech Speed, eMphasis, and pauses (Length). If you need to control how text is spoken — rate, pitch, breaks, pronunciation — reach for SSML tags inside your TTS call.
😊
Sentiment: Document vs Aspect
Document = how the whole review feels; Aspect = how they feel about the coffee specifically
A 3-star restaurant review might come back as "neutral" at the document level, while opinion mining finds food=positive, noise=negative, service=negative. Aspect-level is what you need when you care about what drove the sentiment.
🚫
PII Categories
Name, phone, SSN, credit card, email — 5 things you'd never write on a public whiteboard
These are the core PII categories Azure AI Language detects. If a scenario involves protecting any of these from appearing in logs, transcripts, or output, PII Detection (with redaction) is the answer.
👥
Diarization
Diarization = "Who said that?" — speaker labeling in transcripts
Diarization segments a transcript by speaker. A meeting recording comes back labeled "Speaker 1: ... Speaker 2: ..." rather than one unbroken wall of text. Combine with batch transcription for recorded meetings.
🛡️
Prompt Shield
Front door guard + mail screener
Prompt Shield has two modes: user jailbreak (the front door guard — stops bad user messages before they reach the model) and indirect injection (the mail screener — checks documents you feed to the model for hidden instructions). Both can be enabled independently.
📊
Custom NER Evaluation
Precision = "Did I predict correctly?" Recall = "Did I find them all?"
Precision: of all entity spans the model returned, what % were actually correct? Recall: of all true entity spans in the data, what % did the model find? F1 = the balance between both. Low recall = missing entities; low precision = false positives.
🌐
Transliteration vs Translation
Transliteration = same sound, new alphabet. Translation = same meaning, new language.
Transliteration converts the script only — "مرحبا" → "Marhaba" (still Arabic, just Latin letters). Translation changes the language — "مرحبا" → "Hello" (now English). Know which one a scenario is asking for.

Study Advisor

Personalized focus recommendations based on what you're building. Match your use case to the services that matter most for your scenario.

📞 Building a Call Center App

Primary Focus
Azure AI Speech — STT (real-time + batch), Diarization, Custom Speech, TTS
Secondary Focus
Azure AI Language: NER (entities the caller mentions, such as names, products, and dates), Sentiment Analysis (call quality scoring), PII Redaction (remove sensitive data from transcripts)
Day 1: Speech STT modes — real-time vs batch, when to use each
Day 2: Diarization setup, word-level timestamps, batch transcription async flow
Day 3: Language NER + Sentiment for post-call analytics
Day 4: PII detection and redaction pipeline
Day 5: Content Safety integration + Custom Speech for domain jargon

📝 Building a Content Platform

Primary Focus
Content Safety — harm categories, Prompt Shield, custom blocklists, groundedness detection
Secondary Focus
Azure AI Language — PII Detection (user-generated content), Text Classification (auto-categorize articles), Summarization (content previews)
Day 1: Content Safety harm categories (Hate, Violence, Self-harm, Sexual) and severity 0–7
Day 2: Prompt Shield — user jailbreak vs document indirect injection, differences
Day 3: PII Detection + redaction for user content
Day 4: Text Classification (single vs multi-label) for content tagging
Day 5: Extractive vs abstractive summarization for previews and digests

💻 General Azure AI Developer

Start Here
Azure AI Language service overview — all capabilities and when each applies
Then Expand
Speech SDK fundamentals, Azure Translator (text vs document vs custom), Content Safety basics
Day 1: Azure AI Language — NER, Key Phrase, Sentiment, Entity Linking
Day 2: PII detection, Text Classification, Summarization (both types)
Day 3: QnA and CLU (what replaces LUIS and QnA Maker)
Day 4: Speech SDK — STT modes, TTS + SSML, translation pipeline
Day 5: Azure Translator modes, Content Safety, Language vs GPT-4o decision

🌐 Building a Global App (Translation)

Primary Focus
Azure AI Translator — text translation, document translation (async), Custom Translator, transliteration
Secondary Focus
Language Detection (pre-translation), Speech Translation (voice apps), Content Safety (multilingual content moderation)
Day 1: Translator text API — source/target language, auto-detect, 100+ languages
Day 2: Document Translation async flow — Blob Storage input/output, polling
Day 3: Custom Translator — when standard NMT fails, parallel corpus training
Day 4: Transliteration vs translation distinction, language detection confidence
Day 5: Speech Translation pipeline (STT → translate → TTS), Azure AI Language + Translator combination

Key Exam Traps to Avoid

  • Trap 1: "Build a FAQ bot" → Answer is Question Answering (not CLU, which is for intent/entity extraction from conversation)
  • Trap 2: "Custom Speech" is about recognition accuracy for domain vocabulary — not about creating a custom voice (that's Custom Neural Voice)
  • Trap 3: Document Translation is async — you submit, get a job ID, then poll for results. Not synchronous.
  • Trap 4: Extractive summarization returns existing sentences verbatim. If the scenario says "do not paraphrase," choose extractive.
  • Trap 5: Speaker Verification = 1 person confirming identity. Speaker Identification = figuring out which of N enrolled speakers is talking.
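The submit-then-poll pattern from Trap 3 can be sketched without any service dependency. The `wait_for_job` helper is hypothetical, `get_status` stands in for whatever call fetches the job status, and the status strings used here are assumptions about what an asynchronous job reports.

```python
import time


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=600.0,
                 terminal=("Succeeded", "Failed"), sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status."""
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for a real status call: the job finishes on the fourth check.
fake_statuses = iter(["NotStarted", "Running", "Running", "Succeeded"])
final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0, sleep=lambda s: None)
print(final)  # Succeeded
```

A synchronous call such as text translation returns its result in the same response, so no loop like this is needed there.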

Official Resources

Direct links to Microsoft Learn documentation and the official AI-103 certification page. All resources are free to access.

📚
AI-103 Official Study Guide

Microsoft's official exam objectives and topic breakdown for AI-103.

🏅
AI-103 Certification Page

Exam registration, prerequisites, and skills measured overview.

🔤
Azure AI Language Documentation

NER, Sentiment, PII, CLU, QnA, Summarization — complete service reference.

🎙️
Azure AI Speech Documentation

STT, TTS, SSML, Custom Speech, Diarization, Speech Translation quickstarts.

🌐
Azure AI Translator Documentation

Text translation, Document Translation, Custom Translator, transliteration API reference.

🛡️
Azure Content Safety Documentation

Harm categories, Prompt Shield, Groundedness, Protected Material detection.
