NVIDIA NCA-GENL · Generative AI Fundamentals

Generative AI & LLM Fundamentals Explained

Master transformer architecture, tokenization, pre-training, fine-tuning, and text generation mechanics — the core concepts tested on the NVIDIA NCA-GENL certification.

Practice with Flashcards →

The Four LLM Knowledge Pillars

Every NCA-GENL question on LLM fundamentals traces back to one of these four interconnected areas

Transformer Architecture

The Engine Inside Every LLM

The Transformer is the neural network architecture that powers all modern LLMs. Its key innovation — self-attention — allows every token to directly relate to every other token in the input, enabling the model to capture context and meaning regardless of how far apart words are in a sequence.

QKV (Attention) · MHA (Multi-Head)
Tokenization & Embeddings

How Text Becomes Numbers

LLMs don't read words — they read tokens. Tokenization splits text into subword units using algorithms like Byte-Pair Encoding (BPE). Each token is mapped to a high-dimensional embedding vector. Context windows limit how many tokens a model can process at once — a key practical constraint.

BPE (Tokenizer) · ≈4 Chars/Token
Pre-training & Fine-tuning

How LLMs Learn

Foundation models are pre-trained on massive text corpora using self-supervised objectives (next-token prediction for decoder models; masked language modeling for encoders). Fine-tuning adapts them to specific tasks. RLHF aligns models with human preferences. LoRA makes fine-tuning efficient.

RLHF (Alignment) · LoRA (Efficient FT)
Text Generation & NVIDIA Tools

From Model to Output

LLMs generate text autoregressively — one token at a time. Temperature, top-p, and top-k control the randomness of outputs. NVIDIA NeMo, TensorRT-LLM, and NIM provide the infrastructure for training, optimizing, and deploying LLMs at scale on NVIDIA GPU hardware.

Temp (Randomness) · NeMo (NVIDIA FW)

LLM Architecture Families

Decoder-Only
Autoregressive Generator
GPT-4, Llama, Mistral, Falcon, Gemma
Text generation, chat, code, instruction following. Left-to-right causal generation. Most chat LLMs use this architecture.
Encoder-Only
Bidirectional Understander
BERT, RoBERTa, DistilBERT, ALBERT
Classification, NER, sentiment analysis, embeddings. Sees full context in both directions. Cannot generate open-ended text.
Encoder-Decoder
Sequence-to-Sequence
T5, BART, mT5, FLAN-T5
Translation, summarization, question answering. Encoder understands input; decoder generates output. Best for structured transformation tasks.
NVIDIA Models
NVIDIA Foundation Models
Nemotron, NVLM, BioNeMo models
Optimized for NVIDIA GPU infrastructure. Available via NVIDIA NIM for enterprise deployment. Domain-specific variants (biology, chemistry, etc.).
💡 Exam tip — architecture → use case: When the NCA-GENL asks which architecture to use, map task type to architecture. Need to generate text or answer questions freely? → Decoder-only. Need to classify or extract from text? → Encoder-only. Need to transform one sequence into another (translate, summarize)? → Encoder-Decoder.

How It Works

Deep-dive into each pillar — transformer internals, tokenization, training objectives, and generation mechanics

Transformer Architecture

Self-Attention, Multi-Head Attention & the Transformer Stack

Scaled Dot-Product Attention
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) × V
Q = Query · K = Key · V = Value · dₖ = dimension of the Key vectors (dividing by √dₖ keeps the dot products from growing large and saturating the softmax, which would otherwise cause vanishing gradients) · Output = weighted sum of Values
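
As a rough illustration, here is a minimal NumPy sketch of this formula on toy shapes; the function and variable names are illustrative, not from any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # raw attention scores, shape (seq, seq)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ V                           # weighted sum of Value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))    # 4 tokens, d_k = 8
print(scaled_dot_product_attention(Q, K, V).shape)       # (4, 8): one context vector per token
```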

Transformer Layer Stack (decoder-only, e.g. GPT)

1. Input Text: "The cat sat on the mat" → raw text string
2. Tokenization + Embedding: text is split into tokens → each token is mapped to a dense embedding vector + positional encoding added
3. Multi-Head Self-Attention: each token attends to every other token (in decoders, a causal mask restricts attention to preceding tokens). Multiple attention heads run in parallel, each learning different relationship types.
4. Feed-Forward Network (FFN): position-wise fully connected layers applied to each token independently, typically 4× the model dimension. LayerNorm and residual connections wrap each sub-layer.
(steps 3 and 4 repeat N times)
5. Output Logits → Softmax → Token: final hidden state projected to vocabulary size → softmax gives probabilities over all tokens → sampling selects the next token
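
To make the stack concrete, here is a hedged PyTorch sketch of a single pre-norm decoder layer with toy dimensions (d_model = 512, 8 heads); real LLMs add dropout, rotary embeddings, KV-caching, and other refinements:

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                          # FFN hidden size ~4x the model dim
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                                  # x: (batch, seq, d_model)
        seq = x.size(1)
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)   # masked self-attention
        x = x + attn_out                                     # residual connection
        x = x + self.ffn(self.norm2(x))                      # residual around the FFN
        return x

x = torch.randn(1, 6, 512)                 # batch of 1, 6 tokens
print(DecoderLayer()(x).shape)             # torch.Size([1, 6, 512])
```
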
1. What Self-Attention Actually Does

For each token, three vectors are created: a Query (what am I looking for?), a Key (what do I represent?), and a Value (what information do I carry?). The dot product of a token's Query with every other token's Key produces raw attention scores. After dividing by √dₖ and applying softmax, these become attention weights — how much focus each token places on every other token. The output is a weighted sum of the Value vectors.

2. Multi-Head Attention

Instead of running attention once, Multi-Head Attention runs h parallel attention heads, each with its own learned Q, K, V projection matrices. Each head can learn to attend to different types of relationships simultaneously — syntactic, semantic, long-range, short-range. Outputs from all heads are concatenated and linearly projected back to the model dimension.

3. Positional Encoding

Transformers process all tokens in parallel (unlike RNNs), so they have no inherent sense of order. Positional encodings are added to token embeddings to inject sequence-position information. The original Transformer used fixed sinusoidal encodings. Modern LLMs use learned positional embeddings or Rotary Positional Encoding (RoPE), which generalizes better to sequences longer than those seen during training.
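
For reference, a short NumPy sketch of the original fixed sinusoidal encoding (RoPE itself works differently, rotating Q/K vectors inside attention rather than adding vectors to the embeddings):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                    # token positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]                # even embedding dimensions
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                         # sine on even dims
    pe[:, 1::2] = np.cos(angles)                         # cosine on odd dims
    return pe                                            # added to the token embeddings

print(sinusoidal_positional_encoding(128, 64).shape)     # (128, 64)
```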

4. Causal (Masked) Self-Attention in Decoders

Decoder-only models (GPT, Llama) use causal masking — when generating token N, the model can only attend to tokens 1 through N-1. Future tokens are masked out with -∞ before softmax. This ensures autoregressive generation is valid: each token is predicted only from preceding context, not from future tokens it hasn't generated yet.
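
A small NumPy sketch of what "mask with -∞ before softmax" means in practice, using toy scores for 5 tokens:

```python
import numpy as np

seq_len = 5
scores = np.random.randn(seq_len, seq_len)            # stand-in for QK^T / sqrt(d_k)
future = np.triu(np.ones((seq_len, seq_len)), k=1)    # 1s above the diagonal = future tokens
scores = np.where(future == 1, -np.inf, scores)       # block attention to future positions

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
print(np.round(weights, 2))                           # upper triangle is exactly 0
```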

ℹ️ Parameters = scale = capability: A model's parameter count is the total number of learnable weights across all attention layers, FFN layers, and embeddings. GPT-3 has 175B parameters. Llama 3.1 spans 8B–405B variants. More parameters generally mean more capability — but also more compute, memory, and inference cost.
Tokenization & Embeddings

BPE, Context Windows & Embedding Spaces

Context Window Constraint
Max tokens = input tokens + output tokens ≤ context window size
GPT-4: 128K tokens · Llama 3.1: 128K tokens · ~1 token ≈ 4 chars of English · 1,000 tokens ≈ 750 words
BPE
Byte-Pair Encoding
Dominant tokenizer algorithm. Iteratively merges frequent character pairs. Used by GPT models.
WordPiece
WordPiece Tokenizer
Used by BERT. Similar to BPE but uses likelihood maximization instead of frequency.
SentencePiece
SentencePiece
Language-agnostic tokenizer. Treats text as raw characters. Used by T5, Llama, Gemma.
Token
Subword Unit
A piece of text: full word, prefix, suffix, or character. "unhappy" → ["un", "happy"].
Embedding
Token Vector
Dense floating-point vector (e.g., 4096-dim). Encodes semantic meaning. Similar words cluster nearby.
Vocab Size
Vocabulary
Typical: 32K–128K tokens. GPT-4: ~100K. Larger vocab = fewer tokens per sentence.
Context Window
Max Sequence Length
Total tokens the model can process at once (input + output combined).
Positional Enc.
Position Information
Added to embeddings so model knows token order. RoPE is the current dominant approach.
💡 Tokenization quirk to know for the exam: Common words are often single tokens ("cat" → 1 token). Rare or long words split into multiple subword tokens ("tokenization" → ["token", "ization"] → 2 tokens). Numbers often tokenize inefficiently — "12345" may become 3–5 tokens. Code with unusual symbols can also be very token-heavy.
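
You can check this yourself with the open-source tiktoken library (the tokenizer behind GPT-4's cl100k_base vocabulary); exact splits vary by tokenizer, so treat the output as illustrative:

```python
import tiktoken   # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["cat", "tokenization", "1,234,567.89"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r}: {len(ids)} token(s) -> {pieces}")
# Common words tend to be a single token; rare words and numbers split into several pieces.
```
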
⚠️ Context window ≠ training data: A model trained on data up to a certain date cannot know events after that cutoff — this is the knowledge cutoff. The context window is how much text the model can process at inference time. These are independent constraints. A model with a 128K context window still has a fixed training knowledge cutoff.
Pre-training & Fine-tuning

Foundation Models, RLHF & Parameter-Efficient Fine-tuning

1. Pre-training: Next Token Prediction (Decoder-Only)

Decoder-only models (GPT, Llama) are pre-trained with causal language modeling — given a sequence of tokens, predict the next token. No labels are needed — the next token IS the label. Training on trillions of tokens from internet text, books, and code gives the model broad world knowledge and language understanding. This is self-supervised learning at massive scale.
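
A hedged PyTorch sketch of the objective itself: shift the sequence by one position so each label is simply the next token (the random logits below stand in for a real model's forward pass):

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 1000, 8
token_ids = torch.randint(0, vocab_size, (1, seq_len))    # one tokenized training sequence
inputs = token_ids[:, :-1]                                 # tokens 1..n-1 are the context
labels = token_ids[:, 1:]                                  # "the next token IS the label"

logits = torch.randn(1, seq_len - 1, vocab_size)           # stand-in for model(inputs)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1))
print(loss.item())                                         # minimized over trillions of tokens
```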

2. Pre-training: Masked Language Modeling (Encoder-Only)

Encoder-only models (BERT) are pre-trained with masked language modeling (MLM) — randomly mask 15% of tokens and train the model to predict the masked tokens using both left and right context. This forces bidirectional understanding. BERT also uses Next Sentence Prediction (NSP) as a secondary objective. The result: deep contextual text representations ideal for classification and extraction tasks.
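
A toy sketch of the masking step; real BERT pre-processing also replaces some selected positions with random or unchanged tokens, so this shows only the basic idea:

```python
import random

tokens = ["the", "cat", "sat", "on", "the", "mat", "near", "the", "door", "today"]
n_mask = max(1, round(0.15 * len(tokens)))             # mask roughly 15% of positions
mask_positions = set(random.sample(range(len(tokens)), n_mask))

masked = ["[MASK]" if i in mask_positions else tok for i, tok in enumerate(tokens)]
targets = {i: tokens[i] for i in mask_positions}       # what the model must predict
print(masked)
print(targets)
```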

3. Supervised Fine-tuning (SFT) & Instruction Tuning

A pre-trained base model generates raw text continuations — it's not yet a chat assistant. Supervised fine-tuning on curated (instruction, response) pairs teaches the model to follow instructions. This transforms a text-predictor into an instruction-following assistant (e.g., InstructGPT, Llama-Instruct). The fine-tuning dataset is much smaller than pre-training data — thousands to millions of examples vs. trillions of tokens.
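
A sketch of how an (instruction, response) pair becomes a training example; the chat-template tags below are made up for illustration, since every model family defines its own format:

```python
pairs = [
    {"instruction": "What is the capital of France?", "response": "Paris."},
]

def to_training_text(example):
    # Hypothetical template; Llama, GPT, and others each use their own special tokens.
    return (f"<|user|>\n{example['instruction']}\n"
            f"<|assistant|>\n{example['response']}<|end|>")

print(to_training_text(pairs[0]))
# Fine-tuning then reuses the next-token objective on this text, usually computing the
# loss only on the response tokens.
```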

4. RLHF — Reinforcement Learning from Human Feedback

RLHF is the alignment technique used to make models more helpful, harmless, and honest. Process: (1) Collect human preference data — rank model outputs from best to worst. (2) Train a Reward Model (RM) to predict human preferences. (3) Fine-tune the LLM using PPO (Proximal Policy Optimization) to maximize reward model scores, with a KL-divergence penalty to prevent drifting too far from the base model. ChatGPT, Claude, and Gemini all use RLHF variants.
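
The core of step (3) can be written as a one-line reward. The sketch below is a simplified, hypothetical per-token version; real PPO implementations are considerably more involved:

```python
def rlhf_reward(rm_score, logprob_policy, logprob_base, beta=0.1):
    # Reward-model score minus a KL penalty that keeps the fine-tuned policy from
    # drifting too far from the base model; beta controls the trade-off.
    kl_estimate = logprob_policy - logprob_base
    return rm_score - beta * kl_estimate

print(rlhf_reward(rm_score=2.3, logprob_policy=-1.2, logprob_base=-1.8))   # 2.24
```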

5. LoRA — Low-Rank Adaptation (Parameter-Efficient Fine-tuning)

Full fine-tuning of a 70B-parameter model requires enormous GPU resources. LoRA freezes all original model weights and injects small trainable low-rank matrices (A and B, where rank r ≪ model_dim) into each attention layer. The weight update ΔW = A × B is low-rank. This reduces trainable parameters by 100–10,000× while achieving performance very close to full fine-tuning. QLoRA extends this by quantizing the frozen base model to 4-bit, further reducing memory requirements.
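
A hedged PyTorch sketch of one LoRA-adapted linear layer; one common parameterization writes the update as B·A, and the class name and hyperparameters here are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)           # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))     # B starts at zero, so delta W = 0 at init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, "of", total, "parameters trainable")    # ~65K of ~16.8M for this one layer
```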

🟢 NVIDIA NeMo: NVIDIA's end-to-end framework for training, fine-tuning, and deploying LLMs. Supports LoRA and full fine-tuning, distributed training across multiple GPUs/nodes, PEFT methods, and direct export to TensorRT-LLM for optimized inference. Used for building custom domain-specific models on NVIDIA infrastructure.
Text Generation & NVIDIA Tools

Sampling Strategies, Inference Optimization & NVIDIA Stack

Autoregressive Generation Loop
output[t] = sample(softmax(logits / temperature)) → append → repeat until [EOS] or max_tokens
Each token is sampled from the probability distribution over the vocabulary · Previous tokens become part of the context for the next step · Generation is inherently sequential
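
A skeleton of that loop in Python; model_logits is a dummy stand-in for a real forward pass and the token ids are meaningless, but the control flow is the part that matters:

```python
import numpy as np

def model_logits(context, vocab_size=100):
    rng = np.random.default_rng(len(context))      # dummy "model": random but repeatable
    return rng.normal(size=vocab_size)

def generate(prompt_ids, max_tokens=10, temperature=0.8, eos_id=0):
    ids = list(prompt_ids)
    rng = np.random.default_rng(42)
    for _ in range(max_tokens):
        logits = model_logits(ids)
        scaled = (logits - logits.max()) / temperature
        probs = np.exp(scaled) / np.exp(scaled).sum()     # softmax over the vocabulary
        next_id = int(rng.choice(len(probs), p=probs))    # sample the next token
        ids.append(next_id)                               # it becomes context for the next step
        if next_id == eos_id:                             # stop at [EOS]
            break
    return ids

print(generate([5, 17, 42]))
```
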
1. Temperature Sampling

Temperature T scales the logits before softmax: logits_scaled = logits / T. T = 0: Deterministic greedy — always picks the highest-probability token. Same input always gives same output. T = 1: Unmodified probabilities from the model. T > 1: Flatter distribution — more random, more diverse, but more likely to make errors. Typical production values: 0.0 for factual tasks, 0.7–0.9 for balanced chat, 1.0+ for creative tasks.
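
A quick NumPy demonstration of how the same logits turn into sharper or flatter distributions as T changes (four toy tokens):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([4.0, 2.0, 1.0, 0.5])
for T in (0.2, 1.0, 2.0):
    print(f"T={T}: {np.round(softmax(logits / T), 3)}")
# Low T concentrates probability on the top token (near-greedy); high T flattens the
# distribution. T = 0 is treated as a special case: simply take the argmax of the logits.
```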

2. Top-k Sampling

At each generation step, restrict sampling to only the top-k most probable tokens (discard all others). Common values: k = 50 or k = 100. Prevents very unlikely tokens from ever being sampled. The weakness: k is a fixed number regardless of how concentrated or spread the probability distribution is — sometimes k tokens may cover almost 100% of probability mass, sometimes much less.
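
A minimal sketch of the top-k filter applied to an already-computed probability vector (k = 3 over five toy tokens):

```python
import numpy as np

def top_k_filter(probs, k):
    keep = np.argsort(probs)[-k:]             # indices of the k most probable tokens
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()          # renormalize before sampling

probs = np.array([0.50, 0.25, 0.12, 0.08, 0.05])
print(top_k_filter(probs, k=3))               # only the top 3 tokens remain sampleable
```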

3. Top-p (Nucleus) Sampling

Select the smallest set of tokens whose cumulative probability mass is ≥ p (e.g., p = 0.9). The size of this set adapts dynamically: when the model is very confident, only a few tokens may cover 90% of probability — so sampling is tight. When uncertain, many tokens may be needed — so the set expands. This is generally preferred over top-k alone because it adapts to the model's confidence. Top-p and top-k are often combined.
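
And the corresponding nucleus (top-p) filter; note how the kept set shrinks or grows with the model's confidence:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    order = np.argsort(probs)[::-1]                  # tokens sorted by probability, descending
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1             # smallest prefix with mass >= p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

confident = np.array([0.90, 0.05, 0.03, 0.01, 0.01])
uncertain = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
print((top_p_filter(confident) > 0).sum(), "tokens kept when confident")   # small nucleus
print((top_p_filter(uncertain) > 0).sum(), "tokens kept when uncertain")   # larger nucleus
```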

4. Beam Search

Instead of sampling one token at a time, beam search maintains b candidate sequences ("beams") in parallel and expands each at every step, keeping only the top b most probable full sequences. Produces more coherent and grammatically correct output than pure sampling. Commonly used in translation and summarization. Downside: deterministic and less diverse than sampling; computationally more expensive than greedy decoding.
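
A toy beam search over a dummy next-token distribution (beam width b = 2, three steps); real decoders also track end-of-sequence tokens and often apply length penalties:

```python
import numpy as np

def next_log_probs(seq, vocab_size=5):
    rng = np.random.default_rng(sum(seq) + len(seq))   # dummy, repeatable "model"
    return np.log(rng.dirichlet(np.ones(vocab_size)))

def beam_search(start, steps=3, b=2):
    beams = [(0.0, list(start))]                        # (cumulative log-prob, sequence)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for tok, lp in enumerate(next_log_probs(seq)):
                candidates.append((score + lp, seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:b]   # keep top b
    return beams

for score, seq in beam_search([0]):
    print(round(score, 3), seq)
```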

🟢 NVIDIA TensorRT-LLM: An open-source library for optimizing and running LLM inference on NVIDIA GPUs. Applies techniques including quantization (INT8/INT4/FP8), kernel fusion, continuous batching, paged KV-cache, and tensor parallelism. Dramatically reduces latency and increases throughput vs. running PyTorch models directly. Underlies NVIDIA NIM (NVIDIA Inference Microservices).
🟢 NVIDIA NIM (Inference Microservices): Pre-packaged, optimized containers for deploying NVIDIA-optimized models as API endpoints. Provides OpenAI-compatible APIs for easy integration. Handles GPU selection, TensorRT-LLM optimization, and scaling automatically — enabling enterprises to deploy LLMs on their own infrastructure with minimal configuration.
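
A hedged example of calling a locally deployed NIM endpoint through its OpenAI-compatible API with the openai Python client; the base URL, port, and model id below are placeholders, so check your specific NIM container's documentation:

```python
from openai import OpenAI

# Placeholder endpoint and model id for a self-hosted NIM container.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used-locally")

resp = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",       # example model id; substitute your own
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.0,                           # deterministic, factual answer
    max_tokens=50,
)
print(resp.choices[0].message.content)
```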

Compare

Filter by pillar to compare architectures, training methods, sampling strategies, and NVIDIA tools

Concept | Category | Key Value / Range | What It Controls | NCA-GENL Exam Point
Self-Attention | Architecture | O(n²) complexity | How tokens relate to each other; captures long-range dependencies | Uses Q, K, V vectors. Score = QKᵀ/√dₖ then softmax. Foundation of all Transformers.
Multi-Head Attention | Architecture | h parallel heads (e.g., 32) | Multiple relationship types learned simultaneously | Each head learns different attention patterns. Outputs concatenated then projected.
Feed-Forward Network (FFN) | Architecture | 4× model dimension | Per-token nonlinear transformation after attention | Applied position-wise (independently to each token). Usually ReLU or GELU activation.
Decoder-Only (GPT-style) | Architecture | Causal mask | Text generation — attends only to past tokens | GPT, Llama, Mistral, Falcon. Best for chat, generation, code completion.
Encoder-Only (BERT-style) | Architecture | Bidirectional attention | Text understanding — attends to all tokens in both directions | BERT, RoBERTa. Best for classification, NER, embeddings. Cannot generate freely.
Encoder-Decoder (T5-style) | Architecture | Cross-attention bridge | Sequence-to-sequence tasks | T5, BART. Encoder processes input; decoder generates output. Best for translation, summarization.
Model Parameters | Architecture | Millions → Trillions | Total learnable weights across all layers | More params → more capacity. GPT-3 = 175B. Llama 3.1 = 8B–405B. Compute scales with params.
BPE (Byte-Pair Encoding) | Tokenization | 32K–100K vocab | How text is split into subword tokens | Dominant algorithm. Merges most-frequent character pairs iteratively. Used by GPT models.
Token | Tokenization | ≈4 chars avg (English) | Atomic unit the model processes | 1K tokens ≈ 750 words. Numbers and rare words may tokenize inefficiently (more tokens).
Token Embedding | Tokenization | Dims: 768–8192+ | Dense vector representation of each token | Learned during pre-training. Similar meanings → nearby vectors. Foundation of semantic search.
Context Window | Tokenization | 4K–1M tokens | Max tokens (input + output) processed at once | GPT-4 = 128K. Gemini 1.5 = 1M. Longer = more context but quadratic attention cost.
Positional Encoding (RoPE) | Tokenization | Rotary embedding | Injects token position into attention mechanism | RoPE used by Llama, Mistral, many modern LLMs. Better length generalization than learned PE.
Causal Language Modeling | Training | Next token prediction | Pre-training objective for decoder-only models | Self-supervised — no labels needed. Predicts next token from preceding context. Used by GPT, Llama.
Masked Language Modeling | Training | 15% tokens masked | Pre-training objective for encoder-only models | Predict masked tokens using bidirectional context. Used by BERT, RoBERTa.
Supervised Fine-tuning (SFT) | Training | Thousands–millions of examples | Adapts base model to follow instructions | Uses (instruction, response) pairs. Transforms base LLM into chat/instruction-following model.
RLHF | Training | Human preference pairs | Aligns model with human values and preferences | Train Reward Model → PPO optimization. Used by ChatGPT, Claude, Gemini. Reduces harmful outputs.
LoRA (Low-Rank Adaptation) | Training | Rank r = 4–64 | Reduces trainable params for fine-tuning | Freezes base model; trains small A×B matrices. 100–10,000× fewer params than full fine-tuning.
QLoRA | Training | 4-bit quantization | LoRA with quantized base model for lower memory | Fine-tune 70B model on a single GPU. Base model in NF4; LoRA adapters in BF16.
Temperature | Generation | 0.0–2.0 | Randomness of token sampling | T=0 → greedy (deterministic). T=1 → raw probs. T>1 → more random/creative. T=0.7–0.9 typical.
Top-k Sampling | Generation | k = 10–100 | Restricts to top-k tokens at each step | Fixed number regardless of distribution shape. Prevents very unlikely tokens.
Top-p (Nucleus) Sampling | Generation | p = 0.8–0.95 | Dynamic token selection by cumulative probability | Adapts to confidence. Tight set when confident; expands when uncertain. Generally preferred over top-k alone.
Beam Search | Generation | b beams (e.g., 4–10) | Maintains multiple candidate sequences in parallel | More coherent output than greedy. Deterministic. Higher compute. Used in translation/summarization.
NVIDIA NeMo | NVIDIA Tools | Training + fine-tuning FW | End-to-end LLM training and customization | Supports LoRA, SFT, RLHF, multi-GPU distributed training. Exports to TensorRT-LLM.
NVIDIA TensorRT-LLM | NVIDIA Tools | Inference optimization | Accelerates LLM inference on NVIDIA GPUs | Quantization (INT8/FP8), kernel fusion, continuous batching, paged KV-cache. Major latency reduction.
NVIDIA NIM | NVIDIA Tools | Inference microservices | Packaged model deployment with OpenAI-compatible API | Deploy optimized LLMs on own infrastructure. Abstracts GPU selection and TensorRT optimization.

Real Examples

Intuitive scenarios that make abstract LLM concepts concrete

Transformer Architecture

How Self-Attention Resolves Ambiguity: "The bank by the river was steep"

A traditional RNN processes "the bank" before seeing "river" — it might incorrectly associate "bank" with "financial institution." How does self-attention handle this differently?
  • In a Transformer, all tokens in the sentence are processed simultaneously. When computing the representation for "bank," the model calculates attention scores against every other token.
  • The token "river" has a strong semantic association with "steep, natural terrain" rather than "finance." The attention mechanism learns high weights between "bank" and "river" during training.
  • When the model attends to "bank" with high weight on "river" and "steep," the resulting contextual embedding of "bank" shifts toward the "riverbank" meaning in this vector space.
  • An RNN or LSTM processes left-to-right sequentially — the signal from "river" must propagate back through multiple recurrent steps, often degrading for long distances. Self-attention is distance-agnostic.
  • This is why Transformers dramatically outperformed RNNs on tasks requiring long-range contextual understanding.
✅ Key point: Self-attention is O(n²) but computes all pairwise token relationships at once — long-range context is captured as efficiently as short-range. This is the Transformer's core advantage over RNNs.
Tokenization & Embeddings

Why Token Count Matters: The Hidden Cost of Numbers and Code

You're building an application that processes financial reports. Your model has a 128K context window — plenty, you think. But you're hitting context limits much faster than expected. Why?
  • Text tokenization is not uniform. Common English words like "the" or "is" are single tokens. But numbers tokenize very inefficiently — "1,234,567.89" may tokenize to 8–10 separate tokens.
  • Financial reports contain dense tables of numbers, dollar amounts, percentages, and dates — all of which expand token counts dramatically compared to plain prose.
  • Code is similar: special characters, variable names with underscores, and syntax tokens often require more tokens per character than English text.
  • A 100-page financial report that appears to be ~50,000 words may actually consume 90,000+ tokens due to numerical content — approaching a 128K limit much faster than expected.
  • Solution options: chunking the document, using a model with longer context, or pre-processing to summarize dense numerical tables before passing to the LLM.
✅ Key point: ≈4 chars/token is an average for English prose. Numbers, code, and rare words tokenize less efficiently. Always estimate token counts — not word counts — when planning LLM applications.
Pre-training & Fine-tuning

Base Model vs. Instruction-Tuned Model: Why You Can't Just Prompt a Base LLM

You download Llama 3 base weights and send it the message: "What is the capital of France?" Instead of answering "Paris," it continues the sentence as if writing a geography quiz. What's happening, and how do you fix it?
  • A base (pre-trained) LLM is trained on next-token prediction — it learns to continue text patterns, not to answer questions or follow instructions.
  • Given "What is the capital of France?" the base model has seen many quiz-like documents in training — it predicts the most likely continuation, which might be "What is the capital of Germany? What is the capital of Italy?" (continuing the list of quiz questions).
  • Supervised Fine-tuning (SFT) on (instruction, response) pairs teaches the model that a question is the beginning of a (Q, A) pair, not a quiz to continue. After SFT, the model learns to respond with "Paris."
  • RLHF further refines the response style — making it concise, helpful, and safe rather than just technically correct.
  • The fix: use the Llama 3 Instruct variant (not the base model) — or fine-tune the base model with SFT on instruction-response data using NVIDIA NeMo.
✅ Key point: Base model = text completer. Instruction-tuned model = question answerer. SFT is what bridges them. For production chat applications, always use an instruction-tuned checkpoint.
Text Generation

Temperature in Practice: Writing a Legal Contract vs. Writing a Poem

You're building a multi-use LLM application. For one feature it drafts legal contract language. For another it writes creative poetry. Should you use the same temperature setting for both?
  • For legal contract drafting: set temperature = 0 (or very close to 0). Legal language must be precise, deterministic, and reproducible. You want the most probable, safest phrasing — not creative variation.
  • At T=0, the model greedily picks the highest-probability token at every step — the output is deterministic. Running the same prompt twice gives the same output.
  • For creative poetry: set temperature = 0.9–1.2. Poetry benefits from surprising word choices, novel metaphors, and linguistic variation. Higher temperature flattens the probability distribution, making lower-probability (more "creative") tokens more likely to be sampled.
  • At T=1.2, running the same prompt twice will almost certainly produce different poems — each drawing from a broader range of the model's vocabulary.
  • In practice: combine temperature with top-p = 0.9 for creative tasks (prevents genuinely nonsensical outputs while allowing creativity). Use temperature = 0 alone for factual/deterministic tasks.
✅ Key point: Temperature controls the creativity/precision dial. T=0 for deterministic factual tasks. T=0.7–1.0 for balanced chat. T=1.0+ for creative generation. Match temperature to the task, not the model.

Practice Quiz

10 NCA-GENL style questions across all four pillars — with instant explanations after each answer



Memory Hooks

8 high-yield LLM mnemonics for the NCA-GENL exam

🎯 What do Q, K, V stand for in self-attention?
Query · Key · Value
Q = "what am I looking for?" K = "what do I contain?" V = "what info do I carry?" Score = QKᵀ/√dₖ → softmax → weighted sum of V.
🏗️ Decoder-only vs Encoder-only — which generates text?
Decoder-only generates. Encoder-only understands.
Decoder-only (GPT, Llama): causal mask, autoregressive generation, chat. Encoder-only (BERT): bidirectional, classification, embeddings — cannot generate freely.
✂️ What is BPE and what does it produce?
Byte-Pair Encoding → subword tokens
Iteratively merges most-frequent character pairs to build a vocabulary of subword units. "unhappy" → ["un","happy"]. Balances vocabulary size with rare-word coverage.
📏 Context window — what does it limit?
Total tokens (input + output) at inference
~4 chars = 1 token (English). 1,000 tokens ≈ 750 words. Exceeding the window → model cannot "see" earlier content. GPT-4 = 128K tokens.
🎓 What is the pre-training objective for GPT-style models?
Causal Language Modeling — next token prediction
Self-supervised: the next token is the label. No human annotation needed. Train on trillions of tokens from text, books, and code to learn language patterns and world knowledge.
🔧 What does LoRA freeze, and what does it train?
Freezes base model weights. Trains small low-rank matrices.
ΔW = A×B where rank r ≪ model dimension. 100–10,000× fewer trainable params than full fine-tuning. QLoRA also quantizes base model to 4-bit.
🌡️ Temperature = 0 → what kind of output?
Deterministic — always picks the highest-probability token
T=0 = greedy decoding. Same prompt → same output every time. Best for factual, legal, or code tasks. T=0.7–0.9 for chat. T>1 for creative tasks.
🟢 NVIDIA TensorRT-LLM — what does it optimize?
LLM inference on NVIDIA GPUs
Quantization (INT8/FP8), kernel fusion, continuous batching, paged KV-cache. Reduces latency and increases throughput vs. raw PyTorch. Underlies NVIDIA NIM microservices.
🟢 NVIDIA NCA-GENL Exam Prep Platform

Ready to Pass the NCA-GENL? Get Everything You Need in One Place.

These concept pages are just the start. FlashGenius gives you a complete NCA-GENL prep toolkit — practice tests, flashcard decks, concept cheat sheets, and scenario quizzes built for the NVIDIA Generative AI LLMs exam.