Domain 2: Data Analysis & Visualization
This domain covers how to mine, engineer, analyze, and visualize data in the context of generative and multimodal AI. While it carries 10% of exam weight (≈5 questions), these concepts bridge all other domains — data quality drives model quality.
What This Domain Covers
- Data mining techniques: clustering, classification, pattern discovery
- Feature engineering: transforming raw data into informative model inputs
- Explainability: Grad-CAM, attention maps, SHAP for multimodal models
- Data visualization: selecting and interpreting the right chart type
- Trend and anomaly detection in AI system outputs
Exam Strategy
- 10% = ≈5 questions — few questions, but they test specific facts
- Know Grad-CAM vs attention maps: which applies to CNNs vs transformers
- Chart selection: heatmaps for matrices (correlation, attention), line charts for trends over time, scatter plots for the relationship between two continuous variables
- Feature engineering: one-hot encoding for categoricals, normalization for continuous
- K-means requires K specified upfront; DBSCAN does not
Domain 2 Subtopics
| Subtopic | Key Concepts | Exam Priority |
|---|---|---|
| 2.1 — Data Mining & Feature Engineering | Clustering, classification, normalization, encoding, PCA | ⭐⭐⭐ |
| 2.2 — Attention Maps & Explainability | Grad-CAM (CNNs), attention weights (transformers), SHAP, LIME | ⭐⭐⭐ |
| 2.3 — Charts & Visualization Tools | Bar, line, scatter, heatmap, histogram, confusion matrix | ⭐⭐ |
| 2.4 — Trend & Anomaly Detection | Moving averages, Z-score, IQR, learning curves, ROC curves | ⭐⭐ |
Data Mining & Feature Engineering
Data mining extracts useful patterns from large datasets. Feature engineering transforms raw data into the structured numerical inputs that machine learning models can learn from effectively.
Clustering (Unsupervised)
- K-means: requires K (number of clusters) specified upfront; assigns each point to nearest centroid; minimizes within-cluster variance; sensitive to outliers and initial centroid placement
- DBSCAN: does NOT require K; finds clusters by density; handles irregular shapes and outliers; marks low-density points as noise
- Hierarchical: builds a tree (dendrogram) of cluster merges; agglomerative (bottom-up) most common
- Use case: customer segmentation, document grouping, anomaly detection
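The K-means/DBSCAN contrast above can be demonstrated on toy data. A minimal sketch, assuming scikit-learn is available; the blob coordinates are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Two dense blobs plus one far-away outlier.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0, 0], scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.3, size=(20, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([blob_a, blob_b, outlier])

# K-means needs K upfront; the outlier is forced into one of the
# K clusters and can drag that centroid toward it.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN needs no K; points with too few neighbors within eps
# are labeled -1 (noise).
db_labels = DBSCAN(eps=1.0, min_samples=4).fit_predict(X)

print("K-means label of outlier:", km_labels[-1])
print("DBSCAN label of outlier:", db_labels[-1])  # -1 → noise
```

The outlier has no neighbors within `eps`, so DBSCAN flags it as noise, while K-means must assign it to one of the two requested clusters.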
Classification & Pattern Discovery
- Association rules: discover co-occurrence patterns (market basket analysis); support, confidence, lift metrics
- Decision trees: interpretable; split on features that maximize information gain (entropy reduction)
- Random forests: ensemble of trees; reduces variance via bagging
- Gradient boosting (XGBoost): sequential trees correct prior errors; often best on tabular data
- Evaluation: accuracy, precision, recall, F1, AUC-ROC
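The evaluation metrics listed above reduce to simple ratios over confusion-matrix counts. A minimal worked example with made-up labels:

```python
# Toy binary predictions vs. ground truth.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 4
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # 4

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of predicted positives, how many are real
recall    = tp / (tp + fn)  # of real positives, how many were found
f1        = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.8 0.8 0.8 0.8
```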
Numerical Feature Transforms
- Min-max normalization: scale to [0,1] range: (x − min)/(max − min)
- Z-score standardization: scale to mean=0, std=1: (x − μ)/σ; better when data follows a normal distribution
- Log transform: compress right-skewed distributions (e.g. income, word frequency)
- Binning: convert continuous to categorical ranges (age → 0–18, 19–35, 36+)
- Polynomial features: add interaction terms (x₁×x₂) to capture non-linear relationships
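The numerical transforms above can be sketched in a few lines of NumPy (toy values chosen to make the effects visible):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])  # right-skewed toy feature

# Min-max normalization → values land in [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardization → mean 0, std 1
z = (x - x.mean()) / x.std()

# Log transform compresses the long right tail
# (use np.log1p instead when the data contains zeros)
logged = np.log(x)

print(minmax.min(), minmax.max())  # 0.0 1.0
print(logged)  # equal steps: the geometric series became arithmetic
```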
Categorical Encoding
- One-hot encoding: create binary column for each category value; avoids ordinal assumption; high cardinality → many columns
- Label encoding: assign integer to each category; only for ordinal variables (Low=1, Medium=2, High=3)
- Target encoding: replace category with mean target value; risk of leakage without proper validation
- Embedding layers: learned dense vector for each category (NLP tokens, entity IDs)
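A minimal encoding sketch, assuming pandas is available; the category values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"size": ["Low", "High", "Medium", "Low"],
                   "color": ["red", "green", "red", "blue"]})

# One-hot: one binary column per category value; no ordering implied.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Label encoding: only safe for ordinal variables like size.
order = {"Low": 1, "Medium": 2, "High": 3}
df["size_encoded"] = df["size"].map(order)

print(one_hot.columns.tolist())     # ['color_blue', 'color_green', 'color_red']
print(df["size_encoded"].tolist())  # [1, 3, 2, 1]
```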
Dimensionality Reduction
- PCA (Principal Component Analysis): find orthogonal axes of maximum variance; project data to fewer dimensions; preserves global structure; output: principal components ranked by explained variance
- t-SNE: non-linear; preserves local neighborhood structure; great for visualizing high-dim embeddings in 2D/3D; not deterministic
- UMAP: faster than t-SNE; better at preserving global structure; used for embedding visualization
- Feature selection: filter (correlation), wrapper (RFE), embedded (Lasso L1 penalty)
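PCA can be sketched directly as SVD of the centered data matrix (NumPy only; the synthetic data is deliberately stretched along one direction so the first component dominates):

```python
import numpy as np

rng = np.random.default_rng(1)
# 200 points stretched along the direction (3, 1), plus small isotropic noise.
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 1.0]]) \
    + rng.normal(scale=0.1, size=(200, 2))

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S**2 / (len(Xc) - 1)
ratio = explained_variance / explained_variance.sum()

# Project onto the first principal component (2D → 1D).
projected = Xc @ Vt[0]

print(ratio)  # first component carries almost all the variance
```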
Missing Value Handling
- Mean/median imputation: simple; distorts the distribution; often unnecessary for gradient-boosted trees (e.g. XGBoost and LightGBM handle missing values natively)
- Mode imputation: for categorical variables
- KNN imputation: fill missing values using similar rows; better than mean for non-normal data
- Multiple imputation: generate multiple complete datasets and pool results; gold standard
- Indicator column: add binary "was_missing" column to let model learn from missingness pattern
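A short imputation sketch with pandas, combining median fill with a "was_missing" indicator on a toy series:

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 4.0, np.nan, 8.0, np.nan])

# Build the indicator first, so the model can still
# learn from the missingness pattern after filling.
was_missing = s.isna().astype(int)

# Median imputation: robust to outliers, but flattens the distribution.
filled = s.fillna(s.median())

print(was_missing.tolist())  # [0, 0, 1, 0, 1]
print(filled.tolist())       # [2.0, 4.0, 4.0, 8.0, 4.0]
```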
Attention Maps & Explainability
Understanding what a model "looks at" when making predictions is critical for debugging, trust, and regulatory compliance. Different architectures require different explainability techniques.
Grad-CAM (Gradient-weighted Class Activation Mapping)
- Target architecture: CNNs — uses the final convolutional layer feature maps
- Mechanism: compute gradients of the target class score with respect to the last conv layer feature maps → pool gradients to get per-channel importance weights → weighted sum → ReLU → resize to input size
- Output: heatmap overlaid on input image showing which regions were most influential
- Use case: diagnose model failures, verify that model attends to correct image regions, build trust
- Limitation: coarse resolution (tied to final conv feature map size); doesn't apply directly to transformers (attention-based methods are used there instead)
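The mechanism above can be sketched with NumPy stand-ins. Here `A` and `G` are random placeholders for the activations and gradients a real CNN framework would supply; only the combination step is shown:

```python
import numpy as np

# Stand-ins for what a framework would provide for one image and one class:
# A: final conv layer activations, G: gradients of the class score w.r.t. A.
C, H, W = 4, 7, 7
rng = np.random.default_rng(0)
A = rng.random((C, H, W))
G = rng.normal(size=(C, H, W))

# 1) Global-average-pool the gradients → one importance weight per channel.
weights = G.mean(axis=(1, 2))  # shape (C,)

# 2) Weighted sum of feature maps, then ReLU to keep positive evidence only.
cam = np.maximum(np.tensordot(weights, A, axes=1), 0.0)  # shape (H, W)

# 3) Normalize to [0, 1]; in practice this coarse map is then
#    resized to the input image size and overlaid as a heatmap.
if cam.max() > 0:
    cam = cam / cam.max()

print(cam.shape)  # (7, 7)
```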
Attention Map Visualization
- Target architecture: transformers (not CNNs)
- Self-attention: shows which tokens in a sequence attend to which other tokens; can reveal long-range dependencies (e.g. pronoun resolves to distant noun)
- Cross-attention (multimodal): shows which image regions a text token attends to — directly interprets text-image alignment in VLMs
- Multi-head: each attention head captures different relationships; aggregate or select specific heads
- Rollout: propagate attention across layers to get end-to-end attention from input tokens to output
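A rough attention-rollout sketch in NumPy. The attention matrices here are random stand-ins already averaged over heads, and the 0.5/0.5 residual mixing is a common simplifying assumption rather than a fixed rule:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_tokens = 3, 5

# Per-layer attention matrices (head-averaged); each row sums to 1.
raw = rng.random((n_layers, n_tokens, n_tokens))
attn = raw / raw.sum(axis=-1, keepdims=True)

# Rollout: mix in the residual connection (identity), then multiply
# layer matrices to follow attention end-to-end through the stack.
rollout = np.eye(n_tokens)
for layer in attn:
    a = 0.5 * layer + 0.5 * np.eye(n_tokens)  # residual mixing assumption
    rollout = a @ rollout

print(rollout.shape)         # (5, 5)
print(rollout.sum(axis=-1))  # rows still sum to ~1 (product of stochastic matrices)
```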
SHAP (SHapley Additive exPlanations)
- Based on game theory cooperative Shapley values
- Measures each feature's marginal contribution to a specific prediction
- Consistent and locally accurate — satisfies mathematical fairness axioms
- Works for any model (tree, neural net, linear)
- SHAP waterfall plot: shows feature impact for single prediction; SHAP summary: shows feature importance across dataset
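For a tiny two-feature "model", exact Shapley values can be computed by averaging each feature's marginal contribution over all join orders (the payoff numbers are hypothetical):

```python
from itertools import permutations

# Toy coalition payoff: individual effects plus an interaction bonus
# when f1 and f2 appear together. Numbers are invented for illustration.
def coalition_value(coalition):
    v = 0.0
    if "f1" in coalition: v += 10.0
    if "f2" in coalition: v += 5.0
    if "f1" in coalition and "f2" in coalition: v += 4.0
    return v

features = ["f1", "f2"]

# Shapley value: average marginal contribution over all join orders.
shapley = {f: 0.0 for f in features}
orders = list(permutations(features))
for order in orders:
    seen = set()
    for f in order:
        before = coalition_value(seen)
        seen.add(f)
        shapley[f] += (coalition_value(seen) - before) / len(orders)

print(shapley)  # {'f1': 12.0, 'f2': 7.0} — the interaction is split evenly
```

Note that the two values sum to the full coalition's payoff (19.0); this additivity is what "locally accurate" refers to.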
LIME (Local Interpretable Model-agnostic Explanations)
- Perturbs the input (e.g. masks image patches or words), observes prediction changes
- Fits a simple interpretable model (linear) locally around the instance
- Works for images (superpixels), text (word removal), tabular data
- Faster than SHAP for some models; less theoretically grounded
- Result: which features most influenced this specific prediction
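The perturb-and-fit idea can be sketched for tabular data with a proximity-weighted local linear fit (NumPy only; the black-box model is an invented stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)

# Black-box model: nonlinear in x0, mild linear effect from x1.
def model(X):
    return np.sin(X[:, 0]) + 0.1 * X[:, 1]

instance = np.array([0.0, 1.0])

# 1) Perturb around the instance, 2) query the black box,
# 3) fit a linear surrogate, weighting samples by proximity.
perturbed = instance + rng.normal(scale=0.1, size=(500, 2))
preds = model(perturbed)
weights = np.exp(-np.sum((perturbed - instance) ** 2, axis=1) / 0.02)

Xd = np.hstack([perturbed - instance, np.ones((500, 1))])  # centered + bias
sw = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(Xd * sw[:, None], preds * sw, rcond=None)

print(coef[:2])  # local slopes ≈ [1.0, 0.1] (cos(0) = 1 near x0 = 0)
```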
Explainability in Multimodal Models
- Cross-attention maps reveal which image patches influence each text token
- Grad-CAM can be adapted to VLMs by targeting the visual encoder's final layer
- Probing classifiers: train lightweight classifiers on intermediate representations to understand what each layer encodes
- Concept Activation Vectors (CAVs): test whether human-defined concepts are encoded in model representations
Explainability Technique Comparison
| Technique | Architecture | Output | Granularity |
|---|---|---|---|
| Grad-CAM | CNNs only | Spatial heatmap on image | Coarse (conv map resolution) |
| Attention maps | Transformers | Token-to-token attention weights | Fine (per token) |
| SHAP | Any model | Feature importance per prediction | Feature-level |
| LIME | Any model | Local linear approximation | Superpixel / word / feature |
Data Visualization & Trend Analysis
Choosing the right chart type and correctly interpreting trends, anomalies, and model performance curves are core data analysis skills tested on the NCA-GENM exam.
When to Use Each Chart
- Bar chart: compare values across discrete categories (accuracy per model, revenue per quarter)
- Line chart: show trends over continuous time or ordered sequence (training loss over epochs, stock price)
- Scatter plot: show relationship/correlation between two continuous variables; add color for third variable
- Heatmap: visualize a matrix of values — correlation matrices, confusion matrices, attention weights
- Histogram: show frequency distribution of a single continuous variable (pixel intensity, embedding magnitude)
- Box plot: show distribution summary (median, IQR, min/max, outliers) for one or more groups
Model Performance Charts
- Learning curve: plot training and validation loss/accuracy over epochs; diagnose overfitting (training keeps improving while validation degrades) or underfitting (both plateau at poor values)
- ROC curve: plot True Positive Rate vs False Positive Rate across thresholds; AUC = area under curve (1.0 = perfect)
- Precision-Recall curve: better than ROC for class-imbalanced datasets; shows tradeoff between precision and recall
- Confusion matrix: heatmap of TP/TN/FP/FN per class; reveals which classes are confused
- Feature importance plot: horizontal bar chart ranked by SHAP or impurity-based importance
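AUC also equals the probability that a randomly chosen positive is scored above a randomly chosen negative; a few lines of plain Python verify this rank interpretation on toy scores:

```python
# AUC as a rank statistic: fraction of (positive, negative) pairs
# where the positive is scored higher (ties count half).
def auc(y_true, scores):
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(auc(y_true, scores))  # 8/9: one positive is out-ranked by one negative
print(auc(y_true, y_true))  # perfect ranking → 1.0
```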
Statistical Trend Detection
- Moving average: smooth time series noise; simple (SMA), exponential (EMA weights recent more)
- Seasonal decomposition: separate trend, seasonality, and residual components from a time series
- Autocorrelation: detect repeating patterns by measuring correlation with lagged versions of itself
- Linear regression trend: fit line to detect upward/downward trends in scatter data
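Both moving averages can be sketched in a few lines (a toy series with a single spike):

```python
import numpy as np

series = np.array([1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0])  # spike at index 3

# Simple moving average (window 3): equal weight to the last 3 points.
window = 3
sma = np.convolve(series, np.ones(window) / window, mode="valid")

# Exponential moving average: recent points weighted more (alpha = 0.5).
alpha = 0.5
ema = [float(series[0])]
for x in series[1:]:
    ema.append(float(alpha * x + (1 - alpha) * ema[-1]))

print(sma)  # the spike is spread across neighboring windows
print([round(v, 3) for v in ema])
```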
Anomaly Detection Methods
- Z-score: flag points where |z| > 3; z = (x − μ)/σ; assumes normal distribution
- IQR method: outlier if x < Q1 − 1.5×IQR or x > Q3 + 1.5×IQR; distribution-free
- Isolation Forest: anomalies are easier to isolate in random feature splits → shorter path lengths
- Autoencoder: high reconstruction error = anomaly; especially useful for image/sequence anomalies
- DBSCAN noise points: low-density points classified as anomalies automatically
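A small sketch of the two statistical rules on toy data. Note that the single extreme value inflates σ enough here that the |z| > 3 rule misses it (a known weakness called masking), while the distribution-free IQR rule still fires:

```python
import numpy as np

data = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 50.0])  # one clear outlier

# Z-score rule: |z| > 3 under an assumed normal distribution.
# The 50.0 drags σ up to ~14, so its own z-score is only ~2.4.
z = (data - data.mean()) / data.std()
z_outliers = np.abs(z) > 3

# IQR rule: fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR, no distribution assumed.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)

print(np.where(iqr_outliers)[0])  # [6] — only the 50.0 is flagged
```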
Python Visualization Tools
- Matplotlib: low-level; full control; plots, histograms, scatter, any chart type
- Seaborn: statistical visualization on top of Matplotlib; correlation heatmaps, distribution plots, pairplots
- Plotly: interactive web-based charts; hover tooltips, zoom, drill-down
- TensorBoard: visualize training metrics (loss, accuracy), histograms, embeddings, model graphs during training
- Weights & Biases (WandB): experiment tracking, hyperparameter sweep visualization, model comparison dashboards
Memory Hooks
Anchor these key concepts before exam day.
Data Mining Techniques
- K-means: specify K upfront; minimizes within-cluster variance; sensitive to outliers and init
- DBSCAN: no K needed; density-based; marks low-density points as noise; handles arbitrary cluster shapes
- Hierarchical clustering: builds dendrogram of cluster merges; agglomerative (bottom-up) is most common
- Association rules: support (how often), confidence (precision), lift (co-occurrence above random)
- Decision trees: split on feature that maximizes information gain (entropy reduction)
- Random forests: ensemble of decision trees via bagging; reduces variance over a single tree
- XGBoost: sequential boosting; each tree corrects residuals of previous; often strongest on tabular data