FlashGenius
NCA-GENM Exam Prep · Domain 2

Data Analysis & Visualization

Data Mining · Feature Engineering · Attention Maps · Charts & Trends

10% of the NCA-GENM Exam (≈ 5 questions)


Domain 2: Data Analysis & Visualization

This domain covers how to mine, engineer, analyze, and visualize data in the context of generative and multimodal AI. While it carries 10% of exam weight (≈5 questions), these concepts bridge all other domains — data quality drives model quality.

What This Domain Covers

  • Data mining techniques: clustering, classification, pattern discovery
  • Feature engineering: transforming raw data into informative model inputs
  • Explainability: Grad-CAM, attention maps, SHAP for multimodal models
  • Data visualization: selecting and interpreting the right chart type
  • Trend and anomaly detection in AI system outputs

Exam Strategy

  • 10% = ≈5 questions — focused but specific
  • Know Grad-CAM vs attention maps: which applies to CNNs vs transformers
  • Chart selection: heatmaps for correlation matrices and attention weights, line charts for trends over time, scatter plots for relationships between two continuous variables
  • Feature engineering: one-hot encoding for categoricals, normalization for continuous
  • K-means requires K specified upfront; DBSCAN does not

Domain 2 Subtopics

  • 2.1 — Data Mining & Feature Engineering: clustering, classification, normalization, encoding, PCA (exam priority ⭐⭐⭐)
  • 2.2 — Attention Maps & Explainability: Grad-CAM (CNNs), attention weights (transformers), SHAP, LIME (exam priority ⭐⭐⭐)
  • 2.3 — Charts & Visualization Tools: bar, line, scatter, heatmap, histogram, confusion matrix (exam priority ⭐⭐)
  • 2.4 — Trend & Anomaly Detection: moving averages, Z-score, IQR, learning curves, ROC curves (exam priority ⭐⭐)

Data Mining & Feature Engineering

Data mining extracts useful patterns from large datasets. Feature engineering transforms raw data into the structured numerical inputs that machine learning models can learn from effectively.

Data Mining Techniques

Clustering (Unsupervised)

  • K-means: requires K (number of clusters) specified upfront; assigns each point to nearest centroid; minimizes within-cluster variance; sensitive to outliers and initial centroid placement
  • DBSCAN: does NOT require K; finds clusters by density; handles irregular shapes and outliers; marks low-density points as noise
  • Hierarchical: builds a tree (dendrogram) of cluster merges; agglomerative (bottom-up) most common
  • Use case: customer segmentation, document grouping, anomaly detection
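
To make the assign-and-update loop concrete, here is a minimal NumPy sketch of K-means (illustrative only; real work would use a library implementation such as scikit-learn's KMeans, and a smarter initialization like k-means++):

```python
import numpy as np

def kmeans(X, k, n_iters=100):
    """Minimal K-means. K must be chosen upfront; each point is assigned to
    its nearest centroid, and centroids move to the mean of their cluster."""
    centroids = X[:k].copy()  # naive init; k-means++ is used in practice
    for _ in range(n_iters):
        # Assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs; with K=2 the algorithm recovers them
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(X, k=2)
```

Note how K is a required input; DBSCAN instead takes a neighborhood radius and a minimum point count, and infers the number of clusters from density.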

Classification & Pattern Discovery

  • Association rules: discover co-occurrence patterns (market basket analysis); support, confidence, lift metrics
  • Decision trees: interpretable; split on features that maximize information gain (entropy reduction)
  • Random forests: ensemble of trees; reduces variance via bagging
  • Gradient boosting (XGBoost): sequential trees correct prior errors; often best on tabular data
  • Evaluation: accuracy, precision, recall, F1, AUC-ROC
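
The "maximize information gain" criterion for decision-tree splits can be computed directly. A small NumPy sketch (toy labels chosen for illustration):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(y, mask):
    """Entropy reduction from splitting labels y by a boolean mask."""
    n = len(y)
    n_left = mask.sum()
    weighted_child = (n_left / n) * entropy(y[mask]) + ((n - n_left) / n) * entropy(y[~mask])
    return entropy(y) - weighted_child

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
perfect = np.array([True] * 4 + [False] * 4)  # separates classes exactly: gain = 1 bit
useless = np.array([True, False] * 4)         # 50/50 in each child: gain = 0
```

A tree builder evaluates this gain for every candidate split and takes the feature/threshold with the highest value.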
Feature Engineering

Numerical Feature Transforms

  • Min-max normalization: scale to [0,1] range: (x − min)/(max − min)
  • Z-score standardization: scale to mean=0, std=1: (x − μ)/σ; better when data follows a normal distribution
  • Log transform: compress right-skewed distributions (e.g. income, word frequency)
  • Binning: convert continuous to categorical ranges (age → 0–18, 19–35, 36+)
  • Polynomial features: add interaction terms (x₁×x₂) to capture non-linear relationships
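
The first three transforms above are one-liners in NumPy; a quick sketch with made-up values:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-max normalization: maps the observed range onto [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: mean 0, standard deviation 1
x_z = (x - x.mean()) / x.std()

# Log transform compresses right-skewed data; log1p handles zeros safely
skewed = np.array([1.0, 10.0, 100.0, 1000.0])
x_log = np.log1p(skewed)
```

In a real pipeline, fit the min/max or mean/std on the training split only and reuse those statistics at inference time, or the transform leaks test-set information.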

Categorical Encoding

  • One-hot encoding: create binary column for each category value; avoids ordinal assumption; high cardinality → many columns
  • Label encoding: assign integer to each category; only for ordinal variables (Low=1, Medium=2, High=3)
  • Target encoding: replace category with mean target value; risk of leakage without proper validation
  • Embedding layers: learned dense vector for each category (NLP tokens, entity IDs)
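
One-hot encoding is simple enough to sketch in plain Python (libraries like pandas or scikit-learn provide production versions):

```python
def one_hot(values):
    """One binary column per distinct category, columns in sorted order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one 1 per row: no ordinal assumption
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["Red", "Blue", "Green", "Red"])
```

Each row has exactly one 1, so no ordering among categories is implied; the cost is one column per category value, which is why high-cardinality features usually get embeddings instead.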

Dimensionality Reduction

  • PCA (Principal Component Analysis): find orthogonal axes of maximum variance; project data to fewer dimensions; preserves global structure; output: principal components ranked by explained variance
  • t-SNE: non-linear; preserves local neighborhood structure; great for visualizing high-dim embeddings in 2D/3D; not deterministic
  • UMAP: faster than t-SNE; better at preserving global structure; used for embedding visualization
  • Feature selection: filter (correlation), wrapper (RFE), embedded (Lasso L1 penalty)
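
PCA reduces to an SVD of the centered data matrix. A minimal NumPy sketch on synthetic nearly-collinear data (illustrative; scikit-learn's PCA adds whitening, solvers, and explained-variance ratios):

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD: center the data, then project onto the top right-singular
    vectors (the principal components, ordered by explained variance)."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (len(X) - 1)  # variance along each component
    return X_centered @ Vt[:n_components].T, explained_variance

rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t + 0.01 * rng.normal(size=(200, 1))])  # nearly collinear
projected, var = pca(X, n_components=1)  # one component captures ~all variance
```

Because the two columns are almost perfectly correlated, the first principal component carries essentially all the variance, which is exactly the collinearity-removal use case.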

Missing Value Handling

  • Mean/median imputation: simple; distorts the distribution; often unnecessary with gradient-boosted tree libraries (e.g. XGBoost) that handle missing values natively
  • Mode imputation: for categorical variables
  • KNN imputation: fill missing values using similar rows; better than mean for non-normal data
  • Multiple imputation: generate multiple complete datasets and pool results; gold standard
  • Indicator column: add binary "was_missing" column to let model learn from missingness pattern
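
Median imputation plus an indicator column, sketched in NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0, np.nan])

# Indicator column: lets the model learn from the missingness pattern itself
was_missing = np.isnan(x).astype(int)

# Median imputation: fill NaNs with the median of the observed values
median = np.nanmedian(x)
x_imputed = np.where(np.isnan(x), median, x)
```

As with normalization, compute the median on the training split only and reuse it at inference time.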

Attention Maps & Explainability

Understanding what a model "looks at" when making predictions is critical for debugging, trust, and regulatory compliance. Different architectures require different explainability techniques.

Gradient-Based Explainability

Grad-CAM (Gradient-weighted Class Activation Mapping)

  • Target architecture: CNNs — uses the final convolutional layer feature maps
  • Mechanism: compute gradients of the target class score with respect to the last conv layer feature maps → pool gradients to get per-channel importance weights → weighted sum → ReLU → resize to input size
  • Output: heatmap overlaid on input image showing which regions were most influential
  • Use case: diagnose model failures, verify that model attends to correct image regions, build trust
  • Limitation: coarse resolution (tied to final conv feature map size); doesn't apply to transformers
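
The mechanism bullet above can be sketched as NumPy array operations. This shows only the weighting step, with synthetic stand-ins for the activations and gradients that a framework's autograd would actually supply:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM weighting step on precomputed tensors.
    feature_maps: (C, H, W) activations of the final conv layer.
    gradients:    (C, H, W) d(class score)/d(feature_maps).
    In practice both come from a framework's autograd; here they are inputs."""
    # Global-average-pool the gradients: one importance weight per channel
    weights = gradients.mean(axis=(1, 2))                            # (C,)
    # Weighted sum over channels, then ReLU keeps positive evidence only
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0)
    # Normalize to [0, 1] for overlay (resizing to input size is omitted here)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

rng = np.random.default_rng(0)
fmaps = rng.random((4, 7, 7))   # synthetic final-conv activations
grads = rng.random((4, 7, 7))   # synthetic class-score gradients
heatmap = grad_cam(fmaps, grads)
```

The coarse-resolution limitation is visible here: the heatmap is only 7×7 (the conv feature map size) and must be upsampled to overlay on the input image.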

Attention Map Visualization

  • Target architecture: transformers (not CNNs)
  • Self-attention: shows which tokens in a sequence attend to which other tokens; can reveal long-range dependencies (e.g. pronoun resolves to distant noun)
  • Cross-attention (multimodal): shows which image regions a text token attends to — directly interprets text-image alignment in VLMs
  • Multi-head: each attention head captures different relationships; aggregate or select specific heads
  • Rollout: propagate attention across layers to get end-to-end attention from input tokens to output
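
The attention weights being visualized are just softmax(QK^T/√d_k). A NumPy sketch for a single head (the resulting matrix is what an attention heatmap plots):

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(QK^T / sqrt(d_k)).
    Row i shows how strongly token i attends to every other token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))   # 5 tokens, head dimension 8
K = rng.normal(size=(5, 8))
A = attention_weights(Q, K)   # (5, 5); each row sums to 1
```

Because each row is a probability distribution over the other tokens, rows sum to 1, which is why attention heatmaps are read row by row.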
Model-Agnostic Explainability

SHAP (SHapley Additive exPlanations)

  • Based on Shapley values from cooperative game theory
  • Measures each feature's marginal contribution to a specific prediction
  • Consistent and locally accurate — satisfies mathematical fairness axioms
  • Works for any model (tree, neural net, linear)
  • SHAP waterfall plot: shows feature impact for single prediction; SHAP summary: shows feature importance across dataset
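
"Marginal contribution" has an exact definition that can be computed by brute force on toy problems (the real SHAP library approximates this efficiently; the helper below is illustrative, not its API):

```python
import itertools
import math

def shapley_values(predict, baseline, instance):
    """Exact Shapley values: each feature's marginal contribution averaged
    over all feature subsets. Absent features take their baseline value.
    Exponential in the number of features, so toy sizes only."""
    n = len(instance)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in itertools.combinations(others, size):
                # Standard Shapley weight for a subset of this size
                w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                with_i = [instance[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [instance[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi

# For a linear model, phi_i = w_i * (x_i - baseline_i), so results are checkable
predict = lambda x: 2 * x[0] + 3 * x[1] + 1
phi = shapley_values(predict, baseline=[0, 0], instance=[1, 1])
```

The values also satisfy the efficiency axiom: they sum to the difference between the prediction for the instance and the prediction at the baseline.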

LIME (Local Interpretable Model-agnostic Explanations)

  • Perturbs the input (e.g. masks image patches or words), observes prediction changes
  • Fits a simple interpretable model (linear) locally around the instance
  • Works for images (superpixels), text (word removal), tabular data
  • Faster than SHAP for some models; less theoretically grounded
  • Result: which features most influenced this specific prediction

Explainability in Multimodal Models

  • Cross-attention maps reveal which image patches influence each text token
  • Grad-CAM can be adapted to VLMs by targeting the visual encoder's final layer
  • Probing classifiers: train lightweight classifiers on intermediate representations to understand what each layer encodes
  • Concept Activation Vectors (CAVs): test whether human-defined concepts are encoded in model representations

Explainability Technique Comparison

  • Grad-CAM: CNNs only; output: spatial heatmap on the image; granularity: coarse (conv feature map resolution)
  • Attention maps: transformers; output: token-to-token attention weights; granularity: fine (per token)
  • SHAP: any model; output: feature importance per prediction; granularity: feature-level
  • LIME: any model; output: local linear approximation; granularity: superpixel / word / feature

Data Visualization & Trend Analysis

Choosing the right chart type and correctly interpreting trends, anomalies, and model performance curves are core data analysis skills tested on the NCA-GENM exam.

Chart Type Selection

When to Use Each Chart

  • Bar chart: compare values across discrete categories (accuracy per model, revenue per quarter)
  • Line chart: show trends over continuous time or ordered sequence (training loss over epochs, stock price)
  • Scatter plot: show relationship/correlation between two continuous variables; add color for third variable
  • Heatmap: visualize a matrix of values — correlation matrices, confusion matrices, attention weights
  • Histogram: show frequency distribution of a single continuous variable (pixel intensity, embedding magnitude)
  • Box plot: show distribution summary (median, IQR, min/max, outliers) for one or more groups

Model Performance Charts

  • Learning curve: plot training and validation loss/accuracy over epochs; diagnose overfitting (train↑ val↓) or underfitting (both low)
  • ROC curve: plot True Positive Rate vs False Positive Rate across thresholds; AUC = area under curve (1.0 = perfect)
  • Precision-Recall curve: better than ROC for class-imbalanced datasets; shows tradeoff between precision and recall
  • Confusion matrix: heatmap of predicted vs. actual class counts; reveals which classes the model confuses
  • Feature importance plot: horizontal bar chart ranked by SHAP or impurity-based importance
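
AUC has a convenient rank interpretation that is easy to compute directly: the probability that a randomly chosen positive is scored above a randomly chosen negative. A NumPy sketch with made-up scores:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a
    random positive example is scored above a random negative example."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score against every negative score; ties count 0.5
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y = [0, 0, 1, 1]
perfect = [0.1, 0.2, 0.8, 0.9]    # every positive outranks every negative
random_ish = [0.8, 0.1, 0.9, 0.2]  # some positives outranked by negatives
```

This is the same number the area under the plotted ROC curve gives; 0.5 corresponds to random ranking, 1.0 to a perfect one.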
Trend & Anomaly Detection

Statistical Trend Detection

  • Moving average: smooth time series noise; simple (SMA), exponential (EMA weights recent more)
  • Seasonal decomposition: separate trend, seasonality, and residual components from a time series
  • Autocorrelation: detect repeating patterns by measuring a series's correlation with lagged copies of itself
  • Linear regression trend: fit line to detect upward/downward trends in scatter data
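
SMA and EMA are both a few lines of NumPy; a sketch on a toy series:

```python
import numpy as np

def sma(x, window):
    """Simple moving average: unweighted mean over a sliding window."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def ema(x, alpha):
    """Exponential moving average: recent points weighted more heavily.
    Higher alpha reacts faster to changes but smooths less."""
    out = np.empty(len(x))
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed_sma = sma(series, 3)       # [2.0, 3.0, 4.0]
smoothed_ema = ema(series, 0.5)     # tracks the rising trend with a lag
```

Note that the SMA output is shorter than the input (the window must fit entirely inside the series), while the EMA produces one value per point.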

Anomaly Detection Methods

  • Z-score: flag points where |z| > 3; assumes normal distribution; z = (x − μ)/σ
  • IQR method: outlier if x < Q1 − 1.5×IQR or x > Q3 + 1.5×IQR; distribution-free
  • Isolation Forest: anomalies are easier to isolate in random feature splits → shorter path lengths
  • Autoencoder: high reconstruction error = anomaly; especially useful for image/sequence anomalies
  • DBSCAN noise points: low-density points classified as anomalies automatically
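
The Z-score and IQR rules side by side, on a synthetic series with one planted anomaly:

```python
import numpy as np

baseline = np.tile([10.0, 11.0, 12.0, 13.0], 10)  # 40 normal readings
x = np.append(baseline, 95.0)                      # one clear anomaly

# Z-score rule: flag |z| > 3 (assumes roughly normal data)
z = (x - x.mean()) / x.std()
z_outliers = np.abs(z) > 3

# IQR rule: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; distribution-free
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_outliers = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
```

Both rules flag only the planted point here. In small samples, though, an extreme value inflates the mean and standard deviation enough that the Z-score rule can miss it, which is one reason the quartile-based IQR rule is considered more robust.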

Python Visualization Tools

  • Matplotlib: low-level; full control; plots, histograms, scatter, any chart type
  • Seaborn: statistical visualization on top of Matplotlib; correlation heatmaps, distribution plots, pairplots
  • Plotly: interactive web-based charts; hover tooltips, zoom, drill-down
  • TensorBoard: visualize training metrics (loss, accuracy), histograms, embeddings, model graphs during training
  • Weights & Biases (WandB): experiment tracking, hyperparameter sweep visualization, model comparison dashboards


Memory Hooks

Anchor key concepts with these mnemonics before exam day.

🔥
Grad-CAM
"Grad-CAM: Gradients Spotlight the Region"
Grad-CAM backpropagates the class score gradient to the final conv layer and creates a spotlight heatmap. CNN only — transformers use attention maps instead.
🔭
Attention Maps vs Grad-CAM
"Attention for Transformers, Grad for CNNs"
Attention maps show token-to-token relationships (transformers). Grad-CAM shows spatial importance via gradients (CNNs). They complement each other but each applies to only one architecture type.
🎯
K-means
"K-means: You Must Know K"
K-means requires you to specify K clusters before running. DBSCAN does not. If the question asks which clustering algorithm doesn't need K upfront — the answer is DBSCAN.
🌡️
Heatmap Use Cases
"Heat shows HOW things relate"
Heatmaps reveal correlation — between features (correlation matrix), between classes (confusion matrix), between tokens/pixels (attention weights). When you need to show a matrix of relationships, reach for a heatmap.
📉
Learning Curves
"Train↑ Val↓ = Overfit; Both Low = Underfit"
A learning curve diagnostic: if training accuracy is high but validation is low → overfitting (model memorized training data). If both are low → underfitting (model too simple). The gap between curves reveals generalization.
🔢
One-Hot Encoding
"One-Hot: One Binary Column Per Category"
Color: {Red, Blue, Green} → [1,0,0], [0,1,0], [0,0,1]. Avoids false ordinal assumptions that label encoding introduces. Use for nominal (unordered) categoricals.

Flashcards & Advisor


Grad-CAM
What is it, how does it work, and for which architecture?
Gradient-weighted Class Activation Mapping. For CNNs. Backpropagates class score gradients to final conv layer → pool channels → ReLU → resize to input. Produces a spatial heatmap showing influential image regions.
Attention Maps
Which architecture, and what do they show?
For transformers (not CNNs). Self-attention shows token-to-token relationships. Cross-attention (multimodal) shows which image regions a text token attends to — directly interprets text-image alignment.
K-means vs DBSCAN
Key difference in how they cluster?
K-means: requires K (number of clusters) specified upfront; assigns each point to nearest centroid; sensitive to outliers. DBSCAN: no K needed; density-based; handles irregular shapes; marks outliers as noise points.
One-Hot Encoding
When to use it and how does it work?
For nominal (unordered) categorical variables. Creates one binary column per category value. Avoids ordinal assumptions of label encoding. High cardinality → many columns (use embedding layers instead).
PCA
What does it do and when should you use it?
Principal Component Analysis. Finds orthogonal axes (principal components) of maximum variance and projects data to fewer dimensions. Use to: reduce dimensionality, remove collinear features, speed up training. Output ranked by explained variance.
SHAP Values
What do they measure and why are they preferred?
Shapley Additive exPlanations — measures each feature's marginal contribution to a specific prediction. Model-agnostic. Theoretically grounded (satisfies consistency + local accuracy). SHAP waterfall: per-instance; SHAP summary: global importance.
Heatmap vs Histogram
When do you choose each?
Heatmap: visualize a 2D matrix — correlation between features, confusion matrix, attention weights. Histogram: distribution of a single continuous variable (frequency vs value). Never swap them — wrong chart = wrong insight.
Overfitting vs Underfitting
How to diagnose from a learning curve?
Overfitting: high training accuracy, low validation accuracy — large gap between curves. Fix: more data, dropout, L2 regularization, early stopping. Underfitting: both training and validation accuracy low. Fix: larger model, more features, less regularization.

Study Advisor

Data Mining Techniques

  • K-means: specify K upfront; minimizes within-cluster variance; sensitive to outliers and init
  • DBSCAN: no K needed; density-based; marks low-density points as noise; handles arbitrary cluster shapes
  • Hierarchical clustering: builds dendrogram of cluster merges; agglomerative (bottom-up) is most common
  • Association rules: support (how often), confidence (precision), lift (co-occurrence above random)
  • Decision trees: split on feature that maximizes information gain (entropy reduction)
  • Random forests: ensemble of decision trees via bagging; reduces variance over a single tree
  • XGBoost: sequential boosting; each tree corrects residuals of previous; best on tabular data
