This domain accounts for roughly 10–15% of the NCE. Questions test conceptual understanding — recognizing research designs, interpreting statistical results, and distinguishing reliability from validity.
The #1 NCE Trap: Confusing reliability and validity, and confusing Type I and Type II errors. Reliability = consistency; validity = accuracy. Type I = false positive (rejecting a true null); Type II = false negative (missing a real effect). These distinctions appear repeatedly across multiple question formats.
Four Core Content Areas
🔬
Research Design
How We Investigate
Experimental, quasi-experimental, correlational, descriptive, and qualitative designs — plus threats to internal and external validity.
Key exam: identifying design type from a scenario
📊
Descriptive Statistics
Describing Data
Central tendency, variability, normal distribution, skewness, standard scores (z, T, stanine), correlation coefficients.
Key exam: normal curve properties, skew direction
📋
Reliability & Validity
Test Quality
Five reliability types (test-retest, parallel forms, split-half, inter-rater, Cronbach's alpha) and four validity types (content, construct, concurrent, predictive).
Key exam: matching type to scenario
📏
Scales of Measurement
NOIR Framework
Nominal, Ordinal, Interval, Ratio — hierarchical scales with increasing mathematical power and properties.
Key exam: classifying a variable by scale type
High-Priority Exam Topics at a Glance
Topic
What the NCE Tests
Common Trap
Experimental Design
Random assignment, IV/DV, control group
Correlation ≠ causation; quasi-exp has no random assignment
Normal Distribution
68-95-99.7 rule, mean=median=mode
Confusing positive/negative skew with mean position
z-scores & T-scores
Convert and interpret standard scores
T-score mean=50 SD=10; not same as t-test statistic
Validity Types
Match the validity type (content, construct, concurrent, predictive) to a scenario
Face validity ≠ true validity; construct is broadest
NOIR Scales
Classify a variable's measurement scale
IQ = interval (no true zero), not ratio; class rank = ordinal
Type I / II Errors
Distinguish false positive from false negative
Type I = α level set by researcher; Type II = β (missed effect)
Research Design
The NCE tests your ability to identify a research design from a brief scenario, understand what conclusions each design allows, and recognize threats to validity.
Five Major Research Designs
Gold Standard
True Experimental
Random assignment of participants to conditions. Researcher manipulates the independent variable (IV) and measures the dependent variable (DV). Control group receives no treatment or a comparison treatment.
Only design that allows cause-and-effect conclusions. Random assignment equates groups on all known and unknown variables.
Approximation
Quasi-Experimental
Resembles experimental design but lacks random assignment. Groups may be pre-existing (e.g., classrooms, clinics). Researcher still manipulates an IV and measures a DV.
Cannot fully rule out confounds. Allows some causal inference but weaker than true experimental.
Relationship Study
Correlational
Examines the relationship between two or more variables without manipulation. No IV or DV — just variables. Produces a correlation coefficient (r).
CANNOT establish causation — only association. "Correlation ≠ causation" is the single most tested research principle.
Observation
Descriptive
Documents characteristics of a phenomenon as it naturally occurs. Methods: surveys, case studies, naturalistic observation, archival research.
No manipulation; no cause-effect claims. Describes "what is" — generates hypotheses for further testing.
Meaning & Experience
Qualitative
Non-numerical; explores lived experience, meaning, and context. Methods: phenomenology, grounded theory, ethnography, narrative inquiry, case study.
Results not statistically generalizable. Aims for depth (transferability) over breadth. Trustworthiness replaces reliability/validity.
Key Research Concepts: Variables, Hypotheses & Ethics
Foundational concepts required for all research design questions
Research
Variables
Independent Variable (IV): The variable the researcher manipulates — the presumed cause. Dependent Variable (DV): The outcome measured — the presumed effect. Extraneous/Confounding Variables: Uncontrolled variables that can explain results. Operational definition: How a variable is specifically measured or defined.
Hypotheses
Null hypothesis (H₀): States no relationship or no effect — what we attempt to disprove. Alternative hypothesis (H₁): States a relationship or effect exists. A statistically significant result (p < .05) means we reject H₀ — not that we have "proved" the alternative.
Research Ethics
Informed consent: Participants must understand and voluntarily agree. Confidentiality: Data protected; distinguishable from anonymity (identity not collected at all). Debriefing: Explain true purpose after deception studies. IRB: Institutional Review Board must approve human subjects research.
Sampling
Random sampling: Every member of population has equal chance of selection — enables generalizability. Convenience sampling: Use of available participants — common but limits external validity. Stratified sampling: Population divided into subgroups, then randomly sampled proportionally.
⚑ NCE Focus: Distinguish random assignment (used within a study to create groups — ensures internal validity) from random sampling (used to select participants from a population — ensures external validity). These serve completely different purposes and are frequently confused on the NCE.
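The distinction above can be made concrete with a short Python sketch (illustrative only; the participant IDs and function name are our own, not exam content). It shows random assignment: one pool of already-recruited participants is split into two groups purely by chance.

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly assign participants to two groups (treatment, control).

    This is random ASSIGNMENT (internal validity): each participant has
    an equal chance of landing in either group, which equates the groups
    on known and unknown variables in expectation.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs (invented for illustration)
people = [f"P{i}" for i in range(1, 21)]
treatment, control = randomly_assign(people, seed=42)
```

Random sampling, by contrast, would govern which 20 people entered the study from the larger population in the first place.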
Threats to Internal Validity
History
An external event (outside the study) occurs during the study and affects outcomes — not the treatment.
Maturation
Participants naturally change over time (grow, fatigue, learn) regardless of the intervention.
Testing Effect
Taking the pretest affects performance on the posttest — practice or sensitization effect.
Instrumentation
The measurement tool or observers change over time, creating inconsistency in measurement.
Regression to Mean
Extreme scores at pretest tend to move toward the mean at posttest — not because of treatment.
Selection Bias
Non-equivalent groups at the start — systematic differences between treatment and control groups.
Mortality / Attrition
Differential dropout — if certain types of participants leave one group more than another, results are skewed.
Diffusion
Control group learns about or adopts the treatment, reducing the difference between groups.
📊 Correlation Strength — Interpreting r Values
|r| = 0.90–1.00
Very Strong
|r| = 0.70–0.89
Strong
|r| = 0.50–0.69
Moderate
|r| = 0.30–0.49
Weak
|r| = 0.00–0.29
Negligible
Key: Strength = absolute value of r. Direction (positive/negative) indicates the type of relationship, not its strength. r² = coefficient of determination = proportion of shared variance.
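As a quick self-check, the strength bands and the coefficient of determination above can be expressed in a few lines of Python (a study aid, not exam content; the function names are our own):

```python
def correlation_strength(r):
    """Classify |r| using the bands in the table above."""
    a = abs(r)  # strength ignores direction
    if a >= 0.90:
        return "Very Strong"
    if a >= 0.70:
        return "Strong"
    if a >= 0.50:
        return "Moderate"
    if a >= 0.30:
        return "Weak"
    return "Negligible"

def shared_variance(r):
    """Coefficient of determination (r squared): proportion of shared variance."""
    return r ** 2

strength = correlation_strength(-0.78)  # "Strong" despite the negative sign
r2 = shared_variance(0.72)              # about 0.52, i.e. ~52% shared variance
```

Note that r = −0.78 classifies as Strong: the sign tells direction only.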
Statistics & the Normal Curve
Descriptive statistics, the normal distribution, standard scores, and inferential testing — the numerical backbone of NCE research questions.
Central Tendency & Variability
Describing the center and spread of a distribution
Statistics
Measures of Central Tendency
Mean: Arithmetic average — sensitive to outliers; most commonly used. Median: Middle value when data are ordered; robust to outliers; best for skewed distributions. Mode: Most frequently occurring value; only measure for nominal data; a distribution can have no mode or multiple modes.
Measures of Variability
Range: Max minus min — simple but highly sensitive to outliers. Variance (σ²): Average squared deviation from the mean. Standard Deviation (σ or SD): Square root of variance; same units as data; most used measure of spread. Larger SD = more spread out scores.
⚑ NCE Focus: Know which measure of central tendency is most appropriate for each distribution type. Skewed data → use median (not affected by extreme scores). Nominal data → use mode only. The mean is pulled toward the tail in a skewed distribution.
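A tiny Python demonstration of the focus note above, using invented test scores: one extreme low score drags the mean well below the median, while the median barely moves.

```python
from statistics import mean, median, mode

scores = [70, 72, 74, 75, 75, 76, 78]   # hypothetical, roughly symmetric
with_outlier = scores + [20]            # one extreme low score

m1, md1 = mean(scores), median(scores)            # ~74.3 and 75
m2, md2 = mean(with_outlier), median(with_outlier)  # 67.5 and 74.5

# The mean is pulled toward the low tail (negative skew): mean < median.
```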
Skewness — Mean, Median & Mode Relationships
Tail points RIGHT
Positive Skew
Mode < Median < Mean
Mean is pulled toward the positive (right) tail by high outliers. Example: income distribution — a few very high earners pull the mean up. Median better represents the "typical" person.
Symmetric distribution
Normal (No Skew)
Mode = Median = Mean
Perfect bell curve — mean, median, and mode are identical. The 68-95-99.7 rule applies. Most psychological tests are designed to approximate a normal distribution.
Tail points LEFT
Negative Skew
Mean < Median < Mode
Mean is pulled toward the negative (left) tail by low outliers. Example: an easy test where most people score high but a few score very low pulls the mean down.
The Normal Distribution — 68-95-99.7 Rule
68%
of scores fall within ±1 SD of the mean
95%
of scores fall within ±2 SD of the mean
99.7%
of scores fall within ±3 SD of the mean
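The 68-95-99.7 percentages are not arbitrary; they fall out of the standard normal curve's cumulative distribution. A quick verification in Python using only the standard library (illustrative, not exam content):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

def within(k):
    """Proportion of scores within +/- k SD of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

p1, p2, p3 = within(1), within(2), within(3)
# roughly 0.6827, 0.9545, 0.9973 -- the 68-95-99.7 (empirical) rule
```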
Standard Score Comparison
z-score
Mean = 0
SD = 1
Formula: z = (X − μ) / σ. Negative = below mean; positive = above. Basis for all other standard scores.
T-score
Mean = 50
SD = 10
T = 50 + 10z. Eliminates negative scores. Used in MMPI, many personality tests. T=60 = 1 SD above mean.
IQ (WAIS/WISC)
Mean = 100
SD = 15
IQ = 100 + 15z. Stanford-Binet historically used SD=16. IQ 115 = +1 SD. IQ 70 = −2 SD (intellectual disability threshold).
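The conversions in the table reduce to simple linear formulas. A minimal Python sketch (the function names are our own):

```python
def z_score(x, mean, sd):
    """Standard score: how many SDs a raw score sits from the mean."""
    return (x - mean) / sd

def t_score(z):
    """T-score: mean 50, SD 10."""
    return 50 + 10 * z

def iq_score(z):
    """Deviation IQ (WAIS/WISC style): mean 100, SD 15."""
    return 100 + 15 * z

z = z_score(115, mean=100, sd=15)  # an IQ of 115 is z = +1
t = t_score(z)                     # 60
iq = iq_score(z)                   # 115
```

So z = +1, T = 60, and IQ = 115 all describe the same relative standing: one SD above the mean, roughly the 84th percentile.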
Inferential Statistics
Using sample data to draw conclusions about populations
Inferential
Statistical Significance
p-value: Probability of obtaining results at least as extreme as observed, assuming H₀ is true. p < .05: Reject H₀ (statistically significant at 5% level). p < .01: More stringent threshold. Significance ≠ practical importance — statistical significance can occur with large samples even for trivial effects.
Common Statistical Tests
t-test: Compare means of 2 groups. ANOVA: Compare means of 3+ groups (avoids inflated Type I error from multiple t-tests). Chi-square (χ²): Relationships between categorical variables. Pearson r: Correlation between two continuous variables. Spearman rho: Correlation for ranked/ordinal data.
⚑ NCE Focus: Know when to use each test — t-test for 2 groups, ANOVA for 3+. The reason to use ANOVA instead of multiple t-tests is to control the familywise Type I error rate. Statistical significance (p < .05) means the probability of getting these results by chance is less than 5% — it does NOT prove the alternative hypothesis.
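For mean comparisons, the "which test?" decision boils down to counting groups. A toy Python helper capturing that heuristic (our own function, purely a memory aid):

```python
def choose_mean_comparison_test(n_groups):
    """Pick a mean-comparison test by group count (NCE heuristic).

    Two groups: t-test. Three or more: ANOVA, which controls the
    familywise Type I error rate that multiple t-tests would inflate.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare means")
    return "t-test" if n_groups == 2 else "ANOVA"
```

Categorical-by-categorical questions point to chi-square instead, and correlation questions to Pearson r (continuous) or Spearman rho (ranked).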
Type I & Type II Errors
Type I Error · False Positive
Alpha Error (α)
Rejecting the null hypothesis when it is actually true. Concluding there is an effect when there really isn't one. The probability of a Type I error equals the significance level (α = .05 means 5% chance of false positive).
"The boy who cried wolf" — claiming something real when it isn't.
Type II Error · False Negative
Beta Error (β)
Failing to reject the null hypothesis when it is actually false. Missing a real effect — concluding there's no difference when one actually exists. Reduced by increasing sample size, effect size, or α level.
"Missing the wolf" — failing to detect something real that exists.
Statistical Power = 1 − β = the probability of correctly detecting a true effect. Power increases with larger sample size, larger effect size, and higher α level (but higher α also increases Type I error risk).
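The relationships above (power rises with sample size, effect size, and α) can be verified numerically. Below is a simplified one-sided z-test power calculation in Python, a sketch under the assumption of a known population SD; real power analyses typically use t-distributions and software such as G*Power.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sample, one-sided z-test (simplified sketch).

    effect_size is the standardized mean difference (Cohen's d style,
    assuming known sigma). Power = 1 - beta.
    """
    z_crit = Z.inv_cdf(1 - alpha)           # rejection cutoff
    return Z.cdf(effect_size * sqrt(n) - z_crit)

low_n = power_one_sided_z(0.5, n=10)    # roughly .47 -- badly underpowered
high_n = power_one_sided_z(0.5, n=50)   # roughly .97 -- well powered
```

With d = 0.5, power climbs from roughly .47 at n = 10 to about .97 at n = 50, and loosening α from .05 to .10 also raises power, at the cost of more Type I errors, just as stated above.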
Assessment & Measurement
Reliability, validity, scales of measurement, norm vs. criterion-referenced testing, and test score interpretation — the assessment concepts most frequently tested on the NCE.
NOIR Scales of Measurement
N
Nominal
Properties: Categories only; no order; no meaningful distance between values. The weakest scale.
OK operations: Count (frequency), mode, chi-square Examples: Gender, diagnosis (DSM category), race/ethnicity, type of treatment, yes/no responses
O
Ordinal
Properties: Rank order; unequal intervals between ranks; no true zero. Knows position, not distance.
OK operations: Median, percentile rank, Spearman rho Examples: Class rank, Likert-scale responses, severity ratings, socioeconomic status levels
I
Interval
Properties: Equal intervals between values; no true zero (zero is arbitrary, not absence of trait).
OK operations: Mean, SD, Pearson r, t-test, ANOVA Examples: IQ scores, SAT scores, temperature (°C/°F), most standardized psychological tests
R
Ratio
Properties: Equal intervals + true zero (zero = complete absence of the attribute). Highest level scale.
OK operations: All mathematical operations including ratios Examples: Height, weight, age, income, reaction time, number of absences
Key exam trap: IQ scores are interval, not ratio. An IQ of 0 doesn't mean "no intelligence" — zero is not meaningful. Similarly, a person with an IQ of 100 does not have "twice the intelligence" of someone with IQ 50. Likert scales are technically ordinal (the interval between "agree" and "strongly agree" may not equal that between "neutral" and "agree").
Reliability — Five Types
Stability over time
Test-Retest
Same test administered to same group on two occasions; scores correlated. Measures temporal stability. Time interval matters — too short = carryover; too long = real change.
Key word: "same test, two times"
Equivalence across forms
Parallel / Alternate Forms
Two equivalent versions of the same test administered; scores correlated. Eliminates practice effects from test-retest. Expensive — requires creating two equivalent forms.
Key word: "two equivalent versions"
Internal consistency
Split-Half
Single test split into two halves (e.g., odd vs. even items); halves correlated. Corrected upward using the Spearman-Brown prophecy formula to estimate full-test reliability.
Key word: "one test, two halves"
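The Spearman-Brown correction mentioned above is a one-line formula. A Python sketch (general form, with n = 2 for the split-half case):

```python
def spearman_brown(r_half, n=2):
    """Spearman-Brown prophecy: reliability of a test n times as long.

    For the split-half correction, n = 2: the full test is twice the
    length of each half, and shorter tests are generally less reliable.
    """
    return (n * r_half) / (1 + (n - 1) * r_half)

full = spearman_brown(0.70)  # a half-test correlation of .70 corrects upward
```

A half-test correlation of .70 corrects to about .82 for the full-length test, which is why split-half coefficients are always adjusted before reporting.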
Internal consistency
Cronbach's Alpha (α)
The most widely used measure of internal consistency. Represents the average of all possible split-half correlations. Values range 0–1; α ≥ .70 is generally acceptable; α ≥ .90 preferred for high-stakes decisions.
Key word: "average of all split-halves"
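Cronbach's alpha can also be computed directly from raw item scores. A small standard-library Python sketch (the data sets are invented; rows are people, columns are items):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from per-person lists of item scores.

    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])                       # number of items
    items = list(zip(*item_scores))               # scores grouped per item
    item_vars = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical data: perfectly consistent items vs. noisier items
perfect = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
noisy = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha_perfect = cronbach_alpha(perfect)  # 1.0
alpha_noisy = cronbach_alpha(noisy)      # 0.75
```

Perfectly consistent items yield α = 1.0; the noisier set yields α = .75, just above the conventional .70 cutoff mentioned above.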
Agreement between raters
Inter-Rater Reliability
Two or more raters/observers score the same subject; their ratings are correlated or compared using Cohen's kappa (categorical) or intraclass correlation (continuous). Essential for observational or projective measures.
Key word: "two raters, same person"
Validity — Four Types
Domain coverage
Content Validity
Does the test adequately sample the full content domain it claims to measure? Evaluated by expert judgment, not by correlation. A math exam covering only addition lacks content validity if algebra is also in the curriculum.
Key: expert review; no correlation coefficient needed
Current performance prediction
Concurrent Validity
Test scores correlate with another established criterion measured at the same time. A new depression scale given alongside the BDI-II — if they correlate strongly, the new scale has concurrent validity.
Key: "concurrent" = same time; "at once"
Future performance prediction
Predictive Validity
Test scores correlate with a criterion measured in the future. SAT scores predicting college GPA. An aptitude test given now that predicts job performance later. Criterion is measured after the test.
Key: "predictive" = future; criterion comes later
Theoretical construct
Construct Validity
Does the test measure the theoretical construct it claims to measure? The broadest validity type — encompassing content, criterion, convergent, and discriminant evidence. Required for psychological constructs like "anxiety" or "intelligence."
Key: broadest type; requires multiple lines of evidence
⚖️ Reliability vs. Validity — The Critical Distinction
Concept
Definition
Relationship
Example
Reliability
Consistency — produces the same results repeatedly
Necessary but NOT sufficient for validity
A scale that consistently reads 5 lbs too heavy is reliable but not valid
Validity
Accuracy — measures what it claims to measure
Implies reliability — a valid test must be reliable
If the scale consistently overestimates, it's not valid for measuring true weight
Neither
Inconsistent AND inaccurate
Worst outcome — random error dominates
Scale reads 4 lbs one day, 7 lbs next — neither consistent nor accurate
Valid only?
Cannot exist — an inconsistent test cannot be accurate
Validity requires reliability as a prerequisite
Impossible: accuracy requires consistency first
Norm-Referenced vs. Criterion-Referenced Assessment
Two fundamentally different frameworks for interpreting test scores
Assessment
Norm-Referenced
Compares an individual's score to a normative group (standardization sample). Results expressed as percentile ranks, standard scores (z, T, IQ), or stanines. Designed to produce a spread of scores — bell curve distribution. Most standardized psychological tests (WAIS, MMPI). Purpose: rank individuals relative to peers.
Criterion-Referenced
Compares performance to a predetermined standard or criterion — not to other people. Results expressed as percentage correct or mastery/non-mastery. A score is interpreted regardless of how others perform. Examples: driver's license test, professional licensing exams, NCLEX. Purpose: determine if a standard has been met.
⚑ NCE Focus: The NCE itself is criterion-referenced — you pass by meeting a set score, not by outperforming others. The Standard Error of Measurement (SEM) reflects the precision of individual scores — a smaller SEM means greater measurement precision. Confidence intervals around a score use SEM to reflect uncertainty.
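The SEM referenced above follows the classical test theory formula SEM = SD × √(1 − reliability), and the confidence interval is score ± z × SEM. A short Python sketch with hypothetical numbers:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard Error of Measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Approximate 95% confidence band around an observed score."""
    margin = z * sem(sd, reliability)
    return (score - margin, score + margin)

# Hypothetical: IQ-style test with SD = 15 and reliability of .91
lo, hi = confidence_interval(100, sd=15, reliability=0.91)
```

With SD = 15 and reliability .91, SEM = 4.5, so an observed score of 100 carries a 95% band of roughly 91 to 109. Higher reliability shrinks the SEM and tightens the band.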
Practice Quiz — Research, Statistics & Assessment
10 NCE-style questions. Select the best answer for each.
Question 1 of 10
A researcher finds a correlation of r = +0.72 between childhood stress and adult anxiety. The coefficient of determination (r²) for this relationship is approximately 0.52. This means:
A. The correlation is not statistically significant because r² is less than r
B. Approximately 52% of the variance in adult anxiety is explained by childhood stress
C. 52% of adults with childhood stress will develop anxiety disorders
D. Childhood stress causes adult anxiety in just over half of cases
The coefficient of determination (r²) represents the proportion of variance in one variable that is explained by the other. r = 0.72, so r² = 0.72² ≈ 0.52 = 52% of shared variance. This does NOT imply causation (correlation study only), and it does NOT mean 52% of people develop the outcome. Options C and D incorrectly imply causation and prediction of specific cases.
Question 2 of 10
On a test where most students scored very high but a small number scored extremely low, the distribution would be:
A. Positively skewed, with Mean > Median > Mode
B. Normal, with Mean = Median = Mode
C. Negatively skewed, with Mean < Median < Mode
D. Bimodal, with two distinct peaks in the distribution
When most scores are high but a few are extremely low, the tail extends to the left — this is a negative skew. In a negatively skewed distribution, the mean is pulled downward by the low outliers, so Mean < Median < Mode. The mode (most common score) remains at the high end. A classic example: an easy exam where most people score 90–100 but a few score very low.
Question 3 of 10
A researcher conducts a study and rejects the null hypothesis. Later it is determined that the null hypothesis was actually true. This is an example of:
A. Type I error — rejecting a true null hypothesis (false positive)
B. Type II error — failing to reject a false null hypothesis (false negative)
C. A problem with statistical power — the study was underpowered
D. An acceptable outcome when p < .05 was used as the significance level
Type I error (α) = rejecting H₀ when H₀ is true = false positive. The researcher concluded there was an effect when there really wasn't one. This is the "boy who cried wolf" error. Type II error = failing to reject H₀ when H₀ is false. Setting α = .05 means we accept a 5% chance of making a Type I error — so while it's "expected" statistically, it's still an error when it occurs.
Question 4 of 10
A counselor wants to assess whether a new therapy outcomes scale gives consistent results when administered to the same clients one week apart. Which reliability method should be used?
A. Split-half reliability — dividing the scale into two halves and correlating them
B. Test-retest reliability — administering the same test twice and correlating the scores
C. Inter-rater reliability — having two counselors independently score the same client
D. Parallel forms reliability — creating a second equivalent version of the scale
Test-retest reliability measures temporal stability — consistency of scores across time. The scenario describes administering "the same test to the same clients one week apart" — the defining feature of test-retest. Split-half measures internal consistency from one administration. Inter-rater involves multiple scorers. Parallel forms requires two equivalent test versions.
Question 5 of 10
IQ scores, SAT scores, and most standardized psychological tests are classified on which scale of measurement?
A. Nominal — categories with no inherent order or numeric meaning
B. Ordinal — rank-ordered with unequal intervals between values
C. Interval — equal intervals between values but no true zero point
D. Ratio — equal intervals with a true zero representing complete absence
IQ and SAT scores are interval scale. They have equal intervals (the gap between 100 and 110 = the gap between 90 and 100), but there is no true zero — an IQ of 0 does not mean "zero intelligence." Because there's no true zero, you cannot make ratio statements: a person with IQ 150 does not have "twice the intelligence" of someone with IQ 75. Ratio scales (height, weight, age) do have a true zero.
Question 6 of 10
A researcher wants to compare therapy outcome scores across three treatment groups (CBT, DBT, and medication). Which statistical test is most appropriate?
A. Pearson r — to examine the correlation between treatment type and outcomes
B. Chi-square — to compare frequencies across the three groups
C. t-test — to compare means of two independent groups
D. ANOVA — to compare means across three or more independent groups
ANOVA (Analysis of Variance) is used when comparing means across 3 or more groups. Pearson r measures correlation between continuous variables. Chi-square tests relationships between categorical variables (not mean comparisons). A t-test can only compare 2 groups — running multiple t-tests across 3 groups would inflate the Type I error rate, which is exactly why ANOVA exists.
Question 7 of 10
A student scores 60 on a psychological measure that uses T-scores (mean = 50, SD = 10). What does this score indicate?
A. The student scored at the mean for this measure
B. The student scored 1 standard deviation above the mean
C. The student scored 2 standard deviations above the mean
D. The student scored at the 60th percentile
For T-scores: mean = 50, SD = 10. A score of 60 = 50 + 1(10) = 1 standard deviation above the mean. This corresponds to approximately the 84th percentile on the normal curve. T = 70 would be +2 SD; T = 50 = mean; T = 40 = −1 SD. Option D is incorrect — a score of 60 on a T-score scale does not equal the 60th percentile (it's actually the 84th).
Question 8 of 10
A test development team asks a panel of subject matter experts to review each item and judge whether it adequately represents the content domain of the construct being measured. This process evaluates:
A. Construct validity — whether the test measures the theoretical construct
B. Predictive validity — whether test scores predict future performance
C. Content validity — whether the test adequately covers the full domain
D. Concurrent validity — whether the test correlates with another measure given simultaneously
Content validity is established through expert judgment — reviewing whether test items adequately sample the full domain. It does NOT involve computing a correlation coefficient. Construct validity (broadest) requires multiple sources of evidence including convergent and discriminant evidence. Predictive and concurrent validity both require correlating test scores with an external criterion.
Question 9 of 10
A researcher uses random assignment to place participants into treatment and control groups. The primary purpose of random assignment is to:
A. Ensure the sample is representative of the larger population (external validity)
B. Equate groups on known and unknown variables, ruling out selection bias (internal validity)
C. Increase statistical power by reducing between-group variance
D. Prevent participant dropout (mortality) from affecting the results
Random assignment's purpose is to create equivalent groups — it equates participants on all characteristics (known and unknown) through probability, eliminating selection bias and supporting causal inference. This ensures internal validity. It does NOT address external validity (random sampling does that), does not directly reduce variance, and does not prevent attrition.
Question 10 of 10
Which of the following correlation coefficients indicates the strongest relationship between two variables?
A. r = +0.45
B. r = −0.78
C. r = +0.30
D. r = −0.15
The strength of a correlation is determined by its absolute value — direction (positive/negative) does not indicate strength. |−0.78| = 0.78, which is greater than |+0.45| = 0.45, |+0.30| = 0.30, and |−0.15| = 0.15. Therefore r = −0.78 represents the strongest relationship. A negative correlation simply means the variables move in opposite directions — it can be just as strong as a positive one.
Memory Hooks
Mnemonics and shortcuts for the statistical and research concepts most commonly tested on the NCE.
🎯
Reliability vs. Validity — Dartboard
Reliable but not valid = darts clustered together but away from the bullseye. Valid = darts clustered on the bullseye. An unreliable test cannot be valid. A reliable test can still fail to be valid. Reliability is necessary but NOT sufficient for validity.
Mnemonic: "You must be consistent before you can be accurate."
🐺
Type I vs. Type II Errors
Type I = False Positive — "The boy who cried wolf" — you say there's an effect when there isn't. Type II = False Negative — "Missing the wolf" — a real effect exists but you didn't detect it. α controls Type I; power (1−β) reduces Type II.
Mnemonic: "Type I = I cried wolf. Type II = I missed the wolf."
📏
NOIR Scales — Power Increases
N–O–I–R = progressively more powerful scales. Nominal (labels only) → Ordinal (rank) → Interval (equal gaps, no zero) → Ratio (equal gaps + true zero). IQ = Interval (zero isn't "no intelligence"). Age = Ratio (zero = birth).
Mnemonic: "NOIR — each step adds a new superpower."
📐
Skew — Mean Follows the Tail
The mean is always pulled toward the tail. Positive skew = tail right → Mean > Median > Mode. Negative skew = tail left → Mean < Median < Mode. Think: income = positive skew (the ultra-wealthy pull the mean up past the median).
Mnemonic: "The mean chases the tail like a dog."
🔢
Standard Score Quick Reference
z: mean=0, SD=1. T: mean=50, SD=10. IQ: mean=100, SD=15. Stanine: mean=5, SD=2. SAT: mean=500, SD=100. Pattern: each adds a zero to the mean, except stanines (smallest scale). T=60 = z=+1 = IQ=115 = stanine=7.
Mnemonic: "0, 50, 100, 500 — the means keep adding a zero."
🔬
Random Assignment vs. Random Sampling
Random Sampling = selecting WHO is in the study → external validity (generalizability). Random Assignment = deciding WHICH GROUP participants go into → internal validity (causal inference). They serve different purposes and are the most commonly confused research design concepts.
Mnemonic: "Sampling = who's IN. Assignment = which GROUP."
⚡ Research & Stats Quick-Reference Cheat Sheet
Concept / Term
Key Fact
Common Trap
r²
Proportion of shared variance between two variables
r² ≠ r; does not imply causation
Type I Error (α)
Reject true H₀ = false positive
Opposite of Type II; α = p-value threshold
Type II Error (β)
Fail to reject false H₀ = false negative
Reduced by increasing power/sample size
Test-retest reliability
Same test, two times → temporal stability
≠ parallel forms (which uses two versions)
Content validity
Expert review of domain coverage; no correlation needed
≠ face validity (which is just appearance)
IQ scale type
Interval — equal intervals, no true zero
Students often say "ratio" — wrong; IQ 0 ≠ no intelligence
Negative skew
Tail left; Mean < Median < Mode
Students reverse this — remember "mean follows the tail"
ANOVA
Compare means of 3+ groups
t-test = 2 groups only; multiple t-tests inflate Type I error
Correlation ≠ Causation
r shows association, not cause-and-effect
Most commonly tested research principle on the NCE
T-score 60
+1 SD above mean (T mean=50, SD=10)
T=60 ≠ 60th percentile; it's approximately the 84th
Flashcards & Study Advisor
Flashcards — Research, Statistics & Assessment
Statistics
What does r² (coefficient of determination) tell you, and how does it differ from r?
Answer
r² = proportion of variance in one variable explained by the other (shared variance). r = strength and direction of the linear relationship. If r = 0.80, r² = 0.64 → 64% shared variance. r² is always positive; r can be negative.
Research
What is the key difference between a true experimental design and a quasi-experimental design?
Answer
True experimental = random assignment to conditions → can establish causation. Quasi-experimental = NO random assignment (uses pre-existing groups) → weaker causal inference. Both involve manipulation of an IV and measurement of a DV.
Errors
Define Type I and Type II errors and identify which is controlled by the significance level (α).
Answer
Type I = reject true H₀ = false positive (α controls this). Type II = fail to reject false H₀ = false negative (β). Power = 1 − β. Setting α = .05 means you accept a 5% chance of a Type I error. Reducing α increases Type II error risk.
NOIR
Why are IQ scores classified as interval scale rather than ratio scale?
Answer
IQ is interval because it lacks a true zero — an IQ of 0 does not mean "zero intelligence." Without a true zero, you cannot form ratio statements (IQ 100 ≠ twice IQ 50). Ratio scale requires meaningful zero (e.g., height, age, weight where 0 = complete absence).
Reliability
Which reliability type uses the Spearman-Brown prophecy formula, and why?
Answer
Split-half reliability uses the Spearman-Brown formula to correct for the fact that splitting a test in half creates a shorter test — and shorter tests are generally less reliable. The formula estimates what the full-length test's reliability would be.
Normal Curve
What percentage of scores fall within ±1, ±2, and ±3 standard deviations of the mean?
Answer
±1 SD = 68% of scores. ±2 SD = 95% of scores. ±3 SD = 99.7% of scores. This is the 68-95-99.7 rule (empirical rule). The mean = median = mode in a perfectly normal distribution.
Skewness
In a positively skewed distribution, what is the correct order of mean, median, and mode?
Answer
Positive skew (tail right): Mode < Median < Mean. The mean is pulled farthest toward the positive tail by high outliers. The median is the best measure of central tendency for skewed distributions. Negative skew reverses this: Mean < Median < Mode.
Validity
Which type of validity is established by expert review and does NOT require computing a correlation coefficient?
Answer
Content validity — established through systematic expert review of whether items adequately sample the content domain. No correlation needed. Concurrent and predictive validity both require correlating the test with an external criterion. Construct validity requires multiple lines of evidence.
Research Design — Exam Focus
True experimental design is the ONLY design that allows cause-and-effect conclusions. Random assignment to groups is the defining feature. Without it, you have quasi-experimental at best.
Correlational research cannot establish causation — this is the single most tested research principle on the NCE. Even a very high correlation (r = 0.99) cannot prove causation.
Random assignment vs. random sampling is a critical distinction: random assignment → internal validity (causation). Random sampling → external validity (generalizability). They are not interchangeable.
History threat = external event during the study. Maturation threat = participants naturally change. Regression to the mean = extreme scores at pretest naturally move toward average at posttest.
Qualitative research uses terms like "transferability" (not generalizability) and "trustworthiness" (not reliability/validity) — different epistemological framework.
Descriptive Statistics — Exam Focus
Mean is sensitive to outliers; use median for skewed distributions. Mode is the only appropriate measure for nominal data.
Standard deviation vs. variance: SD = √Variance. Both measure spread. SD is in the original units; variance is in squared units. Larger SD = more spread in scores.
Negative skew: tail goes LEFT; Mean < Median < Mode. Most people score HIGH but a few score very low. Positive skew: tail goes RIGHT; Mode < Median < Mean. Most people score LOW but a few score very high (e.g., income).
Correlation strength: determined by the absolute value of r, not the sign. r = −0.85 is stronger than r = +0.40. The sign tells direction only.
r² interpretation: always square r to get shared variance. r = 0.70 → r² = 0.49 → 49% shared variance. This is the coefficient of determination.
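The two points above — strength comes from |r|, shared variance from r² — can be checked directly (a small sketch; the helper name is mine):

```python
def describe_r(r: float) -> tuple[float, float]:
    """Return (strength, shared variance) for a correlation coefficient.

    Strength is the absolute value |r|; the sign gives direction only.
    Shared variance is r², the coefficient of determination.
    """
    return abs(r), r * r

strength_a, _ = describe_r(-0.85)          # strong negative relationship
strength_b, _ = describe_r(0.40)           # moderate positive relationship
assert strength_a > strength_b             # r = -0.85 is the stronger of the two
print(f"r = 0.70 -> r^2 = {0.70 ** 2:.2f}")  # 0.49, i.e., 49% shared variance
```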
Normal Curve & Standard Scores — Exam Focus
68-95-99.7 rule: ±1 SD = 68%; ±2 SD = 95%; ±3 SD = 99.7%. Must know these for interpreting standard scores on the NCE.
T-score: mean = 50, SD = 10. T = 60 → +1 SD → ~84th percentile. T = 70 → +2 SD → ~98th percentile. Used in MMPI-3, many personality measures.
IQ: mean = 100, SD = 15 (WAIS/WISC). IQ 115 = +1 SD; IQ 130 = +2 SD. Intellectual disability is typically defined as IQ ≤ 70 (−2 SD) plus deficits in adaptive behavior.
z-score is the basis for all other standard scores. Positive z = above mean; negative z = below mean. z = (X − mean) / SD.
Stanines: 1–9 scale, mean = 5, SD = 2. Stanines 4–6 = average range. Broader bands than other standard scores — used to classify general performance levels.
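Because every standard score above is a linear rescaling of z, the conversions reduce to one-liners. A minimal sketch using only the Python standard library (function names are mine):

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """z = (X - mean) / SD — the basis for all other standard scores."""
    return (x - mean) / sd

def to_t(z: float) -> float:
    """T-score: mean 50, SD 10."""
    return 50 + 10 * z

def to_iq(z: float) -> float:
    """Deviation IQ: mean 100, SD 15."""
    return 100 + 15 * z

def percentile(z: float) -> float:
    """Percentage of the normal curve falling below z."""
    return NormalDist().cdf(z) * 100

z = z_score(115, mean=100, sd=15)      # IQ 115 -> z = +1
print(to_t(z), round(percentile(z)))   # 60.0 84  (T = 60, ~84th percentile)
```

The same `percentile` helper confirms the empirical rule: the area between z = −1 and z = +1 is about 68%.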
Reliability & Validity — Exam Focus
Test-retest = same test, two times (stability). Parallel forms = two equivalent versions (equivalence). Split-half = one test, two halves (internal consistency, corrected with Spearman-Brown). Cronbach's alpha = most common internal consistency measure. Inter-rater = two raters, one person (agreement).
Content validity = expert review, no correlation. Concurrent validity = correlates with criterion now. Predictive validity = correlates with future criterion. Construct validity = broadest; measures the theoretical construct.
Reliability is necessary but not sufficient for validity. A perfectly reliable test can measure the wrong thing. A valid test must be reliable.
Face validity is NOT a true form of validity — it just means the test appears to measure what it claims. Does not require empirical evidence.
SEM (Standard Error of Measurement): smaller SEM = more precise scores = more reliable test. Used to create confidence intervals around individual scores.
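The SEM relationship — higher reliability means a smaller SEM and a tighter confidence band — follows from the standard formula SEM = SD × √(1 − r), where r is the test's reliability. A sketch under that assumption (numbers are illustrative):

```python
from math import sqrt

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

def ci95(observed: float, sd: float, reliability: float) -> tuple[float, float]:
    """Approximate 95% confidence band around an observed score (±1.96 SEM)."""
    margin = 1.96 * sem(sd, reliability)
    return observed - margin, observed + margin

# An IQ of 110 on a test with SD 15 and reliability .91:
# SEM = 15 * sqrt(.09) = 4.5, so the 95% band is roughly 101-119.
print(ci95(110, 15, 0.91))
```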
NOIR Scales & Inferential Stats — Exam Focus
Nominal: categories only (diagnosis, gender). Mode only. Chi-square.
Ordinal: rank order, unequal intervals (class rank, Likert). Median, percentile.
Interval: equal intervals, no true zero (IQ, SAT, temperature °C). Mean, SD, Pearson r.
Ratio: equal intervals + true zero (height, age, income). All operations.
IQ = Interval (most commonly missed NOIR question). No true zero → cannot make ratio statements.
t-test = 2 groups. ANOVA = 3+ groups (controls familywise error). Chi-square = frequencies of categorical data. Pearson r = correlation between two continuous variables.
Type I error (α) = false positive = reject true H₀. Set by researcher as significance level (typically .05). Type II error (β) = false negative = fail to reject false H₀. Reduced by increasing sample size or effect size.
Power = 1 − β. To increase power: larger sample size, larger effect size, higher α level. A study with power = .80 has an 80% chance of detecting a true effect.
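All three power relationships above can be demonstrated numerically. This is a sketch for the simplest case only — a one-tailed one-sample z-test, where power = 1 − β = P(Z > z_α − d√n) under the alternative; the function name and scenario are mine:

```python
from math import sqrt
from statistics import NormalDist

def power_one_sample_z(d: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-tailed one-sample z-test for effect size d and sample n.

    Power = 1 - beta = P(Z > z_alpha - d * sqrt(n)) under the alternative.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf(z_alpha - d * sqrt(n))

# Power rises with sample size, with effect size, and with a higher alpha level:
assert power_one_sample_z(0.5, 50) > power_one_sample_z(0.5, 20)
assert power_one_sample_z(0.8, 20) > power_one_sample_z(0.5, 20)
assert power_one_sample_z(0.5, 20, alpha=0.10) > power_one_sample_z(0.5, 20, alpha=0.05)
```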