Testing and Measurement

Topic 7 in Education (3% exam weight). Part of the NCE (Nigeria) study roadmap, topic educat-007.

🟢 Lite — Quick Review (1h–1d)

Rapid summary for last-minute revision before your exam.

Testing and Measurement — Key Facts for NCE (Nigeria)

  • Measurement: Assigning numbers to objects/events according to rules
  • Assessment: Broader process including tests and non-test data
  • Evaluation: Making judgments based on assessment data
  • Test: Formal instrument measuring a sample of behavior
  • Exam tip: Validity ensures a test measures what it claims to measure; reliability ensures the measurement is consistent

🟡 Standard — Regular Study (2d–2mo)

Standard content for students with a few days to months.

Testing and Measurement — NCE (Nigeria) Study Guide

Basic Concepts

Measurement: The process of assigning numbers to objects or events according to rules.

Assessment: Broad term including tests, observations, portfolios, etc.

Evaluation: Making judgments or decisions based on assessment information.

Test: A standardized instrument designed to measure a sample of behavior.

Scales of Measurement

1. Nominal Scale:

  • Categorization only
  • Numbers used as labels
  • Example: Gender (Male=1, Female=2), Ethnicity
  • Operations: Count, mode

2. Ordinal Scale:

  • Rank order
  • Differences not equal
  • Example: Class position (1st, 2nd, 3rd)
  • Operations: Median, percentile

3. Interval Scale:

  • Equal intervals
  • No absolute zero
  • Example: Temperature in Celsius
  • Operations: Mean, standard deviation

4. Ratio Scale:

  • Equal intervals + absolute zero
  • True ratios possible
  • Example: Height, weight, age
  • Operations: All statistical operations

Qualities of Good Tests

Validity: The test measures what it claims to measure.

Types of Validity:

  • Content Validity: Test covers all aspects of content
  • Criterion-Related Validity: Comparison with external criterion
    • Concurrent: Correlates with criterion at same time
    • Predictive: Predicts future performance
  • Construct Validity: Measures theoretical construct

Reliability: The consistency of test results.

Types of Reliability:

  • Test-Retest: Same test given twice
  • Parallel Forms: Two equivalent versions
  • Split-Half: Two halves of same test
  • Inter-rater: Agreement between raters

Reliability vs. Validity:

  • A test can be reliable without being valid
  • A test cannot be valid without being reliable

NCE Exam Pattern

Common question types:

  1. Differences between measurement scales
  2. Types and characteristics of validity/reliability
  3. Computing measures of central tendency and dispersion
  4. Interpretation of test scores
  5. Construction of tests and rubrics

🔴 Extended — Deep Study (3mo+)

Comprehensive coverage for students on a longer study timeline.

Testing and Measurement — Comprehensive NCE (Nigeria) Notes

Detailed Theory

1. Nature of Educational Measurement

Definition: Educational measurement involves assigning numbers to student performance according to systematic rules.

Why Measure?

  • Diagnose learning difficulties
  • Evaluate instruction effectiveness
  • Assign grades and credits
  • Selection and placement
  • Accountability

Limitations of Measurement:

  • Cannot measure everything important
  • Always some measurement error
  • What gets measured may not be what matters most
  • Social context affects measurement

2. Scales of Measurement — Detailed

NOMINAL SCALE:

  • Purpose: Classification into distinct categories
  • Characteristics: Mutually exclusive categories, no order implied
  • Permissible Statistics: Mode, frequency counts, chi-square
  • Examples:
    • Types of schools (public, private, mission)
    • States of Nigeria (36 + FCT)
    • Pass/Fail

ORDINAL SCALE:

  • Purpose: Rank ordering
  • Characteristics: Categories have order, but intervals unequal/unknown
  • Permissible Statistics: Median, percentile, rank correlation
  • Examples:
    • Class position (1st, 2nd, 3rd)
    • Socioeconomic status (low, middle, high)
    • Grade levels

INTERVAL SCALE:

  • Purpose: Measure magnitude with equal intervals
  • Characteristics: Zero point is arbitrary, no true ratio
  • Permissible Statistics: Mean, standard deviation, correlation
  • Examples:
    • Temperature (Celsius/Fahrenheit)
    • Standard scores (z-scores, T-scores)
    • Dates on calendar

RATIO SCALE:

  • Purpose: Measure with true zero and equal intervals
  • Characteristics: Absolute zero, true ratios meaningful
  • Permissible Statistics: All statistical operations
  • Examples:
    • Height
    • Weight
    • Age
    • Number of correct answers

3. Validity — Comprehensive Treatment

Definition: The degree to which evidence and theory support the interpretations of test scores for intended purposes.

Evidence-Based Validity:

  • Content evidence (test content)
  • Response process evidence (how test-takers respond)
  • Internal structure evidence (relationships within test)
  • Relations to other variables (criterion evidence)

CONTENT VALIDITY:

  • Degree to which test samples the content domain
  • Subject matter expert judgment required
  • Test blueprint/table of specifications
  • Example: a mathematics test covering only algebra when geometry is also required has low content validity

CRITERION-RELATED VALIDITY:

  • Concurrent Validity: Test correlates highly with criterion measured at same time

    • Example: a new IQ test correlates at r = 0.85 with an established IQ test
  • Predictive Validity: Test predicts future criterion

    • Example: JAMB scores predict university performance
    • Validity coefficient indicates predictive power

CONSTRUCT VALIDITY:

  • Degree to which test measures a theoretical construct
  • Construct: A theoretical concept (intelligence, anxiety, motivation)
  • Multiple forms of evidence gathered
  • Example: an intelligence test is validated against established theories of intelligence

FACTORS AFFECTING VALIDITY:

  • Test content unrepresentative
  • Item ambiguity
  • Test anxiety
  • Guessing
  • Administration errors
  • Interpretation errors

4. Reliability — Comprehensive Treatment

Definition: The consistency of scores obtained by the same persons on different occasions, with different items, or under different conditions.

TRUE SCORE THEORY:

  • Observed Score = True Score + Error Score
  • X = T + E
  • Perfect reliability = error variance of zero

TEST-RETEST RELIABILITY:

  • Same test administered twice
  • Time interval between tests
  • Correlation between scores = reliability coefficient
  • High correlation = high reliability
  • Problem: Memory effects, practice effects

PARALLEL-FORMS (EQUIVALENT-FORMS) RELIABILITY:

  • Two equivalent versions of test
  • Both administered to same group
  • Correlation between forms
  • Minimizes memory effects

SPLIT-HALF RELIABILITY:

  • One test, divided into two halves
  • Odd-numbered vs. even-numbered items
  • Correlation between halves
  • Spearman-Brown prophecy formula adjusts for full test
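
The correlation between two half-tests underestimates the reliability of the full-length test; the Spearman-Brown prophecy formula projects it back up. A minimal sketch in Python, using an assumed half-test correlation of 0.60 (an illustrative value, not from these notes):

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Project reliability when test length changes by `factor`.

    For split-half reliability, factor = 2 (the half-test is
    doubled back to full length).
    """
    return factor * r_half / (1 + (factor - 1) * r_half)

# Assumed: the two halves correlate at 0.60
print(round(spearman_brown(0.60), 3))  # 0.75 -- full-test estimate
```

Note that the adjusted value (0.75) is always higher than the raw half-test correlation (0.60), which is why the adjustment is needed.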

INTER-RATER RELIABILITY:

  • Agreement between two or more raters
  • Cohen’s Kappa for categorical judgments
  • Pearson correlation for continuous scores
  • ICC (Intraclass Correlation Coefficient)
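
Cohen's Kappa corrects raw percentage agreement for the agreement two raters would reach by chance alone. A sketch in Python with hypothetical pass/fail judgments from two markers (the ratings are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently pick the same category
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical judgments from two markers on eight scripts
a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
print(round(cohens_kappa(a, b), 2))  # 0.47
```

Here raw agreement is 6/8 = 0.75, but kappa is only about 0.47 once chance agreement is removed.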

RELIABILITY COEFFICIENTS:

  • Range: 0 to 1.00
  • 0.90+ = Excellent (high-stakes decisions)
  • 0.80-0.89 = Good (classroom use)
  • 0.70-0.79 = Adequate (group decisions)
  • Below 0.70 = Questionable

RELIABILITY AND STANDARD ERROR OF MEASUREMENT:

SEM = SD × √(1 - r), where SD is the test's standard deviation and r its reliability coefficient
  • SEM gives the range within which the true score is likely to fall
  • Higher reliability → smaller SEM
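
Applying the SEM formula directly, a sketch in Python assuming SD = 10 and reliability r = 0.91 (illustrative figures):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

s = sem(10, 0.91)
print(round(s, 1))  # 3.0
# An approximate 68% band for the true score around an observed score of 70
print((70 - round(s, 1), 70 + round(s, 1)))  # (67.0, 73.0)
```

The band illustrates the key point: even a fairly reliable test (r = 0.91) leaves a margin of roughly ±3 points around any observed score.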

5. Measures of Central Tendency

MEAN:

  • Arithmetic average
  • Most sensitive to extreme scores
  • Best for interval/ratio data
  • Formula: Σx/n

MEDIAN:

  • Middle value when arranged in order
  • Less affected by extreme scores
  • Better for ordinal or skewed distributions
  • Position = (n+1)/2

MODE:

  • Most frequently occurring value
  • Used with nominal data
  • May have no mode or multiple modes
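
The three measures can be compared on the same data using Python's standard-library statistics module. A sketch with a hypothetical score list containing one high outlier:

```python
import statistics

scores = [45, 50, 50, 55, 60, 62, 95]  # hypothetical class scores; 95 is an outlier

print(statistics.mean(scores))    # pulled upward by the outlier (about 59.6)
print(statistics.median(scores))  # 55 -- resistant to the outlier
print(statistics.mode(scores))    # 50 -- the most frequent score
```

Notice the mean exceeds the median here, which is the typical signature of a positively skewed distribution.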

When to Use Each:

Data Type | Best Measure | Reason
Nominal | Mode | Only appropriate measure
Ordinal | Median | Preserves rank order
Interval/Ratio (symmetric) | Mean | Most sensitive; uses all scores
Interval/Ratio (skewed) | Median | Resistant to outliers

6. Measures of Dispersion

RANGE:

  • Maximum - Minimum
  • Simplest measure
  • Affected by outliers

VARIANCE:

  • Average of squared deviations from mean
  • Population variance: Σ(x-μ)²/N
  • Sample variance: Σ(x-x̄)²/(n-1)

STANDARD DEVIATION:

  • Square root of variance
  • In same units as original data
  • Most commonly used measure
  • Formula: σ = √[Σ(x-μ)²/N]
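
Both the population (÷N) and sample (÷(n-1)) formulas are available in Python's statistics module. A sketch with hypothetical quiz scores:

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical quiz scores; mean = 6

# Population formulas, matching sigma = sqrt(sum((x - mu)**2) / N)
print(statistics.pvariance(scores))  # 2  (sum of squared deviations = 10, N = 5)
print(statistics.pstdev(scores))     # about 1.41, i.e. sqrt(2)
# Sample formula divides by n - 1 instead
print(statistics.variance(scores))   # 2.5  (10 / 4)
```

The sample variance is always a little larger than the population variance on the same data, since dividing by n - 1 corrects the underestimate that ÷n produces for samples.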

COEFFICIENT OF VARIATION:

  • CV = (SD/Mean) × 100
  • Allows comparison across different scales
  • Useful for comparing variability of different distributions
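
The CV makes variability comparable across scales. A sketch in Python with two hypothetical variables measured in very different units (population SD used throughout):

```python
import statistics

def cv(data):
    """Coefficient of variation as a percentage: (SD / mean) * 100."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

maths = [40, 50, 60, 70, 80]  # hypothetical exam scores
ages = [18, 19, 20, 21, 22]   # hypothetical student ages

print(round(cv(maths), 1))  # 23.6 -- scores vary much more, relative to their mean
print(round(cv(ages), 1))   # 7.1
```

Raw standard deviations alone could not support this comparison, because the two variables sit on different scales.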

7. Normal Distribution and Standard Scores

Normal Distribution:

  • Bell-shaped, symmetric
  • Mean = Median = Mode
  • Defined by mean and standard deviation
  • 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD
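
The 68-95-99.7 rule can be verified from the normal curve itself. A sketch using the standard library's error function, where the proportion within k standard deviations of the mean is erf(k/√2):

```python
import math

def within(k: float) -> float:
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The exact values (68.3%, 95.4%, 99.7%) are usually rounded to 68-95-99.7 in textbooks.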

Z-SCORES:

  • Standard score showing position in SD units
  • z = (X - μ)/σ
  • Mean of z-scores = 0
  • SD of z-scores = 1

T-SCORES:

  • z-score transformed to have mean of 50 and SD of 10
  • T = 50 + 10(z)
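
The two transformations chain together directly. A sketch using assumed figures (mean 50, SD 10, raw score 70):

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Position of a raw score in standard-deviation units."""
    return (x - mean) / sd

def t_score(z: float) -> float:
    """Transform a z-score to a scale with mean 50 and SD 10."""
    return 50 + 10 * z

z = z_score(70, 50, 10)
print(z)           # 2.0 -- two SDs above the mean
print(t_score(z))  # 70.0
```

T-scores exist mainly to avoid negative values and decimals: a z of -1.5 becomes a T of 35.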

PERCENTILE RANKS:

  • Percentage of scores below given score
  • 60th percentile = scored higher than 60% of test-takers
  • Not equal intervals — difference between percentiles varies
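
A sketch of one common definition of percentile rank (the percentage of scores strictly below the given score; some texts count half of any ties as well), with a hypothetical group of ten scores:

```python
def percentile_rank(score: float, scores: list) -> float:
    """Percentage of scores in the group falling below the given score."""
    below = sum(s < score for s in scores)
    return below / len(scores) * 100

group = [35, 40, 42, 48, 50, 55, 58, 60, 65, 70]  # hypothetical scores
print(percentile_rank(55, group))  # 50.0
print(percentile_rank(70, group))  # 90.0
```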

8. Types of Tests

Standardized Tests:

  • Norm-referenced or criterion-referenced
  • Administered under uniform conditions
  • Content and scoring standardized
  • Examples: WAEC, NECO, JAMB

Teacher-Made Tests:

  • Designed for specific classroom
  • Based on specific instruction
  • More flexible format
  • Diagnostic purposes

CRITERION-REFERENCED vs. NORM-REFERENCED:

Aspect | Criterion-Referenced | Norm-Referenced
Purpose | Mastery of objectives | Relative standing
Comparison | To a fixed standard | To other test-takers
Interpretation | Percentage of objectives mastered | Percentile rank
Example | Driving test (pass/fail) | IQ test (percentile)

9. Test Construction

STEPS IN TEST CONSTRUCTION:

  1. Define objectives/content to be tested
  2. Prepare table of specifications
  3. Select item types
  4. Write items
  5. Review and edit items
  6. Produce final test
  7. Administer
  8. Analyze items
  9. Revise as needed

TABLE OF SPECIFICATIONS (Test Blueprint):

  • Grid showing content areas vs. cognitive levels
  • Ensures representative sampling
  • Guides item writing
  • Documents content validity

ITEM WRITING PRINCIPLES:

  • Clear, unambiguous language
  • One main idea per item
  • Avoid clues (grammatical cues, word frequency)
  • Appropriate difficulty
  • Free from bias
  • Correct answer only one option

10. Item Analysis

DIFFICULTY INDEX:

  • P = Number answering correctly / Total number of test-takers
  • Range 0 to 1
  • 0.30-0.70 ideal for most purposes
  • Too easy (P>0.90) or too hard (P<0.20) = poor discrimination

DISCRIMINATION INDEX:

  • Difference between upper and lower groups
  • D = (% in upper group correct) - (% in lower group correct)
  • Range -1 to +1
  • 0.40+ = Good discrimination
  • Negative = Item may be keyed incorrectly
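
Both indices are simple proportions. A sketch in Python for a hypothetical item answered correctly by 24 of 40 students overall, with 10 of 12 correct in the upper group and 4 of 12 in the lower group (all figures invented for illustration):

```python
def difficulty_index(num_correct: int, total: int) -> float:
    """P = proportion of all test-takers answering the item correctly."""
    return num_correct / total

def discrimination_index(upper_correct: int, lower_correct: int,
                         group_size: int) -> float:
    """D = proportion correct in upper group minus proportion in lower group."""
    return (upper_correct - lower_correct) / group_size

print(difficulty_index(24, 40))         # 0.6 -- within the ideal 0.30-0.70 band
print(discrimination_index(10, 4, 12))  # 0.5 -- good discrimination (above 0.40)
```

An item like this would normally be retained: it is moderately difficult and separates stronger from weaker candidates.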

Practice Questions for NCE

  1. Differentiate between validity and reliability, explaining why a test can be reliable without being valid.
  2. A test has a mean of 50 and standard deviation of 10. Calculate the z-score for a student scoring 70.
  3. Explain the differences between norm-referenced and criterion-referenced tests.
  4. What is the Standard Error of Measurement and how does it affect interpretation of test scores?
  5. Describe the steps involved in constructing a classroom test.
