Effect size is a quantitative measure of the magnitude of a difference or relationship in experimental data, independent of sample size. While the p-value indicates whether an observed effect is unlikely to be due to chance, the effect size indicates whether the effect is large enough to matter in practice.
Common effect size measures:
- Cohen's d — standardised difference between two means (small: 0.2, medium: 0.5, large: 0.8); see the computation sketch after this list
- Pearson's r — correlation strength (small: 0.1, medium: 0.3, large: 0.5)
- Odds ratio — ratio of odds between conditions (for binary outcomes)
- Eta squared (η²) — proportion of variance explained by the factor
- Relative risk — ratio of probabilities between conditions
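The most commonly reported of these, Cohen's d, is straightforward to compute from two samples. Below is a minimal sketch using NumPy; the function name and the task-time data are illustrative assumptions, not taken from the source.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

# Illustrative task-time data (seconds) for two interface designs.
design_a = np.array([12.1, 10.4, 11.8, 13.0, 12.5, 11.2])
design_b = np.array([14.2, 13.1, 15.0, 13.8, 14.6, 12.9])
print(f"Cohen's d = {cohens_d(design_a, design_b):.2f}")
```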
Reporting effect sizes alongside p-values has become essential for rigorous research. A comparison with 1,000 participants may find a statistically significant difference of 0.3 seconds in task time — real, but practically irrelevant. A smaller study may find a large effect that fails to reach significance but is worth investigating further.
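A quick simulation makes the distinction between statistical and practical significance concrete. This sketch assumes a hypothetical task-time distribution; the means, spread, and group sizes are chosen for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two groups of 500 (1,000 participants total); the true difference is 0.3 s
# against a 1.2 s spread, i.e. a small true effect of d = 0.25.
group_a = rng.normal(loc=30.0, scale=1.2, size=500)
group_b = rng.normal(loc=30.3, scale=1.2, size=500)

t, p = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_b.mean() - group_a.mean()) / pooled_sd
# With this sample size the p-value typically falls well below 0.05,
# even though d stays small: significant, but of little practical weight.
print(f"p = {p:.4f}, Cohen's d = {d:.2f}")
```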
Jacob Cohen (1988) provided the original benchmarks for interpreting effect size magnitudes, though these should be adjusted based on field-specific norms. In usability research, effect sizes tend to be moderate: users are varied, tasks are varied, and large effects are rare.
The minimum effect size of interest is an important concept in pre-study planning. Before running an experiment, decide what size improvement would actually matter in practice, and design the study with enough power to detect it.
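This planning step is what a power analysis operationalises: fix the smallest effect worth detecting, then solve for the sample size. A minimal sketch using statsmodels follows; the target effect size of d = 0.4 is an assumed value for illustration.

```python
from statsmodels.stats.power import TTestIndPower

# Assumed minimum effect size of interest: d = 0.4 (chosen for illustration).
min_effect = 0.4

# Sample size per group needed to detect d = 0.4 with 80% power at alpha = .05.
n_per_group = TTestIndPower().solve_power(effect_size=min_effect, alpha=0.05,
                                          power=0.80, alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 100
```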
Related terms: Statistical Power, A/B Testing
Discussed in:
- Chapter 18: Experimental Design and Statistics for Usability — Common Statistical Tests
Also defined in: Textbook of Usability, Textbook of Medical Statistics