Effect size is a quantitative measure of the magnitude of a difference or relationship in experimental data, independent of sample size. While the p-value indicates whether an observed effect is unlikely to be due to chance, the effect size indicates whether the effect is large enough to matter in practice.
Common effect size measures:
- Cohen's d — standardised difference between two means (small: 0.2, medium: 0.5, large: 0.8); see the computation sketch after this list
- Pearson's r — correlation strength (small: 0.1, medium: 0.3, large: 0.5)
- Odds ratio — ratio of odds between conditions (for binary outcomes)
- Eta squared (η²) — proportion of variance explained by the factor
- Relative risk — ratio of probabilities between conditions
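The most commonly reported of these, Cohen's d, is straightforward to compute from two samples. Below is a minimal sketch using NumPy; the function name and the task-time data are illustrative assumptions, not taken from the source.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

# Illustrative task-time data (seconds) for two interface designs.
design_a = np.array([12.1, 10.4, 11.8, 13.0, 12.5, 11.2])
design_b = np.array([14.2, 13.1, 15.0, 13.8, 14.6, 12.9])
print(f"Cohen's d = {cohens_d(design_a, design_b):.2f}")
```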
Reporting effect sizes alongside p-values has become essential for rigorous research. A comparison with 1,000 participants may find a statistically significant difference of 0.3 seconds in task time — real, but practically irrelevant. A smaller study may find a large effect that fails to reach significance but is worth investigating further.
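A quick simulation makes the distinction between statistical and practical significance concrete. This sketch assumes a hypothetical task-time distribution; the means, spread, and group sizes are chosen for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Two groups of 500 (1,000 participants total); the true difference is 0.3 s
# against a 1.2 s spread, i.e. a small true effect of d = 0.25.
group_a = rng.normal(loc=30.0, scale=1.2, size=500)
group_b = rng.normal(loc=30.3, scale=1.2, size=500)

t, p = stats.ttest_ind(group_a, group_b)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_b.mean() - group_a.mean()) / pooled_sd
# With this sample size the p-value typically falls well below 0.05,
# even though d stays small: significant, but of little practical weight.
print(f"p = {p:.4f}, Cohen's d = {d:.2f}")
```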
Jacob Cohen (1988) provided the original benchmarks for interpreting effect size magnitudes, though these should be adjusted based on field-specific norms. In usability research, effect sizes tend to be moderate: users are varied, tasks are varied, and large effects are rare.
The minimum effect size of interest is an important concept in pre-study planning. Before running an experiment, decide what size improvement would actually matter in practice, and design the study with enough power to detect it.
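This planning step is what a power analysis operationalises: fix the smallest effect worth detecting, then solve for the sample size. A minimal sketch using statsmodels follows; the target effect size of d = 0.4 is an assumed value for illustration.

```python
from statsmodels.stats.power import TTestIndPower

# Assumed minimum effect size of interest: d = 0.4 (chosen for illustration).
min_effect = 0.4

# Sample size per group needed to detect d = 0.4 with 80% power at alpha = .05.
n_per_group = TTestIndPower().solve_power(effect_size=min_effect, alpha=0.05,
                                          power=0.80, alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 100
```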
Related terms: Statistical Power, A/B Testing
Discussed in:
- Chapter 18: Experimental Design and Statistics for Usability — Common Statistical Tests
Also defined in: Textbook of Usability, Textbook of Medical Statistics