A/B testing is the application of between-subjects experimental design to live software systems at scale. Users are randomly assigned to see version A or version B of a feature, and their behaviour is measured through analytics. It has become the dominant method for data-driven design decisions in consumer web and mobile products.
The standard methodology:
- Define the hypothesis ("Changing the button colour will increase conversion")
- Define the metric (conversion rate, time on task, click-through)
- Calculate sample size via power analysis
- Randomise — assign each visitor to condition A or B
- Run until the required sample size is reached, and for at least one full business cycle, whichever is longer
- Analyse — a chi-squared or z-test for proportion metrics such as conversion rate, a t-test for continuous metrics such as time on task
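The power-analysis and analysis steps above can be sketched with Python's standard library. The function names and the 10% → 12% lift in the example are illustrative, not taken from the text; the formulas are the standard normal-approximation ones for two proportions.

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, p_target, alpha=0.05, power=0.8):
    """Visitors needed in each arm to detect a shift from p_base to
    p_target with a two-sided z-test for proportions (normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_base - p_target) ** 2
    return ceil(n)

def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Return (z, two-sided p-value) for x conversions out of n per arm."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (x_b / n_b - x_a / n_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Detecting a lift from 10% to 12% conversion at alpha=0.05, power=0.8:
n = sample_size_per_arm(0.10, 0.12)        # 3839 visitors per arm
z, p = two_proportion_z_test(380, 3839, 460, 3839)
```

Note how quickly the required sample grows as the expected lift shrinks: halving the detectable difference roughly quadruples the visitors needed per arm.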
Practical considerations:
- Duration — at least one week to account for day-of-week effects
- Multiple testing correction — simultaneous tests or repeated peeking inflate false positives
- Primary vs secondary metrics — choose one primary metric aligned with the real goal
- Novelty effects — new designs may show temporary boosts that fade
A/B testing is powerful but limited: it measures what users do, not why. A winning variant may succeed for reasons the designer didn't anticipate (and doesn't understand). Combining A/B testing with qualitative research provides the "why" behind the "what".
Ethical questions remain. Users are not typically informed they are in an experiment. A/B tests that manipulate emotional content, pricing, or access to features raise consent and harm concerns that routine UI variation does not.
Related terms: Effect Size, Statistical Power
Also defined in: Textbook of Usability