A/B testing is the application of between-subjects experimental design to live software systems at scale. Users are randomly assigned to see version A or version B of a feature, and their behaviour is measured through analytics. It has become the dominant method for data-driven design decisions in consumer web and mobile products.
The standard methodology:
- Define the hypothesis ("Changing the button colour will increase conversion")
- Define the metric (conversion rate, time on task, click-through)
- Calculate sample size via power analysis
- Randomise — assign each visitor to condition A or B
- Run until the required sample size is reached, and for at least one full business cycle, whichever is longer
- Analyse — a chi-squared or z-test for proportion metrics such as conversion rate, a t-test for continuous metrics such as time on task
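The power-analysis and analysis steps above can be sketched with Python's standard library. The function names and the 10% → 12% lift in the example are illustrative, not taken from the text; the formulas are the standard normal-approximation ones for two proportions.

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, p_target, alpha=0.05, power=0.8):
    """Visitors needed in each arm to detect a shift from p_base to
    p_target with a two-sided z-test for proportions (normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_base - p_target) ** 2
    return ceil(n)

def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Return (z, two-sided p-value) for x conversions out of n per arm."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (x_b / n_b - x_a / n_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Detecting a lift from 10% to 12% conversion at alpha=0.05, power=0.8:
n = sample_size_per_arm(0.10, 0.12)        # 3839 visitors per arm
z, p = two_proportion_z_test(380, 3839, 460, 3839)
```

Note how quickly the required sample grows as the expected lift shrinks: halving the detectable difference roughly quadruples the visitors needed per arm.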
Practical considerations:
- Duration — at least one week to account for day-of-week effects
- Multiple testing correction — simultaneous tests or repeated peeking inflate false positives
- Primary vs secondary metrics — choose one primary metric aligned with the real goal
- Novelty effects — new designs may show temporary boosts that fade
A/B testing is powerful but limited: it measures what users do, not why. A winning variant may succeed for reasons the designer didn't anticipate (and doesn't understand). Combining A/B testing with qualitative research provides the "why" behind the "what".
Ethical questions remain. Users are not typically informed they are in an experiment. A/B tests that manipulate emotional content, pricing, or access to features raise consent and harm concerns that routine UI variation does not.
Related terms: Effect Size, Statistical Power
Also defined in: Textbook of Usability