Glossary

A/B Testing

A/B testing is the application of between-subjects experimental design to live software systems at scale. Users are randomly assigned to see version A or version B of a feature, and their behaviour is measured through analytics. It has become the dominant method for data-driven design decisions in consumer web and mobile products.

The standard methodology:

  1. Define the hypothesis ("Changing the button colour will increase conversion")
  2. Define the metric (conversion rate, time on task, click-through)
  3. Calculate sample size via power analysis
  4. Randomise — assign each visitor to condition A or B
  5. Run until the required sample size is reached (at least one full business cycle)
  6. Analyse with a chi-squared test, z-test for proportions, or t-test
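Steps 3 and 6 above can be sketched with the Python standard library alone. The baseline rate (10%), target rate (12%), and the observed counts are illustrative assumptions, not figures from this entry; the sample-size formula is the standard normal-approximation one for comparing two proportions.

```python
from math import ceil, sqrt
from statistics import NormalDist

norm = NormalDist()

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Step 3: visitors needed per group to detect p1 -> p2
    with a two-sided two-proportion test (normal approximation)."""
    z_a = norm.inv_cdf(1 - alpha / 2)   # critical value, two-sided
    z_b = norm.inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Step 6: pooled two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))
    return z, p_value

# Planning: detect a lift from 10% to 12% conversion at alpha=0.05, power=0.80.
n = sample_size_per_group(0.10, 0.12)   # roughly 3,800-3,900 per group

# Analysis on hypothetical observed counts after the test has run:
z, p = two_proportion_z_test(480, 4000, 552, 4000)
```

Note that the test in step 6 is only valid if the sample size was fixed in advance (step 5); checking the p-value repeatedly as data accumulates is the "peeking" problem flagged below.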

Practical considerations:

  • Duration — at least one week to account for day-of-week effects
  • Multiple testing correction — simultaneous tests or repeated peeking inflate false positives
  • Primary vs secondary metrics — choose one primary metric aligned with the real goal
  • Novelty effects — new designs may show temporary boosts that fade
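The multiple-testing point can be made concrete with a short calculation. Assuming k independent tests (a simplifying assumption), the familywise error rate grows as 1 - (1 - α)^k, and the Bonferroni correction restores it by testing each comparison at α/k:

```python
# With k independent tests each run at significance level alpha, the chance
# of at least one false positive (familywise error rate) is 1 - (1 - alpha)^k.
k, alpha = 5, 0.05
uncorrected_fwer = 1 - (1 - alpha) ** k           # ~0.23, not 0.05
bonferroni_alpha = alpha / k                      # test each metric at 0.01
corrected_fwer = 1 - (1 - bonferroni_alpha) ** k  # back below 0.05
```

Five uncorrected tests at the 5% level therefore carry roughly a one-in-four chance of a spurious "winner", which is why the entry recommends a single pre-declared primary metric.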

A/B testing is powerful but limited: it measures what users do, not why. A winning variant may succeed for reasons the designer didn't anticipate (and doesn't understand). Combining A/B testing with qualitative research provides the "why" behind the "what".

Ethical questions remain. Users are not typically informed they are in an experiment. A/B tests that manipulate emotional content, pricing, or access to features raise consent and harm concerns that routine UI variation does not.

Related terms: Effect Size, Statistical Power

Also defined in: Textbook of Usability