Learning Objectives
- Conduct a heuristic evaluation using Nielsen's framework
- Apply severity ratings to prioritise identified usability problems
- Perform a cognitive walkthrough for a specific user task
- Compare the strengths and limitations of expert review methods versus user testing
- Determine when to use expert review methods versus empirical testing
Introduction
Usability testing (Chapter 15) observes real users performing real tasks. Expert review methods take a different approach: trained evaluators examine the interface and predict usability problems based on established principles and cognitive models. These methods are faster, cheaper, and less logistically demanding than user testing, making them valuable for early-stage evaluation, quick assessments, and situations where user recruitment is difficult. This chapter covers the two most widely used expert review methods: heuristic evaluation and cognitive walkthrough.
Heuristic Evaluation
Heuristic evaluation, developed by Jakob Nielsen and Rolf Molich (Nielsen & Molich, 1990), is a systematic inspection method in which evaluators examine an interface and judge its compliance with recognised usability principles (heuristics). The heuristics serve as the evaluation criteria; the evaluator's expertise and judgment determine how they are applied (Nielsen, 1994).
Procedure
- Briefing: evaluators are given a description of the user population, the primary tasks, and the context of use.
- Individual evaluation: each evaluator independently examines the interface, going through it at least twice — first to get an overall sense of the system, then to focus on specific elements.
- Problem documentation: each evaluator records every usability problem found, noting which heuristic it violates, where it occurs, and its likely impact.
- Consolidation: evaluators' findings are combined into a single list, with duplicates merged.
- Severity rating: the consolidated list is rated for severity, either by the evaluators or by a separate group.
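The consolidation step above lends itself to a simple sketch. Assuming each finding is recorded with its location, the violated heuristic, and a free-text description, duplicates can be grouped by location and heuristic; the data structure and merge rule here are illustrative only, since real consolidation requires human judgment:

```python
from collections import defaultdict

def consolidate(findings):
    """Merge duplicate findings reported by multiple evaluators.

    Each finding is a dict with 'location', 'heuristic', 'description',
    and 'evaluator'. Two findings are treated as duplicates when they
    share a location and a violated heuristic -- a simplification;
    in practice merging requires human judgment.
    """
    merged = defaultdict(lambda: {"evaluators": set(), "descriptions": []})
    for f in findings:
        key = (f["location"], f["heuristic"])
        merged[key]["evaluators"].add(f["evaluator"])
        merged[key]["descriptions"].append(f["description"])
    return merged

# Example: three evaluators, two of whom found the same problem.
findings = [
    {"location": "ordering screen", "heuristic": "error prevention",
     "description": "Cancel and Confirm look identical", "evaluator": "A"},
    {"location": "ordering screen", "heuristic": "error prevention",
     "description": "Destructive action not differentiated", "evaluator": "B"},
    {"location": "login", "heuristic": "visibility of system status",
     "description": "No feedback while authenticating", "evaluator": "C"},
]
for (loc, heuristic), info in consolidate(findings).items():
    print(f"{loc} / {heuristic}: found by {len(info['evaluators'])} evaluator(s)")
```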
Heuristic evaluation must be performed by multiple independent evaluators. Individual evaluators find different subsets of problems; no single evaluator finds them all (Hertzum & Jacobsen, 2003). Nielsen recommends 3–5 evaluators for a cost-effective analysis (Nielsen, 1993). With 5 evaluators, approximately 75% of usability problems are typically identified; using a single evaluator dramatically reduces coverage.
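The relationship between evaluator count and coverage is often described with the problem-discovery model attributed to Nielsen and Landauer: the proportion of problems found by n independent evaluators is 1 - (1 - λ)^n, where λ is the probability that any one evaluator finds a given problem. A minimal sketch, with λ chosen to reproduce the chapter's ~75% figure (an illustrative assumption, not a measured rate):

```python
def proportion_found(n_evaluators, lam=0.24):
    """Expected share of problems found by n independent evaluators.

    Uses the discovery model 1 - (1 - lam)**n, where lam is the
    per-evaluator probability of finding any given problem.
    lam = 0.24 is an illustrative value chosen so that five evaluators
    find roughly 75% of problems, matching the figure in the text.
    """
    return 1 - (1 - lam) ** n_evaluators

for n in range(1, 6):
    print(f"{n} evaluator(s): {proportion_found(n):.0%}")
# 1 evaluator(s): 24% ... 5 evaluator(s): 75%
```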
The Heuristics
The evaluation criteria are typically Nielsen's 10 heuristics (Nielsen, 1994), described in Chapter 8: visibility of system status, match between system and real world, user control, consistency, error prevention, recognition rather than recall, flexibility, aesthetic minimalism, error recovery, and help. Evaluators may supplement these with domain-specific heuristics — for instance, Gerhardt-Powals' cognitive engineering principles (Gerhardt-Powals, 1996) or medical device heuristics that include patient safety considerations.
Severity Rating
After consolidation, each problem is rated for severity. A common framework combines two dimensions:
- Impact: how serious is the problem when it occurs? (cosmetic, minor, major, critical)
- Frequency: how often will users encounter it? (rarely, sometimes, frequently)
The product of impact and frequency yields a priority ranking that guides remediation efforts.
A heuristic evaluation of a hospital EHR system might produce findings such as:
- Problem: Allergy information is not visible on the medication ordering screen (violates: visibility of system status, error prevention). Severity: Critical — high impact (potential patient harm) × high frequency (every medication order). Recommendation: Display active allergies in a persistent banner on the ordering screen.
- Problem: The "Cancel order" button is styled identically to the "Confirm order" button (violates: error prevention, consistency). Severity: Major — high impact (wrong action) × medium frequency (occurs under time pressure). Recommendation: Visually differentiate destructive and confirmatory actions using colour, size, and position.
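One way to operationalise the impact × frequency framework is to map the ordinal scales to small integers and sort by the product. The numeric mapping below is an assumption for illustration; the two findings are the EHR examples above:

```python
IMPACT = {"cosmetic": 1, "minor": 2, "major": 3, "critical": 4}
FREQUENCY = {"rarely": 1, "sometimes": 2, "frequently": 3}

def severity(impact, frequency):
    """Priority score as the product of impact and frequency ratings."""
    return IMPACT[impact] * FREQUENCY[frequency]

# The two EHR findings from the example above.
problems = [
    ("Allergies not visible on ordering screen", "critical", "frequently"),
    ("Cancel and Confirm buttons styled identically", "major", "sometimes"),
]
for name, imp, freq in sorted(problems, key=lambda p: -severity(p[1], p[2])):
    print(f"{severity(imp, freq):>2}  {name}")
```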
Strengths and Limitations
Strengths:
- Fast (a single evaluator can review a system in 1–2 hours)
- Inexpensive (no participant recruitment, no lab)
- Can be conducted early (on wireframes, prototypes, or specifications)
- Produces actionable, specific findings
Limitations:
- Depends on evaluator expertise; novice evaluators miss problems and report false positives
- Does not reveal actual user behaviour, task times, or satisfaction
- Evaluators may disagree on severity ratings
- Cannot identify problems that arise from the user's mental model rather than from heuristic violations
Cognitive Walkthrough
The cognitive walkthrough, developed by Lewis, Polson, Wharton, and Rieman (Lewis et al., 1990), focuses specifically on learnability. It traces the steps required to complete a specific task and, at each step, asks whether a new user would know what to do (Wharton et al., 1994).
Procedure
- Define the task: specify a realistic user task and the correct sequence of actions to complete it.
- Define the user: describe the target user's goals, knowledge, and experience.
- Walk through each step: for each action in the correct sequence, the evaluator answers four questions:
  - Will the user try to achieve the right effect? (Does the user's goal match what the interface requires?)
  - Will the user notice that the correct action is available? (Is the control visible and recognisable?)
  - Will the user associate the correct action with the desired effect? (Does the label, icon, or affordance suggest the right action?)
  - If the correct action is performed, will the user see that progress is being made? (Is the feedback adequate?)
- Record failures: any step where the answer to one or more questions is "no" (or "probably not") represents a learnability problem.
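A walkthrough record can be kept as simple structured data. The sketch below assumes each step is scored against the four questions, flagging any answer other than "yes"; the names and structure are illustrative, not part of the method's definition:

```python
from dataclasses import dataclass

QUESTIONS = (
    "right effect",      # Will the user try to achieve the right effect?
    "action available",  # Will the user notice the correct action?
    "action-effect",     # Will the user associate action with effect?
    "feedback",          # Will the user see progress being made?
)

@dataclass
class Step:
    action: str
    answers: dict  # question -> "yes" | "no" | "probably not"

    def failures(self):
        """Questions answered 'no' or 'probably not': learnability problems."""
        return [q for q, a in self.answers.items() if a != "yes"]

# Walking through one task: ordering a medication.
steps = [
    Step("Open patient chart", dict.fromkeys(QUESTIONS, "yes")),
    Step("Locate the new-order control",
         {**dict.fromkeys(QUESTIONS, "yes"), "action available": "probably not"}),
]
for s in steps:
    if s.failures():
        print(f"FAIL at '{s.action}': {', '.join(s.failures())}")
```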
The cognitive walkthrough's four questions operationalise the core of learnability: visibility (Can the user see what to do?), affordance (Does the control suggest the right action?), mapping (Does the label match the user's goal description?), and feedback (Can the user tell that the action worked?). These correspond directly to Norman's design principles (Chapter 8; Norman, 2016). A control that fails any of these four tests will cause problems for new users.
Strengths and Limitations
Strengths:
- Focuses specifically on learnability — the first-use experience
- Forces the evaluator to adopt the user's perspective
- Identifies specific points of failure in specific tasks
- Can be conducted on paper prototypes or wireframes
Limitations:
- Time-consuming (each task must be walked through step by step)
- Narrow focus (only evaluates the specific tasks analysed)
- Does not address efficiency, satisfaction, or error recovery
- Assumes a specific "correct" path, which may not match how users actually approach the task
Pluralistic Walkthrough
The pluralistic walkthrough combines elements of expert review and user testing. A group comprising users, developers, and usability experts walks through the interface together, discussing each step. The group format generates diverse perspectives: users reveal mental model mismatches, developers explain technical constraints, and usability experts identify principle violations. The pluralistic walkthrough is particularly effective for building shared understanding among team members about usability issues. Its limitation is the social dynamics of group settings — quieter participants may defer to more vocal ones, and the presence of developers may inhibit users from expressing confusion.
Comparing Expert Review with User Testing
Expert review and user testing are complementary, not competing, methods. Research comparing the two approaches (Jeffries et al., 1991; Hertzum & Jacobsen, 2003) consistently finds:
- Different problems: expert review methods and user testing identify overlapping but distinct sets of problems. Expert reviewers find problems that users work around (and therefore might not report), while users encounter problems that experts fail to predict.
- False positives: expert reviewers sometimes flag issues that do not cause problems for actual users (false positives). User testing avoids this because problems are identified by observing actual difficulty.
- Context sensitivity: user testing reveals problems that arise from the user's context, mental model, and task approach — factors that expert reviewers can only approximate.
- Efficiency: expert review is faster and cheaper; user testing is more expensive but produces more ecologically valid results.
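The "different problems" finding is, at bottom, a set relationship. A minimal sketch with hypothetical problem identifiers makes the three categories explicit:

```python
# Hypothetical problem IDs found by each method on the same interface.
expert_review = {"P1", "P2", "P3", "P5"}
user_testing = {"P2", "P3", "P4", "P6"}

both = expert_review & user_testing          # confirmed by both methods
expert_only = expert_review - user_testing   # predicted; candidate false positives
user_only = user_testing - expert_review     # discovered; missed by experts

print(f"Overlap: {sorted(both)}")
print(f"Expert-only (verify before fixing): {sorted(expert_only)}")
print(f"User-only (experts failed to predict): {sorted(user_only)}")
```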
A common pattern in practice is to use expert review early (to catch obvious problems before investing in user testing) and user testing later (to validate the design with real users). But this sequencing can mean that expert review findings are treated as more urgent than user testing findings simply because they arrive first. Is this the right prioritisation? Could it lead to over-investment in fixing predicted problems while missing discovered ones?
When to Use Which Method
Use heuristic evaluation when:
- The design is early-stage (wireframes, prototypes)
- Budget or time constraints prevent user testing
- You need a quick assessment of a competitor's product
- You want to identify low-hanging fruit before user testing
Use cognitive walkthrough when:
- Learnability is a primary concern (new users, infrequent use)
- The task flow is complex and sequential
- You want to evaluate whether a specific task can be completed without training
Use user testing when:
- You need to validate that the design works for real users
- Quantitative metrics (task time, completion rate, satisfaction) are needed
- The design is at a stage where user feedback can still influence changes
- You want to discover problems that experts cannot predict
Use multiple methods when:
- The stakes are high (safety-critical systems, high-traffic consumer products)
- The budget allows iterative evaluation
- You want the most comprehensive assessment possible
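This guidance can be read as a rough rule set. The sketch below encodes it with hypothetical flags; it is a starting point for discussion, not a substitute for judgment:

```python
def choose_methods(stage, budget_for_users, learnability_critical, high_stakes):
    """Suggest evaluation methods from a few project characteristics.

    A rough encoding of the chapter's guidance; real projects weigh
    many more factors than these four flags.
    """
    methods = []
    if stage in ("wireframe", "prototype") or not budget_for_users:
        methods.append("heuristic evaluation")
    if learnability_critical:
        methods.append("cognitive walkthrough")
    if budget_for_users:
        methods.append("user testing")
    if high_stakes:  # safety-critical or high-traffic: use everything
        methods = ["heuristic evaluation", "cognitive walkthrough", "user testing"]
    return methods

print(choose_methods("prototype", budget_for_users=False,
                     learnability_critical=True, high_stakes=False))
# ['heuristic evaluation', 'cognitive walkthrough']
```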
Key Takeaways
- Heuristic evaluation uses trained evaluators and established principles to identify usability problems without user involvement. Use 3–5 evaluators for adequate coverage.
- Cognitive walkthrough focuses on learnability by tracing task steps and asking whether a new user would succeed at each step.
- Expert review is faster and cheaper than user testing but identifies different (overlapping) problems and is subject to evaluator expertise and bias.
- Severity ratings based on impact and frequency prioritise remediation efforts.
- Expert review and user testing are complementary; the strongest evaluation programmes use both.
Further Reading
- Nielsen, J., & Molich, R. (1990). Heuristic evaluation of user interfaces. Proceedings of CHI '90, 249–256.
- Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods (pp. 25–62). John Wiley & Sons.
- Wharton, C., Rieman, J., Lewis, C., & Polson, P. (1994). The cognitive walkthrough method: A practitioner's guide. In J. Nielsen & R. L. Mack (Eds.), Usability Inspection Methods (pp. 105–140). John Wiley & Sons.
- Hertzum, M., & Jacobsen, N. E. (2003). The evaluator effect: A chilling fact about usability evaluation methods. International Journal of Human-Computer Interaction, 15(1), 183–204.
- Jeffries, R., Miller, J. R., Wharton, C., & Uyeda, K. (1991). User interface evaluation in the real world: A comparison of four techniques. Proceedings of CHI '91, 119–124.