Heuristic Evaluation and Expert Review

Dr Chris Paton

Learning Objectives

Conduct a heuristic evaluation using Nielsen's framework
Apply severity ratings to prioritise identified usability problems
Perform a cognitive walkthrough for a specific user task
Compare the strengths and limitations of expert review methods versus user testing
Determine when to use expert review methods versus empirical testing

Introduction

Usability testing (Chapter 15) observes real users performing real tasks. Expert review methods take a different approach: trained evaluators examine the interface and predict usability problems based on established principles and cognitive models. These methods are faster, cheaper, and less logistically demanding than user testing, making them valuable for early-stage evaluation, quick assessments, and situations where user recruitment is difficult. This chapter covers the two most widely used expert review methods: heuristic evaluation and cognitive walkthrough.

Heuristic Evaluation

Heuristic evaluation, developed by Jakob Nielsen and Rolf Molich [Nielsen, 1990], is a systematic inspection method in which evaluators examine an interface and judge its compliance with recognised usability principles (heuristics). The heuristics serve as the evaluation criteria; the evaluator's expertise and judgment determine how they are applied [Nielsen, 1994].

Procedure

Briefing: evaluators are given a description of the user population, the primary tasks, and the context of use.
Individual evaluation: each evaluator independently examines the interface, going through it at least twice: first to get an overall sense of the system, then to focus on specific elements.
Problem documentation: each evaluator records every usability problem found, noting which heuristic it violates, where it occurs, and its likely impact.
Consolidation: evaluators' findings are combined into a single list, with duplicates merged.
Severity rating: the consolidated list is rated for severity, either by the evaluators or by a separate group.
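The consolidation step can be sketched in code. The following is a minimal illustration, not a prescribed tool: the `Finding` record, its field names, and the rule of treating two findings as duplicates when they share a location and a heuristic are all assumptions made for this example. In practice, merging duplicates usually requires evaluator judgment rather than exact matching.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One problem reported by one evaluator (hypothetical record layout)."""
    evaluator: str   # who reported it
    location: str    # where in the interface it occurs
    heuristic: str   # which heuristic it violates
    note: str        # description and likely impact


def consolidate(findings):
    """Merge findings that share a location and heuristic (assumed duplicate rule)."""
    merged = defaultdict(lambda: {"evaluators": set(), "notes": []})
    for f in findings:
        key = (f.location.strip().lower(), f.heuristic.strip().lower())
        merged[key]["evaluators"].add(f.evaluator)
        merged[key]["notes"].append(f.note)
    # List problems reported by more evaluators first; ties sort by key.
    return sorted(merged.items(), key=lambda kv: (-len(kv[1]["evaluators"]), kv[0]))


# Illustrative data only.
findings = [
    Finding("A", "Medication ordering screen", "Visibility of system status", "Allergies not shown"),
    Finding("B", "medication ordering screen", "visibility of system status", "No allergy banner"),
    Finding("B", "Order confirmation dialog", "Error prevention", "Cancel and Confirm look identical"),
]

consolidated = consolidate(findings)
for (location, heuristic), entry in consolidated:
    print(location, "|", heuristic, "|", sorted(entry["evaluators"]))
```

Keeping the set of evaluators who reported each problem preserves a useful signal for the later severity rating: a problem found independently by several evaluators is less likely to be a false positive.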

Key Principle

Heuristic evaluation must be performed by multiple independent evaluators. Individual evaluators find different subsets of problems; no single evaluator finds them all [Hertzum, 2003]. Nielsen recommends 3–5 evaluators for a cost-effective analysis [Nielsen, 1993]. With 5 evaluators, approximately 75% of usability problems are typically identified. Using a single evaluator dramatically reduces the coverage.
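One way to see why adding evaluators helps, and why returns diminish, is a simple model in which each evaluator independently finds a fixed proportion of the problems. This model and its equal-rate, independence assumptions are simplifications introduced here for illustration; the per-evaluator rate below is not a measured value but is back-calculated so that five evaluators reach the 75% figure quoted above.

```python
def proportion_found(n_evaluators, detection_rate):
    """Expected share of problems found by n independent evaluators.

    Assumes every evaluator finds each problem with the same probability
    and independently of the others (a simplification).
    """
    return 1 - (1 - detection_rate) ** n_evaluators


# Assumed per-evaluator rate, chosen so that 5 evaluators reach 75% coverage.
rate = 1 - 0.25 ** (1 / 5)

for n in range(1, 6):
    print(n, round(proportion_found(n, rate), 2))
```

Under these assumptions a single evaluator finds roughly a quarter of the problems, and each additional evaluator adds less than the one before. Real evaluators differ in skill and problems differ in how easy they are to spot, so the curve should be read as a rough guide only.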

The Heuristics

The evaluation criteria are typically Nielsen's 10 heuristics (described in Chapter 8) [Nielsen, 1994]: visibility of system status, match between system and real world, user control, consistency, error prevention, recognition rather than recall, flexibility, aesthetic minimalism, error recovery, and help. Evaluators may supplement these with domain-specific heuristics, such as Gerhardt-Powals' cognitive engineering principles [Gerhardt-Powals, 1996] or medical device heuristics that include patient safety considerations.

Severity Rating

After consolidation, each problem is rated for severity. A common framework combines two dimensions:

Impact: how serious is the problem when it occurs? (cosmetic, minor, major, critical)
Frequency: how often will users encounter it? (rarely, sometimes, frequently)

The product of impact and frequency yields a priority ranking that guides remediation efforts.
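The impact × frequency calculation can be sketched as follows. The numeric scores assigned to each level (1 to 4 for impact, 1 to 3 for frequency) are an assumption for illustration; the chapter names the levels but does not fix numbers for them, and the problems listed are illustrative rather than real findings.

```python
# Assumed numeric scales; only the level names come from the framework above.
IMPACT = {"cosmetic": 1, "minor": 2, "major": 3, "critical": 4}
FREQUENCY = {"rarely": 1, "sometimes": 2, "frequently": 3}


def priority(impact, frequency):
    """Priority score as the product of the impact and frequency scores."""
    return IMPACT[impact] * FREQUENCY[frequency]


# Illustrative problems: (description, impact, frequency).
problems = [
    ("Inconsistent capitalisation in menu labels", "cosmetic", "frequently"),
    ("Allergy information hidden on ordering screen", "critical", "frequently"),
    ("Cancel and Confirm buttons look identical", "major", "sometimes"),
]

# Highest priority first.
ranked = sorted(problems, key=lambda p: priority(p[1], p[2]), reverse=True)
for description, impact, frequency in ranked:
    print(priority(impact, frequency), description)
```

A multiplicative score means that a cosmetic problem cannot outrank a critical one of the same frequency. Teams that treat any critical-impact problem as a must-fix regardless of frequency would need an additional rule on top of this ranking.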

Example

A heuristic evaluation of a hospital EHR system might produce findings such as:

Problem: Allergy information is not visible on the medication ordering screen (violates: visibility of system status, error prevention).
Severity: Critical. High impact (potential patient harm) × high frequency (every medication order).
Recommendation: Display active allergies in a persistent banner on the ordering screen.

Problem: The "Cancel order" button is styled identically to the "Confirm order" button (violates: error prevention, consistency).
Severity: Major. High impact (wrong action) × medium frequency (occurs under time pressure).
Recommendation: Visually differentiate destructive and confirmatory actions using colour, size, and position.

Strengths and Limitations

Strengths:

Fast (a single evaluator can review a system in 1–2 hours)
Inexpensive (no participant recruitment, no lab)
Can be conducted early (on wireframes, prototypes, or specifications)
Produces actionable, specific findings

Limitations:

Depends on evaluator expertise; novice evaluators miss problems and report false positives
Does not reveal actual user behaviour, task times, or satisfaction
Evaluators may disagree on severity ratings
Cannot identify problems that arise from the user's mental model rather than from heuristic violations

Cognitive Walkthrough

The cognitive walkthrough, developed by Lewis, Polson, Wharton, and Rieman [Lewis, 1990], focuses specifically on learnability. It traces the steps required to complete a specific task and, at each step, asks whether a new user would know what to do [Wharton, 1994].

Procedure

Define the task: specify a realistic user task and the correct sequence of actions to complete it.
Define the user: describe the target user's goals, knowledge, and experience.
Walk through each step: for each action in the correct sequence, the evaluator answers four questions.

Will the user try to achieve the right effect? (Does the user's goal match what the interface requires?)
Will the user notice that the correct action is available? (Is the control visible and recognisable?)
Will the user associate the correct action with the desired effect? (Does the label, icon, or affordance suggest the right action?)
If the correct action is performed, will the user see that progress is being made? (Is the feedback adequate?)

Record failures: any step where the answer to one or more questions is "no" (or "probably not") represents a learnability problem.
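The walkthrough record can be kept in a simple structure. The sketch below is one possible way to do this, not a standard tool: the data layout, the function name `failures`, and the example task steps and answers are all hypothetical. Any answer other than "yes" (including "probably not") is treated as a failure, following the rule above.

```python
QUESTIONS = (
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the desired effect?",
    "If the correct action is performed, will the user see that progress is being made?",
)


def failures(walkthrough):
    """Return (step, question) pairs where the evaluator's answer was not 'yes'."""
    problems = []
    for step, answers in walkthrough:
        if len(answers) != len(QUESTIONS):
            raise ValueError(f"Step {step!r} needs one answer per question")
        for question, answer in zip(QUESTIONS, answers):
            if answer != "yes":
                problems.append((step, question))
    return problems


# Hypothetical task: placing an order. One answer per question, in order.
walkthrough = [
    ("Open the patient's record", ("yes", "yes", "yes", "yes")),
    ("Select 'New order'", ("yes", "no", "yes", "yes")),
    ("Confirm the order", ("yes", "yes", "probably not", "yes")),
]

learnability_problems = failures(walkthrough)
for step, question in learnability_problems:
    print(f"{step}: {question}")
```

Recording which question failed, not just which step, points to the kind of fix needed: a failure on the second question suggests a visibility problem, while a failure on the third suggests a labelling problem.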

Design Law

The cognitive walkthrough's four questions operationalise the core of learnability: visibility (Can the user see what to do?), affordance (Does the control suggest the right action?), mapping (Does the label match the user's goal description?), and feedback (Can the user tell that the action worked?). These correspond directly to Norman's design principles (Chapter 8) [Norman, 2016]. A control that fails any of these four tests will cause problems for new users.

Strengths and Limitations

Strengths:

Focuses specifically on learnability: the first-use experience
Forces the evaluator to adopt the user's perspective
Identifies specific points of failure in specific tasks
Can be conducted on paper prototypes or wireframes

Limitations:

Time-consuming (each task must be walked through step by step)
Narrow focus (only evaluates the specific tasks analysed)
Does not address efficiency, satisfaction, or error recovery
Assumes a specific "correct" path, which may not match how users actually approach the task

Pluralistic Walkthrough

The pluralistic walkthrough combines elements of expert review and user testing. A group comprising users, developers, and usability experts walks through the interface together, discussing each step. The group format generates diverse perspectives: users reveal mental model mismatches, developers explain technical constraints, and usability experts identify principle violations. The pluralistic walkthrough is particularly effective for building shared understanding among team members about usability issues. Its limitation is the social dynamics of group settings: quieter participants may defer to more vocal ones, and the presence of developers may inhibit users from expressing confusion.

Comparing Expert Review with User Testing

Expert review and user testing are complementary, not competing, methods. Research comparing the two approaches [Jeffries, 1991; Hertzum, 2003] consistently finds:

Different problems: expert review methods and user testing identify overlapping but distinct sets of problems. Expert reviewers find problems that users work around (and therefore might not report), while users encounter problems that experts fail to predict.
False positives: expert reviewers sometimes flag issues that do not cause problems for actual users (false positives). User testing avoids this because problems are identified by observing actual difficulty.
Context sensitivity: user testing reveals problems that arise from the user's context, mental model, and task approach, factors that expert reviewers can only approximate.
Efficiency: expert review is faster and cheaper; user testing is more expensive but produces more ecologically valid results.

Think About It

A common pattern in practice is to use expert review early (to catch obvious problems before investing in user testing) and user testing later (to validate the design with real users). But this sequence means that expert review findings are treated as more urgent than user testing findings, simply because they come first. Is this the right prioritisation? Could it lead to over-investment in fixing predicted problems while missing discovered problems?

When to Use Which Method

Use heuristic evaluation when:

The design is early-stage (wireframes, prototypes)
Budget or time constraints prevent user testing
You need a quick assessment of a competitor's product
You want to identify low-hanging fruit before user testing

Use cognitive walkthrough when:

Learnability is a primary concern (new users, infrequent use)
The task flow is complex and sequential
You want to evaluate whether a specific task can be completed without training

Use user testing when:

You need to validate that the design works for real users
Quantitative metrics (task time, completion rate, satisfaction) are needed
The design is at a stage where user feedback can still influence changes
You want to discover problems that experts cannot predict

Use multiple methods when:

The stakes are high (safety-critical systems, high-traffic consumer products)
The budget allows iterative evaluation
You want the most comprehensive assessment possible

Key Takeaways

Heuristic evaluation uses trained evaluators and established principles to identify usability problems without user involvement. Use 3–5 evaluators for adequate coverage.
Cognitive walkthrough focuses on learnability by tracing task steps and asking whether a new user would succeed at each step.
Expert review is faster and cheaper than user testing but identifies different (overlapping) problems and is subject to evaluator expertise and bias.
Severity ratings based on impact and frequency prioritise remediation efforts.
Expert review and user testing are complementary; the strongest evaluation programmes use both.
