Predictive Modelling

Dr Chris Paton

Learning Objectives

Apply Fitts's Law calculations to compare target layouts
Construct KLM analyses for realistic interactive tasks
Distinguish the GOMS family variants (KLM, CMN-GOMS, NGOMSL, CPM-GOMS) and their trade-offs
Estimate working memory load for different interface designs
Use predictive models to compare designs without human subjects
Understand the role of computational cognitive architectures in usability prediction

Introduction

Chapters 6 and 7 introduced the Model Human Processor and GOMS as theoretical frameworks [Card, 1983]. This chapter treats them as practical tools: methods for predicting usability without testing with human participants [Kieras, 2003]. Predictive modelling answers the question "which design is likely to be faster, more accurate, or less cognitively demanding?" using calculations rather than experiments. The appeal is clear: predictions can be made before any code is written, any prototype built, or any participant recruited. The limitation is equally clear: models predict specific aspects of performance (time, cognitive load) for specific tasks under specific assumptions (expert, error-free performance). They do not predict satisfaction, learnability, or the full range of real-world human behaviour.

Fitts's Law in Practice

Calculating Movement Time

Fitts's Law (Chapter 5) [Fitts, 1954; MacKenzie, 1992] predicts the time to move to and acquire a target.

MT = a + b × log2(D/W + 1)

For mouse pointing, typical values of the constants are a ≈ 0 ms and b ≈ 150 ms per bit (these vary with device and conditions). Using these values:

Distance (D)	Width (W)	ID = log2(D/W + 1)	MT (ms)
100 px	50 px	1.58	237
200 px	50 px	2.32	348
400 px	50 px	3.17	476
200 px	100 px	1.58	237
200 px	25 px	3.17	476
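The table rows can be recomputed with a few lines of Python. This is a minimal sketch that assumes the constants quoted above (a = 0 ms, b = 150 ms per bit); the function names are illustrative and not part of any library. Because the code multiplies the unrounded index of difficulty, individual results may differ from the table by about a millisecond.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Index of difficulty in bits, using the form in the text: log2(D/W + 1)."""
    return math.log2(distance / width + 1)


def movement_time_ms(distance: float, width: float,
                     a: float = 0.0, b: float = 150.0) -> float:
    """Predicted movement time in ms: MT = a + b * ID.

    The defaults a = 0 ms and b = 150 ms/bit are the rule-of-thumb mouse
    values quoted in the text; real values vary with device and conditions.
    """
    return a + b * index_of_difficulty(distance, width)


# (distance, width) pairs in pixels, taken from the table rows.
ROWS = [(100, 50), (200, 50), (400, 50), (200, 100), (200, 25)]


def fitts_table(rows=ROWS):
    """Return (D, W, ID, MT) tuples for each row."""
    return [(d, w, index_of_difficulty(d, w), movement_time_ms(d, w))
            for d, w in rows]


if __name__ == "__main__":
    for d, w, id_bits, mt in fitts_table():
        print(f"D={d:>3} px  W={w:>3} px  ID={id_bits:.2f} bits  MT={mt:.0f} ms")
```

Rows with the same D/W ratio (100/50 and 200/100, or 400/50 and 200/25) give identical predictions, which is the point of the table: the ratio of distance to width matters, not either value alone.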

Example

Design comparison: Where should the "Submit" button go?

Option A: Submit button at the bottom-right of a form, 400 px from the last input field, 80 px wide. MT = 150 × log2(400/80 + 1) = 150 × 2.58 = 387 ms
Option B: Submit button directly below the last input field, 60 px away, 80 px wide. MT = 150 × log2(60/80 + 1) = 150 × 0.81 = 121 ms

Option B is predicted to be 266 ms faster per submission. For a form submitted 1,000 times per day across an organisation, this saves 266 seconds (about 4.4 minutes) per day, a modest but measurable improvement at zero development cost.

Example

Worked example: a small distant target versus a large near target. A common intuition is that a button placed close to the cursor is always quicker to reach than one further away. Fitts's Law lets us test that intuition with numbers, because it trades distance against size. Consider two real controls:

Target X: a 12 px close button (the small cross in a dialog corner) sitting 400 px from the current cursor position.
Target Y: a 40 px toolbar button sitting 100 px from the current cursor position.

First compute the index of difficulty for each, using ID = log2(D/W + 1):

Target X: ID = log2(400/12 + 1) = log2(34.33) = 5.10 bits
Target Y: ID = log2(100/40 + 1) = log2(3.50) = 1.81 bits

Now apply MT = a + b × ID with plausible mouse-pointing constants a = 50 ms (a small fixed reaction and click cost) and b = 150 ms per bit:

Target X: MT = 50 + 150 × 5.10 = 50 + 765 = 815 ms
Target Y: MT = 50 + 150 × 1.81 = 50 + 272 = 322 ms

Target Y is predicted to be roughly 493 ms faster, about two and a half times quicker to acquire, even though both controls require a single pointing movement. The intuition is correct here, but for a sharper reason than mere distance: the small 12 px target has a very high index of difficulty (5.10 bits) because its width is tiny relative to the distance, forcing a slow, carefully corrected final approach. The 40 px target tolerates a coarser, faster movement. The design lesson is that shrinking a frequently used control to save screen space is rarely free: a close button that is both small and far carries a real, calculable time penalty, and the penalty grows with the logarithm of D/W, not with distance alone.

Comparing Layouts

Fitts's Law becomes most powerful when comparing total movement cost across an entire workflow. For a sequence of actions (click field A, then field B, then button C), the total movement time is the sum of the individual Fitts's Law predictions for each movement.

Design Law

To minimise total interaction time for a sequential workflow, place the controls in the order they will be used and minimise the distance between consecutive controls. The best layouts cluster frequently used controls and arrange sequential controls along a short path. The cost of any candidate layout can be estimated by applying Fitts's Law to each transition and summing the results.
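The summing of per-transition predictions can be sketched as follows. The two layouts, their coordinates, and the starting cursor position are hypothetical values invented for illustration, and the sketch makes the simplifying assumption that each control's width is the same in every direction of approach. The constants a = 0 ms and b = 150 ms per bit are the rule-of-thumb values used earlier in the chapter.

```python
import math


def movement_time_ms(distance, width, a=0.0, b=150.0):
    """Fitts's Law prediction in ms: MT = a + b * log2(D/W + 1)."""
    return a + b * math.log2(distance / width + 1)


def workflow_time_ms(controls, start, a=0.0, b=150.0):
    """Total predicted pointing time for visiting controls in order.

    controls: list of (x, y, width) tuples in the order they are used,
              where (x, y) is the control's centre in pixels.
    start:    (x, y) position of the cursor before the first movement.
    """
    total = 0.0
    x0, y0 = start
    for x, y, width in controls:
        distance = math.hypot(x - x0, y - y0)  # straight-line distance
        total += movement_time_ms(distance, width, a, b)
        x0, y0 = x, y  # the next movement starts from this control
    return total


# Hypothetical layouts: three 80 px controls used in sequence.
START = (100, 40)
SCATTERED = [(100, 100, 80), (500, 400, 80), (100, 400, 80)]
CLUSTERED = [(100, 100, 80), (100, 160, 80), (100, 220, 80)]

if __name__ == "__main__":
    for name, layout in (("scattered", SCATTERED), ("clustered", CLUSTERED)):
        print(f"{name}: {workflow_time_ms(layout, START):.0f} ms")
```

The absolute numbers depend entirely on the assumed constants; the useful output is the comparison between layouts, which stays stable as long as the same constants are used for both.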

Limitations of Fitts's Law Predictions

Fitts's Law predicts movement time only. It does not account for:

Time to visually locate the target (visual search, which depends on target-distractor similarity)
Time to decide which target to select (Hick's Law for choices)
Error rates (the model predicts optimal, error-free performance)
Learning effects (initial movements are slower than practised ones)

A complete prediction requires combining Fitts's Law with visual search models and Hick's Law, as the MHP framework does.

KLM Analysis in Practice

Step-by-Step Method

Describe the task in terms of user goals
Identify the method: the sequence of actions an expert user would perform
Assign operators (K, P, B, H, M, R(t)) to each action
Place mental operators (M) using Card, Moran, and Newell's heuristic rules
Sum the operator times to get the predicted task time

The standard unit times, calibrated by Card, Moran, and Newell from typing and pointing studies [Card, 1980], are:

Operator	Meaning	Time
K	Keystroke or button press (average, skilled typist)	0.20 s
P	Pointing to a target with the mouse	1.10 s
H	Homing the hand between mouse and keyboard	0.40 s
M	Mental preparation (retrieving the next step)	1.35 s

The pointing operator P (1.10 s) is a rule-of-thumb average; where a precise figure matters, P can be replaced by a Fitts's Law calculation for the specific target, which is exactly how CPM-GOMS handles pointing.

Worked Example: Correcting a Word in a Document

Task: the cursor is in a word processor and the user has noticed a typo a few words back. They double-click the misspelt word to select it, type the correct word to replace it, then return their hand to the mouse. We model an expert performing this error-free.

Method and operator sequence:

M (decide to make the correction and recall the method): M
Point to the misspelt word with the mouse: P
Double-click to select the word (two button presses): K K
Home the hand from the mouse to the keyboard: H
M (prepare to type the replacement): M
Type the five-letter replacement word: K K K K K
Home the hand back to the mouse to continue: H

Counting operators: 2 M, 1 P, 2 H, and 7 K (two clicks plus five letters). Applying the standard times:

M: 2 × 1.35 s = 2.70 s
P: 1 × 1.10 s = 1.10 s
H: 2 × 0.40 s = 0.80 s
K: 7 × 0.20 s = 1.40 s

Predicted execution time = 2.70 + 1.10 + 0.80 + 1.40 = 6.00 s.

Two observations follow directly from the breakdown. First, the two mental operators (2.70 s) account for almost half the total, more than the keystrokes and pointing combined; this is typical, and it is why reducing the number of decision points usually buys more than speeding up the typing. Second, the placement of M operators is governed by Card, Moran, and Newell's heuristic rules: an M is inserted before a cognitive unit such as a meaningful command or a chunk of text, but not before each individual keystroke within a familiar word. Misapplying these rules is the most common source of error in hand-built KLM analyses.
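The operator count and sum for the word-correction example can be checked mechanically. This is a minimal sketch using the four unit times from the table above; the function name and the string encoding of the operator sequence are illustrative choices, and the placement of the M operators is still the analyst's job.

```python
from collections import Counter

# Standard KLM unit times in seconds, as listed in the table above.
OPERATOR_TIMES_S = {"K": 0.20, "P": 1.10, "H": 0.40, "M": 1.35}


def klm_time(sequence, times=OPERATOR_TIMES_S):
    """Return (total_seconds, per_operator_subtotals) for an operator string.

    sequence: operators written as single letters, e.g. "M P KK H".
              Spaces are ignored so steps can be grouped for readability.
    """
    counts = Counter(sequence.replace(" ", ""))
    unknown = set(counts) - set(times)
    if unknown:
        raise ValueError(f"Unknown operators: {sorted(unknown)}")
    subtotals = {op: n * times[op] for op, n in counts.items()}
    return sum(subtotals.values()), subtotals


# Word-correction method from the worked example:
# M, point, double-click, home, M, type five letters, home.
WORD_CORRECTION = "M P KK H M KKKKK H"

if __name__ == "__main__":
    total, subtotals = klm_time(WORD_CORRECTION)
    for op, seconds in subtotals.items():
        print(f"{op}: {seconds:.2f} s")
    print(f"Total: {total:.2f} s")
```

Keeping the per-operator subtotals, rather than only the total, is what makes the analysis diagnostic: it shows at a glance which operator type dominates.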

Worked Example: Booking a Meeting Room

Task: Book Conference Room B for 2pm to 3pm on Thursday using a calendar application.

Method (expert user, mouse-based): open the booking dialog (M, P to the menu, K to click), point to and select the room (P, K), point to the date and start time fields and type them (M, several P and K operators), then confirm (P, K). A full operator list for this method sums to roughly 14 s of execution time.

Now compare with a keyboard-shortcut method that skips the menu navigation and pointing by typing the room code and times directly into a quick-entry field (replacing several P operators with K operators, each P at 1.10 s giving way to a 0.20 s K).

The keyboard method is predicted to be about 1 second faster, a small difference for a single booking, but meaningful for an administrative assistant who books 30 rooms per day (30 seconds saved daily).

When KLM Predictions Are Most Useful

KLM is most valuable for:

Comparing alternative designs for the same task: even if the absolute predictions are imprecise, the relative comparison is usually reliable
Identifying slow steps: the M operator (1.35 s) is often the largest single component; reducing the number of mental preparations (through better defaults, clearer labels, or predictable layouts) offers the greatest time savings
Justifying design decisions: quantitative predictions carry weight with stakeholders who might dismiss qualitative usability arguments

The GOMS Family

KLM is the simplest member of a family of related techniques, all descended from the original GOMS formulation (Goals, Operators, Methods, and Selection rules) [Card, 1983]. The members differ in how much structure they model and, correspondingly, in how much effort they demand to build and how much they can predict. John and Kieras provide the definitive comparison [John, 1996].

KLM (Keystroke-Level Model) is the lightest variant. It assumes the analyst already knows the exact sequence of physical actions an expert will perform, and it simply sums operator times. It predicts execution time and nothing else. It is quick to apply (an afternoon, no special software) and is the right tool for comparing two concrete, well-specified methods for the same task.
CMN-GOMS (the original Card, Moran, and Newell formulation) adds an explicit goal hierarchy and selection rules. Instead of one flat action list, the task is decomposed into goals and subgoals, with methods attached to each and selection rules choosing between competing methods (for example, "use the scrollbar for long jumps, the arrow keys for short ones"). CMN-GOMS predicts execution time like KLM, but it also makes the procedural structure visible, which helps when a task has branches or alternative methods.
NGOMSL (Natural GOMS Language) is a structured, almost programmatic notation introduced by Kieras [Kieras, 1988]. Because the method steps are written as explicit numbered statements, NGOMSL can predict not only execution time but also learning time: the time to acquire a procedure is roughly proportional to the number of statements the user must learn. This makes NGOMSL useful when training cost or transfer between similar interfaces matters, not just steady-state speed.
CPM-GOMS (Cognitive, Perceptual, Motor, also read as Critical Path Method) is the heaviest variant. It drops the assumption that operators run one at a time. Instead it schedules perceptual, cognitive, and motor operators in parallel on a critical-path diagram, capturing the way a skilled operator overlaps thinking, looking, and moving. CPM-GOMS predicts the shortest possible time for highly practised performance and can reveal that a redesign which looks faster step by step is actually slower because it disrupts overlapping activity.
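The difference between a serial sum and a critical-path schedule can be illustrated with a toy dependency graph. Everything in this sketch is hypothetical: the operator names, durations, and dependencies are invented for illustration and are not taken from any published CPM-GOMS model. It also simplifies by scheduling on dependencies alone, whereas a full CPM-GOMS model additionally constrains which operators can share a perceptual, cognitive, or motor resource.

```python
# Hypothetical operators: name -> (duration in ms, prerequisite operators).
# The hand movement depends only on perceiving the prompt, so it can
# overlap with the cognitive steps that follow perception.
OPERATORS = {
    "perceive_prompt": (100, []),
    "decide_response": (50, ["perceive_prompt"]),
    "move_hand_to_key": (300, ["perceive_prompt"]),
    "verify_field": (150, ["decide_response"]),
    "press_key": (100, ["move_hand_to_key", "verify_field"]),
}


def serial_time(operators):
    """KLM-style estimate: every operator runs one after another."""
    return sum(duration for duration, _ in operators.values())


def critical_path_time(operators):
    """Earliest possible finish when independent operators may overlap.

    Each operator starts as soon as all its prerequisites have finished;
    the result is the length of the longest dependency chain.
    """
    finish = {}

    def finish_time(name):
        if name not in finish:
            duration, prereqs = operators[name]
            start = max((finish_time(p) for p in prereqs), default=0)
            finish[name] = start + duration
        return finish[name]

    return max(finish_time(name) for name in operators)


if __name__ == "__main__":
    print("serial estimate:", serial_time(OPERATORS), "ms")
    print("critical path:  ", critical_path_time(OPERATORS), "ms")
```

In this toy graph the critical path runs through the hand movement, so shortening the cognitive steps would not change the predicted time at all, while lengthening the hand movement would. That is the kind of effect a serial sum cannot show.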

Example

Case study: Project Ernestine. Gray, John, and Atwood used a CPM-GOMS model to evaluate a proposed new workstation for NYNEX telephone operators [Gray, 1993]. The new equipment was marketed as faster and more ergonomic, and a step-by-step (serial) analysis suggested it would help. The CPM-GOMS model, which represented the parallel overlap of perception, cognition, and hand movement that experienced operators had developed, predicted instead that the new workstation would be about 4% slower per call, because the redesigned workflow broke that overlap. A field trial confirmed the prediction. Since each second per call translated into large recurring costs, the model is estimated to have saved the company roughly two million dollars a year by preventing a plausible-looking but counterproductive change. The lesson is that for expert, time-critical work the serial models can mislead, and the extra effort of CPM-GOMS pays off.

Key Principle

There is a direct trade-off across the GOMS family between predictive power and the effort to build the model. KLM is fastest to apply but predicts only execution time for a known method. NGOMSL adds learning-time predictions and CPM-GOMS captures parallel skilled behaviour, but each step up the ladder requires more analysis time and more expertise. Choose the lightest model that answers your question: use KLM to compare two concrete methods, CMN-GOMS or NGOMSL when procedure structure or learning cost matters, and reserve CPM-GOMS for high-stakes expert tasks where overlapping activity dominates.

Predictive Models versus Empirical Testing

Predictive (engineering) models and empirical user testing answer different questions, and it is a mistake to treat one as a substitute for the other.

The strengths of predictive models are their speed, repeatability, and diagnostic clarity. They require no participants, so they can be applied to a paper sketch before a line of code exists, and they can compare a dozen layout variants in the time it would take to recruit participants for a single test session [Kieras, 2003]. Because a model is a transparent sum of named operators, it does not merely say that design A is faster; it says why, pointing to the specific mental preparations or long pointing movements responsible, which directs the redesign. The predictions are also free of the sampling noise and individual variation that make small empirical studies hard to interpret.

The limits are equally important. Predictive models describe the expert, error-free, routine case. They say nothing about how a novice will explore an unfamiliar interface, how often users will make and recover from errors, whether a feature is discoverable, or whether people find the result satisfying or trustworthy. They model the method the analyst supplies; if real users adopt a different (perhaps worse) strategy, the prediction will not reflect it. And they are silent on everything outside the narrow performance dimensions they were built for.

Think About It

A KLM analysis tells you that the keyboard-shortcut method is one second faster than the menu method. Empirical testing might reveal that 80% of users never discover the shortcut, so the average real-world time is dominated by the menu method the model said to avoid. Which number should drive the design decision, and what does this tell you about combining the two approaches?

Empirical testing, by contrast, captures the messy reality the models exclude: discovery, errors, confusion, satisfaction, and the strategies real users actually choose. Its weaknesses mirror the models' strengths: it is slower and more expensive, it requires a working prototype, its results carry sampling variability, and a finding that "design A tested slower" does not by itself explain the cause. The sound practice is to use predictive models to narrow the design space cheaply and to generate explanations, then use empirical testing to validate the surviving candidates and to catch the problems no model can foresee.

Estimating Cognitive Load

While Fitts's Law and KLM predict time, cognitive load estimation predicts mental effort. A design that is fast but mentally exhausting is not usable in the long run.

Working Memory Load Analysis

For each step in a task, count the number of items the user must hold in working memory simultaneously. Compare this count against Cowan's estimate of ~4 items (Chapter 3) [Cowan, 2001].

Example

Task: Manually calculate a drug dose. The user must simultaneously hold:

The patient's weight (72.5 kg)
The dose per kilogram (5 mg/kg)
The intermediate calculation (72.5 × 5 = 362.5 mg)
The available concentration (50 mg/mL)
The volume to administer (362.5 / 50 = 7.25 mL)

This task requires holding 5 items, exceeding typical working memory capacity. The prediction: errors are likely, especially under time pressure or interruption. The design intervention: a dose calculator that accepts weight and drug, computes the volume automatically, and displays all intermediate values. This externalises the memory load.
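The count-and-compare step, and the calculator that externalises the load, can be sketched together. This is an illustration of the design idea only, not clinical software; the function names are hypothetical, and the capacity limit of 4 is Cowan's estimate as cited above.

```python
WORKING_MEMORY_LIMIT = 4  # Cowan's estimate of working memory capacity


def exceeds_working_memory(items, limit=WORKING_MEMORY_LIMIT):
    """True if a task step asks the user to hold more items than the limit."""
    return len(items) > limit


def dose_volume(weight_kg, dose_mg_per_kg, concentration_mg_per_ml):
    """Compute the volume to administer and return every intermediate value.

    Returning the intermediates lets the interface display them, so the
    user can check the calculation without holding the values in memory.
    """
    if weight_kg <= 0 or dose_mg_per_kg <= 0 or concentration_mg_per_ml <= 0:
        raise ValueError("All inputs must be positive")
    total_dose_mg = weight_kg * dose_mg_per_kg
    volume_ml = total_dose_mg / concentration_mg_per_ml
    return {
        "weight_kg": weight_kg,
        "dose_mg_per_kg": dose_mg_per_kg,
        "total_dose_mg": total_dose_mg,
        "concentration_mg_per_ml": concentration_mg_per_ml,
        "volume_ml": volume_ml,
    }


# Items the user must hold when doing the calculation by hand.
MANUAL_ITEMS = [
    "patient weight",
    "dose per kilogram",
    "intermediate total dose",
    "available concentration",
    "volume to administer",
]

if __name__ == "__main__":
    print("Manual method exceeds limit:", exceeds_working_memory(MANUAL_ITEMS))
    for name, value in dose_volume(72.5, 5, 50).items():
        print(f"{name}: {value}")
```

With the calculator, the user enters values one at a time and reads the result from the screen, so the number of items held in memory at any moment drops well below the limit.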

Split-Attention Analysis

For each step, identify whether the required information is co-located (visible in a single glance) or split across locations (requiring the user to look back and forth or remember information across screens). Split attention increases cognitive load by requiring mental integration of separated information (Chapter 3) [Sweller, 1998]. A systematic split-attention analysis walks through each task step and notes: "What information does the user need? Where is it displayed? Can the user see all needed information without scrolling, switching screens, or remembering values?"

Computational Cognitive Architectures

Beyond the hand-calculated models, computational cognitive architectures provide more detailed (and more effortful) predictions.

ACT-R

ACT-R (Adaptive Control of Thought, Rational), developed by John Anderson at Carnegie Mellon [Anderson, 2007], is a cognitive architecture that models human cognition at the level of individual memory retrievals, perceptual encodings, and motor actions. ACT-R models can predict not only task times but also error patterns, learning curves, and the effects of cognitive load. ACT-R has been used to model tasks ranging from menu selection to air traffic control. Its predictions are often within 10 to 15% of observed human performance [Newell, 1990]. However, building an ACT-R model requires substantial expertise and effort; it is a research tool rather than a practitioner tool.

CogTool and Cogulator

CogTool (developed at Carnegie Mellon) [John, 2004] and Cogulator (developed by Steven Estes at the MITRE Corporation) are practical tools that lower the barrier to cognitive modelling. Cogulator, in particular, provides a web-based interface where practitioners can specify a task as a sequence of operators and receive predictions of task time, working memory load, and a CPM-GOMS schedule, without needing to understand the underlying cognitive theory.

Think About It

Predictive models work best for routine, well-defined tasks performed by experts. But many important usability questions concern non-routine situations: What happens when the user encounters an unexpected error? How does a novice explore an unfamiliar interface? What happens when the user is interrupted mid-task? Can predictive models be extended to these situations, or are they fundamentally limited to the expert, error-free case?

Combining Predictive Models with Other Methods

Predictive modelling is most effective when integrated into a broader evaluation strategy:

Use predictive models early (during design, before prototypes exist) to compare alternative layouts, workflows, and interaction patterns
Use heuristic evaluation (Chapter 16) to catch problems that models do not address (consistency, error recovery, aesthetic quality)
Use usability testing (Chapter 15) to validate predictions and discover problems that models cannot predict
Use analytics and A/B testing (Chapter 18) to measure actual performance at scale

No single method provides a complete picture. Predictive models provide precision for specific questions; other methods provide coverage for the broader usability landscape.

Key Takeaways

Fitts's Law can be applied directly to compare target layouts: larger targets and shorter distances produce faster acquisition. Total workflow time is the sum of individual movement predictions.
KLM analysis predicts expert task time by summing operator times (K = 0.20 s, P = 1.10 s, H = 0.40 s, M = 1.35 s). The mental operator is often the largest component and the best target for optimisation.
The GOMS family ranges from lightweight KLM, through CMN-GOMS and learning-sensitive NGOMSL, to parallel-scheduling CPM-GOMS; predictive power rises with the effort to build the model, so choose the lightest variant that answers the question.
Predictive (engineering) models are fast, repeatable, and diagnostic but cover only expert, error-free, routine performance; empirical user testing captures discovery, errors, and satisfaction. Use models to narrow the design space, then test to validate.
Cognitive load can be estimated by counting working memory items per step and identifying split-attention situations.
Computational tools (Cogulator, CogTool) make predictive modelling accessible to practitioners.
Predictive models are most valuable for comparing alternatives early in design; they should be combined with expert review and user testing for comprehensive evaluation.

Textbook of Usability