- Describe how AI is changing the design and evaluation of user interfaces
- Identify usability challenges specific to AI-powered systems
- Explain the role of explainability and trust in AI interface design
- Evaluate the potential and limitations of AI-assisted usability evaluation
- Apply existing usability principles to the design of AI interactions
Introduction
Artificial intelligence is transforming usability from two directions simultaneously (Shneiderman, 2022). First, AI is becoming a design tool — generating interfaces, predicting usability problems, and automating aspects of evaluation. Second, AI is becoming the interface itself — large language models, recommendation systems, and autonomous agents create interaction patterns that do not fit the traditional model of a user operating deterministic controls (Amershi et al., 2019). This chapter examines both directions: AI as a tool for usability and the usability of AI systems themselves.
AI as a Design Tool
Generative Design
Large language models and image generation systems can produce interface designs from text descriptions. A designer can describe a desired layout — "a dashboard with three cards showing patient vital signs, an alert panel on the right, and a navigation sidebar" — and receive a rendered mockup in seconds. The speed of generation enables rapid exploration of design alternatives. Where a designer might manually create 3–5 layout variations, AI can generate dozens. This supports the divergent phase of design — generating many options before converging on the best.
AI-generated designs are starting points, not finished products. They can accelerate exploration, but they cannot evaluate their own usability. A generated layout may look plausible while violating Fitts's Law (Fitts, 1954) (targets too small), cognitive load principles (Sweller, 1988) (too many elements competing for attention), or accessibility standards (Web Accessibility Initiative, 2018) (insufficient contrast). The designer's role shifts from generating designs to evaluating and refining AI-generated candidates using the principles in this textbook.
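Such an evaluation can be partially automated. The sketch below, under illustrative assumptions (a simplified element representation, pixel units, a hypothetical 44 px minimum target size, and an arbitrary audit threshold), flags generated-layout targets that are too small or hard to acquire under Fitts's Law:

```python
import math

# Hypothetical representation of interactive elements in an AI-generated
# layout. Sizes and cursor travel distances are in pixels; the minimum
# target size and difficulty threshold are illustrative, not normative.
MIN_TARGET_PX = 44

def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts's index of difficulty (Shannon formulation), in bits."""
    return math.log2(distance / width + 1)

def audit_targets(elements):
    """Flag elements that are too small or have a high acquisition cost."""
    issues = []
    for el in elements:
        if el["width"] < MIN_TARGET_PX or el["height"] < MIN_TARGET_PX:
            issues.append((el["id"], "target smaller than minimum size"))
        bits = index_of_difficulty(el["distance_px"], min(el["width"], el["height"]))
        if bits > 5:  # arbitrary audit threshold for this sketch
            issues.append((el["id"], f"high index of difficulty ({bits:.1f} bits)"))
    return issues

layout = [
    {"id": "save-btn", "width": 24, "height": 24, "distance_px": 900},
    {"id": "nav-home", "width": 120, "height": 48, "distance_px": 200},
]
print(audit_targets(layout))  # flags save-btn on both checks; nav-home passes
```

A check like this catches only what it encodes; the designer still judges whether a flagged element is actually a problem in context.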
Automated Usability Evaluation
AI systems can automate some aspects of usability evaluation:
- Automated accessibility checking: tools that scan interfaces for WCAG violations (missing alt text, insufficient contrast, missing form labels) — already mature and widely deployed.
- Predictive analytics: machine learning models trained on interaction data to predict where users will struggle — for instance, identifying form fields with high abandonment rates or navigation paths with frequent backtracking.
- Heuristic detection: AI systems that evaluate interfaces against design heuristics, flagging potential violations. These are less mature than accessibility checkers and prone to false positives, but improving.
- Gaze prediction: models that predict where users will look on a page (based on saliency maps and trained on eye-tracking data), providing an approximation of visual attention without requiring eye-tracking hardware.
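The most mature of these, accessibility checking, is straightforwardly mechanical. As a minimal example, the WCAG contrast ratio between a foreground and background colour can be computed directly from the standard's relative-luminance definition:

```python
def _linearise(channel: float) -> float:
    """sRGB channel (0-1) to linear light, per the WCAG definition."""
    return channel / 12.92 if channel <= 0.03928 else ((channel + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb) -> float:
    """Relative luminance of an 8-bit sRGB colour."""
    r, g, b = (_linearise(c / 255) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio between two colours (ranges from 1:1 to 21:1)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black on white is the maximum 21:1; light grey on white fails the
# WCAG AA threshold of 4.5:1 for normal-size text.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))      # 21.0
print(contrast_ratio((200, 200, 200), (255, 255, 255)) >= 4.5)   # False
```

Deployed tools (e.g. automated WCAG scanners) wrap checks like this in page crawling and DOM inspection, but the underlying rule is exactly this kind of deterministic computation — which is why this category is mature while heuristic detection is not.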
AI-Assisted User Research
AI can assist (but not replace) user research:
- Transcription and coding: automated transcription of usability test sessions with preliminary thematic coding
- Sentiment analysis: detecting frustration or satisfaction in user verbalisations during think-aloud testing
- Pattern detection: identifying common behaviour sequences across many users in large-scale analytics data
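To make the "assist, not replace" division of labour concrete: a tool can pre-screen a think-aloud transcript for candidate frustration moments, leaving interpretation to the analyst. The sketch below uses a hand-picked keyword list purely for illustration; real sentiment tools use trained models rather than keyword matching.

```python
# Minimal sketch of pre-screening a think-aloud transcript for moments
# that merit analyst review. The cue list is illustrative only; deployed
# sentiment-analysis tools use trained models, not keyword matching.
FRUSTRATION_CUES = {"confusing", "stuck", "why", "can't", "wrong", "lost"}

def flag_frustration(segments):
    """Return (timestamp, utterance) pairs containing a frustration cue."""
    flagged = []
    for ts, text in segments:
        words = {w.strip(".,?!").lower() for w in text.split()}
        if words & FRUSTRATION_CUES:
            flagged.append((ts, text))
    return flagged

transcript = [
    ("00:12", "Okay, I'll click the search icon."),
    ("00:41", "Hmm, this is confusing, I'm not sure where the filter went."),
    ("01:05", "Why can't I undo that?"),
]
print(flag_frustration(transcript))  # flags the 00:41 and 01:05 segments
```

The output is a shortlist for a human researcher, not a verdict — deciding whether "why can't I undo that?" reflects a usability problem or a passing remark still requires human judgment.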
If AI can generate designs and evaluate them against usability heuristics, does it follow that AI could design usable interfaces autonomously — without a human designer? What aspects of usability judgment require human expertise that current AI systems lack? Is the limitation primarily technical (AI will improve) or fundamental (usability judgment requires understanding human experience in a way that AI cannot)?
The Usability of AI Systems
The Unpredictability Problem
Traditional interfaces are deterministic: the same input always produces the same output. A button always performs the same action. A menu always contains the same items. Users build mental models based on this predictability, and predictability supports learning and trust. AI systems — particularly those based on large language models — are non-deterministic. The same prompt may produce different responses. A recommendation system changes its suggestions based on accumulated data. An autocomplete system offers different completions depending on context. This non-determinism undermines the formation of stable mental models.
When incorporating AI into an interface, make the deterministic components clearly distinguishable from the non-deterministic ones. Users should know which elements will behave consistently and which may change. This supports the formation of partial mental models: "The navigation always works the same way, but the recommendations change based on my behaviour."
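One lightweight way to implement this is to tag each component at design time and surface the non-deterministic ones to the user. A minimal sketch, with illustrative component names:

```python
from dataclasses import dataclass

# Sketch: tag each interface component so the UI can visibly distinguish
# stable controls from adaptive, non-deterministic ones.
@dataclass
class Component:
    name: str
    deterministic: bool

def variability_notice(components) -> str:
    """Describe which components may change between sessions."""
    adaptive = [c.name for c in components if not c.deterministic]
    if not adaptive:
        return "All controls behave consistently."
    return "May change based on your activity: " + ", ".join(adaptive)

ui = [
    Component("navigation sidebar", deterministic=True),
    Component("search box", deterministic=True),
    Component("recommended articles", deterministic=False),
]
print(variability_notice(ui))
# May change based on your activity: recommended articles
```

In a real interface the tag would drive a visual treatment (a badge, a distinct region) rather than a text notice, but the design decision is the same: the distinction must exist in the design model before it can be communicated.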
The Explanation Problem
Many AI systems operate as black boxes: they produce outputs without explaining how or why Cai, 2019. A clinical decision support system that recommends a drug without explaining its reasoning provides no basis for the clinician to evaluate the recommendation — to decide whether to trust it, modify it, or override it. Explainability in AI interfaces is not a technical add-on; it is a usability requirement. The usability question is not "Can the system explain itself?" but "Does the explanation help the user make a better decision?"
Levels of Explanation
Not all users need the same level of explanation. A radiologist reviewing an AI's chest X-ray classification may want to see which regions of the image the model focused on (a visual saliency map). A patient told by their doctor that "the AI recommends further testing" needs a different kind of explanation — one framed in terms of risk and benefit, not model attention weights.
A clinical AI system that predicts patient deterioration might provide explanations at three levels:
- Summary: "This patient has a high risk of deterioration in the next 12 hours."
- Contributing factors: "Key factors: declining blood pressure trend, elevated heart rate, rising lactate."
- Technical detail: "The model's prediction is based on a gradient-boosted decision tree trained on 50,000 ICU admissions, with an AUROC of 0.87."

Most clinicians need level 1 (to act) and level 2 (to evaluate). Level 3 is relevant for clinical informaticists evaluating the system, not for bedside decision-making.
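These levels can be modelled as a simple structure that serves each audience only the layers it needs. The role names and field names below are illustrative assumptions, not part of any particular system:

```python
# Sketch: the three explanation levels from the example above, served
# selectively by user role. Roles and field names are illustrative.
EXPLANATION = {
    "summary": "High risk of deterioration in the next 12 hours.",
    "factors": ["declining blood pressure trend", "elevated heart rate",
                "rising lactate"],
    "technical": "Gradient-boosted trees, 50,000 ICU admissions, AUROC 0.87.",
}

LEVELS_BY_ROLE = {
    "nurse": ("summary", "factors"),
    "physician": ("summary", "factors"),
    "informaticist": ("summary", "factors", "technical"),
}

def explain(role: str) -> dict:
    """Return only the explanation levels relevant to the user's role."""
    keys = LEVELS_BY_ROLE.get(role, ("summary",))
    return {k: EXPLANATION[k] for k in keys}

print(list(explain("physician")))       # ['summary', 'factors']
print("technical" in explain("informaticist"))  # True
```

The point of the structure is the restraint it enforces: the bedside view omits level 3 by design, rather than burying the summary under technical detail.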
Trust Calibration
The goal of AI interface design is not to maximise user trust in the AI but to calibrate trust appropriately. Users should trust the AI when it is likely to be correct and question it when it might be wrong. Under-trust (ignoring accurate recommendations) and over-trust (accepting inaccurate recommendations without scrutiny) are both harmful.
Trust in AI should be calibrated, not maximised. Interface design can support calibrated trust by: (1) communicating uncertainty (showing confidence scores or probability ranges), (2) providing evidence (showing the data or reasoning behind the recommendation), (3) highlighting disagreement (flagging cases where the AI's recommendation conflicts with clinical guidelines or the user's input), and (4) tracking accuracy (showing the system's historical accuracy so users can develop informed expectations).
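The four supports can be combined in a single recommendation display. The sketch below assumes a hypothetical drug-dosing system; the field names, the 20% disagreement threshold, and the accuracy figure are all illustrative:

```python
# Sketch combining the four calibration supports: uncertainty, evidence,
# disagreement flagging, and historical accuracy. All field names and
# thresholds are illustrative assumptions.
def present_recommendation(rec, guideline_dose, historical_accuracy) -> str:
    lines = [
        f"Recommended dose: {rec['dose']} mg "
        f"(model confidence {rec['confidence']:.0%})",          # uncertainty
        "Evidence: " + "; ".join(rec["evidence"]),               # evidence
        f"This system agreed with the final clinical decision in "
        f"{historical_accuracy:.0%} of past cases.",             # track record
    ]
    if abs(rec["dose"] - guideline_dose) / guideline_dose > 0.2:
        lines.append("Warning: differs from the guideline dose by more "
                     "than 20% - review before accepting.")      # disagreement
    return "\n".join(lines)

rec = {"dose": 75, "confidence": 0.82,
       "evidence": ["weight 60 kg", "eGFR 45", "prior dose tolerated"]}
print(present_recommendation(rec, guideline_dose=50, historical_accuracy=0.91))
```

Note that the disagreement warning fires here (75 mg is 50% above the guideline), which is exactly the situation where the display should slow the user down rather than smooth acceptance along.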
Automation Bias
Automation bias is the tendency to accept automated recommendations without sufficient scrutiny, even when the recommendations are incorrect (Parasuraman & Riley, 1997). This is the over-trust failure. It has been extensively documented in aviation (pilots accepting incorrect autopilot actions) (Sarter & Woods, 1995) and healthcare (clinicians accepting incorrect drug dose recommendations from CPOE systems) (Koppel et al., 2005). Interface design can mitigate automation bias by:
- Requiring active confirmation rather than passive acceptance of AI recommendations
- Presenting the AI's recommendation alongside the relevant data, so the user evaluates both
- Occasionally presenting cases where no AI recommendation is provided, forcing the user to make independent judgments and maintaining their skill
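The first mitigation can be strengthened by sequencing: require the user to commit to their own judgment before the AI recommendation is revealed, then require an explicit accept/reject decision. A minimal sketch of such a flow, with illustrative class and method names:

```python
# Sketch of an "independent judgment first" flow: the user commits to
# their own estimate before seeing the AI recommendation, then must
# actively confirm or reject it. Names are illustrative.
class RecommendationFlow:
    def __init__(self, ai_recommendation: str):
        self._ai = ai_recommendation   # hidden until the user commits
        self.user_estimate = None
        self.confirmed = None

    def record_user_estimate(self, estimate: str) -> str:
        """Reveal the AI recommendation only after the user commits."""
        self.user_estimate = estimate
        return self._ai

    def confirm(self, accept: bool) -> None:
        """Require an active decision, never passive acceptance."""
        if self.user_estimate is None:
            raise RuntimeError("independent judgment required before confirming")
        self.confirmed = accept

flow = RecommendationFlow("Start antibiotic A, 500 mg")
ai_view = flow.record_user_estimate("Start antibiotic A, 250 mg")
flow.confirm(accept=False)   # the user disagrees, and says so explicitly
print(ai_view, flow.confirmed)
```

Forcing the user's estimate first creates a record of independent judgment, which also makes disagreement between user and AI visible for later accuracy tracking.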
The Dialogue Paradigm
Large language model interfaces (chatbots, AI assistants) introduce a fundamentally different interaction paradigm: conversation rather than command. The user expresses intent in natural language; the system interprets and responds. This is closer to human-human interaction than to traditional GUI interaction. The dialogue paradigm introduces usability challenges that traditional interface design principles do not fully address:
- Discoverability: in a GUI, available actions are visible (menus, buttons). In a dialogue interface, the user must guess what the system can do.
- Error recovery: in a GUI, undo reverses the last action. In a dialogue, a misunderstood request may produce a long, irrelevant response that the user must read to determine that it missed the point.
- Mental model formation: users struggle to form accurate mental models of what an LLM knows, what it can do, and when it is likely to be wrong.
- Consistency: the same question asked in different phrasing may produce substantially different answers, violating users' expectations of consistency.
AI and Accessibility
AI offers significant potential for accessibility:
- Real-time captioning for deaf and hard-of-hearing users
- Image description for blind and low-vision users
- Gaze-based interaction for users with motor impairments
- Simplified language generation for users with cognitive disabilities
- Personalised interfaces that adapt to individual user capabilities

However, AI accessibility features also raise concerns: AI-generated image descriptions may be inaccurate, AI captions may misrepresent speech, and personalised interfaces may inadvertently exclude users from features or information they need.
AI in Usability Research
AI is increasingly used as a research tool in usability studies:
- Large-scale behavioural analysis: AI can process interaction logs from millions of users to identify usability patterns that would be invisible in small-sample studies
- Automated prototype testing: AI agents that simulate user interactions with prototypes, identifying potential usability issues before human testing
- Synthetic users: AI models trained on human behaviour data that can predict how users would respond to design changes — a complement to (not replacement for) testing with real users
AI "synthetic users" can generate large volumes of simulated usability data quickly and cheaply. But they are trained on historical human behaviour, which means they reproduce the biases, capabilities, and limitations of the training population. Can synthetic users ever replace real user testing? Under what circumstances might they be sufficient, and when are they clearly inadequate?
Designing AI Interactions: Principles
The existing usability principles from this textbook apply to AI interfaces, but with specific adaptations (Nielsen, 1994; Amershi et al., 2019):
- Visibility of system status (Nielsen #1): show what the AI is doing (processing, searching, generating), not just a generic loading spinner
- User control (Nielsen #3): allow users to stop, redirect, and undo AI actions easily
- Consistency (Nielsen #4): where possible, make AI responses consistent and predictable; where not possible, communicate the variability
- Error prevention (Nielsen #5): provide guardrails that prevent the AI from producing harmful outputs (content filtering, output validation)
- Recognition rather than recall (Nielsen #6): provide suggestions, templates, and examples of what users can ask or do — do not require them to guess the AI's capabilities
- Flexibility (Nielsen #7): support both novice users (who need guidance) and expert users (who want direct control)
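To illustrate the first adaptation: instead of one generic spinner, the interface can map each stage of the AI pipeline to a specific status message. The stage names and wording below are illustrative assumptions:

```python
# Sketch of Nielsen #1 adapted for AI: report what the system is
# actually doing at each pipeline stage rather than showing a generic
# spinner. Stage names and messages are illustrative.
STATUS_MESSAGES = {
    "retrieving": "Searching your documents...",
    "generating": "Drafting a response...",
    "validating": "Checking the draft against safety filters...",
}

def status_for(stage: str) -> str:
    # Fall back to an honest generic message for unknown stages.
    return STATUS_MESSAGES.get(stage, "Working...")

for stage in ("retrieving", "generating", "validating"):
    print(status_for(stage))
```

Stage-specific status does double duty: it satisfies visibility of system status and also teaches users a rough mental model of how the AI works, supporting the model-formation problem discussed earlier.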
Key Takeaways
- AI is both a tool for usability (automated evaluation, generative design, analytics) and a new type of interface with its own usability challenges.
- AI-generated designs require human evaluation against usability principles; they are starting points, not finished products.
- The non-determinism of AI systems undermines traditional mental model formation; design should clearly distinguish deterministic and non-deterministic components.
- Explainability is a usability requirement for AI systems: explanations should match the user's decision-making needs, not the system's technical architecture.
- Trust should be calibrated, not maximised; automation bias is the primary over-trust risk.
- Existing usability heuristics apply to AI interfaces with domain-specific adaptations.
Further Reading
- Amershi, S., et al. (2019). Guidelines for human-AI interaction. Proceedings of CHI '19, Paper 3.
- Shneiderman, B. (2022). Human-Centered AI. Oxford University Press.
- Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253.
- Cai, C. J., et al. (2019). Human-centered tools for coping with imperfect algorithms during medical decision-making. Proceedings of CHI '19, Paper 4.
- Norman, D. A. (2023). Design for a Better World: Meaningful, Sustainable, Humanity Centered. MIT Press.