- Describe how AI is changing the design and evaluation of user interfaces
- Identify usability challenges specific to AI-powered systems
- Explain the role of explainability and trust in AI interface design
- Evaluate the potential and limitations of AI-assisted usability evaluation
- Apply existing usability principles to the design of AI interactions
Introduction
Artificial intelligence is transforming usability from two directions simultaneously (Shneiderman, 2022). First, AI is becoming a design tool — generating interfaces, predicting usability problems, and automating aspects of evaluation. Second, AI is becoming the interface itself — large language models, recommendation systems, and autonomous agents create interaction patterns that do not fit the traditional model of a user operating deterministic controls (Amershi et al., 2019). This chapter examines both directions: AI as a tool for usability and the usability of AI systems themselves.
AI as a Design Tool
Generative Design
Large language models and image generation systems can produce interface designs from text descriptions. A designer can describe a desired layout — "a dashboard with three cards showing patient vital signs, an alert panel on the right, and a navigation sidebar" — and receive a rendered mockup in seconds. The speed of generation enables rapid exploration of design alternatives. Where a designer might manually create 3–5 layout variations, AI can generate dozens. This supports the divergent phase of design — generating many options before converging on the best.
AI-generated designs are starting points, not finished products. They can accelerate exploration, but they cannot evaluate their own usability. A generated layout may look plausible while violating Fitts's Law (Fitts, 1954) (targets too small), cognitive load principles (Sweller, 1988) (too many elements competing for attention), or accessibility standards (Web Accessibility Initiative, 2018) (insufficient contrast). The designer's role shifts from generating designs to evaluating and refining AI-generated candidates using the principles in this textbook.
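Such an evaluation can be partially automated. The sketch below, under illustrative assumptions (a simplified element representation, pixel units, a hypothetical 44 px minimum target size, and an arbitrary audit threshold), flags generated-layout targets that are too small or hard to acquire under Fitts's Law:

```python
import math

# Hypothetical representation of interactive elements in an AI-generated
# layout. Sizes and cursor travel distances are in pixels; the minimum
# target size and difficulty threshold are illustrative, not normative.
MIN_TARGET_PX = 44

def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts's index of difficulty (Shannon formulation), in bits."""
    return math.log2(distance / width + 1)

def audit_targets(elements):
    """Flag elements that are too small or have a high acquisition cost."""
    issues = []
    for el in elements:
        if el["width"] < MIN_TARGET_PX or el["height"] < MIN_TARGET_PX:
            issues.append((el["id"], "target smaller than minimum size"))
        bits = index_of_difficulty(el["distance_px"], min(el["width"], el["height"]))
        if bits > 5:  # arbitrary audit threshold for this sketch
            issues.append((el["id"], f"high index of difficulty ({bits:.1f} bits)"))
    return issues

layout = [
    {"id": "save-btn", "width": 24, "height": 24, "distance_px": 900},
    {"id": "nav-home", "width": 120, "height": 48, "distance_px": 200},
]
print(audit_targets(layout))  # flags save-btn on both checks; nav-home passes
```

A check like this catches only what it encodes; the designer still judges whether a flagged element is actually a problem in context.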
Automated Usability Evaluation
AI systems can automate some aspects of usability evaluation:
- Automated accessibility checking: tools that scan interfaces for WCAG violations (missing alt text, insufficient contrast, missing form labels) — already mature and widely deployed.
- Predictive analytics: machine learning models trained on interaction data to predict where users will struggle — for instance, identifying form fields with high abandonment rates or navigation paths with frequent backtracking.
- Heuristic detection: AI systems that evaluate interfaces against design heuristics, flagging potential violations. These are less mature than accessibility checkers and prone to false positives, but improving.
- Gaze prediction: models that predict where users will look on a page (based on saliency maps and trained on eye-tracking data), providing an approximation of visual attention without requiring eye-tracking hardware.
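The most mature of these, accessibility checking, is straightforwardly mechanical. As a minimal example, the WCAG contrast ratio between a foreground and background colour can be computed directly from the standard's relative-luminance definition:

```python
def _linearise(channel: float) -> float:
    """sRGB channel (0-1) to linear light, per the WCAG definition."""
    return channel / 12.92 if channel <= 0.03928 else ((channel + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb) -> float:
    """Relative luminance of an 8-bit sRGB colour."""
    r, g, b = (_linearise(c / 255) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio between two colours (ranges from 1:1 to 21:1)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black on white is the maximum 21:1; light grey on white fails the
# WCAG AA threshold of 4.5:1 for normal-size text.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))      # 21.0
print(contrast_ratio((200, 200, 200), (255, 255, 255)) >= 4.5)   # False
```

Deployed tools (e.g. automated WCAG scanners) wrap checks like this in page crawling and DOM inspection, but the underlying rule is exactly this kind of deterministic computation — which is why this category is mature while heuristic detection is not.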
AI-Assisted User Research
AI can assist (but not replace) user research:
- Transcription and coding: automated transcription of usability test sessions with preliminary thematic coding
- Sentiment analysis: detecting frustration or satisfaction in user verbalisations during think-aloud testing
- Pattern detection: identifying common behaviour sequences across many users in large-scale analytics data
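To make the "assist, not replace" division of labour concrete: a tool can pre-screen a think-aloud transcript for candidate frustration moments, leaving interpretation to the analyst. The sketch below uses a hand-picked keyword list purely for illustration; real sentiment tools use trained models rather than keyword matching.

```python
# Minimal sketch of pre-screening a think-aloud transcript for moments
# that merit analyst review. The cue list is illustrative only; deployed
# sentiment-analysis tools use trained models, not keyword matching.
FRUSTRATION_CUES = {"confusing", "stuck", "why", "can't", "wrong", "lost"}

def flag_frustration(segments):
    """Return (timestamp, utterance) pairs containing a frustration cue."""
    flagged = []
    for ts, text in segments:
        words = {w.strip(".,?!").lower() for w in text.split()}
        if words & FRUSTRATION_CUES:
            flagged.append((ts, text))
    return flagged

transcript = [
    ("00:12", "Okay, I'll click the search icon."),
    ("00:41", "Hmm, this is confusing, I'm not sure where the filter went."),
    ("01:05", "Why can't I undo that?"),
]
print(flag_frustration(transcript))  # flags the 00:41 and 01:05 segments
```

The output is a shortlist for a human researcher, not a verdict — deciding whether "why can't I undo that?" reflects a usability problem or a passing remark still requires human judgment.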
If AI can generate designs and evaluate them against usability heuristics, does it follow that AI could design usable interfaces autonomously — without a human designer? What aspects of usability judgment require human expertise that current AI systems lack? Is the limitation primarily technical (AI will improve) or fundamental (usability judgment requires understanding human experience in a way that AI cannot)?
The Usability of AI Systems
The Unpredictability Problem
Traditional interfaces are deterministic: the same input always produces the same output. A button always performs the same action. A menu always contains the same items. Users build mental models based on this predictability, and predictability supports learning and trust. AI systems — particularly those based on large language models — are non-deterministic. The same prompt may produce different responses. A recommendation system changes its suggestions based on accumulated data. An autocomplete system offers different completions depending on context. This non-determinism undermines the formation of stable mental models.
When incorporating AI into an interface, make the deterministic components clearly distinguishable from the non-deterministic ones. Users should know which elements will behave consistently and which may change. This supports the formation of partial mental models: "The navigation always works the same way, but the recommendations change based on my behaviour."
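One lightweight way to implement this is to tag each component at design time and surface the non-deterministic ones to the user. A minimal sketch, with illustrative component names:

```python
from dataclasses import dataclass

# Sketch: tag each interface component so the UI can visibly distinguish
# stable controls from adaptive, non-deterministic ones.
@dataclass
class Component:
    name: str
    deterministic: bool

def variability_notice(components) -> str:
    """Describe which components may change between sessions."""
    adaptive = [c.name for c in components if not c.deterministic]
    if not adaptive:
        return "All controls behave consistently."
    return "May change based on your activity: " + ", ".join(adaptive)

ui = [
    Component("navigation sidebar", deterministic=True),
    Component("search box", deterministic=True),
    Component("recommended articles", deterministic=False),
]
print(variability_notice(ui))
# May change based on your activity: recommended articles
```

In a real interface the tag would drive a visual treatment (a badge, a distinct region) rather than a text notice, but the design decision is the same: the distinction must exist in the design model before it can be communicated.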
The Explanation Problem
Many AI systems operate as black boxes: they produce outputs without explaining how or why Cai, 2019. A clinical decision support system that recommends a drug without explaining its reasoning provides no basis for the clinician to evaluate the recommendation — to decide whether to trust it, modify it, or override it. Explainability in AI interfaces is not a technical add-on; it is a usability requirement. The usability question is not "Can the system explain itself?" but "Does the explanation help the user make a better decision?"
Levels of Explanation
Not all users need the same level of explanation. A radiologist reviewing an AI's chest X-ray classification may want to see which regions of the image the model focused on (a visual saliency map). A patient told by their doctor that "the AI recommends further testing" needs a different kind of explanation — one framed in terms of risk and benefit, not model attention weights.
A clinical AI system that predicts patient deterioration might provide explanations at three levels:
- Summary: "This patient has a high risk of deterioration in the next 12 hours."
- Contributing factors: "Key factors: declining blood pressure trend, elevated heart rate, rising lactate."
- Technical detail: "The model's prediction is based on a gradient-boosted decision tree trained on 50,000 ICU admissions, with an AUROC of 0.87."

Most clinicians need level 1 (to act) and level 2 (to evaluate). Level 3 is relevant for clinical informaticists evaluating the system, not for bedside decision-making.
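These levels can be modelled as a simple structure that serves each audience only the layers it needs. The role names and field names below are illustrative assumptions, not part of any particular system:

```python
# Sketch: the three explanation levels from the example above, served
# selectively by user role. Roles and field names are illustrative.
EXPLANATION = {
    "summary": "High risk of deterioration in the next 12 hours.",
    "factors": ["declining blood pressure trend", "elevated heart rate",
                "rising lactate"],
    "technical": "Gradient-boosted trees, 50,000 ICU admissions, AUROC 0.87.",
}

LEVELS_BY_ROLE = {
    "nurse": ("summary", "factors"),
    "physician": ("summary", "factors"),
    "informaticist": ("summary", "factors", "technical"),
}

def explain(role: str) -> dict:
    """Return only the explanation levels relevant to the user's role."""
    keys = LEVELS_BY_ROLE.get(role, ("summary",))
    return {k: EXPLANATION[k] for k in keys}

print(list(explain("physician")))       # ['summary', 'factors']
print("technical" in explain("informaticist"))  # True
```

The point of the structure is the restraint it enforces: the bedside view omits level 3 by design, rather than burying the summary under technical detail.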
Trust Calibration
The goal of AI interface design is not to maximise user trust in the AI but to calibrate trust appropriately. Users should trust the AI when it is likely to be correct and question it when it might be wrong. Under-trust (ignoring accurate recommendations) and over-trust (accepting inaccurate recommendations without scrutiny) are both harmful.
Trust in AI should be calibrated, not maximised. Interface design can support calibrated trust by: (1) communicating uncertainty (showing confidence scores or probability ranges), (2) providing evidence (showing the data or reasoning behind the recommendation), (3) highlighting disagreement (flagging cases where the AI's recommendation conflicts with clinical guidelines or the user's input), and (4) tracking accuracy (showing the system's historical accuracy so users can develop informed expectations).
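The four supports can be combined in a single recommendation display. The sketch below assumes a hypothetical drug-dosing system; the field names, the 20% disagreement threshold, and the accuracy figure are all illustrative:

```python
# Sketch combining the four calibration supports: uncertainty, evidence,
# disagreement flagging, and historical accuracy. All field names and
# thresholds are illustrative assumptions.
def present_recommendation(rec, guideline_dose, historical_accuracy) -> str:
    lines = [
        f"Recommended dose: {rec['dose']} mg "
        f"(model confidence {rec['confidence']:.0%})",          # uncertainty
        "Evidence: " + "; ".join(rec["evidence"]),               # evidence
        f"This system agreed with the final clinical decision in "
        f"{historical_accuracy:.0%} of past cases.",             # track record
    ]
    if abs(rec["dose"] - guideline_dose) / guideline_dose > 0.2:
        lines.append("Warning: differs from the guideline dose by more "
                     "than 20% - review before accepting.")      # disagreement
    return "\n".join(lines)

rec = {"dose": 75, "confidence": 0.82,
       "evidence": ["weight 60 kg", "eGFR 45", "prior dose tolerated"]}
print(present_recommendation(rec, guideline_dose=50, historical_accuracy=0.91))
```

Note that the disagreement warning fires here (75 mg is 50% above the guideline), which is exactly the situation where the display should slow the user down rather than smooth acceptance along.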
Automation Bias
Automation bias is the tendency to accept automated recommendations without sufficient scrutiny, even when the recommendations are incorrect (Parasuraman & Riley, 1997). This is the over-trust failure. It has been extensively documented in aviation (pilots accepting incorrect autopilot actions) (Sarter & Woods, 1995) and healthcare (clinicians accepting incorrect drug dose recommendations from CPOE systems) (Koppel et al., 2005). Interface design can mitigate automation bias by:
- Requiring active confirmation rather than passive acceptance of AI recommendations
- Presenting the AI's recommendation alongside the relevant data, so the user evaluates both
- Occasionally presenting cases where no AI recommendation is provided, forcing the user to make independent judgments and maintaining their skill
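The first mitigation can be strengthened by sequencing: require the user to commit to their own judgment before the AI recommendation is revealed, then require an explicit accept/reject decision. A minimal sketch of such a flow, with illustrative class and method names:

```python
# Sketch of an "independent judgment first" flow: the user commits to
# their own estimate before seeing the AI recommendation, then must
# actively confirm or reject it. Names are illustrative.
class RecommendationFlow:
    def __init__(self, ai_recommendation: str):
        self._ai = ai_recommendation   # hidden until the user commits
        self.user_estimate = None
        self.confirmed = None

    def record_user_estimate(self, estimate: str) -> str:
        """Reveal the AI recommendation only after the user commits."""
        self.user_estimate = estimate
        return self._ai

    def confirm(self, accept: bool) -> None:
        """Require an active decision, never passive acceptance."""
        if self.user_estimate is None:
            raise RuntimeError("independent judgment required before confirming")
        self.confirmed = accept

flow = RecommendationFlow("Start antibiotic A, 500 mg")
ai_view = flow.record_user_estimate("Start antibiotic A, 250 mg")
flow.confirm(accept=False)   # the user disagrees, and says so explicitly
print(ai_view, flow.confirmed)
```

Forcing the user's estimate first creates a record of independent judgment, which also makes disagreement between user and AI visible for later accuracy tracking.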
The Dialogue Paradigm
Large language model interfaces (chatbots, AI assistants) introduce a fundamentally different interaction paradigm: conversation rather than command. The user expresses intent in natural language; the system interprets and responds. This is closer to human-human interaction than to traditional GUI interaction. The dialogue paradigm introduces usability challenges that traditional interface design principles do not fully address:
- Discoverability: in a GUI, available actions are visible (menus, buttons). In a dialogue interface, the user must guess what the system can do.
- Error recovery: in a GUI, undo reverses the last action. In a dialogue, a misunderstood request may produce a long, irrelevant response that the user must read to determine that it missed the point.
- Mental model formation: users struggle to form accurate mental models of what an LLM knows, what it can do, and when it is likely to be wrong.
- Consistency: the same question asked in different phrasing may produce substantially different answers, violating users' expectations of consistency.
AI and Accessibility
AI offers significant potential for accessibility:
- Real-time captioning for deaf and hard-of-hearing users
- Image description for blind and low-vision users
- Gaze-based interaction for users with motor impairments
- Simplified language generation for users with cognitive disabilities
- Personalised interfaces that adapt to individual user capabilities

However, AI accessibility features also raise concerns: AI-generated image descriptions may be inaccurate, AI captions may misrepresent speech, and personalised interfaces may inadvertently exclude users from features or information they need.
AI in Usability Research
AI is increasingly used as a research tool in usability studies:
- Large-scale behavioural analysis: AI can process interaction logs from millions of users to identify usability patterns that would be invisible in small-sample studies
- Automated prototype testing: AI agents that simulate user interactions with prototypes, identifying potential usability issues before human testing
- Synthetic users: AI models trained on human behaviour data that can predict how users would respond to design changes — a complement to (not replacement for) testing with real users
AI "synthetic users" can generate large volumes of simulated usability data quickly and cheaply. But they are trained on historical human behaviour, which means they reproduce the biases, capabilities, and limitations of the training population. Can synthetic users ever replace real user testing? Under what circumstances might they be sufficient, and when are they clearly inadequate?
Designing AI Interactions: Principles
The existing usability principles from this textbook apply to AI interfaces, but with specific adaptations (Nielsen, 1994; Amershi et al., 2019):
- Visibility of system status (Nielsen #1): show what the AI is doing (processing, searching, generating), not just a generic loading spinner
- User control (Nielsen #3): allow users to stop, redirect, and undo AI actions easily
- Consistency (Nielsen #4): where possible, make AI responses consistent and predictable; where not possible, communicate the variability
- Error prevention (Nielsen #5): provide guardrails that prevent the AI from producing harmful outputs (content filtering, output validation)
- Recognition rather than recall (Nielsen #6): provide suggestions, templates, and examples of what users can ask or do — do not require them to guess the AI's capabilities
- Flexibility (Nielsen #7): support both novice users (who need guidance) and expert users (who want direct control)
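To illustrate the first adaptation: instead of one generic spinner, the interface can map each stage of the AI pipeline to a specific status message. The stage names and wording below are illustrative assumptions:

```python
# Sketch of Nielsen #1 adapted for AI: report what the system is
# actually doing at each pipeline stage rather than showing a generic
# spinner. Stage names and messages are illustrative.
STATUS_MESSAGES = {
    "retrieving": "Searching your documents...",
    "generating": "Drafting a response...",
    "validating": "Checking the draft against safety filters...",
}

def status_for(stage: str) -> str:
    # Fall back to an honest generic message for unknown stages.
    return STATUS_MESSAGES.get(stage, "Working...")

for stage in ("retrieving", "generating", "validating"):
    print(status_for(stage))
```

Stage-specific status does double duty: it satisfies visibility of system status and also teaches users a rough mental model of how the AI works, supporting the model-formation problem discussed earlier.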
Key Takeaways
- AI is both a tool for usability (automated evaluation, generative design, analytics) and a new type of interface with its own usability challenges.
- AI-generated designs require human evaluation against usability principles; they are starting points, not finished products.
- The non-determinism of AI systems undermines traditional mental model formation; design should clearly distinguish deterministic and non-deterministic components.
- Explainability is a usability requirement for AI systems: explanations should match the user's decision-making needs, not the system's technical architecture.
- Trust should be calibrated, not maximised; automation bias is the primary over-trust risk.
- Existing usability heuristics apply to AI interfaces with domain-specific adaptations.
Further Reading
- Amershi, S., et al. (2019). Guidelines for human-AI interaction. Proceedings of CHI '19, Paper 3.
- Shneiderman, B. (2022). Human-Centered AI. Oxford University Press.
- Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253.
- Cai, C. J., et al. (2019). Human-centered tools for coping with imperfect algorithms during medical decision-making. Proceedings of CHI '19, Paper 4.
- Norman, D. A. (2023). Design for a Better World: Meaningful, Sustainable, Humanity Centered. MIT Press.