Fail-Safe Design | Glossary | Textbook of Usability

Fail-safe design ensures that when a component or system fails, it defaults to a safe state rather than an unsafe one. A fail-safe traffic light falls back to flashing red (all-way stop), not green. A fail-safe nuclear reactor shuts down when it loses power rather than continuing to run uncontrolled.
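The traffic-light case can be sketched in a few lines of Python. This is an illustrative sketch only: the names `Signal`, `SAFE_STATE`, `next_signal`, and `read_controller` are hypothetical and do not come from any real signalling system. The point is that every failure path, including an unrecognised value, leads to the safe state.

```python
from enum import Enum


class Signal(Enum):
    GREEN = "green"
    RED = "red"
    FLASHING_RED = "flashing_red"  # all-way stop: the safe state


# Hypothetical constant naming the state to fall back to on any failure.
SAFE_STATE = Signal.FLASHING_RED


def next_signal(read_controller):
    """Return the controller's signal, or the safe state if anything goes wrong.

    `read_controller` is an assumed callable that returns a Signal.
    """
    try:
        signal = read_controller()
    except Exception:
        # Any failure in the controller maps to the safe state.
        return SAFE_STATE
    # An unrecognised value is treated as a failure too.
    return signal if isinstance(signal, Signal) else SAFE_STATE
```

The design choice worth noting is that the safe state is the fallback for anything unexpected, so a new kind of failure does not need its own handler to be handled safely.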

The principle is fundamental to safety-critical engineering and has been applied across aviation, nuclear power, medical devices, and (increasingly) software systems.

Examples in software:

Session timeouts: idle sessions expire rather than remaining open indefinitely
Default permission levels: restrictive defaults; privileges must be explicitly granted
Destructive action confirmation: "Are you sure?" dialogs for irreversible operations
Auto-save: work is preserved automatically rather than lost on crash
Graceful degradation: features fail in ways that preserve core functionality
Circuit breakers: callers stop sending requests to a failing downstream service and fail fast rather than hanging or spreading the failure
Rollback mechanisms: deployments can be undone if they fail
Atomic transactions: operations either complete fully or roll back, so data is never left half-written
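As one illustration from the list above, a circuit breaker might look roughly like the following. This is a minimal sketch under stated assumptions, not a production implementation: `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical names, and the clock is injected so the behaviour can be checked without waiting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast: do not touch the failing service at all.
                raise CircuitOpenError("downstream service unavailable")
            # Cooling-off period elapsed: allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers get an immediate `CircuitOpenError` and can fall back to cached data or a reduced feature set, which is the safe state for this component.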

Fail-safe design is closely related to defence in depth (Reason's Swiss Cheese Model) and error tolerance. It accepts that failures will occur and ensures they produce safe rather than catastrophic outcomes.

The principle sometimes conflicts with other goals (convenience, efficiency, aesthetic simplicity). An auto-save that runs on every keystroke may slow the interface; a restrictive default may require users to grant explicit permissions for routine tasks. Good fail-safe design minimises these friction costs while preserving the safety benefit.
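One common way to reduce the auto-save cost is to throttle it: save at most once per interval instead of on every keystroke. The sketch below assumes a hypothetical `ThrottledAutoSave` class and a caller-supplied `save` function; the two-second default is an arbitrary illustrative value.

```python
import time


class ThrottledAutoSave:
    def __init__(self, save, min_interval=2.0, clock=time.monotonic):
        self.save = save                  # assumed callable that persists text
        self.min_interval = min_interval  # minimum seconds between saves
        self.clock = clock
        self.last_saved_at = None
        self.pending = None               # latest unsaved text, if any

    def on_edit(self, text):
        """Record an edit; write it out only if enough time has passed."""
        self.pending = text
        now = self.clock()
        if self.last_saved_at is None or now - self.last_saved_at >= self.min_interval:
            self.flush()

    def flush(self):
        """Write any pending text immediately (for example on blur or close)."""
        if self.pending is not None:
            self.save(self.pending)
            self.pending = None
            self.last_saved_at = self.clock()
```

The trade-off is explicit here: edits made since the last save can still be lost in a crash, so `min_interval` bounds the time between saves while typing continues, and `flush` should be called whenever the user pauses or leaves the document.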

Related terms: Reason's Swiss Cheese Model, Human Error

Discussed in:

Chapter 10: Design Laws from Aviation and Engineering (Error Tolerance and Fail-Safe Design)
