Fail-safe design ensures that when a component or system fails, it falls back to a safe state rather than an unsafe one. A fail-safe traffic light defaults to flashing red (all-way stop), not green. A fail-safe nuclear reactor shuts down when it loses power, rather than continuing to run uncontrolled.
The principle is fundamental to safety-critical engineering and has been applied across aviation, nuclear power, medical devices, and — increasingly — software systems.
Examples in software:
- Session timeouts — idle sessions expire rather than remaining open indefinitely
- Default permission levels — restrictive defaults; privileges must be explicitly granted
- Destructive action confirmation — "Are you sure?" dialogs for irreversible operations
- Auto-save — work is preserved automatically rather than lost on crash
- Graceful degradation — features fail in ways that preserve core functionality
- Circuit breakers — calls to a failing downstream service are cut off and fail fast, rather than piling up and spreading the failure
- Rollback mechanisms — deployments can be undone if they fail
- Atomic transactions — a transaction either completes fully or rolls back, so data is never left half-written
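To make one of these examples concrete, the following is a minimal sketch of a circuit breaker in Python. It is an illustration under stated assumptions, not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are hypothetical, and the thresholds are arbitrary. After a set number of consecutive failures the breaker refuses further calls for a cooldown period, so the safe state is a fast, explicit error rather than more load on a failing dependency.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call instead of trying the dependency."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cooldown in seconds
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Safe state: fail fast rather than pile load on a failing dependency.
                raise CircuitOpenError("downstream call skipped")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each downstream request in `breaker.call(...)` and treat `CircuitOpenError` as a signal to show a degraded view or a cached result. This sketch is single-threaded; a shared breaker would need locking.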
Fail-safe design is closely related to defence in depth (Reason's Swiss Cheese Model) and error tolerance. It accepts that failures will occur and ensures they produce safe rather than catastrophic outcomes.
The principle sometimes conflicts with other goals (convenience, efficiency, aesthetic simplicity). An auto-save that runs on every keystroke may slow the interface; a restrictive default may require users to explicitly grant permissions for routine tasks. Good fail-safe design minimises these friction costs while preserving the safety benefit.
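One common way to reduce the auto-save cost described above is to throttle writes so that edits are saved at most once per interval. The sketch below assumes a hypothetical `ThrottledAutoSaver` class and a caller-supplied `save` function; the two-second interval is an arbitrary choice for illustration.

```python
import time


class ThrottledAutoSaver:
    """Hypothetical helper: saves at most once per `min_interval` seconds."""

    def __init__(self, save, min_interval=2.0, clock=time.monotonic):
        self.save = save                  # callable that persists the text
        self.min_interval = min_interval  # minimum seconds between writes
        self.clock = clock                # injectable so tests can control time
        self.last_saved_at = None
        self.pending = None               # latest unsaved text, if any

    def on_change(self, text):
        """Call on every edit; writes only when the interval has elapsed."""
        self.pending = text
        now = self.clock()
        if self.last_saved_at is None or now - self.last_saved_at >= self.min_interval:
            self.flush()

    def flush(self):
        """Persist pending text immediately, e.g. on blur, on close, or from a timer."""
        if self.pending is not None:
            self.save(self.pending)
            self.pending = None
            self.last_saved_at = self.clock()
```

The trade-off is explicit: edits made since the last write can still be lost in a crash, so the caller should also invoke `flush()` from a periodic timer and when the document loses focus or closes. A shorter interval lowers the possible loss at the cost of more frequent writes.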
Related terms: Reason's Swiss Cheese Model, Human Error
Discussed in:
- Chapter 10: Design Laws from Aviation and Engineering — Error Tolerance and Fail-Safe Design
Also defined in: Textbook of Usability