Overview of Fault-Tolerant Architectures
Dynamic redundancy: standby module that is continuously active (hot standby)
Static redundancy: multiple redundant modules with majority voting and fault masking (m-out-of-n systems)
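A minimal sketch of the majority voting used in static redundancy, assuming module outputs can be compared for exact equality. The function name `vote_m_out_of_n` and its parameters are illustrative, not taken from the slides.

```python
from collections import Counter

def vote_m_out_of_n(outputs, m):
    """Return the value produced by at least m modules, or None if none qualifies."""
    if not outputs:
        return None
    # most_common(1) yields the single most frequent output and its count
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= m else None

# 2-out-of-3 voting: the single deviating module is masked
print(vote_m_out_of_n([42, 42, 17], m=2))  # 42
# No two modules agree, so the fault cannot be masked
print(vote_m_out_of_n([1, 2, 3], m=2))     # None
```

With analog signals, exact equality rarely holds; a real voter would more likely compare within a tolerance or take the median.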
It can be employed as a fail-silent node capable of detecting any single error (100% coverage), permanent or transient, whether it occurs in the CPU, the memory, or the communication subsystem.
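Fail-silent behavior can be illustrated with a small sketch. It assumes error detection by duplicating the computation on two channels and comparing the results, which is one common approach and not necessarily the mechanism meant here. The class `FailSilentNode` and its channel functions are hypothetical.

```python
class FailSilentNode:
    """Emits an output only while both channels agree; otherwise it stays silent."""

    def __init__(self, channel_a, channel_b):
        self.channel_a = channel_a
        self.channel_b = channel_b
        self.silent = False

    def step(self, x):
        if self.silent:
            return None          # a silenced node never emits output again
        a = self.channel_a(x)
        b = self.channel_b(x)
        if a != b:
            self.silent = True   # mismatch detected: go silent instead of emitting
            return None
        return a

# Channel B is given an injected fault for inputs >= 3 (illustration only)
node = FailSilentNode(lambda x: 2 * x, lambda x: 2 * x if x < 3 else 0)
print(node.step(1))  # 2
print(node.step(5))  # None
print(node.step(1))  # None
```

The comparison only detects errors that make the two channels disagree, so this sketch does not by itself achieve the full coverage claimed above.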
Fault-tolerant Sensors
HW Sensor Redundancy
(a) Triplex system with static redundancy and hot standby, (b) Duplex system with dynamic redundancy, and (c) Duplex system with dynamic redundancy, hot standby, and plausibility checks.
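Configuration (c) can be sketched as follows, assuming the plausibility check is a simple range check on each reading. The names `plausible` and `select_sensor` and the range limits are hypothetical.

```python
def plausible(value, low, high):
    """Range check: a reading is plausible if it is present and within [low, high]."""
    return value is not None and low <= value <= high

def select_sensor(primary, standby, low, high):
    """Use the primary reading if plausible, else switch to the hot standby."""
    if plausible(primary, low, high):
        return primary, "primary"
    if plausible(standby, low, high):
        return standby, "standby"
    return None, "failed"

# Both sensors are read every cycle (hot standby), so switching needs no start-up
print(select_sensor(21.5, 21.7, -40.0, 125.0))   # (21.5, 'primary')
print(select_sensor(999.0, 21.7, -40.0, 125.0))  # (21.7, 'standby')
```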
Analytical Sensor Redundancy
Sensor fault tolerance for one output signal y1 through analytical redundancy by process models: (a) two measured outputs, no measured input, and (b) one measured input and one measured output.
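Case (b) can be illustrated with a short sketch: a process model driven by the measured input predicts y1, and the residual between measurement and prediction decides which value to use. The first-order model, its coefficients `a` and `b`, and the threshold are assumptions chosen for illustration.

```python
def model_step(y_est, u, a=0.9, b=0.1):
    """One step of a hypothetical first-order model: y[k+1] = a*y[k] + b*u[k]."""
    return a * y_est + b * u

def select_output(y_measured, y_model, threshold=0.5):
    """Return (value to use, fault flag) based on the residual."""
    residual = abs(y_measured - y_model)
    if residual > threshold:
        return y_model, True     # sensor judged faulty: use the model estimate
    return y_measured, False     # sensor judged healthy: use the measurement

y_model = model_step(y_est=1.0, u=1.0)
print(select_output(1.05, y_model))  # small residual, measurement is used
print(select_output(5.0, y_model))   # large residual, model estimate is used
```

The threshold trades off missed faults against false alarms caused by model error and measurement noise.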
Combined Approach
Fault-tolerant Actuator
Common Actuator
Approaches to SW Fault-tolerance
Provides uninterrupted operation in the presence of program faults through multiple implementations of a given function
Two approaches:
N-version programming
Analogous to fault masking (static redundancy)
Recovery blocks
Analogous to dynamic redundancy
Error detection mechanism
Backup routines for continued service
Checkpoints are created before a version executes. If that version fails, the checkpoint is used to recover the state, providing a valid operational starting point for the next version.
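A minimal sketch of a recovery block with checkpointing, assuming the state is a dictionary and the error detection mechanism is an acceptance test on each version's result. The names `recovery_block`, `primary`, and `backup` are hypothetical.

```python
import copy

def recovery_block(state, versions, acceptance_test):
    """Try each version in order; roll back to the checkpoint after a failure."""
    checkpoint = copy.deepcopy(state)        # checkpoint before any version runs
    for version in versions:
        try:
            result = version(state)
            if acceptance_test(result):
                return result                # result accepted: service continues
        except Exception:
            pass                             # an exception counts as a failed version
        # Restore the checkpoint so the next version starts from a valid state
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all versions failed the acceptance test")

def primary(state):
    state["count"] += 100   # injected fault: corrupts the state
    return -1               # and returns an unacceptable result

def backup(state):
    state["count"] += 1
    return state["count"]

state = {"count": 0}
result = recovery_block(state, [primary, backup], lambda r: r >= 0)
print(result, state)  # 1 {'count': 1}
```

The corruption introduced by the primary does not reach the backup, because the state is rolled back before the backup runs.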
A Detailed Model of RB
N-Version Programming
A design-diverse technique, defined as the independent generation of N (N ≥ 2) functionally equivalent programs from the same initial specification. Basic elements include the N software versions, a decision mechanism, and a supervisory program (executive).
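The three basic elements can be sketched as follows, assuming the decision mechanism is a strict majority vote over exactly comparable results. The executive `nvp_execute` and the three version functions are hypothetical, and one version carries an injected fault.

```python
from collections import Counter

def nvp_execute(versions, x):
    """Executive: run every version on the same input, then apply the decision mechanism."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            continue                          # a crashed version contributes no vote
    if not results:
        raise RuntimeError("no version produced a result")
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(versions):            # require a strict majority of all N versions
        raise RuntimeError("no majority among the versions")
    return value

# Three versions of "square x", written differently; the third is faulty
def version_1(x):
    return x * x

def version_2(x):
    return x ** 2

def version_3(x):
    return x + x                              # injected fault: wrong for most inputs

print(nvp_execute([version_1, version_2, version_3], 3))  # 9
```

Unlike a recovery block, all versions run on every call and no rollback is needed; the cost is N executions per result.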
Summary
Duplex and TMR (triple modular redundancy) configurations form the basis for most fault-tolerant mechanisms.
Similar configurations can be adopted for both HW and SW fault tolerance.
Choices are mainly driven by fault detection capabilities, timeliness, and cost.