Overview of Fault-Tolerant Architectures

There are several approaches to achieving fault tolerance in electronic hardware and software systems. For hardware, static redundancy (e.g., triple modular redundancy) masks faults by majority voting among redundant components, while dynamic redundancy (e.g., duplex configurations with a standby module) detects faults and switches over to the standby. Lock-step and loosely-synchronized dual-processor architectures, as well as TMR, provide redundancy at the multi-processor level. For software, N-version programming and recovery blocks provide redundancy through independently developed implementations whose results are checked by voting or by acceptance tests, respectively. Fault-tolerant architectures typically combine redundant hardware and software techniques with error detection so that the system keeps operating despite failures.

Uploaded by himanshu_agra
Copyright © Attribution Non-Commercial (BY-NC)

Fault-tolerance Schemes for Electronic HW

Static redundancy: multiple redundant modules with majority voting and fault masking (m-out-of-n systems)
Dynamic redundancy (hot standby): standby module that is continuously active
Dynamic redundancy (cold standby): standby module that is inactive
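For the static m-out-of-n configuration, system reliability follows from the binomial distribution: R = Σ_{k=m}^{n} C(n,k) r^k (1 − r)^{n−k}. The short Python sketch below evaluates it, assuming identical modules with independent failures, a per-module reliability r, and a voter that never fails; the function name is ours, not from the slides.

```python
from math import comb


def m_out_of_n_reliability(m: int, n: int, r: float) -> float:
    """Probability that at least m of n modules are working.

    Assumes identical modules with independent failures, each working with
    probability r, and a voter that never fails.
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability")
    # Sum the binomial probabilities of exactly k working modules, k = m..n.
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(m, n + 1))


# TMR is the 2-out-of-3 case of static redundancy.
print(round(m_out_of_n_reliability(2, 3, 0.9), 3))  # 0.972
```

With r = 0.9 the 2-out-of-3 (TMR) case gives 0.972. Because 3r² − 2r³ > r only when 0.5 < r < 1, TMR improves on a single module only while each module is more likely to work than to fail.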

Fault-Tolerant Multi-Processor Architectures


Lock-Step Dual Processor Architecture
Two processors (master and checker) execute the same code in strict synchronization.
Master: has access to the system memory and drives all system outputs.
Checker: continuously executes the instructions appearing on the bus (i.e., those fetched by the master processor).
Monitor: a comparator circuit at the master's and checker's bus interfaces that checks the consistency of their data, address, and control lines.

It can be employed as a fail-silent node that detects any single error (permanent or transient) with 100% coverage, whether it occurs in the CPU, the memory, or the communication sub-system.
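A behavioural sketch of the monitor in Python: in each cycle the comparator receives the address, data, and control values seen at both bus interfaces; only the master's values are forwarded, and the first mismatch latches the node into a silent state. The BusCycle and LockStepMonitor names are hypothetical, and a real comparator is a hardware circuit rather than software.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class BusCycle:
    """Hypothetical snapshot of the lines one processor drives in one bus cycle."""
    address: int
    data: int
    control: int


class LockStepMonitor:
    """Comparator at the master and checker bus interfaces (behavioural model)."""

    def __init__(self) -> None:
        self.silent = False  # latched once a mismatch has been seen

    def step(self, master: BusCycle, checker: BusCycle) -> Optional[BusCycle]:
        if self.silent:
            return None  # fail-silent: no further outputs after an error
        if master != checker:  # any differing address, data or control line
            self.silent = True
            return None
        return master  # only the master drives the system outputs


def run(cycles: Sequence[Tuple[BusCycle, BusCycle]]) -> List[Optional[BusCycle]]:
    monitor = LockStepMonitor()
    return [monitor.step(master, checker) for master, checker in cycles]


ok = BusCycle(address=0x1000, data=42, control=0b01)
flipped = BusCycle(address=0x1000, data=43, control=0b01)  # single bit flip in data
outputs = run([(ok, ok), (ok, flipped), (ok, ok)])
# outputs == [ok, None, None]: the node stays silent after the mismatch
```

Latching the silent state, instead of resuming once the buses agree again, is what makes the node fail-silent: after an error it produces no output rather than possibly wrong output.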

Loosely-Synchronized Dual Processor Architecture:
Two CPUs run independently, each with access to its own memory subsystem.
A real-time operating system running on both CPUs provides:
inter-processor communication;
synchronization;
error detection (e.g., by means of cross-checks), correction, and containment (e.g., memory protection).
Each processor adds its own signature to the outputs of critical tasks, and the receiver checks for both signatures before accepting the data.
Executing a function on both CPUs guarantees detection (100% coverage) of any error occurring in one of the CPUs, buses, or memories.
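The double-signature check can be sketched as follows. Purely for illustration, we assume each CPU's signature is an HMAC-SHA256 over its own result with a per-CPU key (KEY_CPU_A and KEY_CPU_B are hypothetical) and that the receiver can verify both; the slides do not specify the signature scheme. If the two CPUs compute different results, CPU B's signature does not match the transmitted data and the message is rejected.

```python
import hashlib
import hmac
from typing import Tuple

# Hypothetical per-CPU keys; the actual signature scheme is not specified.
KEY_CPU_A = b"cpu-a-key"
KEY_CPU_B = b"cpu-b-key"

Message = Tuple[bytes, bytes, bytes]  # (data, signature of CPU A, signature of CPU B)


def sign(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def build_message(result_a: bytes, result_b: bytes) -> Message:
    """CPU A transmits its result; each CPU appends a signature over its OWN result."""
    return result_a, sign(KEY_CPU_A, result_a), sign(KEY_CPU_B, result_b)


def receiver_accepts(message: Message) -> bool:
    """Accept the data only if both signatures are valid for it."""
    data, sig_a, sig_b = message
    ok_a = hmac.compare_digest(sig_a, sign(KEY_CPU_A, data))
    ok_b = hmac.compare_digest(sig_b, sign(KEY_CPU_B, data))
    return ok_a and ok_b


print(receiver_accepts(build_message(b"torque=12", b"torque=12")))  # True
print(receiver_accepts(build_message(b"torque=12", b"torque=13")))  # False
```

The same check also rejects data corrupted on the bus after both CPUs signed it, since neither signature then matches.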

Triple Modular Redundant (TMR) Architecture:
Three identical CPUs execute the same code in lock-step.
Majority voter: a majority vote on the outputs masks any single CPU fault.
Faults in the memory and communication sub-systems can be masked using ECC (error-correcting code) techniques.
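A word-level majority voter for the TMR case, sketched in Python: each output bit takes the value held by at least two of the three CPUs, which masks a single faulty CPU. Reporting which CPU disagreed is an extra we add for illustration; the function name tmr_vote is hypothetical, and the ECC protection of memory and communication is not modelled.

```python
from typing import Optional, Tuple


def tmr_vote(a: int, b: int, c: int) -> Tuple[int, Optional[int]]:
    """Bitwise 2-out-of-3 majority vote over three CPU output words.

    Returns the voted word and the index (0, 1 or 2) of the single CPU that
    disagrees with it, or None if all agree or more than one CPU disagrees.
    """
    # A bit is 1 in the result exactly when at least two inputs have it set.
    voted = (a & b) | (a & c) | (b & c)
    dissenters = [i for i, x in enumerate((a, b, c)) if x != voted]
    faulty = dissenters[0] if len(dissenters) == 1 else None
    return voted, faulty


print(tmr_vote(0b1010, 0b1010, 0b0010))  # (10, 2): CPU 2 is outvoted
```

Voting bit by bit means the voted word can be correct even when two CPUs are wrong in different bit positions; the faulty-CPU index is then reported as None because no single CPU can be blamed.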

General Scheme of Process-Model-based and Signal-based Fault Detection

Fault-tolerant Sensors
HW Sensor Redundancy

(a) Triplex system with static redundancy and hot standby, (b) Duplex system with dynamic redundancy, and (c) Duplex system with dynamic redundancy, hot standby, and plausibility checks.
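Two of these schemes can be sketched in Python. For the triplex system, selecting the median of three readings masks one faulty sensor. For the duplex system with plausibility checks, the primary sensor is used unless its reading fails a check, in which case the hot-standby sensor takes over. The particular checks here (a range limit and a maximum change per sample) and all function names are assumptions for illustration.

```python
from statistics import median
from typing import Optional, Sequence


def triplex_select(readings: Sequence[float]) -> float:
    """Static redundancy: the median of three readings masks one faulty sensor."""
    if len(readings) != 3:
        raise ValueError("triplex selection needs exactly three readings")
    return median(readings)


def plausible(value: float, previous: float,
              lo: float, hi: float, max_step: float) -> bool:
    """Hypothetical plausibility check: range limit plus maximum change per sample."""
    return lo <= value <= hi and abs(value - previous) <= max_step


def duplex_select(primary: float, standby: float, previous: float,
                  lo: float, hi: float, max_step: float) -> Optional[float]:
    """Dynamic redundancy, hot standby: switch to the standby sensor when the
    primary fails the plausibility check; None if neither reading is plausible."""
    if plausible(primary, previous, lo, hi, max_step):
        return primary
    if plausible(standby, previous, lo, hi, max_step):
        return standby
    return None


print(triplex_select([20.1, 20.3, 85.0]))                   # 20.3
print(duplex_select(250.0, 20.2, 20.0, -40.0, 150.0, 5.0))  # 20.2
```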

Analytical Sensor Redundancy

Sensor fault tolerance for one output signal y1 through analytical redundancy using process models: (a) two measured outputs, no measured input; (b) one measured input and one measured output.
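Case (b) can be sketched as follows: a process model driven by the measured input u predicts y1, the residual between the measured and predicted y1 is compared with a threshold, and when the threshold is exceeded the sensor is flagged and the model estimate is used instead. The first-order model y[k] = a·y[k−1] + b·u[k−1], its parameters, and the fixed threshold are illustrative assumptions; practical systems use identified models and tuned thresholds.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FirstOrderModel:
    """Hypothetical process model y[k] = a*y[k-1] + b*u[k-1]."""
    a: float
    b: float
    y: float = 0.0  # current model output (estimate of y1)

    def step(self, u_prev: float) -> float:
        self.y = self.a * self.y + self.b * u_prev
        return self.y


def fault_tolerant_y1(u: Sequence[float], y1_measured: Sequence[float],
                      model: FirstOrderModel,
                      threshold: float) -> List[Tuple[float, bool]]:
    """Return (value used for y1, sensor fault flag) for each sample."""
    results = []
    u_prev = 0.0
    for u_k, y_k in zip(u, y1_measured):
        y_hat = model.step(u_prev)             # analytical (model-based) estimate
        fault = abs(y_k - y_hat) > threshold   # residual evaluation
        results.append((y_hat if fault else y_k, fault))
        u_prev = u_k
    return results


samples = fault_tolerant_y1(u=[1.0, 1.0, 1.0, 1.0],
                            y1_measured=[0.0, 0.5, 5.0, 0.875],
                            model=FirstOrderModel(a=0.5, b=0.5),
                            threshold=0.1)
print(samples)  # [(0.0, False), (0.5, False), (0.75, True), (0.875, False)]
```

The model runs on the input alone, so its estimate stays valid while the y1 sensor is faulty; the price is that model errors accumulate, which is why the threshold cannot be made arbitrarily small.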

Combined Approach

Fault-tolerant Actuator

Common Actuator

Actuator with Duplex Drive

Fault-tolerant In-Vehicle Network

Fault-tolerant Automotive System Architecture

Approaches to SW Fault-tolerance
Provides uninterrupted operation in the presence of program faults through multiple implementations of a given function. Two approaches:
N-version programming
Analogous to fault masking (static redundancy)

Recovery blocks
Analogous to dynamic redundancy
Error detection mechanism
Backup routines for continued service

Recovery Block (RB) Approach

A checkpoint is created before each version executes. If that version fails, the state is restored from the checkpoint so that the next version starts from a valid operational state.
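The recovery-block control flow can be sketched in Python: checkpoint the state, run a version, apply an acceptance test, and on failure (a rejected result or an exception) roll back to the checkpoint before trying the next alternate. The acceptance test, the deep-copy checkpoint, and all names here are assumptions for illustration; real systems often checkpoint only the state a version can modify.

```python
import copy
from typing import Any, Callable, Dict, Sequence

State = Dict[str, Any]


class RecoveryBlockFailure(Exception):
    """Raised when no version passes the acceptance test."""


def recovery_block(state: State,
                   versions: Sequence[Callable[[State], Any]],
                   acceptance_test: Callable[[State, Any], bool]) -> Any:
    """Try the primary version first, then each alternate in order."""
    for version in versions:
        checkpoint = copy.deepcopy(state)  # checkpoint before the version runs
        try:
            result = version(state)
            if acceptance_test(state, result):
                return result
        except Exception:
            pass  # an exception is treated like a failed acceptance test
        state.clear()
        state.update(checkpoint)  # roll back to a valid starting point
    raise RecoveryBlockFailure("all versions failed the acceptance test")


def primary_sqrt(state: State) -> float:
    state["scratch"] = "corrupted"  # faulty version that also damages the state
    return -1.0


def alternate_sqrt(state: State) -> float:
    return state["x"] ** 0.5


def accept(state: State, result: float) -> bool:
    return result >= 0 and abs(result * result - state["x"]) < 1e-9


state = {"x": 2.0}
root = recovery_block(state, [primary_sqrt, alternate_sqrt], accept)
# root is about 1.41421, and state is back to {"x": 2.0}
```

Unlike voting, only one version runs in the fault-free case, so the extra cost is paid mainly when a fault occurs; the weak point is the acceptance test, which must catch wrong results without a second opinion.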

A Detailed Model of RB

N-Version Programming
A design-diverse technique, defined as the independent generation of N ≥ 2 functionally equivalent programs from the same initial specification. Its basic elements are the N software versions, a decision mechanism, and a supervisory program, or executive. Masking a single faulty version by majority vote requires N ≥ 3.
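A minimal sketch of the executive and decision mechanism: all versions receive the same input, a crashed version casts no vote, and a result is accepted when a strict majority of the N versions agree within a tolerance (inexact voting, since diverse versions may round floating-point results differently). The tolerance value, the averaging of agreeing results, and the function names are assumptions; the slides do not specify the decision algorithm.

```python
from typing import Callable, List, Optional, Sequence


def nvp_execute(versions: Sequence[Callable[[float], float]], x: float,
                tolerance: float = 1e-6) -> Optional[float]:
    """Executive: run all N versions on the same input, then decide.

    Decision mechanism (an assumption): inexact majority vote. A result is
    accepted if more than half of the N versions agree with it within
    `tolerance`; otherwise None is returned (no majority).
    """
    results: List[float] = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashed version simply casts no vote
    n = len(versions)
    for candidate in results:
        agreeing = [r for r in results if abs(r - candidate) <= tolerance]
        if len(agreeing) * 2 > n:  # strict majority of all N versions
            return sum(agreeing) / len(agreeing)
    return None


def version_1(x: float) -> float:
    return x * x


def version_2(x: float) -> float:
    return x ** 2


def version_3(x: float) -> float:
    return x * x + 1.0  # faulty version


print(nvp_execute([version_1, version_2, version_3], 3.0))  # 9.0
```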

A Detailed Model of NVP

Summary
Duplex and TMR configurations form the basis of most fault-tolerant mechanisms.
Similar configurations can be adopted for both HW and SW fault tolerance.

The choice among them is driven mainly by fault-detection capability, timeliness, and cost.


You might also like