Lecture 4
Redundancy
hardware redundancy
2nd CPU, 2nd ALU, ...
software redundancy
validation test...
information redundancy
error-detecting and correcting codes, ...
time redundancy
repeating tasks several times, ...
p. 3 - Design of Fault Tolerant Systems - Elena Dubrova, ESDlab
Example
FT digital filter
acceptance test [0 - 255]
SW: detect overflow
HW: memory for test
time: to execute test
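A minimal sketch of the acceptance test from this example, assuming the filter output is an integer sample. The range [0, 255] is from the slide; the function names `acceptance_test` and `checked_filter_output` are hypothetical.

```python
def acceptance_test(sample, low=0, high=255):
    """Return True if the filter output lies in the accepted range [low, high]."""
    return low <= sample <= high


def checked_filter_output(raw_output):
    """Pass the output through, or raise if the acceptance test detects an overflow."""
    if not acceptance_test(raw_output):
        raise OverflowError(f"filter output {raw_output} outside [0, 255]")
    return raw_output
```

The test itself is the software redundancy; storing it and running it are the hardware and time costs listed above.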
Redundancy (5)
NOTHING FOR FREE! costs
HW: components, area, power, ...
SW: development costs, ...
information: extra HW to code / decode
time: faster CPUs, components
Types of redundancy
hardware redundancy
information redundancy
software redundancy
time redundancy
HW redundancy: overview
passive redundancy techniques
fault masking
Passive HW redundancy
Triple Modular Redundancy (TMR)
[Figure: TMR. Inputs 1, 2 and 3 feed modules M1, M2 and M3; a voter produces the output]
Passive HW redundancy
Triple Modular Redundancy (TMR)
3 active components
fault masking by voter
Passive HW redundancy
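[Figure: TMR with triplicated voters (restoring organ). Inputs in1, in2, in3 feed two stages of modules M1, M2, M3, each stage followed by voters V1, V2, V3, producing out1, out2, out3]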
Passive HW redundancy
N-modular redundancy (NMR)
N active components (N A)
N odd, for majority voting
tolerates (N-1)/2 module faults
example: Apollo
N=5
2 faults can be tolerated (masked)
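A minimal sketch of majority voting over N module outputs, assuming the outputs can be compared for exact equality. The function name `nmr_vote` is hypothetical; it is an illustration, not a description of any particular system.

```python
from collections import Counter


def nmr_vote(outputs):
    """Majority vote over the outputs of N modules (N odd).

    Returns the value produced by more than half of the modules, or
    raises ValueError if no value reaches a majority.
    """
    n = len(outputs)
    if n % 2 == 0:
        raise ValueError("N must be odd for majority voting")
    value, count = Counter(outputs).most_common(1)[0]
    if count <= n // 2:
        raise ValueError("no majority: too many faulty modules")
    return value
```

With N=5, two wrong outputs are masked, for example `nmr_vote([7, 7, 7, 3, 9])` gives 7; a third wrong output leaves no majority.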
HW voting
hardware realisation of 1-bit majority voter
f = ab + ac + bc
Karnaugh map of f (columns bc in Gray-code order):
       bc = 00  01  11  10
a = 0:       0   0   1   0
a = 1:       0   1   1   1
n-bit majority voter: n times 1-bit
requires 2 gate delays
[Figure: gate-level majority voter with inputs a, b, c and output f]
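The majority function f = ab + ac + bc can be sketched in software with bitwise operators. The names `majority_bit` and `majority_word` are hypothetical; the n-bit version applies the 1-bit function to every bit position at once.

```python
def majority_bit(a, b, c):
    """1-bit majority voter: f = ab + ac + bc, for a, b, c in {0, 1}."""
    return (a & b) | (a & c) | (b & c)


def majority_word(x, y, z):
    """n-bit majority voter: the 1-bit function applied to each bit position."""
    return (x & y) | (x & z) | (y & z)
```

Note that the word-level vote is taken per bit, so the result can be a word that none of the three inputs produced.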
SW voting
Voting can be performed using software
the voter is a program executed by a microprocessor
the voting program can be as simple as a sequence of three comparisons, with the outcome of the vote being the value that agrees with at least one of the other two
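The sequence of three comparisons could look like the sketch below. The function name `tmr_vote` and the choice to raise an error when all three results differ are assumptions; the slide does not say what happens in that case.

```python
def tmr_vote(r1, r2, r3):
    """Software voter for three results, using at most three comparisons."""
    if r1 == r2:
        return r1
    if r1 == r3:
        return r1
    if r2 == r3:
        return r2
    # Assumption: with no agreement the fault cannot be masked, so signal it.
    raise ValueError("no two results agree")
```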
HW vs. SW Voting
HW: fast, but expensive
32-bit voter: 128 gates and 256 flip-flops
1 TMR level = 3 voters
time
Types of HW redundancy
static techniques (passive)
fault masking
hybrid techniques
static + dynamic
fault masking + reconfiguration
Active HW redundancy
dynamic redundancy
actions required for correct result
detection, localization, containment, recovery
no fault masking
does not attempt to prevent faults from producing errors within the system
Active HW redundancy
most common in applications that can tolerate temporary erroneous results
satellite systems - preferable to have temporary failures than a high degree of redundancy
Implementation of comparator
In hardware, a bit-by-bit comparison can be done using two-input exclusive-or gates
In software, a comparison can be implemented as a COMPARE instruction
commonly found in instruction sets of almost all microprocessors
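A small sketch of both comparator styles, assuming the two module outputs are non-negative integers. The names `compare_words` and `outputs_agree` are hypothetical. The XOR yields a 1 in every bit position where the outputs differ, which mirrors the gate-level comparison.

```python
def compare_words(a, b):
    """Bit-by-bit comparison: return a mask with 1s where a and b differ."""
    return a ^ b


def outputs_agree(a, b):
    """Software-style comparison: True when the two outputs are identical."""
    return compare_words(a, b) == 0
```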
Standby sparing
One module is operational and one or more serve as stand-bys, or spares
error detection is used to determine when a module has become faulty
error location is used to determine which module is faulty
faulty module is removed from operation and replaced with a spare
Standby sparing
[Figure: standby sparing. Modules M1, M2, ..., Mn share the input in; each module has its own error detection; an N to 1 switch selects the output out]
Switch
The switch examines error reports from the error detection circuitry associated with each module
if all modules are error-free, the selection is made using a fixed priority
any module with errors is eliminated from consideration
a momentary disruption in operation occurs while the reconfiguration is performed
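The selection rule of the switch can be sketched as follows, assuming each module's error detection delivers a boolean report and that a lower index means higher priority. The function name `select_module` and the `None` result when every module is faulty are assumptions.

```python
def select_module(error_flags):
    """Return the index of the module whose output the switch passes on.

    error_flags[i] is True when error detection reports module i as faulty.
    Modules with errors are skipped; among the rest, the lowest index
    (fixed priority) wins. Returns None if no error-free module is left.
    """
    for index, faulty in enumerate(error_flags):
        if not faulty:
            return index
    return None
```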
Pair-and-a-spare technique
Combines standby sparing and duplication with comparison
like standby sparing, but two modules instead of one are operated in parallel at all times
their results are compared to provide error detection
the error signal initiates reconfiguration
Pair-and-a-spare technique
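[Figure: pair-and-a-spare. Modules M1, M2, ..., Mn share the input in; each module has its own error detection; an N to 2 switch selects the two operating modules, whose outputs are compared to produce out]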
Pair-and-a-spare technique
As long as two selected outputs agree, the spares are not used
If they disagree, the switch uses error reports to locate the faulty module and to select the replacement module
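One reconfiguration step of this scheme might be sketched as below. All names are hypothetical, and returning `None` as the result during reconfiguration is an assumption that stands in for the momentary disruption.

```python
def pair_and_spare_step(outputs, error_flags, active):
    """One step of pair-and-a-spare.

    outputs[i]     : output of module i
    error_flags[i] : True when error detection reports module i as faulty
    active         : tuple with the indices of the two operating modules
    Returns (result, new_active); result is None while reconfiguring.
    """
    a, b = active
    if outputs[a] == outputs[b]:
        return outputs[a], active  # outputs agree: spares are not used
    # Outputs disagree: use the error reports to locate the faulty module(s).
    healthy = [i for i in active if not error_flags[i]]
    spares = [i for i in range(len(outputs))
              if i not in active and not error_flags[i]]
    needed = 2 - len(healthy)
    if len(spares) < needed:
        raise RuntimeError("not enough fault-free spares")
    return None, tuple(healthy + spares[:needed])
```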
Watchdog timer
watchdog timer
must be reset on a repetitive basis
if not reset, the system is turned off (or reset)
detection of
crash
overload
infinite loop
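A minimal, simulated sketch of the watchdog idea, assuming time advances in discrete ticks. The class `WatchdogTimer` and its methods `kick` and `tick` are hypothetical names; a real watchdog is normally a hardware counter.

```python
class WatchdogTimer:
    """Countdown timer that expires unless it is reset in time."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.remaining = timeout
        self.expired = False  # latches once the timer runs out

    def kick(self):
        """Reset the countdown; called by the monitored program when healthy."""
        self.remaining = self.timeout

    def tick(self):
        """Advance one time unit and return True once the watchdog has expired."""
        if self.remaining > 0:
            self.remaining -= 1
        if self.remaining == 0:
            self.expired = True  # here the system would be turned off or reset
        return self.expired
```

A program stuck in an infinite loop never calls `kick`, so after `timeout` ticks the watchdog expires.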
HW redundancy: overview
static techniques (passive)
fault masking
hybrid techniques
static + dynamic
fault masking + reconfiguration
Hybrid HW redundancy
combines
static redundancy
fault masking
dynamic redundancy
detection, location, containment and recovery
Self-purging redundancy
All units actively participate in the system
each module has the capability to remove itself from the system if it is faulty
very attractive feature: maintenance personnel can disable individual modules and replace them without interrupting the system
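A rough sketch of one self-purging step, under the assumption that a module purges itself when its output disagrees with the voter output. The function name `self_purging_step` is hypothetical, and the simple plurality vote used here is a simplification, not necessarily the voter used in a real design.

```python
from collections import Counter


def self_purging_step(outputs, enabled):
    """Vote over the enabled modules, then purge the ones that disagree.

    outputs[i] : output of module i
    enabled[i] : True while module i still participates in the vote
    Returns (voted_value, new_enabled).
    """
    active = [out for out, on in zip(outputs, enabled) if on]
    if not active:
        raise RuntimeError("no modules left in the system")
    # Assumption: plurality vote over the modules that are still enabled.
    voted, _ = Counter(active).most_common(1)[0]
    # A module stays enabled only if it was enabled and agreed with the vote.
    new_enabled = [on and out == voted for out, on in zip(outputs, enabled)]
    return voted, new_enabled
```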
Self-purging redundancy
[Figure: self-purging redundancy. Modules M1, M2, ..., Mn each connect through their own switch to a voter, which produces out]
Triple-duplex architecture
Combines duplication with comparison and triple modular redundancy
Triple-duplex architecture
[Figure: triple-duplex architecture. Module pairs M1a/M1b, M2a/M2b, M3a/M3b, each pair checked by a comparator (comp)]
Triple-duplex architecture
TMR allows faults to be masked
performance without interruption
duplication with comparison allows faults to be detected and faulty module removed from voting
removing a faulty module allows the system to tolerate future faults
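The combination of comparison and voting might be sketched as below. The function name `triple_duplex_vote` is hypothetical, and the behaviour when the remaining pairs disagree is an assumption, since the slides do not cover that case.

```python
from collections import Counter


def triple_duplex_vote(pairs):
    """Vote over three duplicated modules.

    pairs is a list of (a, b) outputs, one tuple per duplicated module.
    A pair whose two outputs differ is treated as faulty and removed
    from voting; the majority of the remaining pairs is returned.
    """
    valid = [a for a, b in pairs if a == b]  # duplication with comparison
    if not valid:
        raise RuntimeError("every pair reports a mismatch")
    value, count = Counter(valid).most_common(1)[0]
    if count * 2 <= len(valid):
        # Assumption: without a strict majority the fault cannot be masked.
        raise RuntimeError("remaining pairs disagree")
    return value
```

After two pairs have been removed, the single remaining pair still delivers a result, which is how removal of faulty modules extends the number of tolerated faults.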
Summary
application-dependent choice
critical-computation - momentary erroneous results are not acceptable
passive or hybrid
Next lecture
Information redundancy