Fault Tolerant Systems: Prerequisites
http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems
Part 1 - Introduction
Chapter 1 - Preliminaries
Prerequisites
Basic courses in
Digital Design
Hardware Organization/Computer Architecture
Probability
Administrative Details
Instructor: Prof. Israel Koren
Office: KEB 309E, Tel. 545-2643
Email: [email protected]
Office Hours: TuTh 4:00-5:00pm
Course web page: http://euler.ecs.umass.edu/ece655/
Grading:
Homework - 10%
Mid-term I - 25%
Mid-term II - 25%
Project or Seminar - 40%
Copyright 2007 Koren & Krishna, Morgan Kaufmann
Course Outline
Transient Faults - disappear after some time
Example - a memory cell whose contents are changed spuriously due to some electromagnetic interference
Overwriting the memory cell with the right content will make the fault go away
Permanent Faults - never go away; the component has to be repaired or replaced
Intermittent Faults - cycle between active and benign states
Example - a loose connection
Another classification: benign vs. malicious
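The fault classes above can be illustrated with a toy one-bit memory cell. This is a minimal sketch and not a model from the course. The class name MemoryCell, its methods, and the choice of a stuck value of 0 for the permanent and intermittent cases are all hypothetical.

```python
class MemoryCell:
    """Hypothetical one-bit memory cell with an optional injected fault.

    kind: "none"         - fault-free, but can suffer a transient bit flip
          "permanent"    - reads always return stuck_value
          "intermittent" - reads return stuck_value only while the fault is active
    """

    def __init__(self, kind: str = "none", stuck_value: int = 0) -> None:
        self.kind = kind
        self.stuck_value = stuck_value
        self.value = 0
        self.active = True  # only meaningful for an intermittent fault

    def write(self, bit: int) -> None:
        self.value = bit

    def flip(self) -> None:
        # Transient fault: e.g., electromagnetic interference flips the stored bit.
        self.value ^= 1

    def toggle(self) -> None:
        # Intermittent fault: cycle between the active and benign states.
        self.active = not self.active

    def read(self) -> int:
        if self.kind == "permanent":
            return self.stuck_value
        if self.kind == "intermittent" and self.active:
            return self.stuck_value
        return self.value


if __name__ == "__main__":
    # Transient: the flip corrupts the cell; rewriting the right value removes the fault.
    cell = MemoryCell()
    cell.write(1)
    cell.flip()
    print(cell.read())   # 0
    cell.write(1)
    print(cell.read())   # 1

    # Permanent: rewriting does not help; the cell must be repaired or replaced.
    stuck = MemoryCell("permanent", stuck_value=0)
    stuck.write(1)
    print(stuck.read())  # 0

    # Intermittent: wrong while active, correct while benign.
    loose = MemoryCell("intermittent", stuck_value=0)
    loose.write(1)
    print(loose.read())  # 0
    loose.toggle()
    print(loose.read())  # 1
```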
Fault - either a hardware defect or a software/programming mistake
Error - a manifestation of a fault
Example: an adder circuit with one output line stuck at 1
This is a fault, but not (yet) an error
It becomes an error when the adder is used and the result on that line should be 0
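The fault/error distinction can be checked exhaustively for a small adder. The sketch below assumes a 4-bit adder whose sum wraps modulo 2^4 and whose least significant output line is stuck at 1. The width, the stuck line and the function names are illustrative choices and do not come from the slides.

```python
WIDTH = 4      # assumed adder width (illustrative)
STUCK_BIT = 0  # assumed output line that is stuck at 1 (illustrative)
MASK = (1 << WIDTH) - 1


def correct_add(a: int, b: int) -> int:
    """Fault-free WIDTH-bit adder (carry out discarded)."""
    return (a + b) & MASK


def faulty_add(a: int, b: int) -> int:
    """Same adder, but output line STUCK_BIT is forced to 1 (a stuck-at-1 fault)."""
    return correct_add(a, b) | (1 << STUCK_BIT)


def causes_error(a: int, b: int) -> bool:
    """The fault manifests as an error only when the correct bit on that line is 0."""
    return faulty_add(a, b) != correct_add(a, b)


if __name__ == "__main__":
    print(causes_error(1, 2))  # False: 1 + 2 = 0b0011, line 0 should be 1 anyway
    print(causes_error(2, 2))  # True: 2 + 2 = 0b0100, but the adder outputs 0b0101
    n = 1 << WIDTH
    errors = sum(causes_error(a, b) for a in range(n) for b in range(n))
    print(f"{errors} of {n * n} input pairs expose the fault")  # 128 of 256
```

Under these assumptions, exactly half of the input pairs expose the fault: those whose sum is even. For every other input the faulty adder still produces the correct result.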
Redundancy
Redundancy is at the heart of fault tolerance
Redundancy - incorporation of extra components in the design of a system so that its function is not impaired in the event of a failure
We will study four forms of redundancy:
1. Hardware redundancy
2. Software redundancy
3. Information redundancy
4. Time redundancy
Hardware Redundancy
Extra hardware is added to override the effects of a failed component
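Triple modular redundancy (TMR) is one common way to use extra hardware to override a failed component. Three copies of a module compute the same output, and a voter takes the bitwise majority. The slide text does not name this scheme, so treat the sketch below as an illustration only. The function names are hypothetical, and the voter itself is assumed to be fault-free.

```python
from typing import Callable, Sequence


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(modules: Sequence[Callable[[int], int]], x: int) -> int:
    """Run three redundant copies of a module on the same input and vote on the results."""
    if len(modules) != 3:
        raise ValueError("TMR needs exactly three modules")
    a, b, c = (m(x) for m in modules)
    return majority(a, b, c)


if __name__ == "__main__":
    def good(x: int) -> int:
        return x + 1

    def broken(x: int) -> int:
        # Simulated failed copy: output line 2 is inverted.
        return (x + 1) ^ 0b100

    print(tmr([good, broken, good], 3))      # 4: the single faulty copy is outvoted
    print(majority(0b1111, 0b1110, 0b1101))  # 15: faults on different bits are also masked
```

A bitwise voter masks errors that hit different bits in different copies. It fails when two copies are wrong in the same bit position.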
Software Redundancy
Several independently developed versions of a program perform the same function
The hope is that such diversity will ensure that not all the copies will fail on the same set of input data
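Voting over diverse versions can be sketched in a few lines. The example assumes three independently written floor-square-root routines. One of them carries a deliberately planted design fault that affects only perfect squares. The routine names and the fault are hypothetical. The faulty version does not fail on the same inputs as the others, so a majority vote still yields the correct result.

```python
import math
from collections import Counter
from typing import Callable, Optional, Sequence


def isqrt_library(n: int) -> int:
    # Version 1: relies on the standard library.
    return math.isqrt(n)


def isqrt_bisect(n: int) -> int:
    # Version 2: binary search for the largest r with r * r <= n.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def isqrt_float(n: int) -> int:
    # Version 3: deliberately faulty (hypothetical design mistake).
    # The subtracted epsilon makes it return r - 1 when n == r * r for r >= 1.
    return int(n ** 0.5 - 1e-9)


def vote(versions: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Return the value produced by a strict majority of versions, or None if there is none."""
    results = [v(x) for v in versions]
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


if __name__ == "__main__":
    versions = [isqrt_library, isqrt_bisect, isqrt_float]
    print(isqrt_float(16), vote(versions, 16))  # 3 4: the faulty version is outvoted
    print(all(vote(versions, n) == math.isqrt(n) for n in range(10_000)))  # True
```

A real multi-version system must also decide what to do when no majority exists (here `vote` returns None). Voting only helps to the extent that the versions' faults are actually uncorrelated.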
Information Redundancy
Time Redundancy
processors
Classical Node and Line Connectivity - the minimum number of nodes and lines, respectively, that have to fail before the network becomes disconnected
This measure indicates how vulnerable the network is to disconnection
A network disconnected by the failure of just one (critically positioned) node is potentially more vulnerable than another which requires several nodes to fail before it becomes disconnected
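For small networks, both measures can be computed by brute force: try every set of k failed nodes (or lines) for increasing k. The sketch below suits toy examples only, because its cost grows exponentially with network size. Practical tools compute connectivity with max-flow algorithms, based on Menger's theorem. The function names are hypothetical. The sketch assumes a simple undirected graph with at least two nodes. It uses the usual convention that a complete graph on n nodes has node connectivity n - 1.

```python
from itertools import combinations
from typing import Hashable, Iterable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def is_connected(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> bool:
    """Depth-first search over the subgraph induced by `nodes`."""
    alive = set(nodes)
    if len(alive) <= 1:
        return True
    adj = {n: set() for n in alive}
    for u, v in edges:
        if u in alive and v in alive:  # lines touching a failed node are unusable
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(alive))
    seen = {start}
    stack = [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == alive


def node_connectivity(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> int:
    """Fewest node failures that disconnect the surviving nodes."""
    for k in range(len(nodes) - 1):
        for failed in combinations(nodes, k):
            survivors = [n for n in nodes if n not in failed]
            if not is_connected(survivors, edges):
                return k
    return len(nodes) - 1  # complete graph: never disconnects


def line_connectivity(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> int:
    """Fewest line failures that disconnect the network."""
    for k in range(len(edges) + 1):
        for failed in combinations(range(len(edges)), k):
            survivors = [e for i, e in enumerate(edges) if i not in failed]
            if not is_connected(nodes, survivors):
                return k
    return len(edges)  # not reached for two or more nodes


if __name__ == "__main__":
    nodes = [0, 1, 2, 3]
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    star = [(0, 1), (0, 2), (0, 3)]
    for name, edges in (("ring", ring), ("star", star)):
        print(name, node_connectivity(nodes, edges), line_connectivity(nodes, edges))
    # ring 2 2
    # star 1 1
```

The star is disconnected by the failure of its single critically positioned center node. The ring survives any single node or line failure, which matches the vulnerability argument above.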
Connectivity - Examples