Error Detecting and Correcting Codes: Appendix A
Table A.2 shows how the number of codewords that can be formed evolves as n
increases. Cells marked '-' correspond to situations that are impossible or of no
interest.
11 1024 462 - - -
12 2048 924 64 256 -
16 32768 1287 256 4096 2048
19 262144 92378 - 32768 -
21 1048576 352716 - 65536 -
32 2147483648 601080390 65536 134217728 268435456
36 34359738368 9075135488 262144 214783648 -
1. ON-LINE REDUNDANCY
All modules operate in parallel. As long as k of them are faultless, the system
functions correctly. The value of k depends on the technique used. This corresponds
to a passive redundancy, according to the observability of the errors affecting each
module.
Appendix B
Structure m-out-of-n
The system does not fail as long as m modules are faultless. The outputs are
elaborated by an m-out-of-n voter. Hence, the global reliability is:
$R = \left( \sum_{i=m}^{n} \binom{n}{i} R_0^{i} (1 - R_0)^{n-i} \right) R_v$, where $R_0$ is the reliability of one module and $R_v$ is the reliability of the voter.
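The m-out-of-n reliability can be evaluated numerically. The following is a minimal sketch, assuming identical and independent modules of reliability `r0` and a voter of reliability `rv`; the function name and the numerical values are illustrative, not taken from the text.

```python
from math import comb, exp

def m_out_of_n_reliability(m, n, r0, rv=1.0):
    """Reliability of an m-out-of-n structure built from identical modules."""
    # Sum the probabilities that exactly i modules (i = m..n) are faultless.
    modules_ok = sum(comb(n, i) * r0**i * (1 - r0)**(n - i)
                     for i in range(m, n + 1))
    return modules_ok * rv

if __name__ == "__main__":
    lam, t = 1e-4, 1000.0          # hypothetical failure rate (per hour) and time
    r0 = exp(-lam * t)
    print("2-out-of-3:", m_out_of_n_reliability(2, 3, r0))
    print("3-out-of-4:", m_out_of_n_reliability(3, 4, r0))
```

With a perfect voter, the 3-out-of-4 case reduces to 4 R0^3 - 3 R0^4, which gives a quick consistency check of the general sum.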
[Figure B.2: a) 2-out-of-3, b) 3-out-of-4]
b) Structure 3-out-of-4
This structure is represented by Figure B.2-b. For a perfect voter, we have:
$R = 4 R_0^3 - 3 R_0^4 = 4 e^{-3\lambda t} - 3 e^{-4\lambda t}$.
Hence, as all modules have the same reliability: $R = 2 R_0^2 - R_0^4 = 2 e^{-2\lambda t} - e^{-4\lambda t}$.
Special case: n = 2
$R = e^{-\lambda t} + \lambda t\, e^{-\lambda t}$, $MTTF = 2 \cdot MTTF_0$,
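The MTTF of this special case can be checked by integrating the reliability over time. This is a short worked derivation, assuming a constant failure rate λ for each module, so that MTTF_0 = 1/λ, and R(t) = e^{-λt} + λt e^{-λt}:

```latex
\begin{aligned}
MTTF &= \int_0^{\infty} R(t)\,dt
      = \int_0^{\infty} e^{-\lambda t}\,dt
      + \int_0^{\infty} \lambda t\, e^{-\lambda t}\,dt \\
     &= \frac{1}{\lambda} + \frac{1}{\lambda}
      = \frac{2}{\lambda} = 2\, MTTF_0 .
\end{aligned}
```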
[Figure B.4. Off-line redundancy: a) general case, b) n = 2]
[Figure: reliability R(t) versus time (hours) for the redundant structures: 1. Basic Module, 2. TMR, 3. 1-out-of-2, 4. 3-out-of-4, 5. Double-Duplex, 6. Standby 2]
1. DEBUGGING AID
During the debugging of an application running on a microprocessor, it is
necessary to understand the execution of the implemented programs. Like all
processors, the Pentium offers a debugging mode, called 'Probe Mode', which allows
access from the outside to the internal registers, to the system memory and I/O
spaces, and to the internal state of the microprocessor. The Pentium has 4 debugging
registers used to insert breakpoints. Moreover, the specialist in charge of the
debugging can access internal counters that record some events of the internal
evolution. All these features obviously belong to the design verification group of
techniques.
Appendix C
2. OFF-LINE TESTING
The Boundary Scan (IEEE 1149.1) standard has been implemented in the
microprocessor for testing at 'global board level'. This means that in any system
comprising a microprocessor connected to other circuits (such as memory units,
interface circuits, etc.) on a PCB, it is possible to access these other circuits
through the Pentium in order to apply test sequences to them and to collect the
resulting outputs.
The following pins of the test bus are accessible: TCK, TDI, TDO, TMS, TRST, as
well as the test logic (the TAP automaton).
Finally, the Pentium integrates a BIST procedure that is automatically executed
when the microprocessor is switched on. This off-line testing procedure is called
Reset Self-Test, but in reality the term 'self-test' refers here to a Built-In Self-Test
technique. Intel announces that this integrated test covers 100% of the single
stuck-at 0/1 faults of the Micro-Code PLAs, memory caches (instruction and data caches),
and some other internal circuitry (TLB, ROM).
3. ON-LINE TESTING
This component also offers on-line testing features:
• internal on-line error detection, thanks to error detecting codes,
• redundancy capability allowing Duplex redundant structures.
Error Detection
During the functioning of the Pentium, some error detection mechanisms are
activated by a specialized automaton called the Machine Check Exception. These
errors are revealed by the use of single parity error detecting codes:
• single parity test on the Data Bus (DATA PARITY):
64 bit-Data Bus + 8 parity bits (one bit per byte of data),
• single parity on the Address Bus (ADDRESS PARITY):
32 bit-Address Bus + 1 parity bit,
• some other internal parity codes.
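The per-byte parity scheme of the data bus can be illustrated as follows. This is a minimal sketch, assuming an even-parity convention (the text does not state which convention is used); the function names are hypothetical.

```python
def byte_parity_bits(data64):
    """Return 8 parity bits, one per byte of a 64-bit data word (even parity)."""
    bits = []
    for i in range(8):
        byte = (data64 >> (8 * i)) & 0xFF
        # Even parity: the parity bit makes the total number of 1s even.
        bits.append(bin(byte).count("1") % 2)
    return bits

def check_data_bus(data64, parity_bits):
    """Return the indexes of the bytes whose recomputed parity does not match."""
    expected = byte_parity_bits(data64)
    return [i for i in range(8) if expected[i] != parity_bits[i]]

if __name__ == "__main__":
    word = 0x0123456789ABCDEF
    parity = byte_parity_bits(word)
    print(check_data_bus(word ^ (1 << 13), parity))   # single error in byte 1
    print(check_data_bus(word ^ 0b11, parity))        # double error in byte 0
```

A single flipped bit is located to its byte, whereas two flipped bits within the same byte cancel out, which is the usual limitation of a single parity code.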
Microprocessor Redundancy
Finally, the circuit has been designed in order to allow a Duplex redundant
structure to be easily implemented, thanks to the Functional Redundancy Checking
(FRC) technique. Figure C.1 illustrates this technique.
A 'Master' microprocessor performs the normal functioning of the application
and is connected to the external process.
A second microprocessor, called 'Check', plays the role of an observer. When an
error is detected (by simple comparison of the two functions), the output signal IER
is activated, calling for an external action (alarm, switch-off, recovery, etc.).
Intel's commercial documentation states that more than 99% of the faults are thus
detected.
Testing Features of a Microprocessor
[Figure C.1. Functional Redundancy Checking: IER error signal]
The first launch of Ariane V led to the destruction of the rocket, due to a failure
of the embedded computing system. Whereas most of the firms whose projects
failed had hidden the causes, the CNES (French national space agency) provided
numerous pieces of information whose study contributed to improving the
knowledge on dependability. The following presentation is based on the published
documents. The analysis developed in this appendix must above all strengthen the
opinion that the mastering of faults in complex computing systems is very difficult,
illustrating this idea on a real example.
1. FAILURE SCENARIO
A simplified view of the architecture of the computing system embedded in the
rocket is provided in Figure D.1.
[Figure D.1. Simplified architecture: SRI (Inertial Reference System), OBC (On-Board Computer), Engines]
Appendix D
The engines (Vulcain main engine and boosters) are controlled by the OBC
(On-Board Computer), which receives data from various sensors that are autonomous
complex sub-systems. The SRI (Inertial Reference System) is such a sub-system. It
provides flight data concerning the rocket position. The OBC as well as the SRI
have redundant hardware boards based on a recovery block. The first hardware
board is in operation until an error is detected; then, the second board replaces the
first one. These hardware systems execute complex real-time software applications
using a multitasking kernel. The programs executed on the two hardware platforms,
SRI1 and SRI2 or OBC1 and OBC2, are the same.
Among its numerous treatments, the program of the SRI (SRI1 or SRI2) calls a
function which performs a conversion between a real value expressed in a particular
format and an integer value. This function was previously used in the software
managing the flight control of Ariane IV. Being dependent on the acceleration, the
actual values handled by this function at Ariane IV launch time were in a given
range. Unfortunately, the acceleration of Ariane V being higher, the conversion
function was called with a value out of this range. This situation raised an exception
during the function execution.
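The mechanism can be sketched in a few lines. This is only an illustration, assuming a conversion towards a 16-bit signed integer; the widths, names and values below are assumptions for the example and are not given in the text.

```python
INT16_MIN, INT16_MAX = -32768, 32767   # assumed target integer format

class OperandError(Exception):
    """Raised when the value does not fit the target integer format."""

def to_int16(value):
    """Convert a real value to an integer, refusing out-of-range values."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit the integer format")
    return result

if __name__ == "__main__":
    print(to_int16(1234.7))            # value inside the range seen previously
    try:
        to_int16(40000.0)              # hypothetical larger value
    except OperandError as exc:
        print("exception raised:", exc)
```

The function is correct for the range of values it was designed for; the failure comes from reusing it in an environment where this range no longer holds.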
The fault-tolerance mechanism implemented in SRI1 handled this erroneous
state, switching on the SRI2 redundant system. As SRI2 executed the same program,
the same exception was raised. Its handling by SRI2 consisted in communicating
diagnostic data to the OBC before switching off the SRI system. Thanks to this
information, the OBC should have continued the flight in a degraded mode, for
instance extrapolating the evolutions of the rocket positions. Unfortunately, the
diagnostic data communicated by SRI2 were interpreted by the OBC as flight data.
Thus, the OBC reacted by swiveling the engines.
2. ANALYSIS
So, it is difficult to attribute the fault to a part of the system or to a partner of the
project. However, this example illustrates several aspects highlighted in the book.
Finally, the complexity of the global system and of the physical devices of its
functional environment often limits the integration testing.
FIRST PART
Exercise 3.1. Failures of a drinks dispenser
1. Static failure: the money change or return operations are incorrect.
2. Dynamic failure: when the machine has delivered a coffee, the red light stays on
for one minute before authorizing the next drink to be selected.
3. Temporary failure: this morning, the machine was unable to deliver tea.
4. Static and persistent failure: the ~$ coins are no longer accepted by the machine.
Appendix E
[Figure: Modified Graph]
These possibilities depend on the implementation of this stack: hardware with gates
and registers, or simulation of the stack in the main memory.
A test sequence detecting this fault could be to Push 15 integers (from 1 to 15), and
then to Pop 15 values and to compare them with the initial values. Let us note that
this test sequence also detects the previous functional design fault.
External fault. The user of the stack ignores the signal indicating an overflow. The
sequence given for the case of design fault will transform this fault into a failure. In
both cases, the failure is the same, (but without any detection).
2. A normal use of a stack is to apply the same number of Push operations as Pop
operations. If a fault provokes the application of a Pop action to an empty stack,
a failure occurs. The situation is very similar to the overflow studied in the
previous question, and the use of a Stack_Empty signal allows the detection of
such a situation.
Exercise 3.4. Study of a program
For an addition such as Exp1 + Exp2, where Exp1 and Exp2 are two arithmetic
expressions, the compiler generates a sequence of executable instructions allowing
the evaluation of these expressions. The two obtained results are placed in two
distinct registers. Then, the compiler adds an instruction which performs the sum of
the content of these registers. However, the programming language does not define
the order of evaluation of these two expressions: one can first evaluate Exp1, then
Exp2, or the opposite! This means that, for our example, we will compute first F1,
then F2, or the opposite.
Let us examine the functioning with '1' as the initial value of A.
After execution of F1, A = 2, which is also the value returned by F1. Then, after
execution of F2, the value of A and the value returned by F2 are equal to 4. Hence, B
will then take the value: 2 + 4 = 6. On the contrary, if F2 is evaluated first, the value
of A and the value returned by F2 are equal to 2 (A being initially equal to 1). Then,
the execution of F1 returns 3. Hence B will be equal to 2 + 3 = 5.
Consequently, according to the executable code generated by the compiler, the final
result of B is either 6 or 5!
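The two evaluation orders can be reproduced with a small simulation. This is a sketch under an assumption: the bodies of F1 (A := A + 1) and F2 (A := 2 * A) are inferred from the values quoted above, since the source program is not reproduced here.

```python
def evaluate_b(order):
    """Evaluate B = F1 + F2, calling the two functions in the given order."""
    state = {"A": 1}

    def f1():
        state["A"] += 1        # assumed side effect: A := A + 1
        return state["A"]

    def f2():
        state["A"] *= 2        # assumed side effect: A := 2 * A
        return state["A"]

    funcs = {"F1": f1, "F2": f2}
    # The dict comprehension calls the functions in the order of the list.
    results = {name: funcs[name]() for name in order}
    return results["F1"] + results["F2"]

if __name__ == "__main__":
    print(evaluate_b(["F1", "F2"]))
    print(evaluate_b(["F2", "F1"]))
```

Both calls compute the same sum of two returned values; only the order of the side effects on A differs.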
What could be concluded from this analysis? The addition is a commutative
operation, so both interpretations of the compiler are acceptable. However, this
commutativity property is only effective for the addition of values (i.e. 5 + 3 = 3 +
5), and not for the addition of expressions having 'side effects' (in our example, the
execution of F1 and F2 modifies A). Thus, a possible failure (only one of these two
interpretations is expected) may result from the fact that the designer does not know
how the technology he/she uses will operate on the source code. Here, the
technology deals with the implementation of the program by the compiler.
Exercise 4.1. Latency of an asynchronous counter
The MSB (Most Significant Bit) will normally switch to the value '1' after 8 clock
pulses. Hence, the latency is equal to 8 x 2 ms = 16 ms.
The fault will lead to a failure that lasts for 8 clock pulses and then disappears.
Exercise 4.2. Latency of a structured system
Error #1: 10 ms, error #2: 110 ms, error #3 = failure of the system: 140 ms.
abc   N   N1  N2  N3  N4
000   0   0   0   0   1
001   0   0   1   0   1
010   0   0   0   0   0
011   1   1   1   1   1
100   1   0   1   1   1
101   1   0   1   1   1
110   0   0   1   0   0
111   1   1   1   1   1
Table E.2. Normal and erroneous functions
abc   C S   C1 S1   C2 S2
000   0 0   0  0    0  0
001   0 1   1  1    0  1
010   0 1   0  1    0  0
011   1 0   0  0    0  1
100   0 1   0  1    0  1
101   1 0   0  0    1  0
110   1 0   1  0    1  1
111   1 1   1  1    1  0
Table E.3. Truth tables: without fault and with faults F1 and F2
Table E.3 gives the output values without fault and with faults F1 and F2. The
values noted in bold characters show the failures.
3. The two faults produce quite different failures. It is possible to distinguish
between these faults by applying to the circuit an input vector such as 011. The
diagnosis is as follows:
• if the output C only is erroneous, then fault F1 is present,
• if outputs C and S are erroneous, then fault F2 is present,
• if both outputs are correct, none of these two faults is present.
-010 0 0 1 0 0 0 1 0 1 0 1 1 1
011 1 1 1 1 1 0 1 1 1 0 1 1 0
100 0 0 0 0 1 0 1 0 1 0 1 1 1
101 1 1 1 1 1 0 1 1 1 0 1 1 0
110 1 0 1 0 1 1 1 0 1 0 1 0 0
111 1 1 1 1 1 1 1 1 1 0 1 1 0
Table E.4. Correct and erroneous functions
2. Let us assume that the functional faults can affect each gate by transforming it
into any other gate type: AND, OR, NOT, NAND and NOR. We illustrate these
faults with two cases (Table E.4):
- FF1, which transforms the AND gate into a NAND gate:
z = (a b)' + c = a' + b' + c.
- FF2, which transforms the OR gate into a NOR gate:
z = (a b + c)' = a'c' + b'c'.
These two failures do not belong to those induced by the stuck-at fault model. Going
further, we can wonder if this functional gate-transforming model is able to
complete the set of all theoretical failures (255 classes!). The answer is no: for
instance, the erroneous function z = a b' + b c' cannot be obtained with these fault
models. Now the question not answered here is:
Can such a failure occur, and from which technological or functional faults?
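The two gate-transformation faults can be checked exhaustively over the 8 input vectors. The sketch below assumes that the correct function is z = a.b + c, as implied by the two faulty expressions; the function names are illustrative.

```python
from itertools import product

def correct(a, b, c):
    return (a & b) | c                 # assumed fault-free function z = a.b + c

def ff1(a, b, c):
    return (1 - (a & b)) | c           # AND gate replaced by a NAND gate

def ff2(a, b, c):
    return 1 - ((a & b) | c)           # OR gate replaced by a NOR gate

def truth_table(f):
    """Output column of f for abc = 000, 001, ..., 111."""
    return [f(a, b, c) for a, b, c in product((0, 1), repeat=3)]

def erroneous_vectors(faulty):
    """Input vectors for which the faulty circuit differs from the correct one."""
    return [v for v in product((0, 1), repeat=3) if faulty(*v) != correct(*v)]

if __name__ == "__main__":
    print("correct:", truth_table(correct))
    print("FF1    :", truth_table(ff1), len(erroneous_vectors(ff1)), "failures")
    print("FF2    :", truth_table(ff2), len(erroneous_vectors(ff2)), "failures")
    # A 3-input function has 2**8 possible truth tables, hence 255 wrong ones.
    print("erroneous functions:", 2 ** 8 - 1)
```

Comparing the truth tables column by column is also a direct way to test whether a given erroneous function, such as a.b' + b.c', matches one of the modeled faults.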
Answer to the Exercises
Correct Functioning
state   y1y2   D1D2 (x = 0)   D1D2 (x = 1)
1       00     01             01
2       01     11             10
3       11     00             00
4       10     01             11
[Figure: transition tables and state graphs (output z) for the correct functioning and for the functional fault]
2. The 'stuck-at 1' fault noted a modifies the logical expression of D1, which
becomes: D1 = y1.y2' + y1'.y2. Figure E.4 shows the new transition table and
state graph. Only one transition is modified: the arc joining state 4 to state 2
when x = 0 now goes to state 3. Here also, we have transformed the hardware
fault model into a graph fault model.
If we apply the input sequence <0, 1, 0> to the initial state 1, the system goes through
states 2, 4 and 3, ending in state 3 instead of state 2. However, no failure occurs at
the output z! A failure is produced if a new vector x = 0 is added to this sequence: the
incorrect circuit reaches state 1 instead of state 3 and gives a final output z = 0 instead of 1.
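The two state sequences can be replayed with a table-driven simulation. This sketch encodes the next-state table of the correct machine (states 1 to 4 coded 00, 01, 11, 10) and the single modified arc; the output z is left out because its values per state are not legible here, and the names are illustrative.

```python
# Next-state table of the correct machine: table[state] = (next if x=0, next if x=1)
CORRECT = {1: (2, 2), 2: (3, 4), 3: (1, 1), 4: (2, 3)}

# Faulty machine: the arc leaving state 4 when x = 0 now leads to state 3.
FAULTY = dict(CORRECT)
FAULTY[4] = (3, 3)

def trace(table, inputs, state=1):
    """Return the list of states visited when applying the input sequence."""
    visited = []
    for x in inputs:
        state = table[state][x]
        visited.append(state)
    return visited

if __name__ == "__main__":
    for name, table in (("correct", CORRECT), ("faulty", FAULTY)):
        print(name, trace(table, [0, 1, 0]), trace(table, [0, 1, 0, 0]))
```

The divergence of the state sequences appears one step before any observable difference, which is the latency discussed above.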
[Figure E.4. Hardware Fault: transition table and state graph]
SECOND PART
Exercise 7.1. The 'fault - error - failure - detection - repair' cycle
1. Figure E.5 shows the interpreted cycle:
Le1: latency of the fault according to the occurrence of the first error,
Lf: latency of the fault according to the occurrence of the failure,
D: detection time, R: repairing time,
SF: mean time of good functioning to the occurrence of a fault.
[Figure E.5. Interpreted cycle on the time axis: failure, diagnosis, and the durations Le1, Lf, D, R and SF]
1. Structure 'parallel-series':
$R_{ps} = (1 - (1 - R_1)(1 - R_2))\,(1 - (1 - R_3)(1 - R_4))$,
$R_{ps} = R^4 - 4R^3 + 4R^2$, if $R_i = R$.
Structure 'series-parallel':
$1 - R_{sp} = (1 - R_1 R_3)(1 - R_2 R_4)$,
$R_{sp} = 2R^2 - R^4$, if $R_i = R$.
2. Comparison of the two structures:
$R_{ps} - R_{sp} = 2(R^2 - R)^2$, which is positive for 0 < R < 1; so $R_{ps} > R_{sp}$.
Thus, the first structure is more reliable than the second structure (they are equal only for R = 0 or R = 1).
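The comparison follows from a direct expansion of the two expressions obtained for identical modules of reliability R:

```latex
\begin{aligned}
R_{ps} - R_{sp} &= (R^4 - 4R^3 + 4R^2) - (2R^2 - R^4) \\
                &= 2R^4 - 4R^3 + 2R^2 \\
                &= 2R^2 (R - 1)^2 = 2\,(R^2 - R)^2 \;\ge\; 0 .
\end{aligned}
```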
Note. As the faults altering the modules are independent, these reliability results can
also easily be determined by the composition reliability theorems. For example, for
the PS structure we have: $R_{ps} = P((1 \text{ or } 3) \text{ and } (2 \text{ or } 4)) = P(1 \text{ or } 3) \cdot P(2 \text{ or } 4) = (P(1) + P(3) - P(1)P(3))\,(P(2) + P(4) - P(2)P(4)) = (2R - R^2)^2$, if all modules have
the same reliability R.
Exercise 7.5. Safety analysis by a Markov graph
The evolution matrix which gives the probability to pass from a state to another
(with a sampling rate expressed per hour) is shown in Figure E.7. After two
elementary periods (hours), the probability to reach state 4 (considered as
dangerous) is equal to p1.p3 + p2.p4. Raising this matrix to the successive
powers 2, 3, etc., gives the progression of the probability values to reach this
dangerous state (hour after hour). As this system does not possess any regeneration
mechanism, all parameter values always increase and are bounded by 1; this means
that the degradation probabilities increase with time.
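The matrix powers can be computed with a few lines of code. The sketch below is built on a hypothetical 4-state chain that is merely consistent with the result p1.p3 + p2.p4 (transitions 1 to 2 with p1, 1 to 3 with p2, 2 to 4 with p3, 3 to 4 with p4); the actual matrix is the one of Figure E.7, and the numerical values are invented for the example.

```python
def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def evolution_matrix(p1, p2, p3, p4):
    """Hypothetical chain: 1->2 (p1), 1->3 (p2), 2->4 (p3), 3->4 (p4)."""
    return [
        [1 - p1 - p2, p1, p2, 0.0],
        [0.0, 1 - p3, 0.0, p3],
        [0.0, 0.0, 1 - p4, p4],
        [0.0, 0.0, 0.0, 1.0],      # state 4 is absorbing: no regeneration
    ]

def danger_probabilities(matrix, hours):
    """Probability of being in state 4 after 1, 2, ..., hours steps from state 1."""
    power, out = matrix, []
    for _ in range(hours):
        out.append(power[0][3])
        power = mat_mul(power, matrix)
    return out

if __name__ == "__main__":
    m = evolution_matrix(0.1, 0.2, 0.3, 0.4)
    for hour, p in enumerate(danger_probabilities(m, 6), start=1):
        print(hour, round(p, 4))
```

Because state 4 has no outgoing transition in this sketch, the printed sequence can only grow, which matches the remark on the absence of regeneration.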
[Figure: Petri net of the system (labels: Active units, Spare down)]
The analysis of this graph can be performed by means of a finite state machine
(non-parallel model), called the marking graph, which shows all possible evolutions from
the initial state (3 tokens in P1 and 1 token in P3). We can notice that the total
number of tokens is constant.
Example of evolution:
(P1=3, P3=1) - (P1=2, P2=1, P3=1) - (P1=3, P5=1, P3=0) - (P1=3, P3=1), etc.
Exercise 7.7. Fault Tree and Reliability Block Diagram
The fault tree can be analyzed with the knowledge of the reliabilities of the basic
events (leaves of the tree). Hence, we start from the leaves and go up towards the
studied event, which is the failure of the system. The probability at the output of an
AND node is the product of the probabilities at its inputs. The probability at the
output of an OR node (here with 2 inputs) is the sum of the probabilities at its inputs
minus the product of these probabilities (this can be generalized to a more
complicated formula for n inputs). The failure of the system has the probability:
$F = F_{12} + F_3 - F_{12} F_3 = (1 - R_1)(1 - R_2) + (1 - R_3) - (1 - R_1)(1 - R_2)(1 - R_3)$,
Hence, $R = 1 - F = (R_1 + R_2 - R_1 R_2)\, R_3$.
[Figure: fault tree, F = F12 + F3 - F3.F12]
Figure E.10 shows the Reliability Block Diagram of this redundant system: two
modules M1 and M2 in 'parallel', in 'series' with M3. The analysis by the method
already studied gives the reliability: R = R12 . R3 = (1 - (1 - R1).(1 - R2)) . R3 = (R1
+ R2 - R1.R2).R3. We obtain the same result.
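The agreement between the two methods can be checked numerically. This is a minimal sketch with illustrative function names and invented module reliabilities.

```python
def and_node(*probs):
    """Probability at the output of an AND node (independent inputs)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_node(p, q):
    """Probability at the output of a 2-input OR node (independent inputs)."""
    return p + q - p * q

def failure_by_fault_tree(r1, r2, r3):
    f12 = and_node(1 - r1, 1 - r2)     # M1 and M2 both fail
    return or_node(f12, 1 - r3)        # ... or M3 fails

def reliability_by_block_diagram(r1, r2, r3):
    r12 = 1 - (1 - r1) * (1 - r2)      # M1 and M2 in parallel
    return r12 * r3                    # in series with M3

if __name__ == "__main__":
    r = (0.9, 0.8, 0.95)               # hypothetical module reliabilities
    print(1 - failure_by_fault_tree(*r), reliability_by_block_diagram(*r))
```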
[Figure: plot of P against c (ticks 0, 9, 18, 99; level P = 0.1 marked), with P(c) = (c + 1)/100 and P(c) = (19 - c)/100]
4, and finally reaches state 2 by the arc 4-2. In that case, the functional
redundancy cannot be expressed as a redundant arc but as a redundant path:
<4 - 4 - 2> is redundant.
[Figure: circuit with inputs a, b and outputs f, g, with its truth table]
ab   fg
00   11
01   10
10   01
11   10
2. We deduce from the previous study that the line α is totally superfluous. Thus, it
corresponds to passive redundancy. Going a step further, the analysis shows that
the output g is independent from the product term a.b produced by the AND
gate. Thus, this gate (noted X in the figure) can be removed, the NOR gate
producing g being hence a simple INVERTER.
3. The truth table shows that the configuration (00) never occurs at the output
of the circuit: this corresponds to an output functional redundancy.
Exercise 8.4. Structural redundancy of several circuits
We suggest the following analysis to detect possible 'structural redundancies'. We
establish the logical expression of each node of the circuit, starting from the primary
inputs (the external inputs of the circuit) and proceeding towards the primary outputs
(the external outputs of the circuit). At each step, the resulting logical expression is
analyzed in order to determine possible simplifications. If such simplifications exist,
then they reveal structural redundancies.
THIRD PART
Exercise 9.1. Requirement analysis
Two families of entities are defined in the text.
The first one concerns the capability of the product to be moved. This notion is
specified by two entities: the product must be contained in a hand and the product
must be moved by car.
The second family concerns the notion of autonomy specified as maximized.
Let us note that numerous specifications can be derived from these requirements.
For instance, the autonomy can be provided by an efficient battery included in the
mobile phone, and/or by a connection to the car battery.
Exercise 10.1. Verification of the adder
The functional fault considered transforms the adder into the circuit of Figure E.14.
[Figure E.14. Adder with the functional fault; S = a ⊕ b ⊕ c]
There is a failure on output C each time one input only is at '1': so there are three
erroneous vectors.
2. Verification by double transformation with intermediate model. We choose
as intermediate model the modular description of the adder as two interconnected
half-adders. The fault modifies each one of these half-adders: the behavior with
and without fault is the same only when both inputs have identical values. The
combination of these two modules is correct if and only if: a = b = c = 0 or
a . b . c = 1. All other vectors give a wrong output.
[Figure: drinks dispenser model, 1) Interface, 2) Behavior. Legible labels: Coin_Entered (Value_Coin), Coin_Returned, Drink_Selected, Cancellation, Coffee_Available, Dose_Number, Amount_Provided := Amount_Provided + Value_Coin, Coins_Returned (Amount_Provided - 75c)]
At first, we define the set of all paths, we give a name to each path, and we
enumerate its states.
• Enter a coin and cancel: {1, 2, 3, 1}.
• Enter two coins and cancel: {1, 2, 2, 3, 1}. The presence of a loop from state 2
introduces an infinite number of paths; we limit the number of iterations to 1.
• Order a coffee after having entered a sufficient number of coins, if the number of
doses is greater than or equal to 1: {1, 2, 4, 5, 1}. The condition labeling the arc (4, 5)
induces a domain of values for the couple (Amount_Provided, Dose_Number). It
is necessary to take a value ≥ 75c (for example 1$) and a number of doses > 0
(for example 2). Moreover, we must apply 'limit tests', i.e. 75c and 2 doses, then
1$ and 1 dose. Consequently, 3 sequences must be defined for this path. This
situation also shows an interesting aspect dealing with the memory implied by
the variables used. Indeed, when we apply the first part of the sequence {1, 2, 3,
1}, the expected behavior is the same, whatever the past of the system.
Moreover, this behavior will have no effect on the future. On the contrary, in
order to test the behavior of the system when only one dose remains, it is
necessary to first apply sequences leading the system into the required initial state
(one dose only); these sequences are called initialization or homing sequences.
Finally, once the test consuming the last coffee dose has been performed, it will
be necessary to continue the test procedure with test sequences assuming a null
number of doses. To conclude, the various test sequences we have defined are
not independent; hence, these fragments must be scheduled in a coherent order,
maybe with extra link sub-sequences.
• Case where the user orders one coffee after entering a sufficient number of coins,
but when there are no more coffee doses: {1, 2, 4, 3, 1}. As said before, the
preceding parts of the global test sequence must have led to a situation where no
coffee doses remain. Moreover, the condition labeling the arc (4, 3) being
constituted by a Boolean expression using one OR, it is also necessary to test the
opposite situation, e.g. 'Amount_Provided < 75c', and the two simultaneous
situations, i.e. 'Amount_Provided < 75c and Dose_Number = 0'.
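The domain and limit tests discussed for arcs (4, 5) and (4, 3) can be written down as a small table of test cases. This is a sketch only: the condition is reduced to a hypothetical function `coffee_delivered`, with amounts expressed in cents.

```python
PRICE_CENTS = 75

def coffee_delivered(amount_cents, doses):
    """Condition of arc (4, 5): enough money and at least one dose left."""
    return amount_cents >= PRICE_CENTS and doses >= 1

# (amount in cents, number of doses, expected delivery)
TEST_CASES = [
    (100, 2, True),    # nominal values inside the domain
    (75, 2, True),     # limit test on the amount
    (100, 1, True),    # limit test on the number of doses
    (100, 0, False),   # arc (4, 3): no dose left
    (50, 2, False),    # arc (4, 3): insufficient amount
    (50, 0, False),    # arc (4, 3): both situations simultaneously
]

def run_tests():
    """Return the list of test cases whose outcome differs from the expectation."""
    return [case for case in TEST_CASES
            if coffee_delivered(case[0], case[1]) != case[2]]

if __name__ == "__main__":
    print("failed cases:", run_tests())
```

In a real test procedure these cases could not be run in an arbitrary order, since the number of remaining doses depends on the sequences applied before.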
From this analysis, we can deduce the various pieces of sequences associated with
each tested fragment of behavior. Table E.5 gives an example for the first case
considered. We will not develop the whole set of test sequences. Obtaining it is easy
as long as we take care of the necessary relationships between those fragments,
according to the state of the system. This job may seem to be tedious. However, it is
systematic, providing a good guarantee that the resulting functional test sequence
properly activates the whole set of possible behaviors.
Input                 Output
Coin_Entered (50c)
Cancellation
                      Coins_Returned (X$)
Table E.5. Sequence
We must check the module performing the subtraction (El - E2) with different
exponents: (El > E2), then (El < E2), positive values, then negative values. These
operations also verify the circuit which performs the 'adjust' operation (right shift of
the mantissas).
Then, we must check the circuit calculating the final 'sign' of the result. For this
purpose, we make several '+' and '-' operations with numbers having the same sign,
and finally opposite signs. The sign S of the final result must take into account the
carry coming from the +/- circuit. Hence, we consider a situation such that
M'1 > M'2 for an adding control (signal +/-): e.g. subtraction of two negative
numbers, the absolute value of the subtracted one being greater than the first one. If
the result of the circuit '+/-' is greater than 1, there is a carry, and we must perform a
normalization operation, i.e. add '1' to the exponent and make a one-figure shift to
the right of the mantissa.
Finally, the overflow situations must be considered. For example, we add two
negative numbers with maximum value positive exponents (+999 if E is expressed
with 3 digits), and such that |M1| + |M2| ≥ 1.
Exercise 10.8. Inductive formal proof
1. We must demonstrate that A1 ==> A2 when R ≥ B after the execution of R := A
and Q := 0. The second condition of A2 is evident: it is the loop assertion (R ≥ B).
As R := A and Q := 0, then Q*B + R = 0*B + A = A. So, the first condition of A2
is true.
2. We must demonstrate that A1 ==> A3 when R ≥ B is false after the execution of
R := A and Q := 0. The condition 'R ≥ B is false' implies that R < B. We have
Q*B + R = 0*B + A = A. So, the first condition of A3 is also true.
3. We must demonstrate that, when [A2 is true and R := R - B and Q := Q + 1 are
executed and then R ≥ B], then A2 is true with the new values of R and Q. Let us
note Rb and Qb the values of R and Q before the execution. The hypotheses are
A = Qb*B + Rb (relation 1), and Rb ≥ B (relation 2). After execution of the loop
statements, we obtain R = Rb - B (relation 3) and Q = Qb + 1 (relation 4). The
relation R ≥ B is true due to the loop condition. We must demonstrate that
A = Q*B + R. Relations 3 and 4 give: Q*B + R = (Qb + 1)*B + (Rb - B) = Qb*B
+ B + Rb - B = Qb*B + Rb = A (relation 1). So the first condition is
demonstrated.
4. The demonstration of A2 ==> A3 after the execution of the loop statements and
when condition R ≥ B is false is quite similar concerning the first condition.
The second condition R < B is due to the negation of the loop condition.
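The assertions used in the proof can also be checked at run time on the program itself. The sketch below assumes that the program is the classical division by repeated subtraction and that the precondition A1 is 'A ≥ 0 and B > 0'; the exact content of A1 is not restated here, so this is a labeled assumption.

```python
def divide(a, b):
    """Quotient and remainder by repeated subtraction, with checked assertions."""
    assert a >= 0 and b > 0                    # A1 (assumed precondition)
    r, q = a, 0
    while r >= b:
        assert a == q * b + r and r >= b       # A2: holds at each loop entry
        r, q = r - b, q + 1
    assert a == q * b + r and r < b            # A3: holds at loop exit
    return q, r

if __name__ == "__main__":
    print(divide(17, 5))
    print(divide(4, 5))
```

Executed assertions only confirm the property for the values tried, whereas the inductive proof establishes it for all values satisfying the precondition.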
Exercise 11.1. Component choice
Failure rate of the first structure. The failure rate is the sum of the failure rates of
the components (as these values are very small: this would not be true otherwise!):
$\lambda_1 = 12 \cdot 10^{-7} + 1 \cdot 10^{-6} + 3 \cdot 10^{-5} = 3.22 \cdot 10^{-5}$.
Failure rate of the second structure: $\lambda_2 = 4 \cdot 10^{-6}$.
Thus, the second structure has a better reliability than the first one.
Note. This exercise does not consider the influence of temperature or radiations on
the reliability of these components, or their mutual influence.
Exercise 11.2. Comparison of the reliability of two products
The two failure rates λ1 and λ2 evolve according to powers of 10. Hence, log10(λ1)
and log10(λ2) are linear in T. We deduce from this the two logarithmic equations for the
two products:
1) For λ1: $\log_{10}(\lambda_1(T)) = [\log_{10}(10\,\lambda_{01}) - \log_{10}(\lambda_{01})]\; T / (38 - 18) + b$, where $\lambda_{01} = \lambda_1(18°)$.
When T = 18°C, we have b = -59/10.
So, $\log_{10}(\lambda_1(T)) = T/20 - 59/10$.
2) The same reasoning for λ2 gives $\log_{10}(\lambda_2(T)) = T/10 - 88/10$.
Finally, we deduce T for λ1 = λ2: from 58°C the reliability of P1 becomes better
than the reliability of P2.
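The crossover temperature can be verified by solving the two linear equations. This is a minimal sketch using the two logarithmic laws T/20 - 59/10 and T/10 - 88/10; the function names are illustrative.

```python
def log10_lambda1(t):
    """log10 of the failure rate of product P1 at temperature t (in °C)."""
    return t / 20 - 59 / 10

def log10_lambda2(t):
    """log10 of the failure rate of product P2 at temperature t (in °C)."""
    return t / 10 - 88 / 10

def crossover_temperature():
    # Solve t/20 - 59/10 = t/10 - 88/10 for t.
    return (88 / 10 - 59 / 10) / (1 / 10 - 1 / 20)

if __name__ == "__main__":
    t = crossover_temperature()
    print("crossover at about", round(t, 6), "°C")
    for temp in (40, 70):
        better = "P1" if log10_lambda1(temp) < log10_lambda2(temp) else "P2"
        print(temp, "°C: lower failure rate for", better)
```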
Exercise 11.3. Shared FIFO
1. The result is hazardous, as the data structure (array and indexes) is shared (same
situation as in sub-section 11.2.2.2). Moreover, the data structure value may be
incoherent. For instance, consider the scheduling described in Figure E.17,
where WI expresses the variable Write_Index.
[Figure E.17. Scheduling of the tasks along the time axis]
Let us show that the result is hazardous. On the left side of Figure E.18, the result is
unchanged at the end of the execution of Task1 then Task2. This is the expected
result, as the value of I is incremented and decremented. On the contrary, on the
right side of Figure E.18, each task saves and restores its own context (in a local
Task Control Block) at each task switch. After the execution of the second line of
Task1 (Inc AX), this value is not transmitted to Task2, which decreases its own copy
of AX when executing its second line. Thus, the final value of I is incorrect.
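The effect of the context switch can be replayed deterministically. The sketch below assumes that each task is a three-instruction sequence (load I into AX, increment or decrement AX, store AX into I) and uses a hypothetical instruction-by-instruction interleaving; the exact scheduling of Figure E.18 is not reproduced.

```python
def simulate(schedule, save_context, initial=5):
    """Run Task1 (increments I) and Task2 (decrements I) one instruction at a time.

    schedule: task names, one entry per executed instruction.
    save_context: if True, each task works on its own saved copy of AX.
    """
    memory = {"I": initial}
    shared_regs = {"AX": 0}
    private_regs = {"Task1": {"AX": 0}, "Task2": {"AX": 0}}
    delta = {"Task1": +1, "Task2": -1}
    step = {"Task1": 0, "Task2": 0}
    for task in schedule:
        regs = private_regs[task] if save_context else shared_regs
        if step[task] == 0:
            regs["AX"] = memory["I"]       # first line: load I into AX
        elif step[task] == 1:
            regs["AX"] += delta[task]      # second line: Inc AX / Dec AX
        else:
            memory["I"] = regs["AX"]       # third line: store AX into I
        step[task] += 1
    return memory["I"]

if __name__ == "__main__":
    interleaved = ["Task1", "Task2"] * 3
    sequential = ["Task1"] * 3 + ["Task2"] * 3
    print(simulate(sequential, save_context=True))    # I is unchanged
    print(simulate(interleaved, save_context=True))   # one update is lost
```

With private register copies, each task stores a value computed from a stale read of I, so the last store overwrites the other task's update.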
Exercise 12.1. Signature testing
1. The sequential treatment of the binary flow comprises 64 (i.e. 1024 / 16) XOR
operations on consecutive 16-bit words. If we suppose that the signature of the
faultless circuit is known, a multiple error altering one or several words is
detectable if and only if at least one bit position of the word is modified an odd
number of times over the 64 words. In terms of the output stream, this corresponds
to erroneous bits that occur an odd number of times at the same position
modulo 16. For example, a multiple error altering bits 1, 15, 65, 121 is
detectable: errors 1 and 65 neutralize each other, but errors 15 and 121 are each
detectable. All functional or technological faults producing such errors are thus
detected. All other faults are undetectable.
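The masking effect can be observed with a direct implementation of the XOR signature. This is a sketch under the assumption that bit i of the stream falls at position i mod 16 of its word; the random reference stream and the function names are illustrative.

```python
import random

WORD_BITS, STREAM_BITS = 16, 1024

def signature(bits):
    """XOR of the consecutive 16-bit words of the stream (a list of 0/1 values)."""
    sig = [0] * WORD_BITS
    for index, bit in enumerate(bits):
        sig[index % WORD_BITS] ^= bit
    return sig

def is_detected(error_positions, seed=0):
    """Flip the given bits of a reference stream and compare the two signatures."""
    rng = random.Random(seed)
    good = [rng.randint(0, 1) for _ in range(STREAM_BITS)]
    bad = list(good)
    for pos in error_positions:
        bad[pos] ^= 1
    return signature(good) != signature(bad)

if __name__ == "__main__":
    print(is_detected([1, 65]))             # same position in two words: masked
    print(is_detected([1, 15, 65, 121]))    # bits 15 and 121 remain visible
```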
2. Without any knowledge about the electronics implementation of this system
(gate or MOS structure), one cannot deduce any class of technological faults that
produces the preceding errors.
The NOR gate is symmetrical, according to its inputs a, b and c (Figure E.19).
Consequently, the study can be reduced to the case of one input only, e.g. a. The
other inputs (b and c) must be set to the value '0' in order to let the error pass to the
output. If an input is set to the value '1', it forces the output to the value '0'.
1. Optimal test sequence. There is only one optimal test sequence comprising 4
test vectors: <000, 100, 010, 001>.
2. Coverage. Each input vector covers some stuck-at faults of the I/O lines. Table
E.6 shows the fault coverage of each input vector.
We notice that some input vectors have a very small coverage; they should not
be taken to test this circuit; thus, (011), (101), (110), and (111) test the stuck-at 1
of line d only. On the contrary, the vector (000) covers half of the stuck-at 0/1 faults.
Input vectors   Test coverage
abc             a   b   c   d
000             1   1   1   0
001             -   -   0   1
010             -   0   -   1
011             -   -   -   1
100             0   -   -   1
101             -   -   -   1
110             -   -   -   1
111             -   -   -   1
Table E.6. Fault coverage
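The fault coverage of each vector can be recomputed by simulating the gate with each single stuck-at fault injected. This is a minimal sketch for the 3-input NOR gate with output d; the function names are illustrative.

```python
LINES = ("a", "b", "c", "d")
FAULTS = [(line, value) for line in LINES for value in (0, 1)]

def nor3(a, b, c, fault=None):
    """3-input NOR gate; fault is None or a (line, stuck_value) pair."""
    values = {"a": a, "b": b, "c": c}
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]            # stuck input line
    d = 0 if (values["a"] or values["b"] or values["c"]) else 1
    if fault and fault[0] == "d":
        d = fault[1]                           # stuck output line
    return d

def detected_by(vector):
    """Single stuck-at faults revealed at output d by one input vector."""
    return {f for f in FAULTS if nor3(*vector, fault=f) != nor3(*vector)}

def coverage(sequence):
    """Union of the faults detected by the vectors of a test sequence."""
    found = set()
    for vector in sequence:
        found |= detected_by(vector)
    return found

if __name__ == "__main__":
    optimal = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    toggle = [(0, 0, 0), (1, 1, 1)]
    print(len(coverage(optimal)), "faults out of", len(FAULTS))
    print(len(coverage(toggle)), "faults out of", len(FAULTS))
```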
Figure E.20 shows the coverage curves of: 1) the exhaustive sequence, 2) the
optimal sequence, and 3) the very simple toggle test sequence <000, 111>.
[Figure E.20. Fault coverage (0 to 8 faults, 8 = 100%) against the input vectors, for the exhaustive sequence, the optimal sequence and the toggle sequence]
We observe that all stuck-at 0 faults are detected by distinct test vectors. Hence, they
can all be distinguished by the sequence <000, 001, 010, 100>. All these faults can
also be distinguished from the stuck-at 1 fault of the output. We also observe that all
stuck-at 1 faults of the inputs and the stuck-at 0 of the output are detected by the
vector (000) only. Consequently, they cannot be distinguished from the outside.
Hence, they are said to be equivalent.
Exercise 12.6. Optimal test sequence
Figure E.21 recalls the gate structure of the circuit. Input vector 1 (respectively 2)
applies 11 to gate A (respectively B), and 10 (respectively 01) to gate C (see Table
E.7). Hence, any stuck-at 0 fault is detected: it is activated as an error, and the error
is propagated to f. Let us note that the input vector 111 would apply 11 simultaneously
to gates A and B, but no stuck-at 0 of these gates would be observable on f. Let us
also note that when A (or B) receives 11, B (or A) receives one of the vectors 10 or 01;
unfortunately, these vectors cannot be 'counted' as belonging to the minimal AND
test sequence, because gate C will not propagate any error coming from B (or A).
Hence, input vectors 3 and 4 are necessary to apply the missing configurations: 01
and 10 to the AND gates, and 00 to the OR gate. These vectors will detect all the
stuck-at 1 faults: activation as an erroneous '1', and propagation of this
erroneous '1' to the output. The optimal test sequence has 4 vectors: <110, 011, 010,
101>.
            Inputs   Gate A      Gate B      Gate C
Vector      a b c    01 10 11    01 10 11    00 01 10
1           1 1 0          X                       X
2           0 1 1                      X        X
3           0 1 0    X              X        X
4           1 0 1       X        X           X
Table E.7. Optimal test sequence
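The bookkeeping of Table E.7 can be checked mechanically. The following Python sketch assumes the structure implied by the text, namely A = AND(a, b), B = AND(b, c) and C = OR(A, B); this structure and the helper name gate_inputs are assumptions, since Figure E.21 is not reproduced here.

```python
def gate_inputs(a, b, c):
    """Return the input pair seen by gates A, B and C (assumed structure)."""
    out_a, out_b = a & b, b & c
    return {"A": (a, b), "B": (b, c), "C": (out_a, out_b)}

# Optimal sequence <110, 011, 010, 101> from the text.
sequence = [(1, 1, 0), (0, 1, 1), (0, 1, 0), (1, 0, 1)]
configs = [gate_inputs(*v) for v in sequence]

# Configurations received by each gate over the whole sequence.
seen = {g: {cfg[g] for cfg in configs} for g in "ABC"}
```

Under this assumption each AND gate receives 01, 10 and 11, and the OR gate receives 00, 01 and 10, while the vector 111 would feed 11 to gate C and hide a stuck-at 0 of A or B.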
ST1 ST2
e 0110 011011001
q 2431 243124231
s 1010 101010110
Table E.8. Correct and erroneous functions
Table E.9 shows the results obtained. There are three 'best test vectors': 010, 100
and 110. Each one covers 4 faults. The input vector having the lowest coverage
is 111, with only 1 fault detected.
2. Minimal test sequence. Some faults are detected by one test vector only; it is
the case of faults 1¹ (vector 010), 1⁰ (vector 110), 2¹ (vector 100), and 2⁰ (vector
110). Hence, the 3 vectors 010, 100 and 110 belong to any minimal test
sequence. There are three minimal-length test sequences, for example TS = <001,
010, 100, 110>.
abc 1 2 3 4 5
000 - - 1 1 1
001 - - 0 - 0
010 1 - 1 1 1
011 - - 0 - 0
100 - 1 1 1 1
101 - - 0 - 0
110 0 0 - 0 0
111 - - - - 0
Table E.9. Fault table
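The search for minimal test sequences can be reproduced by brute force from Table E.9. The sketch below is only an illustration: the dictionary encoding of the table and the function names are choices made here, not part of the exercise.

```python
from itertools import combinations

# Table E.9: for each input vector, the stuck-at value detected on lines 1..5
# ('-' means that no fault of that line is detected by the vector).
FAULT_TABLE = {
    "000": "--111",
    "001": "--0-0",
    "010": "1-111",
    "011": "--0-0",
    "100": "-1111",
    "101": "--0-0",
    "110": "00-00",
    "111": "----0",
}

def detected(vector):
    """Set of (line, stuck_value) faults detected by one vector."""
    return {(i + 1, int(v)) for i, v in enumerate(FAULT_TABLE[vector]) if v != "-"}

ALL_FAULTS = set().union(*(detected(v) for v in FAULT_TABLE))

def minimal_sequences():
    """Smallest sets of vectors covering every detectable fault (brute force)."""
    for size in range(1, len(FAULT_TABLE) + 1):
        found = [c for c in combinations(FAULT_TABLE, size)
                 if set().union(*(detected(v) for v in c)) == ALL_FAULTS]
        if found:
            return found
    return []
```

Calling minimal_sequences() returns the three 4-vector sequences, each made of 010, 100 and 110 plus one of 001, 011 or 101.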
Then, the fault is propagated to the output with the same exploration procedure until
a solution is found (if any test vector exists). Obviously, this very simple technique
can be long to converge towards a solution if the first possible test vector has many
'1' values for a, b, c, etc.
Let us now complete the given procedure:
Now, the objective is to propagate the error through gate E. A propagation
towards gate E of the known values is performed (if it has not already been done):
E = 0, so the error cannot be transformed into a failure on output f. We backtrack
in the input assignment.
Input c is set to 'x', and input b is switched to '1', and a propagation is
performed: the fault remains passive.
Input c is set to '0', and a propagation is performed: A = 1, B = 0, C = 1, hence
the fault is activated.
Now, the objective is to propagate the error through gate E. Input d is set to '0',
which forces output f to '1': this is a case of inconsistency, so we go backwards.
Input d is switched to '1', and the error is finally propagated to f.
Thus, we obtain the test vector: (a b c d) = (1 1 0 1).
2. This procedure makes a backward propagation along one path from the fault
activation. At a given gate, if several vectors satisfy the desired output value, we
choose the easiest path only (the closest path to the primary inputs). If several
inputs must be set (to '0' or '1'), we choose the hardest path first (having the
highest number of gates to the primary inputs). As usual, when the fault is
activated as an error, we try to propagate the error along a path. The whole process
uses a backtracking technique in case of inconsistency. This method is close to
the PODEM algorithm.
Input b is switched to 1, and we perform a propagation action: it brings nothing.
Input c is set to '0', and we perform a propagation: A = 1, C = 0, so the fault is
activated.
Now, the objective is to propagate the error through gate E. Input d is set to '1'.
We obtain the same test vector (a b c d) = (1 1 0 1). It is the only vector which
detects the fault.
Let us note that this procedure is not pertinent for this circuit. Indeed, at step 5 we
have chosen to set b to '0' in order to force A = 1; this led to an inconsistency. Then,
we have abandoned this path to try another one. Instead, at this point, we can try the
second way to have A = 1, which is to set c to '0'. Then, the procedure sets a and b
to '1'. Thus, the test vector is rapidly found.
Exercise 13.4. Fault coverage of a test vector
1. The structural analysis of the circuit gives the fault detection table shown in
Table E.10: detection at output f, and at output g.
Note about 'reconvergent fanout' structures. We observe that the stuck-at 1 of line
2 is not detected on output f: this fault produces an error on line 9 and an opposite
error on line 10, these errors being neutralized by gate 13. We also note that the
stuck-at 1 of line 3 is detected on output f: it produces two identical errors on lines
13 and 15, which propagate through the output gate giving f. This same fault is also
detected on output g: it produces two identical errors on lines 15 and 16, which
propagate to the output g.
abcd 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1001 detection on f - - 1 - 1 1 1 - 0 0 0 0 0 0
detection on g - 1 - - - - - - - - 1
2. This test vector covers 11 of the 36 possible faults. The theoretical maximum
coverage of a test vector is 18. Now, we can try to find the best test vector by
analyzing the structure of the circuit. We know that the best test of AND, OR,
NAND and NOR gates is obtained when their inputs take the neutral element
value. The worst case is when all their inputs take the opposite values. The
circuit we consider is made of a mix of AND, NAND, OR, and NOR gates; it is
easy to see that no input vector will apply the optimal configuration to each gate.
So no test vector will cover 50% of the faults. A good exploitation of all these
local constraints is given by the input vector (0101), which covers 13 faults.
Exercise 13.5. Diagnosis of a circuit
1. Faults 2⁰ and 5⁰ are activated by the same constraint: b = 1. Fault 11¹ is activated
by b = 0, or by b = 0 and c = 1. Hence, we can separate these two groups of
faults by applying test vectors with b = 1, and test vectors with bc = 01. A first
test vector could be (a b c d) = (- 0 1 1), which detects fault 11¹ at output g. Now,
we must try to distinguish faults 2⁰ and 5⁰. Fault 2⁰ can only be detected on f by
applying the input vector (1110). Faults 2⁰ and 5⁰ can be detected on f (through
the path 11 - 10 - 13 - 17) by the input vectors (010-). So, here is an example of
diagnosis sequence: DS = <0011, 0100, 1110>.
The related fault tree is drawn in Figure E.24. It gives all the information necessary
to diagnose one of these faults. For instance, if the signature corresponding to
the application of the test sequence is <OK, f KO, f KO>, then the identified
fault is 2⁰.
2. We first determine all faults detected by vectors (1000), (1001), and (0110) on
outputs f, g and both. This step is achieved by using the backward analysis
method presented in Chapter 13. We obtain the partial fault table of Table E.11.
Note that some faults are detected on f only, on g only, or on both outputs.
From this table, we deduce the fault tree (Figure E.25) corresponding to the test
sequence: TS = <1000, 1001, 0110>.
T abcd 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
T1 1000 f - - 1 - 1 1 - - - 0 0 0 0 0 - - 0 -
g 1
T2 1001 f - - 1 - 1 1 1 - - 0 - 0 0 0 - 0 0 -
g 1 1
T3 0110 f 1 - - - - - - - 1 1 1 - 1 - - - 1 -
g 1 0 0 1
Table E. II . Partial fault table
3. The diagnosis power of this sequence is not good. Many faults belonging to the
resulting fault classes can easily be distinguished by adding other test vectors.
[Figure E.25. Fault tree of the test sequence TS = <1000, 1001, 0110>, followed by a
circuit diagram with inputs a, b, c and output S]
1. Functional test sequence. A very simple functional sequence makes one
addition with (S C) = (00), and one addition with (S C) = (11). This sequence is
TS1 = <000, 111>. The first two lines of Table E.12 show the faults detected by this
sequence. These faults have been determined by the structural method proposed
in Chapter 13, applied to the logical structure (Figure E.27).
Test sequence  abc   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Functional     000   1  1  -  1  1  -  1  1  -  1  1  -  0  0  1  1
               111   0  0  0  0  0  0  1  1  -  0  0  -  1  -  0  0
Toggle         101   0  0  -  1  1  -  0  0  0  0  0  0  -  1  1  0
Complete       001   1  1  -  1  1  -  1  1  1  0  0  -  0  0  0  1
               010   1  1  1  0  0  -  0  0  -  1  1  1  0  0  0  1
               100   0  0  -  1  1  -  0  0  0  0  0  0  -  1  1  0
Table E.12. Test coverage
2. Toggle test. In order that each line takes values '0' and '1', we add the test
vector 101; hence, the sequence becomes: TS2 =<000, 111, 101>. The third line
of the table shows faults detected by this new vector. We observe that TS2 does
not detect 4 faults, confirming the fact that a toggle test is generally not sufficient
to test every stuck-at fault.
3. Complete test sequence. Faults not detected by sequence TS2 are 3¹, 6¹, 9¹ and
12¹. To detect these faults, we must add three test vectors to TS2: 001, 010 and
100. The resulting complete test sequence then has 6 vectors: TS3 = <000, 111,
101, 001, 010, 100>. This complete test sequence is not optimal in terms of
number of test vectors. An optimal sequence is made of 5 test vectors, such as:
TSop = <001, 010, 100, 110, 101>.
abc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
OOOS 1 1 1 I I 1 I I I
C - - - - 0 0 I
I lOS 1 1 0 0 0 0 1 I 0
C 1 1 - - I I 0 0 I
I I IS 0 0 0 0 I 1 0 0 0
c 0 0 - - I - 0
Table E.13. Fault table of the full-adder
Then, we deduce the fault tree, drawn in Figure E.28, allowing the diagnosis of the
test sequence <000, 010, 111>. To simplify the representation, impossible
situations are not represented. This tree partitions the 33 possibilities (32 faults +
one good state) into 11 groups. All the elements belonging to a group cannot be
distinguished by the sequence (they are said to be equivalent with regard to this
sequence). In particular, it is not possible to answer the question: "is the circuit
faultless?".
abc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
000 1 - - 1 - - - - - 0 1 - 1 1 0 1
001 1 1 - 1 - 1 - - - 0 1 - 1 1 0 1
010 1 - 1 1 - - - 1 - 0 1 - 1 1 0 1
011 - 0 0 - - - 0 0 - - 0 - 0 - 0 0
100 0 - - 0 - - - - - 1 1 1 1 1 1 1
101 0 1 - 0 - 1 1 - - 1 1 1 1 1 1 1
110 0 - 1 0 - - - 1 1 1 1 1 1 1 1 1
111 - 0 0 - - 0 - 0 - - 0 0 - - 0 0
As the domain of initial_temperature is not discrete, we test three
values: -50, 30, 150, and the limits and 90. These values are combined with the
discrete values of the other parameters.
3. When this sequence is applied, the coverage depends on the elements considered.
For a statement testing, the coverage is not 100%. Indeed, the default part of the
switch statement of the function modify_temperature is never executed.
Exercise 13.14. MC/DC testing of a program
Table E.15 gives a sequence of Boolean values for Condition/Decision, compatible
with the requirements of MC/DC testing.
Condition Decision
A=B C2 D>3 Action
True True True True
False True True False
True True True True
True False True False
True True True True
True True False False
True True True True
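The MC/DC requirement (each condition is shown to affect the decision independently) can be checked on the rows of the table. In this Python sketch the row encoding and the function name independently_shown are illustrative assumptions; the three tuple positions stand for the three condition columns of the table.

```python
# Rows of Table E.15: (condition 1, condition 2, condition 3) -> decision outcome.
ROWS = [
    ((True, True, True), True),
    ((False, True, True), False),
    ((True, True, True), True),
    ((True, False, True), False),
    ((True, True, True), True),
    ((True, True, False), False),
    ((True, True, True), True),
]

def independently_shown(rows, index):
    """True if two rows differ only in condition `index` and in the outcome."""
    for conds_1, out_1 in rows:
        for conds_2, out_2 in rows:
            differing = [i for i in range(len(conds_1)) if conds_1[i] != conds_2[i]]
            if differing == [index] and out_1 != out_2:
                return True
    return False

mcdc_satisfied = all(independently_shown(ROWS, i) for i in range(3))
```

With the rows above, every condition has such a pair, so the sequence meets the criterion; removing the row where a given condition is False breaks it for that condition.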
[Figure E.30. Modified circuit, with the added test outputs ST1 and ST2]
We suppose that new inputs and outputs can be added to the circuit, in order to
increase its controllability and observability. Figure E.30 shows the modifications
proposed to cut the feedback line between the two modules and to directly observe
the outputs of each module.
Hence, two inputs (IT1 and IT2) and two outputs (ST1 and ST2) have been added.
When IT1 and IT2 take the value '0', we block any uncontrolled evolution of the
circuit. Hence, each module can be directly accessed.
Exercise 14.2. Analysis of a redundant circuit
1. We have: f1 = a' c, f2 = a' c, f3 = a + c'. The output variables do not depend on
input b: thus, this circuit is redundant. In particular, the stuck-at 1 fault shown in
Figure E.31 cannot be detected from the inputs/outputs; indeed, to test this fault,
we must satisfy:
• the controllability constraint: b = 0,
• the observability constraints: a = 0 and c = 1; this produces a '1' on line x, a '1'
at the output of gate A, and finally a '1' at output f1, which masks any detection
of the fault.
2. The second circuit has a different logical behavior: f1 = a' b' c + a b + b c',
f2 = a' c, f3 = a + c'. The output f1 is a logical function of input b. Obviously,
this circuit could have been realized with a SIGMA-PI structure; however, it is a
totally testable circuit.
[Circuit diagram: inputs a, b, c; outputs f2, f3]
2. The anti-glitch circuit can easily be modified as shown in Figure E.32 in order to
make it completely testable. When T = 0, the outputs of gates B and C are equal
to '0', and we can test all stuck-at 0 faults of gate A.
a b c T   comments
0 1 1 0
0 1 0 1   these three vectors
0 0 1 1   detect all stuck-at '1' faults
1 1 1 1   detects all stuck-at '0' faults
Table E.16. Complete test sequence
[Figure E.33. Circuit with inputs b, c, a, d and output a ⊕ bc ⊕ ad]
2. We determine the logical expression of this circuit (see Figure E.33), and we
compare this expression with the given SIGMA-PI expression. To facilitate this
logical comparison, we may use an intermediate verification model such as a
truth table (which corresponds to a canonical logical form).
3. A XOR network has a very interesting property concerning error detection: any
single error occurring on one input is automatically transmitted to the output.
Indeed, a single input error changes the parity of the number of '1' inputs. As a
XOR network produces an output '1' if and only if an odd number of '1' values is
applied to it, any change in the input parity of '1' values provokes a modification
of the output. The 4 vectors of sequence TS1 apply to each AND gate the three
testing configurations 11, 01 and 10. Hence, every fault of each AND gate is
activated as an error which enters the XOR network, and is consequently
propagated to the final output where it can be observed.
4. Electronics specialists have shown that a 2-input XOR gate is fully tested by the
exhaustive input test sequence only. With the previous TS1 sequence, this
property is not satisfied. It is very easy to verify that the proposed 5-vector TS2
sequence applies the 4 input configurations to any XOR gate.
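The propagation property of a XOR network described in point 3 can be confirmed exhaustively for a small number of inputs. This is a minimal Python sketch; the function names are illustrative and the network is modeled only by its parity function, not by its gate structure.

```python
from functools import reduce
from itertools import product
from operator import xor

def xor_network(bits):
    """Output of a XOR tree: 1 if and only if an odd number of inputs are 1."""
    return reduce(xor, bits, 0)

def single_errors_always_propagate(n):
    """Check that flipping any one input always flips the output."""
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            faulty = list(bits)
            faulty[i] ^= 1
            if xor_network(faulty) == xor_network(bits):
                return False
    return True
```

The check holds for any n, which is why an error produced by one AND gate always reaches the final output.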
[Figure: AND network (AND-Net) with inputs a, b, c and control inputs x y, shift
register, and the AND parity and OR parity outputs]
3. Test sequence. It is made up of two parts: one sequence of 6 vectors to test the
AND network, and a sequence of 4 vectors to test the OR network.
AND network test. xy = 01 → TS1 = <011, 101, 110>.
xy = 10 → TS2 = <100, 010, 001>.
For example, the test vector 011 forces the first line of the OR matrix to take value
'0', all other lines being at '1'. Thus, the 4 product terms take the values 1011;
hence, if no faults are present in the AND network, the parity error line is at '1'. Any
fault equivalent to a stuck-at 0 of one active node (represented by a dot in the figure)
belonging to columns 1, 3 and 4 is detected.
OR network test. A '1' bit is shifted 4 times from left to right in the shift register
(scan in input). All single or multiple permanent hardware faults are detected, apart
from the ones that do not modify the parity property of the AND parity vector (4
bits) and the OR parity vector.
Exercise 14.7. Scan Design
For each test vector Vi:
• The circuit is switched to Test Mode.
• A serial input operation through the Scan In input is performed, in order to load
in the state register the 4-bit state belonging to the Vi test vector. This state
loading operation takes 4 clock couples (HM - HE); in parallel, the state register
containing the result from the previous test vector is read.
• The circuit returns to the Normal Mode, and one normal treatment step is
executed with one clock pulse (HM - HE).
A last reading of the internal state register completes the test sequence.
Exercise 14.8. LFSR
1. This generator elaborates a deterministic cyclic sequence of 3-bit vectors. It is
based on a synchronous shift register whose input is the XOR of bits 1 and 3.
Hence, the initial condition gives the starting point of the sequence produced;
this sequence is shown in Table E.17.
Clock Q1Q2Q3
0 010
1 001
2 100
3 110
4 111
5 011
6 101
Table E.17. Sequence
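The sequence of Table E.17 can be regenerated with a few lines of Python. The sketch assumes that the register shifts from Q1 towards Q3 and that the XOR of the tapped bits feeds Q1, which is the reading consistent with the table; the function names are illustrative.

```python
def lfsr_sequence(state, feedback, steps):
    """Shift (Q1, Q2, Q3) one place; the new Q1 is the XOR of the tapped bits."""
    states = [state]
    for _ in range(steps):
        new_q1 = 0
        for tap in feedback:          # taps are 1-based bit numbers
            new_q1 ^= state[tap - 1]
        state = (new_q1, state[0], state[1])
        states.append(state)
    return states

def as_text(states):
    """Render each state as a 3-character string such as '010'."""
    return ["".join(map(str, s)) for s in states]

# Feedback = XOR of bits 1 and 3, initial state 010, as in Table E.17.
table_e17 = as_text(lfsr_sequence((0, 1, 0), feedback=(1, 3), steps=6))
```

One more step brings the register back to 010, so the cycle has length 7. With feedback=(1, 2, 3) and the initial state 111 the same function never leaves 111.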
2. The modified circuit behaves as a LFSR. From the initial state 111, we obtain the
following cyclic output sequence: <111, 011, 001, 001, 100, 010, 110>.
Let us note that the LFSR property is not guaranteed for any XOR feedback
function. For example, if we take the XOR of bits Q1, Q2 and Q3, and if the
initial state is 111, then the circuit always remains in this state.
3. We analyze in Figure E.35 the evolution of the circuit from the initial state 100
when the first input vector 111 is received. Values in gray are the next state of
the register. This study can easily be extended to the rest of the applied sequence.
FOURTH PART
Multiple errors with rank higher than 2 are not all detected. For example, if
y1, y2 and y3 are false, the resulting syndrome is equal to '0', hence this triple
error is not detected. If only single errors occur, they are detected. Moreover, a
non-null decimal value of the syndrome indicates the position of the erroneous
bit, as shown in Table E.18: this property justifies the chosen relation order.
2. Any 'double' error is confused with a 'single' error when considering the value
taken by the syndrome. For example, we have shown that the double error
altering y3 and y6 produces the syndrome value s = (1 0 1): this error has the same
effect as a single error altering y5.
3. This code is very close to the one presented in Example 15-4. Indeed, it
corresponds to a simple re-organization of the coding relations. Consequently,
both codes have the same detecting and correcting capability. The interest of the
version of this exercise is only to facilitate the identification of the erroneous bit.
4. In order to allow the detection of single and double errors, and to allow the
correction of single errors, we add an eighth redundant bit obtained by the ⊕ of all
the bits. This redundant bit adds a fourth control relation:
y1 ⊕ y2 ⊕ y3 ⊕ y4 ⊕ y5 ⊕ y6 ⊕ y7 ⊕ y8 = 0 (4)
This relation produces the fourth bit of the syndrome, s4. Thanks to this fourth
relation, we can distinguish between any single error, which leads to s4 = 1, and
any double error, which maintains s4 = 0. This new code is called the modified
Hamming code C(8, 4).
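The syndrome arguments of this exercise can be replayed numerically. The sketch below assumes that column i of the control matrix is the binary writing of i (which is what makes the decimal syndrome point to the erroneous bit), so that the syndrome of an error pattern is the XOR of the erroneous positions; the function names are illustrative.

```python
def syndrome(error_positions):
    """Decimal syndrome of an error pattern (XOR of the erroneous positions).

    Assumption: column i of the control matrix is i written in binary.
    """
    s = 0
    for position in error_positions:
        s ^= position
    return s

def overall_parity(error_positions):
    """Fourth syndrome bit s4 of the modified code: parity of the error count."""
    return len(set(error_positions)) % 2
```

For instance, syndrome([3, 6]) and syndrome([5]) both give 5, that is (1 0 1), while overall_parity separates the two cases (0 for the double error, 1 for the single one).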
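$$
G = \begin{pmatrix}
1&1&1&0&0&0&0 \\
1&0&0&1&1&0&0 \\
0&1&0&1&0&1&0 \\
1&1&0&1&0&0&1
\end{pmatrix},
\qquad
H = \begin{pmatrix}
0&0&0&1&1&1&1 \\
0&1&1&0&0&1&1 \\
1&0&1&0&1&0&1
\end{pmatrix}.
$$
We verify that H.G^T = 0:
$$
\begin{pmatrix}
0&0&0&1&1&1&1 \\
0&1&1&0&0&1&1 \\
1&0&1&0&1&0&1
\end{pmatrix}
\cdot
\begin{pmatrix}
1&1&0&1 \\
1&0&1&1 \\
1&0&0&0 \\
0&1&1&1 \\
0&1&0&0 \\
0&0&1&0 \\
0&0&0&1
\end{pmatrix}
=
\begin{pmatrix}
0&0&0&0 \\
0&0&0&0 \\
0&0&0&0
\end{pmatrix}.
$$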
2. Coding: Y = U.G, i.e. [y1, y2, y3, y4, y5, y6, y7] = [u1, u2, u3, u4] . G.
For example, if U = [1 1 0 1], then Y = [1 0 1 0 1 0 1].
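The coding example can be checked with a short Python sketch over GF(2). The rows of G used here are read from the exercise (two of them through the transposed matrix of the verification, because the direct print is partly illegible), so they should be taken as a reconstruction; encode and check are illustrative names.

```python
G = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def encode(u):
    """Codeword y = u.G over GF(2)."""
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def check(y):
    """Syndrome H.y^T over GF(2); all zeros for a codeword."""
    return [sum(h[j] * y[j] for j in range(7)) % 2 for h in H]
```

encode([1, 1, 0, 1]) gives [1, 0, 1, 0, 1, 0, 1], and check returns [0, 0, 0] for every row of G, which is the property H.G^T = 0.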
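During this phase, the state of the 3 D flip-flops, initially at '0', evolves as shown
in Table E.19. Then, the content of the register is shifted to the output Y. Hence, the
codeword is y = (0100011), corresponding to the polynomial: $y(x) = x + x^5 + x^6$.
Now, let us calculate the codeword by a direct division of $x^{n-k} u(x)$ by g(x), which
gives $x^{n-k} u(x) + r(x)$. With $g(x) = x^3 + x + 1$ and $x^{n-k} u(x) = x^6 + x^5$:
$$
\begin{aligned}
x^6 + x^5 &= x^3\, g(x) + (x^5 + x^4 + x^3), \\
x^5 + x^4 + x^3 &= x^2\, g(x) + (x^4 + x^2), \\
x^4 + x^2 &= x\, g(x) + x,
\end{aligned}
$$
so the quotient is $x^3 + x^2 + x$ and the remainder is $r(x) = x$.
We verify that
$$
[0\ 1\ 0\ 0\ 0\ 1\ 1] = [0\ 0\ 1\ 1] \cdot
\begin{pmatrix}
1&1&0&1&0&0&0 \\
0&1&1&0&1&0&0 \\
1&1&1&0&0&1&0 \\
1&0&1&0&0&0&1
\end{pmatrix}.
$$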
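The polynomial division can also be carried out on integers, bit i holding the coefficient of x^i. This Python sketch is an illustration under that encoding; poly_mod and cyclic_encode are names chosen here.

```python
def poly_mod(dividend, divisor):
    """Remainder of a GF(2) polynomial division; bit i is the x^i coefficient."""
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        dividend ^= divisor << shift
    return dividend

def cyclic_encode(u, g, n, k):
    """Systematic codeword x^(n-k).u(x) + r(x)."""
    shifted = u << (n - k)
    return shifted | poly_mod(shifted, g)

G_POLY = 0b1011            # g(x) = x^3 + x + 1
U_POLY = 0b1100            # u(x) = x^3 + x^2, i.e. u = (0 0 1 1) from u0 to u3
codeword = cyclic_encode(U_POLY, G_POLY, n=7, k=4)
```

Reading the bits of codeword from x^0 to x^6 gives 0100011, the word obtained with the shift register.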
Exercise 15.5. Single parity bidimensional code
This code uses redundancy at two levels: Longitudinal Redundancy Checking bits
(noted LRC) are added to each word, and Vertical Redundancy Check words are
added to the block (VRC). Each row and each column of the coded matrix belongs
to an error detecting and correcting code.
1. One parity bit is added to each word (LRC) , and a word (VRC) is added to the
block. Table E.20 gives an example of coding with p = 5, k = 4. After treatment
(e.g. a memory storage), a parity check is applied to each word, and a parity
check is made between all words. Any erroneous row or column is recorded.
2. Any single or multiple error is detectable if an odd number of errors occurs on
at least one row or one column. It is obviously the case for any odd multiple error.
It is also the case for some even multiple errors; for example, a quadruple error on
a same word will be detected four times.
3. To be undetectable, an error must have an even rank on each altered row and
each altered column. For example, the quadruple error altering the bits of rows 1
and 3 and columns 2 and 4 cannot be detected, as no parity violation occurs.
4. Any error detected on rows 4 and 5 and columns 2 and 3 is a double error. Two
errors can produce this signature (Table E.21), but we cannot identify which one
is present:
X     R      X     R
abcd  efg    abcd  efg
0000  100    1000  011
0001  011    1001  010
0010  011    1010  010
0011  010    1011  001
0100  011    1100  010
0101  010    1101  001
0110  010    1110  001
0111  001    1111  000
Table E.22. Berger code for k = 4
2. The Berger code is separable and can thus be structured into two fields (X, R),
where X is the word before coding and R the redundant field. R is the binary
number of '0' bits of X. Let us first consider a unidirectional fault that increases
the number of '0' bits of the complete word. Three cases can be considered:
• If the X field only is altered, the number of '0' of X is increased and becomes
greater than the value in R: hence, this error is detected, as NbZero (X) > R,
• If the R field only is altered, the value of R decreases whilst the real number of
'0' bits of X is not modified: here also this error is detected, as R < NbZero (X),
• If X and R are both altered, the value of R becomes again smaller than the
number of '0' of X which has increased, so the error is detected.
The reasoning is similar with a unidirectional error that reduces the number of
'0' bits of the complete word.
3. Now, R is the binary expression of the number of '1' in X. We follow the same
reasoning as in the previous question with first an error that increases the number
of '0' bits of the complete word:
• If the X field only is altered, the number of '1' of X is decreased and becomes
smaller than the value in R: hence, this error is detected, as NbOne (X) < R,
• If the R field only is altered, the value of R decreases whilst the real number of
'1' bits of X remains unchanged: here also this error is detected, as R < NbOne
(X),
• If X and R are both altered, the value of R decreases whilst the number of '1' bits
of X also decreases: the error is not necessarily detected.
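The two variants (R counting the '0' bits, then R counting the '1' bits) can be compared by exhaustive search for k = 4. This Python sketch only explores errors that turn '1' bits into '0' bits; the function names and the tuple format of the result are illustrative assumptions.

```python
from itertools import product

K, R_BITS = 4, 3

def berger(x, count="zeros"):
    """Check field R: the number of 0 bits (or of 1 bits) of the word x."""
    ones = bin(x).count("1")
    return K - ones if count == "zeros" else ones

def is_codeword(x, r, count="zeros"):
    """True when the pair (x, r) satisfies the chosen coding rule."""
    return berger(x, count) == r

def undetected_unidirectional(count):
    """List (x, x_err, r_err) where 1 -> 0 errors turn a codeword into a codeword."""
    misses = []
    for x in range(2 ** K):
        r = berger(x, count)
        for x_err, r_err in product(range(2 ** K), range(2 ** R_BITS)):
            only_drops = (x_err & ~x) == 0 and (r_err & ~r) == 0
            changed = (x_err, r_err) != (x, r)
            if only_drops and changed and is_codeword(x_err, r_err, count):
                misses.append((x, x_err, r_err))
    return misses
```

With count="zeros" the list is empty, whereas count="ones" yields undetected cases such as X = 0011, R = 010 becoming X = 0000, R = 000.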
0 + 2 = 2 [9], and we observe that the result belongs to the same class as
the expected final result: 236 = 2 + 3 + 6 [9] = 11 [9] = 2 [9].
• In order to verify the second operation, we multiply the two classes,
0 x 2 = 0 [9], a value which is different from the class of the expected final result:
8867 = 8 + 8 + 6 + 7 [9] = 29 [9] = 11 [9] = 2 [9].
• The third operation is verified by subtracting the two classes of the numbers,
0 - 2 = 7 [9], a value which is different from the expected result: 144 = 1 + 4 + 4
[9] = 9 [9] = 0 [9].
Hence, we have detected an error on the operations 2 and 3. However, we cannot
correct those errors, as this code is only an error detecting code. Moreover, not
all the faults are detected, as shown by the fourth operation:
• 189 - 47 [9] = 7, and 97 [9] = 7; however, the correct value is 189 - 47 = 142,
which is different from the proposed result 97. This conclusion is generalized by
question 4.
3. With the example 48 / 12 = 4, we obtain 48 = 3 [9], 12 = 3 [9], 3/3 = 1, a value
which is different from the class of the correct result: 4.
4. An error transforms a result N into another number N* = N + E (E is the error,
either positive or negative). This error is not detectable if and only if N* = N [9],
that is to say, if E is a multiple of 9.
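The modulo 9 verification lends itself to a very small Python sketch. The function names are illustrative; only the three operators for which the class of the result follows from the classes of the operands are handled.

```python
def residue(n):
    """Class of n modulo 9 (the digit-sum rule gives the same class)."""
    return n % 9

def check_passes(a, b, proposed, op):
    """Compare the class of the proposed result with the one derived from the operands."""
    expected = {"+": residue(a) + residue(b),
                "-": residue(a) - residue(b),
                "*": residue(a) * residue(b)}[op]
    return residue(expected) == residue(proposed)
```

check_passes(189, 47, 97, "-") is True although 97 is wrong: the error 142 - 97 = 45 is a multiple of 9, so it escapes detection.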
others, hence constituting a block of six 4-bit words. Here also, this computation
can be made, either globally, or in a cumulative way:
• 1101 + 0011 = 0000 (the carry is ignored),
• 0000 + 1110 = 1110,
• 1110 + 0110 = 0100,
• 0100 + 0101 = 1001, which is then 2's complemented: 10000 - 1001 = 0111.
This last word is then added to the block.
2. The stored block contains 6 words: (1101, 0011, 1110, 0110, 0101, 0111);
indeed, the addition of the first 5 words gives the previous 1001 value, which is
by construction the 2's complement of the sixth word 0111. Their addition
modulo 2^4 gives 0000.
3 - 4. This modulo 2^4 code detects any error that does not add or subtract to the
correct result a value which is a multiple of 16.
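The checksum computation and its blind spot can be illustrated in Python. This is a minimal sketch with illustrative names; the data words are those of the exercise.

```python
WORD_BITS = 4
MODULUS = 1 << WORD_BITS          # 2^4 = 16

def checksum(words):
    """2's complement of the sum of the words, carries out of 4 bits ignored."""
    return (-sum(words)) % MODULUS

def block_is_consistent(block):
    """A stored block is accepted when all its words add up to 0 modulo 2^4."""
    return sum(block) % MODULUS == 0

data = [0b1101, 0b0011, 0b1110, 0b0110, 0b0101]
block = data + [checksum(data)]
```

A single altered bit changes the sum by a power of two smaller than 16 and is detected, but two alterations whose effects add up to a multiple of 16 (for example clearing the most significant bit of two words) leave the block consistent.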
ab scp
00 000
01 101
10 101
11 011
2. Figure E.37 shows the modified gate circuit and the corresponding truth table
when a parity output p is added to this half-adder. Error detection is performed
by a 3-input XOR gate.
In that case, a separate structural redundant circuit has been added to the basic
circuit. The on-line testing capability is better than in the previous technique.
However, some faults are still not tested, such as the stuck-at '0' noted a on the
figure: indeed, if a = 1 and b = 1, this fault produces the undetected failure
(s c p) = (1 0 1) instead of the normal vector (0 1 1).
3. In order to improve this situation, the previous circuit is modified by using three
independent circuits (Figure E.38). Any fault altering only one of these three
independent circuits is detected as soon as it provokes an error at one output
only. Hence, this on-line detection capability concerns all faults belonging to the
stuck-at fault model. However, the detection circuit is not concerned by the
on-line testing property. Indeed, the stuck-at 0 of the output of this circuit is not
detected. To remedy this problem, we can use a self-checking circuit, as shown
in the right part of Figure E.38. The final error outputs f and g belong to the
1-out-of-2 detecting code {10, 01}. Hence, any single fault in the whole circuit is
now detected.
[Figure E.38. Half-adder made of three independent circuits (outputs s, c, p), with
its error detection circuit and the self-checking checker with outputs f and g]
4. A Duplex structure is shown in Figure E.39. It uses two half-adder modules and
a 2-bit double-rail SCC (this SCC is studied in the next exercise). We suppose
that the two duplicated modules are not affected by the same faults
simultaneously. The advantage of such an approach is its simplicity. On the
other hand, it is much more expensive in terms of gate number.
Inputs                 a1 a2 b1 b2        c1 c2
2/4 codewords          0  1  0  1    D    0  1
(4 vectors)            0  1  1  0    B    1  0
                       1  0  0  1    A    1  0
                       1  0  1  0    C    0  1
wrong 2/4 words        1  1  0  0         0  0
(2 vectors)            0  0  1  1         0  0
less than 2 bits '1'   0  0  0  0         0  0
(5 vectors)            ...
                       1  0  0  0         0  0
more than 2 bits '1'   0  1  1  1         1  1
(5 vectors)            ...
                       1  1  1  1         1  1
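The code-disjoint property shown by the table above can be verified on all 16 input vectors. The Python sketch below assumes the usual two-pair double-rail checker equations, c1 = a1.b2 + a2.b1 and c2 = a1.b1 + a2.b2, which reproduce every row of the table; these equations and the function names are assumptions, since the gate structure is not reproduced here.

```python
from itertools import product

def double_rail_checker(a1, a2, b1, b2):
    """Assumed two-pair double-rail checker, consistent with the table above."""
    c1 = (a1 & b2) | (a2 & b1)
    c2 = (a1 & b1) | (a2 & b2)
    return c1, c2

def is_double_rail(*pairs):
    """True when every (x1, x2) pair holds complementary values."""
    return all(x1 != x2 for x1, x2 in pairs)

# Code-disjoint: code inputs give code outputs, non-code inputs give 00 or 11.
code_disjoint = all(
    is_double_rail((a1, a2), (b1, b2))
    == is_double_rail(double_rail_checker(a1, a2, b1, b2))
    for a1, a2, b1, b2 in product((0, 1), repeat=4)
)
```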
2. Let us analyze the global circuit of Figure E.41 which combines 3 elementary
checkers to check a 4-bit double-rail code. To prove that this circuit is a SCC, it
is sufficient to verify that each checker receives the four 2-bit double-rail
codewords defined in question 1.
Table E.25 shows that the whole SCC is tested by a sub-set of only 4 input
codewords: each checker receives a testing set of 4 input vectors.
Exercise 16.5. Parity self-checking checker
1. The circuit (Figure E.42) is a SCC converting a 4-bit odd-parity input code into a
1-out-of-2 output code. We must verify that it is code disjoint and self-testing.
Figure E.42. Parity checker
1. The 2-out-of-4 code can represent $N = \binom{4}{2} = 6$ codewords, which is exactly the
number of internal states of the state graph to be coded.
2. Four synchronous D Flip-Flops are used to implement this circuit. The D-inputs
(Di = yi) are logical functions of the outputs of these Flip-Flops (Q1 = Y1):
probability of having a double fault affecting two modules is the product of the
probabilities of having a fault in each module. With electronic components, the
actual values of the λ are very small (e.g. 10^-7), hence, we neglect the product terms
(10^-14). This assumption cannot be made if strictly identical components are used in
the TMR. Indeed, these components can have the same design faults or
environmental weaknesses (e.g. sensitivity to temperature); thus, faults cannot be
considered as independent phenomena and all reliability computations are false.
Another criticism deals with other faults violating the independence assumption.
They produce failures at the same time on non-identical components. For instance,
this situation can result from external perturbations, such as an Electro-Magnetic
parasite.
Exercise 18.3. NMR
1. Let us assume that each module has only one output. During normal
functioning, the output vector (z1, z2, z3) must take the values (000) or (111).
Any other value is erroneous, hence the detection function is:
error = (z1'.z2'.z3' + z1.z2.z3)'
where '+', '.', and the prime (') represent the operators OR, AND, and NOT.
This expression can directly be implemented by a very simple circuit (containing
few MOS transistors). We will develop it further to make a transition with
question 2. We obtain the expression given in Chapter 18:
error = (z1 ⊕ z2) + (z1 ⊕ z3) + (z2 ⊕ z3).
The 2-input ⊕ operation gives a '1' if and only if its inputs are different.
2. First we create the three elementary comparison functions:
fa = (z1 ⊕ z2), fb = (z1 ⊕ z3), and fc = (z2 ⊕ z3).
If z1 is erroneous, fa AND fb is equal to '1',
If z2 is erroneous, fa AND fc is equal to '1',
If z3 is erroneous, fb AND fc is equal to '1'.
Hence: M1 = fa AND fb, M2 = fa AND fc, M3 = fb AND fc.
The corresponding circuit is shown in Figure E.44. The signals M1, M2 and M3
identify the failing module (their value is a 1-out-of-3 codeword in case of error),
and allow its inhibition (thanks to a power switch-off, for example), and finally
its replacement by a spare module.
3. The voter must behave as the majority of its inputs: this function is the logic
MAJORITY. For 3 inputs, we have:
MAJ (z1, z2, z3) = z1.z2 + z1.z3 + z2.z3.
The corresponding electronic CMOS component is simple.
This function can easily be extended to 4 inputs:
MAJ (z1, z2, z3, z4) = z1.z2.z3 + z1.z2.z4 + z1.z3.z4 + z2.z3.z4.
Note: The MAJORITY function is not associative (no possibility for combining
smaller MAJORITY modules).
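The detection, identification and voting functions of this exercise can be checked together on the 8 possible output vectors. This Python sketch transcribes the three expressions directly; the function names are illustrative.

```python
from itertools import product

def error(z1, z2, z3):
    """1 when the three module outputs are not all equal."""
    return (z1 ^ z2) | (z1 ^ z3) | (z2 ^ z3)

def failing_module(z1, z2, z3):
    """(M1, M2, M3): a 1-out-of-3 word naming the module that disagrees."""
    fa, fb, fc = z1 ^ z2, z1 ^ z3, z2 ^ z3
    return fa & fb, fa & fc, fb & fc

def maj(z1, z2, z3):
    """3-input majority voter."""
    return (z1 & z2) | (z1 & z3) | (z2 & z3)
```

For the vector (1, 0, 0), error returns 1, failing_module returns (1, 0, 0) and maj returns 0: module 1 is flagged and the voter follows the two agreeing modules.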
Exercise 18.4. Study of the double duplex
1. Reread Chapter 18, sub-section 7.2.2.
2. The product functions correctly as long as one of the two couples (1.1, 1.2) or
(2.1, 2.2) functions correctly. The reliability of the product is then:
$$
R = P\big((1.1 \text{ AND } 1.2) \text{ OR } (2.1 \text{ AND } 2.2)\big)
  = P(1.1 \text{ AND } 1.2) + P(2.1 \text{ AND } 2.2) - P(1.1 \text{ AND } 1.2) \cdot P(2.1 \text{ AND } 2.2),
$$
$$
R = P(1.1) \cdot P(1.2) + P(2.1) \cdot P(2.2) - P(1.1) \cdot P(1.2) \cdot P(2.1) \cdot P(2.2),
$$
where + and - are the addition and subtraction operators.
If the modules have the same reliability $R_0$, we have:
$$
R = 2 R_0^2 - R_0^4 = 2 e^{-2 \lambda t} - e^{-4 \lambda t}.
$$
Note. The reliability curves of this structure are given in Appendix B.
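The closed form can be cross-checked by enumerating the 16 states of the four modules. This Python sketch assumes independent modules with a constant failure rate; the function names and the numerical values used in the usage note are illustrative.

```python
from itertools import product
from math import exp, isclose

def module_reliability(lam, t):
    """R0 = exp(-lambda.t) for a constant failure rate lambda."""
    return exp(-lam * t)

def double_duplex(r0):
    """Closed form 2.R0^2 - R0^4 for four identical, independent modules."""
    return 2 * r0 ** 2 - r0 ** 4

def double_duplex_by_enumeration(r0):
    """Sum the probabilities of all module states where one couple is intact."""
    total = 0.0
    for m11, m12, m21, m22 in product((True, False), repeat=4):
        if (m11 and m12) or (m21 and m22):
            p = 1.0
            for ok in (m11, m12, m21, m22):
                p *= r0 if ok else 1.0 - r0
            total += p
    return total
```

For R0 = 0.9 both functions give 0.9639; with module_reliability(1e-3, 100) the closed form equals 2 e^(-0.2) - e^(-0.4).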
Exercise 18.5. Study of self-purging technique
1. The switch-off of a failing module is performed by each one of the modules of
the structure. This approach is interesting because it eliminates a part of the
centralized commutation unit which is always a delicate part of a fault-tolerant
system (such a part is called the kernel of the system). This technique is a step
towards a complete decentralization of the duplicate modules and of the decision
function (thanks to a distributed voter). Such distributed structure can be
encountered in the framework of distributed software tasks in a distributed
multiprocessor system.
2. When only 2 modules remain active, the product regresses to a simple Duplex.
Thus, the next error occurrence will not be tolerated. The system operates
according to a degraded mode until a maintenance operation restores the
tolerance capacity of the product.
Exercise 18.6. Example of a tolerant program based on retry mode
The fault to be treated is associated with the data provided, so the use of the retry
mode is pertinent. Indeed, the Get procedure is not the cause of the problem. The fault
tolerance mechanism must detect an error if a non-integer data is provided from the
keyboard. In this case, the data sampling must be reiterated (the Get function is
executed again) after having restored the initial context. Let us consider the
following solution:
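with Ada.Text_IO;         use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure Safe_Get (I : out Integer) is
begin
   loop
      begin
         Get (I);       -- raises Data_Error on a non-integer input
         exit;          -- a valid integer was read: leave the loop
      exception
         when Data_Error => Skip_Line;   -- flush the input buffer, then retry
      end;
   end loop;
end Safe_Get;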
As soon as an integer value is provided and acquired by the Get (I) function, the
exi t statement allows to exit from the loop (loop), hence to finish the execution
ofthe Safe_Get procedure.
Conversely, when Get reads an erroneous data value, an exception (Data_Error) is
raised and control branches to the associated exception handler. This handler
erases (through Skip_Line) the content of the buffer holding the typed
characters; not all of these characters have been extracted yet, because Get was
only partially executed. For example, if the user types the 5-character sequence
<17A28> followed by 'Carriage Return', the execution of Get (I) may leave
characters '2' and '8' in the keyboard buffer, since the left-to-right analysis of
the expected digits was interrupted by the exception raised on 'A'. As expected,
resetting the buffer restores the system to a safe state.
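The same retry pattern can be sketched outside Ada. The following Python version is only an illustration under stated assumptions: read_line is a hypothetical callable that returns one complete input line, and max_attempts is an added bound that the Ada solution does not have. Reading whole lines plays the role of Skip_Line, since nothing from a rejected line is left for the next attempt.

```python
def safe_get(read_line, max_attempts=None):
    """Retry-mode sketch: ask again until the input parses as an integer.

    read_line    -- hypothetical callable returning one full input line
    max_attempts -- optional bound on retries (an addition, not in the Ada code)
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        line = read_line()
        try:
            return int(line.strip())   # error detection: ValueError on bad data
        except ValueError:
            continue                   # context is clean, simply retry
    raise RuntimeError("no valid integer after %d attempts" % attempts)


if __name__ == "__main__":
    typed = iter(["17A28", "abc", "42"])
    print(safe_get(lambda: next(typed)))
```

With the three lines above, the first two inputs are rejected and the third is returned.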
Exercise 18.7. Programming and evaluation of recovery blocks
1. Programming. The two following program extracts illustrate the two
approaches proposed in section 18-4. We assume that the execution context is
limited to the input/output parameter C. C_Prime is a data structure having the
same type T as C. It will store the safeguard copy (the duplicate).
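   procedure Recovery_Block_V1 (C : in out T) is
      C_Prime : T;
      Error   : Boolean;
   begin
      Save (C, C_Prime);          -- safeguard copy of the context
      Error := P (C);             -- the primary works directly on C
      if Error then
         Restore (C_Prime, C);    -- roll back before running the alternate
         Q (C);
      end if;
   end Recovery_Block_V1;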
where the procedures Save (X, Y) and Restore (X, Y) both make a copy of X
into Y.
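   procedure Recovery_Block_V2 (C : in out T) is
      C_Prime : T;
      Error   : Boolean;
   begin
      Save (C, C_Prime);          -- working copy of the context
      Error := P (C_Prime);       -- the primary works on the copy
      if Error then
         Q (C);                   -- C is untouched: no restore needed
      else
         Restore (C_Prime, C);    -- commit the result computed on the copy
      end if;
   end Recovery_Block_V2;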
2. Evaluation of the performance. The two previous programs allow the expected
performance of the two proposed implementations of recovery blocks to be
evaluated. In both cases, the context is saved first. The remainder of the two
bodies differs.
• In the first case, a correct execution of P requires no additional
treatment; when an error is detected, a restore operation is performed before
executing procedure Q.
• In the second case, the opposite holds: a correct execution implies a
restore operation, whereas after an error detection no restore operation is
needed before executing Q.
To conclude, the first approach is more efficient when P executes correctly, while
the second approach is more efficient when the redundant component Q has to be
used. The second approach can for example be chosen when the execution of Q
already takes additional time and real-time constraints leave no margin for
adding a restore operation on that path.
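The comparison can be made concrete by counting copy operations. The sketch below is an illustrative Python transcription, not the book's code: the context is modelled as a dictionary, the primary returns True when it detects an error (as P does above), and the helper names good_primary, bad_primary, alternate and count_copies are hypothetical.

```python
import copy


def recovery_block_v1(ctx, primary, alternate, stats):
    """Variant 1: the primary runs on the live context."""
    saved = copy.deepcopy(ctx)          # Save(C, C_Prime)
    stats["copies"] += 1
    if primary(ctx):                    # True signals a detected error
        ctx.clear()
        ctx.update(saved)               # Restore(C_Prime, C)
        stats["copies"] += 1
        alternate(ctx)


def recovery_block_v2(ctx, primary, alternate, stats):
    """Variant 2: the primary runs on the copy."""
    work = copy.deepcopy(ctx)           # Save(C, C_Prime)
    stats["copies"] += 1
    if primary(work):
        alternate(ctx)                  # C was never touched: no restore
    else:
        ctx.clear()
        ctx.update(work)                # Restore(C_Prime, C) commits the result
        stats["copies"] += 1


def good_primary(c):
    c["x"] = 1
    return False


def bad_primary(c):
    c["x"] = -1                         # corrupts its context, then reports an error
    return True


def alternate(c):
    c["x"] = 2


def count_copies(block, primary):
    """Run one block on a fresh context; return (final x, number of copies)."""
    stats = {"copies": 0}
    ctx = {"x": 0}
    block(ctx, primary, alternate, stats)
    return ctx["x"], stats["copies"]
```

In this model, variant 1 needs one copy on the error-free path and two after an error, while variant 2 needs two copies on the error-free path and one after an error, which matches the conclusion above.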
Exercise 18.8. EDC in a RAM
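1. Matrices G and H:
$$
G = \begin{bmatrix}
1&1&1&0&0&0&0&0&0&0&0&0\\
1&0&0&1&1&0&0&0&0&0&0&0\\
0&1&0&1&0&1&0&0&0&0&0&0\\
1&1&0&1&0&0&1&0&0&0&0&0\\
1&0&0&0&0&0&0&1&1&0&0&0\\
0&1&0&0&0&0&0&1&0&1&0&0\\
1&1&0&0&0&0&0&1&0&0&1&0\\
0&0&0&1&0&0&0&1&0&0&0&1
\end{bmatrix},
\qquad
H = \begin{bmatrix}
1&0&1&0&1&0&1&0&1&0&1&0\\
0&1&1&0&0&1&1&0&0&1&1&0\\
0&0&0&1&1&1&1&0&0&0&0&1\\
0&0&0&0&0&0&0&1&1&1&1&1
\end{bmatrix}.
$$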
Coding operation:
$[y_1, y_2, y_3, y_4, y_5, y_6, y_7, y_8, y_9, y_{10}, y_{11}, y_{12}] = [u_1, u_2, u_3, u_4, u_5, u_6, u_7, u_8] \cdot G$.
For example, if U = [0 0 1 1 1 0 1 1], then Y = [1 1 0 1 0 1 1 1 1 0 1 1].
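2. Syndrome S of the received word W = [1 1 0 1 0 0 1 1 1 0 1 1]:
$$
S = H \cdot W^T =
\begin{bmatrix}
1&0&1&0&1&0&1&0&1&0&1&0\\
0&1&1&0&0&1&1&0&0&1&1&0\\
0&0&0&1&1&1&1&0&0&0&0&1\\
0&0&0&0&0&0&0&1&1&1&1&1
\end{bmatrix}
\cdot [1\ 1\ 0\ 1\ 0\ 0\ 1\ 1\ 1\ 0\ 1\ 1]^T
= \begin{bmatrix} 0\\ 1\\ 1\\ 0 \end{bmatrix}.
$$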
Read with the first row of H as the least significant bit, this syndrome is 0110 in
binary, i.e. 6 in decimal: it indicates the erroneous bit, bit 6. The
correction is then a simple binary complementation of that bit.
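The encoding, syndrome and correction steps can be replayed with a short script. This is a minimal sketch using the matrix H and the example words U, Y and W given above; the lists PARITY_POS and DATA_POS are names introduced here for illustration, read off the columns of H (check bits at positions 1, 2, 4 and 8, data bits elsewhere).

```python
# Parity-check matrix H of the answer; row i carries weight 2**i in the syndrome.
H = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
]
PARITY_POS = [1, 2, 4, 8]                  # 1-based positions of the check bits
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]     # 1-based positions of u1..u8


def encode(u):
    """Place the 8 data bits, then compute each check bit from its row of H."""
    y = [0] * 12
    for bit, pos in zip(u, DATA_POS):
        y[pos - 1] = bit
    for row, pos in zip(H, PARITY_POS):
        y[pos - 1] = sum(h * b for h, b in zip(row, y)) % 2
    return y


def syndrome(w):
    """Return S = H . W^T as an integer (first row = least significant bit)."""
    bits = [sum(h * b for h, b in zip(row, w)) % 2 for row in H]
    return sum(bit << i for i, bit in enumerate(bits))


def correct(w):
    """Complement the bit designated by a non-zero syndrome (single error)."""
    s = syndrome(w)
    fixed = list(w)
    if 1 <= s <= 12:
        fixed[s - 1] ^= 1
    return fixed


U = [0, 0, 1, 1, 1, 0, 1, 1]
Y = [1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1]
W = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
```

Here encode(U) reproduces Y, syndrome(W) evaluates to 6, and correct(W) gives back Y.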
3. Implementation of this code in the MMU.
[Figure: block diagram showing the memory, 4-bit check-bit paths, 8-bit data paths, the decoding/syndrome and correction blocks, an error signal, and the control, address and data buses.]
Figure E.45. Detection and correction circuit
Figure E.45 shows the structure of the EDC circuitry for this code. The 'check
bit generation' module implements in hardware the XOR expressions that generate
the 4 redundant bits. The 'decoding and correction' module implements the
matrix product $S = H \cdot W^T$. This module uses the result S to correct the
erroneous bit, and it communicates with the external system (e.g. a CPU) for
error logging.
4. Scrubbing operation
As said in Chapter 18, scrubbing is an off-line operation which writes back the
corrected version of each erroneous word and reads it again, in order to check
whether the faults are hard or soft. If they are soft, the word has been cleaned
up. Otherwise, the fault is hard and cannot be cleaned.
The previous structure is entirely compatible with this useful function.
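The write-back and read-again check can be sketched on a toy memory model. Everything in the following Python fragment is an assumption made for illustration: FaultyRAM models hard faults as bits stuck at 1, soft faults as flipped stored bits, and the correct argument stands for the EDC decoder, which is not reproduced here.

```python
class FaultyRAM:
    """Toy memory: soft faults alter stored bits, hard faults are stuck-at-1 bits."""

    def __init__(self, size):
        self.cells = [0] * size
        self.stuck_at_1 = [0] * size     # per-address mask of bits stuck at 1

    def write(self, addr, word):
        self.cells[addr] = word

    def read(self, addr):
        return self.cells[addr] | self.stuck_at_1[addr]


def scrub_word(ram, addr, correct):
    """Scrub one word; 'correct' is a hypothetical decoder returning the fixed word."""
    raw = ram.read(addr)
    fixed = correct(raw)
    if fixed == raw:
        return "clean"                   # nothing to repair
    ram.write(addr, fixed)               # write the corrected word back
    # Read again: if the error is gone it was soft, otherwise the cell is hard-faulty.
    return "soft" if ram.read(addr) == fixed else "hard"


if __name__ == "__main__":
    ram = FaultyRAM(3)
    decoder = lambda word: 0b1010        # stand-in decoder: the good word is known
    ram.cells[0] = 0b1011                # soft fault: one stored bit flipped
    ram.cells[1] = 0b1010
    ram.stuck_at_1[1] = 0b0001           # hard fault: bit 0 stuck at 1
    ram.cells[2] = 0b1010                # fault-free word
    print([scrub_word(ram, a, decoder) for a in range(3)])
```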
Glossary
1. ACRONYMS
2. KEYWORDS
Word | Meaning | Section
acceptability Curve expressing the acceptable risk rate of failures from their 17.1
curve seriousness
acceptable A product whose failures have acceptable risk rates 17.1
product
acceptable risk See risk: acceptable rate
rate
acceptance test See test: acceptance
activation: The occurrence of a first error provoked by a fault. 4.1
initial This error is called primitive error or immediate error 13.2
See also fault activation
active fault See fault tolerance: active
tolerance
Ad Hoc See DFT: ad hoc approach
approach
adaptive See sequence: adaptive
sequence
adaptive vote See vote: adaptive
aggression See fault: external
alias An alias occurs when a faulty circuit test output response gives a 14.5
signature which is identical to the fault-free signature (used in
BIST techniques by LFSR signature analysis)
alpha test See test: alpha
alternate Redundant module (version) having the same specification (or a 18.4
degraded form) and, often, a different implementation than the
original functional module
ambiguity An element which leads to several meanings 9.3
analysis: See criticality analysis
criticality
analysis: See dynamic & static analysis
dynamic/static
assertion Functional redundancy used for software verification. It tests the 10.5
validity of a property each time a given circumstance could violate 16.3
it. It can be used: during the creation stages for fault removal (Ch.
10), or during the operation stage for fault detection by on-line
testing (Ch. 16)
ATE Automatic Test Equipment 12.1
ATPG Automatic Test Pattern Generation: automatic generation of lists 12.3
of test inputs and expected outputs to perform product testing
attributes of Criteria enabling the system dependability to be assessed. The 1.4
dependability most used attributes are: reliability, availability, maintainability, 7
testability, safety and security
attributes of The behavior of a module is characterized by a set of attributes 2.3
module whose values define the states of the module
availability It is the probability that the system is operational at the time t, 7.5
knowing that it functions correctly at time 0
availability: Value of the availability at a given time t: A(t) 7.5
instant
availability: In permanent stage, availability value of A(t) when t → ∞ 7.5
permanent
backward fault Step of structural fault coverage method which determines the 13.3
analysis faults detected by a given test vector by a backward process (from
the outputs towards the inputs)
backward See propagation: backward
propagation or
tracing
backward Fault-tolerance technique which consists in bringing the system 18.3
recovery back in a state previously reached before the system execution
resumption. This technique often makes use of context saving and
restoring mechanisms (such as the recovery cache). The execution
of M is resumed at a recovery point
bathtub curve Reliability model which represents the evolution with time of 7.2
failure rate of electronic components. Typically, it shows 3 parts:
infant mortality where the failure rate decreases, useful life where
the failure rate is constant, and wearout where the failure rate
increases.
behavior Reaction of a system mainly described as changes of states in this 2.3
book
behavioral A design step/model of the system specifying its behavior 2.2
level/model 2.3
benign See failure: benign
beta test See test: beta
BIST Built-In Self-Test. Group of Design For Testability methods which 14.5
incorporate the test functions into the circuit
BIST: signature Group of BIST techniques using a test sequence generator (usually 14.5
a LFSR), a compaction function (usually a PSA), and a signature
analysis function
BIT Built-In Test. Group of Design For Testability methods which 14.4
incorporate test facilities and offer a test interface
bit stuffing Fault detection technique applied to data. After a number of bits 18.7
with the same polarity, an additional bit is introduced with an
opposite polarity. Used for instance in the CAN Bus
BITE Built-In Test Equipment. All maintenance functions of a system 14.4
boundary scan Scan technique belonging to the BIT design for testability. 14.4
Standardized as IEEE Standard 1149.1
branch test See test: branch
bridging fault A particular case of short electronic fault 5.2
See also fault: short-circuit
BSDL Boundary Scan Description Language 14.4
bug See fault: structural (for software technology) 3.2
burn-in test See test: burn-in
C/DC See test: Condition/Decision
CAM Computer Aided Maintenance 12.1
CAN Bus Controller Area Network bus. Initially created for the automotive 18.7
industry. Standardized as ISO 11898
catastrophic See failure: catastrophic
checker Module used in self-testing systems to detect the occurrence of 16.3
errors from the observation for instance of some EDC code
variables
See also self-checking checker
checkerboard See test: memory
event tree Tree connecting correct (states) or incorrect (faults, error, failures) 7.11
events with logical operators (AND, OR). Used for deductive
approach in qualitative dependability assessment
See Fault Tree Method
evolutive See maintenance: preventive
maintenance
failure: Failure leading to human loss, destruction of the product or the 4.2
catastrophic environment, including the controlled process 17.1
Also called disastrous
failure: Failure perceived similarly by all users 3.1
consistent
failure: crash Persistent omission failure 3.1
failure: See failure: serious
dangerous
failure: See failure: catastrophic
disastrous
failure: A failure caused by a technological fault 3.2
disruptive Also called disruption
failure: The temporal characteristics of the product behavior are not in 3.1
dynamic accordance with the specifications: e.g. response time incorrect,
too fast or too slow
Also called timing failure
failure: See risk
extremely
improbable
failure: See risk
extremely rare
failure: See risk: impossible
impossible
failure: The users do not perceive in the same way the failure occurrence. 3.1
inconsistent Also called Byzantine failure
failure: major See failure: significant
failure: minor See failure: benign
failure: A specific stopping failure when no values are delivered 3.1
omission
failure: The provided service is not in accordance with the specification 3.1
persistent for a period that is long relative to the mission duration
failure: See risk
probable
failure: rare See risk
failure: serious Failure whose negative effects on the user or the environment are 4.2
quite important, the security margins being dangerously reduced. 17.1
Leads to a small number of casualties and/or serious injuries of the
users, and/or a serious reduction of the functionality of the product
Also called dangerous
fault diagnosis Operation which identifies the faults altering a product 6.3
Also called fault localization or fault isolation 12.2
13.4
fault dictionary List of faults, their activation, and their effects (as errors or 12.3
failures), which can aid in the determination of probable causes
during failure analysis of defective devices
fault Estimation of the presence of faults (number and seriousness) 1.3
forecasting Developed in Chapter 7
fault grading Measure of how effective a set of test vectors is at detecting 12.3
potential faults. Finding of the coverage of a given test sequence
Also called test validation
fault grading: The DFG is a simulation method which compares the results of a 12.3
deterministic faulty design (fault injected) with the outputs coming from the
design. It includes various simulation algorithms, such as grouping
of equivalent faults, also known as fault collapsing, and making
use of customized hardware platforms (accelerators)
fault grading: There are three approaches to fault grading, based on fault 12.3
fault simulation simulation techniques: probabilistic (PFG), deterministic (DFG),
and statistical (SFG)
fault grading: The PFG is a simulation method which provides an estimation of 12.3
probabilistic the fault coverage rather than an exact determination. The principle
is based on an analysis of the node activity in terms of
controllability and observability
fault grading: The SFG reduces the cost of DFG by applying deterministic fault 12.3
statistical simulation to a subset of the potential faults of the given fault
model. It provides a close approximation of the DFG results, while
requiring only a small fraction of the run time
fault grading: Method evaluating the fault coverage of a test sequence by 13.3
structural structural analysis of the product. It consists in: forward
approach simulation, and backward fault analysis
fault injection Technique consisting in adding faults to a system in order to 7.9
analyze its behavior. Used for fault grading or to assess fault 12.3
tolerance mechanisms
fault logging Recording of errors occurring in a product during operation in 16.3
order to facilitate later maintenance
See also instrumentation and logfile
fault masking Fault belonging to a passive redundant element that cannot be 13.3
detected from the outside of the product. Use of compensation
mechanisms
fault model See model: fault
fault prevention Aims at reducing the creation or occurrence of faults during the 1.3
system life cycle 6.2
Developed in Chapters 9, 10, and 11
fault removal Aims at detecting and eliminating existing faults, or to show the 1.3
absence of faults 6.3
Developed in Chapters 12, 13, and 14
fault secure See totally self-checking system
fault simulation Technique used for dependability assessment. 7.9
Fault grading technique which provides a list of faults detected by 12.3
a given test sequence (hence, the fault coverage) by means of a
simulation program with fault injection
There are four main approaches: serial, parallel, deductive, and
concurrent
fault table Table showing the faults covered by each vector of a test sequence 12.2
Also called coverage table
fault tolerance Aims at guaranteeing the service provided by the product despite 1.3
the presence or appearance of faults 6.4
Developed in Chapters 16, 17, and 18
fault tolerance: Approach that makes use of error detection and handling 18.5
active
fault tolerance: See compensation technique
compensation
fault tolerance: Approach that does not make use of error detection 18.5
passive
fault tree See FTM
method
fault tree: See diagnosis fault tree
diagnosis
fault: Fault which is not intentionally created 3.2
accidental <> intentional
fault: active A fault becomes active when it provokes an error during the 4.1
operation of the product
<> passive
fault: bridge See bridging fault
fault: common Fault caused by the same circumstances, and thus provoking the 18.2
mode same errors/failures, of several redundant modules in a fault-
tolerant system
fault: Fault associated with a component. 3.3
component Also called module fault
fault: See fault: functional
conceptual
fault: creation Fault occurring during specification, design and/or production 3.2
phases (excluding the operation phase)
fault: delay For electronic models. A delay fault occurs when a signal 5.2
propagating through a circuit is slower than it really should be
fault: dormant See fault: passive
fault: dynamic See fault: temporary
fault: external Failure cause attributed to the user or the environment. 3.2
Also called perturbation or aggression or disturbance
fault: Fault due to human activities during the product life phases. The 3.2
functional origin is the designer during the creation steps and the user during
the operational step. Also called conceptual fault or human-made
fault
fault: hard Permanent fault occurring in memory circuits 11.3
<> fault: soft
fault: hardware See fault: physical
fault: human- See fault: functional
made
fault: initial See activation: initial
activation
fault: Fault created deliberately 3.2
intentional <> fault: accidental
fault: Fault coming from the interactions of several components 3.3
interaction
fault: Temporary fault due to internal causes 3.2
intermittent
fault: internal Failure cause occurring in the product or system 3.2
fault: isolation See fault: localization
fault: Identification of the faults of an erroneous system 6.3
localization Also called fault isolation or fault diagnosis
fault: masked A fault f1 is masked by a fault f2 according to a given input 13.5
sequence, if the occurrence of f1 does not provoke a failure, due to
the presence of f2
fault: module See fault: component
fault: MOS on Fault model at MOS level: a MOS is always conducting 5.2
See also fault: short-circuit
fault: MOS Fault model at MOS level: a MOS is always blocked 5.2
open/off
fault: Ability to detect the presence of a fault through a failure 4.1
observation occurrence
fault: Fault occurring during the operational stage of the life cycle 3.3
operational
fault: passive The fault does not raise error; hence, it does not disturb the 4.1
product's functioning. Also called dormant
<> active
fault: Fault that persists once it has occurred (e.g. design fault) 3.2
permanent Also called static fault
fault: physical Technological fault concerning hardware technology 3.2
Also called hardware fault
fault: short- Fault model at electronic level. Particular case: bridging fault 5.2
circuit which provokes wired logic (OR or AND)
fault: soft Non-permanent faults in RAM: random, non-recurring single-bit 11.3
fault 18.7
fault: static See fault: permanent
fault: structural When the internal functional faults are concerned, a fault consists 4.1
in a non-adequate structure alteration
Also called defect (hardware) or bug (software)
fault: stuck-at See stuck-at fault
0/1
fault: Fault of the technological means (hardware / software) used to 3.2
technological implement the product
Also called hardware or physical fault for hardware technology
fault: temporal Electronic fault due to incorrect response time of components 3.2
fault: Fault the presence of which is time bounded. The duration range is 3.2
temporary generally assumed as short 5.2
Also called dynamic fault
fault: transient Temporary fault due to external causes 3.2
fault: No input test sequence can reveal the fault at the output of the 8.3
undetectable system. This corresponds to passive redundancy 13.2
fault-secure Fundamental property of a self-testing system which guarantees 16.3
that no failure can occur which is not immediately detected
feature Element of a modeling tool or language 2.2
6.3
feature Prevention techniques for software which consist in avoiding 11.2
restrictions features which increase the fault risk (shared variables, goto, etc.)
final test See test: final
Fire code See code: Fire
FITPLA BIT technique used to improve the testability of PLA 14.4
fixed sequence See sequence: fixed
functional Static input domain: set of input values which can be applied to the 8.2
domain: static product as defined by the product specifications
Static output domain: set of output values which can be given by
the product as defined by the specifications
functional See user
environment
functional See redundancy: functional
redundancy
functional test See test: functional
fusion Operator combining the behaviors of several modules, taking their 8.2
correlations into account
galloping See test: memory
Galois Field Mathematical structure which has fundamental applications to 15.3
Cyclic Error Detecting and Correcting Codes
guidelines Best practices to reach an objective, for example testability 10.3
improvement, fault prevention, etc. 14.2
Hamming Fundamental property of redundant codes which allows the 15.2
distance detection and/or correction of errors
Number of bits that differ between two binary words
hard fault See fault: hard
HDB See code: low-level HDB
HDL Hardware Description Language. Language which describes 2.2
circuits in textual code. The two most widely accepted HDLs are
VHDL and Verilog
hierarchy: Expression of a system as the composition of sub-systems or 2.3
composition components which are again broken down into sub-systems
hierarchy: use Defines a system, highlighting the services used (or called) by a 2.3
component and offered by others
hot standby See redundancy: hot standby
redundancy
hot swap Components (CPU/Memory, I/O boards, power/cooling modules) 18.5
hardware that can be changed or serviced while the system remains on-line
IDDQ testing Method for enhancing the quality of IC tests by measuring the 12.1
power supply current of a CMOS circuit during quiescent states
Detects the physical defects that create conduction paths between
the power supply and the ground lines (e.g. stuck-on faults)
IEEE P1450 Standard Tester Interface Language (STIL). Language describing 12.1
test pattern and application protocols in standard neutral form
IEEE P1500 Embedded Core Test. Application of tests to embedded cores: test- 12.1
description language, test-control mechanisms and peripheral
access mechanisms
IEEE Std. IEEE standard describing the Test Access Port and Boundary Scan 14.4
1149.1-1990 Architecture
IEEE Std. 1155 See VXI
impairment Opposed to dependability (degradation mechanism): fault - error - 1.4
failure. Developed in Chapters 4 and 5
implementation See production (for software technology) 2.2
implementation Fault prevention techniques defining implementation restriction, 11.3
constraints used for software
impossible See risk: impossible
incompatibility The service delivered by the product is different than the one 4.2
(of a product) expected from the specifications
incompleteness The service delivered by the product is less than the one expected 4.2
(of a product) from the specifications
incompleteness Definition or properties of an object having potentially multiple 3.3
(specification) meanings. Fuzziness of its semantics. Absence of pieces of
information
inconsistency Contradictory definitions or properties of one object or of several 3.3
objects
increasing See reliability: increasing
reliability
inertia (of the Mean time between the occurrence of a failure and the beginning 4.2
environment) of its external consequences on the mission
input sequence See sequence: input
inspection A formal review technique based on nine steps 9.4
instrumentation Adding of mechanisms to detect errors and record data during the 14.2
operation of the product. Used to make test detection and diagnosis 16.3
easier
integration test See test: integration
integrity Non occurrence of improper alterations of information 7.7
intrinsic safety See safety: intrinsic
irredundant An element of a system is irredundant if its removal causes the 8.3
element system to be functionally different
JTAG The Joint Test Action Group. This group created the foundation for 14.4
the IEEE 1149.1
language See modeling tool (generally considered as defined formally) 2.3
latency Latency is the mean time between the occurrence of a fault and its 4.1
initial activation as an error at the level of a given module
By extension: mean time between the occurrence of a fault/error in
a given module and the raising of an error in another given module
model: error An error model defines a set of faults characterized as errors by a 5.1
property on desired or intended behavior 15.1
Also called error typology
model: fault A fault model defines a set of faults characterized by 5.1
physical/structural properties on the desired model structure
modeling tool Generic means (language or notation) to express the system. The 2.3
expression of a specific system is called a model
modified See test: MC/DC
condition/
decision
module See component
module: A module containing the basic functional elements 8.3
functional <> module: redundant
module: A module containing redundant elements 8.3
redundant <> module: functional
Monte Carlo Quantitative dependability evaluation method based on simulation 7.9
simulation and fault injection
MTBF Mean Time Between Failures. Maintenance indicator. The time 7.2
between two failures on a piece of equipment (calculated)
MTTF Mean Time To Failure 7.2
MTTFF Mean Time To First Failure. It is the same as MTTF 7.2
MTTR Mean Time To Repair 7.4
Mean time between the instant of failure occurrence and the return
of the product to full functional operation
MUT Mean Up Time: mean time during which the product delivers its 7.5
service
mutant A system, such as a program, modified by a mutation 13.8
mutation Modification of the structure of a system (generally by a fault) 13.8
See test: mutation
need Expectations of the product's users, that is to say the reasons why 2.2
they have to use a product.
netlist Basic structural model of electronic circuits, at gate or MOS level 2.2
NMR N-Modular Redundancy. Fault tolerant technique derived from the 18.5
TMR technique, using active redundancy
non-ambiguity An element which has only one interpretation 9.3
non-destructive See test: destructive
test
non-functional Part of the product specifications dealing with constraints on the 2.2
characteristics non-functional environment and with dependability requirements
redundancy: Separable redundant modules which are in a passive state (off- 8.3,
cold standby line), waiting to be activated 18.5
redundancy: A dynamic functional domain of a product is redundant if it is 8.2
dynamic strictly included in the dynamic functional universe of this product
functional
domain
redundancy: Certain theoretical input values are not applicable to the product by 8.2
functional the functional environment as defined by the specifications. 16.3
Extended to the outputs and inputs/outputs values
redundancy: Separable redundant modules which are in an active state (on-line) 8.3
hot standby in parallel with the functional module 18.5
redundancy: See redundancy: cold standby
off-line
separable
redundancy: See redundancy: hot standby
on-line
separable
redundancy: The structural redundancy of a system is passive if some elements 8.3
passive can be removed without changing the produced behavior
<> redundancy: active
redundancy: Presence of elements of sentences in a text whose meaning can be 8.1
semantic deduced from other sentences of the text
redundancy: The structural redundancy of a system is separable if the redundant 8.3
separable elements and the non-redundant elements are located in different
modules. Thus, the system possesses a functional module and
several redundant modules (versions, replicates, duplicates)
redundancy: A system has a structural redundancy if its structure possesses 8.3
structural some elements not necessary to produce a behavior conform to the 16.3
specifications, assuming that all the structure elements provide a
correct functioning 18.1
sequence: Test sequence whose input and output values are dynamically 12.2
adaptive defined, taking the previous results of the test application into
account
<> sequence: fixed
sequence: fixed Test sequence whose input and output values are defined prior to 12.2
the test processing
<> sequence: adaptive
sequence: input List of the inputs of a test sequence 12.2
sequence: List of the outputs of a test sequence 12.2
output
serious See failure: serious
seriousness See failure: severity or seriousness
service See degradation
degradation
service The delivered service is the product's real behavior when placed in 2.1
delivered its applicative environment
service Relationships between sub-systems expressing that one uses 2.2
relationships services provided by others
serviceability Measure of the ease with which a system functioning is restored to 7.4
a specified state after the system is repaired. Used to express the
maintainability
See maintainability
severity See failure: severity or seriousness
shallow See diagnosis: experimental approach
reasoning
short-circuit See fault: short-circuit
signature See test: signature analysis
signature Technique used in compaction test technique. The signature 12.2
analysis synthesizes the output values as the result of a likelihood property
LFSR signature analysis: used for BIST off-line techniques 14.5
signature See BIST: signature
analysis
function
significant See failure: significant
simplicity The concepts manipulated by a text (or a model) describing a 9.3
system are simple. In particular, the number of these concepts is
limited and they are loosely coupled
simulation: See fault simulation
fault
simulation: See Monte Carlo simulation
Monte Carlo
snapshot Image of the system execution context at a given time. It is used 18.3
for example for backward recovery technique implementation
soft fault See fault: soft
software See instrumentation
instrumentation
software: Fault removal techniques based on the program control flow: 13.2
structural • statement test
testing
• branch and path test
• condition and decision test (see C/DC and MC/DC)
spare module Redundant off-line module 8.3
specification Stage of the life cycle which defines the characteristics of the 2.2
product to be created. The result of this operation is a document 9.3
called specifications or contract (see contract)
specification See dependability assessment
assessment
value
stable See reliability: stable
reliability
stage See phase
standard See referent product
product
standby: hot / See redundancy: hot standby / cold standby
cold
state Set of the values taken by the attributes of a module 2.3
Internal property of a module 4.1
statement test See test: statement
static analysis Groups of techniques of the fault removal which are made without 6.3
execution of the analyzed models or products
Statistical Fault See fault grading: statistical
Grading
step See phase
STIL See IEEE P1450
stochastic Petri Non-deterministic parallel state graph model whose arcs are 7.9
net labeled by probabilistic values; used for dependability assessment
strobing Term used for test: number of times a test equipment looks at the 12.2
output data of a DUT during a period
structural Fault grading methods which study the faultless system and 12.3
analysis deduce all the faults (of a model) that can produce failures
structural A design step/model of the system expressing it as a structured 2.2
level/model system (composed of sub-systems or components or modules)
termination One of the forward recovery techniques of fault tolerance, which 18.4
mode consists in completing the task started by a module P by using a
redundant module Q, after the detection of an error in P
test Dynamic techniques relevant to fault removal. It is an experiment 6.3
(input sequences) applied to an executable product or model by a
tester which compares the given results with expected values
The process of exercising or evaluating a system or system
component by manual or automated means to verify that it satisfies
specified requirements, or to identify differences between expected
and actual results (IEEE Std 729-1983)
Also called dynamic analysis
test application Test processing performed by the tester which applies the test 12.3
sequence to the product
test equipment See tester
test evaluation See fault grading
test generation See test pattern generation
test pattern See test sequence
test pattern Technique to determine the test sequence for a given product 12.3
generation
(TPG)
test sequence List of test vectors used by a tester to detect and/or diagnose faults 10.4
in a product. This term is often restricted to the sequence of input 12.2
vectors
Also called test pattern
test sequence See BIST: signature
generator
test sequence: Two main parameters are used to evaluate the quality of a test 12.2
quality sequence: the length (number of test vectors) and the fault
coverage (percentage of the faults of a fault model which are
detected)
test vector Element of a test sequence: couple (input vector, output vector) 12.2
test with / Test method based on the detection of faults belonging to a pre- 12.2
without fault defined fault model / without precise hypotheses about the faults
model
test: Test experiment with stress constraints: elevated power supply 7.2
accelerated andlor temperature 11.2
test: acceptance Another name for final test for final checking of the product 14.2
See test: final, test: compliance, test: conformity
Also used to name on-line checking used in fault tolerance 18.3
mechanism to detect errors
test: A specific ATPG handling each fault of a fault model 12.3
algorithmic
approach
test: final
    Test applied to a complete system or product before it is delivered to the
    client. (Section 14.2)
    Also called acceptance test.
    See also test: unit and test: integration.
test: functional
    Functional verification methods based on a functional model of the system
    to test (e.g. Finite State Machines). (Sections 10.5, 12.3)
    <> test: structural
test: functional diagnosis
    Type of diagnosis techniques which aims at locating faults at functional
    level, without precise fault model. (Section 10.5)
test: GO-NOGO
    See test: production.
test: identification
    Formal methods for test pattern generation of sequential systems without
    fault model. (Section 12.3)
test: in situ
    Test applied to the product in its normal environment. (Section 14.6)
test: integration
    Test applied to sub-systems integrating elementary modules or other
    sub-systems. (Section 14.2)
test: likelihood
    See likelihood test.
test: localization
    See test: diagnosis.
test: logical
    Test applied to a system modeled at logical level. (Section 12.1)
test: maintenance
    Test applied during the maintenance operations. (Sections 6.3, 12.1, 12.2)
test: MC/DC
    Structural software testing which adds the following requirement to the
    Condition/Decision testing method (see test: condition/decision): each
    condition in a decision must be shown to independently affect the result
    of the decision. (Section 13.6)
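The independence requirement can be made concrete with a small Python sketch. For each condition, it lists the pairs of test vectors that differ only in that condition and give different decision results. The function name and the example decision are assumptions chosen for illustration, not taken from the text.

```python
from itertools import product


def independence_pairs(decision, n_conditions):
    """For each condition index, list the pairs of test vectors that differ
    only in that condition and yield different decision outcomes."""
    pairs = {i: [] for i in range(n_conditions)}
    for vector in product([False, True], repeat=n_conditions):
        for i in range(n_conditions):
            if vector[i]:
                continue  # visit each pair once, starting from its False side
            flipped = vector[:i] + (True,) + vector[i + 1:]
            if decision(*vector) != decision(*flipped):
                pairs[i].append((vector, flipped))
    return pairs


# Hypothetical decision with three conditions: a and (b or c)
pairs = independence_pairs(lambda a, b, c: a and (b or c), 3)
for index, found in pairs.items():
    print(index, found)
```

A test set satisfies the requirement for this decision if it contains at least one pair from each list; conditions b and c each have a single such pair here, both of which need the vector (True, False, False).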
test: memory
    Specific testing techniques taking into account technological faults of
    RAM circuits: checkerboard, marching, walking, galloping or ping-pong.
    (Section 12.3)
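As a minimal sketch of one of these techniques, the Python code below applies a checkerboard pattern and then its complement to a simulated one-bit RAM. The FaultyRam class, its stuck-at fault injection and the function name are hypothetical and exist only to make the example runnable.

```python
class FaultyRam:
    """Simulated rows x cols one-bit RAM with an optional stuck-at cell
    (hypothetical model used only for this illustration)."""

    def __init__(self, rows, cols, stuck=None):
        self.rows, self.cols = rows, cols
        self.cells = [[0] * cols for _ in range(rows)]
        self.stuck = stuck  # ((row, col), value) or None

    def write(self, r, c, bit):
        self.cells[r][c] = bit

    def read(self, r, c):
        if self.stuck and self.stuck[0] == (r, c):
            return self.stuck[1]  # the faulty cell always reads this value
        return self.cells[r][c]


def checkerboard_test(ram):
    """Write a checkerboard, read it back, then repeat with the complement.
    Return the set of cell addresses where a read differed from the write."""
    failing = set()
    for phase in (0, 1):
        for r in range(ram.rows):
            for c in range(ram.cols):
                ram.write(r, c, (r + c + phase) % 2)
        for r in range(ram.rows):
            for c in range(ram.cols):
                if ram.read(r, c) != (r + c + phase) % 2:
                    failing.add((r, c))
    return failing


good = checkerboard_test(FaultyRam(4, 4))
bad = checkerboard_test(FaultyRam(4, 4, stuck=((1, 2), 0)))
print(good, bad)
```

Using both the pattern and its complement means every cell is checked once holding '0' and once holding '1', so a single stuck-at cell is reported whatever its stuck value.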
test: modified condition/decision
    See test: MC/DC.
test: mutation
    Test validation technique which consists in injecting modifications in a
    system in order to check whether a given test sequence detects the faults
    or not. (Section 13.8)
test: mutation: weak
    Weak mutation testing requires that the test sequence activates the fault
    introduced by the mutation, but it does not require that this sequence
    propagates the initial error to the outputs (as a failure). (Section 13.8)
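The difference between activating a mutation and propagating its effect can be sketched in Python. The two functions and the mutation ('+' replaced by '-') are invented for the example; each returns its internal state together with its output so that both can be compared.

```python
def original(a, b):
    s = a + b            # internal state
    return s, s > 0      # (state, output)


def mutant(a, b):
    s = a - b            # injected modification: '+' replaced by '-'
    return s, s > 0


def classify(test_input):
    """Return (activated, propagated) for one test input.

    activated:  the internal states differ (enough for weak mutation)
    propagated: the outputs differ (the error is visible as a failure)
    """
    state_o, out_o = original(*test_input)
    state_m, out_m = mutant(*test_input)
    return state_o != state_m, out_o != out_m


print(classify((5, 1)))  # (True, False): activated, not propagated
print(classify((1, 5)))  # (True, True): activated and propagated
print(classify((3, 0)))  # (False, False): mutation not activated
```

The input (5, 1) is sufficient under the weak criterion, although the mutant cannot be distinguished from the original by observing the outputs alone.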
test: non-regression
    Test performed after the repair of a faulty product in order to ensure that
    no fault has been introduced by the repair operation or by any other
    change. (Section 12.2)
test: structural
    Structural test methods are based on a structural model (e.g. gate
    structure) and generally use a fault model (e.g. 'stuck-at').
    (Section 12.3)
    See also software: structural testing.
    <> test: functional
test: toggle
    See toggle test.
test: unit
    Test applied to elementary modules. (Section 14.2)
test: validation
    Validation of a test sequence, frequently by fault grading. (Section 12.3)
    See also test: evaluation and fault grading.
testability
    Attribute of dependability which measures the ease with which a product
    can be tested, i.e. the ease of obtaining test sequences and the ease of
    applying these sequences. (Sections 7.3, 14.1)
    Closely linked to the test sequence properties:
    • the length, i.e. the number of input vectors
    • the coverage or test efficiency, i.e. the ratio of the number of tested
      faults to the total number of faults according to a given fault model
    Testability can be evaluated on the product by controllability and
    observability parameters.
    Testability measurement: methods that analyze a design and estimate the
    difficulty of test pattern generation as a measure of testability.
testability: techniques
    There are two groups: (Chapter 14)
    • Ad hoc techniques: design rules listing the structures that cause
      testing problems and techniques for avoiding these problems
    • Design For Testability (DFT): design techniques to increase testability
tester
    Any means (human or physical) involved in fault detection and diagnosis
    of a product by a test. Also known as test equipment. (Section 12.1)
TMR
    Triple Modular Redundancy. Basic N-version fault-tolerant technique based
    on passive redundancy. Three copies (duplicate modules) of the main module
    are used and a voter elaborates the final output. A 3-version structure,
    also called triplex. (Section 18.2)
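A minimal Python sketch of the voter, assuming that each module delivers its output as an integer word and that the vote is taken bit by bit (both are assumptions for the example, as is the function name):

```python
def tmr_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical outputs: the third module delivers an erroneous word.
voted = tmr_vote(0b1011, 0b1011, 0b0011)
print(bin(voted))  # 0b1011: the error of the third module is masked
```

Each output bit takes the value delivered by at least two modules, so the error of a single faulty module never reaches the final output.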
toggle test
    Test sequence which ensures that each line of the tested component is
    switched to '0' and '1'. (Section 12.3)
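Whether a sequence satisfies this criterion can be checked from the line values observed during simulation. The Python sketch below assumes a hypothetical trace format: one dictionary per applied test vector, mapping each line name to its value.

```python
def untoggled_lines(trace):
    """Return the lines that were not observed at both 0 and 1.

    trace: list of dicts, one per test vector, mapping line name -> 0 or 1
    (hypothetical format, e.g. produced by a logic simulator).
    """
    seen = {}
    for snapshot in trace:
        for line, value in snapshot.items():
            seen.setdefault(line, set()).add(value)
    return {line for line, values in seen.items() if values != {0, 1}}


# Hypothetical trace for a 2-input AND gate with lines a, b and z.
trace = [
    {"a": 0, "b": 0, "z": 0},
    {"a": 1, "b": 0, "z": 0},
    {"a": 1, "b": 1, "z": 1},
]
print(untoggled_lines(trace))      # empty set: every line took both values
print(untoggled_lines(trace[:2]))  # lines b and z have not been at 1 yet
```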
tolerable probability
    See risk: acceptable rate.
tolerance
    See fault tolerance.
Index
A
acceptability curve, 454
acceptable product, 455
acceptance test, 368, 478
accident, 43
activation, 71, 326
adaptive vote, 493
aggression, 47
alias, 389
alternate, 482
arithmetic code, 419
Arrhenius law, 148
assertion, 246, 436
assessment
  dependability, 141
  design, 212
  error/fault model, 105
  requirements, 204
  reliability, 146, 479
  risk, 452
  safety, 453
  testability, 363
attribute
  dependability, 9, 14, 154
  module, 30, 71
  quality, 2
automata, 27, 270
automatic test pattern generation (ATPG), 304, 306
availability, 9, 154
  instant, 154
  MDT, 154
  MUT, 155
  permanent, 154
B
backward fault analysis, 333
backward propagation, 327
backward tracing, 327
backward recovery, 476
  acceptance test, 478
  context restoration, 481
  domino effect, 481
  recovery point, 477, 479
  recovery cache, 478
  retry point, 477
  rollback point, 477
  snapshot, 479
behavioral level, 25, 27
behavioral model, 30
behavioral property, 91
benign event, 452
Berger code, 418
bidimensional code, 409, 415
binary code, 404
bit stuffing, 506
black-box testing, 237
boundary scan, 385
branch test, 343
breakdown, 6
BSDL, 387
bug, 44
Built-In Self-Test (BIST), 290, 366, 388
  compaction function, 290, 389
  signature, 290, 389
  signature analysis function, 290, 389
Built-In Test (BIT), 366, 380, 387
  boundary scan (IEEE 1149-1), 385
  test bus, 385
  FITPLA, 380
  JTAG, 385
  LSSD, 383
  scan design, 383
  full scan, 386
  partial scan, 386
  scan domain, 387
Built-In Test Equipment (BITE), 380
C
CAN Bus, 496, 505
catastrophic event, 452
checker, 441, 444
  code disjoint, 445
  self-checking checker, 445
checksum code, 420
client, 22
code, 402
code-preserving, 441
code disjoint, 445
codeword, 405
cold standby, 491
compaction, 290, 389
compatible, 81
compensation, 473, 486
complete diagnosis sequence, 336
complete distinguishing sequence, 300, 336
completeness, 80, 212
compliance test, 283
component, 30
composition relationships, 27
compositional hierarchy, 31
computer aided maintenance (CAM), 287
condition, 345
Condition/Decision Coverage, 346
C/DC test, 345
confidentiality, 10, 157
confinement, 137, 494
conformity test, 211, 283
consistency, 329
constraints
  reusability, 54
  technical, 54
  technological, 54
contamination, 73, 494
context of the test, 432
context of execution, 479
contract, 23, 40
control flow, 346
control path, 344
controllability, 132, 332, 364
COTS, 54, 125
coverage
  code, 344, 408
  fault, 291
  table, 294
  test, 149, 294, 333, 363
coverage analysis
  backward fault analysis, 333
  forward simulation, 333
  structural method, 333
CRC, 412, 501, 503
creation process, 22
criticality analysis, 456
cyclic code, 412
  Cross-Interleaved Reed-Solomon (CIRC), 416, 502
  Reed-Solomon, 416, 502
  BCH, 415
  coding procedure, 414
  ESF, 415
  Fire, 415, 501, 508
D
dangerous event, 452
dead-man, 435
debugging, 281
defect, 44
degradation, 486
delivered service, 19
dependability, 9
  attribute, 9, 14
  impairments, 7, 14, 39
  means, 13
dependability assessment, 141
  attribute, 141
  availability, 154
  forecasting value, 142
  maintainability, 152
  maintenance, 150
  exploitation value, 142
O
observability, 76, 132, 150, 332, 364
off-line testing, 134, 279, 362, 366
  maintenance testing, 280
  production testing, 280
on-line testing (OLT), 134, 137, 281, 392, 427, 485
  continuous, 428
  discontinuous, 427
  context of the test, 432
  context saving/restoring, 432
  self-testing, 428
operation, 5, 22, 29, 125, 134
output vector, 180, 288
P
parallel signal analyzer (PSA), 390
parity check, 412
parity code, 409
partition, 300
pattern equivalent faults, 297, 300
perturbation, 47
Petri net, 25, 27
  stochastic, 162
phase, 21
physical property, 90
post-condition, 246, 436
pre-condition, 246, 436
prime gate, 189
probable event, 453
process characterization & control, 264, 283
product, 17, 29
product structure, 30
  component, 30
  logical link, 30
  module, 30
  sub-system, 30
production, 5, 21, 28, 124, 133
production testing, 133, 264, 283
program mutation, 352
proof, 127, 248
propagation, 73
  backward, 327
  forward, 327
  path, 73
property
  analysis, 239, 244
  behavioral, 91
  generic, 91
  physical, 90
  structural, 90
property satisfaction, 238
prototyping, 214
pseudo-random testing, 290
Q
qualitative
  criticality analysis, 456
  dependability assessment, 143
  risk assessment, 452
  safety assessment, 456
  specification assessment, 212
  tolerance assessment, 476
quality, 1
  attribute, 2
quality assurance test, 264
quality control, 125, 264, 283
  destructive test, 283
  non-destructive test, 283
quality metrics, 365
quality of the test sequence, 363
  coverage, 363
  length, 363
quantitative
  criticality assessment, 456
  dependability assessment, 141, 159
  risk assessment, 452
  safety assessment, 453
R
random testing, 290, 303
rare event, 453
realization, 5, 24
reconfiguration, 137, 486
reconvergent fan-out structure, 328, 334
recovery, 472, 485
  backward, 476
  block, 482, 497
  cache, 478
  forward, 482, 499
  point, 477, 479
redundancy, 11, 135, 176, 402
  active, 13, 188, 493
  code, 405
  data, 470
  function, 470
  functional, 179, 436
  domain, 402
  composition of modules
yield, 29