
Error Detecting and Correcting Codes: Appendix A

This document compares several error detecting and correcting codes and redundant structures using reliability block diagrams. It provides tables showing properties of different codes like parity, m-out-of-n, and Hamming codes. The reliability block diagram is used to model redundant structures and analyze reliability. Common redundant configurations are examined, including structures with modules in parallel, m-out-of-n, double duplex, and offline redundancy. Their reliability equations and curves are presented and compared to a basic module.

Uploaded by

Ehtasham Jilani
Copyright
© All Rights Reserved

Appendix A

Error Detecting and Correcting Codes

In this Appendix, we compare some redundant codes:

• the single parity code (separable),
• the m-out-of-n code (non-separable), optimal for $m = \lceil n/2 \rceil$,
• the double rail code (separable), a particular case of the m-out-of-2m code,
• the Berger code (separable), optimal for $r = \lceil \log_2(k+1) \rceil$,
• the modified Hamming code (separable), which is a basic cyclic code.
We use the following notation: N is the total number of codewords, k the number
of bits of the words to be coded, n the number of bits of the codewords, and r the
number of redundant bits (n = k + r). Not all of these parameters apply when the
code is non-separable.
Table A.1 gives the general features of these codes and their error models (S and
NS mean Separable and Non-Separable).

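| Code | Parity | m-out-of-n | Double rail (m/2m) | Berger | Modified Hamming |
|---|---|---|---|---|---|
| k, n, m, r | $k = n - 1$ | $m = \lceil n/2 \rceil$ | $k = n/2 = m$ | $r = \lceil \log_2(k+1) \rceil$ | $n = 2^{r-1}$ |
| Separable | S | NS | S | S | S |
| N | $2^k$ | $\binom{n}{m}$ | $2^k$ | $2^k$ | $2^k$ |
| Errors detected | odd | unidirectional | unidirectional, multi. / 1 rail | unidirectional | double detected, single corrected |

Table A.1. Basic properties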


Table A.2 shows how the number of codewords that can be formed evolves
when n increases. Cells marked '-' correspond to cases that are of no interest or
impossible.

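| n | Parity | m-out-of-n | Double rail (m/2m) | Berger | Modified Hamming |
|---|---|---|---|---|---|
| 4 | 8 | 6 | 4 | 4 | - |
| 5 | 16 | 10 | - | 8 | - |
| 7 | 64 | 35 | - | 16 | - |
| 8 | 128 | 70 | 16 | 32 | 16 |
| 9 | 256 | 126 | - | 64 | - |
| 10 | 512 | 252 | 32 | 128 | - |
| 11 | 1024 | 462 | - | - | - |
| 12 | 2048 | 924 | 64 | 256 | - |
| 16 | 32768 | 12870 | 256 | 4096 | 2048 |
| 19 | 262144 | 92378 | - | 32768 | - |
| 21 | 1048576 | 352716 | - | 65536 | - |
| 32 | 2147483648 | 601080390 | 65536 | 134217728 | 67108864 |
| 36 | 34359738368 | 9075135300 | 262144 | 2147483648 | - |

Table A.2. Evolution with n of the number of codewords N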
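The entries of such a table can be recomputed from the relations of Table A.1. The following Python lines are only a minimal sketch under stated assumptions: the function names are ours, the m-out-of-n code is taken at its optimal m, and the modified Hamming code is only reported from n = 8 onwards (an assumption that mirrors the '-' cells of the table).

```python
from math import comb


def parity_count(n):
    """Single parity code: k = n - 1 data bits, so N = 2**(n - 1)."""
    return 2 ** (n - 1)


def m_out_of_n_count(n):
    """m-out-of-n code with the optimal m = ceil(n / 2): N = C(n, m)."""
    return comb(n, (n + 1) // 2)


def double_rail_count(n):
    """Double rail code: k = n / 2, defined only for even n."""
    return 2 ** (n // 2) if n % 2 == 0 else None


def berger_count(n):
    """Berger code: n = k + ceil(log2(k + 1)); None if no k fits this n."""
    for k in range(1, n):
        # For k >= 1, ceil(log2(k + 1)) equals the bit length of k.
        if k + k.bit_length() == n:
            return 2 ** k
    return None


def hamming_count(n):
    """Modified Hamming code: n = 2**(r - 1), k = n - r; None otherwise."""
    r = n.bit_length()  # if n == 2**(r - 1), then n has exactly r bits
    if n >= 8 and n == 2 ** (r - 1):  # n >= 8 is our assumption
        return 2 ** (n - r)
    return None


def codeword_counts(n):
    """Counts in the column order of the table."""
    return (parity_count(n), m_out_of_n_count(n), double_rail_count(n),
            berger_count(n), hamming_count(n))


for n in (4, 8, 11, 16):
    print(n, codeword_counts(n))
```

A value of None plays the role of a '-' cell: no code of that family exists for this n.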


Appendix B

Reliability Block Diagrams

The Reliability Block Diagram is a very simple model used to represent
redundant structures and to analyze their reliability. It was one of the first tools to be
employed, and it remains of great pedagogical interest. We introduced the
principles of 'series' and 'parallel' redundant structures in Chapter 7. The aim of this
appendix is to add some complements on the reliability analysis of non-repairable
redundant structures, under simple hypotheses. We assume that the modules have
exponential reliability laws with constant failure rate $\lambda$. The reliability of the reference
module is:
$R_0 = e^{-\lambda t}$, which gives $MTTF_0 = 1/\lambda$.

1. ON-LINE REDUNDANCY
All modules operate in parallel. As long as at least k of them are faultless, the system
functions correctly. The value of k depends on the technique used. This corresponds
to a passive redundancy according to the observability of the errors affecting each
module.

Structure n modules in parallel


The system does not fail as long as at least one module is faultless. In the general case of n
modules, we have: $(1 - R) = \prod_{i=1}^{n} (1 - R_i)$, $MTTF = MTTF_0 \cdot \sum_{i=1}^{n} (1/i)$, where $R$ is
the global reliability and $R_i$ the reliability of module i.
Hence, for n = 2 (Figure B.1), with $R_i = R_0$, this gives:
$R = 2R_0 - R_0^2 = 2e^{-\lambda t} - e^{-2\lambda t}$, $MTTF = 1.5 \, MTTF_0$.
Numerical value: if $\lambda = 10^{-4}$ and $t = 10^3$, then $R(10^3) = 0.9909$.


For $t = 10^3$, the reliability of the reference module is $R_0(10^3) = 0.9048$; hence,
$R(10^3) > R_0(10^3)$.
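The numerical values above can be checked with a few lines of Python. This is only a sketch: the function names are ours, and all modules are assumed identical, with the same constant failure rate.

```python
import math


def parallel_reliability(n, lam, t):
    """Reliability of n identical modules in parallel: 1 - (1 - R0)**n."""
    r0 = math.exp(-lam * t)
    return 1.0 - (1.0 - r0) ** n


def parallel_mttf(n, lam):
    """MTTF of n identical parallel modules: MTTF0 * sum(1/i, i = 1..n)."""
    return (1.0 / lam) * sum(1.0 / i for i in range(1, n + 1))


lam, t = 1e-4, 1e3
print(round(math.exp(-lam * t), 4))               # reference module
print(round(parallel_reliability(2, lam, t), 4))  # two modules in parallel
print(parallel_mttf(2, lam) * lam)                # MTTF in units of MTTF0
```

With n = 1 the first function gives back the reliability of the reference module, which is a convenient sanity check.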

Figure B.1. Structure with 2 modules in parallel

Structure m-out-of-n
The system does not fail as long as at least m modules are faultless. The outputs are
produced by an m-out-of-n voter. Hence, the global reliability is:
$R = \left( \sum_{i=m}^{n} \binom{n}{i} R_0^i (1 - R_0)^{n-i} \right) R_v$, where $R_v$ is the reliability of the voter.

Figure B.2. Two examples of m-out-of-n structures: a) 2-out-of-3, b) 3-out-of-4

a) Structure 2-out-of-3 (called TMR)

This technique, illustrated by Figure B.2-a), has been presented in Chapter 18 as
the triplex or TMR. The voter produces the final outputs from the outputs of the
three modules. The voter is assumed hereafter to be faultless. Hence, the
reliability can be simply determined by enumerating the disjoint cases of correct
functioning:
• one case where all modules are faultless (probability: $R_0^3$),
• three cases where two modules are faultless and one module is failing
(probability of each case: $R_0^2 (1 - R_0)$).

With a perfect voter, $R = R_0^3 + 3R_0^2(1 - R_0) = 3R_0^2 - 2R_0^3 = 3e^{-2\lambda t} - 2e^{-3\lambda t}$,
$MTTF = (5/6) \, MTTF_0$.
The MTTF of the TMR system is lower than the MTTF of the basic module.
However, these two reliability curves have an intersection point (see Figure B.5) : for
missions of small duration, the TMR has a better reliability, and for greater mission
duration, the basic module is better.
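The intersection point can be located explicitly: setting $3R_0^2 - 2R_0^3 = R_0$ gives $R_0 (2R_0 - 1)(R_0 - 1) = 0$, so the two curves meet at $R_0 = 1/2$, that is at $t = \ln 2 / \lambda$. The Python sketch below, with function names of our own and a perfect voter by default, evaluates the general m-out-of-n expression on both sides of this point.

```python
import math
from math import comb


def m_out_of_n_reliability(m, n, r0, rv=1.0):
    """Sum over i = m..n of C(n, i) * r0**i * (1 - r0)**(n - i), times rv."""
    total = sum(comb(n, i) * r0 ** i * (1.0 - r0) ** (n - i)
                for i in range(m, n + 1))
    return total * rv


def tmr_crossover_time(lam):
    """Time at which TMR and the basic module have equal reliability.

    Besides t = 0 (R0 = 1), the curves meet at R0 = 1/2, i.e. t = ln(2)/lam.
    """
    return math.log(2.0) / lam


lam = 1e-4
print(tmr_crossover_time(lam))  # about 6931 hours for lam = 1e-4
for t in (1e3, 1e4):
    r0 = math.exp(-lam * t)
    print(t, round(r0, 4), round(m_out_of_n_reliability(2, 3, r0), 4))
```

Before the crossover the TMR value is the larger one; after it, the basic module wins, in agreement with the MTTF ratio of 5/6.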

b) Structure 3-out-of-4
This structure is represented by Figure B.2-b). For a perfect voter, we have:
$R = 4R_0^3 - 3R_0^4 = 4e^{-3\lambda t} - 3e^{-4\lambda t}$.

Structure Double Duplex


Four modules are associated as two pairs. One of these pairs is connected to the
process, while the second one is in standby. As soon as a failure is detected on the
active pair, the second pair replaces the defective one. The detection results from the
comparison between the two outputs of a pair. The reliability of this quadri-redundant
structure corresponds to the survival probability of at least one of the two pairs:
P((1.1 AND 1.2) OR (2.1 AND 2.2)) = P(1.1 AND 1.2) + P(2.1 AND 2.2) -
P(1.1 AND 1.2) . P(2.1 AND 2.2) = P(1.1).P(1.2) + P(2.1).P(2.2) - P(1.1).P(1.2) .
P(2.1).P(2.2) (all events are independent).

Hence, as all modules have the same reliability: $R = 2R_0^2 - R_0^4 = 2e^{-2\lambda t} - e^{-4\lambda t}$.

Figure B.3. Double Duplex

2. OFF-LINE REDUNDANCY

An off-line redundant structure possesses n elements: one of them is connected to
the process, while the (n - 1) other modules are in off-line or cold standby. We
assume that faults only occur in active modules; thus, the reliability of the redundant
standby modules is supposed to be perfect. When the active module fails, it is
replaced by a standby module. The detection and reconfiguration mechanism is not
considered in the reliability block diagram: it is supposed here to be faultless.
As the events are not independent, the reliability calculation is made easier by the
use of the Laplace transform. We obtain:
$R = e^{-\lambda t} \sum_{i=1}^{n} \frac{(\lambda t)^{i-1}}{(i-1)!}$, $MTTF = n \cdot MTTF_0$.

Special case: n = 2
$R = e^{-\lambda t} + \lambda t \, e^{-\lambda t}$, $MTTF = 2 \cdot MTTF_0$.

Numerical value: if $\lambda = 10^{-4}$ and $t = 10^3$, then $R = 0.9953$.
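As a quick check of this value, the general expression can be evaluated directly. The sketch below uses a function name of our own and keeps the hypotheses of the text (perfect standby modules, faultless detection and reconfiguration).

```python
import math


def standby_reliability(n, lam, t):
    """R = exp(-lam*t) * sum((lam*t)**(i-1) / (i-1)!, i = 1..n)."""
    x = lam * t
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(n))


lam, t = 1e-4, 1e3
print(round(standby_reliability(1, lam, t), 4))  # single module, no standby
print(round(standby_reliability(2, lam, t), 4))  # one active + one standby
```

For n = 1 the sum reduces to its first term and the function returns the reliability of the reference module.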

Figure B.4. Off-line redundancy: a) general case, b) n = 2

3. COMPARISON OF SOME STRUCTURES


To conclude this Appendix, we show in Figure B.5 the reliability curves of some
redundant structures:
• TMR,
• 2 modules in parallel,
• 3-out-of-4,
• Double Duplex,
• and off-line redundancy with n = 2.
These curves are drawn for $\lambda = 10^{-4}$, and they are compared with the reliability of a
single module called 'basic module'.
Figure B.5. Reliability of some redundant structures: R(t) versus time (hours) for
1. Basic Module, 2. TMR, 3. 1-out-of-2, 4. 3-out-of-4, 5. Double Duplex, 6. Standby 2


Appendix C

Testing Features of a Microprocessor

All semiconductor manufacturers are very much interested in the dependability
features of the components they produce (ASICs, microprocessors,
microcontrollers, etc.). Naturally, this interest covers all design and fabrication aspects
of their products: use of fault prevention and fault removal techniques during the
specification, design and fabrication stages. In this general dependability framework,
production testing, and also maintenance testing, have a predominant place. In particular, design
for testability techniques are integrated in the design process. We have already
mentioned the IEEE 1149.1 boundary scan standard.
Some semiconductor companies go a step further in that direction and consider
critical applications with the use of redundant microprocessor structures. We report
in this appendix some very general information about the Pentium microprocessor,
interpreted from Intel documentation. The interest of this presentation is to show
the rather great variety of dependability techniques offered by a general purpose
integrated circuit. Naturally, such an approach can be found in many other
competing circuits such as those produced by Motorola, AMD, etc. We will identify
three main levels of means: aid to the debugging of microprocessor applications,
off-line testing, and on-line testing.

1. DEBUGGING AID
During the debugging of an application running on a microprocessor, it is
necessary to understand the execution of the implemented programs. Like all
processors, the Pentium offers a debugging mode, called 'Probe Mode', which allows
external access to the internal registers, to the system memory and I/O
spaces, and to the internal state of the microprocessor. The Pentium has 4 debugging
registers used to insert breakpoints. Moreover, the specialist in charge of the
debugging can access internal counters that record some events of the internal
evolution. All these features obviously belong to the design verification group of
techniques.


2. OFF-LINE TESTING
The Boundary Scan (IEEE 1149.1) standard has been implemented in the
microprocessor for testing at 'global board level'. This means that in any system
comprising a microprocessor connected to other circuits (such as memory units,
interface circuits, etc.) on a PCB, it is possible to access these other circuits
through the Pentium in order to apply test sequences to them and to collect the
resulting outputs.
The following pins of the test bus are accessible: TCK, TDI/TDO, TMS, TRST, as
well as the test logic (the TAP automaton).
Finally, the Pentium integrates a BIST procedure that is automatically executed
when the microprocessor is switched on. This off-line testing procedure is called
Reset Self-Test, but in reality the term 'self-test' refers here to a Built-In Self-Test
technique. Intel announces that this integrated test covers 100% of the single
stuck-at 0/1 faults of the Micro-Code PLAs, memory caches (instruction and data caches),
and some other internal circuitry (TLB, ROM).

3. ON-LINE TESTING
This component also offers on-line testing features:
• internal on-line error detection, thanks to error detecting codes,
• redundancy capability allowing Duplex redundant structures.

Error Detection
During the functioning of the Pentium, some error detection mechanisms are
activated by a specialized automaton called the Machine Check Exception. These
errors are revealed by the use of single parity error detecting codes:
• single parity test on the Data Bus (DATA PARITY):
64 bit-Data Bus + 8 parity bits (one bit per byte of data),
• single parity on the Address Bus (ADDRESS PARITY):
32 bit-Address Bus + 1 parity bit,
• some other internal parity codes.
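The data bus scheme (one parity bit per byte of a 64-bit word) can be sketched in Python as follows. This is an illustration only: the function names are ours, even parity is an assumption, and nothing here describes the actual circuitry of the processor.

```python
def byte_parities(word64):
    """Return 8 parity bits, one per byte of a 64-bit word (byte 0 = LSB).

    Even parity is assumed: the parity bit makes the number of 1s in
    (byte + parity bit) even.
    """
    bits = []
    for i in range(8):
        byte = (word64 >> (8 * i)) & 0xFF
        bits.append(bin(byte).count("1") % 2)
    return bits


def faulty_bytes(word64, parities):
    """Return the indices of the bytes whose recomputed parity disagrees."""
    recomputed = byte_parities(word64)
    return [i for i, (p, q) in enumerate(zip(recomputed, parities)) if p != q]


word = 0x0123456789ABCDEF
sent = byte_parities(word)
print(faulty_bytes(word, sent))                 # no error
print(faulty_bytes(word ^ (1 << 10), sent))     # single bit flipped in byte 1
print(faulty_bytes(word ^ (0b11 << 16), sent))  # two bits flipped in byte 2
```

The last call illustrates the limit of a single parity code: an even number of errors within the same byte goes undetected.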

Microprocessor Redundancy
Finally, the circuit has been designed in order to allow a Duplex redundant
structure to be easily implemented, thanks to the Functional Redundancy Checking
(FRC) technique. Figure C.1 illustrates this technique.
A 'Master' microprocessor performs the normal functioning of the application
and is connected to the external process.
A second microprocessor, called 'Check', plays the role of an observer. When an
error is detected (by simple comparison of the two functions), the output signal IER
is activated, calling for an external action (alarm, switch-off, recovery, etc.). Intel's
commercial documentation claims that more than 99% of the faults are thus
detected.
Figure C.1. Duplex Structure (IER: error signal)


Appendix D

Study of a Software Product


Ariane V Flight Control System

The first launch of Ariane V led to the destruction of the rocket, due to a failure
of the embedded computing system. Whereas most firms whose projects
fail hide the causes, the CNES (French national space agency) provided
numerous pieces of information whose study contributed to the improvement of
knowledge on dependability. The following presentation is based on the published
documents. The analysis developed in this appendix must above all strengthen the
opinion that the mastering of faults in complex computing systems is very difficult,
illustrating this idea with a real example.

1. FAILURE SCENARIO
A simplified view of the architecture of the computing system embedded in the
rocket is provided in Figure D.l .

Figure D.1. Architecture of the Control System of Ariane V
(SRI = Inertial Reference System, OBC = On-Board Computer)


The engines (Vulcain main engine and boosters) are controlled by the OBC
(On-Board Computer), which receives data from various sensors that are autonomous
complex sub-systems. The SRI (Inertial Reference System) is such a sub-system. It
provides flight data concerning the rocket position. The OBC as well as the SRI
have redundant hardware boards based on a recovery block. The first hardware
board is in operation until an error is detected; then, the second board replaces the first
one. These hardware systems execute complex real-time software applications using
a multitasking kernel. The programs executed on the two hardware platforms, SRI1
and SRI2 or OBC1 and OBC2, are the same.
Among its numerous treatments, the program of the SRI (SRI1 or SRI2) calls a
function which converts a real value expressed in a particular
format into an integer value. This function was previously used in the software
managing the flight control of Ariane IV. Being dependent on the acceleration, the
actual values handled by this function at Ariane IV launch time were in a given
range. Unfortunately, the acceleration of Ariane V being higher, the conversion
function was called with a value out of this range. This situation raised an exception
during the function execution.
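The mechanism can be pictured with a small Python sketch. Everything specific in it is hypothetical: the target range (a 16-bit signed integer), the names, and the numerical values are ours and are not taken from the flight software.

```python
INT_MIN, INT_MAX = -2 ** 15, 2 ** 15 - 1  # hypothetical 16-bit signed target


class ConversionError(Exception):
    """Raised when the value does not fit the target integer range."""


def to_int(value):
    """Convert a real value to an integer, raising an exception out of range."""
    result = int(round(value))
    if not INT_MIN <= result <= INT_MAX:
        raise ConversionError(f"{value} is outside [{INT_MIN}, {INT_MAX}]")
    return result


print(to_int(1234.4))      # value inside the range met by the previous launcher
try:
    to_int(40000.0)        # larger value: the same function now raises
except ConversionError as exc:
    print("exception:", exc)
```

The function itself is unchanged between the two calls; only its domain of use differs.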
The fault-tolerance mechanism implemented in SRI1 handled this erroneous
state by switching to the SRI2 redundant system. As SRI2 executed the same program, the
same exception was raised. Its handling by SRI2 consisted in communicating diagnostic
data to the OBC before switching off the SRI system. Thanks to this information, the
OBC should have continued the flight in a degraded mode, for instance by extrapolating the
evolution of the rocket positions. Unfortunately, the diagnostic data communicated
by SRI2 were interpreted by the OBC as flight data. Thus, the OBC reacted by
swiveling the engines.

2. ANALYSIS

2.1. Fault Diagnosis


The first question raised is "who is responsible?" that is, "where is the fault?"
The conversion function seems to be the obvious culprit: it was unable to convert the
given data. Is it so simple? We may also consider that the failure comes from:
• the OBC because it interpreted a failure identification as flight data,
• the OBC which did not perceive these flight data as erroneous, for instance, by
a likelihood check,
• the SRI2 as it did not provide correct flight data,
• the SRI which did not tolerate a software fault,
• the conversion function which raised the exception,
• the excessive acceleration of the rocket at launch time,
• the development team which did not detect the presence of a fault,
• the managers who required the reuse of this part of Ariane IV, etc.

So, it is difficult to attribute the fault to a part of the system or to a partner of the
project. However, this example illustrates several aspects highlighted in the book.

2.2. Fault Prevention


First, the specification is an important phase of the development. The
specification model must define the role of the system but also the domain in which
the services will be provided. For instance, the constraints associated with the values
of the input parameter of the conversion function were not precise.
Secondly, the presence of redundant elements may be dangerous if redundancy is
not mastered. For example, the conversion function input presents a large functional
redundancy: the integer and real types constitute the Universes whereas the Static
Domains were reduced to ranges. On the contrary, the use of one output parameter
of SRI for two concepts (the flight data and the diagnostic data) was possibly due to
performance reasons. As mentioned several times, the dependability requirements
often conflict with the performance requirements (time, memory, costs, etc.). So, a
compromise must be found to develop industrial dependable real-time systems.
The fact that the conversion function is a component successfully used in Ariane
IV shows the difficulties of reuse. A priori, the reuse of a component increases the
reliance which can justifiably be placed on the service it delivers, that is, its
dependability. However, the justification of this reliance was obtained for a specific
functional and non-functional environment; this reliance was not preserved when the
environment changed.
The Ariane V control system is a complex system in which numerous elements
interact: hardware platforms interact with software applications to detect errors and
to handle them (switching from the initial platform to the redundant one), complex
coupling between the sub-systems SRI and OBC, etc. These complex interactions
are at the origin of numerous faults in recent systems which rely on the integration
of sub-systems. Each sub-system operates correctly, whereas errors occur when the
sub-systems interact with each other. Most of the errors propagated during the first flight
of Ariane V come from integration issues: SRI2 interprets the exception raised
by SRI1 as a hardware failure and OBC interprets the diagnostic data as flight data.

2.3. Fault removal


The reader is probably amazed that the fault of the conversion function was not
detected during the reviews and test procedures. Probably, as previously mentioned,
the reuse of the SRI was the cause of less checking. The successful use of a
component during several years creates an unjustified reliance on the component,
and then decreases the time and the money spent on its testing.
To be efficient, the fault detection in a component requires the handling of
information on the component domain. In particular, a structural test with 100%
coverage may not detect any faults due to use out of the domain. The
structural test should take into account the domains of the components used to
implement the system: can a sequence lead the system operation to reach states out
of the domain? Consider, for instance, the following statement:
K := ... I * J ...;
The assessment of the program must detect values of I and J such that I*J
provokes an overflow, that is, the result of this multiplication is greater than the
largest integer which can be represented by the run-time resources.
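A domain analysis of this statement can be sketched in Python, whose integers are unbounded, so that the exact product is available for comparison. The 32-bit range and the function names below are assumptions made for the illustration.

```python
INT_MAX = 2 ** 31 - 1  # hypothetical 32-bit signed run-time integer
INT_MIN = -2 ** 31


def multiply_overflows(i, j):
    """True if i * j cannot be represented in the assumed integer range."""
    product = i * j  # Python integers are unbounded, so this is exact
    return not INT_MIN <= product <= INT_MAX


def domain_is_safe(i_range, j_range):
    """Check I * J over two closed intervals given as (low, high) pairs.

    The product of two intervals reaches its extremes at the corners,
    so testing the four corner products is sufficient.
    """
    corners = [(i, j) for i in i_range for j in j_range]
    return not any(multiply_overflows(i, j) for i, j in corners)


print(domain_is_safe((0, 1000), (0, 1000)))    # small domains
print(domain_is_safe((0, 50000), (0, 50000)))  # domains that reach overflow
```

The point is that the check bears on the domains of I and J, not on the statements covered by a test sequence.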

Finally, the complexity of the global system and of the physical devices of its
functional environment often limits the integration testing.

2.4. Fault Tolerance


After the failure, most of the criticism concerned the exception mechanism
which raises the error. Several people proposed suppressing the raising of the exception to
continue the execution. This viewpoint is dangerous. The absence of error detection
does not prevent the occurrence of errors; it just masks them. In this case,
as well as when the exception handler consists in doing nothing, the behavior of the
system is hazardous and causes an uncontrollable contamination of the errors in the
system. This situation is illustrated by the relationship between the SRI and the
OBC: the SRI signals its failure, which is considered as normal data. The use of an
exception would force the OBC to take this information into account.
The erroneous but undetected communication between SRI and OBC also shows
the importance of redundant elements to detect errors on-line. For instance, the use of
likelihood checking would certainly have detected the flight data inconsistency. In
the same way, redundant information would be useful to detect that the exception
raised in SRI1 was not due to a hardware failure but to a software one. This diagnosis
would probably lead to another reaction to handle the error.
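A likelihood check can be as simple as bounding the variation between two successive samples. The Python sketch below is purely illustrative: the names, the sampling period, the maximum rate and the sample values are all hypothetical, and holding the last accepted value is only one crude form of degraded mode.

```python
def plausible(previous, current, dt, max_rate):
    """Likelihood check: reject a sample whose rate of change is impossible."""
    return abs(current - previous) <= max_rate * dt


def filter_samples(samples, dt, max_rate):
    """Replace implausible samples by the last accepted value and flag them."""
    accepted = [samples[0]]
    flags = [False]
    for value in samples[1:]:
        if plausible(accepted[-1], value, dt, max_rate):
            accepted.append(value)
            flags.append(False)
        else:
            accepted.append(accepted[-1])  # crude degraded mode: hold value
            flags.append(True)
    return accepted, flags


# The fourth sample stands for diagnostic data misread as flight data.
values, flags = filter_samples([0.0, 0.1, 0.2, 57.3, 0.3], dt=0.1, max_rate=2.0)
print(values)
print(flags)
```

Such a check relies on redundant knowledge about the process (here, a bound on its dynamics), which is what makes the on-line detection possible.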
The described scenario presents a contamination of errors: from the conversion
function to SRI1 then to SRI, OBC1 and the engines. To handle the problems
coming from the integration of components, the designer must pay special attention to
error confinement.
Finally, the fact that only hardware faults were tolerated illustrates that engineers
or managers assume that the problems come from the aggressions of the
environment. Of course, these causes exist, particularly for space applications.
However, the Ariane V failure shows that humans are also often at the origin of the
faults.
Appendix E
Answer to the Exercises

FIRST PART
Exercise 3.1. Failures of a drinks dispenser
1. Static failure: the money change or return operations are incorrect.
2. Dynamic failure: when the machine has delivered a coffee, the red light stays on
for one minute before authorizing the next drink to be selected.
3. Temporary failure: this morning, the machine was unable to deliver tea.
4. Static and persistent failure: the ~$ coins are no longer accepted by the machine.

Exercise 3.2. Faults of a drinks distributor


1. Examples of functional and hardware faults.
Functional fault. The machine does not test the state of one of the resources
(coffee, tea, chocolate, cup, sugar, and spoon). Consequently, the service is no
longer delivered to the user who has paid and selected his/her drink. He/she obtains
an empty cup. This is a persistent and static failure. Thus, the 'money manager'
automaton which uses the state diagram is functioning correctly. On the contrary,
the 'drink delivery' automaton which interprets the orders and delivers the drink
does not execute the order.
Hardware fault. A fault of the tea selection button may lead to a quite different
failure. If this button becomes inactive, the machine does not receive any tea
selection order, so it stays in the 'selection' state, waiting for a user selection. The
user who would like to drink tea has to cancel or select another drink (for example a
coffee). The finite state machine cannot directly pass from the global 'selection'
state to the 'delivery' state allowing for a tea selection.
This fault is equivalent to cutting the arc connecting the state 'selection' to the
state 'delivery' for a tea selection.


2. Money management alteration. Money is involved in the states 'payment', 'cancel'
and 'change'. Hence, functional and hardware faults altering this money service
are to be found in these states. The panel of possible faults is rather large: coin
rejection, impossibility to cancel a drink, incorrect money change, etc.
3. Functional transformation. The new proposed functionality can be obtained by
modifying the global functional state graph as shown in Figure E.1. When the
drink is delivered, its price is subtracted from the total amount of
money introduced in the machine; then, the system comes back to the 'selection'
state. The user can then choose a new drink or order the return of the remaining
money (by pressing the 'cancel' button).

Figure E.1. Initial graph and modified graph

Exercise 3.3. Study of a stack


1. In order to simplify the study, we suppose that the real size of the stack is only
10 objects. We will now study some faults.
Design fault. The real size of the stack has been underestimated (for instance, 10
locations only), whereas the storing of 15 objects is envisaged. If 15 objects are
pushed, the fault produces a failure. Assuming that the stack memorizes integers, let
us consider the program:
for i varying from 1 to 15, loop
PUSH A(i);
end loop;
If the Stack_Full mechanism is used, the preceding treatment will stop at the first
stack overflow. This overflow can raise an exception in the case of a software
implementation. Hence, the fault provokes a failure which is detected.
Hardware fault. If a breakdown affects the Stack_Full signal and maintains it
at '0' (no signaling) despite excessive stacking, the calling program can send too
many different values to store. In the case of the preceding program, what will
occur after the 10th value is sent to the stack?
If the stack refuses to store more than 10 values, 5 values will remain unstored.
The stack could accept all incoming values and store the 5 values 11 to 15 at the 10th
memory address. Or else, the stack could return to the first address and store the
values 11 to 15 at the addresses 1 to 5, hence erasing the previously stored data.
These possibilities depend on the implementation of this stack: hardware with gates
and registers, or simulation of the stack in the main memory.
A test sequence detecting this fault could be to Push 15 integers (from 1 to 15), and
then to Pop 15 values and to compare them with the initial values. Let us note that
this test sequence also detects the previous functional design fault.
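The test sequence can be simulated on a small software model of the stack. The Python sketch below is only one possible model: the class and function names are ours, and the faulty behaviour chosen (new values overwrite the last memory location) is just one of the possibilities listed above.

```python
class StackFull(Exception):
    """Raised by a healthy stack when a Push would exceed its capacity."""


class BoundedStack:
    def __init__(self, capacity, full_signal_works=True):
        self.capacity = capacity
        self.full_signal_works = full_signal_works
        self.cells = []

    def push(self, value):
        if len(self.cells) >= self.capacity:
            if self.full_signal_works:
                raise StackFull("stack overflow")
            # Signal stuck at '0': assumed behaviour, the new value
            # overwrites the last (top) memory location.
            self.cells[-1] = value
            return
        self.cells.append(value)

    def pop(self):
        return self.cells.pop()


def push_pop_test(stack, count):
    """Push 1..count, pop the stored values, compare with the expected ones."""
    for i in range(1, count + 1):
        stack.push(i)
    popped = []
    while stack.cells and len(popped) < count:
        popped.append(stack.pop())
    return popped == list(range(count, 0, -1))


print(push_pop_test(BoundedStack(15), 15))                           # correct size
print(push_pop_test(BoundedStack(10, full_signal_works=False), 15))  # faulty signal
```

With a stack of 10 locations and a working signal, the same sequence stops on the StackFull exception at the 11th Push, which corresponds to the detected design fault.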
External fault. The user of the stack ignores the signal indicating an overflow. The
sequence given for the case of the design fault will transform this fault into a failure. In
both cases, the failure is the same (but without any detection).
2. A normal use of a stack is to apply the same number of Push operations as Pop
operations. If a fault provokes the application of a Pop action to an empty stack,
a failure occurs. The situation is very similar to the overflow studied in the
previous question, and the use of the Stack_Empty signal allows the detection of
such a situation.
Exercise 3.4. Study of a program
For an addition such as Exp1 + Exp2, where Exp1 and Exp2 are two arithmetic
expressions, the compiler generates a sequence of executable instructions allowing
the evaluation of these expressions. The two results obtained are placed in two
distinct registers. Then, the compiler adds an instruction which performs the sum of
the contents of these registers. However, the programming language does not define
the order of evaluation of these two expressions: one can first evaluate Exp1, then
Exp2, or the opposite! This means that, for our example, we will compute first F1,
then F2, or the opposite.
Let us examine the functioning with '1' as the initial value of A.
After execution of F1, A = 2, which is also the value returned by F1. Then, after
execution of F2, the value of A and the value returned by F2 are equal to 4. Hence, B
will then take the value: 2 + 4 = 6. On the contrary, if F2 is evaluated first, the value
of A and the value returned by F2 are equal to 2 (A being initially equal to 1). Then,
the execution of F1 returns 3. Hence B will be equal to 2 + 3 = 5.
Consequently, according to the executable code generated by the compiler, the final
result of B is either 6 or 5!
What can be concluded from this analysis? The addition is a commutative
operation, so both interpretations of the compiler are acceptable. However, this
commutativity property is only effective for the addition of values (i.e. 5 + 3 = 3 +
5), and not for the addition of expressions having 'side effects' (in our example, the
execution of F1 and F2 modifies A). Thus, a possible failure (only one of these two
interpretations is expected) may result from the fact that the designer does not know
how the technology he/she uses will operate on the source code. Here, the
technology deals with the implementation of the program by the compiler.
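The two evaluation orders can be replayed in Python, which always evaluates operands from left to right, by making the order explicit. The bodies of F1 and F2 are not given in this answer: the sketch assumes A := A + 1 for F1 and A := 2 * A for F2, which is consistent with the values quoted above.

```python
def run(order):
    """Evaluate B = F1 + F2 with the requested evaluation order."""
    state = {"A": 1}

    def f1():
        state["A"] += 1      # assumed body of F1: A := A + 1
        return state["A"]

    def f2():
        state["A"] *= 2      # assumed body of F2: A := 2 * A
        return state["A"]

    if order == "F1 first":
        left = f1()
        right = f2()
    else:
        right = f2()
        left = f1()
    return left + right


print(run("F1 first"))  # 2 + 4
print(run("F2 first"))  # 3 + 2
```

Both runs execute the same source expression; only the order chosen for the operands differs.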
Exercise 4.1. Latency of an asynchronous counter
The MSB (Most Significant Bit) will normally switch to the value '1' after 8 clock
pulses. Hence, the latency is equal to 8 x 2 ms = 16 ms.
The fault will lead to a failure that lasts 8 clock pulses and then disappears.
Exercise 4.2. Latency of a structured system
Error # 1: 10ms, error #2: 110ms, error #3 = failure of the system: 140ms.

Exercise 4.3. Consequences of failures


$\text{Mean cost} = (2 \times 0 + 3 \times 5 \cdot 10^3 + 2 \times 6 \cdot 10^3 \times 4 + 3 \times (10^3 + 3 \cdot 10^3 \times 4)) / 10$.
Mean cost = 10.2 ku.
Exercise 4.4. Fault - Error - Failure in a program
1. As the actual last right page number is 325, the expected result is (325+1)/2 =
163 sheets.
If the faulty expression provides 326 instead of 325, the computed result is
(326+1)/2 = 163, taking the integer division semantics into account. So, the
result is correct: no failure occurs.
Considering 327, the returned value is (327+1)/2 = 164. The procedure execution
fails. The same result and conclusion are obtained with 328.
2. An error characterizes an unacceptable state or state evolution occurring at
run-time. The states being characterized by values taken by attributes, consider
Last-Right-Page as the attribute. "The last right page number of a book is odd" is a
property. Therefore, an error is detected in the first case (326), no error is
detected in the second case (327), and an error is detected in the third case (328).
3. Conclusion. Consider Table E.1 which synthesizes the three cases. A fault may
provoke an error which may provoke a failure. In the first case, the fault is
activated as an error, but the last statement tolerates it (no failure occurs). In the
second case, no error is detected, as the observation means is not sufficient. The
property which characterizes the error is not accurate enough: not all odd values are
correct values. However, the program fails. The last case is the conventional
one: the fault activates an error, which propagates as a failure.
| Case | Fault | Error detection | Failure |
|---|---|---|---|
| 1 | yes | yes | no |
| 2 | yes | no | yes |
| 3 | yes | yes | yes |

Table E.1. Fault - Error - Failures cases
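The three rows of the table can be reproduced with a few lines of Python. The function names are ours; the integer division and the 'odd page number' property are those of the exercise.

```python
def sheets(last_right_page):
    """Number of sheets, using integer division as in the exercise."""
    return (last_right_page + 1) // 2


def error_detected(last_right_page):
    """Property check: the last right page number of a book must be odd."""
    return last_right_page % 2 == 0


expected = sheets(325)
for faulty in (326, 327, 328):
    # columns: faulty value, error detected?, failure?
    print(faulty, error_detected(faulty), sheets(faulty) != expected)
```

Each printed line gives, for one faulty value, whether the property check detects an error and whether the returned number of sheets differs from the expected one.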
Exercise 5.1. Faults of a MOS network
The logical expression of a 'structured' MOS network can be
obtained by an iterative decomposition of this network into 'series' and 'parallel'
sub-networks, until the basic MOS components are reached.
1. We perform this analysis for the fault-free network and for the two faulty
networks:
• Faultless circuit: $R = ab' + bc$,
• Circuit with fault F1: $R_1 = bc$,
• Circuit with fault F2: $R_2 = (a + c)(b + b') = (a + c)$.
The 3 corresponding functions, N (faultless circuit), N1 (with fault F1) and N2 (with
fault F2), are shown in Table E.2. The output takes the same values for four input
configurations only: 000, 010, 011, and 111.

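| abc | N | N1 | N2 | N3 | N4 |
|---|---|---|---|---|---|
| 000 | 0 | 0 | 0 | 0 | 1 |
| 001 | 0 | 0 | 1 | 0 | 1 |
| 010 | 0 | 0 | 0 | 0 | 0 |
| 011 | 1 | 1 | 1 | 1 | 1 |
| 100 | 1 | 0 | 1 | 1 | 1 |
| 101 | 1 | 0 | 1 | 1 | 1 |
| 110 | 0 | 0 | 1 | 0 | 0 |
| 111 | 1 | 1 | 1 | 1 | 1 |

Table E.2. Normal and erroneous functions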

2. N2 becomes N3 = (a + b) . (b' + c) = a b' + a c + b c = N, the fault-free function. Thus, this
fault has no influence on the function performed by the network (see Table E.2).
Let us note that in the faultless circuit, the permutation between transistors
controlled by b and c has no influence on the functioning. However, the same
fault F2 will have different effects (failures) according to the chosen network!
3. Fault F3: the function becomes N4 = b' + b c = b' + c, shown in the previous
table. It induces two failures, for abc = 000 and 001.
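
The columns of Table E.2 can be re-derived by exhaustive enumeration. The following Python sketch is only an illustration: the function names (n, n1, ..., failures) are not part of the exercise, and each function simply encodes the logical expression given above.

```python
from itertools import product

# Logical functions of Exercise 5.1 (inputs and outputs are 0 or 1).
def n(a, b, c):    # faultless network: a b' + b c
    return (a & (1 - b)) | (b & c)

def n1(a, b, c):   # fault F1: b c
    return b & c

def n2(a, b, c):   # fault F2: a + c
    return a | c

def n3(a, b, c):   # fault F2 on the permuted network: (a + b)(b' + c)
    return (a | b) & ((1 - b) | c)

def n4(a, b, c):   # fault F3: b' + c
    return (1 - b) | c

def failures(faulty, reference=n):
    """Input vectors 'abc' for which the faulty function differs from the reference."""
    return ["%d%d%d" % v for v in product((0, 1), repeat=3)
            if faulty(*v) != reference(*v)]

if __name__ == "__main__":
    for name, f in (("N1", n1), ("N2", n2), ("N3", n3), ("N4", n4)):
        print(name, failures(f))
```

An empty list for N3 confirms that the fault is masked on the permuted network, while N4 differs from N on two vectors only.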

Exercise 5.2. Faults of a full adder


1. Functional fault F1. The 'sum' output (S) is not modified, but the 'carry' (C)
becomes C1 = a b + a' b' c. There are three failures on the carry output, for
abc = 001, 101 and 011.
2. Hardware fault F2. The classical Stuck-At 0/1 fault model supposes that a fault
occurring on a gate input line has no backwards effects. Here, the NAND gate
receiving a and b is not altered by the fault α.
The input b no longer acts on the output S, producing 4 failures. The carry
function becomes C2 = [(ab)' . ((a ⊕ 0).c)']' = a b + a c, instead of ab + ac + bc.
A failure occurs for a' b c = 1.

abc   CS   C1S1   C2S2
000   00   00     00
001   01   11     01
010   01   01     00
011   10   00     01
100   01   01     01
101   10   00     10
110   10   10     11
111   11   11     10

Table E.3. Truth tables: without fault and with faults F1 and F2

Table E.3 gives the output values without fault and with faults F1 and F2. The
values that differ from the fault-free column CS (printed in bold in the original table) are the failures.
3. The two faults produce quite different failures. It is possible to distinguish
between these faults by applying to the circuit an input vector such as 011. The
diagnosis is as follows:
• if the output C only is erroneous, then fault F1 is present,
• if outputs C and S are erroneous, then fault F2 is present,
• if both outputs are correct, none of these two faults is present.
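
The diagnosis rule above can be checked with a small simulation. This is a minimal Python sketch, assuming that fault F2 is a stuck-at 0 of input b on the XOR gate (which gives S = a ⊕ c and C = ab + ac, as stated in the answer); the function names are illustrative.

```python
from itertools import product

def correct(a, b, c):
    """Fault-free full adder: returns (carry, sum)."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c

def with_f1(a, b, c):
    """Functional fault F1: the carry becomes a b + a' b' c, the sum is unchanged."""
    return (a & b) | ((1 - a) & (1 - b) & c), a ^ b ^ c

def with_f2(a, b, c):
    """Hardware fault F2 (assumed: b stuck at 0 on the XOR input)."""
    return (a & b) | (a & c), a ^ c

def failing_vectors(faulty, output_index):
    """Vectors 'abc' on which the selected output (0 = C, 1 = S) is wrong."""
    return ["%d%d%d" % v for v in product((0, 1), repeat=3)
            if faulty(*v)[output_index] != correct(*v)[output_index]]

def diagnose(observed, vector=(0, 1, 1)):
    """Apply the rule of item 3 to the outputs observed for the test vector."""
    expected = correct(*vector)
    c_wrong = observed[0] != expected[0]
    s_wrong = observed[1] != expected[1]
    if c_wrong and s_wrong:
        return "F2"
    if c_wrong:
        return "F1"
    if not s_wrong:
        return "none"
    return "unknown"

if __name__ == "__main__":
    print(diagnose(with_f1(0, 1, 1)), diagnose(with_f2(0, 1, 1)))
```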

Exercise 5.3. Fault models and failures


1. We draw the truth table associated with each fault. Thus, the erroneous function
for c stuck at 0 (noted c0) is z = a b; it provokes 3 failures for the input vectors
001, 011, and 101. We observe in Table E.4 that faults a0, b0 and d0 are
equivalent, and that faults d1, c1 and z1 are equivalent.

abc   z   a0  a1  b0  b1  c0  c1  d0  d1  z0  z1  FF1  FF2
000   0   0   0   0   0   0   1   0   1   0   1   1    1
001   1   1   1   1   1   0   1   1   1   0   1   1    0
010   0   0   1   0   0   0   1   0   1   0   1   1    1
011   1   1   1   1   1   0   1   1   1   0   1   1    0
100   0   0   0   0   1   0   1   0   1   0   1   1    1
101   1   1   1   1   1   0   1   1   1   0   1   1    0
110   1   0   1   0   1   1   1   0   1   0   1   0    0
111   1   1   1   1   1   1   1   1   1   0   1   1    0
Table E.4. Correct and erroneous functions

2. Let us assume that the functional faults can affect each gate by transforming it
into any other gate type: AND, OR, NOT, NAND and NOR. We illustrate these
faults with two cases (Table E.4):
- FF1, which transforms the AND gate into a NAND gate:
z = (a b)' + c = a' + b' + c.
- FF2, which transforms the OR gate into a NOR gate:
z = (a b + c)' = a' c' + b' c'.
These two failures do not belong to those induced by the stuck-at fault model. Going
further, we can wonder if this functional gate-transforming model is able to
complete the set of all theoretical failures (255 classes!). The answer is no: for
instance, the erroneous function z = a b' + b c' cannot be obtained with these fault
models. Now the question not answered here is:
Can such a failure occur, and from which technological or functional faults?
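
The equivalence classes of item 1 can be recomputed by injecting each stuck-at fault in turn. The Python sketch below assumes the structure suggested by Table E.4, namely z = d + c with an internal line d = a.b; the helper names are illustrative.

```python
from itertools import product

VECTORS = list(product((0, 1), repeat=3))

def circuit(a, b, c, stuck=None):
    """z = d + c with d = a b; `stuck` is a (line, value) pair or None."""
    def line(name, value):
        # Force the line to its stuck value when it is the faulty one.
        return stuck[1] if stuck and stuck[0] == name else value
    a, b, c = line("a", a), line("b", b), line("c", c)
    d = line("d", a & b)
    return line("z", d | c)

def signature(stuck):
    """Output column of the truth table for a given fault."""
    return tuple(circuit(*v, stuck=stuck) for v in VECTORS)

def equivalence_classes():
    """Group the ten single stuck-at faults that yield the same function."""
    classes = {}
    for name in "abcdz":
        for value in (0, 1):
            classes.setdefault(signature((name, value)), []).append(f"{name}{value}")
    return sorted(classes.values())

if __name__ == "__main__":
    for group in equivalence_classes():
        print(group)
```

Two faults are grouped together exactly when their columns in the truth table are identical, which is the notion of equivalence used in the answer.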

Exercise 5.4. Faults of a sequential circuit


From the circuit, we can write the logical expressions of the D-inputs of the Flip-Flops;
then, we deduce the transition table and the state graph shown in Figure E.2.

Correct Functioning
state   y1y2   D1D2 (x = 0)   D1D2 (x = 1)
1       00     01             01
2       01     11             10
3       11     00             00
4       10     01             11
[State graph of the correct circuit, each state being labeled with its output value z]

Figure E.2. Transition table and state graph of the correct circuit

1. The transformation of gate A into a NOR modifies the logical expression of D2,
which becomes D2 = y2' + x' + y1'. Figure E.3 shows the new transition table and
state graph. We observe that two transitions are modified: the arc joining state 2
to state 4 when x = 1 now goes to state 3, and the arc joining state 3 to state 1
when x = 0 now goes to state 2.
This analysis led us to represent the initial functional fault at 'state graph level' by a
new fault model (arc modification). If we suppose that state 1 is the initial state, then
by applying the input sequence <0, 1>, the circuit goes into state 2 and finally state 3
instead of state 4; the final output is z = 1 instead of z = 0: hence a failure occurs.

[Figure: transition table and state graph with the functional fault]

Figure E.3. Influence of the functional fault

2. The 'stuck-at 1' fault noted α modifies the logical expression of D1, which
becomes: D1 = y1.y2' + y1'.y2. Figure E.4 shows the new transition table and
state graph. Only one transition is modified: the arc joining state 4 to state 2
when x = 0 now goes to state 3. Here also, we have transformed the hardware
fault model into a graph fault model.

If we apply the input sequence <0, 1, 0> to the initial state 1, the system goes through
states 2, 4 and 3, ending in state 3 instead of state 2. However, no failure occurs at the output z! A
failure is produced if a new vector x = 0 is added to this sequence: the incorrect
circuit reaches state 1 instead of state 3 and gives a final output z = 0 instead of 1.

[Figure: transition table and state graph with the hardware fault]

Figure E.4. Influence of the hardware fault
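
Both faults of this exercise can be replayed at state-graph level. The Python sketch below encodes the correct transition table and the arc modifications described above; the output map Z is an assumption inferred from the output values quoted in the text (z = 1 in states 2 and 3, z = 0 in states 1 and 4), and all names are illustrative.

```python
# next_state[state] = (target for x = 0, target for x = 1)
CORRECT = {1: (2, 2), 2: (3, 4), 3: (1, 1), 4: (2, 3)}

def with_arcs(table, changes):
    """Copy a transition table and redirect the arcs listed in `changes`."""
    new = {s: list(targets) for s, targets in table.items()}
    for (state, x), target in changes.items():
        new[state][x] = target
    return {s: tuple(targets) for s, targets in new.items()}

FUNCTIONAL_FAULT = with_arcs(CORRECT, {(2, 1): 3, (3, 0): 2})
HARDWARE_FAULT = with_arcs(CORRECT, {(4, 0): 3})

# Assumed output attached to each state (inferred, not given explicitly).
Z = {1: 0, 2: 1, 3: 1, 4: 0}

def run(table, inputs, state=1):
    """States visited after each input, starting from the initial state."""
    visited = []
    for x in inputs:
        state = table[state][x]
        visited.append(state)
    return visited

def first_failure(faulty, inputs):
    """Index (from 1) of the first input after which the outputs differ."""
    pairs = zip(run(CORRECT, inputs), run(faulty, inputs))
    for step, (good, bad) in enumerate(pairs, start=1):
        if Z[good] != Z[bad]:
            return step
    return None

if __name__ == "__main__":
    print(first_failure(FUNCTIONAL_FAULT, [0, 1]))
    print(first_failure(HARDWARE_FAULT, [0, 1, 0]))
    print(first_failure(HARDWARE_FAULT, [0, 1, 0, 0]))
```

Under this assumption, the hardware fault is activated as an error after the third input but is observed as a failure only after the fourth one.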

Exercise 5.5. Software functional faults


1. Fault analysis: Line 5: Sum := A( i ) - Sum;
The Sum is iteratively subtracted from each value A( i ). Then, we divide the result
by the number of values. We obtain a final Sum value = -15 and a final Average
value = -3.75 instead of +3.25. The difference between the correct value and the
erroneous one is quite important; hence, the external consequences of such a failure
can be serious. However, this difference depends on the values stored in the array A.
For example, if we add a fifth value equal to 0, the erroneous average becomes 3
instead of 2.6: thus the difference is only 0.4 in that case!
Line 7: return Sum / (A'last - A'first);
This fault provokes a bad counting of the total number of values to be averaged:
correct number minus 1. The seriousness of the resulting failure decreases with the
number of values to be considered. Consequently, this fault has more 'regular'
effects than the preceding one.
2. The result becomes -7.375. The performed mathematical function is
transformed:
T = 2^-1 A(4) - 2^-2 A(3) + 2^-3 A(2) - 2^-4 A(1).
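
The effects of the two faults of item 1 can be reproduced numerically. In the Python sketch below, the array of the exercise is not reproduced: the values in A are hypothetical, chosen only so that the sketch yields the figures quoted above (Sum = -15, averages -3.75 and +3.25). It is also assumed that Sum is initialized to 0.

```python
def average_correct(values):
    """Reference behavior: Sum := Sum + A(i), then division by the count."""
    total = 0.0
    for v in values:
        total = total + v
    return total / len(values)

def average_fault_line5(values):
    """Fault on line 5: Sum := A(i) - Sum (alternating signs)."""
    total = 0.0
    for v in values:
        total = v - total
    return total / len(values)

def average_fault_line7(values):
    """Fault on line 7: division by (A'last - A'first), i.e. count - 1."""
    return sum(values) / (len(values) - 1)

# Hypothetical data, NOT the array of the exercise.
A = [2.0, 10.0, 12.0, -11.0]

if __name__ == "__main__":
    print(average_correct(A), average_fault_line5(A), average_fault_line7(A))
    print(average_correct(A + [0.0]), average_fault_line5(A + [0.0]))
```

With these values, appending a fifth value equal to 0 brings the erroneous average of line 5 to 3 against a correct average of 2.6, as in the discussion above.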

Exercise 5.6. Software technological faults


The proposed program converges when N increases, whereas the series
mathematically diverges. This failure comes from the limited precision used to
represent floating-point numbers in computers. When I reaches a certain value, 1.0 /
float(I) is computed as 0.0, or is too small to change the accumulated sum. This
situation is an example of technological fault, as
the representation of real numbers differs from their mathematical definition.
Let us note that for most programs, this fault actually exists but has no serious
effects, even if division operations are used. Such a situation occurs in exercise 5.5,
as the value of N does not exceed the precision limit.
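
The stalling of such a sum can be observed quickly with a deliberately small floating-point format. The Python sketch below is an illustration, not the program of the exercise: it assumes the series is the harmonic series and emulates IEEE 754 half precision with the standard struct module (format "e"), so that the stall appears after a few hundred terms.

```python
import struct

def to_half(x):
    """Round a Python float to IEEE 754 half precision (binary16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def harmonic_until_stall(max_terms=100_000):
    """Accumulate 1/1 + 1/2 + ... in half precision.

    Returns (i, total), where i is the first index whose term no longer
    changes the accumulated sum, or (None, total) if no stall is observed.
    """
    total = 0.0
    for i in range(1, max_terms + 1):
        term = to_half(1.0 / i)
        new_total = to_half(total + term)
        if new_total == total:
            return i, total
        total = new_total
    return None, total

if __name__ == "__main__":
    print(harmonic_until_stall())
```

In this sketch the term is still non-zero when the sum stops growing: it is absorbed because it is smaller than half the spacing between two representable values around the current sum.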

SECOND PART
Exercise 7.1. The 'fault - error - failure - detection - repair' cycle
1. Figure E.5 shows the interpreted cycle:
Le: latency of the fault up to the occurrence of the first error,
Lf: latency of the fault up to the occurrence of the failure,
D: detection time, R: repairing time,
SF: mean time of good functioning to the occurrence of a fault.

[Figure: time line of a repairable product. Events: fault, error, failure, detection, diagnosis, repair, next fault. The product is available, then non-available, then available again. Intervals: Le, Lf, D, R, SF.]

Figure E.5. Cycle of a repairable product

2. MTBF study. MTBF = Lf + SF: mean time to the occurrence of a failure.
The integration of the latency phenomenon increases the MTBF/MTTF:
MTTR = (D - Lf) + R.
The availability rate of this system is: (SF + Lf) / (D + R + SF).
Exercise 7.2. Reliability of a component
The component follows an exponential reliability law with a constant failure rate λ.
1. We perform a definite integration of R(t) = e^(-λt), from 0 to ∞. The
mean time is 1/λ, that is to say 10^6 hours.
The reliability at the mean time is R(1/λ) = e^-1.
2. dR/dt = -λ R. It is the derivative of the survival law; its opposite, λ R, is the failure
density at time t. The tangent at the origin is given by the equation: y = -λ x + 1;
this line meets the abscissa at time 1/λ.
3. λ = -(dR/R)/dt. It corresponds to the conditional probability of a fault
occurring at time t during a time unit (1 hour).
4. The second version is more reliable than the first one, as it has a smaller failure
rate: R2(10^4) / R1(10^4) = e^0.099 = 1.1041.
Note. The failure rate has been multiplied by 10, but the reliability at time 10^4 h is
multiplied by 1.1 only.

Exercise 7.3. Composed reliability


1. Series diagram. The global reliability function is the product of the reliability
functions of the constituting modules:
R(t) = R1(t) . R2(t) = e^(-λ1 t) . e^(-λ2 t) = e^(-(λ1 + λ2) t).
Hence, the failure rates of the components are added:
λ = λ1 + λ2, MTBF = 1 / (λ1 + λ2).
If λ1 = λ2, the MTBF is divided by 2. We note that the value of the MTBF is
inversely proportional to the number of components (if they are identical).
2. Parallel diagram. 1 - R(t) = (1 - R1(t)) . (1 - R2(t)).
Thus, R(t) = R1(t) + R2(t) - R1(t).R2(t).
MTBF (or MTTF) = 1/λ1 + 1/λ2 - 1/(λ1 + λ2).
If the two components are identical, R(t) = 2 R0 - R0^2 (where R0 is the reliability of
each one). MTBF = 1.5 MTBF0.
3. Reliability of one module: R0(10^3) = 0.9048.
Series diagram: R(10^3) = e^-0.2 = 0.8187 → the reliability is smaller.
Parallel diagram: R(10^3) = 0.9909 → the reliability is higher.
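
The numerical values of item 3 can be recomputed as follows. This Python sketch assumes a failure rate λ = 10^-4 per hour for each module, a value inferred from R0(10^3) = 0.9048 = e^-0.1 and not stated explicitly here; the function names are illustrative.

```python
import math

def r_exp(lam, t):
    """Exponential reliability law R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def series(*reliabilities):
    """Series diagram: product of the module reliabilities."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(*reliabilities):
    """Parallel diagram: 1 minus the product of the unreliabilities."""
    failure = 1.0
    for r in reliabilities:
        failure *= (1.0 - r)
    return 1.0 - failure

def mtbf_series(lam1, lam2):
    return 1.0 / (lam1 + lam2)

def mtbf_parallel(lam1, lam2):
    return 1.0 / lam1 + 1.0 / lam2 - 1.0 / (lam1 + lam2)

LAMBDA = 1e-4  # assumed failure rate (per hour), inferred from R0(10^3)

if __name__ == "__main__":
    r0 = r_exp(LAMBDA, 1e3)
    print(round(r0, 4), round(series(r0, r0), 4), round(parallel(r0, r0), 4))
    print(mtbf_series(LAMBDA, LAMBDA), mtbf_parallel(LAMBDA, LAMBDA))
```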
Exercise 7.4. Comparison of two redundant structures

a) Series- Parallel b) Parallel-Series

Figure E.6. Two redundant structures

1. Structure 'parallel-series':
Rps = (1 - (1 - R1)(1 - R2)) (1 - (1 - R3)(1 - R4)),
Rps = R^4 - 4 R^3 + 4 R^2, if Ri = R.
Structure 'series-parallel':
1 - Rsp = (1 - R1.R3) (1 - R2.R4),
Rsp = 2 R^2 - R^4, if Ri = R.
2. Comparison of the two structures:
Rps - Rsp = 2 (R^2 - R)^2, which is never negative; so Rps ≥ Rsp, with equality only for R = 0 or R = 1.
Thus, the 'parallel-series' structure is always at least as reliable as the 'series-parallel' structure.

Note. As the faults altering the modules are independent, these reliability results can
also easily be determined by the composition reliability theorems. For example, for
the PS structure we have: Rps = P((1 or 2) and (3 or 4)) = P(1 or 2) . P(3 or 4) =
(P(1) + P(2) - P(1).P(2)).(P(3) + P(4) - P(3).P(4)) = (2R - R^2)^2, if all modules have
the same reliability R.
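
The comparison of the two structures can be checked numerically over a range of module reliabilities. The Python sketch below encodes the two formulas of item 1; the function names are illustrative.

```python
def r_parallel_series(r1, r2, r3, r4):
    """(1 parallel 2) in series with (3 parallel 4)."""
    return (1 - (1 - r1) * (1 - r2)) * (1 - (1 - r3) * (1 - r4))

def r_series_parallel(r1, r2, r3, r4):
    """(1 in series with 3) in parallel with (2 in series with 4)."""
    return 1 - (1 - r1 * r3) * (1 - r2 * r4)

def difference(r):
    """Rps - Rsp when the four modules have the same reliability r."""
    return r_parallel_series(r, r, r, r) - r_series_parallel(r, r, r, r)

if __name__ == "__main__":
    for k in range(0, 11):
        r = k / 10
        print(r, round(difference(r), 6), round(2 * (r * r - r) ** 2, 6))
```

The two printed columns coincide, and the difference vanishes only at the ends of the interval (R = 0 and R = 1).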
Exercise 7.5. Safety analysis by a Markov graph
The evolution matrix, which gives the probability to pass from a state to another
(with a sampling rate expressed by hour), is shown in Figure E.7. After two
elementary periods (hours), the probability to reach state 4 (considered as
dangerous) is equal to p1.p3 + p2.p4. The raising of this matrix to the successive
powers 2, 3, etc., gives the progression of the probability values to reach this
dangerous state (hour after hour). As this system does not possess any regeneration
mechanism, all parameter values always increase and are bounded by 1; this means
that the degradation probabilities increase with time.

P = [ P(i→j) ], with rows i = 1..4 (current state) and columns j = 1..4 (next state):

          → 1          → 2          → 3          → 4
1 →       1-p1-p2      p2           p1           0
2 →       r2           1-p4-r2      0            p4
3 →       r1           0            1-p3-r1      p3
4 →       0            0            0            1

Figure E.7. Evolution matrix
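
The progression of the probability of reaching the dangerous state can be computed by raising the evolution matrix to successive powers. In the Python sketch below, the numerical values of p1..p4, r1 and r2 are hypothetical and serve only as an illustration; the function names are not part of the exercise.

```python
def evolution_matrix(p1, p2, p3, p4, r1, r2):
    """Evolution matrix of Figure E.7 (row = current state, column = next state)."""
    return [
        [1 - p1 - p2, p2, p1, 0.0],
        [r2, 1 - p4 - r2, 0.0, p4],
        [r1, 0.0, 1 - p3 - r1, p3],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def danger_probabilities(matrix, hours):
    """Probability of being in state 4 after 1, 2, ..., `hours` periods,
    starting from state 1."""
    power = matrix
    result = []
    for _ in range(hours):
        result.append(power[0][3])
        power = mat_mul(power, matrix)
    return result

# Hypothetical hourly probabilities, for illustration only.
P1, P2, P3, P4, R1, R2 = 0.01, 0.02, 0.05, 0.03, 0.1, 0.2

if __name__ == "__main__":
    matrix = evolution_matrix(P1, P2, P3, P4, R1, R2)
    print(danger_probabilities(matrix, 5))
```

Since state 4 has no outgoing arc, the computed sequence never decreases, whatever the values chosen for the parameters.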

Exercise 7.6. Representation of a system by a stochastic Petri net


Figure E.8 shows the failing and restoring mechanisms of this system. When an
active unit fails and if the spare is available, the spare unit replaces the failing unit
with a rate p. This failing unit is then symbolized by a token in place P5, waiting for
repairing (with rate μ). Then, it is considered as the new spare unit (a token in place
P3). The spare unit is submitted to failures and repair with rates λs and μs.

[Figure: stochastic Petri net with the places 'Active units', 'Spare available' (P3) and 'Spare down', and the transitions labeled with their rates]

Figure E.8. Stochastic Petri net

The analysis of this graph can be performed by means of a finite state machine (non-
parallel model), called the marking graph, which shows all possible evolutions from
the initial state (3 tokens in P1 and 1 token in P3). We can notice that the total
number of tokens is constant.

Example of evolution:
(P1=3, P3=1) → (P1=2, P2=1, P3=1) → (P1=3, P5=1, P3=0) → (P1=3, P3=1), etc.
Exercise 7.7. Fault Tree and Reliability Block Diagram
The fault tree can be analyzed with the knowledge of the reliabilities of the basic
events (leaves of the tree). Hence, we start from the leaves and go up towards the
studied event which is the failure of the system. The probability at the output of an
AND node is the product of the probabilities at its inputs. The probability at the
output of an OR node (here with 2 inputs) is the sum of the probabilities at its inputs
minus the product of these probabilities (this can be generalized to a more
complicated formula for n inputs). The failure of the system has the probability:
F = F12 + F3 - F12.F3 = (1 - R1).(1 - R2) + (1 - R3) - (1 - R1).(1 - R2).(1 - R3),
Hence, R = 1 - F = (R1 + R2 - R1.R2).R3.

F = F12 + F3 - F3.F12

Figure E.9. Fault tree analysis

Figure E.10 shows the Reliability Block Diagram of this redundant system: two
modules M1 and M2 in 'parallel', in 'series' with M3. The analysis by the method
already studied gives the reliability: R = R12 . R3 = (1 - (1 - R1).(1 - R2)) . R3 = (R1
+ R2 - R1.R2).R3. We obtain the same result.

Figure E.10. Reliability Block Diagram of the system
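
The agreement between the two methods can be verified numerically. The Python sketch below evaluates the fault tree bottom-up and compares the result with the Reliability Block Diagram formula; the events are assumed independent, the OR node is written in its n-input form (1 minus the product of the complements), and the reliabilities used are arbitrary illustrative values.

```python
def and_node(*probabilities):
    """AND node: product of the input probabilities."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

def or_node(*probabilities):
    """OR node for independent events: 1 - product of (1 - p).
    For two inputs this equals p + q - p.q."""
    none = 1.0
    for p in probabilities:
        none *= (1.0 - p)
    return 1.0 - none

def system_failure(r1, r2, r3):
    """Fault tree: (M1 fails AND M2 fails) OR (M3 fails)."""
    f12 = and_node(1.0 - r1, 1.0 - r2)
    return or_node(f12, 1.0 - r3)

def rbd_reliability(r1, r2, r3):
    """Block diagram: (M1 parallel M2) in series with M3."""
    return (r1 + r2 - r1 * r2) * r3

if __name__ == "__main__":
    r1, r2, r3 = 0.9, 0.8, 0.95   # illustrative values only
    print(1.0 - system_failure(r1, r2, r3), rbd_reliability(r1, r2, r3))
```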

Exercise 8.1. Functional redundancy of an adder


1. The number of input vectors producing a same output value is variable. This
function increases from 1 to 10 according to a linear law when the output value
varies from 0 (only one possibility: 0 + 0) to 9 (10 vectors: 0+9, 1+8, ... ,9+0).
Then, it decreases from 10 to 1 when the output value passes from 9 to 18 (only
one case: 9+9). Finally, it takes the constant value 0 when the output values are
between 19 and 99, corresponding to 'impossible' cases. The input values being
supposed as having the same occurrence probability, we deduce the probabilistic
domain shown in Figure E.11.
P(c) = (c + 1)/100, for c ∈ [0, 9], P(c) = (19 - c)/100, for c ∈ [9, 18],
and P(c) = 0 for c > 18.

[Figure: P as a function of c; P(c) = (c + 1)/100 rises to 0.1 at c = 9, then P(c) = (19 - c)/100 decreases until c = 18; P(c) = 0 from 19 to 99]

Figure E.11. Probabilistic static functional domain

Failure detection. A typical application of the functional redundancy deals with
failure detection. Let us imagine an external observer receiving the output values.
The preceding domain reveals that the output values belonging to [19, 99] are
strictly impossible: if the observer receives a value belonging to this sub-domain, it
can without any doubt signal the occurrence of a failure due to an unknown fault. If
the output values belong to the acceptable sub-domain (between 0 and 18), this
observer must record all produced output values and compare their occurrence rate
with their probabilistic rate. In case of significant difference, this observer can raise
a warning signal indicating that a failure might have occurred.
2. Input redundancy. There are 2 x 4 = 8 bits, i.e. 256 configurations, but only
100 of them are used by the considered code: 10 values for A and 10 values for
B. The resulting input redundancy rate is: (256 - 100) / 256 = 0.61 (number of
unused vectors divided by the total number of possible vectors).
Output redundancy. There are also 8 output bits, i.e. 256 configurations. Only
19 of them are used. So, the output redundancy rate is: (256 - 19) / 256 = 0.93.
3. The numbers have 2 bits and they are constrained by the property A ≤ B. This
leads to 1 + 2 + 3 + 4 = 10 cases. Hence, the redundancy rate is:
(2^4 - 10) / 2^4 = 0.38.
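
The figures of this exercise can be obtained by direct counting. The Python sketch below enumerates the 100 input pairs of the decimal adder, derives the output distribution P(c), and recomputes the three redundancy rates; the function names are illustrative.

```python
from collections import Counter

def output_distribution():
    """P(c) for c = 0..99, the 100 input pairs (A, B) being equiprobable."""
    counts = Counter(a + b for a in range(10) for b in range(10))
    return {c: counts.get(c, 0) / 100 for c in range(100)}

def redundancy_rate(used, total):
    """Number of unused configurations divided by the total number."""
    return (total - used) / total

def ordered_pairs(bits=2):
    """Number of pairs (A, B) of `bits`-bit numbers satisfying A <= B."""
    size = 2 ** bits
    return sum(1 for a in range(size) for b in range(size) if a <= b)

if __name__ == "__main__":
    dist = output_distribution()
    print(dist[0], dist[9], dist[18], dist[19])
    print(round(redundancy_rate(100, 256), 2), round(redundancy_rate(19, 256), 2))
    print(ordered_pairs(), redundancy_rate(ordered_pairs(), 2 ** 4))
```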
Exercise 8.2. Functional redundancy of a state graph
First of all, we observe that the graph has no redundant unreachable part: from the
initial state 1, we can reach any other state. Then, we analyze the state graph to find
if it accepts sequences that are forbidden by the input constraint: 'c is never applied
after b'. There are two such situations (see Figure E.12):
a) The first one occurs when the graph is in state 2 or 3, if we apply the forbidden
sub-sequence <b, c>. As this situation will never occur, the arc 3-2 (by input c) is
redundant and thus can be removed. Indeed, this transition is never fired by any
acceptable sequence.
b) The second one occurs when the graph is in state 4, and if we apply the same
sub-sequence <b, c>. However, the arc 4-2 (by input c) is not redundant. Indeed,
the sequence <a, a, c> from state 1 is quite acceptable: it passes through states 2,

4, and finally reaches state 2 by the arc 4-2. In that case, the functional
redundancy cannot be expressed as a redundant arc but as a redundant path:
<4 - 4 - 2> is redundant.

Figure E.12. Redundant graph

Exercise 8.3. Structural redundancy and faults


1. This circuit implements the logical functions: f = a' + b, g = b'. The table of
Figure E.13 shows the resulting input/output configurations. Both faults
considered here have no effect on f. Fault α has also no effect on g.
Consequently, fault α provokes no failure. On the contrary, fault β is activated as a failure
when ab = 10: hence, the output g takes value '0' instead of '1'.

[Figure: redundant circuit with inputs a and b, outputs f and g, and faults α and β]
ab   fg
00   11
01   10
10   01
11   10

Figure E.13. Redundant circuit

2. We deduce from the previous study that the line α is totally superfluous. Thus, it
corresponds to passive redundancy. Going a step further, the analysis shows that
the output g is independent from the product term a.b produced by the AND
gate. Thus, this gate (noted X in the figure) can be removed, the NOR gate
producing g being hence a simple INVERTER.
3. The truth table shows that the configuration (00) never occurs at the output
of the circuit: this corresponds to an output functional redundancy.
Exercise 8.4. Structural redundancy of several circuits
We suggest the following analysis to detect possible 'structural redundancies'. We
establish the logical expression of each node of the circuit, starting from the primary
inputs (the external inputs of the circuit) and progressing towards the primary outputs
(the external outputs of the circuit). At each step, the resulting logical expression is
analyzed in order to determine possible simplifications. If such simplifications exist,
then they reveal structural redundancies.

Circuit 1. We determine f = a + a.b' + c = a + c. Hence, the input line b of the
AND gate is redundant. We can remove this line and also the AND gate, a entering
directly into the NOR gate. This redundancy is passive.
Circuit 2. This circuit possesses passive structural redundancy: gate (b + c) can be
removed. No stuck-at 1 faults of this gate can be detected on the output!
Circuit 3. This circuit realizes the majority function of its inputs without any
structural redundancy: f = a.b + a.c + b.c.
Circuit 4. The input lines of circuit 4 are all different, except variable b which
intervenes twice. The XOR being commutative, we can modify the network by
shifting the terms b and b.c to the beginning of this network. The function becomes:
f = b ⊕ bc ⊕ ac ⊕ d. Now, b ⊕ bc = bc'; hence, this circuit can be simplified.
However, there is no passive redundancy: all stuck-at faults can be detected.
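
The simplifications used in this analysis can be checked by exhaustive enumeration. The Python sketch below is an illustration with hypothetical helper names: depends_on tells whether an input can influence a function, and equivalent compares two expressions on all input vectors. The expression taken for Circuit 1 is a + a.b' + c as written above; a final complementation by the NOR gate would not change the dependence on the inputs.

```python
from itertools import product

def depends_on(f, index, n_vars):
    """True if flipping input number `index` can change the value of f."""
    for v in product((0, 1), repeat=n_vars):
        w = list(v)
        w[index] ^= 1
        if f(*v) != f(*w):
            return True
    return False

def equivalent(f, g, n_vars):
    """True if f and g take the same value on every input vector."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=n_vars))

def circuit1(a, b, c):
    return a | (a & (1 - b)) | c          # a + a.b' + c

def circuit3(a, b, c):
    return (a & b) | (a & c) | (b & c)    # majority function

def xor_part(b, c):
    return b ^ (b & c)                    # b xor bc

def simplified_part(b, c):
    return b & (1 - c)                    # bc'

if __name__ == "__main__":
    print([depends_on(circuit1, i, 3) for i in range(3)])
    print([depends_on(circuit3, i, 3) for i in range(3)])
    print(equivalent(xor_part, simplified_part, 2))
```

An input on which the function does not depend, such as b in Circuit 1, points to a structural redundancy of the kind discussed above.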
Exercise 8.5. Software redundancy and constraint types
1. The feature new creates a new type from another one (here the type 'integer').
Specific operations (subprograms) must be defined, as the ones provided by the
other types cannot be used. For instance, two Size_of_Shoes cannot be added or
divided.
The declaration
type Size_of_Shoes is new integer;
P: Size_of_Shoes;
instead of
P: integer;
is not a functional redundancy. The two versions lead to the same executable code
which allocates one word in memory for the variable P.
On the contrary, it constitutes a structural redundancy of the source program. It is an
active redundancy if subprograms using parameters of type Size_of_Shoes exist.
Indeed, the associated operations are specific to Size_of_Shoes and not to any
integers. On the contrary, this redundancy is passive if the program makes only use
of integer operations.
2. The adding of the constraint
type Size_of_Shoes is new integer range 28 .. 45;
reduces the number of acceptable values. For subprograms having parameters of this
type, this declaration reduces the functional domains, hence having an impact on
their functional redundancy. The type declaration itself constitutes a structural
redundancy, as it corresponds to an element of the structure of the 'program model'.
It seems to be passive, as its omission has no effect on the behavior of the system.
The reality is more complex. If the structure of the program is such that no value
outside the interval [28 .. 45] can be attributed to P, the answer is 'yes': there is
passive redundancy. On the contrary, if this hypothesis cannot be guaranteed, then the
constraint cannot be removed from the declarative part, because the execution of the
program leading to the assignment of a value outside the [28 .. 45] range produces the
raising of an exception (Constraint_Error in Ada), and thus a different
behavior. In this case, the redundancy is active.

Exercise 8.6. Exception mechanisms of languages: termination model


The functional redundancy depends on the types of the parameters and on the actual
values received and returned by the procedure. Let us signal that if the exception
handler implements a full-tolerance, the returned values are the same, whether an
exception is raised or not during the body execution.
Structural redundancy exists. The exception handler is not useful if no exception (no
error) occurs. So, the redundancy is passive. However, let us signal that if an
exception is raised, it is then propagated in the software hierarchy when no handler
exists. Hence, the handler removal changes the program behavior where errors are
concerned.
This redundancy is separable: the handler is explicitly separated from the body. This
feature thus constitutes an interesting means to show the normal body and the error
handling part.
The redundancy is off-line (or inactive), as the handler starts its execution only when
an exception is raised.

THIRD PART
Exercise 9.1. Requirement analysis
Two families of entities are defined in the text.
The first one concerns the capability of the product to be moved. This notion is
specified by two entities: the product must be contained in a hand and the product
must be moved by car.
The second family concerns the notion of autonomy specified as maximized.
Let us note that numerous specifications can be derived from these requirements.
For instance, the autonomy can be provided by an efficient battery included in the
mobile phone, and/or by a connection to the car battery.
Exercise 10.1. Verification of the adder
The functional fault considered transforms the adder into the circuit of Figure E.14.

[Figure: erroneous adder with inputs a, b and c; sum output S = a ⊕ b ⊕ c]

Figure E.14. Erroneous adder

1. Verification by extraction. We determine the new logical expressions of this
circuit, starting from the primary inputs and progressing towards the primary
outputs:
• S = a ⊕ b ⊕ c, the sum is not altered by the fault considered,
• C = a + b + c, the carry is erroneous (the correct function is ab + ac + bc).

There is a failure on output C each time one input only is at '1': so there are three
erroneous vectors.
2. Verification by double transformation with intermediate model. We choose
as intermediate model the modular description of the adder as two interconnected
half-adders. The fault modifies each one of these half-adders: the behavior with
and without fault is the same only when both inputs have identical values. The
combination of these two modules is correct if and only if: a = b = c = 0 or
a . b . c = 1. All other vectors give a wrong output.

3. Verification by double top-down transformation. The behavior of the circuit
is simulated with a functional input sequence significant of the correct behavior.
For example, we perform an addition without carry (1 + 0 + 0), and an addition
giving the maximum output value (1 + 1 + 1). Then we compare the results given
by the circuit with the computed theoretical values.
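
The verification by extraction of item 1 can be completed by an exhaustive comparison of the two carry functions. The Python sketch below is a minimal illustration; the function names are not part of the exercise.

```python
from itertools import product

def carry_correct(a, b, c):
    """Correct carry: ab + ac + bc."""
    return (a & b) | (a & c) | (b & c)

def carry_faulty(a, b, c):
    """Erroneous carry extracted from the faulty circuit: a + b + c."""
    return a | b | c

def failing_vectors():
    """Input vectors (a, b, c) on which the carry output is wrong."""
    return [v for v in product((0, 1), repeat=3)
            if carry_faulty(*v) != carry_correct(*v)]

if __name__ == "__main__":
    print(failing_vectors())
```

The vectors returned are exactly those with a single input at '1'. Note that the two test additions suggested in item 3 do not play the same role: 1 + 0 + 0 reveals the failure, whereas 1 + 1 + 1 does not.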

Exercise 10.2. Programming style (C language)


The four situations described hereafter stress examples of bad programming style,
which increase the risk of introducing faults. Let us note that in each case, the
program is syntactically and functionally correct. However, the bad style which is
used makes it very probable to produce faults. Moreover, even for such simple
functions, the style used for its specification will probably lead to utilization faults
(calls to the function).
• The type of the returned value (int) is absent. This is syntactically correct (use
of default type), but the user of this function may think that this function does
not return any value. In a general way, the use of 'by default' or 'implicit'
constructions is not at all advised.
• The name of the function is not explicit. In particular, the fact that it proposes
two mutually exclusive treatments is not specified.
• The value '5' used in the specifications to define the size of the array is then
reused in the loop! First, no link exists between these two values dealing with the
same constant (same concept). Moreover, the maintenance operations can
introduce faults if the size of this array is modified. It is much preferable to
explicitly define a constant by means of a #define before the function.
• The parameter B has a non-explicit name; moreover, the associated type (int)
reinforces this ambiguity. One must always explicitly define the Boolean type by
an enumerated type or by defining the two constants TRUE and FALSE.

Exercise 10.3. FSM synthesis


The reasoning has been presented in section 10.4 of Chapter 10 dealing with
functional testing. It is based on a simulation of the automata, in order to compose
them as one automaton. This composition process reveals the input/output relations
and removes the internal interactions between the automata (these relations have
been introduced by the design process).
Exercise 10.4. Functional test sequence
In order to obtain a functional test sequence of the drinks distributor, we develop a
three-step procedure:

• formal specification of the behavior,
• definition of an input sequence which provokes the complete activation of this
behavior,
• deduction of the expected output values in response to the preceding input
sequence.
The formal specification is derived from the informal specification. It describes two
aspects shown in Figure E.15: the interface and the behavior.

[Figure, two panels. 1) Interface: Coin_Entered (Value_Coin), Drink_Selected, Cancellation, Coins_Returned (Value_Coin), Coffee_Available. 2) Behavior: automaton annotated with Amount_Provided := Amount_Provided + Value_Coin, Dose_Number, Coffee_Available and Coins_Returned (Amount_Provided - 75c)]

Figure E.15. Interface and behavior

Functional modeling is made by an automaton which is well adapted to the
sequential features of this system. We have added to this automaton some
annotations dealing with the data treatments (e.g. the addition operations, etc.). Let
us note that this model describes only functional aspects of the specification. Other
points, such as those dealing with the ergonomics of the system, are not expressed.
The formal modeling of the specifications for test generation purpose is interesting
for two main reasons:
• Rules allowing to deduce what are 'all possible behaviors' can be defined for
each modeling means. For example, in the case of an automaton, one might want
to pass through each state or else through each transition between states. This
aspect is developed in Chapter 13.
• The application of the rules can be systematic; that is to say, we are able to
deduce a sequence activating all behaviors (in the sense of the rules). Tools can
then automate the production of the test sequences.
In our case, we will make this work 'by hand', with the following assumptions:
• every path must be exercised at least once by the sequence,
• if an arc is conditioned by a Boolean expression, we must pass through that arc:
  - with one internal value belonging to each domain defined by this expression,
  - and with the limit values between these domains.

At first, we define the set of all paths, we give a name to each path, and we
enumerate its states.
• Enter a coin and cancel: {1, 2, 3, 1}.
• Enter two coins and cancel: {1, 2, 2, 3, 1}. The presence of a loop from state 2
introduces an infinite number of paths; we limit the number of iterations to 1.
• Order a coffee after having entered a sufficient number of coins, if the number of
doses is greater than or equal to 1: {1, 2, 4, 5, 1}. The condition labeling the arc (4, 5)
induces a domain of values for the couple (Amount_Provided, Dose_Number). It
is necessary to take a value ≥ 75c (for example 1$) and a number of doses > 0
(for example 2). Moreover, we must apply 'limit tests', i.e. 75c and 2 doses, then
1$ and 1 dose. Consequently, 3 sequences must be defined for this path. This
situation shows also an interesting aspect dealing with the memory implied by
the used variables. Indeed, when we apply the first part of the sequence {1, 2, 3,
1}, the expected behavior is the same, whatever the past of the system.
Moreover, this behavior will have no effect on the future. On the contrary, in
order to test the behavior of the system when only one dose remains, it is
necessary to first apply sequences leading the system in the required initial state
(one dose only); these sequences are called initialization or homing sequences.
Finally, once the test consuming the last coffee dose has been performed, it will
be necessary to continue the test procedure with test sequences, assuming a null
number of doses. To conclude, the various test sequences we have defined are
not independent; hence, these fragments must be scheduled in a coherent order,
maybe with extra link sub-sequences.
• Case where the user orders one coffee after entering a sufficient number of coins,
but when there are no more coffee doses: {1, 2, 4, 3, 1}. As said before, the
preceding parts of the global test sequence must have led to a situation where no
coffee doses remain. Moreover, the condition labeling the arc (4, 3) being
constituted by a Boolean expression using one OR, it is also necessary to test the
opposite situation, e.g. 'Amount_Provided < 75c', and the two simultaneous
situations, i.e. 'Amount_Provided < 75c and Dose_Number = 0'.
From this analysis, we can deduce the various pieces of sequences associated with
each tested fragment of behavior. Table E.5 gives an example for the first case
considered. We will not develop the whole set of test sequences. Obtaining it is easy
as long as we take care of the necessary relationships between those fragments,
according to the state of the system. This job may seem to be tedious. However, it is
systematic, providing a good guarantee that the resulting functional test sequence
properly activates the whole set of possible behaviors.

Input                  Output
Coin_Entered (50c)
Cancellation
                       Coins_Returned (X$)
Table E.5. Sequence
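
The domain and limit tests discussed for the arcs (4, 5) and (4, 3) can be written down as a small table of test points. In the Python sketch below, the two predicates are assumptions derived from the conditions quoted in the text (amount ≥ 75c and at least one dose for arc (4, 5), the OR of the opposite conditions for arc (4, 3)); amounts are expressed in cents, and the value 50c used for the insufficient amount is only an illustrative choice.

```python
PRICE = 75  # price of a coffee, in cents

def arc_4_5(amount_provided, dose_number):
    """Condition assumed for arc (4, 5): the coffee can be served."""
    return amount_provided >= PRICE and dose_number >= 1

def arc_4_3(amount_provided, dose_number):
    """Condition assumed for arc (4, 3): the request is refused."""
    return amount_provided < PRICE or dose_number == 0

# (Amount_Provided, Dose_Number, coffee expected?)
TEST_POINTS = [
    (100, 2, True),   # internal value of the domain: 1$ and 2 doses
    (75, 2, True),    # limit test on the amount
    (100, 1, True),   # limit test on the number of doses
    (100, 0, False),  # sufficient amount, no dose left
    (50, 2, False),   # insufficient amount, doses available
    (50, 0, False),   # both conditions simultaneously
]

def check_test_points():
    """True if exactly one of the two arcs is enabled, as expected, at each point."""
    return all(arc_4_5(a, d) == expected and arc_4_3(a, d) != expected
               for a, d, expected in TEST_POINTS)

if __name__ == "__main__":
    print(check_test_points())
```

The points with 1 dose and with 0 dose cannot be exercised in an arbitrary order, which is why the test fragments have to be scheduled with homing sequences, as explained above.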

Exercise 10.5. Property research


In the previous Exercise, the need of the client is to earn money. One can ask the
question: "Is it possible to get a coffee without paying or with less than 75c?".
This analysis must first be conducted on the specification model. Here the answer is
'no', as Coffee_Available is conditioned by 'Amount_Provided ≥ 75c' (because of
the OR function). On the contrary, one must then ask oneself if Amount_Provided
effectively contains the amount of money which has been entered in the machine. It
would not be the case if Amount_Provided were initialized to 1$, and then not
assigned in the model. The analysis of the backward path shows that the amount of
entered coins is effectively the value of Amount_Provided in state 4.
Exercise 10.6. Properties of functional graphs
Independently from any functional aspect of the product (we ignore its function), we
try to define properties which are significant of the studied functional graph.
1. If state 4 is suppressed, the graph is split into two independent sub-graphs;
hence, it is no longer possible to pass from one sub-graph to the other. Let us note
that each sub-graph is alive. This situation can correspond to two independent
modules. The global functioning results from the 'Cartesian product' of these
two sub-graphs: every state couple from the two graphs is theoretically possible.
2. On the contrary, if we add an arc joining state 5 to state 1 (in bold in Figure
E.16), the connectivity of the graph is increased. It is then possible to draw a table
containing all the states reachable from any state:
from state 1 or 2 or 3, one can reach states 1, 2, 3, 4; from state 4, only state 4
can be reached; from state 5 or 6, one can reach states 1, 2, 3, 4, 5, 6.
State 4 remains a state which definitely blocks the functioning of the system:
hence, this situation corresponds to a locking.
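
The reachability table can be obtained mechanically by a breadth-first search from each state. The sketch below is only illustrative: the arcs listed in ARCS are hypothetical, chosen so that they agree with the reachability results stated above (the real arcs are those of Figure E.16), and the helper names are ours.

```python
from collections import deque

# Hypothetical arcs, chosen only to be consistent with the reachability
# results stated in the text (the real arcs are those of Figure E.16).
ARCS = {
    1: [2],
    2: [3],
    3: [1, 4],
    4: [],        # no outgoing arc: blocking state
    5: [6, 1],    # 5 -> 1 is the added arc
    6: [5, 4],
}

def reachable(arcs, start):
    """Return the set of states reachable from start (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in arcs[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def blocking_states(arcs):
    """States from which no other state can be reached."""
    return [s for s in arcs if reachable(arcs, s) == {s}]

# Reachability table: one sorted list of reachable states per start state.
table = {s: sorted(reachable(ARCS, s)) for s in ARCS}
```

With these assumed arcs, blocking_states(ARCS) singles out state 4, which is the locking situation described above.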

Figure E.16. Graph


Exercise 10.7. Verification of a floating-point unit
Black box verification by functional simulation. We are looking for a simulation
sequence which passes through each module of the product and activates a
maximum of functions and connections. A simple sequence would include a series
of additions on several numbers:
A1 = M1 × 10^E1 and A2 = M2 × 10^E2.

We must check the module performing the subtraction (E1 - E2) with different
exponents: (E1 > E2), then (E1 < E2), positive values, then negative values. These
operations also verify the circuit which performs the 'adjust' operation (right shift of
the mantissas).
Then, we must check the circuit calculating the final 'sign' of the result. For this
purpose, we make several '+' and '-' operations with numbers having the same sign,
and finally opposite signs. The sign S of the final result must take into account the
carry coming from the +/- circuit. Hence, we consider a situation such that
M'1 > M'2 for an adding control (signal +/-): e.g. subtraction of two negative
numbers, the absolute value of the subtracted one being greater than the first one. If
the result of the circuit '+/-' is greater than 1, there is a carry, and we must perform a
normalization operation, i.e. add '1' to the exponent and make a one-figure shift to
the right of the mantissa.
Finally, the overflow situations must be considered. For example, we add two
negative numbers with maximum value positive exponents (+999 if E is expressed
with 3 digits), and such that |M1| + |M2| ≥ 1.
Exercise 10.8. Inductive formal proof
1. We must demonstrate that A1 ==> A2 when R ≥ B after the execution of R := A
and Q := 0. The second condition of A2 is evident: it is the loop assertion (R ≥ B).
As R := A and Q := 0, then Q*B + R = 0*B + A = A. So, the first condition of A2
is true.
2. We must demonstrate that A1 ==> A3 when R ≥ B is false after the execution of
R := A and Q := 0. The condition 'R ≥ B is false' implies that R < B. We have
Q*B + R = 0*B + A = A. So, the first condition of A3 is also true.
3. We must demonstrate that, when [A2 is true and R := R - B and Q := Q + 1 are
executed and then R ≥ B], then A2 is true with the new values of R and Q. Let us
note Rb and Qb the values of R and Q before the execution. The hypotheses are
A = Qb*B + Rb (relation 1), and Rb ≥ B (relation 2). After execution of the loop
statements, we obtain R = Rb - B (relation 3) and Q = Qb + 1 (relation 4). The
relation R ≥ B is true due to the loop condition. We must demonstrate that
A = Q*B + R. Relations 3 and 4 give: Q*B + R = (Qb + 1)*B + (Rb - B) = Qb*B
+ B + Rb - B = Qb*B + Rb = A (relation 1). So the first condition is
demonstrated.
4. The demonstration of A2 ==> A3 after the execution of the loop statements and
when condition R ≥ B is false is quite similar concerning the first condition.
The second condition R < B is due to the negation of the loop condition.
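
The four proof obligations can also be checked dynamically by placing the assertions in the program itself. The following sketch assumes that A2 is 'A = Q*B + R and R ≥ B' and A3 is 'A = Q*B + R and R < B', as used above; the form given to the precondition A1 (A ≥ 0, B > 0) is our assumption, since A1 is not restated here.

```python
def divide(a, b):
    """Integer division by repeated subtraction, with run-time assertions."""
    assert a >= 0 and b > 0                    # A1: precondition (assumed form)
    r, q = a, 0                                # R := A and Q := 0
    while r >= b:
        assert a == q * b + r and r >= b       # A2: loop assertion
        r, q = r - b, q + 1                    # R := R - B and Q := Q + 1
    assert a == q * b + r and r < b            # A3: final assertion
    return q, r
```

Running the function on a few values only samples the property; the inductive proof is what guarantees it for all admissible A and B.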
Exercise 11.1. Component choice
Failure rate of the first structure. The failure rate is the sum of the failure rates of
the components (as these values are very small: this would not be true otherwise!):
λ1 = 12·10^-7 + 1·10^-6 + 3·10^-5 = 3.22·10^-5.
Failure rate of the second structure: λ2 = 4·10^-6.
Thus, the second structure has a better reliability than the first one.

Note. This exercise does not consider the influence of temperature or radiation on
the reliability of these components, or their mutual influence.
Exercise 11.2. Comparison of the reliability of two products
The two failure rates λ1 and λ2 evolve according to powers of 10. Hence, log10(λ1)
and log10(λ2) are linear functions of the temperature T. We deduce from this the two
logarithmic equations for the two products:
1) For λ1: log10(λ1(T)) = [log10(10·λ01) - log10(λ01)]·T / (38 - 18) + b, where
λ01 = λ1(18°).
When T = 18°C, we have b = -59/10.
So, log10(λ1(T)) = T/20 - 59/10.
2) The same reasoning for λ2 gives log10(λ2(T)) = T/10 - 88/10.
Finally, we deduce T for λ1 = λ2: T/20 - 59/10 = T/10 - 88/10, hence T = 58°C.
From 58°C the reliability of P1 becomes better than the reliability of P2.
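
As a quick numerical check of the crossover temperature, the two straight lines can be evaluated directly. This is a minimal sketch; the function names are ours.

```python
def log10_rate_p1(t):
    """log10 of the failure rate of P1 at temperature t (in °C)."""
    return t / 20 - 59 / 10

def log10_rate_p2(t):
    """log10 of the failure rate of P2 at temperature t (in °C)."""
    return t / 10 - 88 / 10

# T/20 - 59/10 = T/10 - 88/10  =>  T * (1/10 - 1/20) = 88/10 - 59/10
crossover = (88 / 10 - 59 / 10) / (1 / 10 - 1 / 20)
```

Above the crossover, the line of P2 lies above the line of P1, i.e. P2 has the higher failure rate.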
Exercise 11.3. Shared FIFO
1. The result is hazardous, as the data structure (array and indexes) is shared (same
situation as in sub-section 11.2.2.2). Moreover, the data structure value may be
incoherent. For instance, consider the scheduling described in Figure E.17,
where WI expresses the variable Write_Index.

Write (X1):                              Write (X2):
• Buffer(WI) := X1;                      • Buffer(WI) := X2;
• WI := (WI mod Buffer_Size) + 1;        • WI := (WI mod Buffer_Size) + 1;
(the two operations are drawn along a vertical TIME axis)

Figure E.17. Incoherent value of the array

After execution of this sequence of statements, only X2 is memorized, as it
overwrote X1, and an empty item is created, as Write_Index is incremented
two times. Such a situation defines an error, that is to say an unacceptable state
of the data structure value.
2. No problems occur if we consider that the list is never empty, as the two couples
of data structures (Write_Index, Buffer(Write_Index)) and
(Read_Index, Buffer(Read_Index)) have no common elements.
However, no mechanisms guarantee that a reading cannot occur when the FIFO
list is empty.
To conclude, this implementation induces a high risk of error occurrences. The
problems come from a characteristic of the task management implementation: the
preemption of a task by another task does not guarantee exclusive access to the
shared resources. To be safe, a mechanism managing the access authorizations (such
as a semaphore) must be added.
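
A minimal sketch of such a protection is given below, using Python threads as stand-ins for the tasks. The class and function names are hypothetical, and the index is 0-based here (unlike the 1-based Write_Index of the exercise); the point is only that the two statements of Write form one critical section guarded by a binary semaphore.

```python
import threading

class SharedFifo:
    """Circular buffer whose Write operation is protected by a semaphore."""

    def __init__(self, size):
        self.buffer = [None] * size
        self.write_index = 0                  # 0-based here, unlike the text
        self.mutex = threading.Semaphore(1)   # binary semaphore

    def write(self, value):
        # The two statements form one critical section: no other task can
        # run between the store and the index update.
        with self.mutex:
            self.buffer[self.write_index] = value
            self.write_index = (self.write_index + 1) % len(self.buffer)

def run_writers(fifo, n_tasks, per_task):
    """Start n_tasks concurrent writers and wait for their completion."""
    def task(tid):
        for k in range(per_task):
            fifo.write((tid, k))
    threads = [threading.Thread(target=task, args=(t,)) for t in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

fifo = SharedFifo(size=4000)
run_writers(fifo, n_tasks=4, per_task=500)
```

After the run, every written item occupies its own slot and the index has advanced exactly once per write, whatever the scheduling of the threads.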
Exercise 11.4. Hazards in shared variable implementation
No problems seem to exist in the sharing of a variable using one statement. Indeed,
the design is correct. However, assume that the incrementation and decrementation
operations are processed on a register AX. Then, I++ is translated as Task1:
Move AX, @I
Inc AX
Move @I, AX
In the same way, the statement I-- is translated as Task2:
Move AX, @I
Dec AX
Move @I, AX
Consequently, several instructions are necessary to implement one statement. Figure
E.18 shows the two considered implementations of these tasks: sequential or
interleaved.

[Figure: two timing diagrams of Task1 and Task2 along a TIME axis. Left: sequential
execution, where the three instructions of Task1 are followed by the three
instructions of Task2. Right: interleaved execution of the same instructions.]

Figure E.18. Hazardous result

Let us show that the result is hazardous. On the left side of Figure E.18, the result is
unchanged at the end of the execution of Task1 then Task2. This is the expected
result, as the value of I is incremented and decremented. On the contrary, on the
right side of Figure E.18, each task saves and restores its own context (in a local
Task Control Block) at each task switch. After the execution of the second line of
Task1 (Inc AX), this value is not transmitted to Task2 that decreases its own copy
of AX when executing its second line. Thus, the final value of I is incorrect.
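
The two executions can be replayed with a small simulation in which each task owns a private copy of AX, modelling the context saved and restored at each task switch. This is an illustrative sketch: the instruction names and the particular interleaving are our assumptions, the exact interleaving being the one drawn in Figure E.18.

```python
def run(schedule, i=5):
    """Execute a schedule of (task, instruction) pairs on shared variable I.

    Each task owns a private copy of register AX, which models the context
    saved and restored at every task switch.
    """
    memory = {"I": i}
    ax = {"Task1": 0, "Task2": 0}
    for task, instr in schedule:
        if instr == "load":       # Move AX, @I
            ax[task] = memory["I"]
        elif instr == "inc":      # Inc AX
            ax[task] += 1
        elif instr == "dec":      # Dec AX
            ax[task] -= 1
        elif instr == "store":    # Move @I, AX
            memory["I"] = ax[task]
    return memory["I"]

T1 = [("Task1", op) for op in ("load", "inc", "store")]
T2 = [("Task2", op) for op in ("load", "dec", "store")]

sequential = T1 + T2
# One possible interleaving (an assumption; the exact one is in Figure E.18):
interleaved = [T1[0], T1[1], T2[0], T2[1], T1[2], T2[2]]
```

With the sequential schedule the value of I is restored; with the interleaved one the last store wins and the other update is lost.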
Exercise 12.1. Signature testing
1. The sequential treatment of the binary flow comprises 64 (i.e. 1024 / 16) XOR
operations on consecutive 16-bit words. If we suppose that the signature of the
faultless circuit is known, a multiple error altering one or several words is
detectable if and only if at least one bit position of the word is modified an odd
number of times over the whole set of words. According to the output
stream, this corresponds to erroneous bits repeated an odd number of times
modulo 16 (two bits of the stream fall in the same position of a 16-bit word when
their ranks are equal modulo 16). For example, a multiple error altering bits 1, 15,
65, 121 is detectable: errors 1 and 65 neutralize each other, but each of the errors
15 and 121 is detectable. All functional or technological faults producing such
errors are thus detected. All other faults are undetectable.
2. Without any knowledge about the electronics implementation of this system
(gate or MOS structure), one cannot deduce any class of technological faults that
produces the preceding errors.
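
The masking of errors by an XOR signature can be illustrated with a few lines of code. This sketch assumes that the stream bits are numbered from 1 and folded into 16-bit words in order; the reference stream is arbitrary and the function names are ours.

```python
WORD = 16       # bits per word
LENGTH = 1024   # bits in the observed stream

def signature(bits):
    """XOR of the consecutive 16-bit words of a bit list."""
    sig = [0] * WORD
    for k, bit in enumerate(bits):
        sig[k % WORD] ^= bit
    return sig

def flip(bits, positions):
    """Return a copy of bits with the given positions (numbered from 1) inverted."""
    out = list(bits)
    for p in positions:
        out[p - 1] ^= 1
    return out

reference = [(k * 7) % 3 % 2 for k in range(LENGTH)]   # arbitrary fault-free stream
good = signature(reference)
```

Errors on bits 1 and 65 fall in the same position of two different words and cancel in the signature, whereas the multiple error {1, 15, 65, 121} still changes it.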

Exercise 12.2. Toggle test sequence


A Toggle Sequence is such that every line in the circuit takes the values '0' and '1':
• each XOR must receive a vector from the set {00, 11}, and a vector from the set
{01, 10},
• each NAND must receive the vector (11), and, either the two vectors (01) and
(10), or the vector (00).
We propose the following Toggle sequence: <010, 101, 111> (there are other
solutions). The reader will verify that this sequence applies '0' and '1' to each line.
Exercise 12.3. Test of components
1. Statistically speaking, a fraction y of the products is good, and each good product
is tested for a duration of n 'time units'. The faulty products correspond to the
fraction (1 - y) of the production.
As the test coverage is c = 80%, 20% of these faulty products, that is to say
(1 - c).(1 - y) of the total production, will be considered as good by the test
sequence after a duration of n time units. Finally, c.(1 - y) products will be
declared faulty after a mean duration of n/2 time units.
Therefore, the mean time dedicated to the test of this production is:
d = n.y + (n/2).c.(1 - y) + n.(1 - c).(1 - y) = n.(1 - c/2 + c.y/2) = 0.96 n.
2. The rate of faulty products not detected as faulty is: (1 - c).(1 - y) = 2%.
3. If only t = 70% of the products are submitted to test, the mean time is reduced.
The ratio of non-detected faulty products has two terms:
t.(1 - c).(1 - y) + (1 - t).(1 - y).
The first term corresponds to faulty products which are tested but not detected as
faulty; the second one corresponds to faulty products which are not tested.
Numerical value: 4.4%.
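
These figures are easy to recompute. In the sketch below, the yield y = 0.9 is not restated in this answer: it is inferred from the numerical results 2% and 4.4% given above, and the function names are ours.

```python
def mean_test_time(n, y, c):
    """Mean test duration per product.

    Good products take n, detected faulty products n/2 on average,
    and faulty products that escape detection take n.
    """
    return n * y + (n / 2) * c * (1 - y) + n * (1 - c) * (1 - y)

def undetected_rate(y, c, t=1.0):
    """Fraction of the production that is faulty and not rejected,
    when only a fraction t of the products is tested."""
    return t * (1 - c) * (1 - y) + (1 - t) * (1 - y)

Y, C = 0.9, 0.8   # y = 0.9 is inferred from the numerical results of the text

d_over_n = mean_test_time(1, Y, C)       # mean time in units of n
escapes_full = undetected_rate(Y, C)     # every product tested
escapes_70 = undetected_rate(Y, C, 0.7)  # 70% of the products tested
```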
Exercise 12.4. Fault coverage

Figure E.19. A 3-input NOR gate



The NOR gate is symmetrical with respect to its inputs a, b and c (Figure E.19).
Consequently, the study can be reduced to the case of one input only, e.g. a. The
other inputs (b and c) must be set to the value '0' in order to let the error pass to the
output. If an input is set to the value '1', it forces the output to the value '0'.
1. Optimal test sequence. There is only one optimal test sequence comprising 4
test vectors: <000, 100, 010, 001>.
2. Coverage. Each input vector covers some stuck-at faults of the I/O lines. Table
E.6 shows the fault coverage of each input vector.
We notice that some input vectors have a very small coverage; they should not
be taken to test this circuit; thus, (011), (101), (110), and (111) test the stuck-at 1
of line d only. On the contrary, the vector (000) covers half of the stuck-at 0/1 faults.
Input vectors    Test coverage
a b c            a  b  c  d
0 0 0            1  1  1  0
0 0 1            -  -  0  1
0 1 0            -  0  -  1
0 1 1            -  -  -  1
1 0 0            0  -  -  1
1 0 1            -  -  -  1
1 1 0            -  -  -  1
1 1 1            -  -  -  1
Table E.6. Fault coverage
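
Table E.6 can be regenerated by simulating each stuck-at fault of the gate for each input vector. The sketch below is a plain exhaustive simulation written for this purpose; the function names are ours.

```python
from itertools import product

LINES = ("a", "b", "c", "d")   # three inputs and the output

def nor3(a, b, c, fault=None):
    """3-input NOR; fault is None or a (line, stuck_value) pair."""
    values = {"a": a, "b": b, "c": c}
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]
    d = int(not (values["a"] or values["b"] or values["c"]))
    if fault and fault[0] == "d":
        d = fault[1]
    return d

def coverage(vector):
    """Stuck-at faults detected by one input vector, as {line: stuck value}."""
    good = nor3(*vector)
    return {line: v for line in LINES for v in (0, 1)
            if nor3(*vector, fault=(line, v)) != good}

# One entry per input vector, keyed by the vector written as a string.
table = {"".join(map(str, v)): coverage(v) for v in product((0, 1), repeat=3)}
```

Each entry of table corresponds to one row of Table E.6; a line that is absent from an entry is shown as '-' in the table.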

Figure E.20 shows the coverage curves of: 1) the exhaustive sequence, 2) the
optimal sequence, and 3) the very simple toggle test sequence <000, 111>.

[Plot: fault coverage (number of detected faults, from 0 to 8, i.e. 100%) as a
function of the input vectors applied, for the exhaustive sequence, the optimal
sequence and the toggle sequence.]

Figure E.20. Fault coverage evolution

Exercise 12.5. Simple fault diagnosis


Let us analyze the coverage table obtained in the preceding exercise.
We observe that all stuck-at 0 faults are detected by distinct test vectors. Hence, they
can all be distinguished by the sequence <000, 001, 010, 100>. All these faults can
also be distinguished from the stuck-at 1 fault of the output. We also observe that all
stuck-at 1 faults of the inputs and the stuck-at 0 of the output are detected by the
vector (000) only. Consequently, they cannot be distinguished from the outside.
Hence, they are said to be equivalent.
Exercise 12.6. Optimal test sequence
Figure E.21 recalls the gate structure of the circuit. Input vector 1 (respectively 2)
applies 11 to gate A (respectively B), and 10 (respectively 01) to gate C (see Table
E.7). Hence, any stuck-at 0 fault is detected: it is activated as an error, and the error
is propagated to f. Let us note that the input vector 111 would apply 11 simultaneously
to gates A and B, but no stuck-at 0 of these gates would be observable on f. Let us
also note that when A (or B) receives 11, B (or A) receives one of the vectors 10 or 01;
unfortunately, these vectors cannot be 'counted' as belonging to the minimal AND
test sequence, because gate C will not propagate any error coming from B (or A).

Figure E.21. AND-OR circuit

Hence, input vectors 3 and 4 are necessary to apply the missing configurations: 01
and 10 to the AND gates, and 00 to the OR gate. These vectors will detect all the
stuck-at 1 faults: activation as an erroneous '1', and propagation of this
erroneous '1' to the output. The optimal test sequence has 4 vectors: <110, 011, 010,
101>.
          Inputs    Gate A      Gate B      Gate C
Vector    a b c     01 10 11    01 10 11    00 01 10
1         1 1 0     .  .  X     .  .  .     .  .  X
2         0 1 1     .  .  .     .  .  X     .  X  .
3         0 1 0     X  .  .     .  X  .     X  .  .
4         1 0 1     .  X  .     X  .  .     X  .  .
Table E.7. Optimal test sequence

Exercise 12.7. Sequential circuit testing


1. Table E.8 shows the evolution of the sequential system submitted to the input
sequences ST1 and ST2, applied to the same initial state 1.
2. A simple simulation of the logical circuit allows establishing the different values
of each node for test sequences ST1 and ST2. Then, we deduce the lines that take
both values 0 and 1.

ST1 ST2
e 0110 011011001
q 2431 243124231
s 1010 101010110
Table E.8. Correct and erroneous functions

3. Complete structural test.


• First, we determine those faults which are not detected by the functional test
sequence ST2. This step can be performed thanks to a 'fault simulator' such as
Verifault of Cadence. Thus we can identify the 5 stuck-at faults which are not
detected, reducing to 3 classes of equivalent faults.
• Now, if we want to test one of these remaining faults, we can proceed as for
combinational circuits. We first try to activate this fault by setting the faulty line
at the opposite value. Then, a backward procedure is applied. Because of the
feedback loops, this procedure generally does not easily converge towards a
solution. The fault is detected when it is transformed into an erroneous output
value. The Reset input can be useful to perform this procedure, assuming it has
previously been tested.
4. The problem of the initialization of a circuit prior to the application of a testing
sequence is not always easy. Generally, it is assumed that there is a reset input
which switches all flip-flops to the zero state. In the very general case of
asynchronous sequential systems without such inputs, it is necessary to find
special initialization input sequences called homing sequences. Initialization is a
real problem for testing complex systems.

Exercise 13.1. Test of a small circuit


1. Fault Table. Faults detected by the different input vectors can be obtained from
the logical circuit of Figure E.22, by applying the methods studied in Chapter
13, either column after column, or row after row.

Figure E.22. Two-gate circuit

Table E.9 shows the results obtained. There are three 'best test vectors': 010, 100
and 110. Each one covers 4 faults. The input vector having the lowest coverage
is 111 with only 1 fault detected.
2. Minimal test sequence. Some faults are detected by one test vector only; it is
the case of faults 1¹ (vector 010), 1⁰ (vector 110), 2¹ (vector 100), and 2⁰ (vector
110). Hence, the 3 vectors 010, 100 and 110 belong to any minimal test
sequence. There are three minimal-length test sequences, for example TS = <001,
010, 100, 110>.

abc    1  2  3  4  5
000    -  -  1  1  1
001    -  -  0  -  0
010    1  -  1  1  1
011    -  -  0  -  0
100    -  1  1  1  1
101    -  -  0  -  0
110    0  0  -  0  0
111    -  -  -  -  0
Table E.9. Fault table
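
The search for the minimal test sequences can be automated as a small covering problem on Table E.9. The sketch below simply enumerates the subsets of input vectors by increasing size; this brute-force approach is only reasonable for such a small table, and the function names are ours.

```python
from itertools import combinations

# Table E.9: for each input vector, the stuck-at value detected on lines 1..5
# ('-' means that no fault of the line is detected).
FAULT_TABLE = {
    "000": "--111", "001": "--0-0", "010": "1-111", "011": "--0-0",
    "100": "-1111", "101": "--0-0", "110": "00-00", "111": "----0",
}

def detected(vector):
    """Set of (line, stuck_value) faults detected by a vector."""
    return {(line, int(v))
            for line, v in enumerate(FAULT_TABLE[vector], start=1) if v != "-"}

ALL_FAULTS = set().union(*(detected(v) for v in FAULT_TABLE))

def minimal_sequences():
    """All test sequences of minimal length covering every detectable fault."""
    vectors = sorted(FAULT_TABLE)
    for size in range(1, len(vectors) + 1):
        found = [combo for combo in combinations(vectors, size)
                 if set().union(*(detected(v) for v in combo)) == ALL_FAULTS]
        if found:
            return found
    return []
```

The enumeration returns the sequences as tuples of vectors; the order of the vectors inside a sequence is irrelevant for a combinational circuit.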

Exercise 13.2. Test vectors detecting a fault


The fault activation condition implies putting a '1' value on line 11. By backward
propagation of this condition, we find three possible cases on lines 5 and 6, i.e.
bc = 10, 01 and 11. The forward propagation of the error produced on α can follow
two different paths: P1 (through lines 10 - 13 - f), and P2 (through lines 15 - g).
These two paths will never simultaneously conduct errors to the outputs f and g.
Path P1. The local propagation conditions are 9 = 0 and 12 = 1.
We find the 5 following test vectors (a b c d): (0 1 0 -), (0 1 1 0), (- 0 1 0).
The resulting failure on output f is a '1' value instead of a '0' value without fault.
Path P2. The local propagation condition is 16 = 0.
So we must have 7 = 8 = 1. We find the 4 following test vectors: (a b c d) = (- - 1 1).
The resulting failure on output g is a value '0' instead of a value '1' without fault.
Summary: there are 9 test vectors for this fault, 5 of which produce a failure on f
(0100, 0101, 0110, 0010, 1010), and the 4 other ones produce a failure on g (0011,
0111, 1011, 1111).
Exercise 13.3. Analysis of test procedures
1. This procedure explores the input space with a partition technique. The first
objective is to activate the fault. A tree is built, each branch corresponding to a
disjoint input cube (sub-set expressed with '0', '1' and 'x' values). The
simulation propagates the known values towards the fault location (Figure E.23).

Figure E.23. NAND-gate circuit



Then, the fault is propagated to the output with the same exploration procedure until
a solution is found (if any test vector exists). Obviously, this very simple technique
can be slow to converge towards a solution if the first possible test vector has many
'1' values for a, b, c, etc.
Let us now complete the given procedure:
Now, the objective is to propagate the error through gate E. A propagation
towards gate E of the known values is performed (if it has not already been done!):
E = 0, so the error cannot be transformed into a failure on output f. We make a
backtracking in the input assignment.
Input c is set to 'x', and input b is switched to '1', and a propagation is
performed: the fault remains passive.
Input c is set to '0', and a propagation is performed: A = 1, B = 0, C = 1, hence
the fault is activated.
Now, the objective is to propagate the error through gate E. Input d is set to '0',
which forces output f to '1': this is a case of inconsistency, so we go backwards.
Input d is switched to '1', and the error is finally propagated to f.
Thus, we obtain the test vector: (a b c d) = (1 1 0 1).
2. This procedure makes a backward propagation along one path from the fault
activation. At a given gate, if several vectors satisfy the desired output value, we
choose the easiest path only (the closest path to the primary inputs). If several
inputs must be set (to '0' or '1'), we choose the hardest path first (having the
higher number of gates to the primary inputs). As usual, when the fault is
activated as an error, we try to propagate the error along a path. The whole process
uses a backtracking technique in case of inconsistency. This method is close to
the PODEM algorithm.
Input b is switched to 1, and we perform a propagation action: it brings nothing.
Input c is set to '0', and we perform a propagation: A = 1, C = 0, so the fault is
activated.
Now, the objective is to propagate the error through gate E. Input d is set to '1'.
We obtain the same test vector (a b c d) = (1 1 0 1). It is the only vector which
detects the fault.
Let us note that this procedure is not pertinent for this circuit. Indeed, at step 5 we
have chosen to set b to '0' in order to force A = 1; this led to an inconsistency. Then,
we have abandoned this path to try another one. Instead, at this point, we can try the
second way to have A = 1, which is to set c to '0'. Then, the procedure sets a and b
to '1'. Thus, the test vector is rapidly found.
Exercise 13.4. Fault coverage of a test vector
1. The structural analysis of the circuit gives the fault detection table shown in
Table E.10: detection at output f, and at output g.
Note about 'reconvergent fanout' structures. We observe that the stuck-at '1' of line
2 is not detected on output f: this fault produces an error on line 9 and an opposite
error on line 10, these errors being neutralized by gate 13. We also note that the
stuck-at 1 of line 3 is detected on output f: it produces two identical errors on lines
13 and 15, which propagate through the output gate giving f. This same fault is also
detected on output g: it produces two identical errors on lines 15 and 16, which
propagate to the output g.
abcd 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1001 detection on f - - 1 - 1 1 1 - 0 0 0 0 0 0
detection on g - 1 - - - - - - - - 1

Table E.10. Fault detection

2. This test vector covers 11 of the 36 possible faults. The theoretical maximum
coverage of a test vector is 18. Now, we can try to find the best test vector by
analyzing the structure of the circuit. We know that the best test of AND, OR,
NAND and NOR gates is obtained when their inputs take the neutral element
value. The worst case is when all their inputs take the opposite values. The
circuit we consider is made of a mix of AND, NAND, OR, and NOR gates; it is
easy to see that no input vector will apply the optimal configuration to each gate.
So no test vector will cover 50% of the faults! A good exploitation of all these
local constraints is given by the input vector (0101) which covers 13 faults!
Exercise 13.5. Diagnosis of a circuit
1. Faults 2⁰ and 5⁰ are activated by the same constraint: b = 1. Fault 11¹ is activated
by b = 0, or by b = 0 and c = 1. Hence, we can separate these two groups of
faults by applying test vectors with b = 1, and test vectors with bc = 01. A first
test vector could be (a b c d) = (- 0 1 1) which detects fault 11¹ at output g. Now,
we must try to distinguish faults 2⁰ and 5⁰. Fault 2⁰ can only be detected on f by
applying the input vector (1110). Faults 2⁰ and 5⁰ can be detected on f (through
the path 11 - 10 - 13 - 17) by the input vectors (010-). So, here is an example of
diagnosis sequence: DS = <0011, 0100, 1110>.
The related fault tree is drawn in Figure E.24. It gives all information necessary
to diagnose one of these faults. For instance, if the signature corresponding to
the application of the test sequence is <OK, f KO, f KO>, then the identified
fault is 2⁰.

Figure E.24. Fault tree



2. We first determine all faults detected by vectors (1000), (1001), and (0110) on
outputs f, g and both. This step is achieved by using the backward analysis
method presented in Chapter 13. We obtain the partial fault table of Table E.11.
Note that some faults are detected on f only, on g only, or on both outputs.
From this table, we deduce the fault tree (Figure E.25) corresponding to the test
sequence: TS = <1000, 1001, 0110>.

T abcd 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
T1 1000 f - - 1 - 1 1 - - - 0 0 0 0 0 - - 0 -
g 1
T2 1001 f - - 1 - 1 1 1 - - 0 - 0 0 0 - 0 0 -
g 1 1
T3 0110 f 1 - - - - - - - 1 1 1 - 1 - - - 1 -
g 1 0 0 1
Table E.11. Partial fault table

3. The diagnosis power of this sequence is not good. Many faults belonging to the
resulting fault classes can easily be distinguished by adding other test vectors.

[Figure: fault tree whose branches are labelled OK and X and whose nodes are sets
of faults, e.g. {5¹, 6¹, 10⁰, 11⁰, 12⁰, 13⁰, 14⁰, 17⁰}.]

Figure E.25. Fault tree

Exercise 13.6. Complete diagnosis of a small circuit


The minimal test sequence obtained in question 2) of Exercise 13.1 is also a minimal
diagnosis sequence: <001, 010, 100, 110>. This sequence separates the following
classes: {Ø}, {1¹}, {2¹}, {3⁰}, {5⁰}, {1⁰, 2⁰, 4⁰}, {3¹, 4¹, 5¹}.
All faults belonging to these groups are equivalent. They cannot be distinguished
from the outside.
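
The classes separated by a sequence can be computed from the fault table of Exercise 13.1 by grouping the faults that give the same pass/fail signature. In the sketch below, the notation 'line/value' stands for the stuck-at 'value' fault of 'line', the fault-free circuit is listed as 'no fault', and the function names are ours.

```python
from collections import defaultdict

# Table E.9 of Exercise 13.1: stuck-at value detected on lines 1..5 by each
# input vector ('-' means that no fault of the line is detected).
FAULT_TABLE = {
    "000": "--111", "001": "--0-0", "010": "1-111", "011": "--0-0",
    "100": "-1111", "101": "--0-0", "110": "00-00", "111": "----0",
}
SEQUENCE = ("001", "010", "100", "110")

def signature(fault):
    """Tuple telling, for each vector of SEQUENCE, whether the fault is detected."""
    line, value = fault
    return tuple(FAULT_TABLE[v][line - 1] == str(value) for v in SEQUENCE)

def diagnosis_classes():
    """Group the faults (and the fault-free case) having the same signature."""
    classes = defaultdict(list)
    classes[(False,) * len(SEQUENCE)].append("no fault")
    for line in range(1, 6):
        for value in (0, 1):
            classes[signature((line, value))].append(f"{line}/{value}")
    return sorted(classes.values())

classes = diagnosis_classes()
```

Adding vectors to SEQUENCE can only split the classes further; faults that stay together for the exhaustive sequence cannot be distinguished from the outside.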

Exercise 13.7. Logical test of a full-adder


1. To activate this fault, we must set input b to '1'. This initial error (noted e in
Figure E.26) is propagated to output S, for any values of inputs a and c: hence,
we obtain 4 test vectors (a b c) = (- 1 -) according to the output S.
This initial error can also be observed on output C if it can be propagated
through the two NAND gates (see the figure). For this purpose, we must have
a = 0 and c = 1. It leads to the test vector: (a b c) = (0 1 1).


Figure E.26. Test of the half-adder

2. In Exercise 5.2, we have found all the failure configurations by a different
approach: functional extraction, then comparison with the truth tables.
3. Any input vector detects every functional or physical fault that modifies the
output value. Hence, it is not surprising if a vector obtained in question 1) is able
to detect the functional fault consisting in transforming the XOR gates into
IDENTITY gates.
Let us analyze the fault on the circuit's structure. Any vector (a b) will provoke an
error at the output of the first IDENTITY gate; this error (noted e in the previous
figure) will again be transformed through the second IDENTITY gate and produce a
correct output value. Thus, the fault is not observed on output S (we have already
proved this property by extraction in Exercise 10.1). The conditions necessary to
propagate error e towards the carry output C are exactly the same as for fault α of
question 1). Hence, the functional fault is detected by the input vector
(a b c) = (0 1 1) which also detects fault α.

Exercise 13.8. Functional and toggle test of a full-adder


The structure of the circuit is given in Figure E.27.


Figure E.27. Logical structure of the adder



1. Functional test sequence. A very simple functional sequence will make one
addition with (SC) = (00), and one addition with (SC) = (11). This sequence is
TS1 = <000, 111>. The first two lines of Table E.12 show the faults detected by this
sequence. These faults have been determined by the structural method proposed
in Chapter 13 applied to the logical structure (Figure E.27).
Test sequence  abc   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Functional     000   1  1  -  1  1  -  1  1  -  1  1  -  0  0  1  1
               111   0  0  0  0  0  0  1  1  -  0  0  -  1  -  0  0
Toggle         101   0  0  -  1  1  -  0  0  0  0  0  0  -  1  1  0
Complete       001   1  1  -  1  1  -  1  1  1  0  0  -  0  0  0  1
               010   1  1  1  0  0  -  0  0  -  1  1  1  0  0  0  1
               100   0  0  -  1  1  -  0  0  0  0  0  0  -  1  1  0
Table E.12. Test coverage

2. Toggle test. In order that each line takes values '0' and '1', we add the test
vector 101; hence, the sequence becomes: TS2 = <000, 111, 101>. The third line
of the table shows the faults detected by this new vector. We observe that TS2 does
not detect 4 faults, confirming the fact that a toggle test is generally not sufficient
to test every stuck-at fault.
3. Complete test sequence. Faults not detected by sequence TS2 are 3¹, 6¹, 9¹ and
12¹. To detect these faults, we must add three test vectors to TS2: 001, 010 and
100. The resulting complete test sequence then has 6 vectors: TS3 = <000, 111,
101, 001, 010, 100>. This complete test sequence is not optimal in terms of
number of test vectors. An optimal sequence is made of 5 test vectors, such as:
TSop = <001, 010, 100, 110, 101>.

Exercise 13.9. Test of a structured circuit


1. Test of the structured adder. We will see that it is possible to apply the given
complete test sequence to each module.
Test of module FA1. The controllability of this module is complete, so the five test
vectors can be applied. The observability of output S1 is also complete. The only
problem that remains is the observability of signal c1. Any error on this line will
change the parity of the inputs of module FA2; hence, this error is propagated to the
primary output S2. Consequently, the first full-adder is completely tested.
Test of module FA2. Here, the only problem to analyze is the controllability of line
c1. Whatever we put on inputs a2 and b2, it is easy to bring either a value '0' on line
c1 (no carry for bits c0, a0 and b0) or a value '1' on line c1 (by producing a carry for
bits c0, a0 and b0).
2. Test sequence. A complete test sequence of 5 input vectors (c0, a1, b1, a2, b2)
can be obtained: TS = <00101, 01010, 10011, 11000, 01110>.
In conclusion, this structured circuit is easy to test. Naturally, one should not deduce
that all structured circuits are easy to test.

Exercise 13.10. Diagnosis study of the full-adder


We first draw the partial fault table indicating, for each input vector, the outputs
where the faults are detected (Table E.13).

abc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
000 S 1 1 1 1 1 1 1 1 1
C - - - - 0 0 1
110 S 1 1 0 0 0 0 1 1 0
C 1 1 - - 1 1 0 0 1
111 S 0 0 0 0 1 1 0 0 0
C 0 0 - - 1 - 0
Table E.13. Fault table of the full-adder

Then, we deduce the fault tree, drawn in Figure E.28, allowing the diagnosis of the
test sequence <000, 010, 111>. To simplify the representation, impossible
situations are not represented. This tree partitions all the 33 possibilities (32 faults +
one good state) into 11 groups. The elements belonging to a given group cannot be
distinguished from one another by the sequence (they are said to be equivalent with
regard to this sequence). In particular, it is not possible to answer the question: "is
the circuit faultless?".

Figure E.28. Diagnosis tree of the full-adder


Exercise 13.11. Complete test sequence of a circuit
To be tested, each AND gate requires 4 input vectors: 111, 011, 101 and 110. As
these two sets are not compatible, this leads to 8 different input vectors. Hence, all
the 8 input vectors constitute the complete test sequence! As a consequence, the
exhaustive sequence is also the optimal one.
Exercise 13.12. Redundancy analysis
1. Structural redundancy. A logical analysis gives: f = a' + b.c, g = b.c + a.b.c.
Function f has no structural redundancy; the gate 'abc' of g is structurally redundant.
Table E.14 gives the faults detected by all input vectors. It shows that 5 faults cannot
be detected: 9⁰, 7⁰, 5⁰, 5¹, 14⁰. They correspond to a passive structural redundancy.

abc 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
000 1 - - 1 - - - - - 0 1 - 1 1 0 1
001 1 1 - 1 - 1 - - - 0 1 - 1 1 0 1
010 1 - 1 1 - - - 1 - 0 1 - 1 1 0 1
011 - 0 0 - - - 0 0 - - 0 - 0 - 0 0
100 0 - - 0 - - - - - 1 1 1 1 1 1 1
101 0 1 - 0 - 1 1 - - 1 1 1 1 1 1 1
110 0 - 1 0 - - - 1 1 1 1 1 1 1 1 1
111 - 0 0 - - 0 - 0 - - 0 0 - - 0 0

Table E.14. Fault table
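
The structural redundancy of the gate 'abc' stated in question 1 can be confirmed by comparing the two truth tables: by absorption, b.c + a.b.c = b.c. A minimal sketch, with function names of our own:

```python
from itertools import product

def g_full(a, b, c):
    """g as implemented: g = b.c + a.b.c."""
    return (b and c) or (a and b and c)

def g_reduced(a, b, c):
    """g with the 'abc' gate removed: g = b.c."""
    return b and c

# True when the two expressions agree on the 8 input vectors.
same = all(bool(g_full(*v)) == bool(g_reduced(*v))
           for v in product((0, 1), repeat=3))
```

Since removing the gate never changes g, a stuck-at 0 at the output of this gate cannot be observed from the outside.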

2. Detection and diagnosis masking.


Detection masking. The fault noted γ in Figure E.29 cannot be detected. The fault
noted α in the figure can be tested by the input vector (001). The output g takes '0'
when this fault is present. But if γ is also present, the output g remains erroneously at
value '1': hence fault α is no longer detected.
Distinction masking. We assume that the untestable fault is present in the circuit.
Its occurrence can lead to a wrong diagnosis if we apply the test sequence <110,
111> aimed at diagnosing between faults noted α and β. If the circuit is altered by
fault α, this sequence will erroneously signal the presence of β because of a masking
provoked by fault γ.


Figure E.29. Redundancy

Exercise 13.13. Structural testing of a program


1. The function modify_temperature increases or decreases the temperature,
depending on the action parameter (1 = heating, 2 = cooling, else no action). The
temperature is increased or decreased by a number of degrees that is a function of
the duration parameter value. The final temperature is then returned.
The main function regulator brings the initial_temperature back into
the range 0 to 90 by a variation of 10 degrees if variation = 0, or a variation
of 20 degrees otherwise. It returns the final_temperature, or -3000°C if the
heater or the fan is damaged (heating_state or fan_state = 0).

2. We test the functional correctness of the regulator by analyzing the domains of
the input parameters:
• three cases for initial_temperature: < 0, in [0, 90], and > 90,
• two cases for heating_state = 0 or 1,
• two cases for fan_state = 0 or 1,
• two cases for variation = 0 or 1.
As the domain of initial_temperature is not discrete, we test three
values: -50, 30, 150, and the limits 0 and 90. These values are combined with the
discrete values of the other parameters.
3. When this sequence is applied, the coverage depends on the elements considered.
For statement testing, the coverage is not 100%. Indeed, the default part of the
switch statement of the function modify_temperature is never executed.
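The combination of values described in question 2 can be sketched as follows. This is only an illustration: the list names are hypothetical, and the regulator itself is not reproduced here.

```python
from itertools import product

# One representative value per domain of the continuous parameter
# (< 0, inside the range, > 90), plus the two limits of the range.
initial_temperatures = [-50, 30, 150, 0, 90]
heating_states = [0, 1]
fan_states = [0, 1]
variations = [0, 1]

# Each test case is a tuple
# (initial_temperature, heating_state, fan_state, variation).
test_cases = list(product(initial_temperatures, heating_states,
                          fan_states, variations))
print(len(test_cases))
```

Under these assumptions, the full combination gives 5 x 2 x 2 x 2 test cases.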
Exercise 13.14. MC/DC testing of a program
Table E.15 gives a sequence of Boolean values for Condition/Decision, compatible
with the requirements of MC/DC testing.

Condition Decision
A=B C2 D>3 Action
True True True True
False True True False
True True True True
True False True False
True True True True
True True False False
True True True True

Table E.15. MC/DC testing
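As a minimal sketch, the MC/DC property of this sequence can be checked mechanically. The code assumes (it is not stated explicitly above) that the decision is the AND of the three conditions, which is what the rows of Table E.15 suggest; the function names are hypothetical.

```python
def decision(c1, c2, c3):
    # Assumption: the action is taken only when the three conditions all hold.
    return c1 and c2 and c3

# Condition columns of Table E.15, row by row.
rows = [
    (True, True, True),
    (False, True, True),
    (True, True, True),
    (True, False, True),
    (True, True, True),
    (True, True, False),
    (True, True, True),
]

def shows_independence(rows, index):
    """True if two rows differ only in condition `index` and change the decision."""
    for r1 in rows:
        for r2 in rows:
            differs_only_here = all(
                (a != b) == (i == index)
                for i, (a, b) in enumerate(zip(r1, r2))
            )
            if differs_only_here and decision(*r1) != decision(*r2):
                return True
    return False

mcdc_ok = all(shows_independence(rows, i) for i in range(3))
print(mcdc_ok)
```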

Exercise 14.1. Ad Hoc techniques


Figure E.30. Ad Hoc technique



We suppose that new inputs and outputs can be added to the circuit, in order to
increase its controllability and observability. Figure E.30 shows the modifications
proposed to cut the feedback line between the two modules and to directly observe
the outputs of each module.
Hence, two inputs (IT1 and IT2) and two outputs (ST1 and ST2) have been added.
When IT1 and IT2 take the value '0', we block any uncontrolled evolution of the
circuit. Hence, each module can be directly accessed.
Exercise 14.2. Analysis of a redundant circuit
1. We have: f1 = a' c, f2 = a' c, f3 = a + c'. The output variables do not depend on
input b: thus, this circuit is redundant. In particular, the stuck-at 1 fault shown in
Figure E.31 cannot be detected from the inputs/outputs; indeed, to test this fault,
we must satisfy:
• the controllability constraint: b = 0,
• the observability constraints: a = 0 and c = 1; this produces a '1' on line x, a '1'
at the output of gate A, and finally a '1' at output f1 which masks any detection
of the fault.
2. The second circuit has a different logical behavior: f1 = a' b' c + a b + b c',
f2 = a' c, f3 = a + c'. The output f1 is a logical function of input b. Obviously,
this circuit could have been realized with a SIGMA-PI structure; however, it is a
totally testable circuit.


Figure E.31. Redundant circuit and its simplification

Exercise 14.3. Anti-glitch circuit


1. If gate A is removed from the given circuit, the logical function remains
unchanged: f = a b + b' c. However, the output f can produce a glitch (a short
negative pulse) when the inputs switch from (111) to (101).
This anti-glitch circuit is useful but not completely testable: no stuck-at 0 fault
of gate A is testable; indeed, their test requires a = c = 1, which implies that either
the output of gate B or the output of gate C is at '1'; hence, the final output f always
takes the value '1' (with or without fault). Consequently, this circuit has passive
(untestable) redundancy, which cannot be removed!

2. The anti-glitch circuit can easily be modified as shown in Figure E.32 in order to
make it completely testable. When T = 0, the outputs of gates B and C are equal
to '0', and we can test all stuck-at 0 faults of gate A.


Figure E.32. Redundant circuit
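The two claims of question 1 can be checked by enumeration. The sketch below assumes that gate A implements the consensus term a.c (consistent with the test condition a = c = 1 given above); the function names are hypothetical.

```python
from itertools import product

def f_without_a(a, b, c):
    # f = a.b + b'.c
    return (a & b) | ((1 - b) & c)

def f_with_a(a, b, c):
    # Assumption: gate A adds the consensus term a.c
    return f_without_a(a, b, c) | (a & c)

# Gate A does not change the logical function ...
same_function = all(
    f_with_a(*v) == f_without_a(*v) for v in product((0, 1), repeat=3)
)
# ... and whenever a = c = 1, gate B or gate C already forces f to '1',
# which masks any stuck-at 0 fault of gate A.
masked = all(f_without_a(1, b, 1) == 1 for b in (0, 1))
print(same_function, masked)
```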

Exercise 14.4. Easily testable gate network


The initial circuit requires 8 input vectors to be totally tested with the single stuck-at
0/1 fault model. It is an exhaustive test sequence.
In the modified circuit, input T must be at '0' during normal functioning. This
ensures that all XOR gates behave as INVERTERs, as in the initial circuit. During
test operation, input T is set to '1', which applies the same inputs to the two AND
gates. The circuit is completely tested with the 4-vector sequence of Table E.16.

a b c T   comments
0 1 1 0
0 1 0 1   these three vectors
0 0 1 1   detect all stuck-at '1' faults
1 1 1 1   detects all stuck-at '0' faults
Table E.16. Complete test sequence

Exercise 14.5. Reed-Muller structure


1. Test sequence of the SIGMA-PI realization of the logical function:
TS0 = <1010, 1100, 0110, 1111, 1011, 1110, 0010, 0101, 1101>.

Figure E.33. RM circuit (inputs b, c, a, d; output a ⊕ bc ⊕ ad)



2. We determine the logical expression of this circuit (see Figure E.33), and we
compare this expression with the given SIGMA-PI expression. To facilitate this
logical comparison, we may use an intermediate verification model such as a
truth table (which corresponds to a canonical logical form).
3. A XOR network has a very interesting property concerning error detection: any
single error occurring on one input is automatically transmitted to the output.
Indeed, a single input error changes the parity of the number of '1' inputs. As a
XOR network produces an output '1' if and only if an odd number of '1' values is
applied to it, any change in the input parity of '1' values provokes a modification
of the output. The 4 vectors of sequence TS1 apply to each AND gate the three
testing configurations 11, 01 and 10. Hence, every fault of each AND gate is
activated as an error which enters the XOR network, and is consequently
propagated to the final output where it can be observed.
4. Electronics specialists have shown that a 2-input XOR gate is fully tested by the
exhaustive input test sequence only. With the previous TS1 sequence, this
property is not satisfied. It is very easy to verify that the proposed 5-vector TS2
sequence applies the 4 input configurations to any XOR gate.
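The propagation property of question 3 can be illustrated by a short enumeration. This is a sketch on an abstract n-input XOR network, not on the exact circuit of Figure E.33; the function names are hypothetical.

```python
from functools import reduce
from itertools import product
from operator import xor

def xor_network(bits):
    # Output is '1' if and only if an odd number of inputs are at '1'.
    return reduce(xor, bits)

def single_error_always_visible(n):
    """Check that flipping any single input always flips the output."""
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            faulty = list(bits)
            faulty[i] ^= 1
            if xor_network(faulty) == xor_network(bits):
                return False
    return True

print(single_error_always_visible(3))
```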

Exercise 14.6. FIT PLA


1. f1 = a'b + bc, f2 = ab' + bc. Hence, 3 product terms are needed: a'b, ab' and bc
(common to both functions).
2. Figure E.34 shows the symbolic structure of this PLA.

Figure E.34. FIT PLA symbolic structure (inputs a, b, c and control lines x, y; AND
network with columns 1 to 4 and an AND parity line; shift register; OR parity line)

3. Test sequence. It is made up of two parts: one sequence of 6 vectors to test the
AND network, and a sequence of 4 vectors to test the OR network.
AND network test. xy = 01 → TS1 = <011, 101, 110>.
xy = 10 → TS2 = <100, 010, 001>.
For example, the test vector 011 forces the first line of the OR matrix to take value
'0', all other lines being at '1'. Thus, the 4 product terms take the values 1011;
hence, if no faults are present in the AND network, the parity error line is at '1'. Any
fault equivalent to a stuck-at 0 of one active node (represented by a dot in the figure)
belonging to columns 1, 3 and 4 is detected.
OR network test. A '1' bit is shifted 4 times from left to right in the shift register
(scan in input). All single or multiple permanent hardware faults are detected, apart
from the ones that do not modify the parity property of the AND parity vector (4
bits) and the OR parity vector.
Exercise 14.7. Scan Design
For each test vector Vi:
• The circuit is switched to Test Mode.
• A serial input operation through the Scan In input is performed, in order to load
in the state register the 4-bit state belonging to the Vi test vector. This state
loading operation takes 4 clock couples (HM - HE); in parallel, the state register
containing the result from the previous test vector is read.
• The circuit returns to the Normal Mode, and one normal treatment step is
executed with one clock pulse (HM - HE).
A last reading of the internal state register completes the test sequence.
Exercise 14.8. LFSR
1. This generator elaborates a deterministic cyclic sequence of 3-bit vectors. It is
based on a synchronous shift register whose input is the XOR of bits 1 and 3.
Hence, the initial condition gives the starting point of the sequence produced;
this sequence is shown in Table E.17.
Clock  Q1Q2Q3
0      010
1      001
2      100
3      110
4      111
5      011
6      101
Table E.17. Sequence

2. The modified circuit behaves as a LFSR. From the initial state 111, we obtain the
following cyclic output sequence: <111, 011, 001, 001, 100, 010, 110>.
Let us note that the LFSR property is not guaranteed for any XOR feedback
function. For example, if we take the XOR of bits Q1, Q2 and Q3, and if the
initial state is 111, then the circuit always remains in this state!
3. We analyze in Figure E.35 the evolution of the circuit from the initial state 100
when the first input vector 111 is received. Values in gray are the next state of
the register. This study can easily be extended to the rest of the applied sequence.

Figure E.35. Analysis of the PSA circuit

4. Non-detectable errors are necessarily multiple errors on several words of the
incoming sequence. For example, the following modification of the first two words
(two single errors) is not detectable: <110, 111, ...>. Naturally, based on
the mathematical properties of the Galois Fields, the class of non-detectable
errors can be formally determined.
5. Such a BIST technique is very attractive, because the coding and decoding
functions are easily implemented as logical circuits or software procedures.
Moreover, the speed of the corresponding circuitry is high. Unfortunately, the
efficiency of this technique in terms of fault detection strongly depends on the
product to be tested.
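The generator of question 1 can be simulated in a few lines. The sketch assumes that the register shifts from Q1 to Q3 and that the feedback enters Q1, which is the convention that reproduces Table E.17; the function names are hypothetical.

```python
def lfsr_step(state, taps):
    """Shift Q1 -> Q2 -> Q3; the new Q1 is the XOR of the tapped bits (1-based)."""
    feedback = 0
    for t in taps:
        feedback ^= state[t - 1]
    return (feedback,) + state[:-1]

def run(state, taps, steps):
    """Return the list of states visited, starting with the initial state."""
    out = [state]
    for _ in range(steps):
        state = lfsr_step(state, taps)
        out.append(state)
    return out

# Question 1: feedback = Q1 XOR Q3, initial state 010 (Table E.17).
seq = run((0, 1, 0), taps=(1, 3), steps=7)
# Remark of question 2: feedback = Q1 XOR Q2 XOR Q3 from state 111.
stuck = run((1, 1, 1), taps=(1, 2, 3), steps=3)

for clock, s in enumerate(seq):
    print(clock, "".join(map(str, s)))
```

With the first feedback, the register returns to its initial state after 7 clock pulses; with the second one, it never leaves state 111.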

FOURTH PART

Exercise 15.1. Single parity code


1. The redundant parity bit is obtained by making the XOR of all the other bits. The
redundant codeword becomes (10111). This code detects:
• any single error, i.e. 5 errors (the parity bit belongs to the codeword),
• any triple error, i.e. 10 errors (all 3-out-of-5 words),
• any quintuple error, i.e. 1 error (the word: 01000).
This makes 16 detected errors, amongst the $2^5 - 1 = 31$ theoretically possible
errors; hence, the error coverage of this simple code is c = 16/31 = 0.516.
2. Example of a non-detectable error: 11101 (bits 2 and 4 are erroneous, i.e. a
double error).
3. Characteristics of the code:
Capacity: $N = 2^{n-1} = 16$.
Density: $d = N / 2^n = 16/32 = 0.5$.
Coverage rate for each codeword: C = number of detected errors / total number of
possible errors $= 2^{n-1} / (2^n - 1) = 16/31 = 0.516$.
Redundancy: $rr = r / k = 1/4 = 0.25$ (or 1/5 of the codeword).
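The coverage figure can be confirmed by enumerating every non-null error pattern on the 5-bit codeword. This is a small illustrative sketch; the names are hypothetical.

```python
from itertools import product

def parity_ok(word):
    # Even parity: the XOR of all bits (data bits + parity bit) must be 0.
    return sum(word) % 2 == 0

codeword = (1, 0, 1, 1, 1)

# All 2^5 - 1 non-null error patterns; a '1' marks an erroneous bit.
errors = [e for e in product((0, 1), repeat=5) if any(e)]
detected = [
    e for e in errors
    if not parity_ok(tuple(b ^ x for b, x in zip(codeword, e)))
]
coverage = len(detected) / len(errors)
print(len(detected), len(errors), round(coverage, 3))
```

Only the error patterns of odd weight violate the parity, which gives the 5 + 10 + 1 detected errors counted above.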


Exercise 15.2. Hamming Code C(7, 4)
1. This separable code adds to the initial bits three redundant bits, y1, y2 and y4,
calculated by the expressions given in the exercise text. One can easily deduce
three properties, called control relations, which allow errors to be detected and/or
corrected. We order these relations as follows to make error correction easier:
y4 ⊕ y5 ⊕ y6 ⊕ y7 = 0 (1), y2 ⊕ y3 ⊕ y6 ⊕ y7 = 0 (2), y1 ⊕ y3 ⊕ y5 ⊕ y7 = 0 (3)
We call syndrome, noted s = (s1, s2, s3), the vector obtained by computing these
expressions. This syndrome is equal to zero if no error occurred; it is different
from zero if a single error or a double error occurs. For example:
• if y3 is false, expressions 2 and 3 are modified and the syndrome is s = (0 1 1),
• if y3 and y6 are false, expression 1 is equal to '1', expression 2 remains at '0'
(because two modifications are neutralized in a XOR function), and expression 3
is equal to '1'; hence s = (1 0 1).
Erroneous bit 1 2 3 4 5 6 7
syndrome sI 0 0 0 1 1 1 1
s2 0 1 1 0 0 1 1
s3 1 0 1 0 1 0 1
Table E.18. Syndrome values

Multiple errors with rank higher than 2 are not all detected. For example, if
y1, y2 and y3 are false, the resulting syndrome is equal to '0', hence this triple
error is not detected. If only single errors occur, they are detected. Moreover, a
non-null decimal value of the syndrome indicates the position of the erroneous
bit, as shown in Table E.18: this property justifies the chosen relation order.
2. Any 'double' error is confused with a 'single' error when considering the value
taken by the syndrome. For example, we have shown that the double error
altering y3 and y6 produces the syndrome value s = (1 0 1): this error has the same
effect as a single error altering y5.
3. This code is very close to the one presented in Example 15-4. Indeed, it
corresponds to a simple re-organization of the coding relations. Consequently,
both codes have the same detecting and correcting capability. The interest of the
version of this exercise is only to facilitate the identification of the erroneous bit.
4. In order to allow the detection of single and double errors, and to allow the
correction of single errors, we add an eighth redundant bit, y8, obtained by the ⊕
of all the other bits. This redundant bit adds a fourth control relation:
y1 ⊕ y2 ⊕ y3 ⊕ y4 ⊕ y5 ⊕ y6 ⊕ y7 ⊕ y8 = 0 (4)
This relation produces the fourth bit of the syndrome, s4. Thanks to this fourth
relation, we can distinguish between any single error, which leads to s4 = 1, and
any double error, which maintains s4 = 0. This new code is called the modified
Hamming code C(8, 4).
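The syndrome computations of questions 1 and 2 can be sketched as follows. The all-zero word is used as the reference codeword (an assumption made for simplicity; it satisfies the three control relations), and the helper names are hypothetical.

```python
def syndrome(y):
    """y = [y1, ..., y7]; returns (s1, s2, s3) from control relations (1), (2), (3)."""
    y1, y2, y3, y4, y5, y6, y7 = y
    return (y4 ^ y5 ^ y6 ^ y7, y2 ^ y3 ^ y6 ^ y7, y1 ^ y3 ^ y5 ^ y7)

def flip(y, positions):
    """Complement the bits whose 1-based position is in `positions`."""
    return [b ^ 1 if i + 1 in positions else b for i, b in enumerate(y)]

def position(s):
    """Decimal value of the syndrome, i.e. the index of the suspected bit."""
    return 4 * s[0] + 2 * s[1] + s[2]

codeword = [0] * 7  # assumption: all-zero word taken as the error-free codeword

print(syndrome(flip(codeword, {3})))        # single error on y3
print(syndrome(flip(codeword, {3, 6})))     # double error on y3 and y6
print(syndrome(flip(codeword, {1, 2, 3})))  # triple error on y1, y2, y3
```

The double error on y3 and y6 gives the same syndrome as a single error on y5, which is why C(7, 4) cannot distinguish the two cases.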

Exercise 15.3. Linear code


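1. Matrices G and H are deduced from the coding and control relations:

$$G = \begin{bmatrix} 1&1&1&0&0&0&0 \\ 1&0&0&1&1&0&0 \\ 0&1&0&1&0&1&0 \\ 1&1&0&1&0&0&1 \end{bmatrix}, \qquad
H = \begin{bmatrix} 0&0&0&1&1&1&1 \\ 0&1&1&0&0&1&1 \\ 1&0&1&0&1&0&1 \end{bmatrix}.$$

We verify that $H \cdot G^T = 0$:

$$\begin{bmatrix} 0&0&0&1&1&1&1 \\ 0&1&1&0&0&1&1 \\ 1&0&1&0&1&0&1 \end{bmatrix} \cdot
\begin{bmatrix} 1&1&0&1 \\ 1&0&1&1 \\ 1&0&0&0 \\ 0&1&1&1 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} =
\begin{bmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix}.$$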

2. Coding: Y = U.G, i.e. [y1, y2, y3, y4, y5, y6, y7] = [u1, u2, u3, u4] . G.
For example, if U = [1 1 0 1], then Y = [1 0 1 0 1 0 1].

3. Detection and correction: $S = H \cdot W^T$, i.e.
$H \cdot [w_1\ w_2\ w_3\ w_4\ w_5\ w_6\ w_7]^T = [s_1\ s_2\ s_3]^T = S$.

If we analyze the codeword W = [1 0 1 0 1 0 1], we can verify that
$H \cdot W^T = [0\ 0\ 0]^T$.

If bit 3 is erroneous in this codeword,
$H \cdot [1\ 0\ 0\ 0\ 1\ 0\ 1]^T = [0\ 1\ 1]^T$ identifies the faulty bit.
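The matrix relations of this exercise can be checked with plain modulo-2 arithmetic. In this sketch, G is written as deduced from the coding relations (it reproduces the worked example U = [1 1 0 1]); the function names are hypothetical.

```python
G = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def encode(u):
    """Y = U.G over GF(2)."""
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def syndrome(w):
    """S = H.W^T over GF(2)."""
    return [sum(h * b for h, b in zip(row, w)) % 2 for row in H]

y = encode([1, 1, 0, 1])
corrupted = y[:]
corrupted[2] ^= 1  # bit 3 (1-based) is erroneous

print(y, syndrome(y), syndrome(corrupted))
```

Since every row of G has a null syndrome, H.G^T = 0 holds, and so does H.Y^T = 0 for any encoded word.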

Exercise 15.4. Encoding of a cyclic code


The first phase of the coding process uses 4 clock pulses and delivers at the output Y
the higher bits of the codeword, i.e. the bits of the word to be coded u in
decreasing order: 1, 1, 0 and 0 (see Figure E.36).


Figure E.36. Encoding circuit

Clock pulse FFI FF2 FF3


1 1 1 0
2 1 0 1
3 1 0 0
4 0 1 0
Table E.19. State evolution

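During this phase, the state of the 3 D flip-flops, initially at '0', evolves as shown
in Table E.19. Then, the content of the register is shifted to the output Y. Hence, the
codeword is y = (0100011), corresponding to the polynomial $y(x) = x + x^5 + x^6$.
Now, let us calculate the codeword by a direct division of $x^{n-k} u(x) = x^6 + x^5$
by $g(x) = x^3 + x + 1$, which gives the remainder $r(x)$ and the codeword
$x^{n-k} u(x) + r(x)$ (all additions are modulo 2):

$$\begin{aligned}
(x^6 + x^5) + x^3\, g(x) &= x^5 + x^4 + x^3 \\
(x^5 + x^4 + x^3) + x^2\, g(x) &= x^4 + x^2 \\
(x^4 + x^2) + x\, g(x) &= x
\end{aligned}$$

The quotient is $x^3 + x^2 + x$ and the remainder is $r(x) = x$.

We obtain the same codeword $y(x) = x^{n-k} u(x) + r(x) = x + x^5 + x^6$.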


The generator matrix associated with this cyclic code is:

$$G = \begin{bmatrix} 1&1&0&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 1&1&1&0&0&1&0 \\ 1&0&1&0&0&0&1 \end{bmatrix}$$

We verify that $[0\ 1\ 0\ 0\ 0\ 1\ 1] = [0\ 0\ 1\ 1] \cdot G$.
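The polynomial division can be reproduced with integers used as bit vectors, bit i being the coefficient of x^i. This is a minimal sketch of systematic cyclic encoding under that representation; the function names are hypothetical.

```python
def poly_mod(dividend, divisor):
    """Remainder of a GF(2) polynomial division (bit i = coefficient of x^i)."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        # Cancel the highest-degree term of the dividend.
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend

def encode(u, g, n, k):
    """Systematic codeword y(x) = x^(n-k).u(x) + r(x)."""
    shifted = u << (n - k)
    return shifted ^ poly_mod(shifted, g)

g = 0b1011   # g(x) = x^3 + x + 1
u = 0b1100   # u(x) = x^3 + x^2
y = encode(u, g, n=7, k=4)

print(bin(poly_mod(u << 3, g)), bin(y))
```

The resulting y(x) = x^6 + x^5 + x is divisible by g(x), as expected for a word of the cyclic code.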
Exercise 15.5. Single parity bidimensional code
This code uses redundancy at two levels: Longitudinal Redundancy Checking bits
(noted LRC) are added to each word, and Vertical Redundancy Check words are
added to the block (VRC). Each row and each column of the coded matrix belongs
to an error detecting and correcting code.

Table E.20. Bidimensional code with single parity

1. One parity bit is added to each word (LRC), and a word (VRC) is added to the
block. Table E.20 gives an example of coding with p = 5, k = 4. After treatment
(e.g. a memory storage), a parity check is applied to each word, and a parity
check is made between all words. Any erroneous row or column is recorded.
2. Any single or multiple error is detectable if an odd number of errors occurs on
at least one row or one column. It is obviously the case for any odd multiple error.
It is also the case for some even multiple errors; for example, a quadruple error on
a same word will be detected four times.
3. To be undetectable, an error must have an even rank on each altered row and
each altered column. For example, the quadruple error altering the bits of rows 1
and 3 and columns 2 and 4 cannot be detected, as no parity violation occurs.
4. Any error detected on rows 4 and 5 and columns 2 and 3 is a double error. Two
different double errors can produce this signature (Table E.21), but we cannot
identify which one is present:

        first possibility        second possibility
        column 2   column 3      column 2   column 3
row 4   error                               error
row 5              error         error

Table E.21. Errors detected on rows 4 and 5 and columns 2 and 3

5. Two classes of errors can be corrected:
• All single errors. The erroneous bit is identified by intersection between the
detected row and column; then, the identified bit must be complemented.
• All odd errors on a same row or a same column.


Note. The practical efficiency of such a code is strongly related to the technology
used to store the data words. The error model considered here must be validated by a
statistical fault analysis.
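The coding, detection and single-error correction described in this exercise can be sketched as follows. The data words used in the usage comments are hypothetical, and so are the function names.

```python
def encode_block(words):
    """Append a parity bit to each word, then a parity word to the block."""
    rows = [w + [sum(w) % 2] for w in words]
    rows.append([sum(col) % 2 for col in zip(*rows)])
    return rows

def failing_checks(block):
    """Indices (0-based) of the rows and columns that violate even parity."""
    bad_rows = [i for i, r in enumerate(block) if sum(r) % 2]
    bad_cols = [j for j, c in enumerate(zip(*block)) if sum(c) % 2]
    return bad_rows, bad_cols

def correct_single(block):
    """Complement the bit at the intersection of the single bad row and column."""
    bad_rows, bad_cols = failing_checks(block)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        block[bad_rows[0]][bad_cols[0]] ^= 1
    return block

# Hypothetical data block: four 4-bit words.
block = encode_block([[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]])
print(failing_checks(block))
```

A quadruple error located on two rows and two columns leaves every parity check satisfied, which is the undetectable case of question 3.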
Exercise 15.6. m-out-of-n code
1. Let w1 and w2 be any two different words of the m-out-of-n code. Having both
exactly m bits '1', they are different for at least 2 bits. If we compute the OR
function of these two words, the number of '1' bits will be at least (m + 1) bits,
and if we compute their AND function, the number of '1' bits will be at most
(m - 1). Thus, in both cases the resulting combined word does not belong to the
m-out-of-n code. For example, if w1 = 1100101 and w2 = 1001110, then w1 OR
w2 = 1101111, w1 AND w2 = 1000100, both outside the code.
2. The smallest distance between two words is 2, as seen in the previous question
(e.g. 1100101 and 1101001). The greatest distance between two words is 2.m if
n ≥ 2.m, or 2.(n - m) if n < 2.m. Examples:
• d (1100101, 0011011) = 6 for a 4-out-of-7 code,
• d (11001010, 00110101) = 8 for a 4-out-of-8 code.
3. By definition, a unidirectional error modifies the number of bits '1' of the altered
word; thus, this error is easily detectable by counting the number of bits '1'.
4. It is necessary to count the number of bits '1' of the word after treatment, and to
compare this number with m. This operation can be performed by a specific logic
circuit or a software procedure written in assembly language, according to the
speed requirement of the final application: a circuit is more expensive but faster
than a software procedure.
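As a simple illustration of the software option (written here in a high-level language rather than assembly, with hypothetical function names), the membership check and the examples of questions 1 and 2 can be sketched as:

```python
def in_code(word, m):
    """True if the bit string `word` contains exactly m bits '1'."""
    return word.count("1") == m

def distance(w1, w2):
    """Hamming distance between two bit strings of the same length."""
    return sum(a != b for a, b in zip(w1, w2))

def bitwise(op, w1, w2):
    """Apply a 2-input Boolean operator bit by bit."""
    return "".join(str(op(int(a), int(b))) for a, b in zip(w1, w2))

w1, w2 = "1100101", "1001110"
w_or = bitwise(lambda a, b: a | b, w1, w2)
w_and = bitwise(lambda a, b: a & b, w1, w2)

print(w_or, in_code(w_or, 4))
print(w_and, in_code(w_and, 4))
print(distance("1100101", "0011011"), distance("11001010", "00110101"))
```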
Exercise 15.7. Berger code
1. If k = 4, we need r = 3 redundant bits to express the number of '0' contained in
the data part. Thus, this code is not optimal (with r = 3 we could have k = 7).
Table E.22 shows all the obtained codewords.

X     R      X     R
abcd  efg    abcd  efg
0000  100    1000  011
0001  011    1001  010
0010  011    1010  010
0011  010    1011  001
0100  011    1100  010
0101  010    1101  001
0110  010    1110  001
0111  001    1111  000
Table E.22. Berger code for k = 4

2. The Berger code is separable and can thus be structured into two fields (X, R),
where X is the word before coding and R the redundant field. R is the binary
number of '0' bits of X. Let us first consider a unidirectional fault that increases
the number of '0' bits of the complete word. Three cases can be considered:
• If the X field only is altered, the number of '0' of X is increased and becomes
greater than the value in R: hence, this error is detected, as NbZero (X) > R,
• If the R field only is altered, the value of R decreases whilst the real number of
'0' bits of X is not modified: here also this error is detected, as R < NbZero (X),
• If X and R are both altered, the value of R becomes again smaller than the
number of '0' of X which has increased, so the error is detected.
The reasoning is similar with a unidirectional error that reduces the number of
'0' bits of the complete word.
3. Now, R is the binary expression of the number of '1' in X. We follow the same
reasoning as in the previous question with first an error that increases the number
of '0' bits of the complete word:
• If the X field only is altered, the number of '1' of X is decreased and becomes
smaller than the value in R: hence, this error is detected, as NbOne (X) < R,
• If the R field only is altered, the value of R decreases whilst the real number of
'1' bits of X remains unchanged: here also this error is detected, as R < NbOne
(X),
• If X and R are both altered, the value of R decreases whilst the number of '1' bits
of X also decreases: the error is not necessarily detected.
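The detection property of question 2 (R counts the '0' bits of X) can be verified exhaustively for k = 4. This sketch applies every unidirectional error, i.e. every non-empty set of bits forced to the same value, to every codeword; the function names are hypothetical.

```python
from itertools import combinations, product

K, R = 4, 3  # data bits and redundant bits

def encode(x):
    """x: tuple of K data bits; appends the number of '0' bits of x on R bits."""
    zeros = x.count(0)
    check = tuple((zeros >> i) & 1 for i in reversed(range(R)))
    return x + check

def is_codeword(w):
    x, r = w[:K], w[K:]
    return x.count(0) == int("".join(map(str, r)), 2)

def unidirectional_errors(w, target):
    """All words obtained by forcing one or more bits of w to `target`."""
    idx = [i for i, b in enumerate(w) if b != target]
    for n in range(1, len(idx) + 1):
        for combo in combinations(idx, n):
            yield tuple(target if i in combo else b for i, b in enumerate(w))

all_detected = all(
    not is_codeword(e)
    for x in product((0, 1), repeat=K)
    for target in (0, 1)
    for e in unidirectional_errors(encode(x), target)
)
print(all_detected)
```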

Exercise 15.8. Unidirectional codes


The comparison between the capabilities of these codes is presented in Appendix A.
For n = 10: the 5-out-of-10 code gives 252 codewords, the double-rail 5/10 code has
32 codewords, and the (m = 7, r = 3) Berger code has 128 codewords.
Exercise 15.9. Modulo 9 proof
1. The class of any integer A modulo 9 is the remainder of the division of A by 9.
If $A = A_0 + A_1 10^1 + A_2 10^2 + \dots + A_n 10^n$, its class, noted C(A), is:
$C(A) = A_0 + A_1 10^1 + A_2 10^2 + \dots + A_n 10^n$ [MOD 9],
$C(A) = A_0 + A_1 + \dots + A_n$ [MOD 9], as any power of 10 gives 1 as remainder.
This means that finding the remainder of the division of an integer by 9 is
equivalent to finding the remainder of the division by 9 of the sum of the digits
of this integer. And this process is iterative. For example, if A = 591:
C(A) = 5 + 9 + 1 [MOD 9] = 15 [MOD 9] = 1 + 5 [MOD 9] = 6 [MOD 9].
This procedure is very simple to implement as a hardware or software module.
2. Verification of the operation:
189 = 1 + 8 + 9 [9] = 18 [9] = 9 [9] = 0 [9], 47 = 11 [9] = 2 [9].
• We perform the addition of the classes of the two considered numbers,
0 + 2 = 2 [9], and we observe that the resulting class belongs to the same class as
the expected final result: 236 = 2 + 3 + 6 [9] = 11 [9] = 2 [9].
• In order to verify the second operation, we multiply the two classes,
0 x 2 = 0 [9], value which is different from the class of the expected final result:
8867 = 8 + 8 + 6 + 7 [9] = 29 [9] = 11 [9] = 2 [9].
• The third operation is verified by subtracting the two classes of the numbers,
0 - 2 = 7 [9], value which is different from the expected result: 144 = 1 + 4 + 4
[9] = 9 [9] = 0 [9].
Hence, we have detected an error on the operations 2 and 3. However, we cannot
correct those errors, as this code is only an error detecting code. Moreover, not all
the faults are detected, as shown by the fourth operation:
• 189 - 47 [9] = 7, and 97 [9] = 7; however, the correct value is 189 - 47 = 142,
which is different from the proposed result 97. This conclusion is generalized by
question 4.
3. With the example 48 / 12 = 4, we obtain 48 = 3 [9], 12 = 3 [9], 3 / 3 = 1, value
which is different from the class of the correct result: 4.
4. An error transforms a result N into another number N* = N + E (E is the error,
either positive or negative). This error is not detectable if and only if N* = N [9],
that is to say, if E is a multiple of 9.
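The modulo 9 proof of question 2 can be sketched in a few lines. The function names are hypothetical; the operands and proposed results are those discussed above.

```python
from operator import add, mul, sub

def digit_class(n):
    """Class of n modulo 9, obtained by iterated sums of decimal digits."""
    n = abs(n)
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n % 9

def check(op, a, b, proposed):
    """True if `proposed` is compatible with the classes of a and b."""
    return op(digit_class(a), digit_class(b)) % 9 == digit_class(proposed)

print(check(add, 189, 47, 236))   # correct addition
print(check(mul, 189, 47, 8867))  # erroneous product, detected
print(check(sub, 189, 47, 144))   # erroneous difference, detected
print(check(sub, 189, 47, 97))    # erroneous difference, NOT detected
```

The last case passes the check because the error, 97 - 142 = -45, is a multiple of 9.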

Exercise 15.10. Binary residual code


1. The binary number can be expressed as: $N = N_0 16^0 + N_1 16^1 + \dots$ The remainder
in the division by $15 = 2^4 - 1$ is obtained as in the previous exercise, by an
iterative process on the $N_i$ elements. Practically speaking, we divide the binary
number into 4-bit slices (as $16 = 2^4$), starting from the LSB (the least significant
bit), and going towards the MSB (most significant bit). If necessary, some '0'
value bits can be added to the left of the number to complete the last slice. Then,
these slices are added together modulo 15 = 1111.
• N = 0010 1111 1011 1100 1101
(two '0' bits have been added to the left of the number),
• N = 0010 + 1111 + 1011 + 1100 + 1101 [15] = 11 0101 [15] = 0011 + 0101 [15]
= 1000 [15]
(this result can be obtained directly or, on the contrary, by progressively adding
the slices two by two).
2. Verification of the operation:
0011 0010 + 0110 1110 = 1010 1100? In decimal, this gives: 50 + 110 = 172?
Classes of the two operands and of the expected result: 0101 (5), 0101 (5), 0111 (7).
We add the two classes of the operands: 0101 + 0101 = 1010 (10).
We do not find the class of the expected result. Hence, the operation is false (this
can easily be manually verified in decimal).
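The slicing procedure can be sketched as follows; the function name is hypothetical, and the slice sum 1111 is mapped to class 0 since 15 = 0 [15].

```python
def residue15(n):
    """Class of n modulo 15, obtained by repeatedly adding its 4-bit slices."""
    while n > 15:
        total = 0
        while n:
            total += n & 0xF   # current 4-bit slice, starting from the LSB
            n >>= 4
        n = total
    return 0 if n == 15 else n

N = 0b0010_1111_1011_1100_1101
print(residue15(N))

# Verification of the addition of question 2.
a, b, proposed = 0b0011_0010, 0b0110_1110, 0b1010_1100
print((residue15(a) + residue15(b)) % 15, residue15(proposed))
```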
Exercise 15.11. Checksum code
1. These five words are added without carry. The resulting word is joined to the
others, hence constituting a block of six 4-bit words. Here also, this computation
can be made, either globally, or in a cumulative way:
• 1101 + 0011 = 0000 (the carry is ignored),
• 0000 + 1110 = 1110,
• 1110 + 0110 = 0100,
• 0100 + 0101 = 1001, which is then 2's complemented: 10000 - 1001 = 0111.
This last word is then added to the block.
2. The stored block contains 6 words: (1101, 0011, 1110, 0110, 0101, 0111);
indeed, the addition of the first 5 words gives the previous 1001 value, which is
by construction the 2's complement of the sixth word 0111. Their addition
modulo $2^4$ gives 0000.
3 - 4. This modulo $2^4$ code detects any error that does not add to or subtract from
the correct result a value which is a multiple of 16.
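A minimal sketch of this checksum, with hypothetical function names, is given below; the last lines illustrate an undetected case where two errors of +8 add up to 16.

```python
def checksum(words, bits=4):
    """2's complement of the sum of the words, modulo 2**bits."""
    mask = (1 << bits) - 1
    return (-sum(words)) & mask

def block_ok(block, bits=4):
    """True if the words of the block add up to 0 modulo 2**bits."""
    return (sum(block) & ((1 << bits) - 1)) == 0

data = [0b1101, 0b0011, 0b1110, 0b0110, 0b0101]
block = data + [checksum(data)]
print(format(block[-1], "04b"), block_ok(block))

single = block[:]
single[0] ^= 0b0001                    # single-bit error: detected
double = block[:]
double[1] += 8                         # 0011 -> 1011
double[3] += 8                         # 0110 -> 1110 (total error = 16)
print(block_ok(single), block_ok(double))
```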

Exercise 15.12. GCR (4B - 5B) code


Obviously, this mapping is redundant: 5 bits are needed instead of 4. Its property is
to ensure that at most two successive zeros occur:
• in a word, as there are no more than 2 successive '0' bits,
• in consecutive words in a serial transmission, as data combinations with more
than one zero at the beginning and the end of any word are prohibited.
This property is desirable in some applications (transmission, storage) to increase bit
density (e.g. for data storage on a magnetic tape) and ease clock synchronization.
Exercise 16.1. Test of a control system
1. The test of regulators R1 and R2 is performed off-line, according to a cyclic
mode. A real-time clock periodically activates the test task. An efficient testing
procedure will require a proper access to the regulators in order to test their
various regulation functions and process interfaces: amplifiers, sample and hold
modules, analog to digital and digital to analog converters, etc. The tester
module can, for example, order the regulators to perform some pre-defined
regulation treatments, then compare the obtained results with correct values
stored in memory. The periodicity of the test is here 168 h for R1 and 24 h for
R2. If we know the reliability of the equipment, we can deduce the probability of
the occurrence of a fault between two consecutive tests: assuming simple
exponential laws with constant failure rates, this fault probability is
approximated by the value: test period x λ.
Dealing with R3, an interrupt procedure is envisaged with a time slot of 10 minutes
every hour: hence, the test period is 1 h. We suppose that during this 10-minute
test operation, R3 is totally checked.
2 - 3. On the contrary, if the complete test of the regulator is longer than 10 minutes,
it is necessary to split the testing task into several shorter test sequences, for example
one 8-minute and one 7-minute sequence. Thus, the periodicity of the complete test
is increased to 2 hours.
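The approximation used in question 1 follows from the exponential law. Writing T for the test period and λ for the constant failure rate (both symbols as used above), the probability that a fault occurs between two consecutive tests is:

```latex
P(\text{fault in } [0, T]) = 1 - R(T) = 1 - e^{-\lambda T} \approx \lambda T
\qquad \text{when } \lambda T \ll 1 .
```

With T = 168 h for R1, T = 24 h for R2 and T = 1 h for R3, the approximation stays valid as long as the failure rate is small compared with 1/T.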

Exercise 16.2. Duplex technique


1. Any multiple fault altering the functional module is detected as soon as it
produces an output error (failure of this module). This also holds for any fault
altering the duplicate module.
Faults altering the comparison module are detected only if they lead to an
incorrect error signal value. For example, the stuck-at 0 of this error output
cannot be detected if '0' is considered as the specification of a correct result.
The undetectable faults in the functional modules are those that modify in the
same way and at the same time both duplicated modules: they provoke the same
failure. This is the reason why these duplicate modules must be realized with
different methods and technologies.
Note. When an error is detected, it is not possible to locate it.
2. The number of faults of this product is about twice the number of faults of the
basic module; hence, the fault probability of the product is twice the fault
probability of one module. Consequently, the reliability of a duplex is lower than
the reliability of a non-redundant product. This is the price to pay for an
immediate detection (on-line testing technique) of the failures.

Exercise 16.3. On-line testing of a half-adder


1. Table E.23 provides again the truth table of the half-adder. We observe that,
whatever its structural implementation, this half-adder possesses natural
functional redundancy: the output vector (s c) = (1 1) never occurs. An external
observer can exploit this property in order to detect 'on-line' any fault producing
a failure characterized by this forbidden vector (a simple AND gate is sufficient
to detect this case). However, this on-line testing capacity is very limited and
covers only a few real faults; in particular no stuck-at 0 fault can be detected
on-line.

a b   s c p
0 0   0 0 0
0 1   1 0 1
1 0   1 0 1
1 1   0 1 1

Table E.23. Truth table

2. Figure E.37 shows the modified gate circuit and the corresponding truth table
when a parity output p is added to this half-adder. Error detection is performed
by a 3-input XOR gate.
In that case, a separate structural redundant circuit has been added to the basic
circuit. The on-line testing capability is better than in the previous technique.
However, some faults are still not tested, such as the stuck-at '0' noted α on the
figure: indeed, if a = 1 and b = 1, this fault produces the undetected failure
(s c p) = (1 0 1) instead of the normal vector (0 1 1).

Figure E.37. Half-adder with a parity output

3. In order to improve this situation, the previous circuit is modified by using three
independent circuits (Figure E.38). Any fault altering only one of these three
independent circuits is detected as soon as it provokes an error at one output
only. Hence, this on-line detection capability concerns all faults belonging to the
stuck-at fault model. However, the detection circuit is not concerned by the
on-line testing property. Indeed, the stuck-at 0 of the output of this circuit is not
detected! To remedy this problem, we can use a self-checking circuit, as shown
in the right part of Figure E.38. The final error outputs f and g belong to the
1-out-of-2 detecting code {10, 01}. Hence, any single fault in the whole circuit is
now detected.

Figure E.38. Use of independent circuits and corresponding SCC


Figure E.39. Duplex approach



4. A Duplex structure is shown in Figure E.39. It uses two half-adder modules and
a 2-bit double-rail SCC (this SCC is studied in the next exercise). We suppose
that the two duplicated modules are not affected by the same faults
simultaneously. The advantage of such an approach is its simplicity. On the
other hand, it is much more expensive in terms of gate number.
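The parity-based detection of question 2 can be sketched behaviorally. This models only the truth table of Table E.23 and the 3-input XOR check, not the gate structure of Figure E.37; the function names are hypothetical.

```python
def half_adder(a, b):
    s, c = a ^ b, a & b
    p = s ^ c            # parity output: (s, c, p) always has even parity
    return s, c, p

def error_signal(s, c, p):
    return s ^ c ^ p     # 3-input XOR: '1' signals a parity violation

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b), error_signal(*half_adder(a, b)))

# Failure quoted in question 2: (s c p) = (1 0 1) instead of (0 1 1).
print(error_signal(1, 0, 1))
```

The erroneous vector (1 0 1) still has even parity, which is why this failure escapes the check.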

Exercise 16.4. Double-rail self-checking checker


1. If each input pair (a1, a2) and (b1, b2) belongs to the set {01, 10}, 4 input vectors
($2^2$) can be applied to this SCC during a normal operation of the tested circuit.
This circuit (see Figure E.40) uses 4 product terms: A = a1.b2, B = a2.b1,
C = a1.b1, and D = a2.b2.

Figure E.40. Double-rail SCC


It is easy to verify on the circuit that, in each case, only one AND gate (A, B, C,
or D) is active, hence producing an output vector belonging to the set {01, 10}.
All other input vectors (12 vectors) produce different output values:
• if the number of inputs at '1' is lower than or equal to 2, no AND gate is active and the
output vector is 00,
• if the number of inputs at '1' is greater than 2, at least one AND gate of each output
is active, producing an output vector 11.
This behavior is shown by Table E.24. We can deduce from this table that this
circuit is code disjoint.
Now, to prove that this circuit is a SCC for the 2-bit double-rail code, we must prove
that it is self-testing for the normal input vectors. So, all its stuck-at faults must be
tested during normal operation, i.e. by application of the previous four
codewords only! Obviously, the circuit has a symmetrical structure of
AND and OR gates. We see in Table E.24 that each AND gate is activated once
and is activated alone; hence, all stuck-at 0 faults are tested by producing an output vector
00 which is outside the code {01, 10}. Let us consider the gate A; it receives the
input vectors 01 (input codeword 0101) and 10 (input codeword 1010), and each
time gate B is inactive: hence, all stuck-at 1 faults of gates A and B are tested by producing
an output vector 11 outside the normal code. Symmetrical situations can be found
for all other AND gates.

Inputs                   a1 a2 b1 b2   Active gate   c1 c2
2/4 codewords             0  1  0  1        D         0  1
(4 vectors)               0  1  1  0        B         1  0
                          1  0  0  1        A         1  0
                          1  0  1  0        C         0  1
wrong 2/4 words           1  1  0  0        -         0  0
(2 vectors)               0  0  1  1        -         0  0
less than 2 bits '1'      0  0  0  0        -         0  0
(5 vectors)               ...                         ...
                          1  0  0  0        -         0  0
more than 2 bits '1'      0  1  1  1                  1  1
(5 vectors)               ...                         ...
                          1  1  1  1                  1  1

Table E.24. Truth table of the SCC

2. Let us analyze the global circuit of Figure E.41 which combines 3 elementary
checkers to check a 4-bit double-rail code. To prove that this circuit is a SCC, it
is sufficient to verify that each checker receives the four 2-bit double-rail
codewords defined in question 1.

Figure E.41. Association of three SCCs

Test vectors      Internal    Outputs
a  b  d  e        c   f       g
01 01 01 01       01  01      01
01 10 01 10       10  10      01
10 01 10 10       10  01      10
10 10 10 01       01  10      10

Table E.25. Minimum test sequence



Table E.25 shows that the whole SCC is tested by a sub-set of only 4 input
codewords: each checker receives a testing set of 4 input vectors.
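The two questions above can be checked by simulation. The following is a minimal sketch, assuming the gate structure stated in question 1 (A = a1.b2, B = a2.b1, C = a1.b1, D = a2.b2, with c1 = A + B and c2 = C + D as read from Table E.24). The fault list (input stems, gate-input branches, gate outputs) and all function names are illustrative assumptions, not taken from the book.

```python
from itertools import product

CODE = {(0, 1), (1, 0)}  # 2-bit double-rail codewords
# AND gates of the elementary checker: gate name -> the two inputs it combines
GATES = {"A": ("a1", "b2"), "B": ("a2", "b1"), "C": ("a1", "b1"), "D": ("a2", "b2")}


def checker(vector, fault=None):
    """Return (c1, c2) for vector = (a1, a2, b1, b2).

    fault is an optional (line_name, stuck_value) pair forcing one line.
    """
    def line(name, value):
        return fault[1] if fault is not None and fault[0] == name else value

    stems = {n: line(n, v) for n, v in zip(("a1", "a2", "b1", "b2"), vector)}
    out = {}
    for gate, (x, y) in GATES.items():
        gx = line(f"{gate}.{x}", stems[x])  # fan-out branch feeding this gate
        gy = line(f"{gate}.{y}", stems[y])
        out[gate] = line(gate, gx & gy)
    c1 = line("c1", out["A"] | out["B"])
    c2 = line("c2", out["C"] | out["D"])
    return c1, c2


def codewords():
    """The four normal input vectors (both pairs in the double-rail code)."""
    return [a + b for a in sorted(CODE) for b in sorted(CODE)]


def is_code_disjoint():
    """Codeword inputs give codeword outputs, all other inputs give 00 or 11."""
    return all(
        (checker(v) in CODE) == (v[:2] in CODE and v[2:] in CODE)
        for v in product((0, 1), repeat=4)
    )


def all_faults():
    """Single stuck-at faults on stems, gate-input branches and gate outputs."""
    lines = ["a1", "a2", "b1", "b2", "c1", "c2"]
    for gate, (x, y) in GATES.items():
        lines += [gate, f"{gate}.{x}", f"{gate}.{y}"]
    return [(name, value) for name in lines for value in (0, 1)]


def is_self_testing(test_set):
    """Every modeled fault yields a non-code output for some test vector."""
    return all(
        any(checker(v, fault) not in CODE for v in test_set)
        for fault in all_faults()
    )


def tree(a, b, d, e):
    """Three elementary checkers combined as described in question 2."""
    c = checker(a + b)
    f = checker(d + e)
    return c, f, checker(c + f)
```

Under these assumptions, `is_code_disjoint()` and `is_self_testing(codewords())` both hold, whereas a test set restricted to two codewords leaves some stuck-at 0 faults untested.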
Exercise 16.5. Parity self-checking checker
1. The circuit (Figure E.42) is a SCC converting a 4-bit odd-parity input code into a
1-out-of-2 output code. We must verify that it is code disjoint and self-testing.

Figure E.42. Parity checker

• The circuit is obviously 'code disjoint'. If an odd number of inputs (1 or 3) take
the value '1' (8 cases), the outputs f and g belong to the code {01, 10}. On the
contrary, if an even number of inputs (0, 2 or 4) take the value '1', the output
signals take the values 00 or 11.
• We assume that the test of each XOR gate requires the application of all its input
vectors (00, 01, 10 and 11). Any error is then propagated to the final output, as
XOR gates propagate any input modification (the observability of a XOR
network is complete for single errors).
Table E.26 shows an example of a test sequence made of four input vectors
belonging to the normal odd-parity input code. This 4-length sequence is
minimal. So, the circuit is self-testing. It is a self-checking checker for an
odd-parity code.
Lines                Outputs
a  b  V  c  d        f  g
0  0  0  0  1        0  1
0  1  1  0  0        1  0
1  0  1  1  1        0  1
1  1  0  1  0        1  0
Table E.26. Minimal test sequence

2. We know from question 1 that a subset of only 4 input codewords is sufficient to
ensure the self-testing property. However, not every subset guarantees this
property: for example, the circuit is no longer self-testing if the circuit under
on-line testing produces the first three codewords only.
3. Consider the minimal test sequence given in question 1. If we permute
inputs b and c, the resulting set of input vectors does not provide
the self-testing property to the circuit. Indeed, the second XOR gate receives two
input vectors only (00 and 01).
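The three answers above can be reproduced with a short simulation. This is only a sketch: the gate structure (V = a XOR b, f = V XOR c, g = d) is an assumption inferred from the values of Table E.26 and may not match the drawing of Figure E.42 exactly; the function names are illustrative.

```python
from itertools import product

CODE = {(0, 1), (1, 0)}  # 1-out-of-2 output code


def parity_checker(a, b, c, d):
    """Assumed structure: two cascaded XOR gates produce f, and g copies d."""
    v = a ^ b  # first XOR gate (line V of Table E.26)
    f = v ^ c  # second XOR gate
    return f, d


def is_code_disjoint():
    """Outputs are in the code exactly when the input word has odd parity."""
    return all(
        (parity_checker(*w) in CODE) == (sum(w) % 2 == 1)
        for w in product((0, 1), repeat=4)
    )


def gate_inputs(vectors):
    """Input combinations seen by each XOR gate over a set of vectors."""
    gate1 = {(a, b) for a, b, c, d in vectors}
    gate2 = {(a ^ b, c) for a, b, c, d in vectors}
    return gate1, gate2


def exercises_all_gate_inputs(vectors):
    """True when both XOR gates receive 00, 01, 10 and 11."""
    return all(len(seen) == 4 for seen in gate_inputs(vectors))


# Minimal test sequence of Table E.26, as (a, b, c, d)
MINIMAL = [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 1), (1, 1, 1, 0)]
# Same sequence after permuting inputs b and c (question 3)
PERMUTED = [(a, c, b, d) for a, b, c, d in MINIMAL]
```

With these definitions, `exercises_all_gate_inputs(MINIMAL)` holds, while it fails for `MINIMAL[:3]` (question 2) and for `PERMUTED` (question 3).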

Exercise 16.6. Software functional redundancy


First of all, the formal parameters and local variables used represent temperatures
lower than 0°C, as the function is called only if the freezer is freezing. In the present
case, a call of the function with Min = +372 °C, or the return of a positive value of I, will
not be detected. To remedy this problem, we introduce a new type
Freezing_Temperature:
subtype Freezing_Temperature is integer range
Minimal_Temperature .. 0;
where Minimal_Temperature is a negative constant previously declared.
Hence, Min, Max and I belong to this type.
Moreover, this function implicitly assumes that the value of Min should be lower
than the value of Max. However, no verification of this property is made. We
propose to add to the program a pre-condition as the first statement of the body of
the function:
if Min > Max then raise Erroneous_Call;
end if;
Finally, the returned value must belong to the range [Min, Max] . Here also, this
condition is implicitly expressed by the name of the function. However, the violation
of this property due to a design fault is not detected. We propose to add just before
the 'return' statement the post-condition:
if not (Min<=I and I<=Max) then raise Erroneous_Design;
end if;

Exercise 17.1. Traffic Light Controller


Figure E.43 gives the coded state table and the symbolic Moore structure of the
traffic light controller.

State   Present state    Next state
        y1 y2 y3 y4      y1 y2 y3 y4
1        1  1  0  0       1  0  1  0
2        1  0  1  0       1  0  0  1
3        1  0  0  1       0  1  1  0
4        0  1  1  0       0  1  0  1
5        0  1  0  1       0  0  1  1
6        0  0  1  1       1  1  0  0

Figure E.43. Fail-Safe design of the controller

1. The 2-out-of-4 code can represent $N = \binom{4}{2} = 6$ codewords, which is exactly the
number of internal states of the state graph to be coded.
2. Four synchronous D Flip-Flops are used to implement this circuit. The D-inputs
(Di = yi) are logical functions of the outputs of these Flip-Flops (Qi = Yi):

D1 = Q1.Q2 + Q1.Q3 + Q3.Q4
D2 = Q1.Q4 + Q2.Q3 + Q3.Q4
D3 = Q1.Q2 + Q1.Q4 + Q2.Q4
D4 = Q1.Q3 + Q2.Q3 + Q2.Q4
They are realized by 4 independent AND/OR logical networks. The outputs are
also realized by monotonic circuits (AND/OR gates only) of the outputs of the D
flip-flops:
R1 = Q1.Q2 + Q1.Q3 + Q1.Q4 + Q3.Q4, Y1 = Q2.Q4, G1 = Q2.Q3
R2 = Q1.Q4 + Q2.Q3 + Q2.Q4 + Q3.Q4, Y2 = Q1.Q3, G2 = Q1.Q2
3. We will examine a few faults of the Di equations to show the principle of the
on-line detection. Any stuck-at 0 altering an AND gate will be transformed into an
error (1 → 0) when the present state of the FSM normally activates this gate;
hence, the next state will have one bit '1' only. Consequently, all the AND gates
will take an output value '0', and, at the next clock pulse, the FSM will reach a
null state (0000). The convergence towards this null safe state is achieved in two
clock pulses. Moreover, this is a stable trap state. The same reasoning applies to
any stuck-at 0 fault of the OR gates. Now, let us consider a stuck-at 1 fault in an
AND gate which provokes an error (0 → 1), for example gate Q1.Q2 of D1. A
necessary condition for this error to be propagated to D1 is that the present state
is different from (1100). Whatever the considered stuck-at 1 fault, there is a
present state that leads the FSM to a state having 3 bits '1'. The structure of the
Di expressions is such that, in that case, the next state will be the stable trap safe
state (1111). Here also, the convergence is made in two clock pulses.
To complete this study, we could ask the question: is any stuck-at fault detected
during the normal functioning of this FSM? The answer is 'yes', if we assume
that the whole state graph is totally used (each state and each transition) during
the normal life of the circuit. This is required to guarantee the implicit single
fault assumption. If not, we could have double faults that inhibit the fail-safe
property.
4. Any fault affecting one flip-flop will provoke an evolution of the internal state of
the circuit outside the 2-out-of-4 code, like in the previous question. If the Clock input
is blocked (stuck-at 0 or 1), the whole state machine remains in the same correct
state; hence, this fault is not safe. To remedy this problem, specialists have
proposed special duplicated clock systems.
5. With a 1-out-of-n coding of the internal states, we need 6 internal variables
instead of 4. However, the logical expressions are very simple.
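The convergence towards the trap states discussed in question 3 can be observed by simulating the four D-input equations. The sketch below is illustrative only: the fault model is limited to stuck-at faults on the outputs of the AND gates, and the names `TERMS`, `next_state`, `run` and `all_and_gate_faults` are assumptions introduced for this example.

```python
# Product terms of the D-input equations: D_i -> pairs (j, k) meaning Qj.Qk
TERMS = {
    1: [(1, 2), (1, 3), (3, 4)],
    2: [(1, 4), (2, 3), (3, 4)],
    3: [(1, 2), (1, 4), (2, 4)],
    4: [(1, 3), (2, 3), (2, 4)],
}
# Fault-free sequence of 2-out-of-4 states (states 1 to 6 of the state table)
CYCLE = [(1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1),
         (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1)]
SAFE_STATES = {(0, 0, 0, 0), (1, 1, 1, 1)}


def next_state(q, fault=None):
    """One clock pulse. fault = (i, (j, k), v) sticks AND gate Qj.Qk of D_i at v."""
    nxt = []
    for i, terms in TERMS.items():
        d = 0
        for j, k in terms:
            out = q[j - 1] & q[k - 1]
            if fault is not None and fault[:2] == (i, (j, k)):
                out = fault[2]  # the faulty gate output ignores its inputs
            d |= out
        nxt.append(d)
    return tuple(nxt)


def run(state, pulses, fault=None):
    """Return the list of states visited, starting with the initial state."""
    trace = [state]
    for _ in range(pulses):
        state = next_state(state, fault)
        trace.append(state)
    return trace


def all_and_gate_faults():
    """Stuck-at 0 and stuck-at 1 on each of the 12 AND gates."""
    return [(i, t, v) for i, terms in TERMS.items() for t in terms for v in (0, 1)]
```

For instance, `run((1, 1, 0, 0), 2, fault=(1, (1, 2), 0))` goes through a state with a single '1' before reaching 0000, which matches the two clock pulses mentioned above.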
Exercise 17.2. Mathematical function processing
According to the first approach, if the treatment is stopped, no value is available for
Y. On the contrary, after each iteration of the second approach, a value of Y is
available and this value is closer and closer to the correct result. Consequently, an
approximate value may be used if the deadline is reached before the end of the
normal processing.
So, this second solution is much preferable for implementing a fail-safe program.

Exercise 18.1. Reliability of the TMR


1. There is no failure as long as 2 of the 3 modules function correctly. We suppose
that the three modules have the same reliability, $R_0(t) = e^{-\lambda t}$, and that the voter is
faultless. The global reliability corresponds to all situations leading to no
failure. We can determine it by different methods.
• We enumerate all these situations: 2 modules are faultless and the third
one is faulty (3 cases), and the three modules are faultless (1 case).
Thus, we obtain: $R = 3 R_0^2 (1 - R_0) + R_0^3 = 3 R_0^2 - 2 R_0^3$.
• We make a logical treatment based on the theorems of composed probabilities:
R = P(AB OR AC OR BC) = P(AB OR (AC OR BC)) = P(AB) + P(C.(A OR B)) -
P(ABC), P being the probability and A, B and C being the 3 modules.
Let us note that P(ABC) is subtracted as it is the only event counted twice in the
two other terms.
R = P(AB) + P(C).P(A OR B) - P(ABC),
R = P(A).P(B) + P(C).(P(A) + P(B) - P(A).P(B)) - P(A).P(B).P(C),
R = P(A).P(B) + P(C).P(A) + P(C).P(B) - 2.P(A).P(B).P(C),
→ $R(t) = 3 R_0(t)^2 - 2 R_0(t)^3 = 3 e^{-2\lambda t} - 2 e^{-3\lambda t}$,
as all modules have the same reliability.
By mathematical integration of the previous function, we deduce the MTBF (or the
MTTF): $MTBF = \frac{5}{6} MTBF_0$, which is lower than the MTBF of one module.
Note. In fact, the reliability curve of the TMR has a horizontal tangent at t = 0;
this reliability is greater than the reliability of the basic module when t is 'small', but
it becomes lower after a certain time (see Appendix B).
2. According to the reliability diagram, the voter module is in 'series' with the three
modules. Thus, its reliability must be multiplied by the reliability of the triplex:
$R(t) = (3 e^{-2\lambda t} - 2 e^{-3\lambda t}) \cdot e^{-\lambda t / 10}$
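The result of question 1 can be cross-checked numerically. The sketch below enumerates the module states of a 2-out-of-3 structure with a perfect voter, compares the sum with the closed form, and integrates R(t) to recover the 5/6 factor. Time is expressed in units of 1/λ; the function names and the integration settings are illustrative choices.

```python
from itertools import product
from math import exp, isclose


def structure_reliability(works, r, n):
    """Sum the probabilities of all module states for which the system works.

    works(state) receives a tuple of n booleans (True = module faultless);
    r is the reliability of each module, modules being independent.
    """
    total = 0.0
    for state in product((True, False), repeat=n):
        if works(state):
            p = 1.0
            for ok in state:
                p *= r if ok else 1.0 - r
            total += p
    return total


def tmr_works(state):
    """TMR with a faultless voter: at least 2 of the 3 modules are faultless."""
    return sum(state) >= 2


def tmr_closed_form(r):
    return 3 * r**2 - 2 * r**3


def mttf(reliability, horizon=40.0, steps=100_000):
    """Trapezoidal integral of R(t) over [0, horizon], t in units of 1/lambda."""
    dt = horizon / steps
    total = 0.5 * (reliability(0.0) + reliability(horizon))
    for k in range(1, steps):
        total += reliability(k * dt)
    return total * dt


def tmr_mttf_ratio():
    """MTTF of the TMR divided by the MTTF of one module (which is 1/lambda)."""
    return mttf(lambda t: tmr_closed_form(exp(-t)))
```

Solving $3R_0^2 - 2R_0^3 > R_0$ gives $R_0 > 1/2$: the TMR is more reliable than a single module only while $t < \ln 2 / \lambda$, which makes the 'small t' condition of the Note precise.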
Exercise 18.2. Fault tolerance of the TMR
In terms of reliability, we assume that the TMR system fails as soon as two modules
fail. To simplify, we neglect the reliability of the Voter. Any functional or
technological fault altering only one module is tolerated. In fact, such a redundant
structure tolerates many more faults: any fault altering one or several modules is
tolerated if and only if it does not modify the behavior of 2 or 3 modules in the same
way and at the same time. For example, a fault producing the same simultaneous
error at two module outputs induces a global failure of the system. Moreover, the
latency phenomena slightly complicate this analysis. Indeed, we know that a fault is
not necessarily activated as a failure as soon as it occurs. Thus, the tolerance is
increased as the occurrence of a possible failure on 2 or 3 modules is delayed.
Let us now examine the TMR structure with hardware fault hypotheses. In an
electronic circuit made of a set of components, faults are supposed to be independent
probabilistic events (and we use probability theorems with this assumption). The
assumption of a fault altering one module only, generally used (see the reliability
computation made in the previous exercise), is justified by the fact that the
probability of having a double fault affecting two modules is the product of the
probabilities of having a fault in each module. With electronic components, the
actual values of λ are very small (e.g. $10^{-7}$); hence, we neglect the product terms
($10^{-14}$). This assumption cannot be made if strictly identical components are used in
the TMR. Indeed, these components can have the same design faults or
environmental weaknesses (e.g. sensitivity to temperature); thus, faults cannot be
considered as independent phenomena and all reliability computations are false.
Another criticism deals with other faults violating the independence assumption.
They produce failures at the same time on non-identical components. For instance,
this situation can result from external perturbations, such as electromagnetic
interference.
Exercise 18.3. NMR
1. Let us assume that each module has only one output. During normal
functioning, the output vector (z1, z2, z3) must take the values (000) or (111).
Any other value is erroneous, hence the detection function is:
error = (z1'.z2'.z3' + z1.z2.z3)'
where '+', '.' and the prime (') represent the operators OR, AND, and NOT.
This expression can directly be implemented by a very simple circuit (containing
few MOS transistors). We will develop it further to make a transition with
question 2. We obtain the expression given in Chapter 18:
error = (z1 ⊕ z2) + (z1 ⊕ z3) + (z2 ⊕ z3).
The 2-input ⊕ operation gives a '1' if and only if its inputs are different.
2. First we create the three elementary comparison functions:
fa = (z1 ⊕ z2), fb = (z1 ⊕ z3), and fc = (z2 ⊕ z3).
If z1 is erroneous, fa AND fb is equal to '1',
If z2 is erroneous, fa AND fc is equal to '1',
If z3 is erroneous, fb AND fc is equal to '1'.
Hence: M1 = fa AND fb, M2 = fa AND fc, M3 = fb AND fc.

Figure E.44. Detection/Diagnosis circuit



The corresponding circuit is shown in Figure E.44. The signals M1, M2 and M3
identify the failing module (their value is a 1-out-of-3 codeword in case of error),
and allow its inhibition (thanks to a power switch-off, for example), and finally
its replacement by a spare module.
3. The voter must behave as the majority of its inputs: this function is the logic
MAJORITY. For 3 inputs, we have:
MAJ (z1, z2, z3) = z1.z2 + z1.z3 + z2.z3.
The corresponding electronic CMOS component is simple.
This function can easily be extended to 4 inputs:
MAJ (z1, z2, z3, z4) = z1.z2.z3 + z1.z2.z4 + z1.z3.z4 + z2.z3.z4.
Note: The MAJORITY function is not associative (no possibility for combining
smaller MAJORITY modules).
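The detection, diagnosis and voting functions of this exercise are small enough to be checked exhaustively. The following sketch encodes the expressions given above for three 1-bit module outputs; the function names are illustrative.

```python
from itertools import product


def error_sop(z1, z2, z3):
    """error = (z1'.z2'.z3' + z1.z2.z3)', written with 0/1 integers."""
    all_zero = (1 - z1) & (1 - z2) & (1 - z3)
    all_one = z1 & z2 & z3
    return 1 - (all_zero | all_one)


def error_xor(z1, z2, z3):
    """error = (z1 XOR z2) + (z1 XOR z3) + (z2 XOR z3)."""
    return (z1 ^ z2) | (z1 ^ z3) | (z2 ^ z3)


def diagnose(z1, z2, z3):
    """Return (M1, M2, M3): pairwise comparisons combined by AND gates."""
    fa, fb, fc = z1 ^ z2, z1 ^ z3, z2 ^ z3
    return fa & fb, fa & fc, fb & fc


def maj3(z1, z2, z3):
    """3-input MAJORITY voter."""
    return (z1 & z2) | (z1 & z3) | (z2 & z3)


def single_error_cases():
    """Yield (correct_value, faulty_index, outputs) for every single error."""
    for value in (0, 1):
        for faulty in range(3):
            outputs = [value] * 3
            outputs[faulty] ^= 1
            yield value, faulty, tuple(outputs)
```

For every case produced by `single_error_cases()`, `diagnose` returns a 1-out-of-3 word pointing at the faulty module and `maj3` still returns the correct value.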
Exercise 18.4. Study of the double duplex
1. Reread Chapter 18, sub-section 7.2.2.
2. The product functions correctly as long as one of the two couples (1.1, 1.2) or
(2.1, 2.2) functions correctly. The reliability of the product is then:
R = P((1.1 AND 1.2) OR (2.1 AND 2.2)) = P(1.1 AND 1.2) + P(2.1 AND 2.2) -
P(1.1 AND 1.2).P(2.1 AND 2.2),
R = P(1.1).P(1.2) + P(2.1).P(2.2) - P(1.1).P(1.2).P(2.1).P(2.2),
where + and - are the addition and subtraction operators.
If the modules have the same reliability $R_0$, we have:
$R = 2 R_0^2 - R_0^4 = 2 e^{-2\lambda t} - e^{-4\lambda t}$.
Note. The reliability curves of this structure are given in Appendix B.
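As a complement, the mean time to failure of this structure follows by integrating the expression above, in the same way as for the TMR. The derivation below is a worked check under the same assumptions (identical independent modules with constant failure rate λ, faultless comparison and switching logic); $MTTF_0 = 1/\lambda$ denotes the MTTF of one module.

```latex
\begin{aligned}
MTTF &= \int_0^{\infty} \left( 2 e^{-2\lambda t} - e^{-4\lambda t} \right) dt
      = \frac{2}{2\lambda} - \frac{1}{4\lambda}
      = \frac{3}{4\lambda} = \frac{3}{4}\, MTTF_0 , \\
R > R_0 &\iff 2 R_0^2 - R_0^4 > R_0
        \iff (R_0 - 1)(R_0^2 + R_0 - 1) < 0
        \iff R_0 > \frac{\sqrt{5} - 1}{2} \approx 0.618 .
\end{aligned}
```

So, like the TMR, the double duplex improves on a single module only during the early part of the mission, here while $R_0(t)$ stays above about 0.618.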
Exercise 18.5. Study of self-purging technique
1. The switch-off of a failing module is performed by each one of the modules of
the structure. This approach is interesting because it eliminates a part of the
centralized switching unit which is always a delicate part of a fault-tolerant
system (such a part is called the kernel of the system). This technique is a step
towards a complete decentralization of the duplicated modules and of the decision
function (thanks to a distributed voter). Such a distributed structure can be
encountered in the framework of distributed software tasks in a distributed
multiprocessor system.
2. When only 2 modules remain active, the product regresses to a simple Duplex.
Thus, the next error occurrence will not be tolerated. The system operates
according to a degraded mode until a maintenance operation restores the
tolerance capacity of the product.
Exercise 18.6. Example of a tolerant program based on retry mode
The fault to be treated being associated with the provided data, the use of the retry
mode is pertinent. Indeed, the Get procedure is not the cause of the problem. The fault
tolerance mechanism must detect an error if non-integer data is provided from the
keyboard. In this case, the data acquisition must be repeated (the Get procedure is
executed again) after having restored the initial context. Let us consider the
following solution:
procedure Safe_Get (I : out Integer) is
begin
   loop
      begin
         Get (I);
         exit;                -- a valid integer was read: leave the loop
      exception
         when Data_Error =>
            Skip_Line;        -- discard the rest of the erroneous input line
      end;
   end loop;
end Safe_Get;
As soon as an integer value is provided and acquired by the Get (I) procedure, the
exit statement leaves the loop (loop), hence finishes the execution
of the Safe_Get procedure.
On the contrary, the reading by Get of an erroneous data value leads to the raising
of an exception (Data_Error) and the branching to the associated exception
handler. This treatment erases (thanks to the operation Skip_Line) the content of
the buffer containing the typed characters; these characters have not yet been
all extracted because of the partial execution of Get. For example, if the user has
typed the 5-character sequence <17A28>, and then 'Carriage Return', the
execution of Get (I) can leave characters '2' and '8' in the keyboard buffer, as the
left-to-right analysis of the expected digits has been interrupted by the raising of the
exception induced by the analysis of 'A'. As expected, the buffer reset action
restores the system to a safe state.
Exercise 18.7. Programming and evaluation of recovery blocks
1. Programming. The two following program extracts illustrate the two
approaches proposed in section 18.4. We assume that the execution context is
limited to the input/output parameter C. C_Prime is a data structure having the
same type T as C. It will store the safeguard copy (the duplicate).
procedure Recovery_Block_V1 (C : in out T) is
   C_Prime : T;
   Error   : Boolean;
begin
   Save (C, C_Prime);          -- checkpoint the context
   Error := P (C);             -- P works directly on C
   if Error then
      Restore (C_Prime, C);    -- roll back before running the alternate
      Q (C);
   end if;
end Recovery_Block_V1;
where the procedures Save (X, Y) and Restore (X, Y) both copy X into Y.
procedure Recovery_Block_V2 (C : in out T) is
   C_Prime : T;
   Error   : Boolean;
begin
   Save (C, C_Prime);
   Error := P (C_Prime);       -- P works on the copy
   if Error then
      Q (C);                   -- C is still intact
   else
      Restore (C_Prime, C);    -- commit the result computed by P
   end if;
end Recovery_Block_V2;
2. Evaluation of the performance. The two previous programs allow the expected
performance of the two proposed approaches to implement the recovery blocks
to be evaluated. In both cases, the context is initially saved. But the rest of the
bodies is different.
• In the first case, a correct execution of P does not require any supplementary
treatment; when an error is detected, a restore operation is performed before
executing procedure Q.
• In the second case, the opposite situation occurs, i.e. a correct execution implies a
restore operation; on the contrary, in case of error detection no restore
operation is necessary before executing Q.
To conclude, the first approach is more efficient when P is correctly executed, while
the second approach is more efficient when the use of the redundant component Q is
required. This last design approach can for example be chosen when real-time
constraints apply: the execution of the redundant component already requires a
supplementary duration, to which no further restoring duration has to be added.
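The comparison made in question 2 can be illustrated by counting the copy operations (Save or Restore) performed by each variant. The sketch below is a Python transposition under simplifying assumptions: the context is a dictionary, P returns True when it detects an error, and the primary/alternate functions `p_ok`, `p_faulty` and `q` are hypothetical stand-ins, not components from the book.

```python
import copy


def recovery_block_v1(context, primary, alternate, stats):
    """Variant 1: P works in place; the checkpoint is restored only on error."""
    checkpoint = copy.deepcopy(context)  # Save(C, C_Prime)
    stats["copies"] += 1
    if primary(context):                 # True means an error was detected
        context.clear()
        context.update(checkpoint)       # Restore(C_Prime, C)
        stats["copies"] += 1
        alternate(context)


def recovery_block_v2(context, primary, alternate, stats):
    """Variant 2: P works on the copy; its result is committed when correct."""
    work = copy.deepcopy(context)        # Save(C, C_Prime)
    stats["copies"] += 1
    if primary(work):
        alternate(context)               # C was never modified by P
    else:
        context.clear()
        context.update(work)             # Restore(C_Prime, C)
        stats["copies"] += 1


def p_ok(ctx):
    """Hypothetical primary component that succeeds."""
    ctx["y"] = 2 * ctx["x"]
    return False


def p_faulty(ctx):
    """Hypothetical primary component that corrupts its context, then fails."""
    ctx["x"] = 0
    ctx["y"] = -1
    return True


def q(ctx):
    """Hypothetical alternate (redundant) component."""
    ctx["y"] = ctx["x"] + ctx["x"]


def count_copies(block, primary):
    """Run one variant on a fresh context; return (result, number of copies)."""
    ctx, stats = {"x": 21, "y": None}, {"copies": 0}
    block(ctx, primary, q, stats)
    return ctx["y"], stats["copies"]
```

Both variants deliver the same result; variant 1 needs one copy when P succeeds and two when it fails, and variant 2 the reverse.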
Exercise 18.8. EDC in a RAM
1. Matrices G and H:
$$
G = \begin{pmatrix}
111000000000 \\
100110000000 \\
010101000000 \\
110100100000 \\
100000011000 \\
010000010100 \\
110000010010 \\
000100010001
\end{pmatrix},
\quad
H = \begin{pmatrix}
101010101010 \\
011001100110 \\
000111100001 \\
000000011111
\end{pmatrix}.
$$
Coding operation:
$[y_1, y_2, \ldots, y_{12}] = [u_1, u_2, \ldots, u_8] \cdot G$.
For example, if U = [0 0 1 1 1 0 1 1], then Y = [1 1 0 1 0 1 1 1 1 0 1 1].

2. Error detection and correction. We suppose that bit w6 is erroneous, and we
perform:
$$
S = H \cdot W^T =
\begin{pmatrix}
101010101010 \\
011001100110 \\
000111100001 \\
000000011111
\end{pmatrix}
\cdot [110100111011]^T =
\begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}.
$$
The decimal value of this syndrome vector (its first component being the least
significant bit) indicates the erroneous bit: bit 6. The
correction is then a simple binary complementation.
3. Implementation of this code in the MMU.

Figure E.45. Detection and correction circuit

Figure E.45 shows the structure of the EDC circuitry for this code. The 'check
bit generation' module implements in hardware the XOR expressions to generate
the 4 redundant bits. The 'decoding and correction' module implements the
matrix product $S = H \cdot W^T$. This module uses the result S to correct the erroneous
bit, and it communicates with the external system (e.g. a CPU) for error logging.
4. Scrubbing operation
As said in Chapter 18, scrubbing is an off-line operation which writes back the
corrected erroneous words, and reads them again, in order to check whether the faults are
hard or soft. If they are soft, the word has been cleaned up. Otherwise, the
fault is hard and cannot be cleaned.
The previous structure is entirely compatible with this useful function.
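The coding and correction steps of questions 1 and 2 can be sketched in software. The block below works directly from the matrix H: the positions of the check bits (1, 2, 4, 8) and of the data bits (3, 5, 6, 7, 9, 10, 11, 12) are an assumption read off the worked example (U gives Y) rather than something stated in the text, and the function names are illustrative.

```python
# Parity-check matrix H of the answer, one bit string per row
H = [[int(b) for b in row] for row in (
    "101010101010",
    "011001100110",
    "000111100001",
    "000000011111",
)]
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)  # assumed placement of u1..u8
CHECK_POSITIONS = (1, 2, 4, 8)                # assumed placement of check bits


def parity(row, word):
    """Modulo-2 scalar product of one row of H with a 12-bit word."""
    return sum(h & w for h, w in zip(row, word)) % 2


def encode(u):
    """Build the 12-bit codeword whose syndrome H.Y^T is zero."""
    y = [0] * 12
    for bit, pos in zip(u, DATA_POSITIONS):
        y[pos - 1] = bit
    for row, pos in zip(H, CHECK_POSITIONS):
        y[pos - 1] = parity(row, y)  # each row contains exactly one check bit
    return y


def syndrome(w):
    """S = H.W^T, first component = least significant bit."""
    return [parity(row, w) for row in H]


def correct(w):
    """Return (corrected word, erroneous position or 0 if none)."""
    position = sum(bit << k for k, bit in enumerate(syndrome(w)))
    fixed = list(w)
    if 1 <= position <= 12:
        fixed[position - 1] ^= 1  # simple binary complementation
    return fixed, position
```

This corresponds to what the 'check bit generation' and 'decoding and correction' modules compute in hardware; a syndrome outside the range 1 to 12 is left uncorrected in this sketch.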
Glossary

1. ACRONYMS

ABS Antilock Braking System
ATE Automatic Test Equipment
ATPG Automatic Test Pattern Generation
BCH Bose Chaudhuri Hocquenghem
BIST Built-In Self-Test
BIT Built-In Test
BITE Built-In Test Equipment
BNF Backus-Naur Form
C/DC Condition/Decision Coverage
CAM Computer Aided Maintenance
CAN Controller Area Network
CIRC Cross-Interleaved Reed-Solomon Code
CMOS Complementary MOS
COTS Components Off The Shelf / Commercial Off-The-Shelf
CRC Cyclic Redundancy Check
DAT Digital Audio Tape
DFG/PFG/SFG Deterministic / Probabilistic / Statistical Fault Grading
DFT Design For Testability
DRC Design Rule Checking
DUT Device Under Test


ECC Elliptic Curve Cryptography
EDC / ECC Error Detecting Codes / Error Correcting Codes
EMC Electro-Magnetic Compatibility
ESF Extended Super Frame
FMEA Failure Modes and Effects Analysis
FMECA Failure Modes and Effects and Criticality Analysis
FPGA Field Programmable Gate Array
FRC Functional Redundancy Checking
FSM Finite State Machine
FTM Fault Tree Method
GSM Global System for Mobile communication
HDB High Density Bipolar (signal coding)
HDL Hardware Description Language
IC Integrated Circuit
JTAG Joint Test Action Group
LFSR Linear Feedback Shift Register
LRC / VRC Longitudinal / Vertical Redundancy Check
LSB/MSB Least/Most Significant Bit
LSSD Level Sensitive Scan Design
MC/DC Modified Condition/Decision Coverage
MDT Mean Down Time
MOS Metal Oxide Semiconductor
MTBF Mean Time Between Failures
MTTF Mean Time To Failure
MTTFF Mean Time To First Failure
MTTR Mean Time To Repair
MUT Mean Up Time
NMR N-Modular Redundancy
NRZ Non-Return to Zero
PCB Printed Circuit Board
PLA Programmable Logic Array
PLC Programmable Logic Controller
PLD Programmable Logic Device
PSA Parallel Signal Analyzer
RAID Redundant Array of Independent Disks

RAM Random Access Memory
ROM Read Only Memory
RSA Rivest Shamir Adleman
SCC Self-Checking Checker
SOC System On Chip
STG State Transformation Graph
STIL Standard Tester Interface Language
TAP (Boundary Scan) Test Access Port
TMR Triple Modular Redundancy
VAN Vehicle Area Network
VHDL VHSIC Hardware Description Language
VHSIC Very High Speed Integrated Circuit
VXI VME eXtensions for Instrumentation

2. KEYWORDS

Word | Meaning | Ch.
acceptability curve | Curve expressing the acceptable risk rate of failures from their seriousness | 17.1
acceptable product | A product whose failures have acceptable risk rates | 17.1
acceptable risk rate | See risk: acceptable rate
acceptance test | See test: acceptance
activation: initial | The occurrence of a first error provoked by a fault. This error is called primitive error or immediate error. See also fault activation | 4.1, 13.2
active fault tolerance | See fault tolerance: active
Ad Hoc approach | See DFT: ad hoc approach
adaptive sequence | See sequence: adaptive
adaptive vote | See vote: adaptive
aggression | See fault: external

alias | An alias occurs when a faulty circuit test output response gives a signature which is identical to the fault-free signature (used in BIST techniques by LFSR signature analysis) | 14.5
alpha test | See test: alpha
alternate | Redundant module (version) having the same specification (or a degraded form) and, often, a different implementation than the original functional module | 18.4
ambiguity | An element which leads to several meanings | 9.3
analysis: criticality | See criticality analysis
analysis: dynamic/static | See dynamic & static analysis
assertion | Functional redundancy used for software verification. It tests the validity of a property each time a given circumstance could violate it. It can be used: during the creation stages for fault removal (Ch. 10), or during the operation stage for fault detection by on-line testing (Ch. 16) | 10.5, 16.3
ATE | Automatic Test Equipment | 12.1
ATPG | Automatic Test Pattern Generation: automatic generation of lists of test inputs and expected outputs to perform product testing | 12.3
attributes of dependability | Criteria enabling the system dependability to be assessed. The most used attributes are: reliability, availability, maintainability, testability, safety and security | 1.4, 7
attributes of module | The behavior of a module is characterized by a set of attributes whose values define the states of the module | 2.3
availability | It is the probability that the system is operational at the time t, knowing that it functions correctly at time 0 | 7.5
availability: instant | Value of the availability at a given time t: A(t) | 7.5
availability: permanent | In permanent stage, availability value of A(t) when t → ∞ | 7.5
backward fault analysis | Step of structural fault coverage method which determines the faults detected by a given test vector by a backward process (from the outputs towards the inputs) | 13.3
backward propagation or tracing | See propagation: backward
backward recovery | Fault-tolerance technique which consists in bringing the system back to a state previously reached before the system execution resumption. This technique often makes use of context saving and restoring mechanisms (such as the recovery cache). The execution of M is resumed at a recovery point | 18.3

bathtub curve | Reliability model which represents the evolution with time of failure rate of electronic components. Typically, it shows 3 parts: infant mortality where the failure rate decreases, useful life where the failure rate is constant, and wearout where the failure rate increases. | 7.2
behavior | Reaction of a system mainly described as changes of states in this book | 2.3
behavioral level/model | A design step/model of the system specifying its behavior | 2.2, 2.3
benign | See failure: benign
beta test | See test: beta
BIST | Built-In Self-Test. Group of Design For Testability methods which incorporate the test functions into the circuit | 14.5
BIST: signature | Group of BIST techniques using a test sequence generator (usually a LFSR), a compaction function (usually a PSA), and a signature analysis function | 14.5
BIT | Built-In Test. Group of Design For Testability methods which incorporate test facilities and offer a test interface | 14.4
bit stuffing | Fault detection technique applied to data. After a number of bits with the same polarity, an additional bit is introduced with an opposite polarity. Used for instance in the CAN Bus | 18.7
BITE | Built-In Test Equipment. All maintenance functions of a system | 14.4
boundary scan | Scan technique belonging to the BIT design for testability. Normalized as the IEEE Standard 1149.1 | 14.4
branch test | See test: branch
bridging fault | A particular case of short electronic fault. See also fault: short-circuit | 5.2
BSDL | Boundary Scan Description Language | 14.4
bug | See fault: structural (for software technology) | 3.2
burn-in test | See test: burn-in
C/DC | See test: Condition/Decision
CAM | Computer Aided Maintenance | 12.1
CAN Bus | Controller Area Network bus. Initially created for automotive industry. Normalized under ISO 11898 | 18.7
catastrophic | See failure: catastrophic
checker | Module used in self-testing systems to detect the occurrence of errors from the observation for instance of some EDC code variables. See also self-checking checker | 16.3
checkerboard | See test: memory

checksum | See code: checksum
clarity | The text or the model describing a system is easy to read | 9.3
client | Entity or person expressing requirements or specifications to ask for an expected product | 2.2
code | Set of the codewords in an EDCC | 15.1
code preserving | See Totally Self-Checking System
code: m-out-of-n | Unidirectional code such that each codeword has exactly m bits '1' and (n-m) bits '0' | 15.4
code: two-rail | Special case of m-out-of-n code such that the codeword is obtained by adding to the word to be coded its complemented copy. Also called double-rail | 15.4
code: arithmetic | Category of codes dealing with detection and correction of errors in arithmetic systems | 15.5
code: Berger | One of the codes dedicated to unidirectional errors. The redundant part expresses in binary the number of bits '0' in the word to be coded | 15.4
code: bidimensional | Code applied to blocks of words. Also called product code. <> unidimensional | 15.3
code: capacity | Number of codewords that can be made with a given code. Also called power of expression or cardinality | 15.2
code: cardinality | See code: capacity
code: checksum | Bidimensional arithmetic code based on the sum without remainder of the words of a block | 15.5
code: CIRC | Cross-Interleaved Reed-Solomon Code | 15.3
code: cost | Number of bits (n) of the codewords | 15.2
code: coverage rate | Ratio of the number of errors detected and/or corrected by the code and the number of errors belonging to the considered error model | 15.2
code: CRC | Cyclic Redundancy Check codes are cyclic EDCC codes | 15.3
code: cyclic | Family of linear EDCC codes. Modeled with polynomials | 15.3
code: density | Ratio of the capacity of a redundant n-bit code and the theoretical number of words that can be made ($2^n$) | 15.2
code: disjoint | See self-checking checker
code: ECC | Elliptic Curve Cryptography codes based on elliptic curves | 15.1
code: EDCC | Error Detecting and Correcting Code. Redundant coding of information used to detect and/or correct errors | 15.1
code: error corrector | See code: EDCC

code: error See code: EDCC


detector
code: fault See totally self-checking system
secure
code: Fire Cyclic code addressing burst multiple errors 15.3
code: linear EDCC using multiple parity. Modeled with matrices 15.3
code: low- High Density Binary. Signal level coding 15.1
level: HDB
code: low- Signal level coding 15.1
level:
Manchester
code: low- Non-Return to Zero. Signal level coding 15.1
level: NRZ
code: modulo 9 An example of arithmetic code 15.5
proof
code: multiple Code using several redundant bits which are obtained by XOR 15.3
parity combinations of some bits of the word to be coded
code: power of See code: capacity
expression
code: product See code: bidimensional
code: Ratio of the number of added bits ('redundant bits') and the 15.2
redundancy number of bits of the word to be coded (calIed 'useful bits')
rate
code: residual Category of codes intended to detect errors in arithmetic circuits 15.5
code: RSA Ri vest Sharnir Adleman codes based on the factorization of large 15.1
numbers
code: self- See self-checking checker
checking
code: separable Properties of redundant codes 15.2
/ non-separable separable code: the information to be coded is explicitly included
in the codeword (the codeword is made by adding redundant bits to
the information data)
code: single Code using one redundant bit which is the XOR of allother bits of 15.3
parity the word to be coded
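The single parity definition can be sketched as follows. This is an illustrative example only: the helper names (`parity_bit`, `encode`, `is_valid`) and the choice to append the redundant bit at the end of the word are assumptions.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """Redundant bit = XOR of all bits of the word to be coded."""
    return reduce(xor, bits, 0)


def encode(bits):
    """Append the parity bit to the useful bits (separable code)."""
    return list(bits) + [parity_bit(bits)]


def is_valid(codeword):
    """A codeword is valid when the XOR of all its bits is 0."""
    return reduce(xor, codeword, 0) == 0


word = [1, 0, 1, 1]
codeword = encode(word)
print(codeword, is_valid(codeword))

# Flip one bit to simulate a single error: the check fails.
corrupted = codeword.copy()
corrupted[2] ^= 1
print(corrupted, is_valid(corrupted))
```

Flipping two bits would restore a zero XOR, so this check detects errors of odd multiplicity only.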
code: syndrome See syndrome
code: totally See totally self-checking system
self-checking
code: two-rail Particular case of m-out-of-n code 15.3
Also called double-rail
code: Code applied to individual words 15.3
unidimensional <> bidimensional
code: Category of codes intended to detect unidirectional errors, i.e. 15.4
unidirectional multiple errors that modify the altered bits in the same way
(all '0' to '1', or all '1' to '0')
codeword Coded element of information 15.2
cold standby See redundancy: cold standby
redundancy
compaction See test: compaction
compaction See BIST: signature
function
compatibility The service delivered by the product is greater than the one 4.2
of a product expected from the specifications
<> incompatibility
compensation Fault tolerance technique using passive redundancy such as the 6.4
technique TMR. Does not require error detection 18.2
See also error masking
complete See complete distinguishing sequence
diagnosis
sequence
complete Diagnosis sequence which splits the fault model into classes of 12.2
distinguishing system equivalent faults 13.4
sequence
completeness All possible cases are handled 9.3
compliance test See test: compliance
component Structural entity of a system 2.3
Also called module or sub-system
composition Relationships composing sub-systems to express the structural 2.2
relationships model of a system
compositional See hierarchy: composition
hierarchy
comprehension The understanding of the semantics of a text or a model describing 9.3
a system or other pieces of information
computer aided CAM. Tools which assist the maintenance team 12.1
maintenance
concision The text or the model describing a system does not contain useless 9.3
verbiage
condition Boolean expression which does not contain any Boolean operator 13.6
(AND, OR, NOT)
See also decision
condition/ See test: condition/decision
decision test
conditional See maintenance: preventive
maintenance
confidentiality Non occurrence of unauthorized disclosure of information 7.7
confinement See error confinement
consequences External effects of faults or failures on the product's mission. 4.2
of These effects are generally classed into groups according to their 17.1
faults/failures seriousness: minor or benign (see failure: minor), significant (see
failure: significant), serious (see failure: serious), catastrophic or
disastrous (see failure: catastrophic)
consistency No conflicts exist between definitions of the elements of a system 9.3
or of a text
consistency (or justification) One of the four basic steps of path sensitizing test 13.2
operation generation method which verifies that the local constraints can be
satisfied in the whole circuit
contamination See error propagation
continuity of See reliability
service
continuity test See test: continuity
continuous on- See on-line testing
line testing
contract Document produced during the specification phase, which 2.2
formalizes the mission of the product (function and duration),
non-functional constraints on the environment, and the dependability
attributes
control flow Finite State model derived from a program, expressing the 13.6
sequencing of the statement block and taking the input events and
the internal decisions into account
control path Path of a control flow 13.6
controllability Ease of reaching a given state of a system behavior by exercising 6.3
its inputs 14.1
corrective Action taken to eliminate the causes of an existing nonconformity, 2.2
action defect (fault) or other undesirable situation in order to prevent
recurrence
corrective See maintenance: corrective
maintenance
coverage See fault coverage and code: coverage rate
coverage table See fault table

CRC See code: CRC
creation See development process
process
criticality Methods used to estimate the risks of the failures of a product 17.1
analysis
criticality level Measurement or classification based on acceptable rate risk 17.1
Also see consequences of faults/failures
curative See maintenance: corrective
maintenance
dangerous See failure: serious
debugging The process of detecting, locating, and correcting faults and errors. 12.1
Belongs to fault removal
decision Combination of conditions or decisions using Boolean operators 13.6
(AND, OR, NOT)
decreasing See reliability: decreasing
reliability
defect See fault: structural (for hardware technology) 3.2
degradation Degradation of the service delivered by a product affected by faults 18.5
delivered See service delivered
service
dependability The dependability of a system is that property of the system such 1.2
that reliance can justifiably be placed on the service it delivers
dependability Techniques to measure or estimate the dependability, thanks to 7.1
assessment attributes: reliability, availability, maintainability, testability,
safety, security
These attributes are evaluated at three levels in the life cycle:
specification and forecast assessment during the creation stages,
and exploitation assessment during the exploitation stage
There are two groups of techniques: quantitative approach and
qualitative approach
dependability Set of actions justifying the reliance placed in a given product 6.5
assurance
design Step of the life cycle which transforms specifications into a system 2.2
design for See DFT
testability
design guide Fault prevention techniques, relative to the design process, 10.3
advising the design process choices
design level Design is traditionally classified into three modeling steps: 2.2
behavioral, structural, and technological
design rule DRC. Example: ensure all geometric features laid out on each 11
checking mask meet size, spacing, and overlap rules 14.4
design test See test: design
design: Fault removal technique: verification with the specifications, by 10.4
extraction reverse transformation, e.g. identification of the electronic
structure from the layout
design: proof Fault removal: formal verification technique with the 10.6
specifications, by reverse transformation, or demonstration of a
property of the system behavior
designer The entity or person which creates a system or a product from 2.2
requirements or specifications
destructive test See test: destructive
detection test See test: detection
Deterministic See fault grading: deterministic
Fault Grading
development The process that leads from the specification to the product. In this 2.2
process book, it groups together specification, design and production
Also called creation process
device under See DUT
test
DFT Design For Testability. Set of design techniques increasing the 14.1
controllability and observability of the product. Used for off-line
testing. There are four main DFT approaches: Ad Hoc techniques,
specific design for testability, Built-In Test (BIT), and Built-In Self
Test (BIST)
DFT: ad hoc Guidelines used during or after the design to facilitate the test 14.2
approach
DFT: specific Group of Design For Testability methods which provide products 14.3
design naturally easy to test
diagnosis Process of identifying the fault, if one exists 6.3
See also test: diagnosis, fault: localization 10.5
12
13.4
diagnosis Definition of modeling tools used to express the pieces of 12.1
algorithm information handled during the diagnosis (such as fault tree), and 13.7
the tasks and steps to be done to diagnose the faults
Also called diagnosis process
diagnosis fault Technique which successively splits the set of the faults of a fault 12.2
tree technique model into fault classes in order to diagnose the causes of a failure
diagnosis See diagnosis algorithm
process
diagnosis test See test: diagnosis
diagnosis Diagnosis testing using an adaptive sequence 12.2
testing:
adaptive
diagnosis Diagnosis testing using a fixed sequence 12.2
testing: fixed
diagnosis tree Graphic tool allowing to determine which fault (or faults) is (are) 12.2
present, from the output values produced by a system submitted to 13.3
a test sequence
diagnosis: See diagnosis: model based approach
based on deep
knowledge
diagnosis: See diagnosis: model based approach
based on
structure and
function
diagnosis: See diagnosis: experimental approach
empirical
associations
diagnosis: Diagnosis methods based on knowledge of relationships between 12.1
experimental possible faults or errors and the related failures
approach Also called empirical associations, or surface or shallow
reasoning, or reasoning by associations
diagnosis: Diagnosis methods which do not use fault or error models. The 12.1
model-based failures are diagnosed thanks to the system model
approach Also called diagnosis based on deep knowledge, or diagnosis
based on structure and function
diagnosis: See diagnosis: experimental approach
reasoning by
associations
diagnosis: See diagnosis: experimental approach
shallow
reasoning
diagnosis: See diagnosis: experimental approach
surface
reasoning
disastrous See failure: catastrophic
discontinuous See on-line testing
on-line testing
disruption Modification of the correct state which causes an error 15.1
Also called error in error detecting and correcting code theory
See also failure: disruptive
disruption Operator combining a correct state and a disruption to express an 15.1
operator error
distance: Fundamental notion similar to the Hamming distance, used to 15.5
arithmetic study arithmetic codes
distance: See Hamming distance
Hamming
distinguishing Test sequence able to decide which fault of a given fault model is 12.2
sequence present in a circuit. Used for diagnosis 13.4
disturbance See fault: external
domain: See safety: dangerous domain
dangerous
domain: See functional domain
functional
domain: safe See safety: dangerous domain
domino effect Cascade phenomenon occurring during the restoration of the 18.3
context of a multiple task system, when using a recovery point
fault tolerant technique
double-duplex Fault tolerance technique using active redundancy 18.7
dreaded event Impairments of dependability (faults, errors, failures) studied by 7.1
qualitative dependability assessment techniques. Often limited to
dangerous events
duplex Self-testing technique based on structural redundancy. The main 16.3
module is functionally duplicated, the outputs of the two duplicates
being compared by a checker
duplicate Duplicate redundant modules are versions having the same 18.2
implementation (used in fault-tolerance techniques)
Also called replica
duration of the Objectives of the product in terms of operational life 2.1
mission
DUT Device Under Test. Product connected to a tester 12.1
dynamic Group of techniques relevant to fault removal carried out by 6.3
analysis executing products or models
Also called test
ECC See code: ECC
EDC / ECC Error Detecting Codes / Error Correcting Codes 15.1
See code: EDCC
embedded core See IEEE P1500
emergence Operator determining the behavioral part of a component which 8.2
effectively intervenes in the global behavior of a system
emergent The functional part of the product which is really used in the 4.2
functionality context of the mission
empirical See diagnosis: experimental approach
associations
environment functional: see user 2.1
non-functional: entities external to the couple Product-User and
having an influence on the delivered service
equivalence: A mutated program is equivalent to the initial program if the 13.8
mutated mutation does not modify the behavior
program
equivalent fault See pattern equivalent fault and system equivalent fault
error An error occurs in a module (or component) when its actual state 4.1
deviates from its desired or intended state
error Methods and techniques used to limit the error propagation to a 6.4
confinement certain subset of the system 18.6
See also fault contention
error See error propagation
contamination
error detecting Redundant coding to detect and/or correct errors 6.4
and correcting 15.1
codes
error detection Group of techniques of fault tolerance 6.4
and correction
error diffusion See error propagation
error logging See log file
error masking Group of techniques of fault tolerance which do not require error 6.4
detection, as the fault effects are masked 18.2
See compensation technique
error model See model: error
error Mechanism which transforms an error occurring in a product into 4.1
propagation one or several other errors or failures 15.6
The propagation is conducted through one or several error
propagation paths
Also called error diffusion or contamination
error typology See model: error
error: Asymmetric error has a different probability to produce a '1' value 5.2
asymmetric and a '0' value
error: burst l-order multiple error model such that all the errors affect a sequence 15.1
of l consecutive bits
error: dynamic A dynamic error provokes transient undesirable states (e.g. a 4.1
transient oscillation on a line)
Also called transient error
error: generic Error associated with a modeling tool 6.3
<> error: specific
error: hard See error: permanent (for electronic components) 5.2
error: See activation: initial
immediate
error: logical Logical error is characterized by transformations of logical values: 5.2
'0' becomes '1' and vice versa. 15.1
Non-logical error provokes alterations of the logic levels outside
the specification domains
error: multiple Multiple error disturbs the functioning of several elements (e.g. a 5.2
problem in the electrical supplying affects all the components)
error: order Order of multiplicity. The number of elements altered by a 5.2
multiple error 15.1
error: packet Multiple error where all modified bits are grouped within a certain 15.1
distance
error: A permanent error affects a module for a long duration (e.g. the 4.1
permanent output of a module is stuck-at '0')
<> error: temporary
error: primitive See activation: initial
error: single Single error affects one element (for instance a transistor) of the 5.2
structure of the system 15.1
error: soft Temporary error induced by transient faults in electronic 5.2
components
error: specific Error associated with a particular system 6.3
<> error: generic
error: static A static error provokes a stable undesirable state (e.g. a false signal 4.1
'1' instead of the right one '0')
error: Symmetric error provokes, with the same probability, a state 5.2
symmetric changing (for instance, '0' to '1') and conversely
error: A temporary error has limited operation duration 4.1
temporary <> error: permanent 5.2
error: transient See error: dynamic
error: Gate level: multiple logical error such that all altered lines are stuck 5.2
unidirectional at the same value
Code theory: multiple error which modifies several bits of a word
in the same sense: 0 to 1, or 1 to 0 15.4

event tree Tree connecting correct (states) or incorrect (faults, error, failures) 7.11
events with logical operators (AND, OR). Used for deductive
approach in qualitative dependability assessment
See Fault Tree Method
evolutive See maintenance: preventive
maintenance
exception Software on-line error detection/handling/propagation mechanism 14.2
mechanism 17.2
execution path A control path which can be run when the program is executed 13.6
exploitation See operation
exploitation See dependability assessment
value
exponential law The simplest reliability model used for electronic components 7.2
extraction Electronic extraction of an IC. Analysis which gives a transistor- 10.4
level description from a mask-level layout description
extremely See risk
improbable
extremely rare See risk
fabrication See production
fail-fast system A fail-fast system is a fail-safe system which integrates a maximal 17.2
duration to reach the safe state in its specifications
fail-safe system System integrating techniques to reduce or avoid the occurrence of 6.4
failures considered as catastrophic or dangerous 17.1
fail-silent Fail-safe technique: when an error or a failure occurs, the system 17.2
system turns itself into a safe 'off' or 'passive' mode which does not act
on the environment
Also called fail passive
failure A failure occurs when the delivered service no longer complies 3.1
with the specifications
Taking the mission notion into account, a failure is the non-
performance or inability of the system or component to perform its
intended function for a specified time under specified
environmental conditions
failure mode Abstract viewpoint about failures, independently of the particular 3.1
system functions and failures. Often defined by three parameters
(value/timing, persistent/temporary, consistent/inconsistent)
completed by the seriousness and risk
failure rate (λ) Mathematical estimator of reliability which expresses a failure 7.2
occurrence probability per hour, e.g. 10^-6 fault/H (non MKSA
unit)
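As a hedged numerical illustration, the sketch below assumes the standard exponential reliability law R(t) = exp(-λt) with a constant failure rate; the function names and the 10,000-hour mission length are illustrative assumptions.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


def mean_time_to_failure(failure_rate_per_hour: float) -> float:
    """Under the same assumption, MTTF = 1 / lambda."""
    return 1.0 / failure_rate_per_hour


lam = 1e-6  # failures per hour, the order of magnitude quoted above
print(reliability(lam, 10_000))   # probability of surviving 10,000 hours
print(mean_time_to_failure(lam))  # expected lifetime in hours
```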
failure: benign Failure which has no serious consequences on the mission which 4.2
carries on normally. Failure leading to upset of the users, and/or a 17.1
partial reduction of the functionality of the product
Also called minor
failure: See failure: inconsistent 3.1
Byzantine
failure: Failure leading to human loss, destruction of the product or the 4.2
catastrophic environment, including the controlled process 17.1
Also called disastrous
failure: Failure perceived similarly by all users 3.1
consistent
failure: crash Persistent omission failure 3.1
failure: See failure: serious
dangerous
failure: See failure: catastrophic
disastrous
failure: A failure caused by a technological fault 3.2
disruptive Also called disruption
failure: The temporal characteristics of the product behavior are not in 3.1
dynamic accordance with the specifications: e.g. response time incorrect,
too fast or too slow
Also called timing failure
failure: See risk
extremely
improbable
failure: See risk
extremely rare
failure: See risk: impossible
impossible
failure: The users do not perceive in the same way the failure occurrence. 3.1
inconsistent Also called Byzantine failure
failure: major See failure: significant
failure: minor See failure: benign
failure: A specific stopping failure when no values are delivered 3.1
omission
failure: The provided service is not in accordance with the specification, 3.1
persistent during a long period in regards with the mission duration
failure: See risk
probable
failure: rare See risk
failure: serious Failure whose negative effects on the user or the environment are 4.2
quite important, the security margins being dangerously reduced. 17.1
Leads to a small number of casualties and/or serious injuries of the
users, and/or a serious reduction of the functionality of the product
Also called dangerous
failure: Set of failures having the same seriousness 17.1
seriousness
class
failure: severity Measurement of the consequences of a failure on the system, user, 4.2
or seriousness and environment 17.1
failure: Seriousness of a failure: the mission is disturbed and the efficiency 4.2
significant of the delivered service is reduced. Leads to injuries of the users, 17.1
and/or a partial reduction of the functionality of the product
Also called major
failure: static A product has a static failure when, at a given time, its actual 3.1
perception via its inputs, or its actual reaction are not in accordance
with its specifications
Also called value failure
failure: When the product's activity no longer evolves, a constant value 3.1
stopping being delivered to the user.
failure: A failure caused by a functional fault. 3.2
systemic
failure: The provided service is not in accordance with the specifications at 3.1
temporary a given time and for a short duration
failure: timing See failure: dynamic
failure: value See failure: static
false alarm Generally associated with built-in test. It is an indication of a 16.3
failure in a system where no failure exists.
fault Adjudged or hypothesized cause of a failure. Also see fault: 3.2
structural
Also called defect (hardware) or bug (software)
fault activation Raising of an error from a fault 4.1
In particular, it is the first step of path sensitizing test generation 13.2
methods which transforms a fault into a primitive error to be
propagated to the primary outputs of the system under test
fault avoidance Fault prevention + fault removal 6.1
fault collapsing Technique to reduce the run time of fault simulation by identifying 12.3
equivalent faults and simulating only one fault for each class
fault contention Technique to prevent errors due to faults in a module to reach other 6.4
modules of the system
See also error confinement
fault correction Operation which suppresses present faults 6.3
fault coverage Percentage of potential faults of a given fault model that are 12.2
detected during a test 13
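The percentage can be computed directly from a fault list. In the sketch below, the stuck-at fault names and the set of detected faults are hypothetical data chosen only to illustrate the ratio.

```python
def fault_coverage(fault_model, detected):
    """Percentage of the faults of the model that the test detects."""
    model = set(fault_model)
    covered = model & set(detected)  # ignore faults outside the model
    return 100.0 * len(covered) / len(model)


# Hypothetical fault model: stuck-at-0 / stuck-at-1 on four lines.
fault_model = [f"{line}/sa{v}" for line in "abcd" for v in (0, 1)]
# Hypothetical result of applying a test sequence.
detected = ["a/sa0", "a/sa1", "b/sa0", "c/sa0", "c/sa1", "d/sa1"]

print(fault_coverage(fault_model, detected))
```

Here 6 of the 8 modeled faults are detected, giving a coverage of 75 %.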
fault detection Operation which highlights the presence of faults 6.3
fault diagnosis Operation which identifies the faults altering a product 6.3
Also called fault localization or fault isolation 12.2
13.4
fault dictionary List of faults, their activation, and their effects (as errors or 12.3
failures), which can aid in the determination of probable causes
during failure analysis of defective devices
fault Estimation of the presence of faults (number and seriousness) 1.3
forecasting Developed in Chapter 7
fault grading Measure of how effective a set of test vectors is at detecting 12.3
potential faults. Finding of the coverage of a given test sequence
Also called test validation
fault grading: The DFG is a simulation method which compares the results of a 12.3
deterministic faulty design (fault injected) with the outputs coming from the
design. It includes various simulation algorithms, such as grouping
of equivalent faults, also known as fault collapsing, and making
use of customized hardware platforms (accelerators)
fault grading: There are three approaches to fault grading, based on fault 12.3
fault simulation simulation techniques: probabilistic (PFG), deterministic (DFG),
and statistical (SFG)
fault grading: The PFG is a simulation method which provides an estimation of 12.3
probabilistic the fault coverage rather than an exact determination. The principle
is based on an analysis of the node activity in terms of
controllability and observability
fault grading: The SFG reduces the cost of DFG by applying deterministic fault 12.3
statistical simulation to a subset of the potential faults of the given fault
model. It provides a close approximation of the DFG results, while
requiring only a small fraction of the run time
fault grading: Method evaluating the fault coverage of a test sequence by 13.3
structural structural analysis of the product. It consists in: forward
approach simulation, and backward fault analysis
fault injection Technique consisting in adding faults to a system in order to 7.9
analyze its behavior. Used for fault grading or to assess fault 12.3
tolerance mechanisms
fault logging Recording of errors occurring in a product during operation in 16.3
order to facilitate ulterior maintenance
See also instrumentation and log file
fault masking Fault belonging to a passive redundant element that cannot be 13.3
detected from the outside of the product. Use of compensation
mechanisms
fault model See model: fault
fault prevention Aims at reducing the creation or occurrence of faults during the 1.3
system life cycle 6.2
Developed in Chapters 9, 10, and 11
fault removal Aims at detecting and eliminating existing faults, or to show the 1.3
absence of faults 6.3
Developed in Chapters 12, 13, and 14
fault secure See totally self-checking system
fault simulation Technique used for dependability assessment. 7.9
Fault grading technique which provides a list of faults detected by 12.3
a given test sequence (hence, the fault coverage) by means of a
simulation program with fault injection
There are four main approaches: serial, parallel, deductive, and
concurrent
fault table Table showing the faults covered by each vector of a test sequence 12.2
Also called coverage table
fault tolerance Aims at guaranteeing the service provided by the product despite 1.3
the presence or appearance of faults 6.4
Developed in Chapters 16, 17, and 18
fault tolerance: Approach that makes use of error detection and handling 18.5
active
fault tolerance: See compensation technique
compensation
fault tolerance: Approach that does not make use of error detection 18.5
passive
fault tree See FTM
method
fault tree: See diagnosis fault tree
diagnosis
fault: Fault which is not intentionally created 3.2
accidental <> intentional
fault: active A fault becomes active when it provokes an error during the 4.1
operation of the product
<> passive
fault: bridge See bridging fault
fault: common Fault caused by the same circumstances, and thus provoking the 18.2
mode same errors/failures, of several redundant modules in a fault-
tolerant system
fault: Fault associated with a component. 3.3
component Also called module fault
fault: See fault: functional
conceptual
fault: creation Fault occurring during specification, design and/or production 3.2
phases (excluding the operation phase)
fault: delay For electronic models. A delay fault occurs when a signal 5.2
propagating through a circuit is slower than it really should be
fault: dormant See fault: passive
fault: dynamic See fault: temporary
fault: external Failure cause attributed to the user or the environment. 3.2
Also called perturbation or aggression or disturbance
fault: Fault due to human activities during the product life phases. The 3.2
functional origin is the designer during the creation steps and the user during
the operational step. Also called conceptual fault or human-made
fault
fault: hard Permanent fault occurring in memory circuits 11.3
<> fault: soft
fault: hardware See fault: physical
fault: human- See fault: functional
made
fault: initial See activation: initial
activation
fault: Fault created deliberately 3.2
intentional <> fault: accidental
fault: Fault coming from the interactions of several components 3.3
interaction
fault: Temporary fault due to internal causes 3.2
intermittent
fault: internal Failure cause occurring in the product or system 3.2
fault: isolation See fault: localization
fault: Identification of the faults of an erroneous system 6.3
localization Also called fault isolation or fault diagnosis
fault: masked A fault f1 is masked by a fault f2 according to a given input 13.5
sequence, if the occurrence of f1 does not provoke a failure, due to
the presence of f2
fault: module See fault: component
fault: MOS on Fault model at MOS level: a MOS is always conducting 5.2
See also fault: short-circuit
fault: MOS Fault model at MOS level: a MOS is always blocked 5.2
open/off
fault: Ability to detect the presence of a fault through a failure 4.1
observation occurrence
fault: Fault occurring during the operational stage of the life cycle 3.3
operational
fault: passive The fault does not raise error; hence, it does not disturb the 4.1
product's functioning. Also called dormant
<> active
fault: Fault that persists once it has occurred (e.g. design fault) 3.2
permanent Also called static fault
fault: physical Technological fault concerning hardware technology 3.2
Also called hardware fault
fault: short- Fault model at electronic level. Particular case: bridging fault 5.2
circuit which provokes wired logic (OR or AND)
fault: soft Non-permanent faults in RAM: random, non-recurring single-bit 11.3
fault 18.7
fault: static See fault: permanent
fault: structural When the internal functional faults are concerned, a fault consists 4.1
in a non-adequate structure alteration
Also called defect (hardware) or bug (software)
fault: stuck-at See stuck-at fault
0/1
fault: Fault of the technological means (hardware / software) used to 3.2
technological implement the product
Also called hardware or physical fault for hardware technology
fault: temporal Electronic fault due to incorrect response time of components 3.2
fault: Fault the presence of which is time bounded. The duration range is 3.2
temporary generally assumed as short 5.2
Also called dynamic fault
fault: transient Temporary fault due to external causes 3.2
fault: No input test sequence can reveal the fault at the output of the 8.3
undetectable system. This corresponds to passive redundancy 13.2
fault-secure Fundamental property of a self-testing system which guarantees 16.3
that no failure can occur which is not immediately detected
feature Element of a modeling tool or language 2.2
6.3
feature Prevention techniques for software which consist in avoiding 11.2
restrictions features which increase the fault risk (shared variables, goto, etc.)
final test See test: final
Fire code See code: Fire
FITPLA BIT technique used to improve the testability of PLA 14.4
fixed sequence See sequence: fixed
FMEA Failure Modes and Effects Analysis: normalized technique 7.10
dedicated to qualitative analysis of reliability and safety 17.1
FMECA Failure Modes and Effects and Criticality Analysis is a variant of
FMEA that associates a probability with the failure of the
components and with their effects
forecasting See dependability assessment
value
formal See test: identification
identification
formal proof See design: proof
formal proof: Formal proof approach which demonstrates properties, starting 10.6
deductive from the conclusions
approach
formal proof: Formal proof approach which demonstrates properties, starting 10.6
inductive from the hypotheses
approach
formal proof: A technique to implement inductive approach of formal proof by 10.6
symbolic handling symbols instead of values
execution
forward See propagation: forward
propagation
forward Fault tolerance techniques resuming the system execution after an 18.4
recovery error detection in a new state (not previously reached)
forward Structural (e.g. gate level) simulation executing the model in the 13.3
simulation direct way; used for instance in path sensitizing test methods
<> backward propagation
frequent See risk: frequent
FTM Fault Tree Method. Deductive approach for qualitative 7.11
dependability assessment 10.6
See also event tree
FTM: basic Leaves of a Fault tree 7.11
event
full scan See scan: full
function The function defines what the product is intended for and justifies 2.1
its existence. An element of the mission
functional See mission
characteristics
functional Set of possible input and/or output sequences of values as defined 8.2
domain: by the product specifications
dynamic
functional Static input domain: set of input values which can be applied to the 8.2
domain: static product as defined by the product specifications
Static output domain: set of output values which can be given by
the product as defined by the specifications
functional See user
environment
functional See redundancy: functional
redundancy
functional test See test: functional
fusion Operator combining the behaviors of several modules, taking their 8.2
correlations into account
galloping See test: memory
Galois Field Mathematical structure which has fundamental applications to 15.3
Cyclic Error Detecting and Correcting Codes
guidelines Best practices to reach an objective, for example testability 10.3
improvement, fault prevention, etc. 14.2
Hamming Fundamental property of redundant codes which allows the 15.2
distance detection and/or correction of errors
Number of bits that differ between two binary words
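The bit-count definition can be sketched as follows, assuming the two binary words are given as non-negative integers of the same width; the function name `hamming_distance` is illustrative.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two binary words differ."""
    # XOR sets a 1 in every position where a and b disagree.
    return bin(a ^ b).count("1")


print(hamming_distance(0b1011, 0b1001))  # words differ in one position
print(hamming_distance(0b0000, 0b1111))  # words differ in all four positions
```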
hard fault See fault: hard
HDB See code: low-level: HDB
HDL Hardware Description Language. Language which describes 2.2
circuits in textual code. The two most widely accepted HDLs are
VHDL and Verilog
hierarchy: Expression of a system as the composition of sub-systems or 2.3
composition components which are again broken down into sub-systems
hierarchy: use Defines a system, highlighting the services used (or called) by a 2.3
component and offered by others
hot standby See redundancy: hot standby
redundancy
hot swap Components (CPU/Memory, I/O boards, power/cooling modules) 18.5
hardware that can be changed or serviced while the system remains on-line
IDDQ testing Method for enhancing the quality of IC tests by measuring the 12.1
power supply current of a CMOS circuit during quiescent states
Detects the physical defects that create conduction paths between
the power supply and the ground lines (e.g. stuck-on faults)
IEEE P1450 Standard Tester Interface Language (STIL). Language describing 12.1
test pattern and application protocols in standard neutral form
IEEE P1500 Embedded Core Test. Application of tests to embedded cores: test- 12.1
description language, test-control mechanisms and peripheral
access mechanisms
IEEE Std. IEEE standard describing the Test Access Port and Boundary Scan 14.4
1149.1-1990 Architecture
IEEE Std. 1155 See VXI
impairment Opposed to dependability (degradation mechanism): fault - error - 1.4
failure. Developed in Chapters 4 and 5
implementation See production (for software technology) 2.2
implementation Fault prevention techniques defining implementation restriction, 11.3
constraints used for software
impossible See risk: impossible
incompatibility The service delivered by the product is different than the one 4.2
(of a product) expected from the specifications
incompleteness The service delivered by the product is less than the one expected 4.2
(of a product) from the specifications
incompleteness Definition or properties of an object having potentially multiple 3.3
(specification) meanings. Fuzziness of its semantics. Absence of pieces of
information
inconsistency Contradictory definitions or properties of one object or of several 3.3
objects
increasing See reliability: increasing
reliability
inertia (of the Mean time between the occurrence of a failure and the beginning 4.2
environment) of its external consequences on the mission
input sequence See sequence: input
inspection A formal review technique based on nine steps 9.4
instrumentation Adding of mechanisms to detect errors and record data during the 14.2
operation of the product. Used to make test detection and diagnosis 16.3
easier
integration test See test: integration
integrity Non-occurrence of improper alterations of information 7.7
intrinsic safety See safety: intrinsic
irredundant An element of a system is irredundant if its removal causes the 8.3
element system to be functionally different
JTAG The Joint Test Action Group. This group created the foundation for 14.4
the IEEE 1149.1
language See modeling tool (generally considered as defined formally) 2.3
latency Latency is the mean time between the occurrence of a fault and its 4.1
initial activation as an error at the level of a given module
By extension: mean time between the occurrence of a fault/error in
a given module and the raising of an error in another given module
level: See behavioral level
behavioral
level: logical See logical level
level: physical See physical level
level: structural See structural level
level: symbolic See symbolic level
level: See technological level
technological
LFSR Linear Feedback Shift Register. Synchronous sequential circuit 14.5
using Flip-Flops and XOR gates, which generates a pseudo-
random pattern of 0s and 1s. Used for signature analysis in BIST
techniques
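The LFSR behavior can be sketched in a few lines of Python. This is only an illustration: the 4-bit width, the tap positions (bits 3 and 2) and the function names are assumptions made for the example. With these taps the register cycles through all 15 non-zero states before repeating.

```python
def lfsr_step(state: int, taps=(3, 2), width: int = 4) -> int:
    """Advance one clock: XOR the tapped bits, shift left, feed the result back."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> tap) & 1
    mask = (1 << width) - 1
    return ((state << 1) | feedback) & mask


def lfsr_states(seed: int, count: int, taps=(3, 2), width: int = 4):
    """Return the first `count` register states, starting from a non-zero seed."""
    if seed == 0:
        raise ValueError("an all-zero seed locks an XOR-based LFSR at zero")
    states, state = [], seed
    for _ in range(count):
        states.append(state)
        state = lfsr_step(state, taps, width)
    return states


# Pseudo-random pattern produced from seed 0001.
print([format(s, "04b") for s in lfsr_states(0b0001, 15)])
```

In a BIST context the same register structure can serve either as a pattern generator or, with the circuit outputs XORed into the feedback, as a signature compactor.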
life cycle Succession of the stages of a product's life: specification, design, 2.2
production, operation
likelihood test Verification of a property based on functional redundancy 16.3
link: logical Elements interconnecting modules in a system 2.3
localization test See test: localization
log file A file storing activities maintained to facilitate auditing and 16.3
recovery (in particular fault detection) 18.7
Also called error logging
logical level One of the steps of the development process of a product 2.2
logical links Define the relationships between the components of a system 2.3
logical test See test: logical
LSSD Level Sensitive Scan Design. Scan design technique proposed by 14.4
IBM in the 60's
maintainability Attribute of dependability with regard to the easiness in 7.4
performing the maintenance actions
In a quantified way, it is the measure of the interruption duration of
the service if a failure appears. A useful estimator associated with
this measure is the MTTR (Mean Time To Repair).
The term serviceability is also used by numerous electronic or
computer manufacturers
maintenance Actions processed on the product structure during its useful life. 2.2
Contains preventive, corrective maintenance, and adaptive 7.4
maintenance
maintenance See test: maintenance
testing
maintenance: Actions applied to a product after it failed in order to restore its 2.2
corrective service 7.4
Also called curative 12.2
maintenance: Actions applied to a product in order to improve or modify its 2.2
evolutive functionality 7.4
12.2
maintenance: in Facilities integrated in the product site in order to facilitate the 14.6
situ maintenance operation in situ
maintenance: Actions applied to a product prior to failures in order to detect the 2.2
preventive presence of faults and to correct them: 7.4
• systematic (or scheduled) preventive maintenance (e.g. every 12.2
1000 hours of service)
• conditional preventive maintenance (e.g. the maintenance is
decided if the temperature is excessive)
maintenance: Test facilities to detect and diagnose a product from a remote 14.6
remote specialized center
facilities
maintenance: Set of actions aimed at maintaining or restoring a product in a 7.4
troubleshooting specified state
and repair
major See failure: major
Manchester See code: low-level Manchester
manufacturing See production for hardware products
marching See test: memory
Markov model Non deterministic state graph model used for quantitative analysis 7.9
of dependability
MC/DC Modified Condition/Decision Coverage.
See test: MC/DC
MDT Mean Down Time: mean time during which the product does not 7.5
deliver a service
means for To provide a product having the required dependability level, that 1.4
dependability is, the ability to deliver a service and to reach confidence in this
ability
method A detailed approach to the achieving of prescribed goals 10.3
minor See failure: minor
mission The mission specifies the product's objective in terms of the 2.1
function to perform and its duration. Also called functional
characteristics of a product
model One instantiation of a modeling tool to express a specific system 2.3
model based See diagnosis: model based approach
approach
model: design Classical modeling level used in hardware design: behavioral, 2.2
level structural, technological
model: error An error model defines a set of faults characterized as errors by a 5.1
property on desired or intended behavior 15.1
Also called error typology
model: fault A fault model defines a set of faults characterized by 5.1
physical/structural properties on the desired model structure
modeling tool Generic means (language or notation) to express the system. The 2.3
expression of a specific system is called a model
modified See test: MC/DC
condition/
decision
module See component
module: A module containing the basic functional elements 8.3
functional <> module: redundant
module: A module containing redundant elements 8.3
redundant <> module: functional
Monte Carlo Quantitative dependability evaluation method based on simulation 7.9
simulation and fault injection
MTBF Mean Time Between Failures. Maintenance indicator. The time 7.2
between two failures on a piece of equipment (calculated)
MTTF Mean Time To Failure 7.2
MTTFF Mean Time To First Failure. It is the same as MTTF 7.2
MTTR Mean Time To Repair 7.4
Mean time between the instant of failure occurrence and the return
of the product to full functional operation
MUT Mean Up Time: mean time during which the product delivers its 7.5
service
mutant A system, such as a program, modified by a mutation 13.8
mutation Modification of the structure of a system (generally by a fault) 13.8
See test: mutation
need Expectations of the product's users, that is to say knowing why 2.2
he/she has to use a product.
netlist Basic structural model of electronic circuits, at gate or MOS level 2.2
NMR N-Modular Redundancy. Fault tolerant technique derived from the 18.5
TMR technique, using active redundancy
non-ambiguity An element which has only one interpretation 9.3
non-destructive See test: destructive
test
non-functional Part of the product specifications dealing with constraints on the 2.2
characteristics non-functional environment and with dependability requirements
non-functional See environment
environment
non-regression See test: non-regression
test
non-repairable Product whose faults cannot be removed 6.3
product
notation See modeling tool (generally considered as having informal 2.3
semantics)
NRZ See code: low-level NRZ
N-self checking Fault tolerant technique derived from the N-versions technique 18.7
N-versions Fault tolerance technique, based on several duplicates of a same 18.2
module, whose outputs are treated by a voter to produce the final
result. TMR is a 3-Versions
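The voter at the heart of the N-versions technique can be sketched as follows. This is a minimal illustration assuming a strict-majority policy; the function name and the choice to raise an error when no majority exists are assumptions, since the glossary does not specify the voter's behavior in that case.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the versions.

    Raises ValueError when no value is shared by more than half of them.
    """
    if not outputs:
        raise ValueError("no module outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise ValueError("no majority among module outputs")
    return value


# TMR case (3 versions): one faulty module is masked by the other two.
print(majority_vote([42, 42, 17]))
```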
observability Ease of determining, from the outputs of a product, the current 6.3
state of its behavior by exercising its inputs 14.1
Complementary of controllability for testability
off-chip test Test resources are external to the device under test 12.1
Apply to off-line classical testing methods 14.5
off-line testing Group of techniques to test a product (or a module), suspending its 6.3
operational life 12.1
on-chip test Test resources are integrated to the device under test 12.1
Apply to BIST techniques 14.5
on-line testing Group of techniques to test a product (or a module) in its operation 6.3
OLT context 12.1
Discontinuous OLT: test functions are applied at predefined 16.1
instants in the lifetime of the product
Continuous OLT (or self-testing): faults are detected as soon as 16.3
they produce errors/failures
open See fault: MOS open/off
operation Step of the life cycle which integrates the product in a given 2.2
environment in order to deliver a service
Also called exploitation, useful life, or utilization
operational See duration of the mission
lifetime
optimal test See test: optimal sequence
sequence
output See sequence: output
sequence
parametric test See test: parametric
partial scan See scan: partial
passive fault See fault tolerance: passive
tolerance
path sensitizing See test: path sensitizing
path test See test: path
path: control See control path
pattern See test sequence 12.2
pattern Group of faults of a fault model whose effects on the outputs of the 12.2
equivalent product cannot be distinguished by the input sequence application
faults
perturbation See fault: external
phase Defined segment of work. Also called stage or step. A set of 2.2
phases constitutes a process
physical Technological level/model implementing the features of the 2.2
level/model symbolic model
ping-pong See test: memory
post-condition Functional redundancy used for on-line testing of software. It 10.5
analyzes the correctness of an operation at the end of the treatment 16.3
pre-condition Functional redundancy used for on-line testing of software. It 10.5
verifies that the use context of an operation is correct 16.3
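Pre- and post-conditions can be illustrated with a short Python sketch. The operation (an integer square root), the function name and the exception types are assumptions chosen for the example. The pre-condition checks the use context before the operation runs, and the post-condition checks the result at the end of the treatment.

```python
import math


def checked_isqrt(n):
    """Integer square root guarded by a pre-condition and a post-condition."""
    # Pre-condition: the use context of the operation must be correct.
    if not isinstance(n, int) or n < 0:
        raise ValueError("pre-condition violated: n must be a non-negative integer")

    root = math.isqrt(n)

    # Post-condition: the result must satisfy root^2 <= n < (root + 1)^2.
    if not (root * root <= n < (root + 1) ** 2):
        raise RuntimeError("post-condition violated: wrong integer square root")
    return root


print(checked_isqrt(17))
```

Both checks are redundant with respect to a fault-free execution, which is what makes them usable as on-line error detectors.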
prevention See fault prevention
preventive See maintenance: preventive
maintenance
prime element An element (e.g. a gate) is said to be prime if none of its inputs can 8.3
be removed without causing a functional change of the system
behavior
Probabilistic See fault grading: probabilistic
Fault Grading
probable See risk
process A set of phases. Example: development process 2.2
process control Techniques which apply test to the manufacturing equipment. 11.2
Extended to the evaluation of any product/system development
process
process: See development process
creation
process: See development process
development
product Physical entity destined to satisfy needs of one or several users 2.1
product code See code: bidimensional
product: See acceptable product
acceptable
product: See referent product
referent
product: See referent product
standard
production Stage of the life cycle which transforms a system into the final 2.2
product by hardware and/or software technological means
Also called manufacturing or implementation
production See test: production
testing
program See test: mutation
mutation
propagation Trace of errors in the structure of the system during an error 4.1
path propagation
propagation: Backward simulation of the functioning of a system from 13.2
backward predefined output or internal values or symbols, to find the input
vectors which provoke them. Used in path sensitizing structural
test methods
Also called backward tracing
propagation: Simulation of the functioning of a system with values or symbols, 13.2
forward to find constraints on the propagation of a predefined error. Used
in path sensitizing structural test methods
property: Expression of an intended property on the behavior of a system, 5.1
behavioral whose violation defines an error
property: Property associated with a modeling tool and not with a particular 5.1
generic modeled system
property: Expression of an intended property on the structure of a system, 5.1
physical/ whose violation defines a fault
structural
prototyping In this book, technique which derives a basic tool from a model, to 9.3
detect faults in the understanding of the model
PSA Parallel Signal Analyzer. Circuit based on LFSR structure used for 14.5
compaction testing in BIST techniques
qualitative Deduction of failures from faults or errors (dreaded events) 7.1
assessment:
deductive
approach
qualitative Deduction of events (faults or errors) from potential failures 7.1
assessment:
inductive
approach
quality (ISO Totality of characteristics of an entity that bear on its ability to 1.1
8402) satisfy stated and implied needs
Entity: item which can be individually described and considered
quality Procedures, techniques and tools applied by professionals to ensure 11.2
assurance that a product meets or exceeds prescribed standards during a
product's development cycle
quality QA tests for electronic components: life, mechanical, thermal, lead 11.2
assurance test fatigue, solderability, etc.
quality control Analysis of samples of the production in order to determine the 6.2
quality of the produced components 11.2
RAID Redundant Array of Independent Disks. Fault tolerance technique 18.7
for mass storage units using structural redundancy
rare See risk
reasonably See risk: reasonably probable
probable
reasoning by See diagnosis: experimental approach
associations
reconfiguration The process for a product to automatically use alternative 6.4
resources, so as to not interrupt or to resume its operation 18.5
reconvergent Structural property of a gate circuit allowing one signal to 13.3
fan-out propagate through several paths before converging towards a same
component
recovery Technique used in fault tolerance approaches consisting in 18.3
reaching a correct state after an error detection 18.4
See also backward recovery, forward recovery
recovery block One of the forward recovery techniques of fault tolerance 18.4
recovery cache Backward recovery requires the implementation of execution 18.3
context saving and restoring mechanisms. One of the most popular
techniques is named recovery cache
recovery point State of the system in which the system processing is resumed 18.3
during a backward recovery technique
Also called retry point and rollback point
recovery: See backward recovery and forward recovery
backward/
forward
redundancy Presence of elements of a system which are not necessary to satisfy 8.1
the normal input/output relationships (in absence of fault)
redundancy Functional redundancy rate = (size(Universe) - size(Domain)) / 8.2
rate size(Universe)
For an EDC code, see code: redundancy rate
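A worked instance of the functional redundancy rate, as a minimal Python sketch. The function name and the example (a 4-bit input on which only the ten BCD digits are allowed by the specifications) are assumptions introduced for illustration.

```python
def functional_redundancy_rate(universe_size: int, domain_size: int) -> float:
    """(size(Universe) - size(Domain)) / size(Universe)."""
    if universe_size <= 0 or not 0 <= domain_size <= universe_size:
        raise ValueError("expected 0 <= domain_size <= universe_size, universe_size > 0")
    return (universe_size - domain_size) / universe_size


# Illustrative case: 4 input bits give a universe of 16 values,
# of which only the 10 BCD digits belong to the functional domain.
print(functional_redundancy_rate(16, 10))
```

A rate of 0 means the domain covers the whole universe, so no input value can be recognized as erroneous from its value alone.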
redundancy: The structural redundancy of a system is active if the design is not 8.3
active optimal without any possibility to directly remove any element
<> redundancy: passive
redundancy: Separable redundant modules which are in a passive state (off- 8.3
cold standby line), waiting to be activated 18.5
redundancy: A dynamic functional domain of a product is redundant if it is 8.2
dynamic strictly included in the dynamic functional universe of this product
functional
domain
redundancy: Certain theoretical input values are not applicable to the product by 8.2
functional the functional environment as defined by the specifications. 16.3
Extended to the outputs and inputs/outputs values
redundancy: Separable redundant modules which are in an active state (on-line) 8.3
hot standby in parallel with the functional module 18.5
redundancy: See redundancy: cold standby
off-line
separable
redundancy: See redundancy: hot standby
on-line
separable
redundancy: The structural redundancy of a system is passive if some elements 8.3
passive can be removed without changing the produced behavior
<> redundancy: active
redundancy: Presence of elements of sentences in a text whose meaning can be 8.1
semantic deduced from other sentences of the text
redundancy: The structural redundancy of a system is separable if the redundant 8.3
separable elements and the non-redundant elements are located in different
modules. Thus, the system possesses a functional module and
several redundant modules (versions, replicas, duplicates)
redundancy: A system has a structural redundancy if its structure possesses 8.3
structural some elements not necessary to produce a behavior conforming to the 16.3
specifications, assuming that all the structure elements provide a
correct functioning 18.1
redundancy: Presence of lexicographical or syntactical elements which are not 8.1
syntactic necessary to understand the sentence's meaning
redundant A functional static/dynamic domain of a product is redundant if it 8.2
functional is strictly included in the static/dynamic functional universe of this
domain product
Reed-Muller Gate structure based on Galois fields to design circuits having 14.3
structure short test sequences
reference list Recorded test sequence 12.2
referent Product considered as faultless, used in a test, in parallel with the 12.2
product tested product. Its outputs are compared with the outputs produced
by the tested product
Also called standard product
relationship: See composition relationship
composition
relationship: See service relationship
service
reliability Attribute of dependability with regard to the continuity of the 7.2
service
The aptitude of a product to accomplish a required function in
given conditions, and for a given interval of time
In a quantified way, reliability is a function of time which
expresses the conditional probability that the system has survived
in a specified environment till the time t, given that it was
operational at time 0
reliability Techniques used during the manufacturing process to guarantee the 11.2
assurance test reliability level of the produced components
reliability block Model used for quantitative analysis of reliability 7.9
diagram
reliability Tests applied to samples of the produced circuits in order to 7.2
evaluation measure or estimate the reliability parameters of this population 11.2
reliability Mathematical function of time expressing the evolution of the 7.2
model reliability of a population of components
reliability tests Experiments applied to samples of the manufactured population: 7.2
curtailed, censured, progressive, progressive curtailed, with
progressive constraints
repair Actions of the fault removal which restore the functioning of a 6.3
product. Applied to repairable products
repair rate (μ) Mathematical estimator of maintenance which expresses a repair 7.4
probability per hour
repairable Product to which fault removal actions can lead to the restoration 2.2
product of its functionality 6.3
<> non-repairable
replica See duplicate
requirements Expression of the needs which justify the creation and the use of a 2.2
product 9.2
retry mode Fault tolerance technique consisting in executing again an 18.3
erroneous component
See also backward recovery
retry point See recovery point
reuse Use of a component previously developed for another product 4.2
review Technique used to remove faults by human analysis 9.4
risk Occurrence probability of a failure, assessed by measurements. For 17.1
example, an event is said to be:
• probable if its occurrence probability is > 10^-5
• rare if its occurrence probability ∈ (10^-7, 10^-5)
• extremely rare if its occurrence probability ∈ (10^-9, 10^-7)
• extremely improbable if its occurrence probability is < 10^-9
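The four risk classes listed above can be sketched as a small classifier. The thresholds are the powers of ten of this entry (10^-5, 10^-7, 10^-9); the function name is hypothetical, and because the entry uses open intervals, assigning a probability that falls exactly on a boundary to the lower class is an assumption of this sketch.

```python
def risk_class(probability: float) -> str:
    """Map an occurrence probability to one of the four risk classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    if probability > 1e-5:
        return "probable"
    if probability > 1e-7:
        return "rare"
    if probability > 1e-9:
        return "extremely rare"
    # Boundary values and everything below 10^-9 land here (assumption).
    return "extremely improbable"


for p in (1e-4, 1e-6, 1e-8, 1e-10):
    print(p, risk_class(p))
```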
risk Part of the safety space (seriousness of failure, occurrence 17.1
acceptability probability) in which a system is said to be acceptable in terms of
safety
risk: acceptable Maximum probability accepted for the occurrence of a failure. 17.1
rate Often defined for all the failures of a seriousness class
Also called tolerable probability
risk: frequent A subdivision of the risk class probable. Probability > 10^-3 17.1
risk: impossible Event having a very small probability of occurrence (< 10^-9) 17.1
Also called extremely improbable
risk: reasonably A subdivision of the risk class probable. Probability ∈ (10^-5, 10^-3) 17.1
probable
RM structure See Reed-Muller structure
robustness Property of a system which defines its capability to provide a 4.2
function which is acceptable by the user according to given
perturbations. Frequently defined as the characteristic of a system
which guarantees that its functionality is maintained even if
specified operational and utilization requirements are violated
rollback point See recovery point
RSA See code: RSA
safety Attribute of dependability with regard to the non-occurrence of 7.6
failures of given criticality level (generally catastrophic) 17.1
Safety is measured as the probability that the product will not have
failures belonging to unacceptable seriousness classes, between the
initial time and a given time t
safety class Class of safety defined in the space: seriousness of failure x 17.1
acceptable risk rate
safety: Notion associated with fail-safe systems. The behavioral universe 17.2
dangerous is split into:
domain • the dangerous domain grouping catastrophic and/or dangerous
failures whose occurrence is unacceptable,
• the safe domain grouping the normal functioning and the failures
whose occurrence is acceptable
safety: intrinsic Group of techniques constraining the development process with 17.2
technological solutions which are known to be safe. These
solutions essentially exploit physical properties
safety: safe See safety: dangerous domain
domain
safety: Structural redundancy is used in order to reduce the occurrence 17.2
structural probability of failures belonging to dangerous safety classes
redundancy
scan design BIT technique (DFT) which led to LSSD, and the boundary scan 14.4
IEEE 1149.1 standard
scan domain Part of a circuit which implements a separate scan design 14.4
scan: full Scan technique applied to the whole product 14.4
scan: partial Technique which implements scan design in a part of the product 14.4
SCC See self-checking checker
scenario Input/Output sequences which simulate the interactions of a 9.3
system with its environment
scheduled See maintenance: preventive
maintenance
schmoo plots Measure of the influence of parameters (supply voltage, current, 12.1
frequency) on test results. Used to help the IC designer to
characterize the operational regions of a device
scrubbing Technique used to correct soft errors in dynamic RAMs 18.7
security Attribute of dependability with regard to the prevention of 7.7
unauthorized access and/or handling of information
Covers two parameters: confidentiality and integrity
self-checking A checker is said to be self-checking with respect to a defined fault 16.3
checker model F if it is code-disjoint and self-testing
• Code-disjoint: a module transforming inputs belonging to an
EDC code to an output EDC code is code-disjoint if any
codeword at the inputs gives an output codeword and conversely
if any non-codeword input gives a non-codeword output
• Self-testing: expresses that every fault of F is detectable on the
tested output by at least one functional input vector
self-purging Fault tolerance technique derived from the N-Versions with 18.5
adaptive voter
self-testing Continuous on-line testing to detect faults as soon as they produce 6.4
errors
Also for Self-Checking Checker: property of a checker such that 16.3
each fault is detected at its output by application of the normal
input codewords
See self-checking checker
separable See code: separable
sequence Number of vectors of the test sequence 12.3
length
sequence: Test sequence whose input and output values are dynamically 12.2
adaptive defined, taking the previous results of the test application into
account
<> sequence: fixed
sequence: fixed Test sequence whose input and output values are defined prior to 12.2
the test processing
<> sequence: adaptive
sequence: input List of the inputs of a test sequence 12.2
sequence: List of the outputs of a test sequence 12.2
output
serious See failure: serious
seriousness See failure: severity or seriousness
service See degradation
degradation
service The delivered service is the product's real behavior when placed in 2.1
delivered its applicative environment
service Relationships between sub-systems expressing that one uses 2.2
relationships services provided by others
serviceability Measure of the ease with which a system functioning is restored to 7.4
a specified state after the system is repaired. Used to express the
maintainability
See maintainability
severity See failure: severity or seriousness
shallow See diagnosis: experimental approach
reasoning
short-circuit See fault: short-circuit
signature See test: signature analysis
signature Technique used in compaction test technique. The signature 12.2
analysis synthesizes the output values as the result of a likelihood property
LFSR signature analysis: used for BIST off-line techniques 14.5
signature See BIST: signature
analysis
function
significant See failure: significant
simplicity The concepts manipulated by a text (or a model) describing a 9.3
system are simple. In particular, the number of these concepts is
limited and they are loosely coupled
simulation: See fault simulation
fault
simulation: See Monte Carlo simulation
Monte Carlo
snapshot Image of the system execution context at a given time. It is used 18.3
for example for backward recovery technique implementation
soft fault See fault: soft
software See instrumentation
instrumentation
software: Fault removal techniques based on the program control flow: 13.2
structural • statement test
testing • branch and path test
• condition and decision test (see C/DC and MC/DC)
spare module Redundant off-line module 8.3
specification Stage of the life cycle which defines the characteristics of the 2.2
product to be created. The result of this operation is a document 9.3
called specifications or contract (see contract)
specification See dependability assessment
assessment
value
stable See reliability: stable
reliability
stage See phase
standard See referent product
product
standby: hot / See redundancy: hot standby / cold standby
cold
state Set of the values taken by the attributes of a module 2.3
Internal property of a module 4.1
statement test See test: statement
static analysis Groups of techniques of the fault removal which are made without 6.3
execution of the analyzed models or products
Statistical Fault See fault grading: statistical
Grading
step See phase
STIL See IEEE P1450
stochastic Petri Non-deterministic parallel state graph model whose arcs are 7.9
net labeled by probabilistic values; used for dependability assessment
strobing Term used for test: number of times a test equipment looks at the 12.2
output data of a DUT during a period
structural Fault grading methods which study the faultless system and 12.3
analysis deduce all the faults (of a model) that can produce failures
structural A design step/model of the system expressing it as a structured 2.2
level/model system (composed of sub-systems or components or modules)
structural See redundancy: structural
redundancy
structural See test: structural
testing
structure Defines a system as linked components 2.3
structured- Defines a system by its structure and the behavior of its 2.3
functional components
model
stuck-at fault Fault/error model at gate level: fault that keeps a circuit node (input 5.2
or output) at a logical level one or zero
stuck-OFF Fault/error models at MOS level 5.2
stuck-ON Stuck-On: the transistor is always conducting
Stuck-Open: the transistor is blocked in the OFF state
stuffing See bit stuffing
style guide See guidelines
sub-system See component
surface See diagnosis: experimental approach
reasoning
symbolic The system is executed with symbols instead of values. The results 10.6
execution are symbolic expressions
symbolic Technological level/model taking an abstract view of the execution 2.2
level/model means
syndrome Vector resulting from a mathematical treatment (check relations) 15.3
of a codeword, which makes it possible to detect and/or correct an error in
that codeword. This vector is equal to zero if no errors
occurred
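A hedged illustration of syndrome computation, assuming the classic (7,4) Hamming arrangement in which the check relations are chosen so that the syndrome, read as an integer, equals the 1-based position of a single erroneous bit. This layout and the function names are assumptions for the example; the codes studied in Chapter 15 may arrange their check relations differently.

```python
def syndrome(word):
    """Syndrome of a 7-bit word given as a list of bits for positions 1..7.

    Each position is read as a column of the check matrix, so XOR-ing the
    positions of the bits set to 1 evaluates all check relations at once.
    The result is 0 when every check relation holds.
    """
    result = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            result ^= position
    return result


def correct_single_error(word):
    """Flip the bit designated by a non-zero syndrome (single-error assumption)."""
    fixed = list(word)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return fixed


codeword = [1, 1, 1, 0, 0, 0, 0]   # 1 ^ 2 ^ 3 == 0, so all checks hold
received = [1, 1, 1, 0, 1, 0, 0]   # same word with position 5 flipped
print(syndrome(codeword), syndrome(received), correct_single_error(received))
```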
system Set of linked components that act together as a whole to achieve a 2.3
given mission, that is a function during a certain period of time
system Group of faults of a fault model whose effects on the outputs of the 12.2
equivalent product cannot be distinguished, whatever input sequence is
faults applied
Also called absolute equivalent faults
systematic See maintenance: preventive
maintenance
TAP The Boundary Scan Test Access Port. It is formed by the TDI, 14.4
TDO, TCK, TMS and the optional TRST pin
TAP controller A sixteen-state FSM that controls the Boundary Scan logic on the 14.4
IC
technological A design step/model concerning the implementation of design 2.2
level/model models using hardware/software technologies. Composed of
symbolic level/model and physical level/model
termination One of the forward recovery techniques of fault tolerance, which 18.4
mode consists in completing the task started by a module P by using a
redundant module Q, after the detection of an error in P
test Dynamic techniques relevant to fault removal. It is an experiment 6.3
(input sequences) applied to an executable product or model by a
tester which compares the given results with expected values
The process of exercising or evaluating a system or system
component by manual or automated means to verify that it satisfies
specified requirements, or to identify differences between expected
and actual results (IEEE Std 729-1983)
Also called dynamic analysis
test application Test processing performed by the tester which applies the test 12.3
sequence to the product
test equipment See tester
test evaluation See fault grading
test generation See test pattern generation
test pattern See test sequence
test pattern Technique to determine the test sequence for a given product 12.3
generation
(TPG)
test sequence List of test vectors used by a tester to detect and/or diagnose faults 10.4
in a product. This term is often restricted to the sequence of input 12.2
vectors
Also called test pattern
test sequence See BIST: signature
generator
test sequence: Two main parameters are used to evaluate the quality of a test 12.2
quality sequence: the length (number of test vectors) and the fault
coverage (percentage of the faults of a fault model which are
detected)
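The two quality parameters can be computed as in the following Python sketch. The function name, the vector labels and the fault identifiers are hypothetical; in practice the set of detected faults would come from fault simulation (see fault grading).

```python
def sequence_quality(test_vectors, fault_model, detected_faults):
    """Return (length, fault coverage in percent) for a test sequence.

    test_vectors:    the vectors of the sequence
    fault_model:     all faults of the chosen fault model
    detected_faults: faults the sequence detects (assumed known)
    """
    faults = set(fault_model)
    if not faults:
        raise ValueError("the fault model must contain at least one fault")
    detected = set(detected_faults) & faults  # ignore faults outside the model
    return len(test_vectors), 100.0 * len(detected) / len(faults)


# Hypothetical data: 3 vectors detecting 3 of the 4 modeled faults.
length, coverage = sequence_quality(
    ["v1", "v2", "v3"],
    {"f1", "f2", "f3", "f4"},
    {"f1", "f3", "f4"},
)
print(length, coverage)
```

A shorter sequence with the same coverage is preferable, since length drives test application time.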
test vector Element of a test sequence: couple (input vector, output vector) 12.2
test with/ Test method based on the detection of faults belonging to a pre- 12.2
without fault defined fault model / without precise hypotheses about the faults
model
test: Test experiment with stress constraints: elevated power supply 7.2
accelerated and/or temperature 11.2
test: acceptance Another name for final test for final checking of the product 14.2
See test: final, test: compliance, test: conformity
Also used to name on-line checking used in fault tolerance 18.3
mechanism to detect errors
test: A specific ATPG handling each fault of a fault model 12.3
algorithmic
approach
test: alpha & Test performed by selected groups of users 12.1
beta
test: branch Software structural testing technique which takes the control flow 13.6
branches as elements to define the test sequence coverage
test: burn-in Production test carried out with environmental constraints such as 12.1
the temperature or the electric supply. It is a non-destructive
accelerated test used to detect and eliminate any defects which
might appear in a product during its early life
test: C/DC See test: condition/decision
test: censured Test used to evaluate the reliability of a population of components 7.2
which stops when a given number of faults is reached
test: Test technique which reduces the data coming from the DUT by a 12.2
compaction mathematical treatment. Used in signature testing 14.5
test: Test to ensure the adequacy of the product with its specifications 12.1
compliance
test: condition/ Structural testing methods used for software programs, which 13.6
decision require that 1) each decision must take the values True and False at
least once, 2) each condition must take the values True and False at
least once, 3) each input and output point of the components
(subprograms, etc.) must be executed at least once. The coverage
rate is noted C/DC (Condition/Decision Coverage)
test: conformity Acceptance test performed by the client or an external organization 12.1
test: continuity Technique which verifies that the connections between 12.1
components are without defects: printed circuit boards, cables,
connectors, etc.
test: curtailed Test used to evaluate the reliability of a population of components 7.2
whose duration is fixed a priori
test: design Fault removal techniques based on functional test, used during 6.3
design stage 10.5
test: destructive A test is destructive if the tested product can be destroyed during 11.2
/ non- the test process. Destructive tests are employed for quality control 12.1
destructive and reliability evaluation
test: detection Test techniques answering the question: 12.1
does the product function correctly? 12.2
test: diagnosis Test techniques answering the question: which faults affect the 12.1
product? 12.2
There are two main categories of diagnosis techniques: fixed
diagnosis which uses a fixed test sequence, or adaptive diagnosis
for which the next test vector depends on the responses given by
the product to the preceding test vectors
test: exhaustive Test technique using all input vectors to test a combinational 12.3
circuit
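An exhaustive test of a combinational circuit with n inputs applies all 2^n input vectors. The sketch below is a hedged illustration in Python: the circuit under test and its fault-free reference are modeled as plain functions, which is an assumption made for the example and not a construction taken from the book.

```python
from itertools import product


def exhaustive_test(circuit, reference, n_inputs):
    """Apply all 2**n_inputs input vectors and return the failing ones.

    circuit and reference are hypothetical callables taking n_inputs
    bits (0 or 1) and returning the output value.
    """
    failures = []
    for vector in product((0, 1), repeat=n_inputs):
        if circuit(*vector) != reference(*vector):
            failures.append(vector)
    return failures
```

For example, an AND gate whose second input is stuck at 1 behaves like a buffer on the first input, and only the vector (1, 0) reveals the fault.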
test: final Test applied to a complete system or product before it is delivered 14.2
to the client
Also called acceptance test
See also test: unit and test: integration
test: functional Functional verification methods based on a functional model of the 10.5
system to test (e.g. Finite State Machines) 12.3
<> test: structural
test: functional Type of diagnosis techniques which aims at locating faults at 10.5
diagnosis functional level, without precise fault model
test: GO- See test: production
NOGO
test: Formal methods for test pattern generation of sequential systems 12.3
identification without fault model
test: in situ Test is applied to the product in its normal environment 14.6
test: integration Test applied to sub-systems integrating elementary modules or 14.2
other sub-systems
test: likelihood See likelihood test
test: See test: diagnosis
localization
test: logical Test applied to a system modeled at logical level 12.1
test: Test applied during the maintenance operations 6.3
maintenance 12.1
12.2
test: MC/DC Structural software testing which adds the following requirement 13.6
to the Condition/Decision testing method (see test: condition/
decision): each condition in a decision must be shown to
independently affect the result of the decision
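The independence requirement can be illustrated by searching, for each condition, for pairs of input vectors that differ only in that condition and produce different decision results. This Python sketch assumes the decision is available as a hypothetical Boolean function of its conditions and uses this "single condition flipped" reading of independence; it is an illustration, not the book's procedure.

```python
from itertools import product


def independence_pairs(decision, n_conditions):
    """Map each condition index to the vector pairs showing its effect.

    A pair (v, w) is listed for condition i when v and w differ only in
    condition i and decision(v) != decision(w).
    """
    pairs = {i: [] for i in range(n_conditions)}
    for vector in product((False, True), repeat=n_conditions):
        for i in range(n_conditions):
            if not vector[i]:
                # Flip condition i from False to True, keep the others.
                flipped = vector[:i] + (True,) + vector[i + 1:]
                if decision(*vector) != decision(*flipped):
                    pairs[i].append((vector, flipped))
    return pairs
```

A condition whose list is empty can never be shown to affect the decision on its own, so the requirement cannot be met for it.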
test: memory Specific testing techniques taking into account technological faults 12.3
of RAM circuits: checkerboard, marching, walking, galloping or
ping-pong
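As an illustration of the first technique named above, the sketch below writes an alternating 0/1 pattern, reads it back, then repeats with the inverted pattern. It is a simplified Python model under stated assumptions: one-bit cells, linear addressing (a real checkerboard alternates over the two-dimensional cell array), and hypothetical write and read callables standing for the memory access.

```python
def checkerboard_test(write, read, size):
    """Return the sorted addresses whose read-back value is wrong.

    write(addr, bit) and read(addr) are hypothetical access functions
    for a memory of `size` one-bit cells.
    """
    bad = set()
    for phase in (0, 1):  # pattern, then inverted pattern
        for addr in range(size):
            write(addr, (addr + phase) % 2)
        for addr in range(size):
            if read(addr) != (addr + phase) % 2:
                bad.add(addr)
    return sorted(bad)
```

Running both phases ensures that every cell is checked once holding 0 and once holding 1, so a cell stuck at either value is reported.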
test: modified See test: MC/DC
condition/
decision
test: mutation Test validation technique which consists in injecting modifications 13.8
in a system in order to check whether a given test sequence detects
the faults or not
test: mutation: The weak mutation testing requires that the test sequence activates 13.8
weak the fault introduced by the mutation, but it does not require that
this sequence propagates the initial error to the outputs (as failure)
test: non- Test performed after the repair of a faulty product in order to 12.2
regression ensure that no fault has been introduced by the repair operation or
other changes
test: off-chip See off-chip test
test: off-line / See off-line testing and on-line testing
on-line
test: on-chip See on-chip test
test: on-line See on-line testing
test: optimal Test sequence having a minimal length (in terms of number of test 12.3
sequence vectors)
test: parametric Test performed on the devices to check AC and DC parameters 12.1
test: path Software structural test technique which takes the program control 13.6
flow paths as elements to define the test coverage
test: path Test pattern generation method for structural testing 13.2
sensitizing
test: path See test: path sensitizing
tracing
test: Technique to test each individual copy of the manufactured 6.3
production/ product to ensure it was produced without defects 11.2
manufacturing 12.1
12.2
test: Test used to evaluate the reliability of a population of components, 7.2
progressive whose decision to stop depends on the results already obtained
test: Test used to evaluate the reliability of a population of components 7.2
progressive which is identical to the progressive test with a maximum duration
curtailed constraint
test: random Logical testing technique based on random generation of the input 12.3
test vectors
test: reference Conventional algorithmic test procedure based on the comparison 12.2
list between the results produced by a tested product and a predefined
list of known values stored in the tester
test: screening Test techniques used to remove weak products according to 11.2
reliability
test: signature Test method using a property on the output values of the tested 12.2
analysis product in order to evaluate its correctness
test: standard/ Test method using a faultless referent product. The outputs given 12.2
referent by the tested product and the referent product are compared
test: statement Software structural testing technique which takes the program 13.6
statements as structural elements to define the test coverage
test: step stress Test used to evaluate the reliability of a population of components 7.2
which provokes a progressive acceleration of the degradation
mechanisms, in general by increasing the temperature (permitting
an accelerated test)
test: structural Structural test methods are based on a structural model (e.g. gate 12.3
structure) and generally use fault model (e.g. 'stuck-at')
See also software: structural testing
<> test: functional
test: toggle See toggle test
test: unit Test applied to elementary modules 14.2
test: validation Validation of a test sequence, frequently by a fault grading 12.3
See also test: evaluation and fault grading
testability Attribute of dependability which measures the ease with which 7.3
a product can be tested, i.e. the ease of obtaining test sequences, 14.1
and the ease of applying these sequences
Closely linked to the test sequence properties:
• the length, i.e. the number of input vectors
• the coverage or test efficiency, i.e. the ratio of the tested faults
to the total number of faults according to a given fault model
Testability can be evaluated on the product, by controllability and
observability parameters
Testability measurement: methods that analyze a design and
estimate the difficulty of test pattern generation as a measure of
testability
testability: There are two groups: 14
techniques • Ad hoc techniques: design rules listing the structures that cause
testing problems and techniques for avoiding these problems
• Design For Testability (DFT): design techniques to increase
testability
tester Any means (human or physical) involved in fault detection and 12.1
diagnosis of a product by a test. Also known as test equipment
TMR Triple Modular Redundancy. Basic N-version fault tolerant 18.2
technique based on passive redundancy. Three copies (duplicate
modules) of the main module are used and a voter elaborates the
final output. A 3-version structure is also called triplex
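The voter of a TMR structure can be sketched as a majority function over the three version outputs. This minimal Python example is an illustration only: the function name is hypothetical, and raising an exception when all three outputs disagree is a design choice of the sketch, not something the glossary specifies.

```python
def tmr_vote(a, b, c):
    """Return the majority value among three version outputs.

    Raises ValueError when the three outputs all differ, since no
    majority exists in that case.
    """
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise ValueError("no majority among the three versions")
```

A single erroneous version is masked because the two correct versions still agree; two simultaneous errors are outside what this structure tolerates.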
toggle test Test sequence which assures that each line of the tested component 12.3
is switched to '0' and '1'
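Whether a test sequence satisfies the toggle criterion can be checked from the values observed on each line while the sequence is applied. The Python sketch below assumes a hypothetical trace format, one dictionary of line values per test vector, as would be produced by a logic simulator.

```python
def untoggled_lines(line_values):
    """Return the sorted names of lines that never took both 0 and 1.

    line_values: list of dicts {line name: 0 or 1}, one per test vector.
    """
    seen = {}
    for snapshot in line_values:
        for line, value in snapshot.items():
            seen.setdefault(line, set()).add(value)
    return sorted(line for line, values in seen.items() if values != {0, 1})
```

An empty result means the sequence is a toggle test for the observed lines.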
tolerable See risk: acceptable rate
probability
tolerance See fault tolerance
totally self- Property of continuous on-line testing systems. 16.3
checking A system is said to be totally self-checking, if it is code-preserving,
system self-testing and fault-secure with regard to a given fault model F
• Code-preserving expresses that the fault free module preserves
the output code on the observed output variables
• Self-testing expresses that every fault of F is detectable on the
tested output by at least one functional input vector
• Fault-secure guarantees that no incorrect functional output can
occur which is not immediately detectable
traceability Existing relationships between the elements used in a step and the 9.3
elements produced by this step
triplex See TMR
trouble See maintenance: troubleshooting and repair
shooting
unit test See test: unit
universe: Set of all theoretically possible sequences of input and/or output 8.2
dynamic values of a product
universe: static Set of all theoretically possible I/O values of a product 8.2
useful life See operation
user Entities (physical or human) interacting functionally with the 2.1
product
Also called functional environment
utilization See operation
validation Assessment of the method used in a creation phase 6.3
9.1
10.1
VANBus Vehicle Area Network: example of industrial Bus using on-line 18.7
detection
vector: input Value received or acquired by a product 8.2
vector: output Value produced by a product 8.2
verification Evaluation of the result of a creation phase, in order to check that it 6.3
is in accordance with the requirements 9.1
10.1
version Versions are duplicate modules that have the same specification 18.2
as the original functional module. They are called duplicate if
they have the same implementation
VHDL See HDL
vote: adaptive Particular N-Version technique in which erroneous versions are 18.5
eliminated from the decision
voter Module of a N-Version fault-tolerant structure which elaborates 18.2
the final outputs from the outputs provided by the versions
VXI VME eXtensions for Instrumentation. IEEE Std 1155.1992 12.1
Industry standard for test and measurement market
walking See test: memory
walkthrough An informal review technique based on a presentation by the 9.4
author and discussions between the author and the reviewer
watchdog Mechanism to detect errors associated with deadlines which are 16.3
not reached at run-time
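The watchdog principle can be sketched as a timer that the monitored task must reset before a deadline. This Python example is a simplified model under stated assumptions: time is passed in explicitly rather than read from a clock, and the class and method names are hypothetical.

```python
class Watchdog:
    """Detects a missed deadline when it is not reset in time."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_kick = 0.0

    def kick(self, now):
        # Called by the monitored task to signal that it is still alive.
        self.last_kick = now

    def expired(self, now):
        # True when the task has not signalled within the timeout.
        return now - self.last_kick > self.timeout
```

In a real system the expiry would trigger an error signal or a reset instead of returning a Boolean.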
Weibull Reliability model valued for its flexibility in describing a 7.2
reliability number of failure patterns
model
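The flexibility of the Weibull model comes from its shape parameter. The sketch below uses the common two-parameter form R(t) = exp(-(t/eta)^beta), with shape beta and scale eta; this parameterization and the names are assumptions of the example and may differ from the notation used in the book.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability of surviving up to time t (two-parameter Weibull)."""
    return math.exp(-((t / eta) ** beta))


def weibull_failure_rate(t, beta, eta):
    """Instantaneous failure rate at time t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)
```

With beta below 1 the failure rate decreases over time (early failures), with beta equal to 1 it is constant (the exponential model), and with beta above 1 it increases (wear-out).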
yield Percentage of good dice (the electrical portion of the wafer that 12.2
contains the electronic functions) compared to the total number of
dice on the wafer. It is a statistical parameter. Yield is refined into
four major yield groups: wafer processing yield, wafer probe yield,
assembly yield, final test yield
References

1. E.A. Amerasekera and D.S. Campbell, Failure Mechanisms in Semiconductor Devices,


John Wiley & Sons, 1987.
2. T. Anderson and P.A. Lee, Fault Tolerance. Principles and Practice, Prentice Hall
International, 1981.
3. John Andrews, Applied Fault Tree Analysis for Reliability and Risk Assessment, Wiley
Series in Quality and Reliability Engineering, Patrick D.T. O'Connor Editor, John Wiley
& Sons, 2000.
4. C. Ausnit.Hood, K.A. Johnson, R.G. Pettit, and S.B. Opdahl, Ada 95 Quality and Style,
Lecture Notes in Computer Science n° 1344, Springer-Verlag, 1997.
5. The Evolution of Fault-Tolerant Computing, A. Avizienis, H. Kopetz, and J.e. Laprie
Editors, Springer-Verlag, 1987.
6. Michel Banatre and Peter A. Lee, Hardware and Software Architectures for Fault
Tolerance, 311 Pages, Springer-Verlag, 1994.
7. P.H. Bardell, W.H. McAnney, and J. Savir, Built.ln Test for VLSI, Pseudo-Random
Techniques, John Willey & Sons, New York, 1987.
8. J. Bames, High Integrity Ada. The Spark Approach, Addison-Wesley, 1997.
9. Embedded Systems Applications, e. Baron, J.C. Geffroy, and G. Motet Editors, Kluwer
Academic Publishers, 1997.
10. 1. Bashir, Testing Object.Oriented Software, Springer-Verlag, 1999.
11. L. Bening and H. Foster, Principles of Verifiable RTL Design, second edition, Kluwer
Academic Publishers, 2001.
12. B. Beizer, Software Testing Techniques, Van Nostrand Reinhold, 1990.
13. B. Beizer, Black.Box Testing. Techniques for Functional Testing of Software and
Systems, John Wiley & Sons, 1995.
14. J. Bergeron, Writing Testbenches. Functional Verification of HDL Models, Kluwer
Academic Publishers, 2000.
15. R. Billington and R.N. Allen, Reliability Evaluation of Engineering Systems, Plenum
Press, 1982.
16. A. Birolini, Reliability Engineering: Theory and Practice, Springer-Verlag, 1999.
17. G. Birtwistle, and P.A. Surahmanyam, VLSI Specification, Verification and Synthesis,
Kluwer Academic Publishers, 1988.

651
652 References

18. M. L. Bushnell, Vishwani and D. Agrawal, Essentials of Electronic Testing for Digital,
Memory, and Mixed-Signal VLSI Circuits, Kluwer Academic Publishers, 2000.
19. S. Chakravarty and P. Thadikaran, Introduction to IDDQ Testing, Kluwer Academic
Publishers, 1997.
20. Kwang-Ting Cheung, Vishwani and D. Agrawal, Unified Methods for VLSI Simulation
and Test Generation, 'Series in Engineering and Computer Science: SECS73', Kluwer
Academic Publishers, 1989.
21. J. M. Crichlow, An Introduction to Distributed and Parallel Computing, Prentice Hall,
1988.
22. R A. DeMillo, W. Michael McCracken, RJ. Martin, and John F. Passafiume, Software
Testing and Evaluation, The Benjamin Cummings Publishing Company, Inc., Menl0
Parc, Ca. USA, 1987.
23. B. Douglass, Real-Time UML: Developing Efficient Objects for Embedded Systems
Reading, Addison-Wesley, 1998
24. B. Douglass, Doing Hard Time: Using Object Oriented Programming and Software
Patterns in Real Time Applications Reading, Addison-Wesley, 1999.
25. R Drechsler, Formal Verification ofCircuits, Kluwer Academic Publishers, 2000.
26. E. Dustin, J. Rashka, J. Paul John, and D. Mc Diarmid, Automated Software Tf'sting:
Introduction, Management, and Performance, Addison-Wesley, 1999.
27. N.E. Fenton, Software Metrics. A Rigorous Approach, Chapman and Hall, 1991.
28. M. Fowler and K. Scott, UML Distilled: Applying the Standard Object Modeling
Language Reading, Addison-Wesley, 1997.
29. M.A. Friedman and J.M. Voas, Software Assessment: Reliability, Safety, Testability,
Wiley, 1995.
30. T. Gilb and D. Graham, Software Inspection, Addison-Wesley, 1994.
31. D. Harel and M. Politi, Modeling Reactive Systems With Statecharts: The Statemate
Approach, McGraw-Hill, 1998.
32. Hardware Description Languages, RW. Hartenstein Editor, Elsevier Science
Publishers, 1987.
33. Logic Design and Simulation, E. Höerbst Editor, Elsevier Science Publishers, 1986.
34. c.P. Hollocker, Software Reviews and Audits Handbook, Wiley, 1990.
35. Shi-Yu Huang, Formal Equivalence Checking and Design Debugging, Kluwer
Academic Publishers, 1998.
36. W. Humphrey, A Discipline For Software Engineering, Addison-Wesley, 1995.
37. W. Humphrey, Introduction To The Personal Software Process, Addison-Wesley, 1997.
38. IEEE Standardfor Software Unit Testing, IEEE Press, 1987.
39. Finn Jensen, Component Reliability. Fundamentals, Modeling, Evaluation & Assurance,
Wiley Series in Quality and Reliability Engineering, Patrick D.T. O'Connor Editor, John
Wiley & Sons, 1995.
40. Niraj K. Jha and Sandip Kundu, Testing and Reliable Design ofCMOS Circuits, Kluwer
Academic Publishers, 1990.
41. B.W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison-
Wesley, 1989.
42. D. R. H. Jones, Failure Analysis Case Studies: A Source Book of Case Studies Selected
from the Pages of Engineering Failure Analysis 1994-1996, Pergamon Press, 1998.
43. Cem Kaner and D. Pels, Bad Software, Wiley Interscience, 1998.
44. Cem Kaner, J.D. Falk and Nguyen, Testing Computer Software, Wiley Interscience,
1999.
References 653

45. P.K. Kapur, R.B. Garg, and S. Kumar, Contributions to Hardware and Software
Reliability Modeling, World Scientific Publishing Company, 1999.
46. R. Kehoe A. Jarvis, ISO 9000.3. A Tool for Software Product and Process
Improvement, Springer-Verlag, New York, 1996.
47. Fault Tolerance: Achievement and Assessment, M. Kersken and F. Saglietti, Editors,
Strategies, Research Esprit-Project 300. Request, Voll, Springer-Verlag, 1992.
48. Furnihiko Kimura, Computer-Aided Tolerancing, Chapman & Hall, 1996.
49. Z. Kohavi, Switching and Finite Automata Theory, TATA McGraw Hili Publisher,
1978.
50. T. Koomen and M. Pol, Test Process Improvement, Kluwer Acadernic Publishers, 1998.
51. Way Kuo, Wei-Ting Kary Chien, and Taeho Kim, Reliability, Yield, and Stress Bum-
In: A Unified Approach for Microelectronics Systems Manufacturing and Software
Development, Kluwer Acadernic Publishers, 1998.
52. P. K. Lala, Fault.Tolerant & Fault Testable Harware Design, Prentice Hall, 1985.
53. Dependability: basic concepts and terminology, in five languages, J.c. Laprie Editor,
IFIP WG 10.4, Springer-Verlag, 1990.
54. L. Lavagno and A. Sangiovanni-Vincentelli, Algorithms for Synthesis and Testing of
Asynchronous Circuits. Kluwer Acadernic Publishers, 1993.
55. S. C. Lee, Modem Switching Theory and Digital Design, Prentice Hall Inc., 1978.
56. N. G. Leveson, Safeware. System Safety and Computers, Addison-Wesley Publishing
Company, 1995.
57. Advanced Techniques for Embedded Systems: Design and Test, edited by Juan Carlos
Lopez, Roman Herrnida, and Walter Geisselhardt, Kluwer Acadernic Publishers, 1998.
58. D. Luckham, Programming with Specifications. An Introduction to ANNA. A Language
for Specifying Ada Programs, Springer-Verlag, 1990.
59. Software Fault.Tolerance, M.R. Lyn Editor, Wiley, 1995.
60. L. A. Macaulay, Requirements Engineering, Series in 'Applied Computing', Springer-
Verlag, 1996.
61. B. Marick, The Craft of Software Testing Subsystem Testing Including Object-based
and Object-Oriented Testing, Prentice Hall, 1994.
62. L. Perry Martin, Electronic Failure Analysis Handbook, McGraw-HilI, 1998.
63. C. Maunder and R. E. Tulloss, The Test Access Port and Boundary Scan Architecture,
collection of several papers on this subject, IEEE Computer Society Press, Los
Alarnitos, Ca, USA, 1990.
64. Pinaki Mazumder and Kanad Chakraborty, Testing and Testable Design of Random-
Access Memories, Kluwer Acadernic Publishers, 1996.
65. K. L. McMillan, Symbolic Model Checking, Kluwer Acadernic Publishers, 1993.
66. A. Miczo, Digital Logic Testing and Simulation, Harper & Row Publishers, New York,
1986.
67. C. MitchelI, V. Stavridou, Mathematics of Dependable Systems, Clarendon Press, 1995.
68. J. W. Moore, Software Engineering Standards. A User's Road Map, IEEE Computer
Society, Los Alarnitos, Califomia, 1998.
69. G. Motet, A. Marpinard, and J.c. Geffroy, Design of Dependable Ada software,
Prentice Hall, 1996.
70. G. Myers, The Art of Software Testing, Wiley, 1979.
71. B. Nadeau-Dostie, Design for AT-Speed Test. Diagnosis and Measurement, Kluwer
Acadernic Publishers, 1999.
72. W. Nelson, Accelerated Testing: Statistical Models. Test Plans. and Data Analyses,
Wiley Interscience, 1990.
654 References

73. P. G.Neumann, Computer Related Risks Addison-Wesley 1995.


74. P. D.T. O'Connor, Practical Reliability Engineering, 3rd edition, John Wiley & sons,
1995.
75. Formal Methods Specification and Verification Guidebookfor Software and Computer
Systems, Vol. 1 'Planning and Technology Insertion', Office of Safety and Mission
Assurance, NASA, Washington DC, USA, NASA.GB.002.95, Release 1.0, July 1990.
76. A. Pages and M. Gondran, System reliability Evaluation and Prediction in Engineering,
North Oxford Academic, 1986.
77. K. Parker, The Boundary Scan Handbook, second edition, Kluwer Academic Publishers,
Boston, 1998.
78. L.F. Pau, Failure Diagnostic and Performance Monitoring, Ed. M. Dekker Inc., NY
Basel, 1975.
79. W. Perry, Effective Methodsfor Software Testing, John Wiley & Sons, Inc, 1995.
80. R. M. Poston, Automating Specification-Based Software Testing, IEEE Computer
Society Press, 1996.
81. Fault-Tolerant Computing. Theory and Techniques, D.K. Pradhan Editor, 2 Volumes,
Prentice Hall, Englewood Cliffs, 1986.
82. Fault-Tolerant Computer System Design, D.K. Pradhan Editor, Prentice Hall,
Englewood Cliffs, 1996.
83. P. Pukite and J. Pukite, Modeling for Reliability Analysis: Markov Modeling for
Reliability, Maintainability, Safety, and Supportability. Analysis of Complex Systems,
IEEE,1998.
84. I.c. Pyle, Developing Safety Systems. A Guide Using Ada, Prentice Hall, 1991.
85. P. Rashinkar, P. Paterson, and L. Singh, System-On-a-Chip Verification: Methodology
and Techniques, Kluwer Academic Publishers, 2000.
86. S. Robertson and J. Robertson, Mastering the Requirements Process, Addison-Wesley,
1999.
87. J. Rumbaugh, M. Blaha, W. Premeriani, F. Eddy and W. Lorensen, OMT- Modeling and
Object Oriented Design, Masson & Prentice Hall, 1995.
88. D. P. Siewiorek and R. S. Swarz, The Theory and Practice of Reliable System Design,
Digital Press, Bedford Massachussetts, 1982.
89. D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems: Design and Evaluation,
Third Edition, A K Peters, 1998.
90. W. R. Simpson and J. W. Sheppard, System Test and Diagnosis, Kluwer Academic
Publishing, 1994.
91. Nozer Singpurwalla and Simon Wilson, Statistical Methods in Software Engineering:
Reliability and Risk, Springer-Verlag, 1999.
92. D. D. Smith, Designing Maintainable Software, Springer-Verlag, New York, 1999.
93. I. Sommerville and P. Sawyer, Requirements Engineering. A good practice guide,
Wiley, 1997.
94. J. T. de Sousa and P. Y.K. Cheung, Boundary-Scan lnterconnect Diagnosis, Kluwer
Academic Publishers, 2001.
95. A. M. Stavely, Toward Zero-Defect Programming, Addison-Wesley, 1999.
96. N. Storey, Safety.Critical Computer Systems, Addison-Wesley, 1996.
97. A.J. van de Goor, Testing SemiConductor Memories. Theory & Practice, John Wiley &
Sons, 1991.
98. F. Thoen and F. Catthoor, Modeling, Verification and Exploration of Task-Level
Concurrency in Real-Time Embedded Systems, Kluwer Academic Publishers, 1999.
References 655

99. S. A. Vanstone and P. C. van Oorschot, An Introduction to Error Correcting Codes with
Applications, Kluwer Academic Publishers, 1989.
100. A. Villemeur, Reliability, Availability, Maintainability and Safety Assessment: Methods
& Techniques, 2 Volumes, Wiley, 1991.
101. J. Voas and G. McGraw, Software Fault Injection, Wiley, 1998.
102. Formal Techniques in Real-Time and Fault-Tolerant Systems, Jan Vytopil Editor,
Kluwer Academic Publishers, 1993.
103. E. Wallmüller, Software Quality Assurance: A practical approach, Prentice Hall, 1994.
104. B.A. Wichmann, Software in Safety-Related Systems, Wiley, 1992.
105. R. J. Wieringa, Requirements Engineering. Frameworkfor Understanding, John Wiley
and Sons Ltd., 1996.
106. VLSI Testing, T.W. Williams Editor, 'Advances in CAD for VLSI', Vol. 5, North-
Holland, 1986.
107. V.N. Yarmolik, Fault.Diagnosis of Digital Circuits, Wiley & Sons Ltd., England, 1990.
108. V.N. Yarmolik and I.V. Kachan, Self-Testing VLSI Design, Elsevier, Amsterdam, 1993.
109. M. Yoeli, Formal Verification of Hardware Design, Selection of papers, IEEE
Computer Society Press Tutorial, 1990.
110. S. Zahran, Software Process Improvement, Addison-Wesley, 1998.
Index

A MDT, 154
acceptability curve, 454 MUT, 155
acceptable product, 455 permanent, 154
acceptance test, 368, 478
accident, 43 B
activation, 71, 326 backward fault analysis, 333
adaptive vote, 493 backward propagation, 327
aggression, 47 backward tracing, 327
alias, 389 backward recovery, 476
alternate, 482 acceptance test, 478
arithmetic code, 419 context restoration, 481
Arrhenius law, 148 domino effect, 481
assertion, 246, 436 recovery point, 477, 479
assessment recovery cache, 478
dependability, 141 retry point, 477
design, 212 rollback point, 477
error/fault model, 105 snapshot, 479
requirements, 204 behavioral level, 25, 27
reliability, 146, 479 behavioral model, 30
risk, 452 behavioral property, 91
safety, 453 benign event, 452
testability, 363 Berger code, 418
attribute bidimensional code, 409, 415
dependability, 9, 14, 154 binary code, 404
module, 30, 71 bit stuffing, 506
quality, 2 black-box testing, 237
automata, 27, 270 boundary scan, 385
automatic test pattern generation (ATPG), branch test, 343
304, 306 breakdown, 6
availability, 9, 154 BSDL, 387
instant, 154
bug, 44 reusability, 54
Built-In Self-Test (BIST), 290, 366, 388 technical, 54
compaction function, 290, 389 technological, 54
signature, 290, 389 contamination, 73, 494
signature analysis function, 290, 389 context of the test, 432
Built-In Test (BIT), 366, 380, 387 context of execution, 479
boundary scan (IEEE 1149-1), 385 contract, 23, 40
test bus, 385 control flow, 346
FITPLA, 380 control path, 344
JTAG, 385 controllability, 132, 332, 364
LSSD, 383 COTS, 54, 125
scan design, 383 coverage
full scan, 386 code, 344, 408
partial scan, 386 fault, 291
scan domain, 387 table, 294
Built-In Test Equipment (BITE), 380 test, 149, 294, 333, 363
coverage analysis
C backward fault analysis, 333
CAN Bus, 496, 505 forward simulation, 333
catastrophic event, 452 structural method, 333
checker, 441, 444 CRC, 412, 501, 503
code disjoint, 445 creation process, 22
self-checking checker, 445 criticality analysis, 456
checksum code, 420 cyclic code, 412
client, 22 Cross-Interleaved Reed-Solomon
code, 402 (CIRC), 416, 502
code-preserving, 441 Reed-Solomon, 416, 502
code disjoint, 445 BCH, 415
codeword, 405 coding procedure, 414
cold standby, 491 ESF, 415
compaction, 290, 389 Fire, 415, 501, 508
compatible, 81
compensation, 473, 486 D
complete diagnosis sequence, 336 dangerous event, 452
complete distinguishing sequence, 300, dead-man, 435
336 debugging, 281
completeness, 80, 212 defect, 44
compliance test, 283 degradation, 486
component, 30 delivered service, 19
composition relationships, 27 dependability, 9
compositional hierarchy, 31 attribute, 9, 14
computer aided maintenance (CAM), 287 impairments, 7, 14, 39
condition, 345 means, 13
Condition/Decision Coverage, 346 dependability assessment, 141
C/DC test, 345 attribute, 141
confidentiality, 10, 157 availability, 154
confinement, 137, 494 forecasting value, 142
conformity test, 211, 283 maintainability, 152
consistency, 329 maintenance, 150
constraints exploitation value, 142
qualitative approach, 141 LSSD, 383
deductive, 143 Reed-Muller structure, 377
Fault tree Method, 168, 287 scan design, 383
inductive, 143, 164 software
FMEA, 164, 287 exception mechanism, 374
FMECA, 167, 456 instrumentation, 373
quantitative approach, 141, 159 design level, 25, 27
fault injection, 159 behavioral, 25, 27
fault simulation, 159 electronic, 26
Markov graph, 162 HDL, 25
Monte Carlo simulation, 160 logical, 25
reliability block diagram, 160, mask, 26
475 structural, 25, 27
stochastic Petri net, 162 symbolic, 26, 27
test, 144 system, 25
reliability, 145 technological, 26, 27
safety, 155 physical, 28
security, 10, 157 symbolic, 26, 27
confidentiality, 10, 157 design proof, 231
integrity, 10, 157 design rule, 263
specification value, 142 design testing, 133
standards, 143 designer, 22
testability, 149 detection testing, 149, 280, 297
dependability assurance, 138 development, 22
dependability impairments, 7, 14, 39 development process, 22, 221
dependability means, 13 device under test (DUT), 281
fault avoidance, 122 diagnosis, 298, 336, 346, 489
fault forecasting, 13, 142 adaptive sequence, 298
fault prevention, 13, 123 complete diagnosis sequence, 336
fault removal, 13, 122, 127 complete distinguishing sequence,
fault tolerance, 13, 135, 138 300, 336
design, 5, 21, 24, 124, 129 diagnosis tree, 298, 336
behavioral level, 25, 27 distinguishing sequence, 299, 336
electronic level, 26 fault tree, 299
layout, 25 fixed sequence, 298
logic level, 25 functional, 245
structural level, 25, 27 assertion, 246
system, 25 post-condition, 246
technological level, 26, 27 pre-condition, 246
Design For Testability (DFT), 362, 365 partition, 299
Ad hoc approach, 367 diagnosis tree, 298, 338
guidelines, 367 disastrous event, 452
boundary scan, 385 disruption, 50, 400
Built-In Self-Test, 290, 366, 388 distance
Built-In Test, 366, 380, 387 arithmetic, 420
specific design method, 377 Hamming, 406
Built-In Test Equipment (BITE), 380 distinguishing sequence, 299, 336
combinational circuit, 377 disturbance, 6, 47
Galois form, 377 domain
IEEE-1149-1, JTAG, 385 dangerous, 459
dynamic, 182 propagation, 73, 495
functional, 180, 402 propagation path, 73
safe, 459 single, 95, 401
scan, 387 soft, 95
static, 180, 402 specific, 130
domino effect, 481 static, 73, 95, 102
Double-Duplex, 497 symmetric, 95
double-rail code, 418, 443 temporary, 73, 95, 102
dreaded event, 143 transient, 73
duplex, 194, 442 unidirectional, 96
duplicate, 472 error confinement, 137, 494
DUT, 281 error contamination, 73, 494
dynamic analysis, 127, 131, 237, 279 Error Detecting and Correcting Code
dynamic functional domain, 182, 243 (EDCC), 137, 399, 402
anti-intrusion, 404
E ECC, 404
easily testable system, 135 RSA,404
instrumentation, 135 arithmetic code, 419
monitoring, 135 arithmetic distance, 420
ECC code, 404 arithmetic treatment, 422
EDC/ECC, 402 checksum code, 420
Electro-Magnetic Compatibility (EMC), data storage, 422
273 logical treatment, 422
electronic level, 26 residual code, 420
emergent functionality, 82 modulo 9 proof, 421
environment, 17 transmission, 422
functional, 4, 17 bidimensional code, 409, 415
non-functional, 4, 17 Longitudinal Redundancy Check,
equivalent faults 415
pattern, 297, 300 Vertical Redundancy Check, 415
system, 297, 300, 338 binary, 404
error, 7, 72 capacity, 408
asymmetric, 95 cardinality, 408
burst, 401 codeword, 405
confinement, 137, 494 cost, 408
contamination, 73, 494 coverage rate, 408
diffusion, 73 cyclic code, 412
dynamic, 73, 95 Cyclic Redundancy Check (CRC),
generic, 130 412, 501
hard, 95 density, 408
immediate, 75, 326 double-rail, 418, 443
initial activation, 75 disruption, 50, 400
logical, 95 disruption operator, 400
multiple, 95, 102, 401 disruption set, 401
order, 95, 401 error correction, 402
non-logical, 95 error detection, 402
overwritten, 74 error model, 400, 408
packet, 401 Hamming distance, 406
permanent, 73, 95 Hamming theorem, 407
primitive, 75, 326 linear code, 410
control relations, 410 single error, 95, 401
Galois field, 411 source code level, 102
generator matrix, 411 static, 73, 95, 101
Hamming code, 411 symmetric, 95
modified Hamming code, 410 temporary, 73, 95, 101
parity check bit, 412 unidirectional, 96
syndrome, 410 error recovery, 472, 485
systematic code, 414 error typology, 91
low-level coding, 404 event tree, 169
HDB, 405 exception mechanism, 374, 440
Manchester, 405, 508 execution path, 344
NRZ, 404, 506, 508 exploitation, 22
multiple parity code, 409 expression means, 221
m-out-of-n, 417, 443 expression tool, 221
non-separable code, 406 extraction, 231
parity, 409
power of expression, 408 F
preserving, 410 fail-fast system, 464
product code, 415 fail-passive system, 464
redundancy, 405 fail-safe system, 137, 451, 456
redundancy rate, 408 intrinsic safety, 457
separable code, 405 structural redundancy, 459
single parity code, 409 fail-silent system, 464
transmission failure, 3, 41
moment, 401 Byzantine, 43
two-rail,418 benign,43
unidimensional code, 409 consequence
unidirectional code, 399 external, 44
Berger code, 418 seriousness classes, 452
m-out-of-n, 417, 443 consistent, 42
two-rail code, 418, 443 crash,43
unidirectional error, 416 disruptive, 50
error detection and correction dynamic,42
mechanisms, 136, 485 external consequence, 77
error logging, 500, 504 inconsistent, 42
error masking, 136,473 omission, 43
error model, 91, 400, 408 persistent, 42
assessment, 105 rate, 146
asymmetric, 95 risk,453
burst error, 401 seriousness, 43, 77, 452
disruption, 400 static,42
disruption operator, 400 stopping, 43
dynamic, 73,95 systemic, 49
error in packet, 401 temporary,42
executable code level, 104 timing,42
logical,95 value,42
multiple, 95, 102, 401 failure mode, 42
order, 401 Failure Modes and Effect Analysis
non-logical, 95 (FMEA), 164, 287
permanent, 95 worksheet, 164
Failure Mode and Effects and Criticality verification, 221


Analysis (FMECA), 167, 456 expression of specifications, 209
failure rate, 146 requirement expression, 204
false alarm, 433 expression aid, 205
fault, 4, 44 specification, 201, 209
accidental, 50 validation of the method, 203
active,71 verification of the solution, 203
classification, 63 fault class, 63
common mode, 474, 488 fault collapsing, 310
component, 53 fault contention, 137
conceptual, 49 fault correction, 127
creation, 6, 48, 129 fault coverage, 292
design, 49 fault design, 53
disturbance, 6 fault detection, 127
dormant,71 fault diagnosis, 127
dynamic,51 fault dictionary, 310
equivalent fault forecasting, 13, 142
pattern, 297, 300 fault grading, 306, 307, 333
system, 297, 300, 338 deterministic, 309
external, 6, 47 probabilistic, 309
functional, 6, 49 simulation, 308
hard, 95, 273,499 statistical, 310
hardware, 49 structural analysis, 308, 333
human-made, 49 fault injection, 159,308
intentional, 51 fault isolation, 127
interaction, 53 fault localization, 127, 297
intermittent, 51 fault logging, 441
internal, 6, 46 fault masking, 339
masking, 136, 192,339,473 fault model, 91
module, 53 assessment, 105
operational, 49, 58 bridging, 97
passive, 71 delay,100
permanent, 51 open,97
physical, 49 short,97
production, 49, 56 short-circuit, 98
soft,95,273,499 software, 101
specification, 49 source code level, 102
static,51 stuck-off, 97, 99
structural, 69 stuck-on, 97, 99
systemic, 49 stuck-open, 99
table, 294 stuck-at, 96
technological, 6, 49 temporal, 100
temporary,51 timing, 100
transient, 51, 95 fault prevention, 13, 123
undetectable, 192, 326, 339 design, 124
fault activation, 71, 326 design model choice, 222
initial,75 design process choice, 223
fault avoidance, 122,201 design guide, 224
design, 219 expression guide, 225
validation, 221 choice of words, 227
lexicography,226 accelerator, 310


self-documenting, 227 concurrent, 313
expression improvement, 228 deductive,312
operation, 125 fault collapsing, 310
production, 124 parallel, 312
specification, 123,209 serial,311
modeling process, 210 fault specification, 52
modeling tool, 209 fault tolerance, 13, 135, 138,469
technological faults, 257 active tolerance, 485
action on the environment, 272 adaptive vote, 493
action on the product, 261 backward recovery, 476
design rule, 263 CAN Bus, 505
hardware technology, 258 cold standby, 492
software, 265 compensation technique, 473
action on the run-time confinement, 137, 494
environment, 274 Double-Duplex, 497
feature restriction, 267 error contamination, 73, 494
hazardous feature, 274 error correction, 136
implementation constraints, error detection, 136
275 error logging, 500
language choice, 266 error masking, 136,473
programming process fault contention, 137
improvement, 269 forward recovery, 482, 499
programming style, 269 graceful degradation, 486
software technology, 258 hot standby, 492
fault removal, 13, 127 hot swap, 492
design, 129, 229 memory scrubbing, 500
formal proof, 248 NMR,490
functional diagnosis, 245 N-self checking, 497
assertion, 246 N-Version technique, 472, 497
post-condition, 246 passive tolerance, 485
pre-condition, 246 reconfiguration, 137,486
functional test, 240 recovery cache, 478
property satisfaction, 238 redundancy
property analysis, 239, 244 of data, 470
dynamic analysis, 127 of function, 470
operation, 134 of function & data, 470
production, 133 retry mode, 477, 496
specification, 129,211 self-purging, 493
verification, 211 temporal redundancy, 471
conformity, 211 TMR, 473
qualitative, 211 VAN Bus, 507
static analysis, 127 fault tree, 299
technological faults Fault Tree Method (FTM), 168,252,287
off-line testing, 279 basic event, 168
test, 127, 149,280,297 feature, 29
validation, 128,203,221 final test, 368
verification, 128,203,221 finite state machine, 26,92, 233, 346
fault secure, 442 Fire code, 415, 501, 508
fault simulation, 159,308 FITPLA,380
FMEA, 164,287 input vector, 180,288


FMECA, 167,456 inspection, 215
formal identification, 317 instrumentation, 135, 373, 440
formal proof, 127,248 integration test, 368
deductive approach, 252 integrity, 10, 157
Fault Tree Method, 252 JTAG,385
inductive approach, 248 justification, 329
symbolic execution, 251 kernel, 446
forward propagation, 327
forward simulation, 333 L
forward recovery, 482, 499 language, 29, 209
recovery block, 482 latency,75
alternate, 482 layout level, 25
version, 482 life cycle, 5, 21
termination mode, 483 design, 5, 21, 24
frequent event, 454 exploitation, 22
functional diagnosis, 245 implementation,21
assertion, 246 manufacturing, 5, 21
post-condition, 246 need,21
pre-condition, 246 operation, 5, 22, 29
functional environment, 4, 17 phase, 21
functional redundancy checking (FRC), production, 5, 21, 28
442, 536 realization, 5, 24
functional test, 237, 240, 284, 303 requirement, 5, 21
likelihood,243 specification, 5, 21, 22
stage, 21
G-K step,21
generic property, 91 useful life, 5, 22
guidelines, 367 utilization, 22
Hamming code, 411 likelihood,243,436
Hamming distance, 406 linear code, 410
hard faults, 95, 273,499 control relations, 410
HDB(n),405 generator matrix, 411
HDL,25 Hamming code, 411
hierarchy, 31 modified Hamming code, 411
compositional, 31 syndrome, 410
use, 31 systematic code, 414
high reliability, 126, 145,261,451,469 Linear Feedback Shift Register (LFSR),
hot standby, 492 389
hot swap, 492 link,30
identification, 317, 346 localization, 127, 132, 150,485
impairments, 7, 14, 39 log file, 441
implementation, 21 logic level, 25
impossible event, 454 logical test, 288, 302
improbable event, 453 functional, 304
in-situ maintenance, 392 structural, 304, 323
incompatibility,81 LSSD,383
incompleteness, 52, 81, 123
inconsistency,53 M
inertia,80 maintainability,9, 152
Mean Time To Repair, 153 Mean Time To Repair (MTTR), 153


repair rate, 153 Mean Up Time (MUT), 155
maintenance, 29, 150,296,392 method,221
CAM,287 minor event, 452
corrective, 29, 152,296 mission, 18
curative, 152 duration, 19
evolutive, 29, 152,296 function, 18
in-situ, 392 operational lifetime, 19
Mean Down Time (MDT), 154 model, 29, 209
Mean Up Time (MUT), 155 behavioral, 30
non-reparable product, 29, 127, 134, continuous, 34
151 discrete, 34
preventive, 29, 152 structural, 30
conditional, 152,296 structured-functional, 31
scheduled, 152 modeling tool, 29, 209, 221
systematic, 152, 296 feature, 29
remote facility, 392 Modified Condition/Decision Coverage
reparable product, 29, 127, 134, 151 (MC/DC), 346
troubleshooting and repair, 150 modified Hamming code, 411
maintenance test, 134,286,296 module, 30
computer aided maintenance, 287 attribute, 30
deep knowledge, 288 behavioral model, 30
empirical associations, 287 composition, 186
experimental approaches, 287 fusion operator, 186
deductive method, 287 emergence operator, 187
inductive method, 287 hierarchy, 31
expert system, 287 spare, 194, 491
Fault Tree Method (FTM), 287 state,30
FMEA, 287 modulo 9 proof, 421
knowledge database, 287 monitoring, 135,434
model-based approach Monte Carlo simulation, 160
diagnosis algorithm, 288 m-out-of-n code, 417, 443
diagnosis process, 288 mutant, 351
model-based approaches, 287 mutation, 351
reasoning by associations, 287 weak mutation testing, 354
shallow reasoning, 287
structure and function, 288 N
surface reasoning, 287 need,21
major event, 452 netlist,25
Manchester code, 405, 508 NMR,490
manufacturing, 5, 21 non-functional environment, 4, 17
Markov graph, 162 non-regression testing, 297
mask design level, 26 non-repairable product, 29, 127, 134
masking, 136, 199,339,473 notation, 29
Mean Down Time (MDT), 154 NRZ code, 404, 506, 508
Mean Time Between Failures (MTBF), N-Self Checking, 497
147 N-Versions technique, 472, 497
Mean Time To Failure (MTTF), 147 voter, 473
Mean Time To First Failure (MTTFF),
147
O physical, 90
observability, 76, 132, 150,332,364 structural, 90
off-line testing, 134,279, 362, 366 property satisfaction, 238
maintenance testing, 280 prototyping, 214
production testing, 280 pseudo-random testing, 290
on-line testing (OLT), 134, 137,281,392,
427,485 Q
continuous, 428 qualitative
discontinuous, 427 criticality analysis, 456
context of the test, 432
context saving/restoring, 432
self-testing, 428 safety assessment, 456
operation, 5, 22, 29,125, 134 specification assessment, 212
output vector, 180,288 tolerance assessment, 476
quality, 1
P attribute, 2
parallel signal analyzer (PSA), 390 quality assurance test, 264
parity check, 412 quality control, 125,264,283
parity code, 409 destructive test, 283
partition, 300 non-destructive test, 283
pattern equivalent faults, 297, 300 quality metrics, 365
perturbation, 47 quality of the test sequence, 363
Petri net, 25, 27 coverage, 363
stochastic, 162 length,363
phase, 21 quantitative
physical property, 90 criticality assessment, 456
post-condition, 246,436 dependabilityassessment, 141, 159
pre-condition, 246, 436 risk assessment, 452
prime gate, 189 safety assessment, 453
probable event, 453
process characterization & control, 264, R
283 random testing, 290, 303
product, 17, 29 rare event, 453
product structure, 30 realization, 5, 24
component, 30 reconfiguration, 137,486
logical link, 30 recovery, 472, 485
module, 30 recovery, 472, 485
sub-system, 30 backward, 476
production, 5, 21, 28, 124, 133 block, 482, 497
production testing, 133,264,283 cache, 478
program mutation, 352 forward, 482, 499
proof, 127, 248 point, 477, 479
propagation, 73 redundancy, 11, 135, 176,402
backward, 327 active, 13, 188,493
forward, 327 code,405
path,73 data, 470
property function, 470
analysis, 239, 244 functional, 179,436
behavioral, 91 domain, 402
generic,91 composition of modules
emergence operator, 187 Arrhenius law, 148


fusion operator, 186 Mean Time Between Failures
dynamic domain, 182, 243 (MTBF), 147
dynamic redundancy, 183 Mean Time To Failure (MTTF), 147
dynamic universe, 182 Mean Time To First Failure (MTTFF),
static domain, 180, 459 147
static functional redundancy, 180 process control, 264
static functional redundancy rate, production testing, 264
181 quality control, 264
static universe, 180,460 statistical description, 145
irredundant element, 188 statistical mathematical tool, 145
passive, 136, 188,372 reliability assurance test, 264
rate, 408 reliability block diagram, 160,475
redundant functional domain, 402 reliability evaluation, 146, 283
reusability,55 reliability mastering, 262
semantic, 176 reliability model, 146
separable, 193 exponential law, 146
cold standby redundancy, 194 Weibull law, 147
functional module, 193 reliability test, 146
hot standby redundancy, 194 accelerated, 148,264
non-separable, 193 censured, 146
off-line redundancy, 194 curtailed, 146
spare, 194 destructive/non-destructive, 264, 283
on-line redundancy, 194 progressive, 146
redundant module, 193 progressive curtailed, 146
software, 191 step stress, 146
structural, 187,441,471 remote maintenance facility, 392
active, 188 repair, 127
gate level repair rate, 153
prime gate, 189 repairable product, 29, 127, 134
hardware, 187 replica, 193, 472
irredundant element, 188 requirement, 5, 21
passive, 188 requirement expression, 204
software, 187 evaluation of a method, 207
time, 187 expression aid
syntactic, 176 horizontal structuration, 206
temporal, 471 vertical structuration, 206
Reed-Muller structure, 377 retry mode, 477, 496
Reed-Solomon, 416,502 retry point, 475
refinement process, 33 reusability,55
reliability,9, 126, 145,261,451,469 reuse, 82, 125
bathtub curve, 148 review, 127,213,214
infant mortality, 149 risk,452
useful life, 149 acceptability, 454
wearout, 149 acceptability curve, 454
estimator, 146 acceptable product, 455
evaluation, 264, 283 acceptable risk rate, 454
failure rate, 146 tolerable probability, 454
failure rate estimation, 148 robustness, 83
accelerated test, 148 rollback point, 477
RSA code, 404 assertion, 436


run-time executive, 259 exception mechanism, 440
instrumentation, 440
likelihood test, 436
S post-condition, 436
safe state, 458 pre-condition, 436
safety,9, 156,433,451,523 watchdog, 438
active, 458 observation of product operation, 434
criticality analysis, 456 observation of user behavior, 435
qualitative, 456 dead-man technique, 435
FMECA, 167,456 structural redundancy, 441
quantitative, 456 checker, 441
dangerous domain, 459 code-preserving,441
intrinsic, 457 duplex, 442
passive, 458 fault-secure, 442
safe domain, 459 self-testing, 442
safety classes, 453 totally self-checking, 442
seriousness classes, 452 self-testing system, 137,465
benign, 452 semantics, 123
catastrophic, 452 serious event, 452
dangerous, 452 seriousness, 77,452
disastrous, 452 benign,43,77,452
major, 452 catastrophic, 43, 78, 452
minor, 452 dangerous, 78,452
serious, 452 disastrous, 78, 452
significant, 452 major, 78, 452
without effects, 452 minor, 77,452
safety classes, 453 serious, 43, 78, 452
extremely improbable event, 453 significant, 78, 452
extremely rare event, 453 seriousness class, 452
frequent event, 454 service delivered, 19,39
impossible event, 454 service relationships, 27
probable event, 453 serviceability, 153
rare event, 453 severity, 77
reasonably probable event, 454 signature, 290
scan design, 383 signature analysis, 290, 389
full scan, 386 significant event, 452
partial scan, 386 simulation sequence, 237
scan domains, 387 snapshot, 479
scenario, 213 soft fault, 95, 499
scrubbing, 500 spare module, 194, 491
security, 10, 157 specification, 5, 21, 22, 124, 129,209
confidentiality, 10, 157 contract, 23
integrity, 10, 157 functional characteristics, 23
self-checking checker, 445 non-functional characteristics, 23
self-checking system, 441 fault prevention, 209
self-purging, 493 fault removal, 211
self-testing, 433, 442, 465 verification, 211, 229
condition monitoring, 434 stage, 21
functional redundancy, 436 state, 30, 72
static analysis, 127, 129 black box testing, 237


step, 21 branch,343
STIL,285,387 Built-In Self-Test, 290, 366, 388
stochastic Petri net, 162 LFSR,389
strobing, 289 Built-In Test (BIT), 366, 380, 387
structural domain, 187 boundary scan, 385
structural fault, 69 test bus, 385
structural level, 25, 27 FITPLA, 380
structural property, 90 IEEE 1149-1,385
structural software test, 340 JTAG,385
branch & path test, 343 LSSD,383
branch test, 343 scan design, 383
condition, 345 burn-in, 126, 284
condition and decision test, 345 BSDL,387
Condition/Decision Coverage censured, 146
(C/DC), 346 compliance, 283
control flow, 346 condition and/or decision, 345, 346
control path, 344 conformity, 283
coverage rate, 341 context, 432
decision, 345 continuity, 284
execution path, 344 continuous, 428
Modified Condition/Decision test, coverage, 149, 294, 333, 363
345 coverage table, 294
mutation, 351 curtailed, 146
weak mutation testing, 354 design, 133
path test, 344 design for testability, 365
statement test, 342 ad hoc technique, 365, 367
structural test, 305, 323 guidelines, 367
structure, 30 built-in self-test (BIST), 290, 366,
structured-functional model, 31 388
stuck-at fault, 96 built-in test (BIT), 366, 380
stuck-on/off fault, 97, 99 specific design, 366
sub-system, 30 destructive, 264, 283
symbolic level, 26, 27 detection, 149,280,297
syndrome, 410 fault masking, 339
system, 24, 30 device under test (DUT), 281
system equivalent faults, 297, 300, 338 diagnosis, 149,281,297,336,346
system level, 25 adaptive sequence, 299
diagnosis tree, 299, 336
T fault tree, 299
technical constraints, 54 fixed sequence, 298
technological constraint, 54 pattern equivalent faults, 297
system equivalent faults, 297
technological fault, 257
discontinuous, 427
technological level, 26, 27
distributed, 430
termination mode, 483
easily testable system, 135, 362
test, 127, 280
Embedded Core Test (IEEE P1500),
accelerated, 148,264,310
acceptance test, 368,478 285
exhaustive, 303
algorithmic, 303
fault coverage, 291
alpha/beta, 283
fault grading, 307 marching, 319


fault injection, 308 ping-pong, 320
fault simulation, 308 walking, 319
deterministic, 309 random, 290, 303
probabilistic, 309 reliability test, 148, 264
statistical, 310 accelerated, 148, 264
structural analysis, 307, 333 destructive, 264
fault localization, 127, 297 non-destructive,264
fault table, 294 schmoo plot, 284
final test, 368 screening, 265
fixed diagnosis, 297 sequential circuit, 316
functional, 237, 240, 284, 303 formal identification, 316
functional test sequence, 304 functional test sequence, 317
generation RAM,319
path sensitizing, 325 signature analysis, 290, 389
program mutation, 351 compaction, 290, 389
gray box testing, 237 parallel signal analyzer (PSA),
IDDQ,284 390
in situ test, 392 statement, 342
input sequence, 288 step stress, 146
integration test, 368 STIL standard, 285, 387
likelihood,243,436 structural, 237, 304
localization, 149,281 algorithmic, 304
logical, 284, 288 Automatic Test Pattern
maintenance, 296 Generation, 304
maintenance testing, 286 software, 340
Modified Condition/Decision test generation, 325
(MC/DC), 346 self-testing, 433
non-destructive test, 264, 283 structural test sequence, 305
non-regression testing, 297 path tracing approach, 307
off-chip, 282, 388 task,431
off-line, 134,279,362,366 test point, 371
in situ maintenance, 392 toggle test, 303
on-chip, 282, 388 unit test, 368
on-line, 134, 137,281,392,427,485 VXI standard, 286
output sequence, 288 white box testing, 237
parametric, 283 test application, 306
path,343 test equipment, 281, 285, 344
production, 133,264,283,292 internal, 282
continuity, 284 external, 282
GO-NOGO, 293 test evaluation, 291, 306, 333
IDDQ,284 test generation, 291, 302, 306, 325
logical, 284, 288 backward propagation, 326
parametric, 283 backward tracing, 326
yield,293 consistency, 329
progressive, 146 D-algorithm, 307
pseudo-random, 303 fault activation, 326
RAM,319 forward propagation, 327
checkerboard, 319 justification, 329
galloping, 320 path sensitizing, 307, 325
primitive error, 326 Triplex, 473


reconvergent fan-out structure, 328 two-rail code, 418, 443
structured circuit, 332
tracing approach, 307
with fault model, 291
U-Z
unidirectional code, 416
without fault model, 291
unit test, 368
test pattern generation, 306, 314, 325
use hierarchy, 31
automatic, 304
useful life, 5, 22, 149
heuristic, 315
user, 17,22
optimal test sequence, 314
utilization, 22
test sequence, 237, 279, 288
validation, 128,203,221
adaptive, 293
VAN Bus, 507
fixed,292
verification, 128,203,221
input, 288
conformity,211
output, 288
design, 229
test sequence quality, 149,291,363
with specifications, 229
coverage, 291, 363
double transformation, 235
length, 149,291,363
reverse transformation, 230
generation ease, 149,291,363
cost, 291 extraction, 231
test validation, 306 top-down transformation, 236
test vector, 289 property satisfaction, 238
testability, 9, 149,362 simulation, 237
controllability, 132,332, 364 without specifications, 238
coverage, 149,291,363 generic property, 238
generation ease, 149, 291, 363 dynamic analysis, 237
length, 149,291,363 property,238
measurement, 363 property satisfaction, 238
observability, 132, 150, 332, 364 qualitative
qualitative estimator, 365 clarity,212
software quality metrics, 365 completeness,212
test application, 363 comprehension, 212
test generation, 363 concision, 212
tester, 281, 285, 288 consistency, 212
signature analysis, 290 non-ambiguity,212
strobing, 289 prototyping,214
with reference list, 290 review, 213, 214
with referent product, 290 inspection, 215
pseudo-random, 290 walkthrough,215
random, 290 scenario, 213
with standard product, 290 simplicity, 212
toggle test, 303 traceability, 212
totally self-checking system, 441, 442 simulation, 237
checker, 441 version, 193,472,482
code-preserving, 441 VHDL,25
fault-secure, 442 voter, 473, 493
self-testing, 442 walkthrough, 215
troubleshooting and repair, 150 watchdog, 438, 496
Triple Modular Redundancy (TMR),
136,473 Weibull reliability law, 147
yield,29
