The document discusses the concepts of failures and faults in circuits, distinguishing between failures (deviations from specified behavior) and faults (physical defects). It details various fault models, including stuck-at faults, bridging faults, and the implications of these faults in CMOS circuits, emphasizing the need for accurate modeling at the transistor level. Additionally, it highlights the significance of breaks and stuck-on/-open faults, which cannot be effectively represented by traditional stuck-at fault models.

1.1 Failures and Faults

A failure is said to have occurred in a circuit or system if it deviates from its specified behavior [1.1]. A fault, on the other hand, is a physical
defect that may or may not cause a failure. A fault is characterized by its nature, value, extent, and duration [1.2]. The nature of a fault can be
classified as logical or nonlogical. A logical fault causes the logic value at a point in a circuit to become opposite to the specified value.
Nonlogical faults include the rest of the faults, such as the malfunction of the clock signal, power failure, and so forth. The value of a logical
fault at a point in the circuit indicates whether the fault creates fixed or varying erroneous logical values. The extent of a fault specifies whether
the effect of the fault is localized or distributed. A local fault affects only a single variable, whereas a distributed fault affects more than one. A
logical fault, for example, is a local fault, whereas the malfunction of the clock is a distributed fault. The duration of a fault refers to whether the
fault is permanent or temporary.

1.2 Modeling of Faults

Faults in a circuit may occur due to defective components, breaks in signal lines, lines shorted to ground or power supply, short-circuiting of
signal lines, excessive delays, and so forth. In addition, errors or ambiguities in design specifications and design rule violations, among other
things, also result in faults. Faulkner et al. [1.3] have found that specification faults and design rule violations accounted for 10% of the total faults
encountered during the commissioning of subsystems of a midrange mainframe computer implemented using MSI (Medium Scale Integration);
however, during the system validation such faults constituted 44% of the total. Poor designs may also result in hazards, races, or
metastable flip-flop behavior in a circuit; such faults manifest themselves as "intermittents" throughout the life of the circuit.

In general, the effect of a fault is represented by means of a model, which represents the change the fault produces in circuit signals. The fault
models in use today are

1. Stuck-at fault
2. Bridging fault
3. Stuck-open fault

1.2.1 STUCK-AT FAULTS

The most common model used for logical faults is the single stuck-at fault. It assumes that a fault in a logic gate results in one of its inputs or
the output being fixed to either a logic 0 (stuck-at-0) or a logic 1 (stuck-at-1). Stuck-at-0 and stuck-at-1 faults are often abbreviated to s-a-0 and
s-a-1, respectively, and these abbreviations will be adopted here.

Let us consider a NAND gate with input A s-a-1 (Fig. 1.1). The NAND gate perceives the input A as a 1 irrespective of the logic value placed on
the input. The output of the NAND gate in Fig. 1.1 is 0 for the input pattern shown, when the s-a-1 fault is present. The fault-free gate has an
output of 1. Therefore, the pattern shown in Fig. 1.1 can be used as a test for the A input s-a-1, because there is a difference between the output
of the fault-free and the faulty gate.

The stuck-at model, often referred to as the classical fault model, offers good representation for the most common types of failures, for example,
short-circuits (shorts) and open circuits (opens) in many technologies. Fig. 1.2 illustrates the CMOS (Complementary Metal Oxide
Semiconductor) realization of a NAND gate, the numbers 1, 2, 3, and 4 indicating places where opens have occurred. The numbers 5 and 6
identify the short between the output node and the ground, and the short between the output node and V_DD, respectively. A short in a CMOS
circuit results if not enough metal is removed by the photolithography, whereas overremoval of metal results in an open circuit [1.4].

Fault 1 in Fig. 1.2 will disconnect input A from the gate of transistors T1 and T3. It has been shown that in such a situation one transistor may
conduct and the other remain nonconducting [1.5]. Thus, the fault can be represented by a stuck-at value of A: if A is s-a-0, T1 will be ON and T3
OFF, and if A is s-a-1, T1 will be OFF and T3 ON. In the presence of fault 3, only transistor T3 will remain ON, and hence the fault cannot be
represented by the stuck-at model (see the paragraph on stuck-on faults in Sec. 1.2.3). Faults 2 and 4 behave similarly to faults 1 and 3,
respectively.

Fault 5 forces the output node to be shorted to V_DD, that is, the fault can be considered as an s-a-1 fault. Similarly, fault 6 forces the output node to
be s-a-0.

The stuck-at model is also used to represent multiple faults in circuits. In a multiple stuck-at fault, it is assumed that more than one signal line in the
circuit is stuck at logic 1 or logic 0; in other words, a group of stuck-at faults exists in the circuit at the same time. A variation of the multiple
fault is the unidirectional fault. A multiple fault is unidirectional if all of its constituent faults are either s-a-0 or s-a-1 but not both simultaneously.
The stuck-at model has gained wide acceptance in the past mainly because of its relative success with small scale integration. However, it is
not very effective in accounting for all faults in present-day VLSI (Very Large Scale Integration), which mainly uses CMOS technology. Faults in
CMOS circuits do not necessarily produce logical faults that can be described as stuck-at faults [1.6, 1.7]. For example, in Fig. 1.2, faults 3 and
4 create stuck-on transistor faults. As a further example we consider Fig. 1.3, which represents a CMOS implementation of the Boolean function

Z = \overline{(A + B)(C + D) + EF}.

Two possible shorts numbered 1 and 2 and two possible opens numbered 3 and 4 are indicated in the diagram. Short number 1 can be modeled
by s-a-1 of input E; open number 3 can be modeled by s-a-0 of input E or input F or both. On the other hand, short number 2 and open number 4
cannot be modeled by any stuck-at fault because they involve a modification of the network function. For example, in the presence of short
number 2 the network function will change to Z = \overline{(A + C)(B + D) + EF}, and open number 4 will change the function to Z = \overline{(AC + BD) + EF}.

For this reason, a perfect short between the outputs of the two gates (Fig. 1.4) cannot be modeled by any stuck-at fault. Without a short, the
outputs of the gates Z_1 and Z_2 are

Z_1 = \overline{AB}

and

Z_2 = \overline{CD},

whereas with the short,

Z_1 = Z_2 = \overline{AB + CD}.
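
The claim that no stuck-at fault reproduces the shorted behavior can be checked exhaustively. The sketch below is a minimal illustration, assuming two independent two-input NAND gates with inputs A, B and C, D, and assuming the perfect short acts as a wired-AND so that both outputs become the complement of AB + CD. The line names and helper functions are hypothetical.

```python
from itertools import product

LINES = ("A", "B", "C", "D", "Z1", "Z2")

def outputs(a, b, c, d, fault=None):
    """Return (Z1, Z2) for the two NAND gates, optionally with one stuck-at fault.

    `fault` is a (line, value) pair, e.g. ("A", 1) for A s-a-1.
    """
    v = {"A": a, "B": b, "C": c, "D": d}
    if fault and fault[0] in v:
        v[fault[0]] = fault[1]
    z1 = 1 - (v["A"] & v["B"])
    z2 = 1 - (v["C"] & v["D"])
    if fault and fault[0] == "Z1":
        z1 = fault[1]
    if fault and fault[0] == "Z2":
        z2 = fault[1]
    return z1, z2

def shorted(a, b, c, d):
    """Perfect short modeled as wired-AND: both outputs equal NOT(AB + CD)."""
    z = 1 - ((a & b) | (c & d))
    return z, z

PATTERNS = list(product((0, 1), repeat=4))
# Single stuck-at faults whose behavior equals the shorted circuit on every pattern.
matching = [(line, val) for line in LINES for val in (0, 1)
            if all(outputs(*p, fault=(line, val)) == shorted(*p) for p in PATTERNS)]
print(matching)  # []
```

The list is empty because the short disturbs both outputs (for example, ABCD = 1100 forces Z_2 to 0 and ABCD = 0011 forces Z_1 to 0), whereas a single stuck-at fault on any of these lines can disturb only one of them.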

1.2.2 BRIDGING FAULTS

With the increase in the number of devices on VLSI chips, the probability of shorts between two or more signal lines has increased
significantly. Unintended shorts between the lines form a class of permanent faults, known as bridging faults, which cannot be modeled as
stuck-at faults. It has been observed that physical defects in MOS (Metal Oxide Semiconductor) circuits are manifested as bridging faults more
than as any other type of fault [1.8].

Bridging faults can be categorized into three groups:

Input bridging

Feedback bridging

Nonfeedback bridging

An input bridging fault corresponds to the shorting of a certain number of primary input lines (Fig. 1.5), whereas a
feedback bridging fault occurs if there is a short between an output line and an input line (Fig. 1.6). A nonfeedback bridging fault identifies a
bridging fault that does not belong to either of the two previous categories. From these definitions, it will be clear that the probability of two
lines getting bridged is higher if they are physically close to each other. Most of the existing literature on bridging faults assumes that the
probability of more than two lines shorting together is very low, and that wired logic is performed at the connections. In general, a bridging fault in
positive logic is assumed to behave as a wired-AND (where 0 is the dominant logic value), and a bridging fault in negative logic behaves as a
wired-OR (where 1 is the dominant logic value). If bridging between any s lines in a circuit with n lines is considered, the number of single bridging
faults alone will be \binom{n}{s}, and the number of multiple bridging faults will be very much larger.
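
As a small illustration of the wired-logic assumption and of how quickly the fault count grows, consider the following Python sketch. It assumes the count of single bridging faults is the binomial coefficient "n choose s" for a circuit with n lines; the function names and the example values of n and s are hypothetical.

```python
from math import comb

def wired_and(*lines):
    """Value seen on bridged lines when 0 dominates (positive logic)."""
    return int(all(lines))

def wired_or(*lines):
    """Value seen on bridged lines when 1 dominates (negative logic)."""
    return int(any(lines))

def single_bridging_faults(n, s):
    """Number of ways to bridge s of the n lines in a circuit."""
    return comb(n, s)

print(wired_and(1, 0), wired_or(1, 0))   # 0 1
print(single_bridging_faults(20, 2))     # 190
print(single_bridging_faults(1000, 2))   # 499500
```

Even when only pairs of lines are considered, the number of candidate faults grows quadratically with the number of lines, which is one reason layout information is used to restrict attention to physically adjacent lines.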

The presence of a feedback bridging fault can cause a circuit to oscillate or convert it into a sequential circuit. For example, a circuit
implementing the function F(x_1, ..., x_s, ..., x_n) oscillates under the feedback bridging (Y x_1 x_2 ... x_s) shown in Fig. 1.6 if the input combination
(x_1, ..., x_n) satisfies the condition [1.9]:

x_1 x_2 \cdots x_s F(0, 0, ..., 0, x_{s+1}, ..., x_n) \overline{F}(1, 1, ..., 1, x_{s+1}, ..., x_n) = 1.    (1.1)

For example, if the network of Fig. 1.7(a) has the feedback bridging fault (Y x_1 x_2) (Fig. 1.7(b)), it will oscillate for the input
combination (x_1, x_2, x_3, x_4, x_5, x_6) = (1, 1, 1, 1, 0, 0) because condition (1.1) is satisfied for this input pattern.
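
The oscillation condition can be explored on a small example. Because the network of Fig. 1.7 is not reproduced here, the sketch below uses a hypothetical three-input function F = NOT(x1 AND x2) AND x3 with the output Y bridged to x1 and x2. It assumes wired-AND behavior at the bridge and a unit-delay feedback loop, and reads condition (1.1) as: all bridged inputs are 1, F with the bridged inputs at 0 is 1, and F with the bridged inputs at 1 is 0.

```python
from itertools import product

S = 2  # inputs x1..xS are bridged to the output Y (hypothetical choice)

def F(x1, x2, x3):
    """Hypothetical fault-free function: F = NOT(x1 AND x2) AND x3."""
    return int((not (x1 and x2)) and x3)

def condition_holds(x):
    """Condition (1.1) as read above, evaluated for input combination x."""
    bridged, rest = x[:S], x[S:]
    return bool(all(bridged)
                and F(*([0] * S), *rest) == 1
                and F(*([1] * S), *rest) == 0)

def oscillates(x, steps=8):
    """Iterate the wired-AND feedback loop with unit delay and watch the output."""
    bridged, rest = x[:S], x[S:]
    y, trace = 0, []
    for _ in range(steps):
        v = int(y and all(bridged))   # wired-AND of Y with the bridged inputs
        y = F(*([v] * S), *rest)      # every bridged input now carries v
        trace.append(y)
    return trace[-1] != trace[-2]

oscillating = [x for x in product((0, 1), repeat=3) if oscillates(x)]
print(oscillating)  # [(1, 1, 1)]
```

For this hypothetical function the simulated loop oscillates for exactly the input combination that satisfies the condition, namely (x1, x2, x3) = (1, 1, 1); for every other input the bridged lines settle to a stable value.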

As mentioned previously, early work on bridging faults assumed a wired-AND or wired-OR at the connection of the shorted signal lines.
Although this assumption is correct for TTL (Transistor Transistor Logic) circuits, it has been realized in recent years that bridging faults in
CMOS circuits show characteristics that cannot be represented by the wired logic concept. The effect of such faults has to be analyzed at the
transistor level rather than at the gate level. However, the number of possible bridging faults in a circuit at the transistor level can be enormous.
Therefore, for practical reasons, only the effect of single bridging faults in a circuit can be considered. It is important to realize that bridging
faults are layout dependent, that is, shorts may only occur between signal lines that are adjacent or overlapping in the layout. A short in the
transistor-level diagram of the circuit under test may not correspond to an actual bridging fault in the layout of the circuit.

Let us consider a two-input CMOS NOR gate circuit to analyze the effect of single bridging faults; the circuit and its layout are shown in Figs.
1.8(a) and (b), respectively. The probable bridging faults in this circuit (or in any other CMOS circuit) can be grouped into four categories [1.10]:

1. Metal-polysilicon short (a in Figs. 1.8(a) and (b))
2. Polysilicon n-diffusion short (cd in Figs. 1.8(a) and (b))
3. Polysilicon p-diffusion short (ef in Figs. 1.8(a) and (b))
4. Metal-polysilicon short (g in Figs. 1.8(a) and (b))

Table 1.1 shows the truth table of the NOR gate in the presence of a bridging fault from each of these four categories. It is often assumed that a
short creating a bridging fault is perfect, that is, the resistance of the short is close to zero. However, it has been observed that the short
resistance can vary from a few ohms to about 5 kilohms [1.11].

1.2.3 BREAKS AND TRANSISTOR STUCK-ON/-OPEN FAULTS IN CMOS

As discussed previously, not all defects in CMOS VLSI can be represented by using the stuck-at fault model. Recent research indicates that
breaks and transistor stuck-ons are two other types of defects that, like bridging, may remain undetected if testing is performed based on the
stuck-at fault assumption. These defects have been found to constitute a significant percentage of defects occurring in CMOS circuits [1.12]. In
the following two subsections, we discuss the effects of these defects on CMOS circuits.

Breaks

Breaks or opens in CMOS circuits are caused either by missing conducting material or extra insulating material. Breaks can be either of the
following two types [1.13]:

1. Intragate breaks

2. Signal line breaks


An intragate break occurs internal to a gate. Such a break can disconnect the source, the drain, or the gate from a transistor, identified by b_1,
b_2, and b_3, respectively, in Fig. 1.9. The presence of b_2 will have no logical effect on the operation of a circuit, but it will increase the
propagation delay, that is, the break will result in a delay fault. Similarly, the break at b_1 will also produce a delay fault without changing the
function of the circuit. However, the break at b_2 will make the p-transistor nonconducting, that is, the transistor can be assumed to be stuck
open.

An intragate break can also disconnect the p-network or the n-network or both networks (b_4, b_5, b_6 in Fig. 1.9) from the circuit. The presence of
b_4 or b_5 will have the same effect as the output node getting stuck-at-0 or stuck-at-1, respectively. In the presence of b_6 the output voltage
may have an intermittent stuck-at-1 or stuck-at-0 value; thus, if the output node simultaneously drives a p-transistor and an n-transistor, then one
of the transistors will be ON for some unpredictable period of time.

Signal line breaks can force the gates of transistors in CMOS circuits to float. As
shown in Fig. 1.9, such a break can make the gate of only a p-transistor or only an n-transistor float. It is also possible, depending on the position of
a break, that the gates of both transistors may float, in which case one transistor may conduct and the other remain in a nonconducting state
[1.5]. In general, this type of break can be modeled as a stuck-at fault. On the other hand, if two transistors with floating gates are permanently
conducting, one of them can be considered as stuck on. If a transistor with a floating gate remains in a nonconducting state due to a signal line
break, the circuit will behave in a similar fashion as it does in the presence of the intragate break b

Stuck-On and Stuck-Open Faults

To ensure realistic modeling, faults should be considered at the transistor level, because only at this level is the complete circuit structure
known. In other words, circuits should be tested for shorts and opens at the transistor level. A short corresponds to a stuck-on transistor,
whereas an open corresponds to a stuck-open transistor.

A stuck-on transistor fault implies the permanent closing of the path between the source and the drain of the transistor. Although the stuck-on
transistor in practice behaves in a similar way to a stuck-closed transistor, there is a subtle difference. A stuck-on transistor has the same
drain-source resistance as the on-resistance of a fault-free transistor, whereas a stuck-closed transistor exhibits a drain-source resistance that is
significantly lower than the normal on-resistance. In other words, in the case of a stuck-closed transistor, the short between the drain and the
source is almost perfect, and this is not true for a stuck-on transistor. A transistor stuck-on (stuck-closed) fault may be modeled as a bridging
fault from the source to the drain of a transistor. It has been estimated that 10-13% of all faults occurring in CMOS circuits are stuck-on
transistor faults [1.12].

A stuck-open transistor implies the permanent opening of the connection between the source and the drain of a transistor. The drain-source
resistance of a stuck-open transistor is significantly higher than the off-resistance of a nonfaulty transistor. If the drain-source resistance of a
faulty transistor is approximately equal to that of a fault-free transistor, then the transistor is considered to be stuck off. For all practical
purposes, transistor stuck-off and stuck-open faults are functionally equivalent, and will have the same effect on a CMOS circuit as that of
break fault 2 in Fig. 1.9. Although only about 1% of the CMOS faults are due to stuck-off/stuck-open transistors [1.12], considerable research
has been directed at detecting these faults [1.14-1.17]. This is because, apart from bridging faults, these are the only faults that can turn a
combinational circuit into a sequential circuit [1.18].

Figure 1.10 shows a two-input CMOS NOR gate. A stuck-open fault causes the output to be connected neither to GND nor to V_DD. If, for
example, transistor T2 is open-circuited, then for input AB = 00, the pull-up circuit will not be active and there will be no change in the output
voltage. In fact, the output retains its previous logic state; however, the length of time the state is retained is determined by the leakage current
at the output node. Table 1.2 shows the truth table for the two-input CMOS NOR gate. The fault-free output is shown in column 2; the three
columns to the right represent the outputs in the presence of the three stuck-open (s-op) faults. The first, A s-op, is caused by any input, drain, or
source missing connection to the pull-down FET T3. The second, B s-op, is caused by any input, drain, or source missing connection to the
pull-down FET T4. The third, V_DD s-op, is caused by an open anywhere in the series p-channel pull-up connection to V_DD. The symbol Z_t is used to
indicate that the output state retains the previous logic value. The modeling of the stuck-open faults has been proposed by Wadsack [1.18].
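
The memory effect of a stuck-open fault can be illustrated with a small behavioral model. The Python sketch below is an assumption-laden illustration rather than a reproduction of Table 1.2: it models the NOR gate as a series pull-up (conducting only for AB = 00) and two parallel pull-downs, and it treats a floating output as retaining its previous value indefinitely, ignoring leakage. The fault labels "A", "B", and "VDD" and the function names are hypothetical.

```python
def nor_output(a, b, previous, fault=None):
    """Two-input CMOS NOR gate; `fault` is None, "A", "B", or "VDD" (stuck-open)."""
    pull_up = (a == 0 and b == 0) and fault != "VDD"
    pull_down = (a == 1 and fault != "A") or (b == 1 and fault != "B")
    if pull_up:
        return 1
    if pull_down:
        return 0
    return previous  # output node floats and keeps its previous logic value

def truth_table(fault=None):
    """Map each input (A, B) to 0, 1, or "Zt" (previous value retained)."""
    table = {}
    for a in (0, 1):
        for b in (0, 1):
            outs = {nor_output(a, b, prev, fault) for prev in (0, 1)}
            table[(a, b)] = outs.pop() if len(outs) == 1 else "Zt"
    return table

for fault in (None, "A", "B", "VDD"):
    print(fault, truth_table(fault))

# Two-pattern test for A s-op: AB = 00 sets the output to 1, then AB = 10
# should pull it to 0; with the fault present the output stays at 1.
initial = nor_output(0, 0, previous=0, fault="A")
print(nor_output(1, 0, previous=initial, fault="A"), nor_output(1, 0, previous=initial))
```

In this model each stuck-open fault turns exactly one row of the truth table into a "previous value" entry, which is why detecting such a fault requires an initializing pattern followed by a test pattern.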

1.2.4 DELAY FAULTS

As mentioned previously, not all manufacturing defects in VLSI circuits can be represented by the stuck-at fault model. The size of a defect
determines whether the defect will affect the logic function of a circuit. Smaller defects, which are likely to cause a partial open or short in a
circuit, have a higher probability of occurrence due to the statistical variations in the manufacturing process [1.19]. These defects result in the
failure of a circuit to meet its timing specifications without any alteration of the logic function of the circuit. A small defect may delay the
transition of a signal on a line either from 0 to 1, or vice versa. This type of malfunction is modeled by a delay fault.

Two types of delay faults have been proposed in the literature:

Gate delay fault [1.20-1.22]

Path delay fault [1.23-1.25]

Gate delay faults have been used to model defects that cause the actual propagation delay of a faulty gate to exceed its specified worst case
value. For example, if the specified worst case propagation delay of a gate is x units, and the actual delay is x + Δx units, then the gate is said to
have a delay fault of size Δx. The main deficiency of the gate delay fault model is that it can only be used to model isolated defects, not
distributed defects, for example, several small delay defects. The path delay fault model can be used to model isolated as well as distributed
defects. In this model, a fault is assumed to have occurred if the propagation delay along a path in the circuit under test exceeds the specified
limit.
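
The difference between the two models can be made concrete with a short sketch. The gate names, delay values, and path limit below are hypothetical and chosen only to show a distributed defect: no individual gate exceeds its specified worst-case delay, yet the path as a whole exceeds its limit.

```python
def gate_delay_faults(gates):
    """Return {name: size} for gates whose actual delay exceeds the worst case."""
    return {name: actual - worst
            for name, (worst, actual) in gates.items() if actual > worst}

def path_delay_fault(gates, path, limit):
    """True if the summed actual delay along `path` exceeds `limit`."""
    return sum(gates[name][1] for name in path) > limit

# Hypothetical data: (specified worst-case delay, actual delay) in arbitrary units.
distributed = {"G1": (20, 19), "G2": (20, 19), "G3": (20, 19)}
isolated = {"G1": (20, 15), "G2": (20, 26), "G3": (20, 15)}
path, limit = ["G1", "G2", "G3"], 50  # assumed path and timing limit

print(gate_delay_faults(distributed), path_delay_fault(distributed, path, limit))
print(gate_delay_faults(isolated), path_delay_fault(isolated, path, limit))
```

With these numbers the distributed case reports no gate delay fault but does report a path delay fault (57 units against a limit of 50), while the isolated case is caught by both models, with a gate delay fault of size 6 on G2.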

1.3 Temporary Faults

A major portion of digital system malfunctions are caused by temporary faults [1.26, 1.27]. These faults have also been found to account for
more than 90% of the total maintenance expense, because they are difficult to detect and isolate [1.28, 1.29].

In the literature, temporary faults have often been referred to as intermittent or transient faults with the same meaning. It is only recently that a
distinction between the two types of faults has been made [1.30].

Transient faults are nonrecurring temporary faults. They are usually caused by α-particle radiation or power supply fluctuation, and they are not
repairable because there is no physical damage to the hardware. They are the major source of failures in semiconductor memory chips.

Intermittent faults are recurring faults that reappear on a regular basis. Such faults can occur due to loose connections, partially defective
components, or poor designs. Intermittent faults occurring due to deteriorating or aging components may eventually become permanent. Some
intermittent faults also occur due to environmental conditions such as temperature, humidity, vibration, and so forth. The likelihood of such
intermittents depends on how well the system is protected from its physical environment through shielding, filtering, cooling, and so on. An
intermittent fault in a circuit causes a malfunction of the circuit only if it is active; if it is inactive, the circuit operates correctly. A circuit is said to
be in a fault-active state if a fault present in the circuit is active, and it is said to be in the fault-not-active state if a fault is present but inactive [1.31].

Because intermittent faults are random, they can be modeled only by using probabilistic methods. Several probabilistic models for representing
the behavior of intermittent faults have been presented in the literature. The first model is a two-state first-order Markov model (see Appendix)
presented by Breuer for a specific class of intermittent faults, which are well behaved and signal independent [1.32]. An intermittent fault is well
behaved if, during an application of a test pattern, the circuit under test behaves as if either it is fault-free or a permanent fault exists. An
intermittent fault is signal independent if its being active does not depend on the inputs or the present state of a circuit. Figure 1.11 shows the
fault model proposed by Breuer. It assumes that the fault oscillates between the fault-active state (FA) and the fault-not-active state (FN). The
transition probabilities indicated in Fig. 1.11 depend on a selected time-step; they have to be changed if this time-step is changed. Lala and
Hopkins [1.33] used an adaptation of Breuer's model that characterizes the transition between the fault-active state and the fault-not-active
state by two parameters α and β, referred to as frequencies of transition. The ratio α/β is called the latency factor; the higher the latency factor,
the lower is the probability of the fault being active.
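
A two-state first-order Markov chain of this kind is easy to experiment with numerically. The sketch below is a generic illustration, not the specific parameterization of Fig. 1.11: the per-step transition probabilities `a` (fault-not-active to fault-active) and `b` (fault-active to fault-not-active) are hypothetical values, and the function names are made up for this example.

```python
def step(p_fa, p_fn_to_fa, p_fa_to_fn):
    """One time-step of the two-state chain; p_fa is P(fault active)."""
    return p_fa * (1 - p_fa_to_fn) + (1 - p_fa) * p_fn_to_fa

def prob_active_after(n, p_fn_to_fa, p_fa_to_fn, p_fa=0.0):
    """P(fault active) after n steps, starting from P(FA) = p_fa."""
    for _ in range(n):
        p_fa = step(p_fa, p_fn_to_fa, p_fa_to_fn)
    return p_fa

a, b = 0.02, 0.18  # hypothetical FN->FA and FA->FN probabilities per time-step
print(prob_active_after(1, a, b))
print(prob_active_after(500, a, b), a / (a + b))
```

After many steps the probability of the fault being active settles at a / (a + b), independent of the starting state, so a fault that leaves the active state much more readily than it enters it is active only a small fraction of the time.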

Kamal and Page [1.34] introduced a zero-order Markov model for intermittent faults, and they suggested a procedure for the detection of a
single, well-behaved, signal-independent, intermittent fault in nonredundant combinational circuits. The model assumes prior estimation of the
probability that a circuit possesses an intermittent fault and the conditional probability of a fault being active, given that it is present. The fault
detection procedure employs the repeated application of tests that are generated to detect permanent faults. After applying a test, the
probability of detection of a given intermittent fault is calculated using Bayes' rule [1.35]. This probability approaches 1 if the test is repeated an
infinite number of times. However, a finite number of repetitions can be found by using one of two decision rules. One decision rule is to
terminate repetition when the posterior probability (i.e., the probability of a given intermittent fault being present in a circuit after the application
of the test) goes below a certain value. The other decision rule is to stop the application of the test when the "likelihood ratio" (which is a function
of the posterior probabilities) becomes less than a threshold number. Usually the number of repetitions required is still very large. The zero-order
Markov model has also been used by Savir [1.36], and by Koren and Kohavi [1.37] for describing the behavior of intermittent faults.
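
The first decision rule can be sketched with a direct application of Bayes' rule. This is a simplified illustration in the spirit of the procedure, not the published algorithm: it assumes the test always fails when the fault is active, always passes when the fault is absent or inactive, and that activity is independent from one application to the next (zero-order). The prior, activity probability, and threshold are hypothetical values.

```python
def posterior_fault_present(prior, p_active, passes):
    """P(fault present | test passed `passes` times in a row) by Bayes' rule."""
    miss = (1 - p_active) ** passes  # fault present but inactive every time
    return prior * miss / (prior * miss + (1 - prior))

def repetitions_needed(prior, p_active, threshold):
    """Smallest number of passing repetitions that drives the posterior below threshold."""
    k = 0
    while posterior_fault_present(prior, p_active, k) >= threshold:
        k += 1
    return k

k = repetitions_needed(prior=0.5, p_active=0.1, threshold=0.01)
print(k, posterior_fault_present(0.5, 0.1, k))
```

Because the posterior shrinks only by a factor of roughly (1 - p_active) per passing repetition, a fault that is rarely active needs many repetitions before it can be ruled out with confidence, which matches the remark that the required number of repetitions is usually large.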

Su et al. [1.38] have presented a continuous-parameter Markov model for intermittent faults; this is a generalization of the discrete-parameter
model proposed by Breuer. In this model, shown in Fig. 1.12, the transition probabilities depend linearly on the time-step Δt. For example, if a
circuit is in the fault-not-active (FN) state at time t, the probability that it will go to the fault-active (FA) state at time t + Δt is proportional to Δt. If a
constant of proportionality λ is assumed, then this probability is given by λΔt. Similarly, the probability of going from the FA state at time t to the
FN state at time t + Δt is μΔt. The time-period during which a circuit stays in state FA (FN) is exponentially distributed with mean 1/μ (1/λ). When the
time-step Δt is very large, the continuous Markov model reduces to the discrete zero-order Markov model, in which case the probability of the fault
being active is λ/(λ + μ).
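
The limiting probability can be checked numerically. The sketch below takes λ as the FN-to-FA rate and μ as the FA-to-FN rate (an assumption about the symbols), steps the chain with transition probabilities λΔt and μΔt, and compares the result with the standard closed-form solution for a two-state continuous-time Markov chain started in FN. The rate values are hypothetical.

```python
import math

def p_active_closed_form(t, lam, mu):
    """P(FA at time t) for a two-state continuous-time chain started in FN."""
    return lam / (lam + mu) * (1 - math.exp(-(lam + mu) * t))

def p_active_stepped(t, lam, mu, dt=1e-3):
    """Same quantity, using transition probabilities lam*dt and mu*dt per step."""
    p = 0.0
    for _ in range(round(t / dt)):
        p = p * (1 - mu * dt) + (1 - p) * lam * dt
    return p

lam, mu = 0.2, 0.8  # hypothetical rates for FN->FA and FA->FN
print(p_active_stepped(5.0, lam, mu), p_active_closed_form(5.0, lam, mu))
print(p_active_closed_form(100.0, lam, mu), lam / (lam + mu))
```

For small Δt the stepped values track the closed form closely, and for large t both approach λ/(λ + μ), the probability of the fault being active in the zero-order limit.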

The major problem with the intermittent fault models discussed so far is that it is very hard to obtain the statistical data needed to verify their
validity. The model proposed by Stiffler [1.39] goes a long way to alleviate this problem. It consists of five states (Fig. 1.13). States A and B are
the fault-active and the benign state, respectively. If a fault occurs, the error state E is entered. D is a fault-detected state, and F is a failed state
resulting from the propagation of an undetected error. α(τ) represents the probability of transition from the fault-active to the benign state. β(τ)
is the rate of occurrence of the transition from the benign state to the fault-active state. ρ(τ), γ(τ), and ε(τ) denote respectively the rates of
occurrence of error generation, fault detection, and error propagation. Each of these transition functions is a function only of the time τ spent in
the source state. The parameter C represents the coverage probability, which is the probability of detecting an error before it causes any
damage.

Considerable work has been done in recent years on the diagnosis of permanent faults in hardware (see Chap. 2); however, the
diagnosis of temporary faults remains a major problem. Currently, two types of technique are used to prevent temporary faults from causing
system failures: fault masking and concurrent fault detection. The fault-masking techniques tolerate the presence of faults and provide
continuous system operation. Concurrent fault detection techniques use totally self-checking circuits to signal the presence of faults but do not
mask them.
