Federal University of
Technology Owerri
Reliability
ECE 510 Reliability and Quality Assurance in Electronics
2nd Semester April. 2013
RISK
Major accidents in recent years have taken a sad toll of lives:
Bhopal
Chernobyl
Piper Alpha
Challenger
So have natural disasters:
Bam
December 26 Tsunami
The immediate reaction is always: "It must never happen again."
We need to eliminate hazards as far as possible and reduce
the risks so that the remaining hazards are only a small
addition to the inherent risks of everyday life
RELIABILITY & RISK ANALYSIS TECHNIQUES are the
methods used to assess the safety of modern complex systems
RELIABILITY
Definition
Reliability is the ability of a product
to perform as intended (that is, without failure and
within specified performance limits)
for a specified mission time
when used in the manner and for the purposes
intended
under specified application and operational
conditions
RELIABILITY
Alternative Definition
Reliability is the probability that a device or system
properly performs its intended function
over time
when operated within the environment for which it
is designed
RELIABILITY
Definition stresses 4 elements
Probability: quantitative
Adequate performance: must be defined
Time: the period over which we can expect a certain degree of performance
Operating conditions: temperature, humidity, shock, vibration
RELIABILITY
Characteristics of a Product
Estimated in Design
Controlled in Manufacturing
Measured during Testing
Sustained in the Field
RELIABILITY
Importance of Reliability
In this modern day of science and technology, complex
devices are used for commercial, military, scientific,
consumer and pleasure purposes
A high degree of reliability is an absolute necessity
There is too much at stake in terms of cost and
human life to take any significant risks with devices
that might not function properly when needed most
RELIABILITY
First we will deal with
Foundation of Reliability
Probability and Statistics
Then
In-depth reliability engineering
considerations
Objective
To give an overview of
The reliability issues
Techniques
Tasks
Limitations associated with
The design
Manufacture
Operation
Probability & Statistics
Pragmatic approach
Discussion will include
Shape of failure distributions
Estimating parameters
RELIABILITY
Focus on
Preventing failures through
Robust design and manufacturing practices
Based on
Life cycle loads and stresses
Product architecture
Potential defects and failure mechanisms
RELIABILITY
There are 2 strands in Reliability
FAULT AVOIDANCE
Conservative Design
High Quality Components
FAULT TOLERANCE
Assumes despite all efforts components will fail
USE REDUNDANCY
Price in efficiency
RELIABILITY
The Characteristics of a Product are:
Estimated in Design
Controlled in Manufacturing
Measured during Testing
Sustained in the Field
RELIABILITY
[Figure: Performance characteristics of an engineering product. Random input variables produce output performance characteristics that are continuous {x}, discrete {y} or binary {z}.]
Quality Production
The Quality of a Product
Performance characteristics may be
Continuous, e.g. fuel consumption.
These are objective: they can be accurately established
by independent measurements and do not depend
on the opinion of an individual
Discrete, e.g. visual appeal, body style.
These are subjective, based on some scale such as
(5) excellent, (4) good, ...
Binary: based on some feature that the product
does or does not possess, e.g. presence or absence
of a sun-roof
PERFORMANCE CHARACTERISTICS FOR A FAMILY CAR
Continuous {x}                 Discrete {y}                     Binary {z}
Urban fuel consumption         Visual appeal of body and style  Leaded/unleaded petrol?
Time from 0 to 60 m.p.h.       Visual appeal of interior        Starts first time?
Braking distance at 60 m.p.h.  Comfort of ride                  Central locking?
Engine noise level             Range of exterior colours        Quad stereo?
% CO2 in exhaust               Range of interior colours        Tinted glass?
Maximum speed                                                   Power assisted steering?
                                                                Sun-roof?
Specification
The manufacturer of an engineering product will need to
produce a specification that
defines the product for a potential customer.
It consists of
a set of target values x1T, x2T, ..., i.e.
a target vector {xT}
Characteristic                 Target value
Urban fuel consumption         40 miles per gallon
Maximum speed                  100 miles per hour
Time from 0 to 60 m.p.h.       13 seconds
Braking distance at 60 m.p.h.  180 feet
Engine noise level             70 dB
% CO2 in exhaust               1%
Random effects X1T + 1
Tolerance limits Tolerance vector {}
RELIABILITY
Target performance {xT}
Tolerance {}
Reject if :
the actual performance {x} lies outside {xT }
Both manufacturer and customer need to know how the actual
random variations in a given performance characteristic {x} for a
given product, across different individual units and under different
environmental and operating conditions, compare with the target and
tolerance values {xT} and { }.
RELIABILITY
A statistical analysis of the variations is required.
This involves calculating:
Mean
Standard deviation
Probability
Probability density function
To do this we require N sample values of the
performance characteristic {x} specified by xi where
i = 1, 2, ..., N
RELIABILITY
Mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$
Root mean square: $x_{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$
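The three statistics on this slide can be checked numerically. The sketch below is only an illustration: the function name `summary_statistics` and the fuel-consumption samples are hypothetical, and it uses the 1/N (population) form shown on the slide.

```python
import math

def summary_statistics(samples):
    """Return (mean, standard deviation, RMS) using the 1/N definitions."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    std_dev = math.sqrt(variance)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return mean, std_dev, rms

# Hypothetical urban fuel consumption measurements (miles per gallon)
fuel_mpg = [38.0, 40.0, 42.0, 40.0]
mean, std_dev, rms = summary_statistics(fuel_mpg)
print(mean, std_dev, rms)
```

With these definitions the three quantities are linked by x_RMS^2 = mean^2 + sigma^2, which is a quick consistency check on any hand calculation.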
RELIABILITY
System Failure Rate: The Bathtub Curve
[Figure: failure rate plotted against time, showing the infant mortality period, the operating period and the wear-out period.]
THE BATHTUB CURVE
The infant mortality period or debugging stage
Failures typically caused by manufacturing flaws
Damage received in transit
Damage received in handling
The operating period
Smaller failure rate
Failure rate tends to remain constant
Failure typically due only to chance
Failure generally results from severe, unpredictable and
usually unavoidable stresses that arise from environmental
factors such as vibrations, temperature, shock and pressure.
THE BATHTUB CURVE
The wear-out period
Failure rate increases rapidly
Failure as a result of gradual degradation of some
property of the system essential to proper operation
The degradation may occur from causes such as
fatigue, creep, corrosion and abrasion
We are most interested in the period between infant
mortality and wear-out.
In this period we have
Constant failure rate
Exponential failure time density function
RELIABILITY
Failure Rate
Assume at t = 0 we have N0 articles
At time t we observe that NS(t) have survived
The number failed is NF(t)
So
N0 = NS(t) + NF (t)
And
R(t) = NS(t) / N0
R(t) is the Reliability as a function of time
RELIABILITY
[Figure: graph of NS against time, starting at N0, with NS(t) marked at t and t + Δt.]
The failure rate is the limit as Δt → 0 of ΔNS/Δt (the gradient at t)
RELIABILITY
The reliability R of a product can be defined
as the probability that the product continues to
meet some specification
The unreliability F of a product can be
defined as the probability that the product fails to
meet the specification
Both reliability and unreliability vary with time
R(t) decreases with time
F(t) increases with time
R(t) + F(t) = 1
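As a minimal sketch of R(t) = NS(t)/N0 and F(t) = 1 - R(t), the following estimates both from a list of observed failure times. The function name and the failure times are hypothetical, and an item is counted as surviving at time t if it fails strictly after t.

```python
def empirical_reliability(failure_times, t):
    """Estimate (R(t), F(t)) from the observed failure times of N0 items."""
    n0 = len(failure_times)
    survivors = sum(1 for ft in failure_times if ft > t)  # NS(t)
    reliability = survivors / n0                          # R(t) = NS(t) / N0
    return reliability, 1.0 - reliability                 # F(t) = 1 - R(t)

# Hypothetical failure times in hours for N0 = 4 items
times = [100.0, 200.0, 300.0, 400.0]
r_250, f_250 = empirical_reliability(times, 250.0)
print(r_250, f_250)
```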
PRACTICAL RELIABILITY
DEFINITIONS
Non- repairable items
Suppose that N individual items of a given non-repairable
product are placed in service and the times at which failures
occur are recorded during a test interval T
Further assume that all N items fail during T and the ith failure
occurs at time Ti
i.e. Ti is the survival time or up time for the ith failure
The total up time for N failures is therefore
$\sum_{i=1}^{N} T_i$
and the mean time to failure is given by
PRACTICAL RELIABILITY
DEFINITIONS
Mean Time To Fail = Total up time / Number of failures
i.e. $MTTF = \frac{1}{N}\sum_{i=1}^{N} T_i$
Mean Failure Rate = Number of failures / Total up time
i.e. Mean Failure Rate $= \frac{N}{\sum_{i=1}^{N} T_i}$
The mean failure rate is
the reciprocal of MTTF
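A short numerical illustration of these two definitions, assuming all N items have failed during the test as the slide does. The function name and the up times are hypothetical.

```python
def mttf_and_failure_rate(up_times):
    """Return (MTTF, mean failure rate) for N non-repairable items that all failed."""
    total_up_time = sum(up_times)
    n_failures = len(up_times)
    mttf = total_up_time / n_failures
    mean_failure_rate = n_failures / total_up_time
    return mttf, mean_failure_rate

# Hypothetical survival times Ti in hours
up_times = [100.0, 200.0, 300.0, 400.0]
mttf, rate = mttf_and_failure_rate(up_times)
print(mttf, rate)
```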
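REPAIRABLE SYSTEMS
Mean Time To Failure & Mean Time Between Failures
[Figure: timeline of a repairable item alternating between live and under-repair states, marking time to failure (TTF), repair time and time between failures (TBF).]
$MTTF = \frac{1}{N}\sum_{i=1}^{N} ft_i$, where $ft_i$ is the TTF of the ith item
$MTTF = \frac{1}{N_0}\int_0^\infty t\,[N_0 f(t)\,dt] = \int_0^\infty t f(t)\,dt$
or, since the total life for N devices = N × MTTF
and between t and t + Δt the number live is N R(t),
$MTTF = \int_0^\infty R(t)\,dt$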
PRACTICAL RELIABILITY
DEFINITIONS
There are N survivors at time t = 0, N - i at t = Ti, decreasing to zero at time t = T.
The figure shows the probability of survival, i.e. the reliability, Ri = (N - i) / N, decreases from Ri = 1 at t = 0 to Ri = 0 at t = T.
MTTF = Total area under graph
[Figure: step plot of Ri against t, falling in steps of 1/N at T1, T2, ..., TN.]
Quantification of Reliability
Reliability: the probability that a system/component
works
Availability: the probability that a system/component
works on demand
Availability at time t: the probability that a system /
component works on demand at time t
Availability: the fraction of the total time that a system /
component can perform its required function
Quantification of Reliability
Unavailability = 1 - Availability
Unreliability = 1 - Reliability
For the failure process let
F(t) = P[a given component fails in [0,t)]
The corresponding probability density function f(t) is therefore
$f(t) = \frac{dF(t)}{dt}$
So
f(t)dt = P[a component fails in time period [t, t + dt)]
$F(t) = \int_0^t f(t')\,dt'$
Quantification of Reliability
Transition to the failed state can be characterised by the
conditional failure rate h(t).
This function is sometimes referred to as the hazard rate or
hazard function.
This parameter is a measure of the rate at which failures
occur taking into account the size of the population with the
potential to fail, i.e. those that are still functioning at time t:
So
h(t)dt = P[a component fails in time period [t, t + dt) | it has
not failed in [0, t)]
Quantification of Reliability
For conditional probabilities we can write:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Since h(t)dt is a conditional probability we can define events A
and B as follows by comparing this with the equation above:
A: component fails between t and t + dt
B: component has not failed in [0, t)
With events defined like this $P(A \cap B) = P(A)$ since if the
component fails between t and t + dt it is implicit that it cannot
have failed before time t
Quantification of Reliability
$h(t)\,dt = \frac{P[\text{component fails between } t \text{ and } t+dt]}{P[\text{component does not fail in } [0, t)]} = \frac{f(t)\,dt}{1 - F(t)}$
$\int_0^t h(t')\,dt' = \int_0^t \frac{f(t')}{1 - F(t')}\,dt'$
Integrating gives
$\int_0^t h(t')\,dt' = -\ln[1 - F(t)]$
$F(t) = 1 - \exp\left[-\int_0^t h(t')\,dt'\right]$
Quantification of Reliability
$F(t) = 1 - \exp\left[-\int_0^t h(t')\,dt'\right]$
If h(t), the failure rate or hazard rate for a general system or
component is plotted against time we get the bathtub curve.
In the useful life period $h(t) = \lambda$ is constant.
So after integration
$F(t) = 1 - e^{-\lambda t}$
And the reliability, the probability that the component works
continuously over (0, t] is the exponential function
$R(t) = e^{-\lambda t}$
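The relationship between the hazard rate and F(t) can be checked numerically. The sketch below integrates an arbitrary hazard function with the trapezoidal rule and compares the result with the closed form for a constant hazard. The function name, the step count and the failure rate value are assumptions made for illustration.

```python
import math

def unreliability_from_hazard(hazard, t, steps=1000):
    """F(t) = 1 - exp(-integral of h from 0 to t), using the trapezoidal rule."""
    dt = t / steps
    integral = 0.0
    for k in range(steps):
        t0 = k * dt
        t1 = t0 + dt
        integral += 0.5 * (hazard(t0) + hazard(t1)) * dt
    return 1.0 - math.exp(-integral)

lam = 1e-4            # hypothetical constant failure rate, per hour
mission_time = 1000.0  # hours

f_numeric = unreliability_from_hazard(lambda t: lam, mission_time)
f_closed = 1.0 - math.exp(-lam * mission_time)
print(f_numeric, f_closed, 1.0 - f_closed)  # last value is R(t)
```

The same function accepts a time-varying hazard, for example a rising wear-out rate, where no simple closed form may be available.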
System Mean Time to Failure
When system failure can be tolerated and repair can
be instigated an important measure of the system
performance is the system availability
$A = \frac{MTBF}{MTBF + MTTR}$
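A one-line calculation makes the availability formula concrete. The function name and the MTBF and MTTR figures below are hypothetical.

```python
def availability(mtbf, mttr):
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# Hypothetical values: 990 hours between failures, 10 hours to repair
a = availability(990.0, 10.0)
print(a, 1.0 - a)  # availability and unavailability
```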
QUANTIFIED RISK ASSESSMENT
SYSTEM LIFE CYCLE
[Figure: reliability tasks across the system life cycle. Phases: system definition, concept design, detail design, manufacturing, operating. Tasks shown: establish reliability requirements; set provisional reliability/availability targets; prepare reliability specification; perform global safety/availability assessment; identify critical areas and components; confirm/review targets; FMEA/FTA of critical systems and components; carry out detailed system reliability assessment; prepare safety case; prepare and implement reliability specifications; review reliability database; review reliability demonstrations; audit reliability performance; collect and analyse reliability, test and maintenance data; assess reliability impact of modifications.]
RELIABILITY
CRITICAL FAILURES
Where failure causes total loss of function
MAJOR FAILURES
Where failure causes major loss of function but
the product can still be used to some extent
MINOR FAILURES
Where failure leaves the product still able to be
used to perform the major function but with the
loss of some convenience function
RELIABILITY NETWORKS
A reliability network is a representation of the
reliability dependencies between components of a
system
Dependencies are used in such a way as to
represent the means by which the system will
function
Such a network can be used to assess the
probability of failure of a system
TOPOLOGICAL RELIABILITY
The functional behaviour of most systems can be
characterised by a network diagram
Nodes denote the subsystems
Branches of the network represent the functional
relationship between these subsystems
Example
A high voltage supply system consisting of two transmitters A and B and a power supply.
For the system to work the power supply and at least one of the transmitters must operate
[Figure: network between nodes C and D containing the Power Supply, Transmitter A and Transmitter B.]
A path must exist between C and D for the system to work.
RELIABILITY NETWORKS
The question of what constitutes proper operation or
proper function for a particular type of equipment is
usually specific to the equipment
Rather than attempt to suggest a general definition
for proper function we assume that the appropriate
definition for a device of interest has been specified
RELIABILITY NETWORKS
We can represent the functional status of the
device as
$\phi = 1$ if the device functions properly
$\phi = 0$ if the device has failed
Note that this representation is intentionally binary.
We assume that the status of the equipment of interest is either
satisfactory or failed.
There are many types of equipment where one or more de-rated
states are possible and methods have been developed to cope.
RELIABILITY NETWORKS
We presume that most equipment is
comprised of components and that the status
of the device is determined by the status of the
components.
Let n be the number of components that make
up the device and define the component status
variables xi as
$x_i = 1$ if component i functions properly
$x_i = 0$ if component i has failed
RELIABILITY NETWORKS
The set of n components that make up the device is
represented by the component status vector:
x = {x1, x2, ..., xn}
The dependence of the device status on the
component status is represented by the function
$\phi = \phi(x)$
referred to as a system structure function, a system status function,
or simply a structure
RELIABILITY NETWORKS
There are 4 generic types of structural
relationships between a device and its
components.
1. Series
2. Parallel
3. k out of n
4. All others
RELIABILITY NETWORKS
SERIES SYSTEMS
Definition
A series system is one in which all components
must function properly in order for the system
to function properly.
[Figure: reliability block diagram of a series system.]
Conceptual analogue: series electrical circuit
TOPOLOGICAL RELIABILITY
Example 2
In an aircraft electronics system consisting of
a sensor subsystem,
guidance subsystem,
computer subsystem and
fire control subsystem
the system can only operate successfully if these four subsystems operate
[Figure: Sensor, Guidance, Computer and Fire Control blocks connected in series.]
NOTE: The figure only depicts the functional relationship required for system
operation and does not necessarily mean that these subsystems are electrically
wired together in series.
Examples where components are not physically connected:
The set of legs on a 3-legged stool. The set of tyres on a car.
RELIABILITY NETWORKS
SERIES SYSTEMS
For the series structure the requirement that all
components must function implies that an
algebraic form for the structure function is:
$\phi(x) = \prod_{i=1}^{n} x_i$
Examples
x1 = x2 = 1, x3 = 0 results in $\phi(x) = 0$
x1 = x2 = x3 = 0 results in $\phi(x) = 0$
x1 = x2 = x3 = 1 results in $\phi(x) = 1$
Only the functioning of all components results in system function
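The series structure function is simple enough to evaluate directly. The sketch below reproduces the three example cases from the slide; the function name `phi_series` is chosen here for illustration.

```python
import math

def phi_series(x):
    """Series structure function: product of the component status values (0 or 1)."""
    return math.prod(x)

print(phi_series([1, 1, 0]))
print(phi_series([0, 0, 0]))
print(phi_series([1, 1, 1]))
```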
RELIABILITY NETWORKS
PARALLEL SYSTEMS
Definition
A parallel system is one in which the proper function of any one
component is sufficient for the system to function properly.
[Figure: reliability block diagram of a parallel system, with components 1, 2 and 3 in parallel.]
Conceptual analogue: parallel electrical circuit
RELIABILITY NETWORKS
PARALLEL SYSTEMS
[Figure: components 1, 2 and 3 in parallel.]
Similar to the series case, the structure function for
the parallel system may be expressed as:
$\phi(x) = 1 - \prod_{i=1}^{n} (1 - x_i)$
Examples
x1 = x2 = 1, x3 = 0 results in $\phi(x) = 1$
x1 = 1, x2 = x3 = 0 results in $\phi(x) = 1$
x1 = x2 = x3 = 0 results in $\phi(x) = 0$
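The parallel structure function can be evaluated in the same way as the series one. This sketch reproduces the three example cases above; the function name `phi_parallel` is chosen here for illustration.

```python
import math

def phi_parallel(x):
    """Parallel structure function: 1 minus the product of (1 - xi)."""
    return 1 - math.prod(1 - xi for xi in x)

print(phi_parallel([1, 1, 0]))
print(phi_parallel([1, 0, 0]))
print(phi_parallel([0, 0, 0]))
```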
RELIABILITY NETWORKS
PARALLEL SYSTEMS
Parallel systems are often referred to as Redundancy
Often, but not always, the parallel components are identical
There are actually several ways in which the redundancy
may be implemented
This diversity can lead to different reliability under different
environmental conditions
A distinction is made between redundancy obtained using a
parallel structure in which all components function simultaneously
(ACTIVE REDUNDANCY) and that obtained using parallel
components of which one functions and the other or others wait
as standby units (STANDBY REDUNDANCY).
RELIABILITY NETWORKS
k-out-of-n SYSTEMS
Definition
A k-out-of-n system is one in which any k of the n
components that comprise the system must function properly
in order for the system to function properly.
[Figure: components 1, 2, 3 and 4 feeding a k-out-of-n block.]
$\phi(x) = 1$ if $\sum_{i=1}^{n} x_i \ge k$, and $\phi(x) = 0$ otherwise
Example cases for a 3-out-of-4 system
x1 = x2 = x3 = 1, x4 = 0 results in $\phi(x) = 1$
x1 = x2 = 1, x3 = x4 = 0 results in $\phi(x) = 0$
x1 = x2 = x3 = 0, x4 = 1 results in $\phi(x) = 0$
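The k-out-of-n rule reduces to counting working components. This sketch reproduces the 3-out-of-4 cases above and also shows that series and parallel structures are the special cases k = n and k = 1. The function name `phi_k_out_of_n` is chosen here for illustration.

```python
def phi_k_out_of_n(x, k):
    """k-out-of-n structure function: 1 if at least k components work, else 0."""
    return 1 if sum(x) >= k else 0

print(phi_k_out_of_n([1, 1, 1, 0], 3))
print(phi_k_out_of_n([1, 1, 0, 0], 3))
print(phi_k_out_of_n([0, 0, 0, 1], 3))
```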
TOPOLOGICAL RELIABILITY
Example 3
In a computer system with a computer, a controller, and three
memory units suppose that the system can only satisfy its
operational requirements if at least two of the three memory units
are operable and both the computer and the controller are operable.
The 4 branches represent the 4 possible ways we can obtain system operation.
[Figure: Computer and Controller in series with four parallel branches of memory units: Units 1 and 2; Units 1 and 3; Units 2 and 3; Units 1, 2 and 3.]
[Figure: Equivalent Computer Network.]
TOPOLOGICAL RELIABILITY
Communication System diagram
[Figure: communication system with two Antenna-Receiver-Converter chains, one Pulse Shaping Unit and two Teleprinters.]
At least 1 of the two Antenna Receiver Converters must work
At least 1 of the two Teleprinters must work
The Pulse shaper must work
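The three requirements above translate into a combination of series and parallel blocks. The sketch below is an illustration under stated assumptions: all component reliabilities are hypothetical, failures are assumed independent, a series block is evaluated as the product of reliabilities, and a parallel block as one minus the product of unreliabilities.

```python
def series(*reliabilities):
    """Reliability of independent blocks that must all work."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(*reliabilities):
    """Reliability of independent blocks of which at least one must work."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail

# Hypothetical component reliabilities
r_antenna, r_receiver, r_converter = 0.95, 0.98, 0.97
r_pulse_shaper, r_teleprinter = 0.99, 0.90

r_chain = series(r_antenna, r_receiver, r_converter)
r_system = series(parallel(r_chain, r_chain),
                  r_pulse_shaper,
                  parallel(r_teleprinter, r_teleprinter))
print(r_chain, r_system)
```

The pulse shaper has no redundancy, so its reliability caps that of the whole system however many antenna chains or teleprinters are added.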
TOPOLOGICAL RELIABILITY
In general suppose the topological or network representation
of a system consists of n nodes and define
$R(N_1, \ldots, N_m)$ = probability that nodes number $N_1, \ldots, N_m$ are
operating and the other n - m nodes are not operating
Then the probability that exactly m nodes are simultaneously
operating is given by
$R_m = \sum_{N_1} \cdots \sum_{N_m} R(N_1, \ldots, N_m)$
where the sum is taken over all positive integers $N_1, \ldots, N_m$
such that $n \ge N_1 > N_2 > \cdots > N_m \ge 1$
Thus the probability that at least k nodes are operating is given by
$\sum_{m=k}^{n} R_m$
Reliability of Systems
1 Series systems
[Figure: elements R1, R2, R3, R4, ..., Ri, ..., Rm connected in series.]
System of m elements in series with individual reliabilities
R1, R2, ..., Ri, ..., Rm respectively.
The system will only survive if every element survives; if
one element fails the system fails.
Assume the reliability of each element is independent
of the reliability of the other elements
The probability that the system survives is the probability
that element 1 survives and element 2 survives and
element 3 survives etc.
The system reliability is the product of the element
reliabilities.
$R_{syst} = R_1 R_2 R_3 \cdots R_i \cdots R_m$
Reliability of Systems
1 Series systems
If we assume a constant failure rate for the elements
then since
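$R_i = e^{-\lambda_i t}$
$R_{syst} = e^{-\lambda_1 t} e^{-\lambda_2 t} \cdots e^{-\lambda_i t} \cdots e^{-\lambda_m t}$
So if $\lambda_{syst}$ is the overall system failure rate
$R_{syst} = e^{-\lambda_{syst} t} = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_i + \cdots + \lambda_m) t}$
$\lambda_{syst} = \lambda_1 + \lambda_2 + \cdots + \lambda_i + \cdots + \lambda_m$
The failure rate of a series network is the sum of the
individual element failure rates, so the number of elements
should be kept to a minimum to maximise reliability.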
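A quick numerical check shows that summing the failure rates gives the same answer as multiplying the element reliabilities. The failure rates, the mission time and the function name below are hypothetical, and constant failure rates and independent elements are assumed, as on the slide.

```python
import math

def series_reliability(failure_rates, t):
    """Return (system failure rate, system reliability at time t) for a series system."""
    lam_syst = sum(failure_rates)
    return lam_syst, math.exp(-lam_syst * t)

# Hypothetical element failure rates (per hour) and mission time (hours)
rates = [1e-5, 2e-5, 5e-6]
mission_time = 1000.0

lam_syst, r_syst = series_reliability(rates, mission_time)
r_product = math.prod(math.exp(-lam * mission_time) for lam in rates)
print(lam_syst, r_syst, r_product)
```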
Reliability of Systems
1 Series systems
Unreliability of Series System with Small Failure rates
Protective systems have element and system UNRELIABILITIES F
that are very small. The corresponding system reliabilities are
therefore very close to 1, for example 0.9999 may be typical. Then the
calculation of $R_{syst} = R_1 R_2 R_3 \cdots R_i \cdots R_m$ may be arithmetically unwieldy
and the alternative equation involving unreliabilities may be more
useful since
$R_{syst} = 1 - F_{syst}$ and $R_i = 1 - F_i$
We have
$1 - F_{syst} = (1 - F_1)(1 - F_2) \cdots (1 - F_i) \cdots (1 - F_m)$
$= 1 - (F_1 + F_2 + \cdots + F_i + \cdots + F_m)$ + terms involving products of Fs
If the individual $F_i$ are small, i.e. $F_i \ll 1$, the terms involving products
of Fs can be neglected giving the approximate equation
$F_{syst} \approx F_1 + F_2 + \cdots + F_i + \cdots + F_m$
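The size of the neglected product terms can be seen by comparing the exact and approximate expressions. The element unreliabilities and function names below are hypothetical.

```python
import math

def series_unreliability_exact(unreliabilities):
    """Exact series-system unreliability: 1 minus the product of (1 - Fi)."""
    return 1.0 - math.prod(1.0 - f for f in unreliabilities)

def series_unreliability_approx(unreliabilities):
    """Approximation for small Fi: sum of the element unreliabilities."""
    return sum(unreliabilities)

# Hypothetical element unreliabilities, all much less than 1
f_elements = [1e-4, 2e-4, 5e-5]
f_exact = series_unreliability_exact(f_elements)
f_approx = series_unreliability_approx(f_elements)
print(f_exact, f_approx, f_approx - f_exact)
```

For small Fi the approximation slightly overestimates the unreliability, so it errs on the conservative side.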
Reliability of Systems
Parallel Systems
An overall system consisting of n individual
elements or systems in parallel with
individual unreliabilities F1, F2, ..., Fj, ..., Fn
Only one individual element is necessary
to meet the functional requirements of the
overall system
The remaining elements increase the
reliability of the system
THIS IS CALLED REDUNDANCY.
Failure only if ALL the elements fail
The unreliability of a parallel system
$F_{syst} = F_1 F_2 \cdots F_j \cdots F_n$
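A short sketch of the parallel rule, assuming the element failures are independent. The unreliability values and the function name are hypothetical.

```python
import math

def parallel_unreliability(unreliabilities):
    """Parallel system fails only if every element fails: product of the Fj."""
    return math.prod(unreliabilities)

# Three hypothetical redundant elements, each with unreliability 0.01
f_syst = parallel_unreliability([1e-2, 1e-2, 1e-2])
print(f_syst, 1.0 - f_syst)  # system unreliability and reliability
```

In practice common-cause failures can break the independence assumption, so the product is best read as an optimistic bound.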
Reliability of Systems
Voting Systems
Majority voting systems are used to protect
hazardous plant and processes and have
applications in the chemical, nuclear and aerospace
industries. Diagram shows a typical system with 2
out of 4 voting with initiators A, B, C, D.
[Figure: initiators A, B, C and D, each comparing a process parameter input with a trip setting, feed a 2oo4 voting element that actuates the shut-down system.]
Reliability of Systems
Voting Systems
Suppose R and F are the reliability and unreliability of
the individual initiators.
The overall initiation system fails to protect the plant if
either all 4 initiators fail
or any 3 initiators fail
If 2 or fewer initiators fail there are still sufficient left to
trip the plant
Since the Fs are normally small then the rare events
approximation is valid and the overall system
unreliability is the sum of the following probabilities
Reliability of Systems
Voting Systems
FINIT = Probability that A and B and C and D fail
+
Probability that A and B and C fail
+
Probability that B and C and D fail
+
Probability that A and C and D fail
+
Probability that A and B and D fail
Each of the terms is the product of individual
unreliabilities and reliabilities.
Reliability of Systems
Voting Systems
$F_{INIT} = F^4 + F^3R + F^3R + F^3R + F^3R = F^4 + 4F^3R = F^3(F + 4R)$
This result can be obtained from the binomial
expansion of $(F + R)^4$
$(F + R)^4 = F^4 + 4F^3R + 6F^2R^2 + 4FR^3 + R^4$
The first term $F^4$ represents the probability of all 4
initiators failing and the second the total probability
of 3 failing. If $R \approx 1$ and $F \ll 1$ then $F_{INIT} \approx 4F^3$
The unreliability of the complete protective system is
$F_{SYST} \approx F_{INIT} + F_{VOTING} + F_{SHUT-DOWN}$
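The exact and approximate expressions for the 2oo4 initiator unreliability can be compared numerically. The single-initiator unreliability and the function name below are hypothetical.

```python
def f_init_2oo4(f):
    """Return (exact, approximate) unreliability of 2oo4 initiators.

    The system fails to trip when 3 or 4 of the 4 initiators fail.
    """
    r = 1.0 - f
    exact = f ** 4 + 4.0 * f ** 3 * r
    approx = 4.0 * f ** 3
    return exact, approx

# Hypothetical single-initiator unreliability
exact, approx = f_init_2oo4(0.01)
print(exact, approx)
```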
Reliability of Systems
Majority Voting Systems
In a majority voting system there are n trip channels
and the plant is tripped if m ($n \ge m$) indicate that the
plant should be tripped.
Such a system is referred to as m out of n or m oo n
The binomial distribution can be used to calculate
overall failure probabilities.
Consider the jth term in the binomial expansion of
$(F + R)^n$ where R and F are the single channel
reliability and unreliability and n is the total number
of channels.
This is ${}^{n}C_j F^j R^{n-j}$ and is the probability that
j channels fail, i.e. (n - j) channels survive.
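The binomial term can be evaluated with the standard library. The sketch below assumes independent, identical channels; the function name and the value of F are hypothetical.

```python
from math import comb

def prob_j_failures(n, j, f):
    """Probability that exactly j of n independent channels fail."""
    r = 1.0 - f
    return comb(n, j) * f ** j * r ** (n - j)

# Hypothetical single-channel unreliability for a 3-channel system
f_channel = 0.1
terms = [prob_j_failures(3, j, f_channel) for j in range(4)]
print(terms, sum(terms))
```

The terms for j = 0 to n sum to 1, which mirrors the expansion of (F + R)^n with F + R = 1.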
Reliability of Systems
Fail-Safe & Fail-Danger
Fail-Danger failure
Any system or component failure that prevents, or
tends to prevent, the plant being tripped when a
potentially hazardous fault condition occurs.
Example A pressure switch failed to open when the
pressure exceeded the trip pressure
Fail-Safe failure
Any system or component failure that
produces a plant trip when a plant trip is not
required.
Example A pressure switch opened when the
pressure was below the trip pressure
Reliability of Systems
Fail-Safe & Fail-Danger
A fail-danger failure is a very serious occurrence
Fail-safe failures are less serious but cause loss of
production and confidence in the trip system
Fail-danger and Fail-safe failures will generally
have different failure rates and so different failure
probabilities
Detailed information on the failure rates associated
with all possible modes of failure of trip equipment
is not always available
We may have to assume both rates are equal to the
average failure rate
Reliability of Systems
Fail-Safe & Fail-Danger
Suppose we wish to calculate overall fail-danger
and fail-safe probabilities for a system with two out of three
voting, i.e. 2 oo 3 where m = 2 and n = 3. These
probabilities can be calculated from the binomial
expansion of $(R + F)^3$, where R and F are the single
channel reliability and unreliability respectively.
So we have $(F + R)^3 = F^3 + 3F^2R + 3FR^2 + R^3$
where $F^3$ represents the probability that all 3
channels fail, $3F^2R$ the probability that 2 channels
fail, $3FR^2$ the probability that 1 channel fails and $R^3$
the probability that no channel fails
Reliability of Systems
Fail-Safe & Fail-Danger
Looking at fail-danger first
if either 2 or 3 channels fail dangerously then
there are correspondingly only 1 or zero channels
left working.
This is insufficient to trip the plant with 2 oo 3
voting and an overall fail danger situation has
occurred.
If $F_D$ is the single channel fail-danger probability
then the overall fail-danger probability is
$P_D = F_D^3 + 3RF_D^2 = F_D^2(3R + F_D)$
In a protective system $R \approx 1$ and $F_D \ll 1$ giving
$P_D \approx 3F_D^2$
Reliability of Systems
Fail-Safe & Fail-Danger
Looking at fail-safe
A fail-safe failure of no channels or only one
channel will not cause a plant trip with 2 oo 3
voting.
A fail-safe failure of two channels will cause an
unnecessary plant trip.
The failure of a third channel is irrelevant because
the plant is tripped by only 2 channels.
The overall fail-safe probability is therefore
$P_S = 3RF_S^2 \approx 3F_S^2$
where FS is the single channel fail-safe probability
Reliability of Systems
Fail-Safe & Fail-Danger
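Overall fail-danger probability
$P_D = {}^{n}C_r F_D^{\,r}$
where r = n - m + 1 and ${}^{n}C_r = n! / \{r!(n-r)!\}$
$F_D^{MAX} = 1 - e^{-\lambda_D T} \approx \lambda_D T$ (if $\lambda_D T \ll 1$)
Overall fail-safe probability
$P_S = {}^{n}C_m F_S^{\,m}$
where ${}^{n}C_m = n! / \{m!(n-m)!\}$
$F_S^{MAX} = 1 - e^{-\lambda_S T} \approx \lambda_S T$ (if $\lambda_S T \ll 1$)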
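The general m oo n expressions can be checked against the 2 oo 3 and 2 oo 4 results obtained earlier. This sketch uses the rare-event forms given above; the function names and the single-channel probabilities are hypothetical.

```python
from math import comb

def fail_danger_probability(m, n, f_d):
    """Approximate overall fail-danger probability for m oo n voting.

    The trip is lost once r = n - m + 1 channels have failed dangerously.
    """
    r = n - m + 1
    return comb(n, r) * f_d ** r

def fail_safe_probability(m, n, f_s):
    """Approximate overall fail-safe (spurious trip) probability for m oo n voting."""
    return comb(n, m) * f_s ** m

# Hypothetical single-channel probabilities
p_d_2oo3 = fail_danger_probability(2, 3, 1e-2)
p_s_2oo3 = fail_safe_probability(2, 3, 1e-2)
p_d_2oo4 = fail_danger_probability(2, 4, 1e-2)
print(p_d_2oo3, p_s_2oo3, p_d_2oo4)
```

For 2 oo 3 both expressions reduce to three times the square of the single-channel probability, and for 2 oo 4 the fail-danger form reduces to four times its cube, matching the earlier slides.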
Reliability of Systems
Fail-Safe & Fail-Danger
FRACTIONAL DEAD TIME FDT
Is related to fail-danger probability
Is the mean proportion of the testing interval T that
the trip system is incapable of protecting the plant
$FDT = \frac{1}{T}\int_0^T F_D(t)\,dt$
FDT is a similar concept to unavailability
$FDT = \frac{1}{r+1}\,{}^{n}C_r F_D^{\,r}$
Majority voting can be implemented with
combinatorial logic.
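The closed form for the fractional dead time can be compared with a direct numerical average over the test interval. The sketch below rests on assumptions that the slides do not state explicitly: the single-channel fail-danger probability is taken to grow linearly as lambda_D times t, the overall fail-danger probability uses the rare-event binomial form, and the F_D in the closed form is taken as its value at the end of the interval. All names and numbers are hypothetical.

```python
from math import comb

def fdt_closed_form(m, n, f_d_max):
    """FDT = nCr * F_D^r / (r + 1), with F_D taken at the end of the test interval."""
    r = n - m + 1
    return comb(n, r) * f_d_max ** r / (r + 1)

def fdt_numeric(m, n, lam_d, test_interval, steps=10000):
    """Average the overall fail-danger probability over [0, T] by the midpoint rule."""
    r = n - m + 1
    dt = test_interval / steps
    total = 0.0
    for k in range(steps):
        t_mid = (k + 0.5) * dt
        total += comb(n, r) * (lam_d * t_mid) ** r * dt
    return total / test_interval

# Hypothetical 2 oo 3 system: fail-danger rate 1e-5 per hour, tested every 1000 hours
lam_d, T = 1e-5, 1000.0
fdt_a = fdt_closed_form(2, 3, lam_d * T)
fdt_b = fdt_numeric(2, 3, lam_d, T)
print(fdt_a, fdt_b)
```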
FAULT TREE , EVENT TREE and
FMECA ANALYSIS
To check for fault propagation one technique is
Failure
Modes,
Effects and
Criticality
Analysis
A full FMECA is hard and expensive. Take every
component, wire and connector and think of every possible
fault. Consider the effects of all of these: are they single
point failures? Can they propagate? Document the
results.
An FMECA is a development from an FMEA (Failure
Modes and Effects Analysis)
FMEA and FMECA are bottom-up analyses. The alternative
approach is a fault tree analysis.
FAULT TREE , EVENT TREE and
FMECA ANALYSIS
Event trees are
Encountered frequently in the analysis of
events including human activities that can
lead to disasters or undesirable events
Sometimes called Cause - Consequence
analysis
Used more frequently in Safety studies
EVENT TREE
Consider the following example of a fire alarm system.
Ideally if there is a fire then
The alarm goes off.
A sprinkler system extinguishes the fire.
In each case there is a human standby:
if either the alarm or the sprinkler system fails,
the human operator can operate either or both
This can be represented by the event tree in the figure
EVENT TREE
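[Figure: event tree for the fire alarm system]
Branch probabilities
Fire starts: YES 5 × 10^-4, NO 0.9995
Alarm functions: YES 0.999, NO 10^-3
Operator notices alarm malfunction: YES 0.99, NO 10^-2
Sprinkler system functions: YES 0.998, NO 2 × 10^-3
Operator notices sprinkler malfunction (alarm functioned): YES 0.9, NO 0.1
Operator notices sprinkler malfunction (after alarm failure): YES 0.999, NO 10^-3
Outcomes
Alarm functions, sprinkler functions: Fire Suppressed, 4.98 × 10^-4
Alarm functions, sprinkler fails, operator notices: Fire Suppressed, 8.99 × 10^-7
Alarm functions, sprinkler fails, operator does not notice: Fire Spreads, 9.99 × 10^-8
Alarm fails, operator notices, sprinkler functions: Fire Suppressed, 4.9 × 10^-7
Alarm fails, operator notices, sprinkler fails, operator notices: Fire Suppressed, 9.9 × 10^-10
Alarm fails, operator notices, sprinkler fails, operator does not notice: Fire Spreads, 9.9 × 10^-13
Alarm fails, operator does not notice: Fire Spreads, 5 × 10^-9
No fire starts: NO FIRE, 0.9995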
EVENT TREE
Notice that of all the possible outcomes only three are that the fire
spreads
The possible sequences of events that can lead to this undesirable
event can now be identified from these outcomes.
The alarm fails to function and the operator fails to notice and
take action in time
The alarm functions but the sprinkler fails to function and the
operator fails to notice and take action in time
The alarm fails to function, and the operator notices, but the
sprinkler fails to function, and the operator fails to notice
If sufficient data exist to estimate probabilities, the likelihood of the
various outcomes can be obtained
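The outcome probabilities are simply the products of the branch probabilities along each path. The Python sketch below multiplies them out, using the branch values read from the event tree figure. The variable and function names are illustrative, and assigning the 0.999 / 10^-3 pair to the operator's sprinkler check on the alarm-failed branch is an inference from the outcome values, not something the figure text states explicitly.

```python
# Branch probabilities as read from the event tree figure.
P_FIRE = 5e-4               # a fire starts
P_ALARM = 0.999             # alarm functions
P_NOTICE_ALARM = 0.99       # operator notices the alarm malfunction
P_SPRINKLER = 0.998         # sprinkler system functions
P_NOTICE_SPR = 0.9          # operator notices sprinkler malfunction (alarm worked)
P_NOTICE_SPR_ALERT = 0.999  # assumed: same check after an alarm failure


def event_tree_paths():
    """Return (path description, outcome, probability) for every leaf."""
    alarm_fail = 1 - P_ALARM
    spr_fail = 1 - P_SPRINKLER
    op_path = P_FIRE * alarm_fail * P_NOTICE_ALARM  # alarm failed, operator acted
    return [
        ("no fire", "NO FIRE", 1 - P_FIRE),
        ("alarm ok, sprinkler ok", "SUPPRESSED", P_FIRE * P_ALARM * P_SPRINKLER),
        ("alarm ok, sprinkler fails, operator acts", "SUPPRESSED",
         P_FIRE * P_ALARM * spr_fail * P_NOTICE_SPR),
        ("alarm ok, sprinkler fails, operator misses", "SPREADS",
         P_FIRE * P_ALARM * spr_fail * (1 - P_NOTICE_SPR)),
        ("alarm fails, operator acts, sprinkler ok", "SUPPRESSED",
         op_path * P_SPRINKLER),
        ("alarm fails, operator acts, sprinkler fails, operator acts", "SUPPRESSED",
         op_path * spr_fail * P_NOTICE_SPR_ALERT),
        ("alarm fails, operator acts, sprinkler fails, operator misses", "SPREADS",
         op_path * spr_fail * (1 - P_NOTICE_SPR_ALERT)),
        ("alarm fails, operator misses", "SPREADS",
         P_FIRE * alarm_fail * (1 - P_NOTICE_ALARM)),
    ]


def probability_of(outcome: str) -> float:
    """Total probability of all paths ending in the given outcome."""
    return sum(p for _, o, p in event_tree_paths() if o == outcome)


if __name__ == "__main__":
    for description, outcome, p in event_tree_paths():
        print(f"{outcome:10s} {p:.3e}  {description}")
    print("P(fire spreads) =", probability_of("SPREADS"))
```

Because the paths are mutually exclusive and exhaustive, their probabilities should sum to one, which is a useful sanity check on any event tree.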
FAULT TREE
An Example of a Deductive approach.
What can cause this?
Used to identify the causal relationships leading to a
specific system failure mode.
The system failure mode is the TOP event and the
FAULT TREE is developed in branches below this
event showing its causes.
Fault Tree From Logic Expression
T = (abc + f)[(a + d)f](a +be)
[Figure: fault tree for T drawn from the expression above, with basic events a, b, c, d, e, f and intermediate events T1 to T6]
Fault Tree from Logic Expression
Simplifying the expression:
T = (abc + f)[(a + d)f](a +be)
= (abc +f)(af + df)(a + be)
= abcf + af + abcdf + adf + abcef + abef + abcdef + bdef
Using XX = X
= abcf(1+d+e+de) + af (1+d+be) + bdef
Using (1 + X) = 1
= abcf + af + bdef
= af (bc + 1) + bdef
= af + bdef
= f(a +bde)
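A Boolean simplification like this one can be checked by brute force, since six basic events give only 64 combinations. The following Python sketch compares the original and simplified expressions over the full truth table; the function names are illustrative.

```python
from itertools import product


def top_event_original(a, b, c, d, e, f):
    """T = (abc + f)[(a + d)f](a + be), with + as OR and product as AND."""
    return ((a and b and c) or f) and ((a or d) and f) and (a or (b and e))


def top_event_simplified(a, b, c, d, e, f):
    """T = f(a + bde)."""
    return f and (a or (b and d and e))


def expressions_agree() -> bool:
    """True if both forms give the same result for all 2**6 input combinations."""
    return all(
        bool(top_event_original(*bits)) == bool(top_event_simplified(*bits))
        for bits in product([False, True], repeat=6)
    )


if __name__ == "__main__":
    print("simplification verified:", expressions_agree())
```

The simplified form af + bdef also shows that event c plays no part in the top event.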
QUALITY DESIGN AND QUALIFICATION
TESTING ACTIVITIES
Design testing refers to
laboratory tests
on computer
and / or
prototype models
to prove that the design is capable of
meeting the quality specification
QUALITY DESIGN AND QUALIFICATION
Qualification testing refers to
field testing of
pre-production models
and
production models
involving
all performance characteristics
over the full range of relevant
environmental variables
to further verify that the specification can
be met.
DESIGN FOR RELIABILITY
Objective
To design a given product or system which
meets the target failure rate $\lambda_T$
under the environmental conditions
specified.
It is assumed that
all components and elements are operating
in the useful life region where failure rate is
constant with time.
DESIGN FOR RELIABILITY
General principles to be observed.
a) Element / component selection
Only elements / components with well
established failure rate data / models should be
used
Some technologies are inherently more reliable than
others.
e.g.
Solid state switching devices are more reliable
than electromechanical reed relays
Inductive displacement transducers are more
reliable than the resistive potentiometer type.
DESIGN FOR RELIABILITY
b) De-rating
Stress (x) was defined as a variable which, when applied
to an element or component, tends to increase failure
rate.
e.g.
mechanical stress
voltage
Strength (y) was defined as any property of the
element or component which resists the applied
stress
e.g.
elastic limit
rated voltage
DESIGN FOR RELIABILITY
To reduce failure rate
strength should exceed stress by an
adequate Safety Margin
$SM = \dfrac{\bar{y} - \bar{x}}{\sqrt{\sigma_x^2 + \sigma_y^2}}$
In a mechanical element SM > 5.0
In an electronic circuit the voltage
Stress Ratio SR should be kept below
0.7
Stress Ratio $SR = \dfrac{x}{y}$
DESIGN FOR RELIABILITY
e) Redundancy
The use of several identical elements /
systems connected in parallel increases the
reliability of the overall system.
Redundancy should be considered in situations
where either the complete system or certain
elements of the system have too high a failure
rate.
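To give a feel for how much parallel redundancy helps, the Python sketch below uses the standard result for active redundancy with independent, identical elements, where the system works if at least one element works: $R_{sys} = 1 - (1 - R)^n$. The independence assumption and the function names are illustrative assumptions; common mode failures are not modelled here.

```python
def parallel_reliability(r: float, n: int) -> float:
    """Reliability of n independent identical elements in active parallel."""
    return 1.0 - (1.0 - r) ** n


def units_needed(r: float, target: float, max_units: int = 100) -> int:
    """Smallest number of parallel elements whose reliability meets the target."""
    if not (0.0 < r < 1.0 and 0.0 < target < 1.0):
        raise ValueError("r and target must lie strictly between 0 and 1")
    for n in range(1, max_units + 1):
        if parallel_reliability(r, n) >= target:
            return n
    raise ValueError("target not reachable within max_units")


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "elements:", parallel_reliability(0.85, n))
    print("elements needed for 0.99:", units_needed(0.85, 0.99))
```

Each added element removes a smaller slice of the remaining unreliability, so the gain falls off quickly as more elements are added.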
DESIGN FOR RELIABILITY
If the probability of common mode failure limits
the reliability of the overall system
equipment diversity should be considered
Here a common function is carried out by two
systems in parallel
but
with each system made up of
different elements
with different operating principles
e.g.
A temperature measurement device made up of
two subsystems in parallel
one electronic
one pneumatic
High Reliability Design
The system designer may consider component
redundancy
ADVANTAGES of redundancy
The quickest solution if time is of prime importance
The easiest solution, if the component is already
designed
The cheapest solution, if the component is
economical in comparison with the cost of redesign
The only solution, if the reliability requirement is
beyond the state of the art
High Reliability Design
DISADVANTAGES of redundancy
Too expensive, if the components are costly
Exceed the limitations on size and weight,
particularly in satellites
Exceed the power limitations, particularly in active
redundancy
Attenuate the input signal, requiring additional
amplifiers which increase complexity
Require sensing and switching circuitry so complex
as to offset the advantage of redundancy
Exercises
1) Discuss, giving examples, the methods, including procurement and testing procedures,
used by manufacturers to ensure the reliability of a product.
2) Discuss the differences in reliability required in systems such as consumer products,
trains, aeroplanes and satellites. What value would you assign to the overall failure rate of
each of these systems?
3) What do you understand by the reliability of a system? Discuss some practical ways to
assign a quantitative value to the reliability of a system.
4) Draw a fault tree for the lighting system in a car and hence derive a logic equation for
the failure of the headlights.
5) Discuss the concepts of fail-safe and fail-danger. Why is the single unit probability for
fail-safe and fail-danger often assumed to be the same? Explain why, in spite of this
assumption, the overall fail-safe probability of a complex system will be different from the
overall fail-danger probability.
Problems
1) The figure shows a protective system based on temperature measurement. The system is to have a
maximum fail-danger probability not exceeding $8 \times 10^{-3}$ and a maximum fail-safe probability not exceeding
$5 \times 10^{-2}$. The system is tested and proved to be working correctly at three-week intervals. Annual fail-safe
and fail-danger failure rates for each component are:
Thermocouple: $\lambda_S = \lambda_D = 0.5$
Thermocouple input trip amplifier/comparator: $\lambda_S = \lambda_D = 0.1$
m out of n voting element: $\lambda_S = \lambda_D = 0.05$
Logic operated switch: $\lambda_S = \lambda_D = 0.1$
Solenoid valve: $\lambda_S = \lambda_D = 0.1$
Trip valve: $\lambda_S = \lambda_D = 0.1$
Calculate the maximum fail-safe and fail-danger probabilities $P_S$ and $P_D$ for:
a) The high integrity voting equipment, HIVE.
b) The high integrity trip initiator, HITI.
c) The high integrity shutdown system, HISS.
And hence
d) The total system fail-danger probability
e) The total system fail-safe probability
State whether the system meets the design criteria
Problems
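[Figure: protective system, with annual failure rates in brackets]
HITI: three channels, each a thermocouple (0.5) in series with a trip amp/comp (0.1)
HIVE: 2 oo 3 voting element (0.05)
HISS: two channels, each a logic switch (0.1), solenoid valve (0.1) and trip valve (0.1) in series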
Problems
Solution
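i) HIVE
Maximum $F_S = F_D = 1 - e^{-0.05 \times 3/52} \approx 2.8 \times 10^{-3}$
ii) HITI single channel
Maximum $F_S = F_D = 1 - e^{-0.6 \times 3/52} \approx 3.4 \times 10^{-2}$
2 oo 3 voting HITI:
$P_D = 3F_D^2 = 3(3.4 \times 10^{-2})^2 \approx 3.4 \times 10^{-3}$
$P_S = 3F_S^2 \approx 3.4 \times 10^{-3}$
iii) HISS single channel
Maximum $F_S = F_D = 1 - e^{-0.3 \times 3/52} \approx 1.7 \times 10^{-2}$
Two channels in parallel
$P_D = F_D^2 = (1.7 \times 10^{-2})^2 \approx 0.3 \times 10^{-3}$
$P_S = 2F_S \approx 2 \times 1.7 \times 10^{-2} \approx 3.44 \times 10^{-2}$
iv) Total system fail-danger probability $= (P_D)_{HITI} + (P_D)_{HIVE} + (P_D)_{HISS}$
$= 3.4 \times 10^{-3} + 2.8 \times 10^{-3} + 0.3 \times 10^{-3}$
$= 6.5 \times 10^{-3}$
Total system fail-safe probability $= (P_S)_{HITI} + (P_S)_{HIVE} + (P_S)_{HISS}$
$= 3.4 \times 10^{-3} + 2.8 \times 10^{-3} + 34.4 \times 10^{-3}$
$= 40.6 \times 10^{-3}$
$= 4.1 \times 10^{-2}$
The system meets the design specification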
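The arithmetic in this solution can be reproduced with a few lines of Python. The sketch below follows the same steps, assuming (as the solution does) 2 oo 3 voting for the HITI, a single voting element for the HIVE and two parallel channels for the HISS, with a test interval of 3/52 years. Function and variable names are illustrative. Because it carries unrounded intermediate values, its totals can differ from the hand calculation in the second significant figure.

```python
from math import exp

TEST_INTERVAL = 3 / 52  # three weeks, in years


def f_max(rate_per_year: float) -> float:
    """Maximum single-channel failure probability over one test interval."""
    return 1.0 - exp(-rate_per_year * TEST_INTERVAL)


def system_probabilities():
    """Return (total fail-danger, total fail-safe) for the protective system."""
    # HIVE: one voting element, so P_D = P_S = F.
    f_hive = f_max(0.05)
    pd_hive = ps_hive = f_hive

    # HITI: 2 oo 3 voting, channel = thermocouple + trip amplifier in series.
    f_hiti = f_max(0.5 + 0.1)
    pd_hiti = 3 * f_hiti ** 2
    ps_hiti = 3 * f_hiti ** 2

    # HISS: two parallel channels, each switch + solenoid + trip valve in series.
    f_hiss = f_max(0.1 + 0.1 + 0.1)
    pd_hiss = f_hiss ** 2   # both channels must fail dangerously
    ps_hiss = 2 * f_hiss    # either channel failing safe trips the plant

    return pd_hive + pd_hiti + pd_hiss, ps_hive + ps_hiti + ps_hiss


if __name__ == "__main__":
    pd_total, ps_total = system_probabilities()
    print(f"total fail-danger: {pd_total:.2e} (limit 8e-3)")
    print(f"total fail-safe:   {ps_total:.2e} (limit 5e-2)")
    print("meets specification:", pd_total <= 8e-3 and ps_total <= 5e-2)
```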
Problems
2) A taxi owner has 20 cars. Records show each car on average breaks down once every 2 years and that this is
reasonably constant. How many breakdown calls will he have per year?
What is the probability of 1 breakdown in a 3 month period?
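Solution.
Statistical assumptions hold
$\lambda_c = 1/2$ per year (for 1 car)
$\lambda_F = N\lambda_c = 20 \times 1/2 = 10$ breakdown calls per year
3 months = 1/4 year
$F(t) = 1 - e^{-\lambda_F t}$
$F(1/4) = 1 - e^{-10 \times 1/4} = 0.92$
OR
92% chance of at least one failure in 3 months.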
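The solution gives the probability of at least one breakdown. If the question is read as exactly one breakdown, a constant fleet failure rate implies that the number of breakdowns in a period follows a Poisson distribution, and the two answers differ noticeably. The Python sketch below computes both under that assumption; the function names are illustrative.

```python
from math import exp, factorial


def prob_at_least_one(rate: float, period: float) -> float:
    """P(one or more events) = 1 - exp(-rate * period)."""
    return 1.0 - exp(-rate * period)


def prob_exactly(k: int, rate: float, period: float) -> float:
    """Poisson probability of exactly k events in the period."""
    mean = rate * period
    return mean ** k * exp(-mean) / factorial(k)


if __name__ == "__main__":
    fleet_rate = 20 * 0.5  # 20 cars, one breakdown per car every 2 years
    quarter = 0.25         # 3 months in years
    print("at least one:", prob_at_least_one(fleet_rate, quarter))
    print("exactly one: ", prob_exactly(1, fleet_rate, quarter))
```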
Problems
3) A basic guidance and navigation system for a proposed space probe consists of an Inertial Set, a
Canopus Sensor, a Sun Sensor, and a Computer. The reliability for each device is $R_{\text{inertial set}} = 0.95$;
$R_{\text{sun sensor}} = 0.90$; $R_{\text{canopus sensor}} = 0.85$; $R_{\text{computer}} = 0.90$. For the system to operate all four
subsystems must be operating. Due to design constraints the space probe can only contain one
Inertial Set and one Computer. To increase the reliability of the system three Canopus Sensors
and two Sun Sensors are used in hot redundancy.
(i) Draw a reliability block diagram for the system without the redundancy
(ii) Draw a reliability block diagram for the system including the redundancy
Calculate the reliability of the system in each case.
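One possible working for the numerical part is sketched below in Python. It assumes the subsystems fail independently, that the four subsystems are in series, and that each hot-redundant group works if at least one of its sensors works. These are modelling assumptions, and the function names are illustrative.

```python
from math import prod


def series(reliabilities) -> float:
    """All blocks must work: product of the block reliabilities."""
    return prod(reliabilities)


def parallel(r: float, n: int) -> float:
    """At least one of n identical independent blocks must work."""
    return 1.0 - (1.0 - r) ** n


R_INERTIAL, R_SUN, R_CANOPUS, R_COMPUTER = 0.95, 0.90, 0.85, 0.90


def without_redundancy() -> float:
    return series([R_INERTIAL, R_CANOPUS, R_SUN, R_COMPUTER])


def with_redundancy() -> float:
    # Three Canopus Sensors and two Sun Sensors in hot redundancy.
    return series([R_INERTIAL, parallel(R_CANOPUS, 3), parallel(R_SUN, 2), R_COMPUTER])


if __name__ == "__main__":
    print("without redundancy:", without_redundancy())
    print("with redundancy:   ", with_redundancy())
```

With redundancy the system reliability is bounded above by the product of the two non-redundant blocks, 0.95 × 0.90, so the Inertial Set and Computer become the limiting elements.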
Problems
(4) In a distillation column a hazardous situation is created if the flow rate of steam to the reboiler goes high; this
causes a high flow rate of vapour up the column producing a high pressure which could cause the vessel to rupture.
The temperature control loop consists of a platinum resistance thermometer (PRT), a transmitter ( which converts
resistance change to a 4-20mA current signal), a controller, a current-to-pneumatic converter and a control valve.
The plant is protected by a pressure trip system consisting of a pressure switch and three-way solenoid valve
located in the air line between the converter and the control valve.
Failure mode and effect analysis of the system shows that:
1. A fail-danger situation F in which the Steam control valve moves fully open, occurs if either Pressure in valve
bonnet increases (F1) or Control valve fails open (F2).
2. F1 occurs if Pressure signal to control valve increases (F3) and Solenoid does not vent air (F4).
3. F3 occurs if PRT short circuit (F5) or Transmitter O/P fails low (F6) or Controller O/P fails high (F7) or I/P
Converter O/P fails high (F8).
4. F4 occurs if Pressure switch fails to open (F9) or Solenoid fails to vent (F10).
(i). Draw a fault tree diagram for the fail-danger failure
(ii). Write down the logic expression for the fail-danger failure F.
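The four statements from the failure mode and effect analysis translate directly into OR and AND gates. As a hedged sketch, the Python code below encodes them and finds the minimal cut sets by brute force over the seven basic events. The function names are illustrative, and the enumeration relies on the tree containing only AND and OR gates.

```python
from itertools import combinations

BASIC_EVENTS = ["F2", "F5", "F6", "F7", "F8", "F9", "F10"]


def fail_danger(active) -> bool:
    """Top event F for the set of basic events that have occurred."""
    f3 = any(e in active for e in ("F5", "F6", "F7", "F8"))  # OR gate
    f4 = any(e in active for e in ("F9", "F10"))             # OR gate
    f1 = f3 and f4                                           # AND gate
    return f1 or "F2" in active                              # OR gate


def minimal_cut_sets():
    """Smallest sets of basic events that are sufficient to cause F."""
    found = []
    # Search by increasing size so any superset of a found set is skipped.
    for size in range(1, len(BASIC_EVENTS) + 1):
        for combo in combinations(BASIC_EVENTS, size):
            candidate = frozenset(combo)
            if fail_danger(candidate) and not any(cs <= candidate for cs in found):
                found.append(candidate)
    return found


if __name__ == "__main__":
    for cut_set in minimal_cut_sets():
        print(sorted(cut_set))
```

The single-event cut set is the control valve failing open (F2), which the pressure trip system cannot protect against; every other cut set pairs one control loop failure with one trip system failure.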