Reliability Engineering (Lecture Note On Topic 1 and 2)
Course Content
1. Basic reliability concepts: definition, need for reliability, reliability programme plan.
2. Continuous theoretical probability distributions: Exponential, Log-Normal and Weibull.
3. Reliability measures: failure function, reliability function, hazard rate function (bathtub shape), mean time to failure (MTTF), mean operating time between failures (MTBF).
4. Reliability block diagrams: series, parallel, series-parallel, redundant and complex systems.
5. Reliability assessment at the design phase: Failure Modes, Effects and Criticality Analysis (FMECA), Fault Tree Analysis (FTA).
6. Reliability tests and prototype testing.
7. Maintainability and availability.
8. Reliability of electrical and electronic systems.
9. Specifications.
Course Learning Outcomes
Upon successful completion of this course, the student should be able to:
S/N   CLOs
CLO1  Understand and discuss the basic parameters of quality, which include reliability, availability, maintainability, reparability, compatibility, durability, etc.
CLO2  Discuss the failure and repair characteristics of systems with respect to their reliability.
1. INTRODUCTION
1.1 Historical Background
The theory of reliability is a science that studies the laws of occurrence of failures in technical
equipment. Reliability theory grew out of World War II, and since then the subject has
received considerable attention in many countries, especially in industry and defense (military).
In advanced countries such as the USA, the UK, Russia and others, various reliability research groups for
industries and defense establishments were organized. Some of the areas of reliability research
are life testing, structural reliability, machine maintenance problems and replacement problems.
The overlapping areas are quality control, extreme value theory, order statistics and censorship
in sampling.
Besides industry and defense (military), reliability technology is also used in Biomedical
Engineering, Electronics, Mining, and Electrical Engineering. In practice, reliability technology
is applicable to almost all practical problems of life.
1) Reliability evaluation
2) Reliability apportionment
3) Design review
4) Design control
5) Specification, material and process review
6) Vendor control
7) Test planning, operation and analysis
8) Reliability knowledge
9) Reliability and failure reporting systems
10) Mathematical and statistical services for reliability problems
Parameter of Quality
Reliability
Availability
Maintainability (ability to be maintained)
Reparability (availability of repair experts)
Compatibility
Durability
Interchangeability of parts
2. Reliability, $R(t)$: The probability that an item/product will perform its intended function
(prescribed purpose) without failure for a specified period of time, when operated correctly in
a specified environment.
N.B:
The definition of reliability stresses four (4) elements: probability, tolerance or
adequate performance, time, and operating conditions.
Reliability is the extension of quality into the time domain.
A reliability figure is meaningless unless the operating conditions are specified.
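Example:
Suppose we start a test at $t = 0$ with $N_0$ devices. After some time $t$, $N_f$ devices of the total have
failed and $N_s$ will have survived, so that $N_0 = N_s + N_f$.
The reliability at any time $t$ is given by:
$$R(t) = \frac{N_s}{N_0} = \frac{N_0 - N_f}{N_0} = 1 - \frac{N_f}{N_0} \quad (1)$$
Differentiating equation (1) with respect to time:
$$\frac{dR(t)}{dt} = -\frac{1}{N_0}\frac{dN_f}{dt} \quad (2)$$
Therefore, the failure rate is
$$\frac{dN_f}{dt} = -N_0\frac{dR(t)}{dt} \quad (3)$$
We obtain the instantaneous probability of failure, $r(t)$, when we divide both sides by $N_s(t)$:
$$r(t) = \frac{1}{N_s(t)}\frac{dN_f}{dt} = -\frac{N_0}{N_s(t)}\frac{dR(t)}{dt} \quad (4)$$
Since $R(t) = N_s(t)/N_0$, therefore
$$r(t) = -\frac{1}{R(t)}\frac{dR(t)}{dt}$$
Integrating, with $R(0) = 1$:
$$R(t) = \exp\left[-\int_0^t r(\tau)\,d\tau\right]$$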
3. Reliability definition in terms of the electric power system
The reliability of an electric power system can be classified into two parts:
a) Adequacy: The ability of the electric power system to supply the aggregate
electrical demand and energy requirements of consumers at all times, taking into
account the scheduled and forced outages of system components.
b) Security: The ability of the electric power system to withstand sudden disturbances such
as electric short circuits or unanticipated loss of system components or elements.
4. Durability: This is the ability of an item to withstand the effects of time dependent
mechanisms such as fatigue, wear, corrosion, electrical parameter change, and so on. Durability
is usually expressed as a minimum time before the occurrence of wearout failures. In repairable
systems it often characterizes the ability of the product to function after repairs.
5. Availability: The ability of an item to perform its intended function at a specified instant of
time or over a stated period of time.
Note: Reliability can be obtained from availability; this requires calculating, by integration, the probabilities of
the system being in an operable state at the instant of time.
6. Repairability: The probability that a failed system will be restored to operable condition
within a specified active repair time.
7. Serviceability: The degree of ease or difficulty with which a system can be replaced.
8. Maintainability: The ability of an item under specified conditions of use, to be retained in, or
restored to a state in which it can perform its required functions when maintenance is
performed under stated conditions and using prescribed procedure and resources.
N.B: Maintainability is a main factor determining the availability of items.
9. Failure: The termination of the ability of the product to perform its intended function.
10. Failure Rate, $\lambda(t)$: The ratio of the number of failures within a sample to the cumulative
operating time.
$$\lambda(t) = \frac{1}{\text{MTBF}}, \qquad \text{MTBF} = \frac{1}{\lambda}$$
Example:
Q1: A sample of 1000 transistors is tested for a week, and two (2) of them fail. Assuming
they failed at the end of the week, what is the failure rate?
Solution:
$$\lambda(t) = \frac{2\ \text{failures}}{1000 \times 24 \times 7\ \text{hrs}} = 1.19 \times 10^{-5}\ \text{failures/hr}$$
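The failure-rate arithmetic in Q1 can be checked with a few lines of Python. This is a minimal sketch: the function name `failure_rate` and its argument names are chosen here for illustration, and it assumes, as the question does, that every unit accumulates the full week of operating time.

```python
def failure_rate(failures, units, hours_per_unit):
    """Number of failures divided by cumulative operating time (unit-hours)."""
    cumulative_hours = units * hours_per_unit
    return failures / cumulative_hours

# Q1: 1000 transistors tested for one week (24 * 7 hours), 2 failures
rate = failure_rate(failures=2, units=1000, hours_per_unit=24 * 7)
print(f"{rate:.3e} failures/hr")  # prints 1.190e-05 failures/hr
```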
11. Hazard Rate, $h(t)$: The instantaneous probability of failure of an item given that it has
survived until that time. This is also called the “instantaneous failure rate”.
12. Downtime: The total time during which the system is not in an acceptable operating condition.
13. Mean Time Between Failures (MTBF): For repairable systems, it is the ratio of the
cumulative operating time to the number of failures for that item.
i. For a repairable system, MTBF is the average time in service between
failures. Note that this does not include the time spent at the repair facility
by the system.
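Examples:
Q1: A motor is repaired and returned to service six (6) times during its life and provides
45,000 hours of service. Calculate the MTBF.
Solution:
$$\text{MTBF} = \frac{\text{total operating time}}{\text{number of failures}} = \frac{45{,}000\ \text{hrs}}{6} = 7{,}500\ \text{hrs}$$
Q2: If the MTBF for a motor is 7,500 hours, what is the probability that it will operate for 30
days without failure?
Solution:
$$R = e^{-\lambda t} = e^{-t/\text{MTBF}} = e^{-(30 \times 24\ \text{hrs})/(7{,}500\ \text{hrs})} = e^{-0.096} = 0.908$$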
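The two motor questions above chain together: the MTBF from Q1 feeds the reliability in Q2. A minimal sketch follows, assuming a constant failure rate (exponential model); the function names `mtbf` and `reliability` are illustrative only.

```python
import math

def mtbf(operating_hours, failures):
    """Cumulative operating time divided by the number of failures."""
    return operating_hours / failures

def reliability(t_hours, mtbf_hours):
    """R(t) = exp(-t / MTBF), valid for a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)

motor_mtbf = mtbf(45_000, 6)                  # 7500.0 hours
r_30_days = reliability(30 * 24, motor_mtbf)  # approximately 0.908
print(motor_mtbf, round(r_30_days, 3))
```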
14. Mean Time To Failure (MTTF): For non-repairable items, it is the total number of life
units of an item population divided by the number of failures within that population, during a
particular measurement interval under stated conditions.
i. MTTF is the average life of a non-repairable system.
ii. For a repairable system, MTTF represents the average time before the
first failure.
Designers, manufacturers and end users strive to minimize the occurrence and recurrence of
failures. In order to minimize failures in engineering systems, it is essential to understand ‘why’ and
‘how’ failures occur. It is also important to know how often such failures may occur.
Reliability deals with the failure concept, whereas safety deals with the consequences after the
failure. Inherent safety systems/measures ensure that the consequences of failures are minimal.
Reliability and safety engineering provides a quantitative measure of performance, identifies
important contributors, gives insights to improve system performance such as how to reduce
likelihood of failures and risky consequences, measures for recovery, and safety management. The
objectives of reliability engineering, in the order of priority, are:
1) To apply engineering knowledge and specialist techniques to prevent or to reduce the
likelihood or frequency of failures.
2) To identify and correct the causes of failures that do occur, despite the efforts to prevent
them.
3) To determine ways of coping with failures that do occur, if their causes have not been
corrected.
4) To apply methods for estimating the likely reliability of new designs, and for analysing
reliability data.
1) Risk Analysis
Risk = (Probability of failure) x (exposure) x (consequence)
[Figure: risk analysis diagram linking causal analysis, the accidental event, and consequence analysis]
Methods:
(b) Causal Analysis          (a) Accidental Event           (c) Consequence Analysis
- Fault tree analysis        - Checklist                    - Event tree analysis
- Reliability block diagram  - Preliminary hazard analysis  - Consequence model
- FMECA                      - FMECA                        - Reliability assessments
- Reliability data sources   - Event data sources           - Simulation
(a) Identification and description of potential accidental events (e.g., a gas leak in an oil/gas
processing plant).
(b) The potential causes of each accidental event are identified by a causal analysis using,
e.g., fault tree analysis.
(c) The consequence analysis is usually carried out by event tree analysis.
2) Environmental Protection
Reliability studies may be used to improve the design and operational regularity of anti-pollution
systems, such as gas/water cleaning systems.
3) Quality
Reliability is one of the major parameters of the quality of a product. Reliability is used in Total
Quality Management (TQM).
4) Optimization of Maintenance and Operation in Industry
The prime objective of maintenance is to maintain or improve the system reliability and
production/operation regularity. Maintenance is carried out to prevent system failures and
to restore the system function when a failure has occurred. The Reliability Centered
Maintenance (RCM) approach is the main tool for improving the cost-effectiveness and
control of maintenance in many industries, and hence for improving availability and safety.
5) Engineering Design
Reliability is considered to be one of the most important quality characteristics of technical
products. Reliability assurance should therefore be an important topic during the
engineering design process.
[Figure: product life cycle with stages Conceptualization, Design, Procurement/Purchasing, Manufacturing/Fabrication, Production, Inspecting and Testing, Sales, After-sales service, and Customer]
1.7. Reliability Programme Plan
What are the actions that managers and engineers can take to influence reliability?
One obvious activity is quality assurance (QA), the whole range of functions designed to ensure that
delivered products are compliant with the design. For many products, QA is sufficient to ensure
high reliability, and we would not expect a company mass-producing simple diecastings for non-
critical applications to employ reliability staff. In such cases the designs are simple and well proven,
the environments in which the products will operate are well understood and the very occasional
failure has no significant financial or operational effect. QA, together with craftsmanship, can
provide adequate assurance for simple products or when the risks are known to be very low. Risks
are low when safety margins can be made very large, as in most structural engineering. Reliability
engineering disciplines may justifiably be absent in many types of product development and
manufacture. QA disciplines are, however, essential elements of any integrated reliability
programme.
A formal reliability programme is necessary whenever the risks or costs of failure are not low.
Risks of failure usually increase in proportion to the number of components in a system, so
reliability programmes are required for any product whose complexity leads to an appreciable risk.
Having an objective and the authority, how does the reliability programme manager set about his/her
task, faced as he/she is with a responsibility based on uncertainties? A brief outline is given
below.
The reliability programme must begin at the earliest, conceptual phase of the project. It is at
this stage that fundamental decisions are made, which can significantly affect reliability.
These are decisions related to the risks involved in the specification (performance,
complexity, cost, producibility, etc.), development time-scale, resources applied to
evaluation and test, skills available, and other factors.
The shorter the project time-scale, the more important is this need, particularly if there will
be few opportunities for an iterative approach. The activities appropriate to this phase are an
involvement in the assessment of these trade-offs and the generation of reliability objectives.
The reliability staff can perform these functions effectively only if they are competent to
contribute to the give-and-take inherent in the trade-off negotiations, which may be
conducted between designers and staff from manufacturing, marketing, finance, support and
customer representatives.
As the project proceeds from initial study to detail design, the reliability risks are controlled
by a formal, documented approach to the review of design and to the imposition of design
rules relating to selection of components, materials and processes, stress protection,
tolerancing, and so on. The objectives at this stage are to ensure that known good practices
are applied, that deviations are detected and corrected, and that areas of uncertainty are
highlighted for further action.
The programme continues through the initial hardware manufacturing and test stages, by
planning and executing tests to show up design weaknesses and to demonstrate achievement
of specified requirements and by collecting, analysing and acting upon test and failure data.
During production, QA activities ensure that the proven design is repeated, and further
testing may be applied to eliminate weak items and to maintain confidence. The data
collection, analysis and action process continues through the production and in-use phases.
Throughout the product life-cycle, therefore, the reliability is assessed, first by initial
predictions based upon past experience in order to determine feasibility and to set
objectives, then by refining the predictions as detail design proceeds and subsequently by
recording performance during the test, production and in-use phases. This performance is
fed back to generate corrective action, and to provide data and guidelines for future
products.
6) Conducting reliability tests
7) Performing statistical analysis on test data
8) Maintenance of the relevant data system
9) Provision of assistance to the production, quality assurance and purchasing departments
10) Writing reliability specifications
Note: Parameters of a distribution can be classified into the following three categories. Not
all distributions have all three parameters; many distributions have only one or two
parameters.
1. Scale parameter, which controls the range of the distribution on the horizontal scale.
2. Shape parameter, which controls the shape of the distribution curves.
3. Source parameter or location parameter, which defines the origin or the minimum value
that the random variable can have. The location parameter also refers to the point on the
horizontal axis where the distribution is located.
2.1. Normal (Gaussian) Distribution
This is the most frequently used and most extensively covered theoretical distribution in the
literature. The Normal Distribution is continuous for all values of X between -∞ and + ∞. It has a
characteristic symmetrical shape, which means that the mean, the median and the mode have the
same numerical value. The normal data distribution pattern occurs in many natural phenomena,
such as human heights, weather patterns etc. The mathematical expression for normal distribution
probability density function (pdf) is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}\right]$$
where $\sigma$ is the scale parameter, equal to the standard deviation (SD), while $\mu$ is the location
parameter, equal to the mean. The mode and the median coincide with the mean because the pdf is
symmetrical. The influence of the parameter $\mu$ on the location of the distribution on the horizontal
axis is shown in Figure 1, where the value of the parameter $\sigma$ is constant.
Figure 1: Probability density of the normal distribution for different values of $\mu$
An important reason for the wide use of normal distribution is the fact that whenever several
random variables are added together, the resulting sum tends to normal regardless of the distribution
of the variables being added. This is known as the central limit theorem. It justifies the use of the normal
distribution in many engineering applications, including quality control. The normal distribution is a
close fit to most quality control and some reliability observations, such as the sizes of machined parts
and the lives of items subject to wearout failure.
The pdf of a normal distribution with parameters $\mu$ and $\sigma$ can be calculated using Excel as $f(x) =$
NORMDIST($x$, $\mu$, $\sigma$, FALSE) and the reliability as $R(x) = 1 -$ NORMDIST($x$, $\mu$, $\sigma$, TRUE).
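The same two quantities can be computed outside a spreadsheet. The sketch below uses only the Python standard library and evaluates the cumulative distribution through the error function `math.erf`; the function names and the sample values (mean 1000 hours, standard deviation 100 hours) are hypothetical and are not taken from these notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative probability F(x), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_reliability(x, mu, sigma):
    """R(x) = 1 - F(x)."""
    return 1.0 - normal_cdf(x, mu, sigma)

# Hypothetical wearout item: mean life 1000 h, standard deviation 100 h
print(normal_reliability(1000, 1000, 100))  # 0.5 at the mean, by symmetry
print(normal_reliability(1100, 1000, 100))  # one SD past the mean, about 0.159
```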
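The pdf of the log-normal distribution is
$$f(x) = \frac{1}{x\,\sigma_l\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \mu_l}{\sigma_l}\right)^{2}\right], \quad x > 0$$
$$\text{mean} = e^{\mu_l + \sigma_l^{2}/2}$$
$$\text{variance} = e^{2\mu_l + \sigma_l^{2}}\left(e^{\sigma_l^{2}} - 1\right)$$
where $\mu_l$ = mean of the natural logs of the individual values and $\sigma_l^{2}$ = variance of the natural logs.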
Note: The log-normal distribution is more versatile than the normal distribution as it has a wide range of
shapes, and is therefore often a better fit to reliability data, such as for populations with wearout
characteristics. The reliability of a system following a log-normal distribution with parameters $\mu_l$ and $\sigma_l$
can also be calculated using the Excel function: $R(x) = 1 -$ LOGNORMDIST($x$, $\mu_l$, $\sigma_l$).
Examples:
Q1: Twenty-five measurements are taken of the time to failure of a component in hours. The natural
logarithms are found to be normally distributed with an estimated mean $\mu_l = 3.5$ and variance
$\sigma_l^{2} = 1.3$ (note that $\mu_l$ and $\sigma_l^{2}$ are for natural log values). Find the untransformed mean and
variance in hours.
Solution:
$$\text{mean} = e^{3.5 + 1.3/2} = 63.434\ \text{hours}$$
$$\text{variance} = e^{2(3.5) + 1.3}\left(e^{1.3} - 1\right) = 10740.91\ \text{hours}^{2}$$
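The back-transformation in Q1 is easy to get wrong by hand, so a short check is useful. This is a sketch only: the function names are illustrative, and the inputs are the log-scale mean 3.5 and log-scale variance 1.3 from the question.

```python
import math

def lognormal_mean(mu_l, var_l):
    """Untransformed mean: exp(mu_l + var_l / 2)."""
    return math.exp(mu_l + var_l / 2.0)

def lognormal_variance(mu_l, var_l):
    """Untransformed variance: exp(2*mu_l + var_l) * (exp(var_l) - 1)."""
    return math.exp(2.0 * mu_l + var_l) * (math.exp(var_l) - 1.0)

mean_hours = lognormal_mean(3.5, 1.3)           # approximately 63.43 hours
variance_hours2 = lognormal_variance(3.5, 1.3)  # approximately 10740.9 hours^2
print(round(mean_hours, 3), round(variance_hours2, 1))
```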
Figure 2: Exponential distribution is the most commonly used in reliability.
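$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0$$
or
$$f(t) = \frac{1}{\text{MTBF}}\, e^{-t/\text{MTBF}}, \quad t \ge 0$$
where MTBF $= 1/\lambda$, $\lambda$ = rate of failure, and $t$ = time.
$$R(t) = e^{-\lambda t} \quad \text{or} \quad R(t) = e^{-t/\text{MTBF}}, \quad t \ge 0$$
$$F(t) = \text{unreliability (probability of failure)} = 1 - R(t)$$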
The hazard function for the exponential distribution is $\lambda$, and it is constant over time.
Therefore, the exponential distribution should be used for reliability prediction during the period of
operation in which the failure rate is constant, that is, when failures occur at random.
Some unique failures to the exponential distribution include:
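Examples:
Q1: A piece of equipment in a manufacturing plant has an MTBF of 1500 hr. What is the probability of
operating for a period of 750 hours without failure?
Solution:
$$\lambda = \frac{1}{\text{MTBF}} = \frac{1}{1500} = 0.000667\ \text{per hour}, \quad t = 750\ \text{hours}$$
$$R(750) = e^{-\lambda t} = e^{-0.000667 \times 750} = e^{-0.5} = 0.6065$$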
Note: MTBF and λ do not need to be a function of time in hours. The characteristic of “time” or
usage can be such units as cycles instead of hours. In this case, MCBF (Mean Cycles Between
Failures) could be the appropriate measure.
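Q2: One cycle of a machine completes the assembly of 20 units. A study of this machine predicted
an MCBF of 14,000 cycles (λ = 0.00007143/cycle). What is the probability of operating 15,000
cycles without failure?
Solution:
$$R(t) = e^{-\lambda t}, \quad \text{where } \lambda = 0.00007143\ \text{per cycle},\ t = 15{,}000\ \text{cycles}$$
$$R(15{,}000) = e^{-0.00007143 \times 15{,}000} = e^{-1.0714} = 0.3425$$
Q3: A component's failure rate is constant and found to be 0.02 per thousand hours. Calculate the
probability that the component will survive 10,000 hours.
Solution
$$\lambda = \frac{0.02}{1000} = 2 \times 10^{-5}\ \text{per hour}, \quad t = 10{,}000\ \text{hours}$$
$$R(t) = e^{-\lambda t} = e^{-(2 \times 10^{-5} \times 10{,}000)} = e^{-0.2} = 0.8187$$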
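All three exponential questions above use the same formula with different units of usage (hours or cycles). A minimal sketch, with the illustrative function name `exp_reliability`, recomputes them; the only requirement is that the failure rate and the usage are expressed in the same unit.

```python
import math

def exp_reliability(failure_rate, usage):
    """R = exp(-lambda * t); rate and usage must share the same unit."""
    return math.exp(-failure_rate * usage)

r_q1 = exp_reliability(1 / 1500, 750)        # MTBF 1500 h, 750 h mission
r_q2 = exp_reliability(1 / 14_000, 15_000)   # MCBF 14,000 cycles, 15,000 cycles
r_q3 = exp_reliability(0.02 / 1000, 10_000)  # 0.02 per 1000 h, 10,000 h
print(round(r_q1, 4), round(r_q2, 4), round(r_q3, 4))
```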
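Consider a number of components arranged in series, each with a different failure rate
($\lambda_i$), as shown below. The reliability of the components connected in series is thus:
[Figure: series block diagram of components 1, 2, ..., n with reliabilities R1(t), R2(t), ..., Rn(t)]
$$R_s(t) = R_1(t) \times R_2(t) \times \cdots \times R_n(t)$$
$$R_s(t) = e^{-\lambda_1 t} \times e^{-\lambda_2 t} \times \cdots \times e^{-\lambda_n t}$$
$$R_s(t) = e^{-\lambda_s t}$$
where $\lambda_s = \sum_{i=1}^{n} \lambda_i$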
Example 1
An electronic component has an exponential life distribution with a constant failure rate of 0.0002 per hour.
(a) What is the probability that the component will survive the first 300 hrs of operation?
(b) What is the 300 hr reliability of a system comprising 4 such components connected in
series?
Solution
(a) $R(t) = e^{-\lambda t}$
$$R(300) = e^{-(0.0002 \times 300)} = 0.9418$$
(b) $R_s(t) = e^{-\lambda_s t}$
where $\lambda_s = \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 4(0.0002) = 0.0008$
Therefore, $R_s(300) = e^{-(0.0008 \times 300)} = 0.7866$
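For exponential components in series, the failure rates add, which is the same as multiplying the individual reliabilities. The sketch below illustrates this with the numbers from Example 1; the function name `series_reliability` is chosen here for illustration.

```python
import math

def series_reliability(failure_rates, t):
    """Series system of exponential components: the failure rates add."""
    total_rate = sum(failure_rates)
    return math.exp(-total_rate * t)

single = series_reliability([0.0002], 300)    # approximately 0.9418
four = series_reliability([0.0002] * 4, 300)  # approximately 0.7866

# Adding rates is equivalent to multiplying the component reliabilities
print(round(single, 4), round(four, 4), round(single ** 4, 4))
```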
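Ten electrical switches were put on test and the number of cycles (on and off operations of a
switch) to failure was counted. The numbers of cycles to failure are: 560, 685, 820, 956, 1024,
1150, 1689, 1850, 1900, 1956. Assuming a constant failure rate, calculate the following:
Solution
(a) Mean number of cycles to failure:
$$\text{MTTF} = \frac{560 + 685 + 820 + 956 + 1024 + 1150 + 1689 + 1850 + 1900 + 1956}{10} = \frac{12590}{10} = 1259\ \text{cycles}$$
(b) Failure rate, $\lambda = \frac{1}{\text{MTTF}} = \frac{1}{1259} = 7.94 \times 10^{-4}$ per cycle
(c) Reliability at 300 cycles:
$$R(300) = e^{-\lambda t} = e^{-(7.94 \times 10^{-4} \times 300)} = 0.7880$$
(d) Number of cycles at which the reliability is 90%:
$$R(t) = e^{-\lambda t}$$
Take the natural log of both sides:
$$\ln R(t) = -\lambda t$$
Therefore, $t = \frac{-\ln R(t)}{\lambda} = \frac{-\ln(0.9)}{7.94 \times 10^{-4}} = 132.7 \approx 133$ cycles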
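The four parts of the switch example follow one from another, so they can be reproduced in sequence. This is a sketch assuming a complete test (all ten switches failed) and a constant failure rate; the variable names are illustrative.

```python
import math

cycles_to_failure = [560, 685, 820, 956, 1024, 1150, 1689, 1850, 1900, 1956]

# (a) mean cycles to failure for a complete (uncensored) sample
mean_cycles = sum(cycles_to_failure) / len(cycles_to_failure)  # 1259.0
# (b) constant failure rate per cycle
rate = 1 / mean_cycles
# (c) reliability at 300 cycles
r_300 = math.exp(-rate * 300)
# (d) cycles at which reliability falls to 90%: t = -ln(R) / lambda
cycles_at_90 = -math.log(0.90) / rate

print(mean_cycles, f"{rate:.2e}", round(r_300, 4), math.ceil(cycles_at_90))
```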
Fifteen (15) automobile tyres were tested to destruction, and the test was terminated at 10,000 km. The distances to the first 8 failures (in
1,000 km) were 2.92, 3.26, 5.61, 6.09, 7.71, 8.36, 9.55, and 9.65. Calculate:
Note: In many cases when life data are analyzed, not all of the components in the sample have
failed (i.e., the event of interest was not observed), or the exact times to failure of all the components
are not known. Tests are sometimes stopped early for economic or other reasons. In a censored test, all items are
activated at time t = 0 and followed until failure or until time t, when the experiment is terminated.
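Solution
N.B: The remaining 7 tyres survived the test, having covered 10,000 km when the test was
terminated.
Total distance covered by the failed tyres = 53,150 km; total distance covered by the 7 survivors = 7 × 10,000 = 70,000 km.
Therefore,
$$\text{MTTF} = \frac{53{,}150 + 70{,}000}{8} = 15{,}393.75\ \text{km}$$
$$\lambda = \frac{1}{\text{MTTF}} = \frac{1}{15{,}393.75} = 6.5 \times 10^{-5}\ \text{per km}$$
For a reliability of 0.5, $R(t) = e^{-\lambda t}$.
Take the natural log of both sides:
$$\ln R(t) = -\lambda t$$
Therefore, $t = \frac{-\ln R(t)}{\lambda} = \frac{-\ln(0.5)}{6.5 \times 10^{-5}} = 10{,}663.8\ \text{km}$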
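In a censored test the survivors still contribute their accumulated distance to the total, while only the observed failures go in the denominator. The sketch below follows that bookkeeping for the tyre data. The reliability target of 0.5 is an inference from the final figure in the solution (10,663.8 km is what a rate rounded to 6.5e-5 per km gives at R = 0.5), not something stated in the question, and the variable names are illustrative.

```python
import math

failure_km = [2920, 3260, 5610, 6090, 7710, 8360, 9550, 9650]
tyres_on_test = 15
censor_km = 10_000  # distance at which the test was stopped

survivors = tyres_on_test - len(failure_km)              # 7 tyres did not fail
total_km = sum(failure_km) + survivors * censor_km       # 53,150 + 70,000
mttf_km = total_km / len(failure_km)                     # 15393.75 km
rate = 1 / mttf_km                                       # about 6.5e-5 per km

# Distance at which reliability drops to the assumed target of 0.5
km_at_half = -math.log(0.5) / rate
print(total_km, mttf_km, round(km_at_half))
```

With the unrounded rate the distance comes out near 10,670 km; the small difference from 10,663.8 km is due to rounding the failure rate.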
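Consider a number of components arranged in parallel, each with a different failure
rate, as shown below. The reliability of the components connected in parallel is thus:
[Figure: parallel block diagram of components with reliabilities R1(t), R2(t), ..., Rn(t)]
$$R_p(t) = 1 - \prod_{i=1}^{n}\left(1 - R_i(t)\right)$$
$$R_p(t) = 1 - \left(1 - e^{-\lambda_1 t}\right)\left(1 - e^{-\lambda_2 t}\right) \cdots \left(1 - e^{-\lambda_n t}\right)$$
Assumption: Assume that all parallel components have equal reliability (i.e., $\lambda_1 = \lambda_2 = \cdots = \lambda_n = \lambda$):
$$R_1(t) = R_2(t) = \cdots = R_n(t)$$
Therefore,
$$\frac{1}{\lambda_p} = \frac{1}{\lambda}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right) \quad (1)$$
where $1/\lambda_p$ is the mean time to failure of the parallel group.
But $R(t) = e^{-\lambda t}$, so
$$\lambda = \frac{\ln R(t)}{-t} \quad (2)$$
Substitute eqn (2) in eqn (1) and take the reciprocal of $1/\lambda_p$ to obtain the failure rate $\lambda_p$ of the entire
parallel system.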
Example 1
The figure below shows the reliability block diagram of a system consisting of 4 different units. The
reliability value for 12 hrs of operation is as indicated. Each unit is assumed to have an exponential
life distribution model.
[Figure: reliability block diagram; the legible labels are B (0.92), C (0.92) and D (0.95)]
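Solution
$$\lambda = \frac{\ln R(t)}{-t} = \frac{\ln(0.92)}{-12} = 0.00695\ \text{per hour}$$
Substituting for $\lambda$ in eqn (1):
$$\frac{1}{\lambda_p} = \frac{1}{\lambda}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right)$$
Since n = 3,
$$\frac{1}{\lambda_p} = \frac{1}{\lambda}\left(1 + \frac{1}{2} + \frac{1}{3}\right) = \frac{1}{6.95 \times 10^{-3}}\left(\frac{11}{6}\right) = 263.79$$
Therefore, $\lambda_p = \frac{1}{263.79} = 3.79 \times 10^{-3}$ per hour
Likewise, for the series components, $\lambda_s = \frac{\ln R(t)}{-t} = \frac{\ln(0.95)}{-12} = 4.27 \times 10^{-3}$ per hour
(b) To obtain the total system reliability after 12 hours of operation, we first obtain the overall
$\lambda$ for the entire system.
$$\lambda_{sys} = \lambda_p + \lambda_s$$
$$\lambda_{sys} = 3.79 \times 10^{-3} + 4.27 \times 10^{-3} = 8.06 \times 10^{-3}\ \text{per hour}$$
Total system reliability after 12 hours of operation:
$$R(12) = e^{-\lambda_{sys} t} = e^{-(8.06 \times 10^{-3} \times 12)} = 0.9078$$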
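The steps of this solution can be traced in code. The sketch below assumes, as the n = 3 step implies, three parallel units of reliability 0.92 feeding one series unit of reliability 0.95; the block diagram itself is not fully legible, so this layout is an assumption, and the function names are illustrative. For comparison it also evaluates the parallel and series block-diagram formulas directly on the 12-hour reliabilities.

```python
import math

def rate_from_reliability(r, t):
    """Invert R = exp(-lambda * t) to get lambda."""
    return -math.log(r) / t

def equal_parallel_rate(unit_rate, n):
    """Equivalent rate from 1/lambda_p = (1/lambda)(1 + 1/2 + ... + 1/n)."""
    harmonic_sum = sum(1 / k for k in range(1, n + 1))
    return unit_rate / harmonic_sum

t = 12  # hours
lam = rate_from_reliability(0.92, t)    # about 6.95e-3 per hour
lam_p = equal_parallel_rate(lam, 3)     # about 3.79e-3 per hour
lam_s = rate_from_reliability(0.95, t)  # about 4.27e-3 per hour

# Route used in the solution: add equivalent rates, then exponentiate
r_equiv = math.exp(-(lam_p + lam_s) * t)

# Direct block-diagram route: R_p = 1 - (1 - R)^3, then multiply by the series unit
r_direct = (1 - (1 - 0.92) ** 3) * 0.95

print(round(r_equiv, 4), round(r_direct, 4))
```

The two routes give different figures (roughly 0.908 against 0.950) because the first treats the parallel group as a single exponential unit with rate 1/MTTF, which is an approximation over a short mission time.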
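$$f(t) = \frac{\beta}{\eta}\left(\frac{t - \gamma}{\eta}\right)^{\beta - 1} \exp\left[-\left(\frac{t - \gamma}{\eta}\right)^{\beta}\right]$$
where $\beta$ is the shape parameter, $\eta$ is the scale parameter and $\gamma$ is the location parameter, with $\beta > 0$, $\eta > 0$ and $\gamma \ge 0$.
If failures start at time $t = 0$, then $\gamma = 0$ and the pdf of the Weibull distribution becomes:
$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]$$
By altering the shape parameter $\beta$, the Weibull distribution takes different shapes. For example,
when $\beta = 3.4$ the Weibull distribution approximates the normal distribution; when $\beta = 1$, it is identical to
the exponential distribution.
Cumulative distribution function:
$$F(t) = 1 - \exp\left[-\left(\frac{t - \gamma}{\eta}\right)^{\beta}\right] \quad \text{for } \gamma > 0$$
$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \quad \text{for } \gamma = 0$$
Reliability function:
$$R(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]$$
Mean: $\text{MTTF} = \eta\,\Gamma\left(1 + \frac{1}{\beta}\right)$
Standard deviation: $\sigma = \eta\sqrt{\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^{2}\left(1 + \frac{1}{\beta}\right)}$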
1) It is a two-parameter model and hence more flexible than the exponential model, which has only one
parameter ($\lambda$).
2) It describes well the weakest link in data.
3) It is amenable to graphical analysis.
4) It can be used for life tests (i.e., from the early stage to the aging period).
5) The value of $\beta$ obtained from the Weibull distribution model can be used to determine the stage of
the product's life (i.e., DFR, CFR, or IFR). If:
$\beta < 1$: decreasing failure rate (DFR)
$\beta = 1$: constant failure rate (CFR)
$\beta > 1$: increasing failure rate (IFR)
= × ---------------------- (3)
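From eqn. (2), $R(t) = \exp[-(t/\eta)^{\beta}]$, if $t = \eta$:
$$R(\eta) = e^{-(\eta/\eta)^{\beta}} = e^{-1} = 0.3679$$
and the failure probability, $F(\eta) = 1 - R(\eta)$
$= 1 - 0.3679 = 0.6321$
Also, $R(t) = 1 - F(t) = \exp[-(t/\eta)^{\beta}]$
Inverting both sides,
$$\frac{1}{R(t)} = \frac{1}{1 - F(t)} = \exp\left[\left(\frac{t}{\eta}\right)^{\beta}\right]$$
Therefore,
$$\ln\left[\frac{1}{1 - F(t)}\right] = \left(\frac{t}{\eta}\right)^{\beta}$$
Taking logarithms again,
$$\log \ln\left[\frac{1}{1 - F(t)}\right] = \beta \log(t) - \beta \log(\eta)$$
This has the straight-line form $y = mx + c$, where $y$ is $\log \ln[1/(1 - F(t))]$ on the vertical axis, $m$ is $\beta$ (the slope), $x$ is $\log(t)$ on the horizontal axis,
and $c$ (the intercept) $= -\beta \log(\eta)$.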
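The straight-line form is what Weibull graph paper exploits. The sketch below demonstrates it numerically: it generates exact points from a Weibull distribution with known parameters, applies the double-log transformation, and recovers the shape and scale from an ordinary least-squares line. The points are synthetic (not test data), the parameter values 1.25 and 550 are used purely as an illustration, and the helper names `weibull_cdf` and `fit_line` are hypothetical. With real test data, F(t) would first have to be estimated from the ranked failures, which this sketch does not cover.

```python
import math

def weibull_cdf(t, beta, eta):
    """Two-parameter Weibull failure probability F(t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

beta_true, eta_true = 1.25, 550.0  # illustrative values only
times = [50.0, 100.0, 200.0, 400.0, 800.0]

xs = [math.log10(t) for t in times]
ys = [math.log10(math.log(1.0 / (1.0 - weibull_cdf(t, beta_true, eta_true))))
      for t in times]

slope, intercept = fit_line(xs, ys)
beta_est = slope                      # slope m = beta
eta_est = 10 ** (-intercept / slope)  # intercept c = -beta * log10(eta)
print(round(beta_est, 3), round(eta_est, 1))
```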
Example 1
100 electric lamps are put on life test, and the times to failure in days for the first 10 that fail are as follows:
12, 25, 36, 46, 54, 66, 70, 82, 88, and 95. On plotting these data on Weibull graph paper, the
estimates of the Weibull distribution parameters, i.e., the scale parameter ($\eta$) and shape parameter ($\beta$), were
obtained as 550 days and 1.25 respectively.
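Solution
But $F(t) = 1 - \exp[-(t/\eta)^{\beta}]$
At t = 1000 days,
$$F(1000) = 1 - e^{-(1000/550)^{1.25}} = 0.8789$$
$$R(t) = \exp[-(t/\eta)^{\beta}]$$
At t = 200 days,
$$R(200) = e^{-(200/550)^{1.25}} = 0.7540$$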
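The lamp calculation can be reproduced with the two-parameter Weibull reliability function. In this sketch the names `beta` and `eta` stand for the shape (1.25) and scale (550 days) estimates quoted in the example; the function name is illustrative.

```python
import math

def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability: R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))

beta, eta = 1.25, 550.0  # shape and scale (days) from the example

f_1000 = 1.0 - weibull_reliability(1000, beta, eta)  # failure probability at 1000 days
r_200 = weibull_reliability(200, beta, eta)          # reliability at 200 days
print(round(f_1000, 4), round(r_200, 4))
```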
Example 2
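A unit was tested and the following were the results of the test: $\eta$ = 25,000 hours, $\beta$ = 2.0. Calculate (i) the
Weibull mean (MTTF), (ii) the standard deviation and (iii) the probability of survival for
10,000 hours.
Solution
Note: $\Gamma(1.5) = \sqrt{\pi}/2 = 0.8862$ and $\Gamma(2) = 1$
(i) $\text{MTTF} = \eta\,\Gamma\left(1 + \frac{1}{\beta}\right) = 25{,}000\,\Gamma(1.5) = 22{,}155.7$ hours
(ii) $\sigma = \eta\sqrt{\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^{2}\left(1 + \frac{1}{\beta}\right)} = 25{,}000\sqrt{\Gamma(2) - \Gamma^{2}(1.5)} = 25{,}000\sqrt{1 - 0.7854} = 11{,}581.3$ hours
(iii) $R(t) = \exp[-(t/\eta)^{\beta}]$
At t = 10,000 hrs, $R(10{,}000) = e^{-(10{,}000/25{,}000)^{2}} = e^{-0.16} = 0.8521$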
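The gamma function needed for the Weibull mean and standard deviation is available in the Python standard library as `math.gamma`. The sketch below evaluates Example 2 with shape 2.0 and scale 25,000 hours; the function names are illustrative.

```python
import math

def weibull_mean(beta, eta):
    """MTTF = eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

def weibull_std(beta, eta):
    """sigma = eta * sqrt(Gamma(1 + 2/beta) - Gamma(1 + 1/beta)^2)."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return eta * math.sqrt(g2 - g1 ** 2)

beta, eta = 2.0, 25_000.0  # shape and scale (hours) from Example 2

mttf = weibull_mean(beta, eta)                   # about 22,156 hours
std = weibull_std(beta, eta)                     # about 11,581 hours
r_10000 = math.exp(-((10_000 / eta) ** beta))    # about 0.852
print(round(mttf, 1), round(std, 1), round(r_10000, 4))
```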