Design for reliability 1
Design for Reliability
• Reliability design is an iterative process that begins with the
specification of reliability goals consistent with cost and
performance objectives.
• It considers life-cycle costs of the system and system
effectiveness
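The design process follows this sequence:
1. Specify reliability goals
2. Allocate reliability to components
3. Implement design methods
4. Failure analysis (FMEA/FMECA)
5. Are goals achieved? If no, return to an earlier step and iterate; if yes, continue.
6. System safety analysis (FTA)
7. Are goals achieved? If no, return to an earlier step and iterate; if yes, the design is ready for production.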
Reliability activities and product life cycle
• Conceptual and preliminary design: specification, allocation, design methods
• Detailed design, development, and prototyping: design methods, failure analysis, growth testing, safety analysis
• Production and manufacturing: acceptance testing, quality control, burn-in and screen testing
• Product use and support: preventive and predictive maintenance, modification, parts replacements
Reliability specification and system measurements
• Mean time to failure (MTTF) or mean time between failures
(MTBF) has been the primary, and often the only, measure of
reliability.
• This parameter is not sufficient except in the case of an
exponential failure distribution.
• A better measure would be to specify the reliability at specific
points in time.
For example: to state that 99% reliability is required after 2 years
of operation and 95% reliability is required after 5 years of
operation is much more precise than stating an MTTF of 10 years
is required.
• If the MTTF is to be the only reliability specification to be used,
a constant failure rate should be assumed as part of the
specification and subsequently demonstrated. Otherwise, a
wide range of distributions must be tried.
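The point that an MTTF alone does not pin down reliability can be illustrated with a minimal sketch. It compares an exponential distribution with a Weibull distribution that has the same MTTF; the Weibull shape value of 2 and the function names are assumptions chosen only for illustration.

```python
import math

def reliability_exponential(t, mttf):
    """R(t) for a constant failure rate, where lambda = 1 / MTTF."""
    return math.exp(-t / mttf)

def reliability_weibull(t, mttf, shape):
    """R(t) for a Weibull distribution with the given shape and the same MTTF."""
    # For a Weibull distribution, MTTF = scale * Gamma(1 + 1/shape).
    scale = mttf / math.gamma(1.0 + 1.0 / shape)
    return math.exp(-((t / scale) ** shape))

mttf_years = 10.0
for t in (2.0, 5.0):
    r_exp = reliability_exponential(t, mttf_years)
    r_wbl = reliability_weibull(t, mttf_years, shape=2.0)  # shape is an assumed value
    print(f"t = {t:.0f} years: exponential R = {r_exp:.3f}, Weibull R = {r_wbl:.3f}")
```

Both distributions share an MTTF of 10 years, yet their reliabilities at 2 and 5 years differ noticeably, which is why specifying reliability at specific points in time is more precise.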
System effectiveness
System effectiveness is the probability that the system can
successfully perform its intended purpose or mission when operated
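under specified conditions. It includes reliability as one of its
contributing factors.
[Figure: system effectiveness is the AND combination of operational
readiness, mission availability, and design adequacy. An OR node
under mission availability joins reliability and maintainability.]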
• Operational readiness is the probability that the system is
operational when first used or at the start of the mission.
• Mission availability is the percentage of time the system will be
operating during the mission. If the system is not repairable,
availability is the same as reliability.
• Design adequacy is the probability that the system will
accomplish its mission given that the system is operating within
its design parameters.
System effectiveness = operational readiness × mission availability
× design adequacy
Example: A copy machine is working on demand. However, in order
to complete the job in time, it must operate at a speed of 45 copies a
minute. If it cannot do this, then the copier is inadequately
designed for the job.
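As a minimal sketch of the product rule for system effectiveness, the function below multiplies the three probabilities. The numeric values are hypothetical and are not taken from the copier example.

```python
def system_effectiveness(operational_readiness, mission_availability, design_adequacy):
    """Product of the three probabilities, each of which must lie in [0, 1]."""
    factors = (operational_readiness, mission_availability, design_adequacy)
    for p in factors:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each factor must be a probability in [0, 1]")
    return factors[0] * factors[1] * factors[2]

# Hypothetical values chosen only to illustrate the calculation.
print(system_effectiveness(0.98, 0.95, 0.90))
```

Because the factors multiply, a shortfall in any one of them (for example, design adequacy) caps the overall effectiveness regardless of how good the others are.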
Reliability allocation
Once the system reliability goals have been defined, reliability must
then be allocated to the components and possibly subcomponents in
a manner that will support these goals.
$$h\{R_1(t), R_2(t), \ldots, R_n(t)\} \ge R^*(t)$$
where $R_i(t)$ is the reliability at time t of the ith component,
$R^*(t)$ is the system reliability goal at time t, and h is a function
that relates the component reliabilities to the system reliability.
Similarly, for MTTF:
$$g\{\mathrm{MTTF}_1, \mathrm{MTTF}_2, \ldots, \mathrm{MTTF}_n\} \ge \mathrm{MTTF}^*$$
If all components are serially related and their failures are
independent:
$$\prod_{i=1}^{n} R_i(t) \ge R^*(t)$$
Exponential case
If all the components have constant failure rates:
$$\prod_{i=1}^{n} e^{-\lambda_i t} \ge R^*(t)$$
or equivalently
$$\sum_{i=1}^{n} \lambda_i \le \lambda_s$$
where $\lambda_s = -\ln R^*(t) / t$ is the system failure rate goal.
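The exponential series condition can be checked numerically. This is a minimal sketch: the function names and the failure rates in the example are hypothetical.

```python
import math

def system_failure_rate_goal(r_goal, t):
    """Failure rate lambda_s for which exp(-lambda_s * t) equals the goal R*(t)."""
    return -math.log(r_goal) / t

def meets_series_goal(failure_rates, r_goal, t):
    """True if the summed component failure rates do not exceed lambda_s."""
    return sum(failure_rates) <= system_failure_rate_goal(r_goal, t)

# Hypothetical allocation: three serial components, goal R*(1000 h) = 0.95.
rates = [1.0e-5, 2.0e-5, 1.5e-5]  # failures per hour
print(system_failure_rate_goal(0.95, 1000.0))
print(meets_series_goal(rates, 0.95, 1000.0))
```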
Optimal allocation
Ideally, reliability allocation should be accomplished in a least-cost
manner.
Let each component have a current reliability $R_i$, where
$\prod_i R_i < R^*$. The optimal solution may be obtained by solving:
$$\min z = \sum_{i=1}^{n} C_i(x_i)$$
$$\text{subject to } \prod_{i=1}^{n} (R_i + x_i) \ge R^* \text{ and } 0 < R_i + x_i \le B_i < 1$$
where i = 1, 2, …, n, $x_i$ is the increase in the reliability of the ith
component, $C_i(x_i)$ is the corresponding cost to achieve this growth,
and $B_i$ is an upper bound on the attainable component reliability.
Consider a quadratic cost function (a common choice, reflecting that
the cost of reliability growth increases at an increasing rate):
$$\min z = \sum_{i=1}^{n} c_i x_i^2$$
Ignoring the inequality and treating the goal as an equality, form the
Lagrangian function
$$L(x_i, \theta) = \sum_{i=1}^{n} c_i x_i^2 - \theta \left[ \prod_{i=1}^{n} (R_i + x_i) - R^* \right]$$
where θ is the Lagrangian multiplier. Setting the partial derivatives
to zero:
$$\frac{\partial L(x_i, \theta)}{\partial x_i} = 2 c_i x_i - \theta \prod_{j=1,\, j \ne i}^{n} (R_j + x_j) = 0$$
$$\frac{\partial L(x_i, \theta)}{\partial \theta} = -\left[ \prod_{i=1}^{n} (R_i + x_i) - R^* \right] = 0 \;\Rightarrow\; \prod_{i=1}^{n} (R_i + x_i) = R^*$$
Multiplying the first equation by $(R_i + x_i)$ and rearranging terms:
$$2 c_i x_i (R_i + x_i) = \theta \prod_{i=1}^{n} (R_i + x_i) = \theta R^*$$
Rearranging gives a quadratic in $x_i$:
$$2 c_i x_i^2 + 2 c_i R_i x_i - \theta R^* = 0$$
The solution (positive root) is:
$$x_i = \frac{-2 c_i R_i + \sqrt{4 c_i^2 R_i^2 + 8 c_i \theta R^*}}{4 c_i}$$
The multiplier θ is then chosen so that the constraint
$\prod_i (R_i + x_i) = R^*$ holds.
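A minimal numerical sketch of this allocation follows. It takes the closed-form positive root for each increase as a function of the multiplier θ and then searches for θ by bisection until the product of the improved reliabilities equals the goal. The bisection search, the function names, and the example costs and reliabilities are assumptions for illustration, and the upper bounds B_i are not enforced.

```python
import math

def increments(theta, costs, rels, r_goal):
    """Positive root of 2*c*x^2 + 2*c*R*x - theta*R_goal = 0 for each component."""
    return [
        (-2 * c * r + math.sqrt(4 * c**2 * r**2 + 8 * c * theta * r_goal)) / (4 * c)
        for c, r in zip(costs, rels)
    ]

def allocate_quadratic_cost(costs, rels, r_goal):
    """Find theta by bisection so that prod(R_i + x_i) reaches r_goal; return the x_i."""
    def achieved(theta):
        xs = increments(theta, costs, rels, r_goal)
        return math.prod(r + x for r, x in zip(rels, xs))

    lo, hi = 0.0, 1.0
    while achieved(hi) < r_goal:  # expand the bracket until the goal is reachable
        hi *= 2.0
    for _ in range(200):  # plain bisection on theta
        mid = 0.5 * (lo + hi)
        if achieved(mid) < r_goal:
            lo = mid
        else:
            hi = mid
    return increments(hi, costs, rels, r_goal)

# Hypothetical data: cost coefficients c_i, current reliabilities R_i, goal R* = 0.80.
costs = [1.0, 2.0, 1.0]
rels = [0.90, 0.85, 0.95]
xs = allocate_quadratic_cost(costs, rels, 0.80)
print([round(x, 4) for x in xs])
print(math.prod(r + x for r, x in zip(rels, xs)))
```

Bisection works here because each increase grows monotonically with θ, so the achieved system reliability does too.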
Two popular strategies discussed in the literature do not require
optimization: ARINC and AGREE.
ARINC method
The simplest one is ARINC which assumes components are in series,
are independent, and have constant failure rates.
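$$w_i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}, \qquad \lambda_i^{\text{new}} = w_i \lambda^*, \qquad i = 1, 2, \ldots, n$$
where $\lambda_i$ is the current failure rate of component i and
$\lambda^*$ is the target failure rate.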
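The ARINC weighting can be sketched in a few lines. The function name and the example failure rates are hypothetical.

```python
def arinc_allocation(current_rates, target_rate):
    """Scale each component failure rate by its share of the current total."""
    total = sum(current_rates)
    weights = [lam / total for lam in current_rates]
    return [w * target_rate for w in weights]

# Hypothetical current failure rates and a target system failure rate.
new_rates = arinc_allocation([0.003, 0.002, 0.005], target_rate=0.005)
print(new_rates)
```

Since the weights sum to 1, the allocated failure rates sum to the target, which is the series constraint for constant failure rates.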
AGREE method
The AGREE (Advisory Group on Reliability of Electronic Equipment)
method considers a system composed of n components, each having
$n_i$ modules or subcomponents.
This approach allocates an equal share of the reliability to each
module in the system. The ith component's contribution to system
reliability is given by $[R^*(t)]^{n_i/N}$. This leads to
$$w_i \left(1 - e^{-\lambda_i t_i}\right) = 1 - [R^*(t)]^{n_i/N}$$
The left side is the joint probability that the ith component fails
and that its failure results in a system failure. The right side is
the failure probability allocated to the ith component. Solving for
$\lambda_i$ results in:
$$\lambda_i = -\frac{1}{t_i} \ln\left\{ 1 - \frac{1 - [R^*(t)]^{n_i/N}}{w_i} \right\}$$
such that
$$\prod_{i=1}^{n} e^{-\lambda_i t_i} \le R^*(t)$$
where
t is the system operating time
$R^*(t)$ is the system reliability goal
n is the number of components
$n_i$ is the complexity number, the number of modules within component i
N is the sum of the $n_i$, the total number of modules in the system
$t_i$ is the operating time of the ith component
$\lambda_i$ is the failure rate of the ith component
$w_i$ is the probability that the system will fail given that component i has failed
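The AGREE formula for the allocated failure rate can be evaluated directly. This is a minimal sketch; the function name and all numeric inputs are hypothetical.

```python
import math

def agree_failure_rate(r_goal, n_i, n_total, t_i, w_i):
    """Allocated failure rate for a component with n_i of the n_total modules."""
    allocated_failure_prob = 1.0 - r_goal ** (n_i / n_total)
    return -math.log(1.0 - allocated_failure_prob / w_i) / t_i

# Hypothetical system: goal R* = 0.95, three components with 10, 20 and 30 modules.
modules = [10, 20, 30]
times = [100.0, 100.0, 50.0]   # component operating times in hours
weights = [1.0, 1.0, 0.8]      # probability of system failure given component failure
n_total = sum(modules)
for n_i, t_i, w_i in zip(modules, times, weights):
    print(agree_failure_rate(0.95, n_i, n_total, t_i, w_i))
```

When every weight equals 1 and every component runs for the full mission, the allocated component reliabilities multiply back to exactly the system goal.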
Redundancies
Redundancy may be used to achieve the allocated component
reliability.
[Figure: component 1 in series with the parallel pair of components
2 and 3 (combined reliability R'), in series with component 4.]
If $R^*$ is the system reliability goal, one can write
$R^* = R_1 \cdot R' \cdot R_4$.
Assuming R' is the allocated reliability of the parallel pair,
$$R' = 1 - (1 - R_2)(1 - R_3) = R_2 + R_3 - R_2 R_3$$
One can assign a reliability to one of the components, say component
2, such that $R_2 < R'$, and then solve for the other:
$$R_3 = \frac{R' - R_2}{1 - R_2}$$
If both components receive the same reliability R, we have
$R' = 2R - R^2$, which has the solution $R = 1 - (1 - R')^{0.5}$.
In case of complex systems, it is generally possible to reduce the
system initially to serially related components and then further
decompose as necessary.
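The two ways of splitting the allocated reliability R' across a parallel pair can be sketched as follows. The function names and the example value R' = 0.99 are hypothetical.

```python
def partner_reliability(r_prime, r2):
    """Reliability required of component 3 once component 2 is fixed at r2 < r_prime."""
    if not r2 < r_prime:
        raise ValueError("r2 must be below the allocated reliability r_prime")
    return (r_prime - r2) / (1.0 - r2)

def equal_reliability(r_prime):
    """Common reliability R when both parallel components receive the same value."""
    return 1.0 - (1.0 - r_prime) ** 0.5

r3 = partner_reliability(0.99, 0.90)
r_equal = equal_reliability(0.99)
# Check: both choices reproduce the allocated reliability of the pair.
print(1.0 - (1.0 - 0.90) * (1.0 - r3))
print(1.0 - (1.0 - r_equal) ** 2)
```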
Design methods
A product fails prematurely because of the inadequate design
features, manufacturing part defects, abnormal stresses introduced
due to packaging or distribution, operator and maintenance error, or
external conditions that exceed the design parameters.
Various activities and parameters are involved in the design of
products:
1. Material selection. It involves consideration of the following
parameters:
• Tensile strength
• Hardness
• Fatigue life
• Creep
2. Derating. This is the use of a component under stress significantly
below its rated value.
3. Stress-strength analysis. The traditional approach is to design
safety margins, or safety factors, into the equipment or component.
Failure is likely to occur if the safety factor is less than 1 or the
safety margin is negative.
• Safety factor: The safety factor is the ratio of the capacity of
the system to the load placed on the system
• Safety margin: The safety margin is the difference between
the system capacity and load
4. Complexity and technology. The number of parts in a
system is a measure of system complexity.
5. Redundancy. Redundancy includes both active and standby
units. There may be duplicate active units with all operating and only
one required to survive, or there may be the more general k-out-of-n
redundancy.
• Redundancy optimization
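For the k-out-of-n redundancy mentioned in the list above, a minimal sketch of the reliability calculation is given here. It assumes n identical, independent active units with common reliability r, which is an assumption not stated in the slides.

```python
from math import comb

def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n identical, independent units survive."""
    return sum(comb(n, j) * r**j * (1.0 - r) ** (n - j) for j in range(k, n + 1))

print(k_out_of_n_reliability(1, 2, 0.9))  # simple active parallel pair
print(k_out_of_n_reliability(2, 3, 0.9))  # 2-out-of-3 voting arrangement
print(k_out_of_n_reliability(3, 3, 0.9))  # all three required (series)
```

Setting k = 1 gives ordinary active redundancy and k = n gives a series system, so the function covers both extremes named in the text.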
Redundancy optimization
Let
$R_i(t)$ be the reliability of component i at time t
$n_i$ be the number of parallel components i
$c_i$ be the unit cost of component i
B be the budget available for additional units
The problem is to find the optimal $n_i$ so that
$$\max \prod_{i=1}^{M} \left[ 1 - (1 - R_i(t))^{n_i} \right]$$
$$\text{subject to } \sum_{i=1}^{M} c_i n_i \le B + \sum_{i=1}^{M} c_i$$
Marginal analysis may be used to solve this problem if the natural log
of the reliability is maximized rather than the function itself:
$$\max \sum_{i=1}^{M} \ln\left[ 1 - (1 - R_i(t))^{n_i} \right]$$
$$\Delta_i = \frac{\ln\left[ 1 - (1 - R_i(t))^{n_i + 1} \right] - \ln\left[ 1 - (1 - R_i(t))^{n_i} \right]}{c_i}$$
Marginal analysis consists of the following steps:
1. Set $n_i = 1$ for i = 1, 2, 3, …, M and set cost = 0.
2. Compute $\Delta_i$ for i = 1, 2, 3, …, M.
3. Find max{$\Delta_1, \Delta_2, \Delta_3, \ldots, \Delta_M$} and call it $\Delta_k$.
4. Set cost = cost + $c_k$.
5. If cost ≤ B, then set $n_k = n_k + 1$, recompute $\Delta_k$, and go to
step 3; otherwise stop.
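The marginal analysis steps can be sketched as a short loop. The function name and the example data are hypothetical. The sketch lets a unit be added when the running cost exactly equals the budget, to agree with the "≤" in the budget constraint, and like the stated procedure it stops the first time the best-ratio unit no longer fits.

```python
import math

def marginal_analysis(rels, costs, budget):
    """Greedy redundancy allocation; returns the number of parallel units per component."""
    n = [1] * len(rels)  # step 1: one unit of each component, nothing spent yet
    cost = 0.0

    def delta(i):
        q = 1.0 - rels[i]  # unreliability of a single unit
        gain = math.log(1.0 - q ** (n[i] + 1)) - math.log(1.0 - q ** n[i])
        return gain / costs[i]

    deltas = [delta(i) for i in range(len(rels))]  # step 2
    while True:
        k = max(range(len(rels)), key=lambda i: deltas[i])  # step 3
        cost += costs[k]                                     # step 4
        if cost > budget:                                    # step 5: stop
            break
        n[k] += 1
        deltas[k] = delta(k)
    return n

# Hypothetical data: two components, unit costs of 1, budget for two extra units.
print(marginal_analysis([0.8, 0.9], [1.0, 1.0], budget=2.0))
```

In the example the first extra unit goes to the weaker component, and the second goes to the other component because the gain from a third unit of the first one has already dropped.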
Failure Analysis
Failure mode and effects analysis (FMEA), or failure mode, effects,
and criticality analysis (FMECA), is a formalized design process
whose objective is to improve inherent reliability.
This is an iterative process that influences design by identifying
failure modes and assessing their probabilities of occurrence and
their effects on the system. It may also include isolating the causes
and determining corrective actions or preventive measures.
FMEA comprises the following steps:
1. System definition:
This step is to identify those system components that will be subject
to failure. A functional and physical description of the system provides
the definition and boundaries for performing analysis.
2. Identification of failure modes
Failure modes are identified using a hardware or functional approach.
Failure modes are the observable manners in which a component fails,
for example: valve open, circuit short, pipe or valve rupture, power
loss, etc.
3. Determination of causes
For each failure mode an assessment is made as to the probable
cause or causes. A failure mode may have more than one
cause. Examples include:
Failure mode        Category     Cause                        Failure mechanism
Capacitor short     Electrical   High voltage                 Derating
Failure of metal    Chemical     Humid and salty atmosphere   Corrosion
Connector fracture  Mechanical   Excessive vibration          Fatigue
4. Effect assessment
The impact each failure has on the operation or status of the system
is assessed. Effects may range from complete system failure to
partial degradation to no impact on performance
Failure mechanism                Failure mode          Failure effect
Corrosion                        Failure of tank wall  Tank rupture
Manufacturing defect in casing   Leaking battery       Failure of flashlight to light
Friction and excessive wear      Drive belt break      Shutdown of production line
Prolonged low temperature        Brittle seals         Leakage in hydraulic system
5. Classification of severity
A severity classification is assigned to each failure mode to be used
for prioritization of corrective actions. Generally severity is classified
in four classes.
Category I: Catastrophic. Significant system failure occurs that can
result in injury, loss of life, or major damage.
Category II: Critical. Complete loss of system occurs, performance
unacceptable.
Category III: Marginal. System is degraded with partial loss in
performance.
Category IV: Negligible. Minor failure occurs with no effect on
acceptable system performance
6. Estimation of probability of occurrence
The probability of occurrence of each failure mode is estimated,
generally using handbooks or existing databases. Some standard
handbooks on FMEA classify the frequency of occurrence qualitatively
into five major levels:
Level A: Frequent: High probability of failure (p≥ 0.20)
Level B: Probable: Moderate probability of failure (0.1≤ p≤ 0.20)
Level C: Occasional: Marginal probability of failure (0.01≤ p≤ 0.1)
Level D: Remote: Unlikely probability of failure (0.001≤ p≤ 0.01)
Level E: Extremely unlikely: Rare probability of failure (p≤ 0.001)
7. Computation of criticality index
This is a quantitative measure of the criticality of the failure mode that
combines the probability of the failure mode’s occurrence with its
severity ranking. The index may be defined as:
$$C_k = \alpha_{kp} \, \beta_k \, \lambda_p \, t$$
where
$C_k$ is the criticality index for failure mode k
$\alpha_{kp}$ is the fraction of component p's failures having failure
mode k
$\beta_k$ is the conditional probability that failure mode k will result
in the identified failure effect
$\lambda_p$ is the failure rate of component p
t is the duration of time used in the analysis
The conditional probability $\beta_k$ is a subjective estimate that may
be quantified as:
Failure effect   β
Certain          β = 1.0
Probable         0.10 < β < 1.0
Possible         0 < β < 0.10
No effect        β = 0
For a given p, the sum of α kp over all its failure modes would
normally equal 1.
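A minimal sketch of the criticality index calculation follows. The failure modes, fractions, conditional probabilities, failure rate, and mission time are all hypothetical values chosen for illustration.

```python
def criticality_index(alpha_kp, beta_k, lambda_p, t):
    """C_k = alpha_kp * beta_k * lambda_p * t for one failure mode of component p."""
    return alpha_kp * beta_k * lambda_p * t

# Hypothetical component: failure rate 1e-4 per hour, analysed over 1000 hours.
lambda_p, t = 1.0e-4, 1000.0
modes = {
    "short circuit": {"alpha": 0.6, "beta": 1.0},   # effect judged certain
    "open circuit": {"alpha": 0.4, "beta": 0.10},   # effect judged possible/probable boundary
}
# The mode fractions of one component would normally sum to 1.
assert abs(sum(m["alpha"] for m in modes.values()) - 1.0) < 1e-12

for name, m in modes.items():
    print(name, criticality_index(m["alpha"], m["beta"], lambda_p, t))
```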
Failure mode classification matrix
[Figure: failure mode classification matrix. Axes: criticality index
(levels A to E) against severity classification (IV, III, II, I).]
8. Determination of corrective action
This is very dependent on the problem. Those failure modes having
high criticality index and severity classification should receive the
most attention.
Design activities should be oriented toward removing the cause of
failure, decreasing the probability of occurrence, and reducing the
severity of failure.
Fault Tree Analysis
Fault Tree: A logical representation of the relationship
between an accident or event and its basic
causes. The relationship is expressed using
logic gates (AND or OR).
Fault Tree Analysis: A deductive study to quantify the
probability of occurrence of an accident using
fault tree and basic failure data.
Logic gates
AND: An output from the gate (event) will occur only when all inputs
occur.
OR: An output from the gate (event) will occur if any of the inputs
occur.
INHIBIT gate: An output from the gate (event) will occur when the
inputs occur and the inhibit event also occurs.
Basic event: A fault event that needs no further definition.
Undeveloped event: An event that cannot be further developed due to
lack of information.
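The AND and OR gates above translate into simple probability rules when the input events are independent. The sketch below makes that assumption explicit; the function names, the tiny example tree, and the probabilities are hypothetical.

```python
import math

def and_gate(probabilities):
    """Output probability when all independent inputs must occur."""
    return math.prod(probabilities)

def or_gate(probabilities):
    """Output probability when any one of the independent inputs suffices."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)

# Hypothetical tree: TOP = OR(AND(a, b), c), with no repeated basic events.
p_a, p_b, p_c = 0.1, 0.2, 0.05
p_top = or_gate([and_gate([p_a, p_b]), p_c])
print(p_top)
```

Evaluating gate by gate like this is only valid when no basic event appears in more than one branch; otherwise the branches are no longer independent and a cut-set based calculation is needed.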
Steps in Fault Tree Analysis
1. Fault Tree Development
2. Minimum cut set finding
3. Probabilistic analysis using basic failure data
4. Importance factor estimation
Fault Tree Development
1. Identify the top event as the accident scenario.
2. Identify the events that may cause the top event to occur.
3. Can these events be broken down?
• Yes: identify the causes that may lead to these events, identify
the relationship between the events and their basic causes, and
transform these relationships into a fault tree using gates.
• No: identify the relationship of these events to the top event
and transform these relationships into a fault tree using gates.
Minimum Cut Set Finding
• Analytical Procedure
• Simulation methods
Probabilistic analysis using basic failure data
• Analytical Procedure
• Monte Carlo Simulation method
Importance factor estimation
Here the importance of each basic event is quantified.
This is done by repeating steps 2 and 3 while one particular event is
made totally safe (assumed not to fail).
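One way to sketch the importance calculation is to recompute the top event probability with one basic event's probability set to zero and take the difference. The slides do not give a formula, so this difference measure, the function names, and the example data are assumptions. The top event probability is computed exactly by enumerating all basic event states, which is only practical for small trees and assumes independent basic events.

```python
from itertools import product

def top_event_probability(cut_sets, probs):
    """Exact probability that at least one minimal cut set has fully failed."""
    events = sorted(probs)
    total = 0.0
    for states in product((False, True), repeat=len(events)):
        failed = {e for e, s in zip(events, states) if s}
        if any(cs <= failed for cs in cut_sets):
            weight = 1.0
            for e, s in zip(events, states):
                weight *= probs[e] if s else 1.0 - probs[e]
            total += weight
    return total

def improvement(cut_sets, probs, event):
    """Drop in top event probability when `event` is made totally safe."""
    safe = {**probs, event: 0.0}
    return top_event_probability(cut_sets, probs) - top_event_probability(cut_sets, safe)

# Hypothetical tree with minimal cut sets {a, b} and {c}.
cut_sets = [frozenset({"a", "b"}), frozenset({"c"})]
probs = {"a": 0.1, "b": 0.2, "c": 0.05}
for e in sorted(probs):
    print(e, improvement(cut_sets, probs, e))
```

Events whose removal lowers the top event probability the most are the ones that deserve attention first.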
Analytical Simulation Methodology and PROFAT
1. Start.
2. Represent the undesired event in terms of a fault tree.
3. Transform the fault tree into a Boolean matrix.
4. Solve the Boolean matrix for minimum cut sets.
5. Optimize the cut sets using the optimization criteria.
6. Is optimization over? If no, continue optimizing; if yes, proceed.
7. Probability analysis, using the supplied probabilities after
transformation of static probability to a fuzzy probability set.
8. Improvement index calculation.
9. Stop.
1. Start.
2. Represent the fault tree as a Boolean matrix such that
Row = Gate and Col = Event + Gate.
3. Take the top gate.
4. Check the type of gate:
• OR gate: open a new row to enter the elements of this gate.
• AND gate: place all entries of this gate in the same row.
5. Calculate factor 'G'.
6. Is any gate present? If yes, take a new row and continue the gate
check.
7. Are all rows checked? If no, take a new row and continue; if yes,
proceed to optimization.
8. Optimization: apply optimization techniques using the optimization
criteria.
9. Output the optimum minimal cut set.
10. Stop.
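The gate handling in this procedure (AND entries stay in the same row, each OR input opens a new row) can be sketched as a top-down expansion followed by removal of non-minimal sets. This is an illustrative sketch in that style, not the PROFAT implementation: the data layout and function name are assumptions, and the factor 'G' and the optimization criteria are not modelled.

```python
def minimal_cut_sets(gates, top):
    """gates maps a gate name to (type, inputs); names absent from gates are basic events."""
    rows = [frozenset([top])]
    finished = []
    while rows:
        row = rows.pop()
        gate = next((x for x in row if x in gates), None)
        if gate is None:           # only basic events left: this row is a cut set
            finished.append(row)
            continue
        kind, inputs = gates[gate]
        rest = row - {gate}
        if kind == "AND":          # all inputs go into the same row
            rows.append(rest | frozenset(inputs))
        elif kind == "OR":         # each input opens a new row
            rows.extend(rest | {inp} for inp in inputs)
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    unique = set(finished)
    # Keep only sets that have no proper subset among the candidates.
    minimal = [cs for cs in unique if not any(other < cs for other in unique)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))

# Hypothetical tree with a repeated basic event "a".
gates = {
    "TOP": ("AND", ["G1", "G2"]),
    "G1": ("OR", ["a", "b"]),
    "G2": ("OR", ["a", "c"]),
}
print(minimal_cut_sets(gates, "TOP"))
```

In the example the expansion first produces {a}, {a, c}, {a, b} and {b, c}; the reduction step then discards the two sets that contain {a}.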