
Design For Reliability by Adesh

This document discusses designing products for reliability. It begins by defining key reliability terms like mean time between failures and introduces the bathtub curve for describing failure rates over time. Common causes of failure like wearout and overstress are described. Tools for improving reliability are discussed, like failure mode and effects analysis and reliability block diagrams. Steps in the design process like developing a reliability plan, analyzing noise factors, and tracking failures are outlined. The document aims to introduce methods for achieving high reliability in product design.

Uploaded by

Delhiites Adesh
Copyright
© Attribution Non-Commercial (BY-NC)

DESIGN FOR RELIABILITY

Adesh Kumar, M.Tech 1st Year (Machine Design), Jamia Millia Islamia, New Delhi

Chapter Objectives
Introduce the need for design for reliability.
List the main causes of reliability failures.
Explain how failures relate to their mechanisms.
Describe each failure.
Propose design guidelines against each failure.

What is Reliability?
Reliability is:
The ability of an item to perform its required function under defined customer operating conditions for a stated period of time.
The probability that no (system) failure will occur in a given time interval.
In research, the term reliability means "repeatability" or "consistency". A measure is considered reliable if it gives the same result over and over again.

Other Names of DFR

DFR has many aliases:


Design for Durability
Design for Robustness
Design for Useful Life

What do Reliability Engineers Do?


Implement Reliability Engineering Programs across all functions:
Engineering
Research
Manufacturing
Testing
Packaging
Field service

What is Probability?
Probability is: A measure that describes the chance or likelihood that an event will occur.
The probability that event (A) occurs is represented by a number between 0 (zero) and 1.
When P(A) = 0, the event cannot occur.
When P(A) = 1, the event is certain to occur.
When P(A) = 0.5, the event is as likely to occur as it is not.

Why Design for Reliability?


Reliability can make or break the long-term success of a product.
Too high reliability will cause the product to be too expensive.
Too low reliability will cause warranty and repair costs to be high, and therefore market share will be lost.

Cost-Reliability Functions

What are Noise Factors?

Noise Factors are sources of disturbing influences that can disrupt the ideal function, causing error states which lead to quality problems.

Reliability Terms
Mean Time To Failure (MTTF), for non-repairable systems
Mean Time Between Failures (MTBF), for repairable systems
Reliability (survival probability): $R(t)$
Failure probability (cumulative distribution function): $F(t) = 1 - R(t)$
Failure probability density: $f(t)$
Failure rate (hazard rate): $\lambda(t)$

MTBF & MTTF


Mean Time Between Failures: applies to repairable items.
Mean Time To Failure: applies to non-repairable items.
Both of these terms indicate the average time an item is expected to function before failure.

Reliability Function
Probability density function of failures: $f(t) = \lambda e^{-\lambda t}$ for $t > 0$
Probability of failure from 0 to $T$: $F(T) = 1 - e^{-\lambda T}$
Reliability function: $R(T) = 1 - F(T) = e^{-\lambda T}$
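The three functions above can be sketched in a few lines of Python. This is a minimal illustration of the constant-failure-rate (exponential) model only; the function names and the sample rate of 8e-4 failures per hour are assumptions chosen for the example.

```python
import math


def failure_density(rate, t):
    """f(t) = lambda * exp(-lambda * t), valid for t > 0."""
    return rate * math.exp(-rate * t)


def failure_probability(rate, t):
    """F(t) = 1 - exp(-lambda * t): probability of failure from 0 to t."""
    return 1.0 - math.exp(-rate * t)


def reliability(rate, t):
    """R(t) = 1 - F(t) = exp(-lambda * t): probability of surviving to t."""
    return math.exp(-rate * t)


# Hypothetical values, for illustration only.
rate = 8e-4        # failures per hour
hours = 1000.0
print(f"f({hours:.0f}) = {failure_density(rate, hours):.6f} per hour")
print(f"F({hours:.0f}) = {failure_probability(rate, hours):.4f}")
print(f"R({hours:.0f}) = {reliability(rate, hours):.4f}")
```

Note that at t = 1/lambda (the MTBF in this model) the reliability is exp(-1), so only about 37% of items are expected to survive that long.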

Series Systems

$R_S = R_1 R_2 \cdots R_n$

Serial reliability
Series systems are also referred to as weakest-link or chain systems.
System failure is caused by the failure of any one component.
Therefore, for a series system, the reliability of the system is the product of the individual component reliabilities.
More components = less reliability.

$$\text{serial reliability} = \prod_{i=1}^{n} x_i$$

Parallel Systems
$R_S = 1 - (1 - R_1)(1 - R_2) \cdots (1 - R_n)$

Parallel reliability
Parallel systems are also referred to as redundant.
The system fails only if all of the components fail.
Therefore, for a parallel system, the system probability of failure is the product of the individual component failure probabilities.

$$\text{parallel reliability} = 1 - \prod_{i=1}^{n} (1 - x_i)$$
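The series and parallel formulas can be compared side by side with a short Python sketch. The function names and the component reliability of 0.9 are assumptions made for illustration, and the components are assumed to fail independently.

```python
from math import prod


def series_reliability(reliabilities):
    """Series system: every component must work, so multiply the reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Parallel system: it fails only if all components fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Three hypothetical components, each with reliability 0.9.
components = [0.9, 0.9, 0.9]
print(f"series:   {series_reliability(components):.3f}")
print(f"parallel: {parallel_reliability(components):.3f}")
```

With these sample values the series arrangement is less reliable than any single component, while the parallel arrangement is more reliable, which is the point made on both slides.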

Series-Parallel Systems
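[Diagram: blocks A, B, and D in series with a parallel pair of C blocks; the block reliabilities are $R_A$, $R_B$, $R_C$, and $R_D$.]

Convert to equivalent series system:

[Diagram: blocks A, B, C, and D in series, where C now stands for the parallel pair.]

$R_C' = 1 - (1 - R_C)(1 - R_C)$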
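As a worked instance of this reduction, the following assumes blocks A, B, and D are in series with a parallel pair of C blocks, and uses hypothetical reliabilities $R_A = R_B = R_D = 0.99$ and $R_C = 0.90$. The numbers are illustrative only.

```latex
\begin{align*}
R_C' &= 1 - (1 - R_C)(1 - R_C) = 1 - (0.10)^2 = 0.99 \\
R_S  &= R_A \, R_B \, R_C' \, R_D = (0.99)^4 \approx 0.9606
\end{align*}
```

For comparison, with a single C block the same chain would give $0.99^3 \times 0.90 \approx 0.8733$, so duplicating the weakest block accounts for most of the gain in this example.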

A Simple Example
A system has 4000 components, each with a failure rate of 0.02% per 1000 hours. Calculate $\lambda$ and MTBF.

$\lambda = (0.02 / 100) \times (1 / 1000) \times 4000 = 8 \times 10^{-4}$ failures/hour

MTBF $= 1 / (8 \times 10^{-4}) = 1250$ hours

An Example
A first-generation computer contains 10000 components, each with $\lambda = 0.5\%$ per 1000 hours. What is the period of 99% reliability?

MTBF $= t / (1 - R(t)) = t / (1 - 0.99)$
$t = \text{MTBF} \times 0.01 = 0.01 / \lambda_{av}$

where $\lambda_{av}$ is the average failure rate, $N$ = number of components = 10000, and $\lambda$ = failure rate of a component = 0.5% / (1000 hours) = 0.005 / 1000 = $5 \times 10^{-6}$ per hour.

Therefore, $\lambda_{av} = N \lambda = 10000 \times 5 \times 10^{-6} = 5 \times 10^{-2}$ per hour.

Therefore, $t = 0.01 / (5 \times 10^{-2}) = 0.2$ hours $= 12$ minutes.
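The arithmetic in both examples can be checked with a short Python sketch. The helper names are assumptions. The sketch also compares the linear approximation used above, R(t) ≈ 1 - t/MTBF, with the exponential reliability function R(t) = exp(-λt), assuming a constant failure rate.

```python
import math


def system_failure_rate(n_components, percent_per_1000_hours):
    """Failures per hour for n identical components in series."""
    per_component = (percent_per_1000_hours / 100.0) / 1000.0
    return n_components * per_component


def time_to_reliability(rate, target):
    """Time at which exp(-rate * t) has dropped to the target reliability."""
    return -math.log(target) / rate


# First example: 4000 components at 0.02% per 1000 hours.
rate_simple = system_failure_rate(4000, 0.02)
mtbf_simple = 1.0 / rate_simple

# Second example: 10000 components at 0.5% per 1000 hours, 99% reliability.
rate_computer = system_failure_rate(10000, 0.5)
approx_minutes = (1.0 - 0.99) / rate_computer * 60.0
exact_minutes = time_to_reliability(rate_computer, 0.99) * 60.0

print(f"simple example: rate = {rate_simple:.1e} per hour, MTBF = {mtbf_simple:.0f} hours")
print(f"computer: approx {approx_minutes:.2f} min, exponential model {exact_minutes:.2f} min")
```

The two answers for the computer agree to within a few seconds because the linear approximation is accurate when the target reliability is close to 1.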

Reliability Failure Modes


Failures may be SUDDEN (non-predictable) or GRADUAL (predictable).
They may also be PARTIAL or COMPLETE.
A catastrophic failure is both sudden and complete.
A degradation failure is both gradual and partial.
Two root causes:
1. Lack of robustness
2. Mistakes

Causes of Failure
Misuse: failures attributable to the application of stresses beyond the stated capabilities of the item.
Inherent weakness: failures attributable to weakness inherent in the item itself when subjected to stresses within the stated capabilities of the item.

Classifications of Reliability Failure


Early-stage failure: Causes of this type of failure are inadequate design, poor manufacturing, and inappropriate usage. These failures can be catastrophic to human life.
Overstress mechanisms: These occur due to an insufficient safety factor in design, higher-than-expected random loads, human error, and misapplication.
Wearout mechanisms: These occur late in life and then increase with age. Causes include corrosion, material fatigue, poor maintenance, creep, and degradation in strength.

Common Measures of Unreliability


% Failure: the percentage of failures in a total population.
MTTF (Mean Time To Failure): the average time of operation to first failure.
MTBF (Mean Time Between Failures): the average time between product failures.
Repairs Per Thousand (R/1000)
Bq Life: the life at which q% of the population will fail.
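To make the Bq life measure concrete, the sketch below computes it under the constant-failure-rate model from the Reliability Function slide, where F(t) = 1 - exp(-λt). That model choice, the function names, and the sample rate are assumptions. Real products with infant mortality or wearout would need a different life distribution.

```python
import math


def bq_life(rate, q_percent):
    """Time by which q% of the population has failed, assuming F(t) = 1 - exp(-rate * t)."""
    if not 0.0 < q_percent < 100.0:
        raise ValueError("q_percent must be strictly between 0 and 100")
    return -math.log(1.0 - q_percent / 100.0) / rate


def percent_failed(rate, t):
    """Percentage of the population that has failed by time t under the same model."""
    return 100.0 * (1.0 - math.exp(-rate * t))


# Hypothetical failure rate of 8e-4 failures per hour.
rate = 8e-4
b10 = bq_life(rate, 10.0)
print(f"B10 life: {b10:.1f} hours")
print(f"check: {percent_failed(rate, b10):.1f}% failed at B10")
```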

Cumulative Failure Rate Curve

The Bathtub Curve


Reliability specialists often describe the lifetime of a population of products using a graphical representation called the bathtub curve. The bathtub curve consists of three periods: an infant mortality period with a decreasing failure rate, followed by a normal life period (also known as "useful life") with a low, relatively constant failure rate, and concluding with a wear-out period that exhibits an increasing failure rate.
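One common way to reproduce a bathtub shape numerically is to add three Weibull hazard rates: one with shape below 1 (decreasing), one with shape equal to 1 (constant), and one with shape above 1 (increasing). The slides do not name a model, so the Weibull form and every parameter value below are assumptions chosen only to illustrate the three periods.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_hazard(t):
    """Sum of three competing failure mechanisms (hypothetical parameters)."""
    infant_mortality = weibull_hazard(t, shape=0.5, scale=2000.0)   # decreasing with age
    useful_life = weibull_hazard(t, shape=1.0, scale=5000.0)        # constant
    wearout = weibull_hazard(t, shape=5.0, scale=10000.0)           # increasing with age
    return infant_mortality + useful_life + wearout


for hours in (10, 100, 1000, 5000, 10000, 15000):
    print(f"{hours:>6} h: failure rate = {bathtub_hazard(hours):.6f} per hour")
```

The printed rates start high, fall to a low plateau, and rise again at large ages, matching the three periods described above.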

Reliability
[Figure: probability of dying in the next year (deaths per 1000) versus age. From the Statistical Bulletin 79, no. 1, Jan-Mar 1998.]

Steps in Designing for Reliability


1. Develop a Reliability Plan
   Determine Which Reliability Tools are Needed
2. Analyze Noise Factors
3. Tests for Reliability
4. Track Failures and Determine Corrective Actions

Develop a Reliability Plan


Planning for reliability is just as important as planning for design and manufacturing.
Why? To determine:
the useful life of the product
what accelerated life testing is to be used
Reliability must be as close to perfect as possible for the product's useful life.
You MUST know where your product's major points of failure are!

Tools for testing


Stress Analysis
Reliability Predictions (MTBF)
FMEA (Failure Mode and Effects Analysis)
Fault Tree Analysis
Reliability Block Diagrams

Why do Reliability Calculation?


Reliability calculations help make the product more reliable, which can be used as a selling feature by the marketing department. This also adds to the company's reputation and can be used for comparisons with the competition.

Stress Analysis
It establishes the presence of a safety margin, thus enhancing system life.
Stress analysis provides input data for reliability prediction.
It is based on customer requirements.

Reliability Predictions (MTBF)


MTBF (Mean Time Between Failures) for an existing product can be found by studying field failure data.
For a new product, however, or if significant changes are made to the design, it may be required to estimate or calculate MTBF before any field data is available.

Failure Modes and Effects Analysis


Failure modes and effects analysis (FMEA) is a qualitative technique for understanding the behaviour of components in an engineered system.
The objective is to determine the influence of component failure on other components, and on the system as a whole.
FMEA can also be used as a stand-alone procedure for relative ranking of failure modes that screens them according to risk.

Failure mode and effects analysis (FMEA)


Failure Mode: Consider each component or functional block and how it can fail.
Determine the Effect of each failure mode, and the severity on system function.
Determine the likelihood of occurrence and of detecting the failure.
Calculate the Risk Priority Number (RPN = Severity x Occurrence x Detection).
Consider corrective actions (these may reduce severity or occurrence, or increase the probability of detection).
Start with the higher RPN values (most severe problems) and work down.
Recalculate RPN after the corrective actions have been determined. The aim is to minimize RPN.
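The RPN steps above can be sketched as a small Python ranking. The failure modes, their ratings, and the 1 to 10 rating scale (with a higher detection rating meaning the failure is harder to detect) are assumptions for illustration. The source only gives the product formula.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int     # assumed 1-10 scale, 10 = most severe effect
    occurrence: int   # assumed 1-10 scale, 10 = most likely to occur
    detection: int    # assumed 1-10 scale, 10 = hardest to detect

    @property
    def rpn(self) -> int:
        """Risk Priority Number = Severity x Occurrence x Detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Hypothetical failure modes for illustration only.
modes = [
    FailureMode("connector corrosion", severity=6, occurrence=4, detection=5),
    FailureMode("solder joint fatigue", severity=8, occurrence=3, detection=7),
    FailureMode("fan bearing wearout", severity=4, occurrence=6, detection=2),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.name:<22} RPN = {mode.rpn}")
```

After a corrective action is chosen, the affected rating is updated and the list is ranked again, which mirrors the "recalculate RPN" step.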

Reliability Block Diagrams


Most systems are defined through a combination of both series and parallel connections of subsystems.
Reliability block diagrams (RBD) represent a system using interconnected blocks arranged in combinations of series and/or parallel configurations.
They can be used to analyze the reliability of a system quantitatively.
Reliability block diagrams can consider active and stand-by states to get estimates of reliability, and availability (or unavailability) of the system.
Reliability block diagrams may be difficult to construct for very complex systems.

CASE STUDY: Network Storage Evaluations Using Reliability Calculations


This section uses a case study to introduce concepts and calculations for systematically comparing redundancy and reliability factors as they apply to network storage configurations. We will determine a reliability figure for three very basic architectures. The starting point of our study is the network storage requirements.

Network Storage Requirements


We want networked storage that has access to one server. Later, this storage will be accessible to other servers.
The server is already in place, and has been designed to sustain single-component hardware failures (with dual host bus adapters (HBAs), for example).
Data on this storage must be mirrored, and the storage access must also stand up to hardware failures.
The cost of the storage system must be reasonable, while still providing good performance.

Architecture 1
Architecture 1 provides the basic storage necessities we are looking for, with the following advantages and disadvantages:
Advantages:
Storage is accessible if one of the links is down.
Storage A is mirrored onto B.
Other servers can be connected to the concentrator to access the storage.
Disadvantages:
If the concentrator fails, we have no more access to the storage. This concentrator is a single point of failure (SPOF).

Architecture 2
Architecture 2 has been improved to take into account the previous SPOF. A concentrator has been added.
Advantages:
If any links or components go down, storage is still accessible (resilient to hardware failures).
Data is mirrored (Disk A <-> Disk B).
Other servers can be connected to both concentrators to access the storage.

Architecture 3
The main difference is that Disk A and Disk B have only one data path. Disk A is still mirrored to Disk B, as required. This architecture has all the advantages of the previous architectures, with the following differences:
Disk A can only be accessed through Link C, and Disk B only through Link D.
There is no data multipathing software layer, which results in easier administration and easier troubleshooting.

Determining Reliability
Using the reliability formulas, we can determine which architecture has the highest reliability value. For the purpose of this article, we will use sample MTBF values (as obtained from the manufacturer) and AFR* (Annual Failure Rate) values shown in the table below.
*The AFR for each component was calculated from its MTBF as AFR = 8760 / MTBF, where 8760 is the number of hours in a year. The example MTBF values were taken from real network storage component statistics. However, such values vary greatly, and these numbers are given here purely for illustration.

Determining Reliability
Component        AFR Variable   Sample MTBF (hours)   AFR
HBA 1            H              800,000               0.011
HBA 2            H              800,000               0.011
LINK A           L              400,000               0.022
LINK B           L              400,000               0.022
Concentrator 1   C              580,000               0.0151
Concentrator 2   C              580,000               0.0151
LINK C           L              400,000               0.022
LINK D           L              400,000               0.022
Disk A           D              1,000,000             0.0088
Disk B           D              1,000,000             0.0088
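The AFR column can be reproduced from the sample MTBF values with the relation AFR = 8760 / MTBF. The following is a minimal Python sketch. The dictionary layout and function name are assumptions, while the MTBF figures are the sample values from the table.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours

# Sample MTBF values in hours, keyed by the AFR variable used in the table.
SAMPLE_MTBF_HOURS = {
    "H": 800_000,    # host bus adapter
    "L": 400_000,    # link
    "C": 580_000,    # concentrator
    "D": 1_000_000,  # disk
}


def annual_failure_rate(mtbf_hours):
    """Expected failures per year for a component with the given MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


afr = {variable: annual_failure_rate(mtbf) for variable, mtbf in SAMPLE_MTBF_HOURS.items()}

for variable, value in afr.items():
    print(f"{variable}: AFR = {value:.4f}")
```

The slide values are these results rounded to three or four decimal places.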

Determining Reliability
Having the rate of failure of each individual component, we can obtain the system's annual failure rate (AFR) and consequently the system reliability (R) and system MTBF values.
The AFR values of redundant components are raised to the power equal to the number of redundant components.
The AFR values of non-redundant components are multiplied by the number of those components in series, which is the same as adding the AFR of each series component.

Calculation
In the case of Architecture 1, the concentrator (C) is the only non-redundant component.
$\mathrm{AFR}_1 = (H + L)^2 + C + L^2 + D^2$
$\mathrm{AFR}_1 = (0.011 + 0.022)^2 + 0.0151 + (0.022)^2 + (0.0088)^2 = 0.0167$
$R_1 = 1 - \mathrm{AFR}_1 = 1 - 0.0167 = 0.9833$, or 98.33%
$\mathrm{MTBF}_1 = 8760 / \mathrm{AFR}_1 = 8760 / 0.0167 = 524{,}551$ hours

Calculation
Architecture 2 has a different configuration with no non-redundant components.
$\mathrm{AFR}_2 = (H + L + C + L)^2 + D^2$
$\mathrm{AFR}_2 = (0.011 + 0.022 + 0.0151 + 0.022)^2 + (0.0088)^2 = 0.0050$
$R_2 = 1 - \mathrm{AFR}_2 = 1 - 0.0050 = 0.9950$, or 99.50%
$\mathrm{MTBF}_2 = 8760 / \mathrm{AFR}_2 = 8760 / 0.0050 = 1{,}752{,}000$ hours

Calculation
Architecture 3 has yet another configuration and has no non-redundant components.
$\mathrm{AFR}_3 = (H + L + C + L + D)^2$
$\mathrm{AFR}_3 = (0.011 + 0.022 + 0.0151 + 0.022 + 0.0088)^2 = 0.0062$
$R_3 = 1 - \mathrm{AFR}_3 = 1 - 0.0062 = 0.9938$, or 99.38%
$\mathrm{MTBF}_3 = 8760 / \mathrm{AFR}_3 = 8760 / 0.0062 = 1{,}412{,}903$ hours

Conclusion
When the calculations are complete, we compare the data:
Architecture 1 = 98.33%, or a system MTBF of 524,551 hours
Architecture 2 = 99.50%, or a system MTBF of 1,752,000 hours
Architecture 3 = 99.38%, or a system MTBF of 1,412,903 hours
The MTBF figures are the most revealing, and indicate that Architecture 2 is statistically the most reliable of all.
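The three system calculations can be repeated in Python using the AFR formulas from the Calculation slides and the rounded AFR values from the table. The names below are assumptions. Because this sketch does not round the system AFR to four decimal places before dividing, its MTBF figures differ slightly from the slide values, but the ranking is the same.

```python
HOURS_PER_YEAR = 8760

# Component annual failure rates from the sample table.
H, L, C, D = 0.011, 0.022, 0.0151, 0.0088


def summarize(afr):
    """Derive reliability and MTBF (hours) from a system annual failure rate."""
    return {
        "afr": afr,
        "reliability": 1.0 - afr,
        "mtbf_hours": HOURS_PER_YEAR / afr,
    }


system_afr = {
    "Architecture 1": (H + L) ** 2 + C + L ** 2 + D ** 2,
    "Architecture 2": (H + L + C + L) ** 2 + D ** 2,
    "Architecture 3": (H + L + C + L + D) ** 2,
}

results = {name: summarize(afr) for name, afr in system_afr.items()}

for name, r in results.items():
    print(f"{name}: AFR = {r['afr']:.4f}, R = {r['reliability']:.2%}, "
          f"MTBF = {r['mtbf_hours']:,.0f} hours")

best = min(results, key=lambda name: results[name]["afr"])
print(f"Lowest annual failure rate: {best}")
```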

Failure Effects (What customer experiences)


Noise
Inoperability
Instability
Intermittent operation
Roughness
Excessive effort requirements
Unpleasant or unusual odor
Poor appearance

Factors Affecting Reliability

Installation & Environmental


Temperature
Humidity
Vibration
Chemical Attack
Interconnections

Design & Manufacture


Pre-Production Design
Control of Production
Working Tolerances
Material Quality
Component Quality
Component Stress

Design against failure

It is important to understand the failure (why, where, how long, application, etc.).
Two methods for design against failure:
1. By reducing the stress that causes the failure.
2. By increasing the strength of the component.
Either one can be achieved by:
Selecting materials
Changing the package geometry
Changing the dimensions
Protection

Fatigue Failure?
Fatigue is the most common mechanism of failure and is responsible for 90% of all structural and electrical failures.
Occurs in metals, polymers, and ceramics.
Metal paper clip example:
Bend in both directions.
Repeat the process.

Design Against Fatigue Failure


Increase fatigue strength.
Reduce the amplitude of cyclic loading.
Avoid stress concentration regions.

Design Against Brittle Fracture


Brittle fracture is an overstress failure mechanism that occurs rapidly, with little or no warning, when the induced stress in the component exceeds the fracture strength of the material.
Occurs in brittle materials (ceramics, glasses, and silicon).
Applied stress and work could break the atomic bonds.

Design Guidelines to Reduce Brittle Fracture


Designs with materials and processing conditions that would produce the least stress in brittle materials should be created.
The brittle material should be polished to remove surface flaws to enhance reliability.

Design Against Creep Failure

What is Creep? A time-dependent deformation process under load.

Thermally activated process: the rate of deformation for a given stress level increases significantly with temperature.
Deformation depends on:
1. The applied load
2. The duration through which the load is applied
3. Elevated temperature

Design Against Creep Failure

Creep can occur at any stress level.
Creep is most important at elevated temperatures.

Design Guidelines to Reduce Creep-Induced Failure


Use materials with a high melting point if the application calls for harsh temperature conditions.
Reduction of mechanical stress will reduce creep deformation.
Creep is a time-controlled phenomenon.

Design Against Plastic Deformation


What is Plastic Deformation? Deformation that occurs when the applied mechanical stress exceeds the elastic limit or yield point of a material. It is permanent.
Excessive deformation and continued accumulation of plastic strain due to cyclic loading will eventually lead to cracking of the component and make it unusable.

Design Guidelines Against Plastic Deformation


Limit the design stresses in the packaging structure below the yield strength of the materials used.
If possible, use materials that have high yield strength.
Design and control the local plastic deformation at regions of stress concentrations.

Chemically Induced Failures


What are Chemically Induced Failures?
Chemical processes such as electrochemical reactions can result in cracking of components, leading to electrical failures.
Two types:
Corrosion
Intermetallic Diffusion

Design Against Corrosion-Induced Failure


What is Chemical Corrosion? The chemical or electrochemical reaction between a material, usually a metal, and its environment that produces a deterioration of the material and its properties.

Design Guidelines to Reduce Corrosion


Metals with a high oxidation potential tend to corrode faster.
Use hermetic packages to prevent moisture absorption.
Ensure there is no trapped moisture or contaminants during the processing and assembly of the packages.

Design Against Intermetallic Diffusion


What is Intermetallic Diffusion?
During wirebonding and solder reflow, the joining process generates intermetallic layers, which are byproducts of the joining process.

Design Guidelines Against Intermetallic Diffusion


Limit the process temperatures and control the time exposed to high temperatures during the joining process.
Control the temperature range and cycles of exposure at the high temperature period.
Apply a nickel/gold coating on the bare copper pad surfaces.

Achieving reliability growth


Detect failure causes
Feedback
Redesign
Improved fabrication
Verification of redesign

