Reliability Analysis For Repairable

This document provides an overview of reliability analysis for repairable systems, focusing on three key areas: life data analysis, recurring data analysis, and system reliability simulation. The course objectives are to provide foundations in analyzing reliability data from repairable assets to quantify failure rates and inform maintenance activities. The introduction defines qualitative and quantitative reliability analysis and their purposes. Life data analysis uses statistical methods to model failure time distributions for non-repairable items. Recurring data analysis and simulation are used to analyze repairable systems.

Uploaded by

Muhammad Ghufran
Copyright
© All Rights Reserved
Reliability Analysis for Repairable System
Life Data Analysis
Recurring Data Analysis
System Reliability Simulation
Course Objectives
• To provide a solid foundation in the methods and analyses related to repairable systems for the asset management professional.
• To introduce the concept of transforming historical data into reliability information that quantifies the failure rate behavior of the assets.
• To learn the statistical approach to operational excellence, where maintenance activities are driven by reliability information.
About this Course
• This course includes the following subjects:
• Life Data Analysis (LDA), including basic concepts and methodologies as they apply to reliability engineering and maintenance.
• Concepts and applications of repairable system analysis utilizing Recurrent Event Data Analysis (RDA) techniques.
• Concepts and applications of repairable system analysis utilizing a simulation approach.
Contents

Introduction
Part 1. Life Data Analysis
Part 2. Recurring Data Analysis
Part 3. Simulation approach for Repairable System
Introduction

• In a broad sense, maintenance is about managing failures, while reliability is about understanding the failure characteristics.
Introduction

• There are two categories of Reliability Engineering analysis: Qualitative and Quantitative.
Qualitative Analysis

• The objectives of Qualitative analysis are:
• Identify the failure modes.
• Understand the failure mode (Physics of Failure) and its impact.
• Determine/formulate strategies to manage the failures.
• In the process industry, RCM and Root Cause Analysis are some of the methodologies used to facilitate Qualitative analysis.
Qualitative Analysis (cont’d)

• We can use Qualitative analysis to identify what the failures are, and which spares and what types of crews are required.
• It tells you WHAT can happen and how to deal with it, but it does not tell you WHEN and how many times it is likely to happen.
Quantitative Analysis

• In Quantitative analysis, we combine the knowledge of Physics-of-Failure and statistics to quantify the failure rate behaviors.
• This method allows us to address the uncertainties (failures) with a scientific and mathematical approach.
The Concerns in Maintenance

• From the shareholder’s point of view:
• How much does it cost to own the assets?
• What is the ROI?
• What is the investment risk?
• (Every question is about money!)
The Concerns in Maintenance (cont’d)

• From management’s point of view:
• What is the expected number of failures in the next time interval?
• How many spares should be ordered for the next time interval?
• How many crews are required?
• When is the best overhaul time?
• How efficient is the production line?
• …
• (Every question is related to money!)
Reliability Analysis

• Companies that invest in reliability know-how understand how unreliability impacts the financial bottom line.
• Knowing how to quantify their asset reliability performance allows them to…
• Identify the gaps for improvement
• Optimize maintenance policies
• Optimize spare policy
Part 1:
Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Life Data Analysis (What)

• Using a statistical approach to quantify the life of a product.
• Statistically, it describes the Time-to-Event (failure) distribution of a population (of a product).
Definition

• In the context of Data Analysis, Reliability is defined as:
• The probability that an item will perform its intended function for a designated period of time without “failure” under specified conditions.
• Example: R(0)=1, R(t1)=0.95, R(t2)=0.7, R(t3)=0.5, R(t4)=0.01


Life Data Analysis (What)
• Life Data Analysis (Distribution Analysis) deals with a population of identical items that can fail only once.
• The times-to-failure of individual items are independent of each other.
• In statistics, this is called Independent and Identically Distributed (IID).
• In reliability applications, we call this a Non-Repairable Item.
• In fact, as long as the data set is IID, Life Data Analysis is applicable. Examples:
• Time-to-repair of equipment
• Weight of a male/female population
• Strength of materials…
Life Data Analysis (What)

• We need to distinguish between repairable and non-repairable items (systems).
• Life Data Analysis is not applicable to repairable items (systems).
Life Data Analysis (Who and why)

• Manufacturer
• Customer expectations/satisfactions
• Competitions (Market share/pricing)
• Warranty cost
• Demonstration of product performance
• Maintenance Organization (Asset Owner)
• Spare management
• Optimum replacement interval
• Quantify failure rate behaviors and downtime
distributions for RAM analysis
Life Data Analysis (Why)

• From a maintenance perspective, the Lowest Actionable Items of a repairable system can be non-repairable and/or repairable items.
• We can model the system reliability in terms of Lowest Actionable Items.
• LDA is applicable to non-repairable items.
Life Data Analysis (How)

• Data
• Time to failure data
• Understand the data type
• The most time consuming task of analysis process
• Software tools
• Weibull Toolbox (AssetStudio)
• Facilitate data classification and data entry
• Visualize the results
• Focus on your engineering problems
• Other commercial software also available
• Free statistical tools: R, Python (coding required)
Life Data Analysis (How)

• Know-how
• Data classifications, validations
• Understand the underlying statistical concepts
• Understand when it is not appropriate to use LDA and
what are the alternatives
• Interpret results
Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Statistical Background

Random Variable
PDF, CDF
Reliability, Unreliability
Failure Rate
Conditional Reliability
Mean and Median
Random Variables
• For Life Data Analysis, we focus on the Time-to-Failure (TTF) distribution. The unit of the data set is time. The random variable is time-to-failure.
• Suppose you want to perform a distribution analysis on the weights of boys in your school, and sample weight data are collected. In this case weight is the random variable.
• Our random variables (TTF) are real numbers in the positive domain.
• TTF is classified as a Continuous Random Variable.
Discrete Random Variable

• In contrast to Continuous Random Variables are Discrete Random Variables.
• Toss a die N times, and we have N data points.
• The outcome of each toss is probabilistic, but constrained to one of the values {1, 2, 3, 4, 5, 6}.
• In Life Data Analysis, we deal with Continuous Random Variables only.
Probability Density Function (PDF)
• Assume the following are times-to-failure (days) for a population of 1000 bearings.

902 234 489 511 748 443 567 353 494 1170
1130 252 175 241 591 366 484 262 521 644
843 632 184 494 322 774 587 896 310 683
642 291 871 574 233 543 809 425 265 949
177 717 699 372 742 484 618 715 576 1020
577 490 360 394 745 341 649 922 453 1002
539 436 456 183 635 500 379 207 551 757
715 913 592 620 336 348 247 422 872 837
245 595 656 987 549 594 534 280 727 395
212 401 965 359 316 356 499 638 726 429
… … … … … … … … … …
… … … … … … … … … …
Probability Density Function (PDF)

• Create a histogram with interval = 200 days

Interval (days)   Failures   Cumulative failures
0-200             67         67
200-400           222        289
400-600           329        618
600-800           258        876
800-1000          93         969
1000-1200         25         994
1200-1400         5          999
1400-1600         1          1000
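The step from histogram to pdf and cdf can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-interval counts are recovered by differencing the cumulative row of the slide, and the variable names (`cumulative`, `pdf_estimate`, `cdf_estimate`) are hypothetical, not part of the course material.

```python
# Cumulative failure counts at the end of each 200-day interval (from the slide).
cumulative = [67, 289, 618, 876, 969, 994, 999, 1000]
width = 200                # interval width in days
n_total = cumulative[-1]   # population size (1000 bearings)

# Failures per interval = difference between consecutive cumulative counts.
counts = [c - p for c, p in zip(cumulative, [0] + cumulative[:-1])]

# Empirical pdf (fraction failing per day within each interval) and
# empirical cdf at the right edge of each interval.
pdf_estimate = [c / (n_total * width) for c in counts]
cdf_estimate = [c / n_total for c in cumulative]

for i, (f, F) in enumerate(zip(pdf_estimate, cdf_estimate)):
    print(f"{i * width:>4}-{(i + 1) * width:<4}  f = {f:.6f}  F = {F:.3f}")
```

Shrinking the interval (for example to 20 days) makes the bar heights approach a smooth curve, which is the pdf.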


Probability Density Function (PDF)

• Histogram with interval = 20 days


Probability Density Function (PDF)

• We denote pdf as f(x)



Cumulative Distribution Function, F(t)

• The cdf is a function F(t) of a random variable t, defined by:

$$F(t_0) = P(t \le t_0)$$

where $F(t_0)$ is the probability that the observed value t will be less than or equal to $t_0$.
Cumulative Distribution Function, F(t)

• Given a pdf f(t), the corresponding cdf F(t) is:

$$F(t) = \int_0^{t} f(s)\,ds$$

• Given a cdf F(t), the corresponding pdf f(t) is:

$$f(t) = \frac{dF(t)}{dt}$$


Unreliability Function

• The cdf is also known as the Unreliability, Q(t): $Q(t) = F(t)$.
• Q(t) is the probability of failure by time t.
Reliability Function

• R(t) is the probability of success (no failure) up to time t: $R(t) = 1 - Q(t)$.


Failure Rate Function, λ(t)

• The failure rate function gives the fraction of the surviving units that fail per unit time:

$$\lambda(t) = \frac{f(t)}{R(t)}, \qquad R(t) = \frac{N(t)}{N_0}$$

where N(t) is the number of survivors at time t and $N_0$ is the initial population size.
Conditional Reliability Function, R(t/T)

• Conditional reliability is the probability of a unit surviving a mission of duration t, given that it has already accumulated an age of T:

$$R(t/T) = \frac{R(T+t)}{R(T)}$$
Mean Life

• Mean time to failure (MTTF):

$$\text{MTTF} = \int_0^{\infty} t\,f(t)\,dt$$

[Figure: mean life marked on the pdfs of Weibull(5, 327), Normal(300, 100) and Exponential(λ=1/300)]


Median Life
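• The time ($t_{0.5}$) by which 50% of the population fails: $Q(t_{0.5}) = 0.5$.
• For the exponential distribution, $t_{0.5} = \ln(2)/\lambda \approx 0.693/\lambda$.

[Figure: median life marked on the pdfs of Weibull(5, 327), Normal(300, 100) and Exponential(λ=1/300)]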


B(X) Life
• B(X) life: The time by which X% of the population fails.
• B(1) life: The time by which 1% of the population fails.
• Median Life = B(50)
• For the exponential distribution, $B(X) = -\ln(1 - X/100)/\lambda$.


Common Metrics

• Reliability, R(t): Probability that a failure will not be observed by time t.
• Unreliability, Q(t): Probability that a failure will be observed by time t.
• BX Life: Time at which unreliability is equal to X% (e.g., B10: time by which 10% will fail).
• Mean Life (MTTF): Average time-to-failure.
• Median Life: Time by which 50% are expected to fail.
• Conditional Reliability, R(t/T): Probability of a unit surviving a mission of duration t, given that it has already accumulated an age of T.
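As an illustration of the metrics above, the following minimal Python sketch evaluates them for a two-parameter Weibull distribution. The function names are hypothetical, and the parameter pair (β = 5, η = 327) is simply the Weibull(5, 327) example that appears in the mean-life slide; the closed-form expressions are the standard Weibull ones.

```python
import math

def reliability(t, beta, eta):
    """R(t): probability that no failure is observed by time t."""
    return math.exp(-((t / eta) ** beta))

def unreliability(t, beta, eta):
    """Q(t) = 1 - R(t): probability that a failure is observed by time t."""
    return 1.0 - reliability(t, beta, eta)

def bx_life(x_percent, beta, eta):
    """Time by which X% of the population is expected to fail."""
    return eta * (-math.log(1.0 - x_percent / 100.0)) ** (1.0 / beta)

def mean_life(beta, eta):
    """MTTF of a Weibull distribution: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

def conditional_reliability(t, age, beta, eta):
    """R(t/T): survive a further mission t given survival to age T."""
    return reliability(age + t, beta, eta) / reliability(age, beta, eta)

beta, eta = 5.0, 327.0
print("MTTF       :", mean_life(beta, eta))
print("Median life:", bx_life(50, beta, eta))
print("B10 life   :", bx_life(10, beta, eta))
print("R(50/200)  :", conditional_reliability(50, 200, beta, eta))
```

For a wear-out distribution such as this one (β > 1), the conditional reliability of an aged unit is lower than the reliability of a new unit over the same mission time.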
Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Distribution models

• In the field of Reliability Engineering, usually only a few failure data points are observed.
• It is not possible to draw a histogram.
• Life Data Analysis is a technique used to derive the PDF given these constraints.
• I want the PDF, but I have only a few data points!
• To facilitate the analysis, mathematicians have predefined some PDF functions…
Commonly Used Distributions
• The common Life Distributions for Reliability Engineering:
• Normal (μ,σ)

• Lognormal (μ’,σ’)

• Exponential (λ)

• Weibull (β,η)

• Other models are also available.


Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Distribution Parameter Estimation

Rank Regression Method


Maximum Likelihood Estimation
Parameter Estimation
• If you have many data points (say > 100), you can draw the histogram to observe the distribution profile.
• But what if you have only a few, say 4 data points?
• To make it mathematically feasible to describe the distribution of a small data set, we assume a distribution model, say Lognormal(μ’, σ’), then estimate the model parameters μ’ and σ’.
• We could perform the same analysis across different distribution models, and determine which model best describes the data set.
Parameter Estimation

• The objective is:
• Given that we have some observed data, determine the parameter values so that the model best describes the distribution of these data.
• The subsequent sections describe 2 methods that can achieve the objective:
• Rank Regression
• Maximum Likelihood Estimation (MLE)
Parameter Estimation: Rank
Regression
• Assume we want to fit the following data set to a Weibull distribution.

• 15, 20, 30, 40 (hours)

• Objective:
• Solve for the β and η values that best describe the distribution of the observed data.
Rank Regression
• Associate each observed data point with a probability value (probability of failure), z.

j   tj   zj
1   15   z1
2   20   z2
3   30   z3
4   40   z4

where
j is the rank position,
tj is the Time-to-Failure (TTF), and
zj is the rank value (probability of failure).

Q(0)=0, Q(15)=z1, Q(20)=z2, Q(30)=z3, Q(40)=z4


Median Rank

• zj is the probability of failure at tj, such that there is a 50% chance of observing at least j failures for a sample size of N.

j   tj   zj
1   15   z1
2   20   z2
3   30   z3
4   40   z4

• z1 is the probability of failure at 15 hours, such that there is a 50% chance of observing at least 1 failure for a sample size of 4.
Median Rank

• $z_j$ is calculated using the cumulative binomial equation:

$$0.5 = \sum_{k=j}^{N} \binom{N}{k} z_j^{k} (1 - z_j)^{N-k}$$

• Solution for $z_j$ requires numerical methods.
• See appendix for more details.
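One possible numerical method is plain bisection on the cumulative binomial sum, sketched below with the Python standard library only. The function names are hypothetical; the slides do not say which solver the course software uses.

```python
from math import comb

def cumulative_binomial(z, j, n):
    """Probability of observing at least j failures out of n when each
    item fails independently with probability z."""
    return sum(comb(n, k) * z**k * (1 - z) ** (n - k) for k in range(j, n + 1))

def median_rank(j, n, tol=1e-12):
    """Solve cumulative_binomial(z, j, n) = 0.5 for z by bisection.
    The left-hand side increases monotonically with z on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cumulative_binomial(mid, j, n) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Median ranks for a sample of 4 failures (15, 20, 30, 40 hours).
ranks = [median_rank(j, 4) for j in range(1, 5)]
print([round(z, 3) for z in ranks])
```

For j = 1 the equation reduces to 1 − (1 − z)^N = 0.5, which gives a quick closed-form check of the solver.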
Median Rank
• Solving for z1, z2, z3 and z4:

j   tj   zj
1   15   0.159
2   20   0.386
3   30   0.614
4   40   0.841

Q(0)=0, Q(15)=0.159, Q(20)=0.386, Q(30)=0.614, Q(40)=0.841


Probability plot

• The unreliability function is not a linear function of time.

[Figure: unreliability (0 to 1) vs time (0 to 50 hours) for the four data points]
Probability-Weibull plot

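• Perform axis transformation on the Weibull cdf $Q(t) = 1 - e^{-(t/\eta)^{\beta}}$:

$$\ln\ln\frac{1}{1 - Q(t)} = \beta \ln t - \beta \ln \eta$$

which has the straight-line form $Y = m x + c$ with $Y = \ln\ln\frac{1}{1-Q(t)}$, $x = \ln t$, $m = \beta$ and $c = -\beta \ln \eta$.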
Probability-Weibull plot

j   tj   zj
1   15   0.159
2   20   0.386
3   30   0.614
4   40   0.841

[Figure: Probability-Weibull plot of the four data points, unreliability (0.10 to 0.90) vs time on a logarithmic axis (10 to 100)]
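Probability-Weibull plot, β value

• β is the slope of the fitted line. Reading two points off the line, (t = 20, Q = 0.32) and (t = 40, Q = 0.86):

$$\Delta Y = \ln\ln\frac{1}{1 - 0.86} - \ln\ln\frac{1}{1 - 0.32} = 1.629$$

$$\Delta X = \ln 40 - \ln 20 = 0.693$$

$$\beta = \frac{\Delta Y}{\Delta X} \approx 2.35$$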
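Probability-Weibull plot, η value
• One could determine the Y-intercept and work out η.
• An easier way is to evaluate Q(η) from the cdf:

$$Q(\eta) = 1 - e^{-(\eta/\eta)^{\beta}} = 1 - e^{-1} = 0.632$$

• η is therefore the time at which the fitted line crosses Q = 63.2%.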
Parameter Estimation
• Similarly, you could choose to fit other distributions with the same data set.

[Figure: probability plots of the same data set under the Weibull, Exponential, Normal and Lognormal models]
Least Square Regression

• In the probability plotting method, the data set should form a straight line if it obeys the distribution model.
• A straight line is fitted to a set of data points such that the sum of the squares of the deviations from the points to the line is minimized.
Least Square Regression

• RRX: minimize error in x-direction
• RRY: minimize error in y-direction
Least Square Regression

• RRX and RRY produce different lines for the same data set.
• The median ranks (y-axis) are calculated based on sample size, and are therefore deterministic, while the times (x-axis) are random.
• For a probability plot of a data set, RRX is preferred as it minimizes the error along the x-axis.
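The rank regression procedure for the Weibull example can be sketched as follows. This is a minimal illustration, not the course software: it uses the rounded median ranks from the slides, applies the Weibull axis transformation x = ln t, y = ln ln(1/(1 − z)), and regresses x on y (RRX). The helper name `rank_regression_x` is hypothetical.

```python
import math

times = [15.0, 20.0, 30.0, 40.0]      # hours
ranks = [0.159, 0.386, 0.614, 0.841]  # median ranks from the slides

# Weibull axis transformation.
x = [math.log(t) for t in times]
y = [math.log(-math.log(1.0 - z)) for z in ranks]

def rank_regression_x(x, y):
    """Least squares fit of x = a + b*y (errors minimized along x).
    Returns the Weibull beta, eta and the correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / syy
    a = mx - b * my
    beta = 1.0 / b        # slope of y versus x
    eta = math.exp(a)     # y = 0 corresponds to Q = 63.2%, i.e. t = eta
    rho = sxy / math.sqrt(sxx * syy)
    return beta, eta, rho

beta, eta, rho = rank_regression_x(x, y)
print(f"beta = {beta:.2f}, eta = {eta:.1f} hours, rho = {rho:.3f}")
```

Swapping the roles of x and y in the fit gives the RRY estimate, which generally differs slightly from the RRX one.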
Correlation Coefficient, ρ
• The correlation coefficient, denoted by ρ, measures how well the data form a straight line.
• The range is between -1 and +1.
• -1 is a perfect fit with negative slope.
• +1 is a perfect fit with positive slope.
Correlation Coefficient, ρ

• The correlation coefficient is used to assess which distribution provides a better fit for a given data set.

[Figure: Probability-Lognormal and Probability-Weibull plots of the same data, Unreliability F(t) vs Time (t); ρ = 0.991 for Lognormal, ρ = 0.954 for Weibull]
Parameter Estimation: MLE

• Maximum Likelihood Estimation (MLE) is a method of estimating the parameters by maximizing a likelihood function, so that under the assumed statistical model the observed data are most probable.
• What is the likelihood function?


Parameter Estimation: MLE
• Assume we want to fit the following data set to a Weibull distribution.

• 15, 20, 30, 40 (hours)

• Objective:
• Solve for the β and η values that best describe the distribution of the observed data.
MLE Concept

• Construct the likelihood equation, L.
• Solve for β and η such that L is maximum.


MLE solution
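• The likelihood function is given by:

$$L(\beta, \eta) = \prod_{i=1}^{n} f(t_i; \beta, \eta)$$

• The logarithmic likelihood function is given by:

$$\ln L = \sum_{i=1}^{n} \ln f(t_i; \beta, \eta)$$

• The maximum likelihood estimators (MLE) of β and η are obtained by maximizing L or ln L.
MLE solution
• The derivatives of ln L are much easier to obtain than those of L.
• Solve for β and η:

$$\frac{\partial \ln L}{\partial \beta} = 0, \qquad \frac{\partial \ln L}{\partial \eta} = 0$$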


MLE solution, Likelihood Value
• The MLE solution for the data set (15, 20, 30, 40) with the Weibull distribution is (β = 3.03, η = 29.5).
• Substitute these parameters back into the log-likelihood function: $\ln L = -14.617$.
• This is called the Log-Likelihood value.


MLE solution, Likelihood Value
• Similarly, you could apply the method to fit the same data
set to other distributions and solve for the corresponding
parameters.
• Weibull(3.03, 29.5), LK Value: -14.617
• Normal(26.25, 11.1), LK Value: -14.80
• Lognormal(3.20, 0.433), LK Value: -14.624
• Exponential(0.0381), LK Value: -17.07

• The Weibull distribution produces the largest LK Value for this data set, i.e. Weibull is more likely than the others.
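The Weibull MLE for this complete data set can be reproduced with a short script. The sketch below is one way to do it, assuming the standard reduction of the two Weibull likelihood equations to a single equation in β (solved here by bisection); the function names are hypothetical and the course software may use a different solver.

```python
import math

def weibull_mle(times, lo=0.05, hi=50.0, tol=1e-10):
    """MLE of Weibull (beta, eta) for complete (uncensored) data.
    Setting the partial derivatives of ln L to zero gives
        sum(t^b ln t) / sum(t^b) - 1/b - mean(ln t) = 0
    which is increasing in b and is solved by bisection."""
    n = len(times)
    mean_log = sum(math.log(t) for t in times) / n

    def g(b):
        s0 = sum(t**b for t in times)
        s1 = sum(t**b * math.log(t) for t in times)
        return s1 / s0 - 1.0 / b - mean_log

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2.0
    eta = (sum(t**beta for t in times) / n) ** (1.0 / beta)
    return beta, eta

def weibull_log_likelihood(times, beta, eta):
    """ln L = sum of ln f(t_i) for the Weibull pdf."""
    return sum(
        math.log(beta / eta) + (beta - 1.0) * math.log(t / eta) - (t / eta) ** beta
        for t in times
    )

data = [15.0, 20.0, 30.0, 40.0]  # hours
beta, eta = weibull_mle(data)
lk_value = weibull_log_likelihood(data, beta, eta)
print(f"beta = {beta:.2f}, eta = {eta:.1f}, LK value = {lk_value:.3f}")
```

The same log-likelihood comparison can be repeated for other distributions by swapping in their pdf and parameter estimates.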
Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Data Type for Distribution analysis

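• Exact Time-to-Failure: the TTF is known.
• Right Censored (Suspension): the unit had not failed when observation stopped.
• Interval Censored: the unit failed within a known interval.
• Left Censored: the unit failed within the interval between time zero and the first inspection.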
Data Type
• A Complete Data Set refers to a data set that contains only exact time-to-failure data.
• A data set may contain a combination of different data types => Censored data set.
• Censored data are important and should not be omitted.
• They should be classified correctly.


Censored Data

• Consider the failure data:


970 463 266 353 1097 148 741 373 59 35
484 849 369 194 257 350 487 308 245 665
377 671 373 125 502 620 488 162 543 551
819 235 230 398 317 522 721 893 384 823
60 534 500 230 434 196 551 158 279 457
654 475 301 250 282 695 285 43 372 200
248 775 331 429 273 540 490 231 766 655
393 126 234 135 218 535 541 1260 336 117
231 185 522 315 668 589 282 338 853 299
385 259 300 456 103 669 222 307 906 227
Censored Data (cont’d)

• Assume you perform preventive maintenance after the first 150 days. Only the failures below 150 days would have been observed; all other items would be suspended at 150 days:

970 463 266 353 1097 148 741 373 59 35


484 849 369 194 257 350 487 308 245 665
377 671 373 125 502 620 488 162 543 551
819 235 230 398 317 522 721 893 384 823
60 534 500 230 434 196 551 158 279 457
654 475 301 250 282 695 285 43 372 200
248 775 331 429 273 540 490 231 766 655
393 126 234 135 218 535 541 1260 336 117
231 185 522 315 668 589 282 338 853 299
385 259 300 456 103 669 222 307 906 227
Censored Data
• If you consider only the failure data that you have observed (35, 43, 59, 60, 103, 117, 125, 126, 135, 148) for your analysis… (you ignore the 90 items that were still working after 150 days)

State   Time (days)
F       35
F       43
F       59
F       60
F       103
F       117
F       125
F       126
F       135
F       148

Censored Data
• If you consider the 90 suspension data…

Number   State   Time (days)
1        F       35
1        F       43
1        F       59
1        F       60
1        F       103
1        F       117
1        F       125
1        F       126
1        F       135
1        F       148
90       S       150
Parameter Estimation with Censored
Data
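• Using rank regression analysis, the rank position of each failure has to be adjusted to accommodate suspension data (Leonard Johnson’s approach).
• In the case of MLE, the complete likelihood function considers the censoring times.

Complete likelihood function with Weibull Distribution (right-censored case), for F exact failure times $t_i$ and S suspension times $s_j$:

$$L(\beta, \eta) = \prod_{i=1}^{F} \frac{\beta}{\eta}\left(\frac{t_i}{\eta}\right)^{\beta-1} e^{-(t_i/\eta)^{\beta}} \cdot \prod_{j=1}^{S} e^{-(s_j/\eta)^{\beta}}$$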
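The effect of ignoring suspensions can be illustrated with a small script. This is a sketch under stated assumptions: it handles exact failures and right-censored suspensions only, solves the Weibull likelihood equations by bisection on β, and uses the 10 failures and 90 suspensions at 150 days from the preceding slides. The function name `weibull_mle_censored` is hypothetical.

```python
import math

def weibull_mle_censored(failures, suspensions=(), lo=0.05, hi=30.0, tol=1e-10):
    """Weibull MLE with exact failures and right-censored suspensions.
    The likelihood equations reduce to
        sum_all(t^b ln t) / sum_all(t^b) - 1/b - mean_failures(ln t) = 0
    and eta^b = sum_all(t^b) / (number of failures)."""
    all_times = list(failures) + list(suspensions)
    r = len(failures)
    mean_log_f = sum(math.log(t) for t in failures) / r

    def g(b):
        s0 = sum(t**b for t in all_times)
        s1 = sum(t**b * math.log(t) for t in all_times)
        return s1 / s0 - 1.0 / b - mean_log_f

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2.0
    eta = (sum(t**beta for t in all_times) / r) ** (1.0 / beta)
    return beta, eta

failures = [35, 43, 59, 60, 103, 117, 125, 126, 135, 148]  # days
suspensions = [150] * 90                                   # still working at 150 days

beta_f, eta_f = weibull_mle_censored(failures)               # suspensions ignored
beta_c, eta_c = weibull_mle_censored(failures, suspensions)  # suspensions included
print(f"Failures only   : beta = {beta_f:.2f}, eta = {eta_f:.0f} days")
print(f"With suspensions: beta = {beta_c:.2f}, eta = {eta_c:.0f} days")
```

Dropping the 90 suspensions forces the fitted characteristic life below the largest observed failure time, which badly underestimates the life of a population in which 90% of the items survived 150 days.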


Analysis Methods (Rule of Thumb)

• Use MLE if the data set contains censored data (especially heavily censored data)
• The MLE method considers the censoring times of the data
• RRX is less robust in handling censored data
• Use RRX if the sample size is small and uncensored
• The bias of MLE is more pronounced for small sample sizes.
Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Confidence Bounds
• Assume you received the following data from supplier A and B respectively:

Supplier A: 25, 35, 40, 53, 60
Supplier B: 18, 19, 26, 26, 31, 33, 33, 34, 39, 40, 42, 45, 47, 48, 55, 58, 58, 59, 67, 74
Supplier A and B…
• Both suppliers tell you that their products have a B(5) life of 18 hours.
• Which supplier do you prefer?

[Figure: two Probability-Weibull plots, Unreliability F(t) vs Time (t), one per supplier; annotated B(5) = 17.9 Hrs and B(5) = 17.7 Hrs]
Confidence Bounds
B(5) life @ 90% Confidence level
[Figure: probability density function of the B(5) estimate, with the lower 10% tail marked]
[Figure: Probability-Weibull plot, Unreliability F(t) vs Time (t), with confidence bound; B(5) @ 50% CL = 17.9, B(5) @ 90% CL = 10.6]
B(5) life @ 90% Confidence level
[Figure: Probability-Weibull plot, Unreliability F(t) vs Time (t); B(5) values marked at 10.6 and 13.5]
Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Life Distributions

Exponential Distribution
Weibull Distribution
Normal Distribution
Lognormal Distribution
Exponential Distribution
• The exponential pdf is given by:

$$f(t) = \lambda e^{-\lambda t}$$

where:
• $\lambda$ = failure rate
• $m$ = mean time to failure ($m = 1/\lambda$)

• The exponential distribution is a commonly used distribution in reliability engineering.
• Due to its simplicity, it has been widely employed even in cases where it does not apply.
Exponential Distribution

• Unreliability function: $Q(t) = 1 - e^{-\lambda t}$

• Reliability function: $R(t) = e^{-\lambda t}$
Exponential Distribution
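• Recall the failure rate function, $\lambda(t) = f(t)/R(t)$.

• For the exponential distribution, $f(t) = \lambda e^{-\lambda t}$ and $R(t) = e^{-\lambda t}$.

• Hence the failure rate for the exponential distribution is constant: $\lambda(t) = \lambda$.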


Exponential Distribution
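• Mean time to failure (MTTF) is

$$\text{MTTF} = \int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}$$

• Substitute $t = \text{MTTF} = 1/\lambda$ into Q(t).

• What is the unreliability at $t = \text{MTTF}$? $Q(\text{MTTF}) = 1 - e^{-1} = 0.632$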


Exponential Distribution
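• The median, $t_{0.5}$, is the time by which 50% of the population fails,

i.e. $R(t_{0.5}) = e^{-\lambda t_{0.5}} = 0.5$, so $t_{0.5} = \ln(2)/\lambda \approx 0.693/\lambda$.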
Exponential Distribution

• The Exponential Distribution is a good choice…
• When the failure mode is external in nature. Example: tire failures due to puncture by nails.
• The failure-interval-times of a repairable system tend to follow the Exponential Distribution… but this is most likely a wrong analysis!
Weibull Distribution
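• The Weibull pdf is given by:

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-(t/\eta)^{\beta}}$$

where:
• $\beta$ is the slope (shape) parameter
• $\eta$ is the scale parameter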
Weibull Distribution

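• Reliability Function: $R(t) = e^{-(t/\eta)^{\beta}}$

• Unreliability Function: $Q(t) = 1 - e^{-(t/\eta)^{\beta}}$

• Failure Rate: $\lambda(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}$

• Mean: $\eta\,\Gamma\!\left(1 + \frac{1}{\beta}\right)$

• Median: $\eta\,(\ln 2)^{1/\beta}$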
Weibull Distribution

• The Weibull Distribution is a common choice in Life Data Analysis as its pdf can take different shapes, and consequently can approximate other distributions.
• This distribution can be used to model decreasing failure rate (infant mortality), constant failure rate, and increasing failure rate (wear-out).
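The three failure rate regimes can be checked numerically from the standard Weibull failure rate expression λ(t) = (β/η)(t/η)^(β−1). The sketch below is illustrative only; the function name and the parameter values (η = 100, β = 0.5, 1 and 3) are arbitrary assumptions.

```python
def weibull_failure_rate(t, beta, eta):
    """Weibull failure rate: (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

eta = 100.0
for beta, label in [(0.5, "infant mortality"), (1.0, "constant"), (3.0, "wear-out")]:
    early = weibull_failure_rate(10.0, beta, eta)
    late = weibull_failure_rate(100.0, beta, eta)
    print(f"beta = {beta}: rate at t=10 is {early:.5f}, at t=100 is {late:.5f} ({label})")
```

With β = 1 the expression collapses to 1/η, which is the exponential distribution with λ = 1/η.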
Weibull Distribution

• Probability-Weibull plot with changing β.
[Figure: Probability - Weibull plot]
Weibull Distribution

• Weibull pdf plot with changing β.
[Figure: Probability Density Function]
Weibull Distribution

• Weibull failure rate plot with changing β.
[Figure: Failure Rate vs Time Plot]
Weibull Distribution

• Unreliability-Weibull plot with changing β.
[Figure: Unreliability vs Time Plot]
Weibull Distribution

• Probability-Weibull plot with changing eta, η.
[Figure: Probability - Weibull plot]
Weibull Distribution

• Weibull pdf plot with changing η.
[Figure: Probability Density Function]
Weibull Distribution

• Weibull failure rate plot with changing η.
[Figure: Failure Rate vs Time Plot]
Weibull Distribution

• Unreliability-Weibull plot with changing η.
[Figure: Unreliability vs Time Plot]
Normal Distribution
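• The Normal pdf is given by:

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}$$

where
• $\mu$ is the mean
• $\sigma$ is the standard deviation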
Normal Distribution

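• Reliability Function: $R(t) = \int_t^{\infty} f(s)\,ds = 1 - \Phi\!\left(\frac{t-\mu}{\sigma}\right)$, where $\Phi$ is the standard normal cdf

• Unreliability Function: $Q(t) = \Phi\!\left(\frac{t-\mu}{\sigma}\right)$

• Failure Rate: $\lambda(t) = f(t)/R(t)$

• Mean: $\mu$

• Median: $\mu$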
Normal Distribution
• The normal distribution is useful in statistics because of the central limit theorem:
• The averages of samples of observations of random variables become normally distributed when the number of observations is sufficiently large.
• In Life Data Analysis, times-to-failure are always positive. Users should avoid the Normal distribution when a significant part of it spreads into the negative range.
[Figure: Probability Density Function, f(t) vs Time (t)]
Lognormal Distribution
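• The Lognormal pdf is given by:

$$f(t) = \frac{1}{t\,\sigma'\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln t-\mu'}{\sigma'}\right)^{2}}$$

where
• $\mu'$ is the mean of the natural logarithms of t
• $\sigma'$ is the standard deviation of the natural logarithms of t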
Lognormal Distribution

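• Consider a data set {$t_1, t_2, \ldots, t_n$}.
• Take the natural logarithm of the data set: {$x_1, x_2, \ldots, x_n$} where $x_i = \ln(t_i)$.
• If the data set {$x_1, x_2, \ldots, x_n$} follows a normal distribution, then {$t_1, t_2, \ldots, t_n$} follows a Lognormal distribution.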
Lognormal Distribution

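• Reliability Function: $R(t) = 1 - \Phi\!\left(\frac{\ln t-\mu'}{\sigma'}\right)$, where $\Phi$ is the standard normal cdf

• Failure Rate: $\lambda(t) = f(t)/R(t)$

• Mean: $e^{\mu' + \sigma'^{2}/2}$

• Median: $e^{\mu'}$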
Lognormal Distribution

• The lognormal distribution is often used to model times to repair a maintainable system.
• Many phenomena follow the Lognormal Distribution:
• Strength of materials
• Measures of size of living tissue (length, skin area, weight)
• In neuroscience, the distribution of firing rates across a population of neurons
• …
Lognormal Distribution
• Pdf with varying Log-mean, μ’.
• Log-std, σ’ = 0.2
[Figure: Probability Density Function, f(t) vs Time (t)]
Lognormal Distribution
• Pdf with varying Log-std, σ’.
• Log-mean, μ’ = 1
[Figure: Probability Density Function, f(t) vs Time (t)]
Life Data Analysis

Introduction
Statistic Background
Distribution Models
Distribution Parameter Estimation
Censored Data
Confidence Bounds
Life Distributions
Summary
Summary

• Life Data Analysis (Distribution Analysis) deals with (a population of) items that can fail only once (statistical term: Independent and Identically Distributed, IID).

[Figure: perform LDA on the sampled population to deduce the failure rate behavior of the original population]

• Describe the failure rate behavior of an IID population.


Summary

• For an IID population (non-repairable items), common metrics that describe its life are:
• Reliability, R(t): Probability that a failure will not be observed by time t.
• Unreliability, Q(t): Probability that a failure will be observed by time t.
• BX Life: Time by which X% are expected to fail.
• Mean Life (MTTF): Average time to failure.
• Median Life: Time by which 50% are expected to fail.
• Conditional Reliability R(t/T): Probability that a failure will not be
observed by an additional time t, given that the item operated
successfully for time T.
Part 2: Recurring Data Analysis

Introduction
NHPP with Power Law
Optimum overhaul
Summary
Introduction
• Life Data Analysis deals with (a population of) units that each experience only one failure. Each sample has one observed value, either its age at “failure" or its current age while “non-failed."
• In this section, we explore analyses that involve recurring data, where a sample unit may accumulate multiple events over time.
• Examples include number of repairs on a product, number and treatment of recurrent disease episodes in patients (e.g., bladder tumors), childbirths, divorces, etc.
• Such repeated events are referred to as “recurring data.”
Recurring Failure Data

• The following are the failure records for 3 pumps working under similar stress conditions.
• Pump 1 has been operating for 3 years (1095 days), while pumps 2 and 3 have been operating for 2 years (730 days).

TTE = Time-to-Event (days)

Pump 1             Pump 2             Pump 3
Parts     TTE      Parts     TTE      Parts     TTE
LSB       281      SSL       190      SBV       252
ASA       421      LSB       450      LSB       350
SSL       550      RTR       511      ASA       684
SBV       556      IPL       622      End Time  730
LSB       800      End Time  730
SWA       904
IPL       955
RTR       960
SBV       1010
End Time  1095

Legend:
LSB  Line Shaft Bearing
ASA  Arm & Seal Assembly
RTR  Rotor
IPL  Impeller
SBV  Suction Bell Vanes
SSL  Shaft Seal
SWA  Switch Assembly
Recurrent Events/Failures

Pump 1

Pump 2

Pump 3
Common Mistake

• Take the time-between-failures for each system and fit a distribution (wrong analysis!):

Each pump lists: cumulative time (days), time between failures (days), status (F = failure, S = suspension).

Pump 1             Pump 2             Pump 3
281   281  F       190  190  F        252  252  F
421   140  F       450  260  F        350   98  F
550   129  F       511   61  F        684  334  F
556     6  F       622  111  F        730   46  S
800   244  F       730  108  S
904   104  F
955    51  F
960     5  F  (960 - 955)
1010   50  F
1095   85  S
Common Mistake

• Fit the data set to Weibull distribution.

Time-to-Event Status(F/S) Comment


281 F Pump1
140 F Pump1
129 F Pump1
6 F Pump1
244 F Pump1
104 F Pump1
51 F Pump1
5 F Pump1
50 F Pump1
85 S Pump1
190 F Pump2
260 F Pump2
61 F Pump2
111 F Pump2
108 S Pump2
252 F Pump3
98 F Pump3
334 F Pump3
46 S Pump3
Common Mistake

• Beta = 1, and Eta = 175 days
• What is the probability of failure at 100 days, Q(100)?
• Q(100) = 0.43

Compare with the failure records:

Pump 1             Pump 2             Pump 3
Parts     TTE      Parts     TTE      Parts     TTE
LSB       281      SSL       190      SBV       252
ASA       421      LSB       450      LSB       350
SSL       550      RTR       511      ASA       684
SBV       556      IPL       622      End Time  730
LSB       800      End Time  730
SWA       904
IPL       955
RTR       960
SBV       1010
End Time  1095
Correct analysis

• The time-to-First-Failure of the pump is IID.

Time-To-Failure Comment
281 Pump1
190 Pump2
252 Pump3
Correct analysis
• Beta = 4.84, and Eta = 261 days
• What is the probability of failure before 100 days, Q(100 days)?
• R(100 days) = 0.9905, so Q(100 days) = 1 − 0.9905 = 0.0095
• Bounds on R(100 days): Upper Bound 0.9998, Lower Bound 0.5965
• What is the probability of failure before 730 days (2 years)?
• Q(730 days) = 1
Correct analysis

The issues with this analysis are:
• Only the time-to-first-failure of each system is utilized.
• The confidence interval is large.
• No information on subsequent failures.
• The user is interested in the reliability of equipment that has already accumulated both a number of failures and operating age.
Recurring Data Analysis

Introduction
NHPP with Power Law
Optimum overhaul
Summary
Recurring Data Analysis

• In short, all we have from the data source are the cumulative times to events.

Pump 1 Pump 2 Pump 3


281 281 F 190 190 F 252 252 F
421 140 F 450 260 F 350 98 F
550 129 F 511 61 F 684 334 F
556 6 F 622 111 F 730 46 S
800 244 F 730 108 S
904 104 F
955 51 F
960 5 F
1010 50 F
1095 85 S
Non-Homogeneous Poisson Process

• We need a methodology appropriate for repairable systems that allows for recurrences of failures: NHPP with Power Law.
• NHPP assumes that a repair makes the system AS-BAD-AS-OLD (minimal repair).
• The repair is just enough to get the system operational again.
• The time to first failure follows the Weibull distribution; each succeeding failure is then governed by the Power Law model in the case of minimal repair.
NHPP/Power Law

• The NHPP with a Power Law Failure Intensity (Power Law Poisson Process):

$P[N(t)=n] = \dfrac{[\Lambda(t)]^{n}\, e^{-\Lambda(t)}}{n!}, \quad \Lambda(t) = \lambda t^{\beta}, \quad u(t) = \lambda \beta t^{\beta-1}$

where:
• P[N(t)=n] is the probability that n failures will be observed by time t.
• Λ(t) is the cumulative number of failures (Mean Value Function).
• u(t) is the Failure Intensity Function (Rate of Occurrence of Failures).
NHPP/Power Law

• Consider that there are k identical systems.


NHPP/Power Law

• We would like to have a model that estimates the cumulative number of failures over time.
NHPP/Power Law parameters

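• MLE solution for λ and β:

$\hat{\lambda} = \dfrac{\sum_{q=1}^{k} N_q}{\sum_{q=1}^{k} \left( T_q^{\hat{\beta}} - S_q^{\hat{\beta}} \right)}$

$\hat{\beta} = \dfrac{\sum_{q=1}^{k} N_q}{\hat{\lambda} \sum_{q=1}^{k} \left[ T_q^{\hat{\beta}} \ln(T_q) - S_q^{\hat{\beta}} \ln(S_q) \right] - \sum_{q=1}^{k} \sum_{i=1}^{N_q} \ln(X_{iq})}$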
NHPP/Power Law parameters

• Note that NHPP with Power Law can also be expressed


with β and η, where β and η are the parameters of
Time-to-First-Failure Weibull distribution.
NHPP/Power Law
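• Cumulative no. of failures (Mean Value Function): $\Lambda(t) = \lambda t^{\beta}$

• Failure Intensity Function (Rate of Occurrence of Failures): $u(t) = \lambda \beta t^{\beta-1}$

• Mission Reliability for duration d, given current age is t: $R(d \mid t) = e^{-[\Lambda(t+d) - \Lambda(t)]} = e^{-\lambda[(t+d)^{\beta} - t^{\beta}]}$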
NHPP/Power Law

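• Failure intensity function: $u(t) = \lambda \beta t^{\beta-1}$

• Cumulative no. of failures function: $\Lambda(t) = \lambda t^{\beta}$

[Figure panels: Decreasing Intensity (β < 1), No Trend (β = 1), Increasing Intensity (β > 1)]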


Pump Example
• Apply this model, $\Lambda(t) = \lambda t^{\beta}$, to the previous pump recurring data.

            Pump 1   Pump 2   Pump 3
Start       0        0        0
End Time    1095     730      730
Failures    281      190      252
            421      450      350
            550      511      684
            556      622
            800
            904
            955
            960
            1010

• Beta = 1.891, Lambda = 1.48 × 10^-5 /days

• Alternative format: Beta = 1.891, Eta = 378 days
Query NHPP/Power Law model
• What is the probability that there is no failure in 100 days?
• How many times will it fail within 2 years (730 days)?
• Given that system 1 is already 3 years old, what is the
probability that it will not fail within the next 30 days?
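These three queries can be answered from the power law mean value function Λ(t) = λ t^β, taking the expected number of failures as Λ(t) and the probability of no failure over an interval as exp(−[Λ(t+d) − Λ(t)]). The sketch below uses the Beta and Lambda quoted for the pumps; the function names are illustrative assumptions.

```python
import math

BETA = 1.891       # from the pump example
LAMBDA = 1.48e-5   # from the pump example (time in days)

def expected_failures(t):
    """Mean value function: expected cumulative failures by age t."""
    return LAMBDA * t ** BETA

def mission_reliability(age, duration):
    """Probability of no failure in (age, age + duration]."""
    return math.exp(-(expected_failures(age + duration) - expected_failures(age)))

p_no_failure_100 = mission_reliability(0, 100)     # new system, first 100 days
failures_2_years = expected_failures(730)          # expected failures in 730 days
p_survive_30_at_3y = mission_reliability(1095, 30) # 3-year-old system, next 30 days

print(f"P(no failure in 100 days)        = {p_no_failure_100:.3f}")
print(f"Expected failures in 730 days    = {failures_2_years:.2f}")
print(f"P(no failure in next 30 days)    = {p_survive_30_at_3y:.3f}")
```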
Optimum Overhaul (Economical Life)

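• Assume the equipment's failure behavior obeys the NHPP/Power Law.
• An optimum overhaul interval exists if
• The equipment has an increasing failure intensity (β > 1).
• The overhaul cost C_OH is greater than the corrective maintenance cost C_CM (C_OH > C_CM).

• Let the overhaul time be T_OH. The system cost per unit time (CPUT) is

$CPUT(T_{OH}) = \dfrac{C_{CM}\, \lambda\, T_{OH}^{\beta} + C_{OH}}{T_{OH}}$
Optimum Overhaul (Economical Life)

• At the optimum overhaul time,

$\dfrac{d\,CPUT}{dT_{OH}} = 0$, where $\dfrac{d\,CPUT}{dT_{OH}} = (\beta - 1)\, C_{CM}\, \lambda\, T_{OH}^{\beta-2} - \dfrac{C_{OH}}{T_{OH}^{2}}$

• Solving,

$T_{OH}^{*} = \left[ \dfrac{C_{OH}}{\lambda\, (\beta - 1)\, C_{CM}} \right]^{1/\beta}$
Optimum Overhaul (Economical Life)

• For our pumps, β = 1.891 and λ = 1.48 × 10^-5 /days.

• Assuming the average repair cost and overhaul cost for the pump are $10,000 and $50,000 respectively,

$T_{OH}^{*} = \left[ \dfrac{50{,}000}{1.48 \times 10^{-5}\, (1.891 - 1)\, (10{,}000)} \right]^{1/1.891} \approx 892 \text{ days}$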
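The overhaul calculation for the pumps can be sketched numerically. The code assumes the cost-per-unit-time model CPUT(T) = (C_CM·λ·T^β + C_OH)/T, i.e. minimal repairs at cost C_CM between overhauls and one overhaul at cost C_OH per cycle; this model and the function names are assumptions for illustration, while the parameter values are those quoted for the pumps.

```python
def cost_per_unit_time(t_oh, beta, lam, c_cm, c_oh):
    """Assumed CPUT: expected repair cost over one cycle plus one overhaul, per day."""
    return (c_cm * lam * t_oh ** beta + c_oh) / t_oh

def optimum_overhaul_time(beta, lam, c_cm, c_oh):
    """Closed-form minimiser of the assumed CPUT; only defined for beta > 1."""
    if beta <= 1:
        raise ValueError("no finite optimum unless failure intensity is increasing")
    return (c_oh / (lam * (beta - 1) * c_cm)) ** (1 / beta)

t_opt = optimum_overhaul_time(beta=1.891, lam=1.48e-5, c_cm=10_000, c_oh=50_000)
print(f"Optimum overhaul time ~ {t_opt:.0f} days")
print(f"CPUT at optimum ~ {cost_per_unit_time(t_opt, 1.891, 1.48e-5, 10_000, 50_000):.1f} $/day")
```

Evaluating `cost_per_unit_time` a little before and after `t_opt` is a simple way to confirm that the closed form really is a minimum.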
Summary (RDA)

• RDA is an approach for describing the failure rate behavior of equipment or sub-systems. It applies:
• When data are not collected at the non-repairable component level, so LDA is not feasible.
• When items are themselves repairable, even though they are the lowest actionable items.
• When collecting equipment failure modes is not feasible.

• It is a quick analysis to check for MTBF trends.
Summary (RDA)
• In general, RDA data for equipment are easy to collect.
• RDA can be used to determine whether a system has a failure trend.
• For smaller repairable systems (like an LRU), RDA can be used to determine the optimum overhaul time.
• RDA does not provide information like:
• Availability,
• Production impact,
• Spare part, Crew impact,
• Criticality and other details,
• Maintenance policies impact,
• Others…
Part 3: Simulation approach for
Repairable System

Introductions (What is RAM? Why? How?)


RBD concepts
Simulation concepts
Beyond RBD… AeROS constructs
Resource managements & Operation policies
Life-Stress-Relationship (LSR)
Summary
Introductions: What is RAM?
The concerns for maintenance organizations:
• What is the expected number of failures in the next time interval?
• How many spares should be ordered for the next time interval?
• How many crews are required?
• When is the best overhaul time?
• What is the production loss due to unavailability?
• …
To answer these questions, system-level reliability analysis through simulation is required.
The general term for this analysis is RAM (Reliability, Availability and Maintainability).
Introductions: What is RAM?
• RAM involves analysing the historical failure behaviours of assets, constructing a System Reliability Model, and projecting the expected failures and performance into the future through simulation.
• Anticipate what is likely to happen in the future.
• Check for any trends in production and events.

• Translate historical data into information that helps engineers identify gaps and generate actionable tasks with quantifiable effects.
Introductions: What is RAM?

[Figure: System Modeling. Labels: Failure Behaviors, Spare & Crew Policies, Maintenance Policies, Cost of operating the system, Availability, Throughput, Optimum Overhaul Time]
Introductions: Why RAM?
• Identify the gaps for improvement.
• Quantify the performance of assets
• Quantify the production impact due to asset unreliability

• Improvement program
• What-if analysis
• Tracking of improvement program (because you can quantify
it)

• Anticipate events
• Resource planning (e.g. spare ordering)
Introductions: How to perform RAM?

• Historical data
• Equipment status log
• Work Order data

• Failure rate and downtime behaviors of assets


• Identify the assets to be included in RAM analysis.
• Obtain the failure rate behaviors of asset.

• Construct System Reliability Model


• Study the production (process) network
• Understand how equipment failures affect the production output
System Reliability Model for RAM
analysis
• In the subsequent sections we will explore the concepts of
RAM analysis, and the tools for system reliability modeling.

• Introduction to RBD (Reliability Block Diagram)

• Advanced system modeling concept (beyond RBD)


• Complex Redundancy construct (e.g. 3x2 standby configuration)
• Storage construct
• Shadow node construct
• Life-Stress-Relationship for modeling item life due to production
rate.
• Others…
Introduction to Reliability Block
Diagram
• Reliability Block Diagram (RBD)

• RBD focuses on the success/failure of a “complex” system


Introduction to Reliability Block
Diagram
• The original intention of RBD is to derive the analytical
expression of a system’s reliability given that the PDFs
(probability density function) of the components that make
up the system are known.

• Take a computer as an example…

System PDF
System Level Reliability, RS

• The PDF describes the failure rate behavior of the non-


repairable system (the computer).

• Assumptions:
• The computer is a non-repairable item.

• All components that make up the system operate independently (this assumption is not true in most cases!).

• RBD analysis is meant for non-repairable systems (e.g. a computer).
RBD Constructs
• RBD is not designed for RAM simulation, however the basic
RBD concept is useful for describing System Reliability
Model for RAM analysis.

• Useful concepts from RBD:


• Series
• Parallel
• K-out-of-N
• Complex
Series construct

• N Nodes connected in series


Node 1 Node 2 Node N

• If any of the nodes/items fails, the system fails.

• System equation: $R_S = \prod_{i=1}^{N} R_i$
Parallel construct

• Nodes connected in parallel


• For the system to fail, all nodes must fail.
• System equation: $R_S = 1 - \prod_{i=1}^{N} (1 - R_i)$

Note that a valid RBD must have a start node and an ending node.
Series-Parallel combinations

• Combination of series
and parallel construct.

• System equation:

$R_A = R_1 \cdot R_2$

$R_B = R_3 \cdot R_4$

$R_S = 1 - (1 - R_A) \cdot (1 - R_B)$
K-out-of-N construct

• N parallel paths, of which at least k must be working.

• System equation:

$R_S = \sum_{i=k}^{N} \binom{N}{i} R^{i} (1 - R)^{N-i}$

Assuming R1 = R2 = … = RN = R
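The series, parallel and k-out-of-N constructs can be sketched as small functions, assuming independent nodes with known reliabilities at the time of interest. The function names and the sample reliability of 0.9 are illustrative assumptions, not values from the course.

```python
from math import comb, prod

def series(reliabilities):
    """All nodes must work."""
    return prod(reliabilities)

def parallel(reliabilities):
    """The system fails only if every node fails."""
    return 1 - prod(1 - r for r in reliabilities)

def k_out_of_n(k, n, r):
    """At least k of n identical, independent nodes must work."""
    return sum(comb(n, i) * r ** i * (1 - r) ** (n - i) for i in range(k, n + 1))

# Hypothetical node reliability, used only to exercise the functions
r = 0.9
print(series([r, r]))                                # two nodes in series
print(parallel([r, r]))                              # two nodes in parallel
print(parallel([series([r, r]), series([r, r])]))    # two series branches in parallel
print(k_out_of_n(2, 3, r))                           # 2-out-of-3
```

Because the functions take and return plain numbers, series-parallel combinations are built by nesting calls, as in the third line of the usage.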
Complex Configuration

• Complex configurations cannot be expressed as a simple


combination of series and/or parallel configurations.

• They require more advanced analytical techniques:


• Decomposition method
• Event Space method
• Path-Tracing method
• Bayes’ theorem method
• Others…
Simulation concepts
• Let’s consider a simple series network that consists of a pump and a distiller.

• The pump has an MTBF of 100 hours (Exponential distribution) and a repair time of 10 hours (fixed), while the distiller has an MTBF of 200 hours and a repair time of 20 hours.
Simple Series example
• AeROS generates a sequence of Time-To-Failures (TTF) and Corrective Maintenance (CM) downtimes for the pump and the distiller, based on the distributions assigned.
Simple Series example
• 1000 hours simulation

• The system fails if either Pump or Distiller fails.


• The blue color regions indicate the flowrate. Note that when
Pump fails, Distiller flowrate is also zero (and vice versa).
Basic simulation information

Availability
• Av = uptime / sim_time

Efficiency
• System Max Flowrate, FRmax = 1 unit/hour
• Max Production, Pmax = sim_time × FRmax
• Efficiency, η = actual production / Pmax
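The pump and distiller series example can be imitated with a small Monte Carlo sketch. This is not how AeROS is implemented; it is a simplified illustration that assumes exponential TTF with the stated MTBFs, fixed repair times, and that each item keeps ageing independently while the other is down. With FRmax = 1 unit/hour and all-or-nothing flow, efficiency equals availability here.

```python
import random

def down_intervals(mtbf, repair, horizon, rng):
    """Alternate exponential up-times and fixed repairs; return down intervals."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mtbf)   # next failure
        if t >= horizon:
            break
        end = min(t + repair, horizon)
        out.append((t, end))
        t = end                            # back in service after repair
    return out

def union_length(intervals):
    """Total length covered by possibly overlapping intervals."""
    total, cur_s, cur_e = 0.0, None, None
    for s, e in sorted(intervals):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total

def series_availability(horizon, seed=0):
    """System is down whenever the pump or the distiller is down."""
    rng = random.Random(seed)
    down = down_intervals(100, 10, horizon, rng) + down_intervals(200, 20, horizon, rng)
    return 1.0 - union_length(down) / horizon

print(f"Availability over 1000 hours (one run): {series_availability(1000, seed=3):.3f}")
print(f"Availability over a long run          : {series_availability(200_000, seed=1):.3f}")
```

A single 1000-hour run varies noticeably with the seed, which is why the course examples use many executions.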
Simple Parallel example 1

• 3 identical pumps working independently.


• TTF: Exp(200 hours)
• CM: Exp(20 hours)
• Flowrate: 1 unit/hour

• In AeROS implementation, Start and End


nodes are not required
Simple Parallel example 1

Top Level summary Regular Node summary


Simple Parallel example 2

• 3 identical pumps. TTF: Exp(200 hours)


• CM: Exp(20 hours)
• Flowrate: 1 unit/hour
• Combined capacity of 3 units/hour

• Production Flowrate is constrained by


Cooler: 2.5 units/hour
• Assuming Cooler doesn’t fail.
Simple Parallel example 2
K-out-of-N example

• 3 identical pumps, each with a Max Flowrate of 1.5 units/hour.
• The system is considered failed if it cannot deliver 3 units/hour, so at least 2 of the 3 pumps must be working.
Asset Reliability Performance

Repairable Asset
Production Network
Asset Reliability Performance
Consider the following scenarios:
• Scenario 1
A repairable asset fails due to its components. How can we describe the Asset Performance in a way that is meaningful for maintenance planning?

• Scenario 2
A production system is made of equipment (assets) whose MTBFs are known. What is the impact on production?
Repairable Asset

• The following are the failure records for 3 pumps working under similar stress conditions.
• Pump 1 has been operating for 3 years (1095 days), while Pumps 2 and 3 have been operating for 2 years (730 days).

Pump 1               Pump 2               Pump 3
Parts     TTE/day    Parts     TTE/day    Parts     TTE/day
LSB       281        SSL       190        SBV       252
ASA       421        LSB       450        LSB       350
SSL       550        RTR       511        ASA       684
SBV       556        IPL       622        End Time  730
LSB       800        End Time  730
SWA       904
IPL       955
RTR       960
SBV       1010
End Time  1095

LSB  Line Shaft Bearing
ASA  Arm & Seal Assembly
RTR  Rotor
IPL  Impeller
SBV  Suction Bell Vanes
SSL  Shaft Seal
SWA  Switch Assembly
Line Shaft-Bearing (LSB) Failures

• Extract Time-To-Event (TTE) for component LSB.

Pump 1: LSB (281), LSB (800)
Pump 2: LSB (450)
Pump 3: LSB (350)

LSB
TTE/day  Status  ID
281      F       Pump 1
519      F       Pump 1   (800 - 281)
295      S       Pump 1
450      F       Pump 2
280      S       Pump 2
350      F       Pump 3
380      S       Pump 3
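The LSB extraction can be expressed as a short routine. The sketch assumes each failed component is replaced with a new one, so its clock restarts at every failure of that component, and the time from its last failure to the end of observation is a suspension. The function name and data layout are illustrative.

```python
def component_tte(events, end_time, part):
    """events: (part, cumulative_day) pairs for one pump, in time order.
    Returns (time-to-event, status) rows for the chosen part."""
    rows, last = [], 0
    for name, day in events:
        if name == part:
            rows.append((day - last, "F"))   # life of the unit that just failed
            last = day                       # a new unit starts here
    rows.append((end_time - last, "S"))      # unit still running at end of observation
    return rows

# Failure records from the slides (cumulative days)
PUMP1 = [("LSB", 281), ("ASA", 421), ("SSL", 550), ("SBV", 556), ("LSB", 800),
         ("SWA", 904), ("IPL", 955), ("RTR", 960), ("SBV", 1010)]
PUMP2 = [("SSL", 190), ("LSB", 450), ("RTR", 511), ("IPL", 622)]
PUMP3 = [("SBV", 252), ("LSB", 350), ("ASA", 684)]

for label, events, end in [("Pump 1", PUMP1, 1095), ("Pump 2", PUMP2, 730), ("Pump 3", PUMP3, 730)]:
    for tte, status in component_tte(events, end, "LSB"):
        print(tte, status, label)
```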
Line Shaft-Bearing (LSB) Failure
Distribution

Original
Observed LSB events
LSB

LSB
TTE/day Status ID
281 F Pump 1
519 F Pump 1
295 S Pump 1
450 F Pump 2
280 S Pump 2
350 F Pump 3
380 S Pump 3
Life Data Analysis on LSB
Converting Recurring Data to IID data

• Repeat the analysis for the remaining components…

RTR                      IPL                      SWA                      SBV
TTE/day  Status  ID      TTE/day  Status  ID      TTE/day  Status  ID      TTE/day  Status  ID
960      F       Pump 1  955      F       Pump 1  904      F       Pump 1  556      F       Pump 1
135      S       Pump 1  140      S       Pump 1  191      S       Pump 1  539      S       Pump 1
511      F       Pump 2  622      F       Pump 2  730      S       Pump 2  730      S       Pump 2
219      S       Pump 2  108      S       Pump 2  730      S       Pump 3  252      F       Pump 3
730      S       Pump 3  730      S       Pump 3                           478      S       Pump 3

SSL                      ASA                      LSB
TTE/day  Status  ID      TTE/day  Status  ID      TTE/day  Status  ID
550      F       Pump 1  421      F       Pump 1  281      F       Pump 1
545      S       Pump 1  674      S       Pump 1  519      F       Pump 1
190      F       Pump 2  730      S       Pump 2  295      S       Pump 1
540      S       Pump 2  684      F       Pump 3  450      F       Pump 2
730      S       Pump 3  46       S       Pump 3  280      S       Pump 2
                                                  350      F       Pump 3
                                                  380      S       Pump 3
Life Distribution of each component

• Life distribution of each component (distribution parameters in days)

s/n  Component            Abbr.  Model        Param 1   Param 2
1    Line Shaft Bearing   LSB    Weibull      5.83      452
2    Arm & Seal Assembly  ASA    Weibull      4.10      778
3    Rotor                RTR    Weibull      4.44      880
4    Impeller             IPL    Weibull      6.47      887
5    Suction Bell Vanes   SBV    Weibull      2.25      810
6    Shaft Seal           SSL    Weibull      1.71      912
7    Switch Assembly      SWA    Exponential  2.56E+03
Reliability Performance of Repairable
Asset
• The reliability information is assigned to the
corresponding components.
• Assuming every failure takes 10 hours to fix (to
replace the faulty component with a new one).
Reliability Performance of Repairable
Asset
• Run a 730-day simulation with 1000 executions
Reliability Performance of Repairable
Asset
• Top level results
• Using RDA, the projected number of failures was 3.8 (lower: 2.5, upper: 6.0, at 90% confidence bounds)

• Item level results


Reliability Performance of Repairable
Asset
• If you are tasked to improve the performance of
this pump, is this information useful?
Reliability Performance of Repairable
Asset
• What if converting your pump data to IID data is not feasible?
• Failure modes were not recorded
• The component itself is repairable (not IID)
• Too time consuming…

• Recurring Data Analysis is then your only choice for modeling the failure rate behavior.
Reliability Performance of Repairable
Asset
Is Repairable?
• No: LDA
• Yes: Is Multiple modes?
• No: RDA
• Yes: RBD
Scenario 2, Example
• Consider an offshore oil production platform.

s/n Node Name Flowrate (barrels/day) MTBF (days) Downtime (hours)
1 6S 30 100.0 LGN (2.94, 1.72)
2 BNDPA-WHCP6S 30 2372 LGN (2.77, 1.95)
3 13L 250 224 LGN (2.61, 1.78)
4 BNDPA-WHCP13L 250 2372 LGN (2.77, 1.95)
5 49L 250 34.5 LGN (3.08, 1.79)
6 BNDPI-WHCP49L 250 2372 LGN (2.77, 1.95)
7 49S 400 Cannot Fail
8 BNDPI-WHCP49S 400 2372 LGN (2.77, 1.95)
9 50S 100 224 LGN (3.08, 1.79)
10 BNDPI-WHCP50S 100 2372 LGN (2.77, 1.95)
11 50L 400 363 LGN (2.43, 1.96)
12 BNDPI-WHCP50L 400 2372 LGN (2.77, 1.95)
13 53 400 224 LGN (2.61, 1.78)
14 BNDPI-WHCP53 400 2372 LGN (2.77, 1.95)
15 65 1500 548 EX1 (48.1)
16 BNDPI-WHCP65 1500 2372 LGN (2.77, 1.95)
17 21S 30 Cannot Fail
18 BNJTC-WHCP21S 30 2372 LGN (2.77, 1.95)
19 23S 1000 Cannot Fail
20 BNJTC-WHCP23S 1000 2372 LGN (2.77, 1.95)
21 BNPA-V200 4363 100 LGN (0.523, 1.4)
22 BNDPA-IGScrubber 280 1095 EX1 (6.5)
23 BNDPA-Others 280 274 EX1 (210)
24 BNDPI-Autocon 3050 274 LGN (0.368, 0.427)
25 BNDPI-IGScrubber 3050 1095 EX1 (25)
26 BNDPI-Others 3050 548 LGN (3.82, 1.68)
27 BNJTC-Autocon 1030 1095 EX1 (5.5)
28 BNJTC-IGScrubber 1030 100 LGN (2.43, 1.17)
29 BNJTC-CompIAPAC 1030 548 EX1 (183)
Reliability Performance of Repairable
Asset
• Run a 365-day simulation with 1000 executions
Reliability Performance of Repairable
Asset
• Take well 49L for example. Its projected downtime over a year is 866 hours.

• Given that its Flowrate is 250 barrels/day, the loss contributed by well 49L is:
Loss(49L) = 250 x 866/24 = 9,021 barrels (a year)

• Repeat this for the other equipment…
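The loss calculation and the criticality ranking that follows from it can be sketched in a few lines. The flowrates and projected downtimes are taken from the simulation results in this section; the function and variable names are illustrative assumptions.

```python
def production_loss(flowrate_per_day, downtime_hours):
    """Barrels lost = flowrate (barrels/day) x downtime (hours) / 24."""
    return flowrate_per_day * downtime_hours / 24.0

# (flowrate in barrels/day, projected downtime in hours per year)
assets = {
    "49L": (250, 866),
    "BNDPI-Others": (3050, 103.5),
    "BNJTC-CompIAPAC": (1030, 118.4),
}

# Rank assets by their contribution to production loss, largest first
ranked = sorted(
    ((name, production_loss(flow, down)) for name, (flow, down) in assets.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, loss in ranked:
    print(f"{name:16s} {loss:8.0f} barrels/year")
```

Note that the asset with the most downtime (49L) is not the top contributor once flowrate is taken into account.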


Reliability Performance of Repairable
Asset
• Rank asset criticality based on production loss
contributions.
s/n  Node Name         # Events  Downtime (hours)  FlowRate (barrels/day)  Loss (barrels)  MTBF (days)  Downtime Model  Param1  Param2
1    BNDPI-Others      0.656     103.5             3050                    13153           548          LGN             3.82    1.68
2    49L               9.166     866               250                     9021            35           LGN             3.08    1.79
3    BNJTC-CompIAPAC   0.637     118.4             1030                    5081            548          EXP             183
4    BNJTC-IGScrubber  3.551     75.94             1030                    3259            100          LGN             2.43    1.17
5    BNDPA-Others      1.305     264.6             280                     3087            274          EXP             210
6    BNPA-V200         3.547     15.38             4363                    2796            100          LGN             0.523   1.4
7    65                0.665     30.26             1500                    1891            548          EXP             48.1
8    53                1.616     99.75             400                     1663            224          LGN             2.61    1.78
9    50L               1.047     76.53             400                     1276            363          LGN             2.43    1.96
10   13L               1.576     100.5             250                     1047            224          LGN             2.61    1.78
11   BNDPI-IGScrubber  0.348     8.179             3050                    1039            1095         EXP             25
Improvement Opportunity
• Identify the gaps for improvement.
• Quantify the performance of assets
• Quantify the production impact due to asset unreliability

• Improvement program
• What-if analysis
• Tracking of improvement program (because you can quantify it)

• Anticipate events
• Resource planning (e.g. spare ordering)
Beyond RBD… AeROS constructs
Regular Node
Schematic Node
Shadow Node
Storage Node
Beyond RBD

• In order to describe more complex


operating scenarios of a process, AeROS
implements the following concepts.
• Schematic constructs:
• Regular Node for both repairable and
non-repairable models
• Sub-Schematic
• Shadow Node
• Storage Node
Beyond RBD

• Resource and Operation Policy management:


• Redundant Resource (e.g. standby)
• Grouped PM
• Crew and Spare

• Life-Stress-Relationship
• Inverse Power Law (IPL): item cannot fail at 0 flowrate
• Modified IPL: item can fail at 0 flowrate
• Optimum: item TTF is optimum at design flowrate
Regular Node

• It represents a maintainable
component/asset and is used to
define the characteristics of
production network components.

• Regular Node Properties:


1. General
2. Reliability
3. Corrective Maintenance
4. Inspection
5. Preventive Maintenance
6. Overhaul
Regular Node
• By default, Regular Nodes are “non-repairable”.
• PM and Inspection tasks are applicable to non-repairable nodes.
• Overhaul settings are applicable to repairable nodes (see Recurring Data Analysis).
Regular Node Flowrate Settings

• Max. Flowrate is the maximum production


rate the node can process.
• In the network below, max. flowrate of
node 1 and node 2 are 10 and 5 units/hour
respectively.

• The flowrate in Node1 will be 5 units/hour.


Regular Node Flowrate Settings
• The reliability can change with production rate (flowrate). The reliability is specified at the Design Flowrate.
• In the network below, the reliability of Node1 is specified at 10 units/hour.

• Because the flowrate in Node1 is 5 units/hour, the TTF will be adjusted according to the LSR model specified in the Life-Stress-Relationship settings.
Regular Node Flowrate Settings

• The default LSR model is IPL,


with n = 1.

where L is life, V is the stress (flowrate) and K is a constant.

• The user does not need to supply any other information for this model. K is determined by the application.
• The user can choose another LSR model.
• If the life is independent of flowrate, set LSR to Undefined.
• More on LSR in a later section.
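To illustrate the default behaviour, the sketch below assumes an inverse power law in which life is proportional to 1/V^n. Under that assumption the constant K cancels when life is expressed relative to the life at Design Flowrate, so only the flowrate ratio matters. The function name and the 100-hour design life are hypothetical; this is not the AeROS implementation.

```python
import math

def ipl_life(life_at_design, design_flowrate, flowrate, n=1.0):
    """Scale a life specified at design flowrate to another flowrate,
    assuming life is proportional to 1 / flowrate**n."""
    if flowrate == 0:
        return math.inf            # under IPL the item does not age at zero stress
    return life_at_design * (design_flowrate / flowrate) ** n

# Hypothetical node: 100-hour life specified at a design flowrate of 10 units/hour
print(ipl_life(100, 10, 10))   # at design flowrate
print(ipl_life(100, 10, 5))    # running at half the design flowrate
print(ipl_life(100, 10, 0))    # not flowing at all
```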
Regular Node Reliability (Non-
repairable)

• By default, Regular Nodes are “non-repairable”.
• Reliability Model options are:
• Fixed
• Exponential and Exponential-2P
• Weibull-2P and Weibull-3P
• Lognormal
• Normal
Regular Node Reliability (Repairable)

• To set a regular node to “Repairable”,


select Repairable checkbox in General
section.

• The reliability model available will be


NHPP with Power Law. The model is
defined using Weibull Time-to-First-
Failure (TTFF) format.
Schematic Node
• Recall our pump example…

• To create a schematic of 3 such pumps working in parallel…


Schematic Node
• Alternatively, encapsulate the “Pump” as a Schematic Node
and use it 3 times…

3 instances of pump in parallel


Schematic Node

• Consider a network containing only a Schematic Node (called Schematic1) as shown:

• Schematic1 consists of 2 regular nodes in series:


Schematic Node

• Node1 and Node2 are in series. If either node is down (Out of Service, OoS), the schematic node is OoS.
Schematic Node (Hands-on exercise)

• Create a schematic node that contains 2 identical nodes in parallel.
• Reliability: Exp(200 hours)
• Corrective Maintenance: Fixed(50 hours)

• Run a 1000 hours simulation with


1 execution.

• Plot the profiles


Shadow Node
• Shadow Node is used by Regular and Schematic node to
implement shadowing logic.

• If its host (Regular or Schematic node) is either in standby or


out-of-service (failed), it behaves as if it has failed.

• Shadow node can be set to negative logic. In this case


shadow node behaves as failed if its host is operating.
Shadow Node
• 2 compressors, K1 and K2, working in parallel.

• If K1 (K2) fails, shut off


wells W1 (W3) and W2
(W4).
Storage Node
• Storage node provides buffering function in a production flow
network.

• Storage node cannot fail hence doesn't generate maintenance


event.

• But it can generate storage events like Storage-Full, Storage-


Empty and user defined Storage-High, Storage-High-High,
Storage-Low and Storage-Low-Low events.

• A storage event can be linked to regular nodes in order to control


the nodes operating status upon occurrence of this event.
Storage Node
• Consider a system that consists of a pump, a tank and a distiller connected in series.
• During normal operation, the pump charges the storage tank, which supplies the distiller with crude oil.
• If the pump fails, the tank can still deliver crude to the distiller.
• If the pump is repaired before the tank is empty, the system does not fail.

Pump: TTF: Exp(200 hr), CM: Exp(60 hr), Flowrate: 2 units/hr
Tank: Capacity: 50 units, Initial level: 100%, Input: 2 units/hr, Output: 1 unit/hr
Distiller: TTF: Exp(600 hr), CM: Fixed(10 hr), Flowrate: 2 units/hr

CourseExamples\StorageExample.aro
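A rough feel for how much the tank helps can be had without running the simulation. The sketch below assumes the tank is full (50 units) when the pump fails, that it drains at the stated output of 1 unit/hr, and that Exp(60 hr) denotes an exponential repair time with a mean of 60 hours. The function names are illustrative.

```python
import math

def buffer_hours(level_units, output_rate):
    """Hours the tank can keep supplying the distiller with no input."""
    return level_units / output_rate

def prob_repair_within(hours, mean_repair_hours):
    """P(repair time <= hours) for an exponential repair time."""
    return 1.0 - math.exp(-hours / mean_repair_hours)

cover = buffer_hours(level_units=50, output_rate=1)
p_ride_through = prob_repair_within(cover, mean_repair_hours=60)

print(f"Tank covers {cover:.0f} hours of pump outage")
print(f"P(pump repaired before tank is empty) ~ {p_ride_through:.3f}")
```

In the simulation the tank level at the moment of failure varies, so this figure is an upper bound for outages that start with a full tank rather than a prediction of system availability.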
Storage Node

• Storage affect the availability calculation


• If Pump fails and Tank is not empty, such that the system is still able to
“produce”, the system is considered available.
Resource managements &
Operation policies
Redundant Resource (e.g. standby)
PM Group
Crew and Spare Resources
Resource and Operation Policy
management
• Redundant Resource (e.g. standby)
• configure a set of regular nodes operating in
coordinated sequence.

• PM Group
• PM-Group manager defines the inventory of items for a
maintenance group, and the PM triggering mechanism.

• Crew and Spare Resources


• Define crew and spare resources available for
performing maintenance tasks.
Redundant Resource Manager
• Redundant Resource Manager is used to configure a set of
regular nodes operating in coordinated sequence (e.g.
Standby configuration).

• The process involves:


• Select participating Regular Nodes (at least 2).
• Define valid operating states. A state specifies a unique
combination of operating and standby nodes.
• Determine the transition policy: switch periodically or on failure
events.
Standby Example
• Consider a 3x2 standby configuration with the following requirements:
• A total of 3 pumps in parallel, with 2 pumps operating and 1 on standby, for a total production rate of 2 units/hour.

• The pumps are switched every 600 hours.

• If a given pump fails, the standby pump becomes active. Once repaired, the failed pump becomes the standby.

• If 2 pumps fail, the system continues to operate on the one working pump, at a reduced production rate of 1 unit/hour.

Examples\Redundancy.aro
Standby Example

• Redundant Resource Manager uses a State Table to define valid operating states.

State no.  Pump1  Pump2  Pump3
1          OFF    ON     ON
2          ON     OFF    ON
3          ON     ON     OFF
4          OFF    OFF    ON
5          ON     OFF    OFF
6          OFF    ON     OFF

States 1 to 3: switch every 600 hours.
Redundant Resource Manager

• Redundant Resource Manager


Standby Example

1. State changed due to Pump3 failure (Event Switching)


2. Pump2 failed, and only Pump1 was working (Event Switching, but to the same state)
3. Pump3 recovered and start working immediately (Event Switching)
4. Pump2 recovered and put to standby
5. Timer Switching Event. (event 3 to event 5: 600 hours)
PM Group
• Configure a trigger event to activate a pre-defined group of
regular nodes for Preventive Maintenance upon failure of
certain regular nodes within the group.
• The process involves:
• Configure the PM-settings of those Regular Nodes whose PM are to
be triggered. Node whose PM is not defined cannot be triggered.

• Create a PM-Group and select the Regular Nodes participating in


this maintenance group.

• Define which nodes, upon failure, can trigger the group, and nodes
to be triggered for PM.

Examples\UserGuide\PMGroup.aro
PM Group example
• Consider a series network. When any of the items (A, B or C)
fails, perform PM on the remaining items.
PM Group Manager
• PM Group Manager is used to define Regular Nodes participating in the
maintenance group, and the PM triggering mechanism.

Trigger
source
PM Group Manager

• Consider the following


network:

• PM is not defined for node


A (Run to failure), and its
failure will trigger B and C
for PM.
PM Group Manager
• Only A can trigger the PM Group…
Crew Resource
• Crew Resource Manager is used to define crew resource and set this resource
properties.
Crew Resource
• Assign crew resources to nodes to perform maintenance (corrective, inspection, preventive and overhaul) tasks.

• “Crew1” describes the conditions that determine


• When a crew will be available to perform corrective maintenance for Pump.
• Logistical time and costs associated when engaging the crew
Spare Resource
• Spare Resource Manager is used to define spare resource and set this resource
properties.
Spare Resource
• Assign spare resources to nodes for maintenance (corrective, preventive and overhaul) tasks.

• “Spare1” describes the conditions that determine


• Whether a spare is available upon CM request of Pump.
• The cost associated with the spare.
• The replenishment policies.
Replenishment policies
• TimeBased Restock
• At a regular time interval (Restock Interval), determine whether to place order, and the
quantity to order, so as to maintain the Restock Level.

2 1
Replenishment policies
• LevelBased Restock
• Restock occurs when the spare part inventory level is equal to or less than the user
defined trigger-level (Trigger Restock Level).

2
1
Life-Stress-Relationship (LSR)
Inverse-Power Law (IPL)
Modified IPL
Optimized LSR
Life-Stress-Relationship (LSR)
• A mathematical model to describe Flowrate (or production
rate) as a stress to the Life of an asset (Regular Node).

• Time-to-Failure (TTF) is estimated using cumulative-damage


with life-stress model.
Life-Stress-Relationship (LSR)

• The implementation can be grouped into 3 main


categories :
• Inverse-Power Law (IPL): Node does not age at 0 stress
(flowrate)

• Modified IPL: Node can age (i.e. can fail) at 0 stress (for
example during standby).

• Optimized LSR: For node whose life is optimum at design


production rate (FD). The item experiences more stress
when production rate deviates from FD.
Inverse-Power Law (IPL)

• Node doesn’t age at zero stress


(0 flowrate).

• The life-stress-relationship is

where L is the Life, and F is the Flowrate (Stress),


and K and n are model parameters
Modified Inverse-Power Law (IPL)
• Modified IPL allows Node to age (i.e. can fail) at 0 stress
(flowrate), for example during standby.

• The life-stress-relationship is

where L is the Life, and F is the Flowrate (Stress),


K, n and d are model parameters
Optimized LSR
• For node whose life is optimum at design production rate
(FD).
• When flowrate (stress) is below FD,

• When flowrate is above FD,

where y1 and y2 are 1/L, and


a1, b1, c1, a2, b2 and c2 are model parameters
Setting Life-Stress-Relationship
• The default setting for LSR is DefaultIPL.
• Inverse-Power-Law with n = 1.
• K is determined from Designed Flowrate.

• If a node life (TTF) is independent of Flowrate, set LSR to


Undefined.
Example
• This example demonstrates the effect of flowrate on item
life.

• If B fails, C will increase its flowrate to 2 units, and A reduces


to 2 units.
• Note that C operates above its design flowrates, while A
operates below its design flowrate.
Example
Summary (System Reliability-
Simulation approach)
• LDA and RDA are used to convert failure data into
statistical models that describe failure rate
behaviors. These are the fundamental input to
AeROS (System Reliability Models).
• AeROS provides maintenance and production related details:
• Expected number of failures and downtimes at
component/asset levels.
• Availability and production efficiency.
• Resource usage information
• Criticality and Sensitivity…
Summary (System Reliability-
Simulation approach)
AeROS allows system reliability modeling using
• Standard RBD
• Series, Parallel, K-out-of-N
• Advanced constructs
• Sub-schematic
• Shadow
• Storage
• Advanced concepts
• Redundant Resource Manager
• Life-Stress-Relationship
Comparison between RDA and
AeROS analysis
Compare RDA with AeROS
We are making this comparison because they both apply to
Repairable System.

• RDA (NHPP with Power Law) is appropriate for equipment


(asset) level analysis.
• AeROS is appropriate for equipment and process (network
of equipment) analysis.
• In the case of increasing failure intensity (FI), RDA assumes
that FI will keep on increasing (not true in most practical
cases).
• Using AeROS, the failure intensity will reach a constant level
(steady-state), which is more realistic.
Compare RDA with AeROS
• Recall the pumps analysis using RDA and AeROS
Pump 1 Pump 2 Pump 3
Parts TTE/day Parts TTE/day Parts TTE/day
LSB 281 SSL 190 SBV 252
ASA 421 LSB 450 LSB 350
SSL 550 RTR 511 ASA 684
SBV 556 IPL 622 End Time 730
LSB 800 End Time 730
SWA 904
IPL 955
RTR 960
SBV 1010
End Time 1095
3 years
Compare RDA with AeROS
• The blue line is the model calculated using RDA: Cumulative
number of failures over 5 years.
• While the points are generated using AeROS (seed=3) over 5
years.
RDA Approach
Advantages
• Data for RDA can be obtained quickly from the organization's data sources (WO, Operation/Maintenance log).
• Convenient way to calculate optimum overhaul time.
• Convenient way to check for failure trends.

Disadvantage
• Cannot quantify the reliability impact on production.


AeROS Approach

Advantages
• Quantify the impact of assets reliability on
production efficiency.
• Provide a model for engineers to perform “what-if”
analysis.
• Identify gaps and improvement program to close
the gaps
Reference Textbooks
Reliability Engineering Handbook
Vols 1 & 2, by Dimitri Kececioglu
Reliability and Life Testing Handbook
Vols 1 & 2, by Dimitri Kececioglu
Statistical Methods for Reliability Data
by William K. Meeker and Luis A. Escobar
Applied Life Data Analysis
by Wayne Nelson
Appendix

Median Rank
Appendix: Median Rank calculation
• Rank value z can be calculated using the Cumulative Binomial equation:

$P = \sum_{k=j}^{N} \binom{N}{k} z^{k} (1 - z)^{N-k}$

where
N is the sample size,
P is the probability that at least j failures are observed.

Note: z (Rank value) is the probability of failure


Side note: Median Rank calculation
• If P = 0.5, such that:

$0.5 = \sum_{k=j}^{N} \binom{N}{k} z^{k} (1 - z)^{N-k}$

• There is a 50% chance of observing at least j failures out of N, for a given z (to be determined).

• For P = 0.5, z is known as the Median Rank.


Side note: Median Rank calculation
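• Solve for z1 (N=4 and j=1):

Prob. that exactly 1 fails: $\binom{4}{1} z_1 (1 - z_1)^{3}$

Prob. that exactly 2 fail: $\binom{4}{2} z_1^{2} (1 - z_1)^{2}$

Prob. that exactly 3 fail: $\binom{4}{3} z_1^{3} (1 - z_1)$

Prob. that exactly 4 fail: $z_1^{4}$

Prob. that at least 1 fails: the sum of the four terms above, which equals $1 - (1 - z_1)^{4}$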


Side note: Median Rank calculation
• Solving for z1,

$0.5 = 1 - (1 - z_1)^{4}$ for N = 4 and j = 1

=> $z_1 = 1 - 0.5^{1/4} = 0.159$

Interpretation:
• If z1 = 0.159 (the probability of failure of this population), then there is a 50% chance of observing at least 1 failure.
Side note: Solving for Median Rank, z

• Solution for z requires numerical methods.

• Excel function
• BETA.INV(0.5, j, N − j + 1)

• Bernard's approximation: $z \approx \dfrac{j - 0.3}{N + 0.4}$

• Median Rank Table
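The numerical solution can be sketched with a simple bisection on the cumulative binomial equation, alongside Bernard's approximation for comparison. The function names are illustrative assumptions; the bisection relies on the probability of at least j failures increasing with z.

```python
from math import comb

def prob_at_least_j(z, j, n):
    """P(at least j failures out of n) when each unit fails with probability z."""
    return sum(comb(n, k) * z ** k * (1 - z) ** (n - k) for k in range(j, n + 1))

def median_rank(j, n, p=0.5):
    """Solve prob_at_least_j(z, j, n) = p for z by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if prob_at_least_j(mid, j, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def bernard(j, n):
    """Bernard's approximation to the median rank."""
    return (j - 0.3) / (n + 0.4)

for j in range(1, 5):
    print(j, round(median_rank(j, 4), 4), round(bernard(j, 4), 4))
```

Passing a different `p` (for example 0.05 or 0.95) gives the other rank values used for confidence bounds.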
