Problems Booklet
http://www.ntnu.edu/ross/books/sis
RAMS Group
Department of Production and Quality Engineering
Norwegian University of Science and Technology
Trondheim, Norway
Contents
1 Introduction
5 Reliability Quantification
10 Common-Cause Failures
11 Imperfect Proof-Testing
12 Spurious Activation
13 Uncertainty Assessment
Preface
Chapter 1
Introduction
Safety-critical system
Functional safety
(b) Why is it useful to classify safety barriers? Give some examples of such
classification.
(c) What measures can be used to define the performance of a safety barrier?
Risk reduction
Functionality/effectiveness
Reliability/availability
Response time
Robustness
Explain what is meant by each of these, and why they are important characteristics
of a safety barrier.
(b) A safety barrier may be installed either to prevent hazardous events from
occurring or to mitigate the consequences of hazardous events. The textbook
therefore suggests a different way to calculate the risk reduction for each of
these two cases. Explain these two approaches.
(c) How can you formulate a reliability or availability requirement on the basis
of the required amount of risk reduction?
Sensors
Logic solver
(c) What is the difference between de-energize to trip and energize to trip, and
which of these two principles would you select for a shutdown valve?
(d) What is the difference between fail-safe passive, fail-safe active, and fail-safe
operational?
(e) Which of these fail-safe design principles should be selected to allow:
Fly-by-wire (avionics)
Ensure that the red light is activated if the interlock system detects a failure
after the green light has been set (railway)
Safe state
Fail-safe
De-energize to trip
Energize to trip
(a) What are the main characteristics of a generic standard?
(b) What do we mean by a sector-specific standard?
(c) IEC 61508 is often referred to as a risk-based standard. What is
meant by risk-based in this context, and what is the main implication of taking
this approach?
(d) Some other sectors (such as the maritime sector) prefer rule-based standards for the
design of safety-critical systems. What are the arguments for and against rule-based
and risk-based standards?
(e) IEC 61508 takes a life cycle approach in the structuring of requirements. Why
is it reasonable to address the whole life cycle of a system, and not only the design
phases, in order to achieve functional safety?
Give some examples of when IEC 61508 must be used instead of IEC 61511
when designing a new safety-critical system for a process plant.
(a) What subsystems constitute a signaling system, and how are they related?
(b) Give some examples of safety functions associated with signaling systems.
(c) Give some examples of scenarios that must be treated by a signaling system
in relation to a two-track station with single tracks to and from the station.
(d) What standards are applicable for the design of a signaling system?
(e) What do we mean by a safety case, and what is the difference between a
generic product safety case, a generic application safety case and a specific appli-
cation safety case?
Chapter 2
Channel
Element
Voted group
Subsystem
Problem 2. Redundancy
Redundancy is often introduced to enhance reliability of SIFs.
(a) What do we mean by the term redundancy?
(b) Give some arguments for and against the use of redundancy as a means to
improve reliability.
(c) Give several examples of devices that are often made redundant in safety
instrumented systems for the process industry.
(d) What are the differences between active and passive redundancy? Give some
illustrative examples.
(e) What do we mean by partly loaded redundancy? Give some examples.
Demands and demand rate are two important issues to address during a risk as-
sessment of the EUC.
(a) What do we mean by the term demand?
(b) Give several examples of typical demands within different application areas.
(c) What do we mean by the term demand rate?
(d) Why is the demand rate of importance for the design of a SIF?
(e) The demand rate is $\lambda_{de} = 5.2 \times 10^{-5}$ per hour. How many demands should we
expect during a period of 20 years? What is the probability that we will have at
least one demand during one year?
(f) Give examples of demands where the demand duration may be important.
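The arithmetic behind part (e) can be sketched as follows. This is only a sketch under the assumption that demands follow a homogeneous Poisson process with constant rate and that one year is 8760 hours; the function names are illustrative and not taken from the textbook.

```python
import math

HOURS_PER_YEAR = 8760  # convention used throughout the booklet


def expected_demands(rate_per_hour, years):
    """Expected number of demands in the period for a constant demand rate."""
    return rate_per_hour * years * HOURS_PER_YEAR


def prob_at_least_one(rate_per_hour, years):
    """P(N >= 1), assuming demands form a homogeneous Poisson process."""
    return 1.0 - math.exp(-rate_per_hour * years * HOURS_PER_YEAR)


lambda_de = 5.2e-5  # demand rate per hour, from part (e)
n_20_years = expected_demands(lambda_de, 20)
p_one_year = prob_at_least_one(lambda_de, 1)
print(n_20_years, p_one_year)
```

Under these assumptions the expected number of demands is simply rate times exposure time, whereas the probability of at least one demand follows from the exponential distribution of the time to the first demand.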
Safety integrity and safety integrity level (SIL) are two key concepts in IEC
61508. In fact, some may refer to IEC 61508 as a SIL-standard.
(a) What do we mean by the term safety integrity?
(b) Which quantitative reliability measures are used for safety integrity? Give a
brief explanation.
(c) IEC 61508 defines three categories of requirements that must be met in order
to achieve a certain level of safety integrity. Explain the meaning of each category.
(d) The safety integrity requirements are given as four distinct safety integrity
levels, SIL 1-4, where SIL 4 is the most strict requirement. What is, according to
your opinion, the rationale for splitting the requirements into four levels (SILs)?
Give a brief explanation.
(e) The process industry (see IEC 61511) does not recommend the use of SIL 4
requirements. Why may this be a reasonable position to take?
(f) What is the principal difference between a SIL requirement and the achieved
SIL for a SIF?
For channels that are not proven in use, it is necessary to also demonstrate
compliance with the requirements for systematic safety integrity. Systematic safety
integrity is mainly met by following certain qualitative requirements. Some of the
requirements are SIL independent (meaning that they apply to all SILs), whereas
others are SIL dependent. The SIL dependent requirements are listed in separate
tables in IEC 61508-2 and IEC 61508-3.
(a) Give some reasons why systematic safety integrity is a meaningful concept
(in view of what is covered and not covered by hardware safety integrity).
(b) Why can it be argued that software safety integrity is a subset of systematic
safety integrity?
(c) Explain the difference between a highly recommended (HR) requirement and
a recommended (R) requirement.
(d) Why are some requirements classified as not recommended (NR)?
(e) Go through tables B.1 and B.2 in IEC 61508-2 (with the support from IEC
61508-7) and discuss how easy it is to apply these requirements.
SIL allocation is the process of defining SIL requirements for individual safety
instrumented functions (SIFs), based on the overall need for risk reduction as
defined by the risk acceptance criteria.
(a) Mention some methods/approaches that can be used to allocate SILs to SIFs.
(b) Give a brief description of the risk graph method and discuss pros and cons
related to this method.
(c) Give a brief description of the LOPA method and give some pros and cons
related to this method.
(d) What are the main differences between IEC 61508 and NOG guideline
070 with respect to principles for determining the required SIL? Mention and
discuss some pros and cons of the NOG guideline 070 approach compared to the
IEC approach. Hint: To solve this problem it may be helpful to read selected
sections of Norsk Olje og Gass (NOG) guideline 070, which can be found at
www.norskoljeoggass.no/no/Publikasjoner (select “Retningslinjer” on
this page). You may read Sections 7.2 (Approach) and 7.6 (Minimum SIL
requirements), and, e.g., Appendix A.3.3 for a practical example.
(e) The SIL requirements derived in Appendix A of NOG guideline 070, and
presented in Table 7.2, are highly influenced by the choice of failure rates used
for the underlying calculations. Discuss some effects on the SIL requirement
setting of using overly conservative (“too high”) failure rates versus using
overly optimistic (“too low”) failure rates.
what is sometimes referred to as the SIL budget for the function, from end to end.
(a) The SIL budget may be distributed down to individual subsystems of the SIF.
What could be possible strategies to distribute this SIL budget (i.e., what could be
possible ways to define how much each subsystem can “consume” of the total SIL
budget)?
(b) Consider a SIF that must fulfill SIL 3. Assume that the subsystem of final
elements is allowed to consume 70% of the maximum allowed PFDavg for the SIF.
What is the PFDavg requirement for this subsystem?
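The budget split in part (b) can be illustrated with a small sketch. It assumes that "maximum allowed PFDavg" means the upper limit of the low-demand SIL band in the IEC 61508 SIL table; the dictionary and function names are hypothetical.

```python
# Upper PFDavg limits of the low-demand SIL bands (IEC 61508 SIL table).
PFD_UPPER_LIMIT = {1: 1e-1, 2: 1e-2, 3: 1e-3, 4: 1e-4}


def subsystem_pfd_target(sil, share):
    """PFDavg a subsystem may consume, given its share (0..1) of the SIL budget."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return share * PFD_UPPER_LIMIT[sil]


# Final elements are allowed 70% of the SIL 3 budget.
final_elements_target = subsystem_pfd_target(3, 0.70)
print(final_elements_target)
```

The remaining 30% of the budget is then what the sensors and the logic solver may share between them.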
(a) Explain the main objectives of a functional safety assessment (FSA).
(b) IEC 61508-1 gives requirements to the level of independence for those car-
rying out an FSA. Explain briefly how this level of independence is defined, and
describe the factors contributing to a high level of independence.
(c) Assume that you would like to carry out an FSA just after the SIL allocation
process has been completed (the design of the SIFs has not yet started). Assume
further that at least one of the SIFs within the scope of the FSA has been
assigned a SIL 3 requirement. You suggest that the FSA is carried out by an
independent group in your company, for example from an office that is situated in
another city. Would this be an acceptable approach? Hint: The SIL 3 requirement
is not part of your decision here, but it may still indicate the severity level of
consequences if a SIF with a SIL 3 requirement fails to perform its functions.
(d) Assume now that your project has proceeded and that you are close to finalizing
the detail design phase. You decide to carry out an FSA before the construction
starts, to ensure that no major issues are overlooked. This time you suggest using
an external consulting company that has not been involved in any previous phases
of the project to carry out the FSA. Is this a feasible approach according to IEC
61508? Explain.
(e) Assume now instead that this external company was involved in the develop-
ment of the SRS. Would you still think it was feasible to use this company to carry
out the FSA? Explain.
section argues why a minimum SIL 2 requirement can be set for this function. The
arguments are based on calculated values of PFDavg and some expert judgment,
but do not check the architectural constraints.
(a) Check if the SIL 2 requirement is met when the architectural constraints are
taken into account.
(b) Architectural constraints are introduced to compensate for uncertainty in
reliability calculations. However, there may also be uncertainty associated with the
assumptions and calculations made to determine the minimum HFT. Discuss the main
uncertainties involved in determining the architectural constraints.
SIL tables give a relationship between the selected reliability measure and the
achievable SIL.
(a) A SIF has PFDavg = $5 \times 10^{-3}$. Which SIL can the SIF fulfill?
(b) A SIF has PFH = $4 \times 10^{-7}$ per hour. Which SIL can the SIF fulfill?
(c) When the demand rate is close to once per year, we may, according to IEC
61508, use either PFH or PFDavg as reliability measure. A careful analysis has
shown that the PFD of a single system is PFDavg = $8.0 \times 10^{-4}$ such that the SIL 3
requirement is fulfilled. It has been assumed that the system is tested four times
every year (1 year = 8760 hours). Is it possible to say if the same system would
also meet SIL 3 in the high demand mode, and how would you do this evaluation?
(d) The SIL table can also be used the opposite way. If a SIL requirement has been
stated, it outlines the required PFDavg or PFH range. Assume that you would like
to select one value as a PFDavg or PFH target value (so that you have one specific
value to compare with the calculated PFDavg or PFH for a SIF). Discuss where in
the range you would select the target value (upper, lower, or in the middle)?
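A table lookup of this kind can be sketched as below. The band limits are those of the IEC 61508 SIL tables for the low-demand (PFDavg) and high-demand (PFH) modes. For part (c), the sketch assumes a single element with PFDavg ≈ λ_DU·τ/2 and PFH ≈ λ_DU, which is one possible way to do the evaluation, not necessarily the one intended by the textbook. All names are illustrative.

```python
# SIL bands as half-open intervals [lower, upper).
LOW_DEMAND = {4: (1e-5, 1e-4), 3: (1e-4, 1e-3), 2: (1e-3, 1e-2), 1: (1e-2, 1e-1)}
HIGH_DEMAND = {4: (1e-9, 1e-8), 3: (1e-8, 1e-7), 2: (1e-7, 1e-6), 1: (1e-6, 1e-5)}


def sil_from_measure(value, table):
    """Return the SIL band containing value, or None if it is outside all bands."""
    for sil, (lower, upper) in table.items():
        if lower <= value < upper:
            return sil
    return None


sil_a = sil_from_measure(5e-3, LOW_DEMAND)    # part (a)
sil_b = sil_from_measure(4e-7, HIGH_DEMAND)   # part (b)

# Part (c), assumption: single element, PFDavg ~ lambda_DU * tau / 2.
tau = 8760 / 4                      # four tests per year, in hours
lambda_du = 2 * 8.0e-4 / tau        # failure rate implied by the stated PFDavg
sil_c_high = sil_from_measure(lambda_du, HIGH_DEMAND)  # PFH ~ lambda_DU
print(sil_a, sil_b, lambda_du, sil_c_high)
```

Note that a quantitative match with a SIL band says nothing about architectural constraints or systematic safety integrity, which must be checked separately.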
(a) Is it correct to say that a SIS has SIL 3? (explain your position)
(b) Is it correct to say that a subsystem fulfills SIL 2, given that the architectural
constraints for SIL 2 are met? (explain your position)
(c) Will a SIF with a PFDavg between $10^{-4}$ and $10^{-3}$ automatically fulfill the SIL
3 requirements? Explain.
Chapter 3
(b) Failure mechanism
(c) Failure effect
(d) What are the main differences between a failure mode and a failure effect?
(c) What is a systematic failure/fault? Give some examples.
(d) Would you classify an excessive stress failure as random or systematic, and
why?
(e) Are there any relationships between common-cause failures and systematic
failures? Give some illustrative examples.
(f) How are failures/faults classified in the OREDA project (and data handbooks)?
Failure mode, effects and criticality analysis (FMECA) is a widely used method
for identifying and classifying failures of a system and its components.
(a) Why is it possible to argue that FMECA may be used to achieve reliability
growth in a design process?
(b) A similar approach, the failure modes, effects, and diagnostics analysis (FMEDA),
is often used to document compliance with IEC 61508. In fact, an FMEDA is often
included in an equipment safety manual or safety analysis report (SAR). What
is the main difference between an FMECA and an FMEDA?
(c) Assume that you would like to use an FMEDA to determine DU, DD, SU,
and SD failure rates. Assume further that the component in question comprises
some parts with a high level of redundancy (on the control side) and other parts
that have only single elements. One such example could be a blowout preventer
(BOP) used to shut in the well in case of a well kick or rig problem. A BOP
manufacturer may want to provide failure rates for the BOP as such, since the
BOP from their perspective is a single unit of delivery. Discuss some challenges
in applying FMEDA in this case. Would you argue that it is reasonable to calculate
DU, DD, SD, and SU failure rates for the BOP as such?
Assume that you are part of a team reviewing failures reported for safety-
critical items. The failures found in relation to point gas detectors are described in
the table below. Detection method means how the failure was discovered. Detec-
tion method PM means that the failure was found during regular preventive main-
tenance, such as function testing. Alarm means that the failure was announced
by an alarm from the self-diagnostic system of the component, or the fire and gas
central.
Hint: If you are not familiar with point gas detectors, you may do a search
on the internet to find some useful sources. Here is one example accessed in July
2017: https://www.gmiuk.com/wp-content/uploads/2014/09/GD10P_
Operator_Manual.pdf
(a) Classify failures using failure categories DU, DD, S, NA (NA means here Not
Applicable, due to not being a failure at all or the equipment being in a degraded,
but still functioning state).
(b) Discuss some of the challenges you face when you do the classification. What
would you do to clarify missing information?
(c) Which types of failures seem to recur? How easy is it
to avoid such failures in operation?
Chapter 4
forcing input/output signals). Discuss the possible implications that these design
measures may have on the reliability in light of random hardware failures and
systematic failures.
– Partial test
– Full test
– Manual test
– Automatic test
– Imperfect test
– Online test
– Offline test
– Staggered testing
– Sequential testing
– Simultaneous testing
Chapter 5
Reliability Quantification
Problem 1. RBDs
Consider the system represented by the reliability block diagram in Figure 5.1.
– Explain what we mean by the concept minimal cut set in a reliability block
diagram (RBD).
Assume that the components of the system are independent with the following
function probabilities (reliabilities):
$p_1 = 0.90$, $p_2 = 0.95$, $p_3 = 0.85$, $p_4 = 0.90$, $p_5 = 0.80$.
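[Figure 5.1: reliability block diagram (not reproduced; only the block labels 1, 2, 4, and 5 survive)]
[Figure: second reliability block diagram (not reproduced; only the block labels 1, 2, 4, and 5 survive)]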
Problem 2. RBDs
A system has two minimal cut sets: $C_1 = \{1, 2, 3\}$ and $C_2 = \{1, 3, 4, 5\}$.
(a) Carry out the following:
– Draw the corresponding reliability block diagram.
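The link between minimal cut sets and system reliability can be sketched by brute-force enumeration: the system functions as long as every minimal cut set contains at least one functioning component. The cut sets below are those of Problem 2; the component reliabilities are an assumption made purely for illustration (they are borrowed from Problem 1, since Problem 2 as printed here gives none), and the function names are hypothetical.

```python
from itertools import product

CUT_SETS = [{1, 2, 3}, {1, 3, 4, 5}]
# Illustrative reliabilities only (assumed, not part of Problem 2).
P = {1: 0.90, 2: 0.95, 3: 0.85, 4: 0.90, 5: 0.80}


def system_works(state, cut_sets):
    """state maps component -> True (functioning) or False (failed)."""
    return all(any(state[c] for c in cut) for cut in cut_sets)


def system_reliability(p, cut_sets):
    """Exact reliability by summing over all 2^n states of independent components."""
    comps = sorted(p)
    total = 0.0
    for values in product([True, False], repeat=len(comps)):
        state = dict(zip(comps, values))
        if system_works(state, cut_sets):
            prob = 1.0
            for c in comps:
                prob *= p[c] if state[c] else 1.0 - p[c]
            total += prob
    return total


r_system = system_reliability(P, CUT_SETS)
print(r_system)
```

Enumeration is only practical for small systems, but it is a convenient check on a hand-drawn reliability block diagram: the structure derived from the diagram must give the same number.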
Problem 3. RBDs
Consider the system described by the reliability block diagram in Figure 5.2.
The six components are assumed to be independent with reliabilities $p_1 = 0.90$,
$p_2 = 0.95$, $p_3 = 0.85$, $p_4 = 0.80$, $p_5 = 0.95$, and $p_6 = 0.85$.
(a) Carry out the following:
– Identify the minimal cut sets of the system
– Explain in words what a minimal cut set is
a system stop, the mean time to bring the system back to operation is 48 hours,
irrespective of state of the system when it entered the idle state.
Record any additional assumptions you have to make to answer the questions
below.
(a) Define the relevant system states. Use as few states as possible.
(b) Draw the corresponding state transition diagram (Markov diagram).
(c) Establish the transition rate matrix A for the production system.
(d) Establish the Markov steady-state equations in matrix form.
(e) Explain (briefly) what we mean by the concept steady-state probability in this
case.
(f) Find the steady-state probability of the production system.
(g) Establish the Petri net model for this system.
(h) Identify markings with 100%, 50%, and 0% capacity respectively.
(i) Compare the pros and cons of using the Markov method and Petri nets for this
particular problem and in general.
– Find the mean time to failure, MTTF, of the gas detector (with respect to all
(both) failures).
(b) Assume that one of the two failure modes has occurred. Carry out the follow-
ing:
– What is the probability that this failure is a DU failure?
– Explain (briefly) how you determine this probability (or, develop the for-
mula).
(c) Assume that the production of the gas detectors is subject to variations. When
we buy a gas detector, it will have a constant DU failure rate $\lambda_{DU}$, but the failure
rate may vary from detector to detector. The variation may be described by a
gamma distribution with probability density function
$$f_\Lambda(\lambda_{DU}) = \frac{\beta^\alpha}{\Gamma(\alpha)} \, \lambda_{DU}^{\alpha - 1} e^{-\beta \lambda_{DU}} \quad \text{for } \lambda_{DU} > 0 \qquad (5.1)$$
The mean value of this distribution is $\alpha/\beta$ and the variance is $\alpha/\beta^2$. Based on
earlier experience, we assume that the mean value of the failure rate $\lambda_{DU}$ is
$1.6 \times 10^{-6}$ per hour, and that the standard deviation is $0.5 \times 10^{-6}$ per hour. Carry out
the following:
– Assume that we choose a gas detector at random from the production and
find the survivor function $R_{DU}(t)$ for this detector with respect to the DU
failure mode.
– Determine the corresponding failure rate function $z_{DU}(t)$ for the gas detector
and make a sketch of the function. Discuss the result.
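One way to approach part (c) numerically is sketched below. It uses the moment relations stated above (mean α/β, variance α/β²) to fix α and β, and the standard result that mixing an exponential survivor function over a gamma-distributed rate gives R(t) = E[exp(-Λt)] = (β/(β+t))^α, and hence z(t) = α/(β+t). The variable and function names are illustrative.

```python
mean = 1.6e-6   # mean of lambda_DU, per hour
sd = 0.5e-6     # standard deviation of lambda_DU, per hour

# Method of moments: mean = alpha/beta, variance = alpha/beta^2.
alpha = (mean / sd) ** 2      # dimensionless shape parameter
beta = mean / sd ** 2         # rate parameter of the gamma density, in hours


def survivor(t):
    """Unconditional survivor function R_DU(t) of a randomly chosen detector."""
    return (beta / (beta + t)) ** alpha


def failure_rate(t):
    """Failure rate function z_DU(t) = -d/dt ln R_DU(t)."""
    return alpha / (beta + t)


for t in (0.0, 8760.0, 10 * 8760.0):
    print(t, survivor(t), failure_rate(t))
```

The failure rate function starts at the mean value α/β and decreases with time, which is the point to discuss: detectors with a high individual failure rate tend to fail early, so the survivors are increasingly the "good" ones.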
Consider the 2oo3 system that is modeled in Figures 5.18 and 5.19 in the
textbook.
(a) Verify the formulas and the numerical results of Example 5.23 (which does
not include CCFs) and Example 5.24 (which does include CCFs).
(b) Determine the MTTF for both cases and compare the results.
Chapter 6
Table 6.1: Influencing factors
Influencing factor                      Weight   Score
Working principle                       0.1      1.0
Location                                0.2      1.5
Frequency of use                        0.2      0.9
Environmental exposure                  0.2      1.2
Frequency and quality of maintenance    0.3      1.2
A generic failure rate, as given in, e.g., the PDS data handbook, does not
necessarily capture plant-specific conditions. Brissaud et al. (2010) have suggested
an approach where the generic failure rate may be adjusted; see Section 6.5.2 in the
textbook. Assume that an analysis has been carried out and that the weights and
scores in Table 6.1 have been assigned to the most important influencing factors.
(a) Explain the meaning of weight and score in this model.
(b) Assume that you are considering a shutdown valve. Calculate the plant-specific
dangerous undetected (DU) failure rate $\lambda_P$ if the generic DU failure rate is
$\lambda_B = 1.9 \times 10^{-6}$ failures per hour.
(c) Compare the approach (at a high level) of this model with the approach used
in MIL-HDBK-217F. What are some of the differences?
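For part (b), a minimal sketch is given below. It assumes that the adjustment takes the form of a weighted sum of scores, i.e. that the plant-specific rate is the generic rate multiplied by the sum of weight times score over all influencing factors, with weights summing to one. This form should be checked against Section 6.5.2 of the textbook before it is relied on; the names in the code are illustrative.

```python
# (weight, score) per influencing factor, from Table 6.1.
FACTORS = {
    "Working principle": (0.1, 1.0),
    "Location": (0.2, 1.5),
    "Frequency of use": (0.2, 0.9),
    "Environmental exposure": (0.2, 1.2),
    "Frequency and quality of maintenance": (0.3, 1.2),
}

lambda_b = 1.9e-6  # generic DU failure rate, per hour

# Assumed model: lambda_P = lambda_B * sum(weight_i * score_i).
weight_sum = sum(w for w, _ in FACTORS.values())
multiplier = sum(w * s for w, s in FACTORS.values())
lambda_p = lambda_b * multiplier
print(weight_sum, multiplier, lambda_p)
```

A score of 1.0 leaves the generic rate unchanged for that factor, so a multiplier above one indicates that the plant conditions are judged to be worse than the average conditions behind the generic data.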
(a) What do we mean by a (reliability) data dossier and what type of information
is provided here?
(b) Study one specific reliability data dossier, for example the sample pages pro-
vided for the PDS data handbook at http://www.sintef.no/projectweb/
pds-main-page/pds-handbooks/pds-data-handbook/. Explain in more
detail the information provided and why this information is important in relation
to a reliability assessment.
Chapter 7
Problem 3. Safe failure fraction (SFF)
The safe failure fraction (SFF) is a disputed reliability parameter.
(a) Explain briefly what the SFF is and what it is used for.
(b) Assume that you want to purchase a valve. Would the SFF be different if the
valve is to be used to open on demand or close on demand? Explain your position.
(c) An SFF of 99% may be obtained for a component with a high dangerous failure
rate as well as for a component with a low dangerous failure rate. Why is this so?
Under what conditions would this statement apply?
(d) Assume that you have designed a component and that you have determined
the SFF to be 72%. However, you would like to initiate a reliability improvement
program to increase the SFF to 95%. What could you do and what would be the
consequences (pros/cons) of your approach?
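The point raised in part (c) can be made concrete with a short sketch based on the IEC 61508 definition, SFF = (λ_S + λ_DD) / (λ_S + λ_DD + λ_DU). The two components below are hypothetical; their failure rates were chosen only so that both give an SFF of 99%.

```python
def safe_failure_fraction(lambda_s, lambda_dd, lambda_du):
    """SFF = (safe + dangerous detected) / (all failures)."""
    total = lambda_s + lambda_dd + lambda_du
    return (lambda_s + lambda_dd) / total


# Hypothetical component A: low dangerous failure rate (1.0e-7 per hour).
sff_a = safe_failure_fraction(lambda_s=9.0e-7, lambda_dd=9.0e-8, lambda_du=1.0e-8)

# Hypothetical component B: dangerous failure rate 1000 times higher (1.0e-4 per hour).
sff_b = safe_failure_fraction(lambda_s=0.0, lambda_dd=9.9e-5, lambda_du=1.0e-6)

print(sff_a, sff_b)
```

Both components reach the same SFF even though component B has a DU failure rate that is 100 times higher, because the SFF is a ratio and carries no information about the absolute level of the failure rates.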
Chapter 8
(a) What are the underlying assumptions for using the average value of PFD as a
reliability measure?
(b) The PFD is the (average) probability of failure on demand. But the demand rate
is not a parameter of the formula for PFDavg. Why do you think “on demand” has
been added to the term?
(c) Mean fractional downtime (MFDT) is another term used with the same mean-
ing as PFD. Why may it be argued that MFDT is a better (or more prescriptive)
term than PFD?
– Markov analysis
– Petri nets
Mention some pros and cons related to each of these methods. Suggest some
criteria that would be useful when selecting which method to use.
(d) Indicate how you may include the contribution from DD failures for a single
element and a subsystem comprising two elements voted 1oo2.
(c) Comment on the principal difference between $\beta$ and $\beta_D$.
(a) Explain and discuss briefly the following terms used by the PDS method:
(b) Which terms are included in PFD, and which are included in CSU?
(c) Based on the previous question: What is the main difference between CSU
and PFD?
(d) Assume that the DU failure rate of the component is $\lambda_{DU} = 1 \times 10^{-6}$ per hour,
that the mean time to repair (MTR) is 8 hours, and that the test is carried out every
year (1 year is 8760 hours). Calculate the PFDavg using GRIF (search for “GRIF
workshop”; a trial version is available).
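A hand calculation is a useful cross-check on the tool output in part (d). The sketch below assumes a single (1oo1) component and uses the common approximation PFDavg ≈ λ_DU·(τ/2 + MTR), together with the exact average unavailability over a test interval when the repair time is neglected. It does not reproduce anything GRIF-specific.

```python
import math

lambda_du = 1e-6   # DU failure rate, per hour
tau = 8760.0       # proof-test interval, hours
mtr = 8.0          # mean time to repair, hours

# Approximation: average downtime per failure is tau/2 (undetected) plus the repair time.
pfd_approx = lambda_du * (tau / 2 + mtr)

# Exact average of 1 - exp(-lambda*t) over (0, tau), repair time neglected.
x = lambda_du * tau
pfd_exact_no_repair = 1.0 + math.expm1(-x) / x

print(pfd_approx, pfd_exact_no_repair)
```

The two numbers differ only in the third significant digit, which indicates how little is lost by the linear approximation when λ_DU·τ is small.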
(b) Now, assume that the four detectors are not independent, but that 10% of
all DU failures of a detector are common cause failures (CCFs), and assume that
CCFs can be modeled by a beta-factor model with $\beta = 0.10$.
– Determine the PFDavg of the 2oo4 voted group
– How much safer is a 2oo4 voted group compared with a 2oo3 voted group
when $\beta = 0.10$?
(d) How many test intervals may pass before the subsystem is found in a failed
state considering the situation with and without CCFs? Does the result seem rea-
sonable?
(e) What is the mean time to a failed state of the subsystem in the
two cases in 14(d)?
– Find the probability that the gas detector survives 6 months (in continuous
operation) without any of the two failure modes.
– Find the mean time to failure, MTTF, of the gas detector (with respect to all
(both) failures).
– Explain briefly why the assumption about independent failure modes may
be a bit doubtful in this case.
(b) The gas detector is therefore proof-tested at regular intervals of length
$\tau = 6$ months. The time required to test and repair a failed detector is so short
that it may be neglected. After a test/repair, the gas detector is assumed to be
as-good-as-new. Carry out the following:
– How many hours per year are we not “protected” by the gas detector – when
we assume that the gas detector should always be functioning?
(c) Assume now that we have four gas detectors of the same type. The four
detectors are connected to a logic solver with a 3-out-of-4 (3oo4) logic. The
gas detectors are tested at the same time every six months. Otherwise the same
assumptions as in point (c) apply. The logic solver is assumed to be so reliable
that its failure rate may be set to zero. In this question we assume that the four
detectors are independent. Carry out the following:
(d) Now, assume that the gas detectors are exposed to common cause failures that
can be modeled by a beta-factor model with $\beta = 0.08$. Carry out the following:
– Find the PFDavg of the 3oo4 voted group in this case. Specify the proportion
of the PFDavg that is caused by independent failures and the proportion
caused by common-cause failures.
(e) Establish a Markov diagram for the 3oo4 system (with common-cause fail-
ures). Define the states required, the relevant transitions between these states, and
include the transition rates. You may assume that no repair actions are carried out.
Explain briefly how this model can be used to determine the PFDavg of the system.
Table 8.1: Reliability data
Parameter                              Value
DU-failure rate                        $3.0 \times 10^{-5}$ per hour
DD-failure rate                        $5.0 \times 10^{-5}$ per hour
Test interval                          6 months
Mean repair time (DD and DU)           8 hours
$\beta_{DD}$ and $\beta_{DU}$          0.1
Remember to list any additional assumptions you have to make to answer the
questions below.
(a) Calculate the PFD of the pumps using the IEC 61508 formula, the PDS formula,
and a Markov model.
In a fire situation, the pumps need to run for a period of time to successfully
put out the fire. If the pumps stop in this period, the fire fighting is not successful.
This period of time is not accounted for in the PFD calculation. During fire fighting,
the pumps are normally under much higher stress than when they are idle, so the
failure rate is higher. Assume that the pumps need to run for 8 hours to put out a
fire and that a running pump is 10 times as likely to fail as an idle pump.
(b) What is the probability of unsuccessful fire fighting when we know that the
pump group has started?
(c) Unsuccessful fire fighting is a critical event. Assume that fires break out
once every second year. What is the average frequency of critical events? Discuss,
and preferably show, how this measure may be calculated using, e.g., an analytical
approach and a Markov model.
[Figure: SIS for a pipeline, with pressure transmitters voted 2oo4, a SIS logic solver, and two shutdown valves ESDV1 and ESDV2 (figure not reproduced)]
proof-testing. The whole system is proof-tested at the same time at regular
intervals, with test interval $\tau = 1$ year.
(a) Carry out the following:
– Establish a reliability block diagram of the whole system with respect to the
system’s main function as a safety barrier.
– List the minimal cut sets of the system.
The two valves, ESDV1 and ESDV2, have two main failure modes: dangerous
undetected (DU) failures and safe (S) failures. The failure rate with respect to
DU failures is $\lambda_{DU,V} = 2.5 \times 10^{-6}$ per hour, and the failure rate with respect to S
failures is $\lambda_{S,V} = 3.0 \times 10^{-6}$ per hour. To act as a safety barrier, it is sufficient that
one of the valves is functioning.
(b) Carry out the following:
– Find the mean time to a DU-failure for one of the valves
– Determine the probability that both valves survive a test interval without
any failures.
– Consider one single valve, and find the probability that an S failure occurs
before a DU failure.
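The three valve questions in part (b) reduce to standard results for independent, exponentially distributed times to failure, which can be sketched as follows. The sketch assumes constant failure rates, independent valves and failure modes, a one-year test interval of 8760 hours, and that "without any failures" means neither DU nor S failures on either valve.

```python
import math

lambda_du = 2.5e-6   # DU failure rate of one valve, per hour
lambda_s = 3.0e-6    # S failure rate of one valve, per hour
tau = 8760.0         # test interval, hours

# Mean time to a DU failure for one valve (exponential distribution).
mttf_du = 1.0 / lambda_du

# Both valves survive the interval with no failure of either mode:
# the total rate of "any failure on any valve" is 2 * (lambda_du + lambda_s).
p_both_survive = math.exp(-2 * (lambda_du + lambda_s) * tau)

# Competing risks: probability that the S failure comes first on a single valve.
p_s_first = lambda_s / (lambda_du + lambda_s)

print(mttf_du, p_both_survive, p_s_first)
```

The competing-risk result does not depend on time, only on the ratio of the two rates.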
A pressure transmitter has failure rate $\lambda_{DU,PT} = 3.0 \times 10^{-7}$ per hour with
respect to DU failures and failure rate $\lambda_{S,PT} = 5.0 \times 10^{-6}$ per hour with respect to
S failures.
(c) Carry out the following:
– Explain (briefly) what we mean by a DU failure and an S failure for a pres-
sure transmitter.
– Calculate the probability that the 2oo4 voted group of pressure transmitters
survives a test interval (1 year) without any DU failures – when you assume
that all items are independent.
– Calculate the PFDavg for the 2oo4 voted group (when you assume that the
pressure transmitters are independent – and when you assume that the time
required to test and repair the transmitters is negligible).
– List and explain the assumptions you make in order to calculate PFDavg .
The logic solver (LS) has failure rate $\lambda_{DU,LS} = 7.0 \times 10^{-7}$ per hour with respect
to DU failures and failure rate $\lambda_{S,LS} = 1.0 \times 10^{-6}$ per hour with respect to S failures.
(d) Carry out the following:
– Calculate the PFDavg of the whole system when you assume that all the
items are independent.
– List the assumptions you make to calculate this PFD, and explain (briefly)
what we mean by this PFD.
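A rough numerical check of part (d) is sketched below. It assumes independent and identical channels within each voted group, negligible test and repair times, and the usual simplified approximation PFDavg(koon) ≈ C(n, n-k+1)·(λ_DU·τ)^(n-k+1) / (n-k+2), and it adds the subsystem values as a rare-event approximation for the series structure. The function name is illustrative.

```python
from math import comb


def pfd_koon(k, n, lambda_du, tau):
    """Approximate PFDavg of a koon group of independent, identical channels."""
    m = n - k + 1  # number of channel failures that fails the group
    return comb(n, m) * (lambda_du * tau) ** m / (m + 1)


TAU = 8760.0  # test interval of 1 year, in hours

pfd_pt = pfd_koon(2, 4, 3.0e-7, TAU)      # pressure transmitters, 2oo4
pfd_ls = pfd_koon(1, 1, 7.0e-7, TAU)      # logic solver, 1oo1
pfd_valves = pfd_koon(1, 2, 2.5e-6, TAU)  # shutdown valves, 1oo2

# Series structure: add the subsystem contributions.
pfd_system = pfd_pt + pfd_ls + pfd_valves
print(pfd_pt, pfd_ls, pfd_valves, pfd_system)
```

With these data the single logic solver dominates the result, which is worth noting when listing the assumptions behind the calculation.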
When a (single) signal about high pressure from a pressure transmitter is re-
ceived by the logic solver, the control room is alarmed and a repair-man is sent to
check and fix the problem. When the signal is “false” (safe), the repair-man needs
around 2 hours to repair the problem.
(e) Carry out the following:
– Calculate the total frequency of S failures from the SIS (that give production
shutdown).
– How many production shutdowns caused by S failures from the SIS must
we expect during a period of 10 years?
Assume now that the pressure transmitters are not independent, but that they
are exposed to common-cause failures that can be modeled by a beta-factor model.
Assume that the $\beta$-factor with respect to DU-failures is $\beta_{DU,PT} = 0.10$ while the
$\beta$-factor with respect to S-failures is $\beta_{S,PT} = 0.25$. The two shutdown valves and
the logic solver are still assumed to be independent.
(f) Carry out the following:
– How many production shutdowns caused by S failures from the SIS must
we now expect during a period of 10 years?
Remark: Some of these questions require that Chapters 9-12 have also been
covered.
[Figure 8.2: SIS with flow and pressure transmitters voted 2oo3 (figure not reproduced)]
(a) Set up a reliability block diagram of the whole system with respect to the
system’s main function as a safety barrier.
(b) Explain briefly why a 2oo3 configuration of transmitters may have been cho-
sen for this particular SIS.
For the following analyses, we consider two failure modes: dangerous undetected
(DU) failures and safe (S) failures. The times required for periodic proof testing
and the possible repair after a failure has been detected are first considered to be
negligible.
The failure rates for the various components are listed in Table 8.2.
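Table 8.2: Failure rates for the SIS components in Figure 8.2.
Component              DU-failure rate (per hour)                 Safe failure rate (per hour)
Flow transmitter       $\lambda_{DU,FT} = 6.0 \times 10^{-7}$     $\lambda_{S,FT} = 1.1 \times 10^{-6}$
Pressure transmitter   $\lambda_{DU,PT} = 3.0 \times 10^{-7}$     $\lambda_{S,PT} = 4.5 \times 10^{-7}$
Logic solver           $\lambda_{DU,LS} = 1.0 \times 10^{-8}$     $\lambda_{S,LS} = 5.0 \times 10^{-8}$
Shutdown valve         $\lambda_{DU,V} = 2.1 \times 10^{-6}$      $\lambda_{S,V} = 2.3 \times 10^{-6}$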
(c) Find the probability that the whole system survives a test interval without any
failures at all.
It is first assumed that all components are independent.
A consultant claims that the PFD of the system can be determined by the upper
bound approximation formula.
(d) Use fault tree analysis along with the upper bound approximation formula to
find the PFD of the system.
(e) Discuss (briefly) the accuracy of the result you obtain.
Another consultant claims that it would be better to first find the PFD of each
of the 2oo3 transmitter subsystems by using approximation formulas and then
combine these to find the system PFD.
(f) Perform this calculation. Which of the two approaches would you prefer? Will
the last approach give a more correct result?
(j) Find the frequency of shutdowns caused by S-failures in the SIS-system.
How many production shutdowns caused by S-failures from the SIS must we now
expect during a period of 10 years?
Another consultant suggests using partial stroke testing (PST) of the valves
instead of staggered testing. A 60% coverage for the partial stroke test is assumed
for each valve.
(l) Calculate the PFD of the valve group when PST is conducted every month.
(m) Discuss briefly the pros and cons of staggered testing and partial stroke testing,
and state which testing technique you prefer.
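The effect of partial stroke testing can be sketched for a single valve as below. This is not the full answer to part (l), which concerns the valve group. The sketch assumes that a fraction θ (the PST coverage) of the DU failures is revealed by the partial stroke test, with the remainder revealed only by the full proof test, giving PFDavg ≈ θ·λ_DU·τ_PST/2 + (1-θ)·λ_DU·τ/2. The proof-test interval of one year is an assumption, since it is not stated in this part of the problem.

```python
def pfd_with_pst(lambda_du, coverage, tau_pst, tau_proof):
    """Approximate PFDavg of one valve with partial stroke testing."""
    revealed_by_pst = coverage * lambda_du * tau_pst / 2
    revealed_by_proof_test = (1 - coverage) * lambda_du * tau_proof / 2
    return revealed_by_pst + revealed_by_proof_test


lambda_du_valve = 2.1e-6   # shutdown valve, Table 8.2, per hour
tau_pst = 8760.0 / 12      # partial stroke test every month, in hours
tau_proof = 8760.0         # assumed proof-test interval of 1 year, in hours

pfd_pst = pfd_with_pst(lambda_du_valve, 0.60, tau_pst, tau_proof)
pfd_no_pst = lambda_du_valve * tau_proof / 2
print(pfd_pst, pfd_no_pst)
```

Even with monthly tests, the 40% of failures not covered by the partial stroke test dominate the result, so the gain is bounded by the coverage.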
Chapter 9
Table 9.1: Failure rates for the PSD system components in Figure 9.1.
Component                   DU-failure rate (per hour)                 $\beta$
Pressure transmitter (PT)   $\lambda_{DU,PT} = 3.0 \times 10^{-7}$     5%
Logic solver (LS)           $\lambda_{DU,LS} = 1.0 \times 10^{-7}$     -
Shutdown valve (XV)         $\lambda_{DU,V} = 2.1 \times 10^{-6}$      10%
tion to PFD formulas. What is the main difference between DGF in the high-
demand/continuous demand mode and the low-demand mode? Include one ex-
ample.
(b) Under what assumptions can we disregard the contributions from DD failures?
(c) Explain how to set up the formula for a single element and for a subsystem
of two elements voted 1oo2. Explain briefly how DU failures and DD failures,
including CCFs, are incorporated into the formula.
Most systems in the process industry are designed such that demands for a
process shutdown are rather infrequent (much less than once per year). Most
process shutdown (PSD) systems are therefore operated in the low-demand mode,
and their reliability is quantified by the PFD. Due to more extensive use of
automatic trips, PSD systems on offshore oil and gas installations may be
demanded more often than once per year (up to once per month). In such a
situation, it may be reasonable to calculate the PFH rather than the PFDavg.
Consider a PSD function that shall close one shutdown valve (XV) when the
pipeline pressure exceeds a specified setpoint, as shown in Fig. 9.1. The pressure
transmitters are voted 1oo2. All the components are proof tested at the same time
with an interval of 12 months. The failure data of the components are given in
Table 9.1, and for simplicity we include the contribution from DU failures only.
(a) Set up a reliability block diagram for the PSD function.
Figure 9.1: Process shutdown (PSD) function
(b) Calculate the PFH using both the simplified formulas presented in the SIS
book and the IEC 61508 formulas. Since no information is provided about DD
failures, we omit these failures from the calculations.
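As a starting point for checking hand calculations, the sketch below evaluates one commonly used simplified approximation for the PFH with the data in Table 9.1. It assumes that a 12-month test interval corresponds to 8760 hours, that the PFH of a single element is approximately its DU failure rate, and that the PFH of a 1oo2 group is approximately ((1 - beta) * lambda_DU)^2 * tau + beta * lambda_DU. The function names are illustrative only, and the IEC 61508 formulas are not reproduced here.

```python
# Minimal sketch of a simplified PFH calculation (DU failures only).
# Assumptions: 12 months = 8760 hours; the approximation formulas below
# are the commonly used simplified ones, not the IEC 61508 formulas.

HOURS_PER_YEAR = 8760.0


def pfh_1oo1(lambda_du):
    """Approximate PFH of a single element: its DU failure rate."""
    return lambda_du


def pfh_1oo2(lambda_du, beta, tau):
    """Approximate PFH of a 1oo2 group with a beta-factor CCF model."""
    independent = ((1.0 - beta) * lambda_du) ** 2 * tau
    common_cause = beta * lambda_du
    return independent + common_cause


tau = HOURS_PER_YEAR  # proof-test interval in hours

pfh_pt = pfh_1oo2(lambda_du=3.0e-7, beta=0.05, tau=tau)  # transmitters, 1oo2
pfh_ls = pfh_1oo1(lambda_du=1.0e-7)                      # logic solver
pfh_xv = pfh_1oo1(lambda_du=2.1e-6)                      # shutdown valve

# Series structure: the subsystem contributions are added.
pfh_total = pfh_pt + pfh_ls + pfh_xv

print(f"PFH transmitters: {pfh_pt:.3e} per hour")
print(f"PFH logic solver: {pfh_ls:.3e} per hour")
print(f"PFH valve:        {pfh_xv:.3e} per hour")
print(f"PFH total:        {pfh_total:.3e} per hour")
```

With these inputs the single shutdown valve dominates the total, which is worth keeping in mind when comparing against the IEC 61508 result.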
(c) Assume that demands occur some months apart (but still more often than
once per year, on average). Is it reasonable to also use PFDavg for this
function, even though it operates in the high-demand mode?
Figure 9.2: Pressure protection systems for a pipeline section.
(f) Calculate the PFD for the HIPPS system using both the simplified formulas
presented in the book and the IEC 61508 formulas. Since no information is
provided about DD failures, we omit these failures from the calculations.
(g) Assume that a tolerable frequency of overpressuring the pipeline is $1 \times 10^{-5}$
per year. Will the PSD system and the HIPPS system provide the necessary risk
reduction? What could you do if they don’t?
Consider the Markov model in Figure 9.5 in the textbook, which is also shown in
Figure 9.3.
The states are shown in Table 6:
(a) Add the transitions needed to prepare the model for calculating steady-state
[Figure: Markov transition diagram with the states 2OK, 1OK 1DD, 1OK 1DU, 2DD, 2DU, and Safe state. Transition rates shown: 2(1-β)λDU, 2(1-β)λDD, βλDU, βλDD, λDU, λDD, μDD, μSS, and μS0.]
Figure 9.3: Markov model for a 1oo2 system exposed to DU and DD failures
Chapter 10
Common-Cause Failures
(b) In some cases, it may be argued that the C-factor model is more realistic than
the beta-factor model. Why is this the case?
(b) Reflect on the way CCFs are included here compared with the standard
beta-factor model and the PDS CCF model.
are considered).
(c) Establish a Markov model for possible transitions within one test period, and
find the PFDavg when the test interval is 6 months.
Hint: After a DU failure, there will be only two channels left, and n in the
binomial distribution is reduced to two.
(d) When using the beta-factor model, the effect of adding more redundancy is very
small. What would we gain by introducing four channels, and voting them 2oo4,
when using the above shock model? (You may use approximation formulas.)
Discuss what the result will be when $p \to 1$.
Chapter 11
Imperfect Proof-Testing
Partial stroke testing of a valve is a partial proof test designed to operate the
valve partially at regular intervals. This means that the valve is moved, e.g., 20%
of its full stroke before being returned to its initial (open) position. Since the partial
operation does not cause any significant disturbances in the process, it is possible
to carry out this type of testing more often than a full proof test. Partial stroke
testing of HIPPS valves was introduced for the HIPPS system subsea at the Kristin
field (see the case study for more details).
[Figure, first panel: manually activated PST through the SIS logic solver, showing the solenoid, pilot valve, actuator, pump, tank, pressure transmitters (PT), signal to the process control system, pipeline, and shutdown valve.]
[Figure, second panel: manually or automatically activated PST through a vendor PST package, showing the SIS logic solver, actuator, pressure transmitters (PT), signal to the process control system, pipeline, and shutdown valve.]
1. PST activated via SIS/HIPPS logic solver: A timer starts counting when the
power to the solenoid is removed, and the solenoid is repowered when the
timer has reached its setting.
(a) Why do you think partial stroke testing may be a desired option?
(b) What are the pros and cons of each of these options?
Failure mode                        Revealed by PST?
Fail to close (FTC)
Delayed operation (DOP)
Leakage in closed position (LCP)
Premature closure (PC)
Fail to open (FTO)
Leakage to environment (LTE)
(c) Assume that the following failure modes are applicable for the valve, see
Figure 11.2. Which ones of these may be revealed by partial stroke testing?
Assume that you have been given the following data for a shutdown valve, and
you are asked to use this as input for determining the partial stroke test coverage.
(a) What do we mean by partial stroke test coverage, and what are the factors
influencing its value?
(b) What is the partial stroke test coverage, $\theta_{PST}$, using the data in Table 11.1?
(c) Assume that the DU failure rate is $8 \times 10^{-6}$ failures per hour. What is the
DU failure rate revealed by the full proof test, $\lambda_{DU,FT}$, and what is the DU failure
rate revealed by partial stroke testing, $\lambda_{DU,PST}$?
Problem 4. Assume now that you have calculated the PST coverage factor and
the DU failure rates from Problems 3(b) and 3(c).
(a) Calculate the effects of introducing partial stroke testing compared to not
using partial stroke testing (give also the percentage reduction). You may assume
that the partial stroke testing is carried out every month (every 730 hours), and the
full proof test every year (8760 hours).
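The comparison can be sketched numerically as follows. The sketch assumes a single valve, the usual approximation PFDavg = lambda_DU * tau / 2 for each part of the failure rate, and a PST coverage of 60%, which is a purely hypothetical placeholder and not the value that follows from Table 11.1. The DU failure rate is the one stated in Problem 3(c), and the function names are illustrative.

```python
# Minimal sketch: effect of partial stroke testing (PST) on PFDavg of a
# single valve. The PST coverage below is a HYPOTHETICAL placeholder;
# replace it with the coverage obtained from the data in the problem.


def pfd_avg_no_pst(lambda_du, tau_ft):
    """PFDavg when all DU failures are revealed by the full proof test."""
    return lambda_du * tau_ft / 2.0


def pfd_avg_with_pst(lambda_du, coverage, tau_ft, tau_pst):
    """PFDavg when a fraction `coverage` of DU failures is revealed by PST."""
    revealed_by_pst = coverage * lambda_du * tau_pst / 2.0
    revealed_by_ft_only = (1.0 - coverage) * lambda_du * tau_ft / 2.0
    return revealed_by_pst + revealed_by_ft_only


lambda_du = 8.0e-6   # DU failure rate per hour, from Problem 3(c)
coverage = 0.6       # hypothetical PST coverage, NOT taken from Table 11.1
tau_ft = 8760.0      # full proof test interval in hours
tau_pst = 730.0      # PST interval in hours

pfd_without = pfd_avg_no_pst(lambda_du, tau_ft)
pfd_with = pfd_avg_with_pst(lambda_du, coverage, tau_ft, tau_pst)
reduction_pct = 100.0 * (pfd_without - pfd_with) / pfd_without

print(f"PFDavg without PST: {pfd_without:.4e}")
print(f"PFDavg with PST:    {pfd_with:.4e}")
print(f"Reduction:          {reduction_pct:.1f} %")
```

Under these assumptions the relative reduction equals the coverage multiplied by (1 - tau_pst / tau_ft), so it depends only on the coverage and the ratio of the two test intervals.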
Problem 5. We often say that partial stroke testing can be used for either
improving safety or reducing costs.
(a) How would you explain this statement on the basis of the formula for PFDavg?
Assume now that the PFD requirement (and thereby the SIL requirement) was
developed under the assumption that the HIPPS function is subject to a full proof
test every 6 months, with no PST implemented (you may now calculate the PFDavg
using this assumption). Consider the simplified HIPPS architecture with 1oo2
voted pressure transmitters, a single logic solver, a single solenoid valve, and a
single shutdown valve. The DU failure rate of the pressure transmitters is
$5 \times 10^{-6}$ per hour, the DU failure rate of the logic solver is
$1 \times 10^{-7}$ per hour, and the DU failure rate of the solenoid valve is
$4 \times 10^{-6}$ per hour. The beta factor for the pressure transmitters is
assumed to be 5%.
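The baseline PFDavg without PST can be sketched as below. The sketch assumes that 6 months corresponds to 4380 hours, uses the common approximations lambda_DU * tau / 2 for a single element and ((1 - beta) * lambda_DU * tau)^2 / 3 + beta * lambda_DU * tau / 2 for a 1oo2 group, and adds the subsystem contributions. The paragraph above does not restate the DU failure rate of the shutdown valve, so the value from Problem 3(c) is reused here as an assumption.

```python
# Minimal sketch: baseline PFDavg of the simplified HIPPS architecture,
# full proof test every 6 months and no PST.
# Assumptions: 6 months = 4380 hours; the valve DU rate is taken from
# Problem 3(c) because it is not restated in this problem.


def pfd_1oo1(lambda_du, tau):
    """Approximate PFDavg of a single element."""
    return lambda_du * tau / 2.0


def pfd_1oo2(lambda_du, beta, tau):
    """Approximate PFDavg of a 1oo2 group with a beta-factor CCF model."""
    independent = ((1.0 - beta) * lambda_du * tau) ** 2 / 3.0
    common_cause = beta * lambda_du * tau / 2.0
    return independent + common_cause


tau = 4380.0  # proof-test interval in hours (assumed 6 months)

pfd_pt = pfd_1oo2(lambda_du=5.0e-6, beta=0.05, tau=tau)  # transmitters
pfd_ls = pfd_1oo1(lambda_du=1.0e-7, tau=tau)             # logic solver
pfd_sov = pfd_1oo1(lambda_du=4.0e-6, tau=tau)            # solenoid valve
pfd_xv = pfd_1oo1(lambda_du=8.0e-6, tau=tau)             # valve (assumed rate)

pfd_total = pfd_pt + pfd_ls + pfd_sov + pfd_xv

for name, value in [("transmitters", pfd_pt), ("logic solver", pfd_ls),
                    ("solenoid valve", pfd_sov), ("shutdown valve", pfd_xv),
                    ("total", pfd_total)]:
    print(f"PFDavg {name:15s}: {value:.3e}")
```

The total obtained this way can serve as the required PFDavg when the proof-test interval is later extended with PST in place.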
(b) Consider the two alternative implementations of PST introduced in Problem
2. How would you calculate the PFDavg for these two SIFs when including the
effects of PST? You may use the PST coverage factor for the valve as calculated
in Problem 3(b) for both options, but you may want to make other assumptions
about the PST coverage factor of the solenoid valve.
(c) How much can you extend the full proof test interval when PST is added (consider,
e.g., option 1) without compromising the required PFDavg? Would you recommend
this new interval? We assume that PST is carried out every month.
(d) Assume now that you consider two valves voted 1oo2 (rather than one valve
voted 1oo1). How would you calculate the PFDavg in this case (consider the
valves only, and not the rest of the SIF)?
Chapter 12
Spurious Activation
– Spurious operation
– Spurious trip
– Spurious shutdown
What is the difference between these terms, and why may it be important to
distinguish between them?
(c) What do we mean by the spurious trip rate (STR), and what failure rates may
be included in the calculation of this measure?
(d) Give one example of how the STR formula is set up for a subsystem
comprising three elements, voted, e.g., 1oo3. Explain the different types
of contributions in each case.
(e) Assume that you have identified a value for $\beta$ for, e.g., pressure transmitters.
Would you use this value for safe as well as for dangerous failures? Why or why
not?
Consider a HIPPS system comprising four pressure transmitters voted 2oo4,
one logic solver, and two shutdown valves voted 1oo2 located subsea.
(a) Consider the pressure transmitters and discuss the interpretation of the terms
spurious operation, spurious trip, and spurious shutdown in relation to these.
(b) It is suggested that both DD and spurious operation (SO)/safe failures may
result in spurious trips. Why do you think that DD failures are considered?
(c) What is the hardware fault tolerance (HFT) of the 2oo4 system with respect
to spurious trips (hint: A 2oo4 system means that 2-out-of-4 elements must carry
out the function in order for the SIF to be carried out. In relation to spurious
trips, the function is “to avoid spurious trips”, so the question should be: How
many spurious operation failures are tolerated without getting a spurious trip of
the SIF?) What is the HFT for a general koon system with respect to spurious
trips?
(d) Common-cause failures may also be an issue with spurious trips, and we may
introduce $\beta_S$ for this purpose (considering spurious operation failures only).
Why is it reasonable to assume that $\beta_S$ may be different from $\beta$ (for DU failures)
and $\beta_D$ (for DD failures)? Give some examples, using either pressure transmitters
or shutdown valves.
Assume that the SO failure rates for the pressure transmitters, logic solver, and valves are
$1 \times 10^{-6}$, $1 \times 10^{-7}$, and $1 \times 10^{-5}$, respectively. $\beta_S$ is set to 5% for all components.
Assume further that the downtime of a channel after an SO failure is 6 hours. For
the missing values, use input data from Table 7.2 in the textbook.
(e) Calculate the total spurious trip rate for the HIPPS system by considering SO
and DD failures only, excluding the contribution from false demands. Indicate
the percentage contribution from the independent part and the CCF part for each
subsystem.
(f) Calculate the probability of having exactly one spurious failure during a period
of 5 years.
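One way to set up this calculation is sketched below, under the assumption that spurious trips occur according to a homogeneous Poisson process with a constant rate equal to the spurious trip rate. The rate used in the sketch is a hypothetical placeholder and not the answer to part (e), and the function name is illustrative.

```python
# Minimal sketch: probability of exactly n spurious trips in a period,
# assuming a homogeneous Poisson process with a constant spurious trip rate.
import math


def prob_exactly_n_trips(rate_per_hour, hours, n):
    """Poisson probability of exactly n events in the given period."""
    mean = rate_per_hour * hours
    return mean ** n * math.exp(-mean) / math.factorial(n)


str_per_hour = 2.0e-5        # HYPOTHETICAL spurious trip rate per hour
period_hours = 5 * 8760.0    # 5 years, assuming 8760 hours per year

p_one = prob_exactly_n_trips(str_per_hour, period_hours, n=1)
p_none = prob_exactly_n_trips(str_per_hour, period_hours, n=0)

print(f"Expected number of trips: {str_per_hour * period_hours:.3f}")
print(f"P(exactly one trip):      {p_one:.4f}")
print(f"P(no trips):              {p_none:.4f}")
```

The same function gives the probability of no trips, or of any other count, by changing n.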
(g) How many spurious trips due to spurious operation of the valves will you,
on average, experience in a period of 5 years if $\lambda_{SO} = 1 \times 10^{-5}$? Would
you find this result satisfactory? If you don’t, what could you recommend to the
engineering department? Some control questions to guide your discussion:
Would you recommend that a 2oo2 configuration was chosen instead, to
reduce the contribution from spurious trips?
Problem 4. Using Markov to calculate STR
Calculate the STR for the Markov transition diagram shown in Figure 12.1.
Use input data from Table 7.2 in the textbook for the values not given in Problem 1.
Chapter 13
Uncertainty Assessment
Nothing yet.