Unit 2 Software Reliability (SR) PDF
A software failure is a software functional imperfection resulting from the occurrence(s) of defects or faults.
A software fault or bug is an unintentional software condition that causes a functional unit to fail to perform its
required function due to some error made by the software designers.
An error is a mistake that unintentionally deviates from what is correct, right, or true; a human action that
results in the creation of a defect or fault.
Types of time in Reliability
• Execution time: The actual processor time spent on a run for a given input state, i.e., the time spent running
a test case for a specified run of a system function.
• Calendar time: This type of time component is related to the number of days, weeks, months, or years the
processor spends on running a system function.
• Clock time: This time component is the elapsed time from the start of a computer run to its termination.
Note that on a parallel processor the waiting time and the execution time of other programs are included in this
component.
Note that system downtime is not included in either the execution time or the clock time component. Most software
reliability models are based on the calendar time component, since actual failure data sets are often recorded
against calendar time. Nowadays, however, the execution time component is preferred in many cases, as most
researchers accept that it gives better results. Even then, the execution time component needs to be related to
calendar time, since calendar time is more meaningful to software engineers and developers.
Reliability estimation and Reliability prediction
Reliability assessment typically involves two basic activities—reliability estimation and reliability prediction.
Estimation activity is usually retrospective and determines achieved reliability from a point in the past to
present using the failure data obtained during system test or operation. The prediction activity usually
involves future reliability forecast based on available software metrics and measures. Depending on the
software development stage this activity involves either early reliability prediction using characteristics of
the software and software development process (case when failure data are unavailable) or
parameterization of the reliability model used for estimation and utilizes this information for reliability
prediction (case when failure data are available). In either activity, reliability models are applied to the
collected information and, using statistical inference techniques, reliability assessment is carried out.
Software Versus Hardware Reliability
Need for Software Reliability
The reliability of many modern applications depends on the reliability of the software systems in them.
Consumer durables, such as toys, microwave ovens, mobile phones, television receivers, etc., contain software, and therefore
correct performance of both the hardware and the software is essential to use them.
The reliability of software systems in safety-critical applications, such as nuclear power plants, space shuttles, etc. is quite
critical since the loss of mission can lead to catastrophic results.
Failure of consumer durables or PC systems can be annoying as well as affect productivity.
The reliability of the software system should therefore be adequate and commensurate with the needs of the user and
hence it is important to measure and quantify it.
• The reliability of hardware is improving due to maturity in design and manufacturing processes.
• Reliability of software is a major area of concern since our discipline is still nascent.
• Software-driven outages exceed hardware outages by a factor of 10.
• The ability to deliver reliable computer hardware can be considered a given.
• Software reliability engineering is a major area of concern and focus in the 21st century.
• According to Musa, “Software reliability is the most important and the most measurable aspect of software quality, and it is
very much customer-oriented.”
• Increasing global competition and high development costs have made it necessary to quantify software quality in terms of
the reliability achieved and to measure and control the level of quality delivered.
Model Classification
• The software reliability models are proposed based on a set of assumptions and hypotheses about
the failure pattern. Musa and Okumoto categorized the software reliability models based on the
following:
i. Time domain—wall clock versus execution time
ii. Type—depending on the probability distribution of the number of failures experienced by time
t.
Goel (1985) gave another type of classification of software reliability models which are as follows:
i. Time between (consecutive) failures models,
ii. Failure (fault count) models,
iii. Error seeding models, and
iv. Input domain based models.
Architecture-Based Models These models put emphasis on the architecture of the software and derive reliability estimates by combining
estimates obtained for the different modules of the software. The architecture-based software reliability models are further classified into
State-based models; Path-based models; and Additive models.
Software Reliability Growth Models These models capture the failure behaviour of software during testing and extrapolate it to
determine its behaviour during operation, using failure data and observed trends to derive reliability predictions. The SRGMs are
further classified into concave models and S-shaped models.
Input Domain-Based Models These models use properties of the input domain of the software to derive a correctness probability estimate
from test cases that executed properly.
Hybrid Black Box Models These models combine the features of input domain-based models and SRGM.
Hybrid White Box Models These models use selected features of both white box and black box models. However, since they
consider the architecture of the system for reliability prediction, they are classified as hybrid white box models.
The early prediction and architecture-based models are together known as white box models, which regard software as consisting of
multiple components, while software reliability growth models and input domain-based models are together known as black box models,
which regard software as a single unit. Black box models have been studied widely by many eminent researchers and engineers.
Popstojanova and Trivedi [12] classified black box models as failure rate models, failure count models, static models, Bayesian models, and
Markov models. Most of the research work in software reliability modeling is done on failure count models, Bayesian models, and Markov
models.
SRGM
• Software Reliability Growth Models (SRGMs) are meant to be used during the
software testing phase.
• They can be used during the operational phase also if faults causing failures are
removed immediately.
• The SRGMs are probabilistic in nature. Therefore, it may be important to
understand the underlying statistical principles governing SRGMs.
• In subsequent slides, we’ll understand the underlying statistical principles and
thereafter study some of the simple SRGMs proposed in the early days of software
reliability engineering.
Note that o(h) denotes a quantity of an order of magnitude smaller than h, i.e., one that tends to zero faster than h:
lim(h→0) o(h)/h = 0.
• Such a counting process is called the Non-Homogeneous Poisson Process (NHPP) with
intensity function λ(t).
• Thus, the hazard rate will decrease by a constant for each fault corrected.
• The publication of this model gave software reliability engineers a new direction: taking the faults already
corrected into account to determine the hazard rate at any given time.
• The model also showed that the hazard rate decreases with fault correction and thereby the reliability grows.
Therefore, it is possible to improve the reliability of software by testing it and removing the faults that cause
failures.
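A minimal sketch of this constant-decrement idea (the Jelinski-Moranda form mentioned later in these notes): the hazard rate drops by a constant amount φ each time a fault is corrected. The values of N and φ below are assumed for illustration only.

```python
# Hazard rate between the (i-1)th and i-th failure: phi * (N - i + 1),
# i.e., each fault correction lowers the hazard by the constant phi.

def jm_hazard(i, N=50, phi=0.01):
    """Hazard rate after (i-1) faults have been corrected."""
    return phi * (N - i + 1)

if __name__ == "__main__":
    for i in (1, 2, 10, 50):
        print(f"after {i - 1} corrections, hazard = {jm_hazard(i):.3f} per unit time")
```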
Musa and Ackerman (1989) assert that software failure occurrence is a Poisson process and
failure intensity should decrease as testing proceeds due to detection and correction of
faults that cause failures.
• This model assumes that the faults are equally likely to cause failures. The model is
derivable from the G-O model, but is based on execution time.
vii. There are also infinite failure models where the above assumption is not valid.
• It may be noted that the failure rate, hereinafter called the failure intensity
function or Rate of Occurrence of Failures (ROCOF), λ(t), is the rate of change of the mean
value function, i.e., the number of failures per unit time. It is therefore the derivative of the
mean value function with respect to time and is an instantaneous value. That is, λ(t) = dμ(t)/dt.
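A short sketch of the relation λ(t) = dμ(t)/dt, checked numerically against the closed form. The Goel-Okumoto mean value function and the parameters a and b are assumed illustrative choices.

```python
import math

A, B = 150.0, 0.02  # assumed illustrative parameters

def mu(t):
    """Mean value function mu(t) = a*(1 - exp(-b*t))."""
    return A * (1.0 - math.exp(-B * t))

def lam_closed_form(t):
    """Failure intensity lambda(t) = d mu / dt = a*b*exp(-b*t)."""
    return A * B * math.exp(-B * t)

def lam_numeric(t, h=1e-5):
    """Central-difference approximation of d mu / dt."""
    return (mu(t + h) - mu(t - h)) / (2 * h)

if __name__ == "__main__":
    t = 30.0
    print(lam_closed_form(t), lam_numeric(t))  # the two values should agree
```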
• Note that i is a discrete number starting from 1. It can represent test case number or day 1,2...
or week 1,2,…
• The mean value function of the process, μ(i), is the expected number of defects detected by time
index i.
• Almost all failures have occurred by 3600 s when b = 0.002/s, so the failure intensity at 3600 s is small.
When b is high, fault detection is fast because more failures occur early, and hence the failure intensity
function is initially higher. But at 3600 s with b = 0.002/s most faults have already been detected, and
hence the failure intensity is lower than with b = 0.0002/s.
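A sketch of this comparison using the basic execution time / G-O form λ(t) = a·b·e^(−bt). The values b = 0.002/s and 0.0002/s and t = 3600 s come from the discussion above; the total expected failures a is an assumed illustrative value.

```python
import math

A = 100.0  # assumed total expected failures

def failure_intensity(t, b):
    """lambda(t) = a*b*exp(-b*t) for the basic execution time / G-O form."""
    return A * b * math.exp(-b * t)

if __name__ == "__main__":
    t = 3600.0
    for b in (0.002, 0.0002):
        print(f"b = {b}/s -> lambda({t:.0f} s) = {failure_intensity(t, b):.6f} failures/s")
    # With b = 0.002/s almost all faults are found well before 3600 s,
    # so the intensity at 3600 s is lower than with b = 0.0002/s.
```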
• The failure intensity of the logarithmic Poisson model will never become zero, as it does in the case of the BET
model. It keeps decreasing, and would become zero only when an infinite number of failures have been
observed. The quantity θ is called the failure intensity decay parameter.
• Suppose we plot the natural logarithm of failure intensity against the mean failures experienced. Since
λ(μ) = λ₀e^(−θμ), i.e., ln λ = ln λ₀ − θμ, the failure intensity decay parameter θ is the magnitude
of the slope of the plotted line. It represents the relative change of failure intensity per failure
experienced. The mean value function of the M-O model is given as μ(τ) = (1/θ) ln(λ₀θτ + 1).
• Okumoto (1985) clarifies that the parameter θ may be related to the efficiency of testing and repair
activity and a larger value of θ implies a higher efficiency since the failure intensity reduces at a faster
rate.
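A short sketch of the Musa-Okumoto relations quoted above, confirming that ln λ falls linearly in μ with slope −θ. The initial failure intensity λ₀ and decay parameter θ are assumed illustrative values.

```python
import math

LAMBDA0, THETA = 20.0, 0.025  # assumed illustrative values

def mo_mu(tau):
    """Mean value function mu(tau) = (1/theta) * ln(lambda0*theta*tau + 1)."""
    return math.log(LAMBDA0 * THETA * tau + 1.0) / THETA

def mo_lambda(tau):
    """Failure intensity lambda(tau) = lambda0 / (lambda0*theta*tau + 1)."""
    return LAMBDA0 / (LAMBDA0 * THETA * tau + 1.0)

if __name__ == "__main__":
    for tau in (0.0, 10.0, 100.0):
        mu, lam = mo_mu(tau), mo_lambda(tau)
        # ln(lambda) = ln(lambda0) - theta*mu, so the slope against mu is -theta.
        slope = (math.log(lam) - math.log(LAMBDA0)) / mu if mu else float("nan")
        print(f"tau={tau:6.1f}  mu={mu:7.2f}  lambda={lam:7.3f}  slope≈{slope:.4f}")
```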
• Hence, if the data fit the model, the plot of the mean value function versus ln(1 + t) will be
a straight line on a log-log scale.
• Zhao and Xie (1992) have brought out that, by substituting x = ln(1 + t), the mean value function
takes the power-law form of the popular Duane model. They argue that the Duane model did not receive
much attention because it overestimates the cumulative number of failures, since its mean value function
approaches infinity too fast. Thus, the Duane model is also an infinite failure model.
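A minimal sketch of the Duane power-law mean value function μ(t) = α·t^β referenced above, with α and β as assumed illustrative values. On a log-log plot ln μ = ln α + β ln t is a straight line, and μ(t) grows without bound, which is why the Duane model is an infinite failure model.

```python
import math

ALPHA, BETA = 5.0, 0.4  # assumed illustrative parameters

def duane_mu(t):
    """Duane mean value function mu(t) = alpha * t**beta."""
    return ALPHA * t ** BETA

if __name__ == "__main__":
    pts = [(math.log(t), math.log(duane_mu(t))) for t in (10.0, 100.0, 1000.0)]
    # The slope between successive points on the log-log plot is constant
    # and equals beta.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        print(f"slope = {(y2 - y1) / (x2 - x1):.3f}")
```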
The Learning Phenomenon of the Testing Team
• Learning is said to occur if testing efficiency appears to be improving dynamically during the
progress of the testing phase (Pham et al., 1999) in SDLC.
• According to Kuo et al. (2001), the failure rate has three possible trends as time progresses:
increasing, decreasing, or constant.
• The reliability models proposed in the early period, such as Jelinski-Moranda, Goel-Okumoto,
BET, and LPET, assumed that the failure rate is highest at the start of the defect removal
process (such as system test) and reduces as defects are removed.
This assumption may be valid for some projects, but not for all.
• In some testing projects, it has been observed that the failure rate first increases and then starts
decreasing (I/D) due to the learning phenomenon of the testing team. This happens because the
efficiency of the testing team is poor at the beginning of the testing phase but improves dynamically
as testing progresses.
• The above model cannot address a strictly decreasing failure intensity function (Xie,
2000). The failure intensity function of the Yamada delayed S-shaped model can
be derived by taking the derivative of its mean value function μ(t) = a(1 − (1 + bt)e^(−bt))
with respect to time, giving λ(t) = ab²te^(−bt).
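A short sketch of the Yamada delayed S-shaped relations above, with a and b as assumed illustrative values; λ(t) rises, peaks at t = 1/b, then falls, which is the increase/decrease pattern discussed in this section.

```python
import math

A, B = 200.0, 0.1  # assumed illustrative parameters

def yamada_mu(t):
    """Mean value function mu(t) = a*(1 - (1 + b*t)*exp(-b*t))."""
    return A * (1.0 - (1.0 + B * t) * math.exp(-B * t))

def yamada_lambda(t):
    """Failure intensity lambda(t) = d mu / dt = a*b*b*t*exp(-b*t)."""
    return A * B * B * t * math.exp(-B * t)

if __name__ == "__main__":
    for t in (1.0, 10.0, 50.0):   # t = 10.0 = 1/B is where the intensity peaks
        print(f"t={t:5.1f}  mu={yamada_mu(t):7.2f}  lambda={yamada_lambda(t):6.3f}")
```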
• The equation for the failure intensity function of the Log-logistic model is λ(t) = h(t)·(N − μ(t)), where h(t) is the hazard rate of the log-logistic distribution and N is the expected total number of faults.
• It is to be noted that (N − μ(t)) equals the number of faults remaining to be corrected at a
given time, and it decreases with time as the fault removal process continues. Therefore, if
h(t) is monotonically decreasing, λ(t) will also decrease monotonically.
• Since the I/D pattern of failure intensity arises because the test team is not efficient in the early
stages but picks up efficiency later, the number of remaining faults decreases faster in
the later stages.
• Therefore, although in the initial stages of testing λ(t) may be increasing due to dominance
of h(t), the trend will be reversed in the later stages due to faster reduction of faults.
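A sketch of the relation λ(t) = h(t)·(N − μ(t)) discussed above, using the standard log-logistic hazard h(t) = kc(ct)^(k−1)/(1 + (ct)^k) and μ(t) = N·(ct)^k/(1 + (ct)^k). The values of N, c, and k are assumed for illustration; k > 1 produces the increasing-then-decreasing failure intensity.

```python
N, C, K = 120.0, 0.05, 2.0  # assumed illustrative parameters

def mu(t):
    """Expected cumulative faults detected: N * F(t) for the log-logistic F."""
    x = (C * t) ** K
    return N * x / (1.0 + x)

def hazard(t):
    """Log-logistic hazard rate h(t) = k*c*(c*t)**(k-1) / (1 + (c*t)**k)."""
    x = (C * t) ** K
    return K * C * (C * t) ** (K - 1) / (1.0 + x)

def intensity(t):
    """Failure intensity lambda(t) = h(t) * (N - mu(t))."""
    return hazard(t) * (N - mu(t))

if __name__ == "__main__":
    for t in (5.0, 20.0, 50.0, 200.0):
        print(f"t={t:6.1f}  lambda(t)={intensity(t):7.3f}")
    # lambda(t) first increases and then decreases, matching the I/D behaviour.
```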