What Is A Fault and Why Does It Matter Nouv2015

This document defines and discusses the concept of faults in software. It begins by noting that faults play a crucial role in software dependability but that existing definitions of faults are insufficient. It then proposes a new, more formal definition of a software fault as any part of a program that could be substituted to make the program more correct relative to its specification. This definition aims to address issues with prior definitions by making fault determination objective rather than subjective and by allowing verification that a fault has been removed. The document discusses using this definition of relative correctness to establish whether one program is more correct than another and thus identify faults. It concludes by assessing this new approach to defining and identifying software faults.
What is a Fault? And Why Does It Matter?


Nafi Diallo, Wided Ghardallou, Jules Desharnais, Marcelo Frias, Ali Jaoua and Ali Mili

Abstract—Faults are an important concept in the study of system dependability, and most approaches to dependability
can be characterized by the way in which they deal with faults (e.g. fault avoidance, fault removal, fault tolerance, fault
forecasting, etc). In their seminal work on modeling dependable computing, Laprie et al. define a fault as the adjudged or
hypothesized cause of an error. In this paper, we propose a more formal definition of a fault in the context of software
products, and discuss the diverse implications of our definition.

Index Terms—Correctness, Partial Correctness, Total Correctness, Relative Correctness, Absolute Correctness, Software
Fault, Fault Removal, Fault Density, Software Testing, Software Repair, Software Design.

• N. Diallo and A. Mili are with NJIT, Newark NJ 07102-1982, USA.
• Wided Ghardallou is with University of Tunis El Manar, Tunisia.
• Jules Desharnais is with Laval University, Quebec, Canada.
• Marcelo Frias is with ITBA, Buenos Aires, Argentina.
• Ali Jaoua is with Qatar University, Doha, Qatar.

1 MOTIVATION AND BACKGROUND

1.1 The Trouble with Faults

This research stems from asking ourselves the question: what is a (software) fault? We argue that a formal definition of faults is indispensable, given that faults play a crucial role in the study of software dependability, that they are the basis of the classification of methods of dependability (fault avoidance, fault removal, fault tolerance), and that they are at the center of several software engineering processes and metrics, such as fault density, fault proneness, fault forecasting, program repair, mutation testing, etc. In [2], [15], [16], [17] Laprie et al. define a fault as the adjudged or hypothesized cause of an error [2]; we argue that, as far as software is concerned, this definition is not sufficiently precise, first because adjudging and hypothesizing are highly subjective human endeavors, and second because the concept of error is itself insufficiently defined, since it depends on a detailed characterization of correct system states at each stage of a computation (which is usually unavailable). Still, defining software faults is fraught with difficulties:

• Discretionary determination. Usually we determine that a program part is faulty because we think we know what the designer intended to achieve in that particular part, and we find that the program does not fulfill the designer's intent; clearly, this determination is only as good as our assumption about the designer's intent.
• Contingent determination. The same faulty behavior of a software product may be repaired in more than one way, possibly involving more than one part; hence the determination that one part is a fault is typically contingent upon the assumption that other parts are not in question.
• Tentative determination. Usually, we determine that a program part is faulty because we believe that if we could change it in some specific way, the program would be better; but in the absence of a clear definition of what it means for the program to be better, this determination is tentative.

In order to overcome the difficulties raised above, we resolve to proceed as follows: We introduce a concept of relative correctness, i.e. the property of a program to be more correct than another program with respect to a specification [24]. Then we define a fault in a program as any program part (be it a simple statement, a lexical token, an expression, a compound statement, a block of statements, a set of non-contiguous statements, etc.) for which there exists a substitution that would make the program more-correct than the original with respect to a relevant specification [24]. With such a definition, we address the difficulties raised above, namely:

• A Fault as an Intrinsic Attribute. The definition of a fault is not dependent on any design assumptions, but involves only the (incorrect) program, the faulty program part, and the specification with respect to which correctness is defined.
• A Fault as a Definite Property. If we let a fault be any program part that admits a substitution that makes the program more-correct, then the designation of a fault
is no longer contingent on any hypothesis; we need not make any assumption on whether other parts of the program are faulty or not.
• Fault Removal as a Verifiable Process. By testing a program for relative correctness rather than (traditional) absolute correctness, we can determine with greater confidence that a particular fault has been removed, regardless of whether that makes the program correct or not (due to other residual faults).

In order to reap these benefits, we must first introduce a definition of relative correctness; this is the subject of section 3. In preparation for this objective, we present some mathematical definitions and notations, and some elements of relations-based program semantics in section 2. In section 4, we consider in turn several properties that we would want a concept of relative correctness to satisfy, and prove that our definition does satisfy all of them. In section 5, we discuss the uses of the concept of relative correctness, and its implications for relevant software processes. Once we know how to use relative correctness, the next issue we address is: how do we establish relative correctness, i.e. how to build the case that a program is more-correct than another with respect to a specification; this is the subject of section 6. Section 7 summarizes and assesses our findings, discusses related work, and sketches directions of future research.

2 MATHEMATICS FOR PROGRAM ANALYSIS

We assume the reader familiar with discrete mathematics, most notably relational algebra; this section introduces definitions and notations, but it is our assumption that the reader is familiar with these concepts [4].

2.1 Relational Notations

Dealing with programs, we represent sets using a programming-like notation, by introducing variable names and associated data types. For example, if we represent set S by the variable declarations {x: X; y: Y; z: Z;} then S is the Cartesian product X × Y × Z. Elements of S are denoted in lower case s, and are triplets of elements of X, Y, and Z. Given an element s of S, we represent its X-component by x(s), its Y-component by y(s), and its Z-component by z(s). When no risk of ambiguity exists, we may write x to represent x(s), and x′ to represent x(s′).

A (binary) relation on S is a subset of the Cartesian product S × S. Special relations on S include the universal relation L = S × S, the identity relation I = {(s, s′)|s′ = s}, and the empty relation φ = {}. Operations on relations (say, R and R′) include the set theoretic operations of union (R ∪ R′), intersection (R ∩ R′), and complement (\overline{R}). They also include the relational product, denoted by (R ◦ R′), or (RR′, for short) and defined by: RR′ = {(s, s′)|∃s′′ : (s, s′′) ∈ R ∧ (s′′, s′) ∈ R′}. The converse of relation R is the relation denoted by R̂ and defined by R̂ = {(s, s′)|(s′, s) ∈ R}. The domain of a relation R is defined as the set dom(R) = {s|∃s′ : (s, s′) ∈ R}. A relation R is said to be reflexive if and only if I ⊆ R, antisymmetric if and only if (R ∩ R̂) ⊆ I, asymmetric if and only if (R ∩ R̂) = φ, and transitive if and only if RR ⊆ R. A relation is said to be a partial ordering if and only if it is reflexive, antisymmetric, and transitive. Also, a relation R is said to be total if and only if I ⊆ RR̂, and deterministic (or, a function) if and only if R̂R ⊆ I. A relation R is said to be a vector if and only if RL = R; we use vectors to represent subsets of S.

2.2 Relational Semantics

Given a program p on space S, we denote by P the function that p defines on S, i.e. the set of pairs (s, s′) such that if p starts execution in state s it terminates normally in state s′. In this paper we consider programs written in some C-like programming language, to which we add the skip statement (which defines the identity relation on S) and the abort statement (which defines the empty relation).

2.3 Refinement Ordering

The concept of refinement is at the heart of any programming calculus; the following definition captures our concept of refinement.

Definition 1. We let R and R′ be two relations on space S. We say that R refines R′ if and only if RL ∩ R′L ∩ (R ∪ R′) = R′.

We write this relation as: R ⊒ R′ or R′ ⊑ R. Intuitively, R refines R′ if R has a larger domain than R′ and has fewer images than R′ inside the domain of R′. It is easy to prove that ⊑ is a partial ordering; also, it is easy to prove that for functions R and R′, R ⊑ R′ if and only if R ⊆ R′.

2.4 Refinement Lattice

Since refinement is a partial ordering between specifications, it is legitimate to ponder its lattice-like properties. The following proposition, due to [3], provides a useful result with regards to the lattice of specifications.

Proposition 1. Any two specifications R and R′ that satisfy the following condition (R ∩ R′)L = RL ∩ R′L (called the consistency condition) admit a least upper bound, denoted by R ⊔ R′ (R join R′) and defined by: R ⊔ R′ = (\overline{R′L} ∩ R) ∪ (\overline{RL} ∩ R′) ∪ (R ∩ R′).

Interpretation: The consistency condition between two specifications is the condition under which the specifications admit a joint refinement; the join of two specifications represents the sum of all the requirements of each term.
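The notation of Sections 2.1 and 2.3 can be tried out on a finite space. The following Python sketch is only an illustration, not part of the original development: it represents a relation as a set of pairs, and the names SPACE, SPEC, compose, refines, etc. are assumptions introduced here. The function refines transcribes Definition 1 literally.

```python
from itertools import product

SPACE = {0, 1, 2}            # hypothetical finite space S
SPEC = {(0, 1), (0, 2)}      # hypothetical specification used in the checks below


def universal(space):
    """The universal relation L = S x S."""
    return set(product(space, repeat=2))


def identity(space):
    """The identity relation I."""
    return {(s, s) for s in space}


def compose(r1, r2):
    """Relational product R1 R2."""
    return {(s, t) for (s, u) in r1 for (v, t) in r2 if u == v}


def converse(r):
    """Converse of R: all pairs of R, swapped."""
    return {(t, s) for (s, t) in r}


def domain(r):
    """dom(R): the states that have at least one image by R."""
    return {s for (s, _) in r}


def is_total(r, space):
    """R is total iff I is included in R composed with its converse."""
    return identity(space) <= compose(r, converse(r))


def is_deterministic(r):
    """R is a function iff (converse of R) composed with R relates states only to themselves."""
    return all(t == u for (t, u) in compose(converse(r), r))


def refines(r, r_prime, space):
    """Definition 1: R refines R' iff RL ∩ R'L ∩ (R ∪ R') = R'."""
    big_l = universal(space)
    return compose(r, big_l) & compose(r_prime, big_l) & (r | r_prime) == r_prime
```

With these definitions, {(0, 1), (1, 1)} refines SPEC (it has a larger domain and fewer images on state 0), whereas {(0, 0)} does not, because it assigns to state 0 an image that SPEC does not allow.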
3 ABSOLUTE CORRECTNESS AND RELATIVE CORRECTNESS

Whereas absolute correctness characterizes a program with respect to a specification, relative correctness ranks two programs with respect to a specification.

3.1 Absolute Correctness

Definition 2. Let p be a program on space S and let R be a specification on S. We say that program p is correct with respect to R if and only if P (the function of program p on space S) refines R. We say that program p is partially correct with respect to specification R if and only if P refines R ∩ PL.

This definition is consistent with traditional definitions of partial and total correctness [7], [11], [12], [13], [23]. Whenever we want to contrast correctness with partial correctness, we may refer to it as total correctness.

Proposition 2. (Due to Mills et al. [26]). Program p is correct with respect to specification R if and only if (P ∩ R)L = RL.

3.2 Relative Correctness

Definition 3. Let R be a specification on space S and let p and p′ be two programs on space S whose functions are respectively P and P′. We say that program p′ is more-correct than program p with respect to specification R (denoted by: P′ ⊒R P) if and only if: (R ∩ P′)L ⊇ (R ∩ P)L. Also, we say that program p′ is strictly more-correct than program p with respect to specification R (denoted by: P′ ⊐R P) if and only if (R ∩ P′)L ⊃ (R ∩ P)L.

Whenever we want to contrast correctness (given in Definition 2) with relative correctness, we may refer to it as absolute correctness. Note that when we say more-correct we really mean more-correct or as-correct-as; we use the shorthand, however, for convenience. We give a simple intuitive interpretation of this definition: The relation (actually a vector) (R ∩ P)L represents the set of initial states for which program p behaves according to the requirements of specification R (we refer to this set as the competence domain of program p with respect to specification R); so that to be more-correct merely means to have a larger competence domain; see Figure 1. In this figure, the competence domains of P and P′ are, respectively, CD = {1, 2, 3, 4} and CD′ = {1, 2, 3, 4, 5}. Hence p′ is more-correct than p with respect to R.

[Fig. 1. Enhancing Correctness Without Imitating Behavior. Three panels, R, P and P′, each drawn as arrows between states 0 to 6.]

To illustrate this definition, we consider the space S defined by two integer variables x and y, and we let R be the following specification on S:

R = {(s, s′)|x² ≤ x′y′ ≤ 2x²}.

We consider the following candidate programs, denoted p0 through p7. Next to each program pi, we represent its competence domain (CDi). Figure 2 shows how these candidate programs are ranked by relative correctness with respect to R; this graph merely reflects the inclusion relationships between the competence domains.

[Fig. 2. Relative Correctness Relations. Graph over the programs P0 through P7.]

p0: {x=1; y=-1;}. CD0 = {}.
p1: {x=2*x; y=0;}. CD1 = {s|x = 0}.
p2: {x=x*x; y=0;}. CD2 = {s|x = 0}.
p3: {x=2*x; y=1;}. CD3 = {s|0 ≤ x ≤ 2}.
p4: {x=2*x; y=2;}. CD4 = {s|x = 0 ∨ 2 ≤ x ≤ 4}.
p5: {x=2*x; y=x/2;}. CD5 = S.
p6: {y=x/2; x=2*x;}. CD6 = S.
p7: {x=x*x; y=2;}. CD7 = S.

This example illustrates a number of properties:

• Note that this relation is not antisymmetric; so that two programs may be mutually related and still be distinct (such is the case for P1 and P2).
• The top of the graph represents the programs that are (absolutely) correct with respect to specification R: P5, P6 and P7.
• A program may be more-correct than another without imitating its correct behavior. For example, p3 is more-correct than p1 and yet it does not behave as p1 on the competence domain of p1.
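The competence domains listed above can be recomputed mechanically. The sketch below is a hedged illustration rather than the authors' procedure: it restricts x to a bounded window XS (an assumption, since the space is infinite), models each candidate as a hypothetical Python function from the initial x to the final pair (x′, y′), and leaves out p6 because the result of x/2 for odd x depends on a division convention that the text does not spell out. The initial value of y is ignored because every candidate overwrites y and the specification does not read it.

```python
XS = range(-6, 7)  # assumption: a bounded window of initial values of x


def satisfies_spec(x, x_new, y_new):
    """Specification R: x^2 <= x'y' <= 2x^2."""
    return x * x <= x_new * y_new <= 2 * x * x


# Each candidate maps the initial x to the final pair (x', y').
PROGRAMS = {
    "p0": lambda x: (1, -1),
    "p1": lambda x: (2 * x, 0),
    "p2": lambda x: (x * x, 0),
    "p3": lambda x: (2 * x, 1),
    "p4": lambda x: (2 * x, 2),
    "p5": lambda x: (2 * x, (2 * x) // 2),  # y = x/2 is evaluated after x = 2*x
    "p7": lambda x: (x * x, 2),
}


def competence_domain(program):
    """Initial values of x (within XS) on which the program satisfies R."""
    return {x for x in XS if satisfies_spec(x, *program(x))}


CD = {name: competence_domain(f) for name, f in PROGRAMS.items()}


def more_correct(a, b):
    """a is more-correct than (or as-correct-as) b: CD[a] includes CD[b]."""
    return CD[a] >= CD[b]
```

On this window, more_correct("p1", "p2") and more_correct("p2", "p1") both hold although the two programs differ, p3 and p4 are not comparable, and p3 is more-correct than p1 even though PROGRAMS["p3"](0) differs from PROGRAMS["p1"](0).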
3.3 Faults and Fault Removal

Definition 4. Let p be a program on space S and R be a specification on S, let f be a program part of p. We say that f is a fault if and only if there exists a substitution f′ of f such that the program p′ obtained from p by substituting f by f′ is strictly more-correct than p with respect to R.

Definition 5. Let p be a program on space S and R be a specification on S, let f be a fault in p, and let f′ be a substitute for f. We say that the pair (f, f′) is a (monotonic) fault removal if and only if the program p′ obtained from p by substituting f by f′ is strictly more-correct than p.

For illustration, we consider the following program, say p, taken from [8] (with some modifications):

  #include <iostream> ... ... ...          // line 1
  void count (char q[])                    // 2
  {int let, dig, other, i, l; char c;      // 3
   i=0;let=0;dig=0;other=0;l=strlen(q);    // 4
   while (i<l) {                           // 5
     c = q[i];                             // 6
     if ('A'<=c && 'Z'>c) let+=2;          // 7
     else                                  // 8
     if ('a'<=c && 'z'>=c) let+=1;         // 9
     else                                  //10
     if ('0'<=c && '9'>=c) dig+=1;         //11
     else                                  //12
       other+=1;                           //13
     i++;}                                 //14
   printf ("%d %d %d\n",let,dig,other);}   //15

We let S be the space defined by the declarations of line 3, to which we add variable os which represents the output stream (in C++ parlance), and we let R be the following specification:

R = {(s, s′)|q ∈ list⟨αA ∪ αa ∪ ν ∪ σ⟩ ∧ os′ = os ⊕ #α(q) ⊕ #ν(q) ⊕ #σ(q)}

where we let αA = 'A' … 'Z', αa = 'a' … 'z', ν = '0' … '9', and σ = {set of ascii symbols}. Also, we let ⊕ denote the concatenation, we let list⟨T⟩ denote the set of lists of type T, and we let #A, #a, #ν and #σ be the functions that to each list l assign (respectively) the number of upper case alphabetic characters, lower case alphabetic characters, numeric digits and symbols; also, we let #α be defined as #α(l) = #a(l) + #A(l). We introduce the following programs, which are derived from p by some modifications of its source code:

p01: The program obtained from p when we replace (let+=2) by (let+=1).
p10: The program obtained from p when we replace ('Z'>c) by ('Z'>=c).
p11: The program obtained from p when we replace (let+=2) by (let+=1) and ('Z'>c) by ('Z'>=c).

We find the following competence domains for these programs:

p: {s|q ∈ list⟨αa ∪ ν ∪ σ⟩}.
p01: {s|q ∈ list⟨(αA \ {'Z'}) ∪ αa ∪ ν ∪ σ⟩}.
p10: {s|q ∈ list⟨αa ∪ ν ∪ σ⟩}.
p11: {s|q ∈ list⟨αA ∪ αa ∪ ν ∪ σ⟩}.

By comparing the competence domains, we draw the following conclusions:

• The statement (let+=2) is a fault in p, and its substitution by (let+=1) is a fault removal, yielding the more-correct program p01.
• The statement ('Z'>c) is a fault in p01, and its substitution by ('Z'>=c) is a fault removal, yielding the more-correct program p11.
• The program part defined by the two statements (let+=2) and ('Z'>c) is a fault in p, and its substitution by (let+=1) and ('Z'>=c) is a fault removal, yielding the more-correct program p11.
• The program p11 is correct with respect to R.

Note that the statement ('Z'>c) is a fault in p01 but it is not a fault in p; also note that the statement ('Z'>c), in combination with the statement (let+=2) is a fault in p, but it is not a fault in p by itself.

4 VALIDATION OF RELATIVE CORRECTNESS

4.1 Litmus Tests

How do we know that our definition of relative correctness is sound? To answer this question, we list some properties that a definition of relative correctness ought to meet; then we check that our definition does satisfy them.

• Reflexivity and Transitivity, and non-Antisymmetry. Of course, we want relative correctness to be reflexive and transitive; we do not want it to be antisymmetric, since we want to have programs that are mutually more-correct, yet distinct (not only syntactically distinct, but computing different functions as well).
• Absolute Correctness as the Culmination of Relative Correctness. Relative correctness ought to be defined in such a way that if a program keeps getting more and more-correct with respect to a specification, it will eventually be (absolutely) correct.
• Relative Correctness as a Sufficient Condition, but not a Necessary Condition, of Higher Reliability. If program p′ is more-correct than program p, then of course we want p′ to be more reliable than p; but we do not want more-correct to be equivalent to more reliable, as the former is a logical/functional property, whereas the latter is a stochastic property.
• Refinement is equivalent to Relative Correctness with respect to any Specification. When program p′ refines program p, we interpret this to mean that whatever p can do, p′ can do as well or better; in particular, it means that p′ is more-correct than (or as-correct-as) p with respect to any specification R.
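The first two litmus tests can be exercised exhaustively on a very small space before turning to the proofs. The Python sketch below is an illustration under stated assumptions: the three-state space S, the specification R, and all function names are chosen here for the example and do not come from the text. A deterministic program is modeled as a dict from initial states to final states, where a missing key means that the program does not terminate normally on that state.

```python
from itertools import product

S = (0, 1, 2)                     # assumption: a three-state space
R = {(0, 1), (0, 2), (1, 1)}      # assumption: a small specification, dom(R) = {0, 1}


def all_programs():
    """Every deterministic program on S (64 of them), each as a dict."""
    for images in product((None,) + S, repeat=len(S)):
        yield {s: t for s, t in zip(S, images) if t is not None}


def competence_domain(p, spec=R):
    """Initial states on which p behaves as the specification requires."""
    return frozenset(s for s, t in p.items() if (s, t) in spec)


def more_correct(p_new, p_old, spec=R):
    """p_new is more-correct than (or as-correct-as) p_old."""
    return competence_domain(p_new, spec) >= competence_domain(p_old, spec)


def is_correct(p, spec=R):
    """Absolute correctness: the competence domain is all of dom(R)."""
    return competence_domain(p, spec) == frozenset(s for s, _ in spec)


PROGRAMS = list(all_programs())
CDS = [competence_domain(p) for p in PROGRAMS]
N_PROGS = len(PROGRAMS)

# Litmus test 1: reflexive and transitive, but not antisymmetric.
REFLEXIVE = all(CDS[i] >= CDS[i] for i in range(N_PROGS))
TRANSITIVE = all(
    CDS[i] >= CDS[k]
    for i in range(N_PROGS)
    for j in range(N_PROGS) if CDS[i] >= CDS[j]
    for k in range(N_PROGS) if CDS[j] >= CDS[k]
)
P_A, P_B = {0: 1}, {0: 2}  # distinct functions with the same competence domain

# Litmus test 2: a program is correct iff it is more-correct than every candidate.
CULMINATION = all(
    is_correct(p) == all(more_correct(p, q) for q in PROGRAMS) for p in PROGRAMS
)
```

P_A and P_B are mutually more-correct yet compute different functions, which is the non-antisymmetry the first test asks for; the exhaustive check covers this one small space only and is no substitute for the proofs.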
4.2 Passing the Tests

4.2.1 Reflexivity, Transitivity, and non-Antisymmetry

Program p′ is more-correct than program p if and only if (R ∩ P′)L ⊇ (R ∩ P)L. Transitivity and reflexivity stem readily from the definition, as does non-antisymmetry: Indeed, two functions P and P′ may satisfy (R ∩ P)L = (R ∩ P′)L while P and P′ are distinct. Consider R = {(0, 1), (0, 2)}, P = {(0, 1)} and P′ = {(0, 2)}.

4.2.2 Absolute Correctness as the Culmination of Relative Correctness

Proposition 3. Let R be a specification on space S and let p be a program on S. Then p is correct with respect to R if and only if p is more-correct with respect to R than any candidate program on S.

Proof. Proof of necessity: Let p′ be correct with respect to R; then, according to proposition 2, RL = (R ∩ P′)L. Let p be an arbitrary program on space S; by set theory, we have RL ⊇ (P ∩ R)L. Hence p′ is more-correct with respect to R than p.
Proof of sufficiency: Let p′ be more-correct with respect to R than any candidate program p on S. Let p′′ be a correct program with respect to R; then (R ∩ P′′)L = RL. Since p′ is more-correct with respect to R than p′′, (R ∩ P′)L ⊇ (R ∩ P′′)L, hence (R ∩ P′)L ⊇ RL, which is equivalent to (R ∩ P′)L = RL since the inverse inclusion is a tautology. qed

We write this as:
P′ ⊒ R ⇔ (∀P : P′ ⊒R P).

4.2.3 Relative Correctness and Reliability

Proposition 4. Let p and p′ be two programs on space S and let R be a specification on S. If program p′ is more-correct than program p with respect to specification R then p′ is more reliable than p.

Proof. Let θ be a probability distribution on the input space dom(R); the probability that a random execution of a candidate program p on an element of dom(R) succeeds is the integral of θ over the competence domain of p; clearly, the larger the competence domain, the higher the probability. qed

We write:
P′ ⊒R P ⇒ ∫_{dom(R∩P′)} θ(s)ds ≥ ∫_{dom(R∩P)} θ(s)ds.

4.2.4 Relative Correctness and Refinement

The following proposition casts relative correctness as a form of pointwise refinement.

Proposition 5. Let p and p′ be programs on space S. Then p′ refines p if and only if p′ is more-correct than p with respect to any specification R on S.

Proof. Proof of necessity: We have seen in section 2.3 that if P and P′ are two functions then P′ refines P if and only if P′ ⊇ P. The condition (P′ ∩ R)L ⊇ (P ∩ R)L stems readily, by set theory.
Proof of sufficiency: Let p′ be more-correct than p with respect to any specification R on S. Then p′ is more-correct than p with respect to specification R = P. This can be written as: (P ∩ P′)L ⊇ (P ∩ P)L, which we simplify as: (P ∩ P′)L ⊇ PL. On the other hand, we have, by construction, (P ∩ P′) ⊆ P. Combining the two conditions, we obtain: (P ∩ P′) = P, from which we infer (by set theory) P′ ⊇ P and, by the remark above, P′ ⊒ P. qed

We write this as:
P′ ⊒ P ⇔ (∀R : P′ ⊒R P).

5 IMPLICATIONS AND APPLICATIONS

5.1 Measuring Faultiness

A naive interpretation of fault density in a program views the faults in a program as if they were black balls in a bucket full of otherwise white balls: they are intrinsically identifiable (black vs. white), their number is well-defined, they are independent of each other (removal of one ball does not change the color of the others), they can be removed in an arbitrary order, and whenever one is removed, their number is reduced by one. In this section we see to what extent this analogy is unfounded: unlike black balls in a bucket of white balls, faults can be viewed at different levels of granularity; they are highly inter-related; removal of one fault may affect the nature, number and location of other faults; a fault may need to be corrected at more than one location; the same fault may be corrected in more than one way; and the order in which faults are removed matters, as does the way faults are removed.

5.1.1 Elementary Faults

We consider program p given in section 3.3, and we recall that we have found it to have two faults: First the statement {let+=2}; second the combination of two statements {let+=2, 'Z'>c} (remember, {'Z'>c} is a fault in p01 but is not a fault in p). This inspires the following definition.

Definition 6. Let f be a fault in program p on space S with respect to specification R on S. We say that f is an
elementary fault in p if and only if no part of f is a fault in p with respect to R.

Hence if we are going to count faults, we need to count elementary faults rather than arbitrarily large faults.

5.1.2 Multi-Site Faults

The foregoing discussion about elementary faults may create the impression that elementary faults are merely single-site faults, i.e. faults that involve a single statement or a single contiguous block; this is not so. We consider the following space S, specification R, and program p:

S: {x: float; i: int; a: array [0..N] of float;}.
R: {(s, s′)|x′ = Σ_{i=1}^{N} a[i]}.
p: {x=0; i=0; while (i <= N-1) {x=x+a[i]; i=i+1;}}

We compute the function of this program, then its competence domain with respect to R, and we find:

dom(R ∩ P) = {s|a[0] = a[N]}.

Since dom(R ∩ P) is not equal to dom(R), which is S, this program is not correct. One way to correct this program is to change {i=0} to {i=1} and to change {i<=N-1} to {i<=N}. The question that we raise here is: do we have two faults here ({i=0} and {i<=N-1}) or just one fault that spans two sites? To answer this question we consider the proposed substitutions and check whether they produce more-correct programs. We let p01 be the program obtained from p by replacing {i=0} by {i=1}, we let p10 be the program obtained from p by replacing {i<=N-1} by {i<=N}, and we let p11 be the program obtained from p by performing the two substitutions simultaneously; we find the following competence domains for these programs.

dom(R ∩ P01) = {s|a[N] = 0}.
dom(R ∩ P10) = {s|a[0] = 0}.
dom(R ∩ P11) = S.

Since the competence domain of p is not a subset of the competence domains of p01 and p10, neither p01 nor p10 is more-correct than p. We infer: neither {i=0} nor {i<=N-1} is a fault in p, but program part ({i=0}, {i<=N-1}) is a fault in program p with respect to R, since program p11 is more-correct than p with respect to R. We say that this is a multi-site fault; in this example we do not have two faults but a single multi-site elementary fault.

5.1.3 Fault Density and Fault Depth

We use the term fault density to refer to the number of faults in a program. The trouble with counting faults in a program is that the number of faults in a program violates a simple rule that counting any other commodity satisfies: if we have ten black balls in a bucket of otherwise white balls, and we remove one black ball, we are left with nine black balls. But if we have ten faults in a program and we remove one fault, the number of remaining faults is undetermined: it depends on which of the ten faults we have removed, and on how we have removed it. So that whereas we want to think of fault density as a measure of program faultiness/imperfection or as a measure of repair effort, we can argue that it measures neither imperfection nor repair effort; we consider an alternative.

Definition 7. We let R be a specification on space S and p be a program on space S. The fault depth of program p with respect to specification R is the minimal number of elementary fault removals that are required to transform p into a correct program.

If program faults were like black balls, then density and depth would be identical: if we have ten black balls, it takes ten removals to get rid of them; but faults are different. We consider below a case where several faults can be removed by a single fault removal, and a case where a single fault can be remedied multiple times before the program is correct.

We consider the array sum program discussed in the previous section (section 5.1.2). We had identified a single elementary multi-site fault, which is the aggregate f1:({i=0}, {i<=N-1}); we argue that there is another possible fault in this program, namely f2:{x=x+a[i]}. Indeed, this statement admits a substitution, namely f2':{x=x+a[i+1]}, that would make the program more-correct (as the reader can easily see). If we remove f1 by replacing it with f1':({i=1}, {i<=N}), then we find a correct program, hence f2 is no longer a fault; if we remove f2 by replacing it with f2', then we find a correct program, hence f1 would no longer be a fault. But if we substitute both f1 and f2 by, respectively, f1' and f2', we would end up with an incorrect program, that has two faults, f1' and f2'. Hence program p has a fault depth of 1 and a fault density of 2. Note, incidentally, that having two faults means that we have two distinct opportunities to enhance the correctness of the program. More generally, for a given fault depth, the higher the fault density the better; hence, not only is fault density not representative of program imperfection, it can actually be seen as representing a quality attribute of the program.

The following example shows a case where a fault removal makes the program strictly more-correct without changing its fault density; it is an artificial example, but is illustrative nevertheless. We consider the following space S, specification R, and program p (where N ≥ 2):

S: {int i; float a[0..N];}
R: {(s, s′)|∀j : 0 ≤ j ≤ N : a′[j] = 0}.
p: {i=2; while (i<=N) {a[i]=0; i=i+1;}}

Clearly, the domain of R is S. We compute the competence domain of p, and we find:

dom(R ∩ P) = {s|a[0] = 0 ∧ a[1] = 0}.

Hence p is not correct with respect to R. We can check easily that {i=2} is a fault in p with respect to R, and we show that the substitution of {i=2} by {i=1} produces a
more-correct program:

p′: {i=1; while (i<=N) {a[i]=0; i=i+1;}},

whose competence domain is:

dom(R ∩ P′) = {s|a[0] = 0}.

Even though p′ is more-correct than p, it is still not correct, since its competence domain is not equal to dom(R). We find that {i=1} is a fault in p′, and that substituting it by {i=0} yields a program, say p′′, which is correct with respect to R. Even though it is grossly artificial, this example shows that the same fault may require more than one removal to be completely eliminated from a program.

Hence we adopt fault depth as a measure of program faultiness. Unlike fault density, fault depth does decrease by one whenever we remove a fault that is in the minimal path. Given a faulty program p and a program p′ obtained from p by monotonic fault removal, if the fault removal is in a minimal sequence to a correct program, then we can write:

depth(p) = 1 + depth(p′).

[Fig. 3. A Framework for Monotonic Fault Removal. Labels: Specification R; Correctness Preserving Refinements (⊑); Imperfect Design; Incorrect Program; Correctness Enhancing Repairs (⊑R); Correct Program.]
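The array sum discussion of Sections 5.1.2 and 5.1.3 can be replayed on a small, exhaustively enumerated set of inputs. The sketch below is a hedged illustration, not the authors' computation: it assumes N = 4, uses integer cells with values 0 or 1 instead of floats (to keep equality tests exact), and treats an out-of-range index as abnormal termination. The names run, VARIANTS, p_f2 and p_both are hypothetical labels introduced here.

```python
from itertools import product

N = 4  # assumption: the array is a[0..N] with N = 4


def run(init_i, last_i, offset, a):
    """x=0; i=init_i; while (i <= last_i) {x = x + a[i+offset]; i = i+1;}

    Returns the final x, or None when an index leaves a[0..N]
    (modeled here as failure to terminate normally).
    """
    x, i = 0, init_i
    while i <= last_i:
        if not 0 <= i + offset <= N:
            return None
        x += a[i + offset]
        i += 1
    return x


# (initial i, loop bound, index offset) for each variant of p.
VARIANTS = {
    "p": (0, N - 1, 0),
    "p01": (1, N - 1, 0),     # {i=0} replaced by {i=1}
    "p10": (0, N, 0),         # {i<=N-1} replaced by {i<=N}
    "p11": (1, N, 0),         # f1 replaced by f1'
    "p_f2": (0, N - 1, 1),    # f2 replaced by f2'
    "p_both": (1, N, 1),      # f1 and f2 both replaced
}

ARRAYS = list(product((0, 1), repeat=N + 1))  # every 0/1 array of length N+1


def competence(name):
    """Sampled competence domain: arrays on which the final x equals a[1]+...+a[N]."""
    return {a for a in ARRAYS if run(*VARIANTS[name], a) == sum(a[1:N + 1])}


CD = {name: competence(name) for name in VARIANTS}
```

On these inputs neither CD["p01"] nor CD["p10"] includes CD["p"], while CD["p11"] and CD["p_f2"] cover every array and CD["p_both"] is empty, which matches the reading that p has a fault depth of 1 with two distinct ways to remove it. A check on sampled inputs is evidence only, not a proof over all of S.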
5.2 Monotonic Fault Removal

As programmers, we are all familiar with the frustration of trying to remove faults from a program, only to find that we are running in circles, patching the program at one end only to break it down at another. This would not happen if we restricted program transformations to provably monotonic fault removals; with such a discipline, we are assured that with each transformation, the program becomes more-correct. Of course, ensuring that a program transformation qualifies as a monotonic fault removal is generally a non-trivial exercise; we postpone the discussion of how to do this to section 6. Here we simply argue that in the same way that stepwise refinement provides a logical framework for software design, which proceeds monotonically from a specification to a program through correctness-preserving transformations, relative correctness provides a logical framework for stepwise fault removal, which starts from an incorrect program and proceeds monotonically towards a correct program through correctness-enhancing transformations. This process is illustrated in Figure 3. The concept of relative correctness ought to play for software fault removal the same role that refinement plays for software design: first, as a logical framework for reasoning about faults and fault removal; second, as an ideal process to be followed scrupulously when the stakes

it is wrong to test the program for correctness at the end of this process, unless we have reason to believe that the fault we have just removed is the last fault of the program. Given that in general we have no way to check such an assumption, there is no reason we should expect the program to be correct, even if we assume that the fault was properly removed. Instead, the most we can hope for is that the new program is more-correct than the original, and we should be testing it for relative correctness rather than absolute correctness.

5.4 Software Repair

Most techniques for program repair [1], [5], [9], [10], [29], [31] proceed by applying transformations on an original faulty program. These transformations may be macro-transformations (including multi-site program modifications), or micro-modifications (intra-statement) using mutation operators such as those provided by the muJava [22] program mutation tool. Two main approaches exist towards assessing the suitability of the generated transformations: test-based techniques [1], [5], [10], [29] (which use the successful execution of the candidate program on a test suite as the acceptance criterion) or specification-based techniques [9], [31] (which use a specification and some sort of constraint-solving to determine if the new code
warrant it; and third, as a yardstick against which large scale complies with the specification). Both techniques share the
methods can be evaluated. following feature: repair candidates that fail in some test, or
that do not comply with the specification, are discarded and
new candidates are examined, while those that succeed are
5.3 A Software Testing Lifecycle considered as potential fixes.
The traditional lifecycle of software testing is triggered by We argue that when mutants are evaluated on the basis
an observation of failure and proceeds by analyzing the of their execution on a sample test data T , both the decision
failure, tracing it back to a hypothetical fault, removing the to retain successful mutants and the decision to reject
fault, then testing the program for correctness. We argue that unsuccessful mutants, are wrong:
• As Figure 4(a) shows (if CD is the competence domain of the original program and CD′ is the competence domain of the mutant), a mutant may pass the test T (since T ⊆ CD′) yet not be more-correct than the original (since CD is not a subset of CD′).
• As Figure 4(b) shows, a mutant may fail the test T (since T is not a subset of CD′) and yet still be more-correct than the original (since CD is a subset of CD′).
As a result, neither the precision nor the recall of the selection algorithm is assured.

Fig. 4. Poor Precision, Poor Recall (panels (a) and (b), each showing the test set T and the competence domains CD and CD′)

5.5 Multiple Mutation

Debroy and Wong [29] use a single muJava mutation in order to generate fix candidates. A clear limitation of such an approach is that many faults will not be fixed; this happens in the case of multi-site faults (that span more than one program location), as well as whenever the program under analysis has multiple faults. The natural alternative is to apply multiple mutations. This is the case in tools such as those presented in [9] and [31]. The impact of relative correctness on multiple mutation testing depends on the reason for deploying multiple mutations; we see two possible scenarios, which we will discuss in turn.

Multiple mutations are deployed to repair multiple faults. When one uses a test of absolute correctness to assess the validity of program repairs, one has to remove all faults at once in order for the test to be meaningful. Multiple mutation proceeds by applying mutation operators at different places in the program then testing the resulting program for absolute correctness. We argue that with relative correctness, it is no longer necessary to consider several faults at once, since we can characterize fault removals one fault at a time. Managing faults one at a time offers many advantages: first and foremost, it spares us the massive combinatorial explosion that stems from applying several simultaneous mutations through the program; second, it spares us the trouble of dealing with many fault removals at once, when we do not know how each fault removal affects others.

Multiple mutations are deployed to repair multi-site elementary faults. In this case, it is sensible to deploy multiple mutations, but note that the multiplicity of the mutation is not the estimated number of faults we are trying to repair simultaneously but rather the multiplicity of the multi-site elementary faults we are trying to repair individually, usually a much smaller number.

5.6 Software Design

In section 4.2.4, we have found that program P′ refines program P if and only if P′ is more-correct than P with respect to any specification. This sheds new light on program derivation by successive refinements, which requires that at each stage of this process, we transform a program into a more-refined program. According to our discussion of section 4.2.4, this process requires that at each stage, we transform a program, say P, into a program P′ that is more-correct than P with respect to any specification. But this raises the question: why should P′ be more-correct than P with respect to any specification when we are only interested in specification R? Is it possible that the requirement of refinement is too strong? To explore this avenue, we revisit the process depicted in Figure 3, and imagine that instead of designing a (possibly incorrect) program then proceeding with correctness-enhancing transformations towards a correct program, we start with the (trivially incorrect) abort program, and transform it into increasingly more-correct (rather than more-refined) programs until we find a correct program. As an illustration of this process, we briefly present an example borrowed from [6] (to which the interested reader is referred for further details). We let S be the space defined by natural variables x, y and n, and we let R be the following specification (known as Fermat's factorization):
R = {(s, s′) | n = x′² − y′² ∧ 0 ≤ y′ ≤ x′}.
To find a program that is correct with respect to this specification, we consider increasingly complex configurations of x and y, and derive the corresponding Fermat factorization; this yields the following sequence of programs, which are ranked by relative correctness with respect to R (though not by refinement), and culminate in a program that is absolutely correct with respect to R.
p0: abort.
p1: {int r; x=0; y=0; r=0;
     while (r<n) {r=r+2*x+1; x=x+1;}}
p2: {int r; x=0; r=0;
     while (r<n) {r=r+2*x+1; x=x+1;}
     if (r>n) {y=0; while (r>n)
       {r=r-2*y-1; y=y+1;}}}
p3: {int r; x=0; r=0;
     while (r<n) {r=r+2*x+1; x=x+1;}
     while (r>n) {int rsave=r; y=0;
       while (r>n) {r=r-2*y-1; y=y+1;}
       if (r<n) {r=rsave+2*x+1; x=x+1;}}}
Imagine a scenario where our goal is not necessarily to produce a correct program, but rather to produce a sufficiently reliable program, for a prespecified reliability threshold. Now, consider that, according to proposition 4, relative correctness logically implies higher reliability. Hence the programs that we generate in this sequence are more and more reliable; if we can estimate the reliability of each program that we generate in this sequence, then we can imagine a scenario where this stepwise transformation concludes, not when we obtain a correct program, but rather when we obtain a program whose reliability equals or exceeds the prespecified threshold. While we have not yet proven the viability of this approach, it certainly sounds like a worthwhile avenue to pursue; as an exercise, we have found that under the hypothesis of uniform probability distribution of the inputs, the reliability of the sequence of programs given above (p0, p1, p2, p3) is, respectively, (0.0, 0.0133, 0.1328, 1.0).

6 ESTABLISHING RELATIVE CORRECTNESS

Given a program p and a specification R, how can we determine whether some program p′ is or is not more-correct than p with respect to R?

6.1 Testing for Relative Correctness

How do we test a program for relative correctness over another with respect to a given specification? How is that different from testing the program for absolute correctness? We argue that testing a program for relative correctness rather than absolute correctness affects two separate aspects of testing, namely test data generation and oracle design.

Test Data Generation. The essence of test data generation is to approximate an infinite or very large input space by a small representative test data set; clearly, what input space we are trying to approximate influences what test data we select, regardless of the selection criterion that we apply. When we test a program for absolute correctness with respect to a specification R, the relevant input space is dom(R). By contrast, when we test a program for relative correctness over program p with respect to specification R, the relevant input space is dom(R ∩ P).

Oracle Design. Let ω(s, s′) be the oracle that we use to test a program for absolute correctness with respect to specification R. To test a program p′ for relative correctness over program p, we need to check that oracle ω(s, s′) holds only for those inputs s on which program p runs successfully. Hence the oracle of relative correctness, Ω(s, s′), should be written as follows:
Ω(s, s′) ≡ (ω(s, P(s)) ⇒ ω(s, s′)).
This formula shows how to derive the oracle of relative correctness (Ω) from the oracle of absolute correctness (ω); in [25] we discuss how to derive the oracle of absolute correctness (ω(s, s′)) from specification R.

6.2 Proving Relative Correctness

6.2.1 Relative Correctness of Iterative Programs

In principle, we can prove a program p′ more-correct than a program p with respect to specification R by computing P and P′ then comparing dom(R ∩ P) and dom(R ∩ P′); of course, in practice this is usually very difficult. In this section, we briefly explore some preliminary results. We consider a while loop w on space S, of the form {while (t) {b}}, and we denote by B the function of the loop body b and by T the vector that represents the loop condition, T = {(s, s′) | t(s)}. An invariant relation of loop w is a reflexive transitive superset of (T ∩ B); the interested reader is referred to [27] for more details on invariant relations. The following proposition (due to [21]) shows how we can use invariant relations to prove the correctness or the incorrectness of a loop with respect to a specification.

Proposition 6. Let R be a specification on space S, let w be a while statement on S of the form w: {while (t) {b}}, which terminates normally for any state in S, and let V be an invariant relation of w.
Sufficient Condition of Correctness: If V satisfies the condition $V\overline{T} \cap RL \cap (R \cup V \cap \widehat{\overline{T}}) = R$, then w is correct with respect to R.
Necessary Condition of Correctness: If w is correct with respect to R, then the following condition holds for invariant relation V: $(R \cap V)\overline{T} = RL$.

Intuitive interpretation: The sufficient condition of correctness means in effect that the invariant relation V captures enough information about the loop to subsume the specification R; the necessary condition of correctness means that no loop that admits an invariant relation that violates this condition can possibly be correct with respect to R. If we encounter an invariant relation V that does not satisfy the necessary condition of correctness, we conclude that the loop w is not correct with respect to specification R. We say about such invariant relations that they are incompatible with specification R; when a relation does satisfy the necessary condition of correctness, we say about it that it is compatible with the specification, even though "not incompatible" is a better characterization of such a relation.

Given a while loop w and a specification R, we generate all the invariant relations of w and we divide them into two classes: compatible relations and incompatible relations. If
at least one relation (say Q) is incompatible with specification R, then we conclude that the loop is incorrect, and we prepare to repair it; the following proposition provides the basis for doing so.

Proposition 7. Let R be a specification on space S and let w be a while loop on S of the form w: {while (t) {b}}, which terminates for all s in S. Let Q be an invariant relation of w that is incompatible with R; and let C be the largest invariant relation of w such that $W = (C \cap Q) \cap \widehat{\overline{T}}$. Let w′ be a while loop that has C as an invariant relation, terminates for all s in S, and admits an invariant relation Q′ that is compatible with R and satisfies the condition $W' = (C \cap Q') \cap \widehat{\overline{T}}$. Then w′ is strictly more-correct than w.

Interpretation: This proposition provides that if we change the loop in such a way as to replace an incompatible invariant relation (Q) with a compatible invariant relation (Q′) of equal strength (so that $(C \cap Q') \cap \widehat{\overline{T}}$ is deterministic, just as $(C \cap Q) \cap \widehat{\overline{T}}$), while preserving all the other invariant relations (C), then we obtain a more-correct (though not necessarily correct) while loop.

Proof. By hypothesis, Q is incompatible with R, hence we write:
$(R \cap Q)\overline{T} \neq RL$
$\Rightarrow$ {by set theory, $(R \cap Q)\overline{T} \subseteq (R \cap Q)L \subseteq RL$}
$(R \cap Q)\overline{T} \subset RL$
$\Rightarrow$ {by hypothesis, Q′ is compatible}
$(R \cap Q)\overline{T} \subset (R \cap Q')\overline{T}$
$\Rightarrow$ {taking the intersection with C on both sides}
$(R \cap Q \cap C)\overline{T} \subset (R \cap Q' \cap C)\overline{T}$
$\Rightarrow$ {for any vector v and relation R, $Rv = (R \cap \widehat{v})L$}
$(R \cap Q \cap C \cap \widehat{\overline{T}})L \subset (R \cap Q' \cap C \cap \widehat{\overline{T}})L$
$\Rightarrow$ {associativity}
$(R \cap (Q \cap C \cap \widehat{\overline{T}}))L \subset (R \cap (Q' \cap C \cap \widehat{\overline{T}}))L$
$\Rightarrow$ {substitution}
$(R \cap W)L \subset (R \cap W')L$. qed

A fault removal action proceeds through four steps (viz. observation of failure, fault localisation, fault removal, validation); we discuss below how we perform each step, using propositions 6 and 7.

Observation of Failure. If one of the invariant relations (say, Q) is incompatible with R, then the loop is incorrect; hence there is a fault.

Fault Localisation. We focus on the variables that are referenced by relation Q.

Fault Removal. We must change the statements that affect the identified variables without altering the compatible invariant relations (C). Let x1, x2, x3, ... xn be the variables of the program, and let us assume that only x1 and x2 are involved in the definition of Q. In order to know how to modify x1 and x2 in the loop, we write the following condition on x1, x2, x′1, x′2:
$$\exists x_3, x_4, \ldots, x_n, x'_3, x'_4, \ldots, x'_n : \left(\langle x_1, x_2, \ldots, x_n \rangle, \langle x'_1, x'_2, \ldots, x'_n \rangle\right) \in C,$$
where C is the intersection of all the compatible invariant relations.

Validation. Once we change variables x1 and x2, we recompute the new invariant relation Q′ involving these variables; if Q′ is compatible with R, then the new loop is strictly more-correct than the original loop.

This process enables us to remove a fault without testing.

6.2.2 Illustration

We consider the following loop, taken from a C++ financial application, where all the variables except t (of type int) are of type double, and where a and b are positive constants.
w: while (abs(r-p)>ups) {t=t+1; n=n+x; m=m-1;
     l=l*(1+b); k=k+1000; y=n+k; w=w+z; z=(1+a)+z;
     v=w+k; r=(v-y)/y; u=(m-n)/n; d=r-u;}
We consider the following specification, which we are judging the loop against:
$$R = \{(s, s') \mid b < a < 1 \wedge x' = x \wedge w' = w - z \times \frac{1 - (1+a)^{t'-t}}{a} \wedge k' = k + 1000 \times (t' - t) \wedge t \le t' \wedge 0 < l \le l' \wedge z > 0 \wedge l \times (1+b)^{-t} = l' \times (1+b)^{-t'}\}.$$
Analysis of this loop by an invariant relations generator [21] derives fourteen invariant relations, of which five are found to be incompatible with the specification. We select the following incompatible invariant relation for remediation:
$$Q = \{(s, s') \mid l \times (1+b)^{-\frac{z}{1+a}} = l' \times (1+b)^{-\frac{z'}{1+a}}\}.$$
We resolve that to remediate this incompatibility, we must alter variable z and/or variable l. We compute the condition on z and l under which a change in these variables does not alter any of the existing compatible relations, and we find:
z′ ≥ z ∧ (l = l′ ∨ l × (l′ − l) > 0).
We focus our attention on variable z, and consider the possible mutations of the statement {z=(1+a)+z} that preserve the condition z′ ≥ z; for each mutant of this statement, we recompute the new invariant relation that substitutes for Q and check whether it is compatible with R. We find that the statement {z=(1+a)*z} produces a compatible invariant relation, and conclude, by virtue of proposition 7, that the following loop is more-correct with respect to R than the original loop.
wm: while (abs(r-p)>ups) {t=t+1; n=n+x; m=m-1;
     l=l*(1+b); k=k+1000; y=n+k; w=w+z; z=(1+a)*z;
     v=w+k; r=(v-y)/y; u=(m-n)/n; d=r-u;}
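The oracle of relative correctness from Section 6.1, Ω(s, s′) ≡ (ω(s, P(s)) ⇒ ω(s, s′)), is what allows a repaired loop such as wm to be tested against the original w rather than against R alone. The following Python sketch shows one way to build Ω from ω. It is only an illustration under stated assumptions: the names `relative_oracle` and `failing_inputs`, the integer state space, and the toy specification (s′ = s²) with its three toy programs are all hypothetical and do not come from the paper.

```python
from typing import Callable, Iterable, List, Optional

State = int
# A program maps an initial state to a final state, or to None when it
# does not terminate normally (assumed encoding, for illustration only).
Program = Callable[[State], Optional[State]]
# omega(s, s_prime): oracle of absolute correctness derived from R.
Oracle = Callable[[State, Optional[State]], bool]


def relative_oracle(omega: Oracle, p: Program) -> Oracle:
    """Build Omega(s, s') = (omega(s, P(s)) => omega(s, s'))."""
    def big_omega(s: State, s_prime: Optional[State]) -> bool:
        # The implication only constrains inputs on which p succeeds.
        return (not omega(s, p(s))) or omega(s, s_prime)
    return big_omega


def failing_inputs(candidate: Program, oracle: Oracle,
                   test_data: Iterable[State]) -> List[State]:
    """Return the test inputs on which the candidate violates the oracle."""
    return [s for s in test_data if not oracle(s, candidate(s))]


# Hypothetical toy specification: the final state must be the square of s.
def omega(s: State, s_prime: Optional[State]) -> bool:
    return s_prime is not None and s_prime == s * s


def original(s: State) -> State:
    return s                          # correct only for s in {0, 1}


def repaired(s: State) -> State:
    return s * s if s >= 0 else s     # correct for every s >= 0


def regressed(s: State) -> State:
    return s * s if s >= 2 else -1    # loses inputs 0 and 1


TEST_DATA = range(-3, 4)
absolute_failures = failing_inputs(repaired, omega, TEST_DATA)
relative_failures = failing_inputs(
    repaired, relative_oracle(omega, original), TEST_DATA)
regression_failures = failing_inputs(
    regressed, relative_oracle(omega, original), TEST_DATA)
```

In this toy setting the repaired program still fails the absolute oracle on negative inputs, yet passes the relative oracle everywhere, while the regressed program is caught on exactly the inputs that the original handled correctly.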
We have removed a fault from w and shown that the new program wm is strictly more-correct than the original program w: we call this Debugging without Testing. In order to illustrate the difference between absolute correctness and relative correctness, we ran this program on randomly generated test data using the oracle of absolute correctness derived from R; the program fails at the third test execution. But its failure does not mean that our fault removal was wrong; rather it means that while wm is more-correct than w, it is not yet absolutely correct. When we run this loop on randomly generated test data using an oracle that tests for relative correctness rather than absolute correctness (see Section 6.1), it runs for over eight hundred thousand test data without failure.

Running the invariant relations generator on the new loop produces fourteen invariant relations, of which only one is incompatible; it seems that by removing the earlier fault we have remedied four invariant relations at once. Applying the same process to the new loop, we find the following loop, which is correct with respect to R:
wc: while (abs(r-p)>ups) {t=t+1; n=n+x; m=m+1;
     l=l*(1+b); k=k+1000; y=n+k; w=w+z; z=(1+a)*z;
     v=w+k; r=(v-y)/y; u=(m-n)/n; d=r-u;}

6.3 Stepwise Proof of Relative Correctness

In the previous section we have discussed how to prove relative correctness of p′ over p with respect to some specification R without having to compute P and P′ (as they may be too complex). In this section we turn our attention to the other potential source of complexity, which is specification R.

Proposition 8. Let p and p′ be two programs on space S and let R and Q be two specifications on S. If p′ is more-correct than p with respect to R and with respect to Q, then it is more-correct than p with respect to (R ⊔ Q).

Proof. We introduce a lemma that will be useful for our proof:
$\widehat{P}P \subseteq I \wedge Q \subseteq P \Rightarrow (R \cap P)L \cap Q = R \cap Q.$
To this effect, we write:
$(R \cap P)L \cap Q = R \cap Q$
$\Leftrightarrow$ {$(R \cap P)L \cap Q \subseteq Q$, $R \cap Q \subseteq (R \cap P)L$, $R \cap Q \subseteq Q$}
$(R \cap P)L \cap Q \subseteq R$
$\Leftarrow$ {Dedekind, [4]}
$(R \cap P \cap QL)(L \cap \widehat{(R \cap P)}Q) \subseteq R$
$\Leftarrow$ {hypothesis: $Q \subseteq P$}
$(R \cap P)\widehat{(R \cap P)}P \subseteq R$
$\Leftarrow$ {monotonicity of intersection}
$R\widehat{P}P \subseteq R$
$\Leftarrow$ {monotonicity of product}
$\widehat{P}P \subseteq I$
$\Leftarrow$ {hypothesis: $\widehat{P}P \subseteq I$}
true.
Using this lemma, we now show the proposition:
$P' \sqsupseteq_{Q \sqcup R} P$
$\Leftrightarrow$ {definition of relative correctness}
$((Q \sqcup R) \cap P)L \subseteq ((Q \sqcup R) \cap P')L$
$\Leftrightarrow$ {definition of $\sqcup$}
$(((\overline{RL} \cap Q) \cup (\overline{QL} \cap R) \cup (Q \cap R)) \cap P)L \subseteq (((\overline{RL} \cap Q) \cup (\overline{QL} \cap R) \cup (Q \cap R)) \cap P')L$
$\Leftrightarrow$ {factoring L on both sides}
$(\overline{RL} \cap (Q \cap P)L) \cup (\overline{QL} \cap (R \cap P)L) \cup (Q \cap R \cap P)L \subseteq (\overline{RL} \cap (Q \cap P')L) \cup (\overline{QL} \cap (R \cap P')L) \cup (Q \cap R \cap P')L$
$\Leftrightarrow$ {boolean algebra, $P' \sqsupseteq_R P$ and $P' \sqsupseteq_Q P$}
$(Q \cap R \cap P)L \subseteq (Q \cap R \cap P')L$
$\Leftarrow$ {for any relations A, B, $(A \cap B)L \subseteq AL \cap BL$}
$(Q \cap P)L \cap (R \cap P)L \subseteq (Q \cap R \cap P')L$
$\Leftarrow$ {$(Q \cap P)L \subseteq (Q \cap P')L$ and $(R \cap P)L \subseteq (R \cap P')L$}
$(Q \cap P')L \cap (R \cap P')L \subseteq (Q \cap R \cap P')L$
$\Leftarrow$ {rewriting the first L as LL and factoring L}
$((Q \cap P')L \cap (R \cap P'))L \subseteq (Q \cap R \cap P')L$
$\Leftarrow$ {we apply the lemma above to P′ and (R ∩ P′)}
$(Q \cap R \cap P')L \subseteq (Q \cap R \cap P')L$
$\Leftrightarrow$ {tautology}
true. qed

This proposition is interesting in practice, for the following reason: we had found in [3] that complex specifications can be composed from simpler specifications by means of the join operator; this proposition provides that in order to prove that a program p′ is more-correct than a program p with respect to a complex specification R = R′ ⊔ R′′, it is sufficient to prove that p′ is more-correct than p with respect to each component of R.

7 CONCLUDING REMARKS

7.1 Summary

In this paper we have introduced the concept of relative correctness, used it to propose a definition for program faults, then explored the implications of these two concepts on a variety of aspects of testing and fault removal. Among the most salient contributions of this paper, we cite the following:
• A definition of relative correctness, and an analysis of the proposed definition to ensure that it meets all the properties that one wants to see in such a concept.
• A definition of fault and fault removal, and the analysis of monotonic fault removal, as a process that transforms a faulty program into a correct program by a sequence of correctness-enhancing transformations.
• An analysis of mutation-based program repair, highlighting that when repair candidates are evaluated by testing them for absolute correctness rather than relative correctness, one runs the risk of selecting programs that are not adequate repairs, and rejecting programs that are.
• A critique of the concept of fault density, and the introduction of fault depth as perhaps a more meaningful measure of the degree of imperfection of a faulty program; also the observation that for a given fault depth, the higher the fault density the better (which is the opposite of what fault density purports to represent).
• An analysis of techniques for testing that a program is more-correct than another with respect to a specification, and discussion of the difference between testing a program for relative correctness and testing it for absolute correctness.
• A study of techniques for proving, by static analysis, that a program is more-correct than another with respect to a given specification, as well as techniques for decomposing a proof of relative correctness with respect to a compound specification into proofs of relative correctness with respect to its building components.

7.2 Related Work

In [20] Logozzo et al. introduce a technique for extracting and maintaining semantic information across program versions: specifically, they consider an original program P and a variation (version) P′ of P, and they explore the question of extracting semantic information from P, using it to instrument P′ (by means of executable assertions), then pondering what semantic guarantees they can infer about the instrumented version of P′. The focus of their analysis is the condition under which programs P and P′ can execute without causing an abort (due to attempting an illegal operation), which they approximate by sufficient conditions and necessary conditions. They implement their approach in a system called VMV (Verification Modulo Versions) whose goal is to exploit semantic information about P in the analysis of P′, and to ensure that the transition from P to P′ happens without regression; in that case, they say that P′ is correct relative to P. The definition of relative correctness of Logozzo et al. [20] is different from ours, for several reasons: whereas [20] talk about relative correctness between an original program and a subsequent version in the context of adaptive maintenance (where P and P′ may be subject to distinct requirements), we talk about relative correctness between an original (faulty) software product and a revised version of the program (possibly still faulty yet more-correct) in the context of corrective maintenance with respect to a fixed requirements specification; whereas [20] use a set of assertions inserted throughout the program as a specification, we use a relation that maps initial states to final states to specify the standards against which absolute correctness and relative correctness are defined; whereas [20] represent program executions by execution traces (snapshots of the program state at assertion sites), we represent program executions by functions mapping initial states into final states; finally, whereas Logozzo et al. define a successful execution as a trace that satisfies all the relevant assertions, we define it as an initial state/final state pair that falls within the relational specification.

In [14] Lahiri et al. introduce a technique called Differential Assertion Checking for verifying the relative correctness of a program with respect to a previous version of the program. Lahiri et al. explore applications of this technique as a tradeoff between soundness (which they concede) and lower costs (which they hope to achieve). Like the approach of Logozzo et al. [20] (from the same team), the work of Lahiri et al. uses executable assertions as specifications, represents executions by traces, defines successful executions as traces that satisfy all the executable assertions, and targets abort-freedom as the main focus of the executable assertions. Also, they define relative correctness between programs P and P′ as the property that P′ has a larger set of successful traces and a smaller set of unsuccessful traces than P; and they introduce relative specifications as specifications that capture functionality of P′ that P does not have. By contrast, we use input/output (or initial state/final state) relations as specifications, we represent program executions by functions from initial states to final states, we characterize correct executions by initial state/final state pairs that belong to the specification, and we make no distinction between abort-freedom (a.k.a. safety, in [14]) and normal functional properties. Indeed, for us the function of a program is the function that the program defines between its initial states and its final states; the domain of this function is the set of states for which execution terminates normally and returns a well-defined final state. Hence execution of the program on a state s is abort-free if and only if the state is in the domain of the program function; the domain of the program function is part of the function rather than being an orthogonal attribute; hence we view abort-freedom as a special form of functional attribute, rather than being an orthogonal attribute. Another important distinction with [14] is that we do not view relative correctness as a compromise that we accept as a substitute for absolute correctness; rather we argue that in many cases, we ought to test programs for relative correctness rather than absolute correctness, regardless of cost. In other words, whereas Lahiri et al. argue in favor of relative correctness on the grounds that it optimizes a quality vs. cost ratio, we argue in favor of it on the grounds that it optimizes quality.

In [19], Logozzo and Ball introduce a definition of relative correctness whereby a program P′ is correct relative to P (an improvement over P) if and only if P′ has more good traces and fewer bad traces than P. Programs are modeled with trace semantics, and execution traces are compared in terms of executable assertions inserted into P and P′; in order for the comparison to make sense, programs P and P′ have to have the same (or similar) structure and/or
13
there must be a mapping from traces of P to traces of P ′ . operators are applied to these statements and the results are
When P ′ is obtained from P by a transformation, and when tested again against the positive and negative test data to
P ′ is provably correct relative to P , the transformation narrow the set of eligible mutants.
in question is called a verified repair. Logozzo and Ball In [18] Le Goues et al. survey existing technology
introduce an algorithm that specializes in deriving program in automated program repair and identify open research
repairs from a predefined catalog that is targeted to spe- challenges; among the criteria for automated repair meth-
cific program constructs, such as: contracts, initializations, ods, they cite applicability (extent of real-world relevance),
guards, floating point comparisons, etc. Like the work cited scalability (ability to operate effectively and efficiently for
above ( [14], [20]), Logozzo and Ball model programs products of realistic size), generality (scope of application
by execution traces and distinguish between two types domain, types of faults repaired), and credibility (extent of
of failures: contract violations, when functional properties confidence in the soundness of the repair tool). Among the
are not satisfied; and run-time errors, when the execution research issues they identify, they cite mining specifications
causes an abort; for the reasons we discuss above, we do for extant software, introducing formal methods to improve
not make this distinction, and model the two aspects with repair quality and user trust, and modeling monotonic fault
the same relational framework. Logozzo and Ball deploy removal.
their approach in an automated tool based on the static
analyzer cccheck, and assess their tool for effectiveness and 7.3 Assessment and Prospects
efficiency. The research presented in this paper is clearly in its infancy;
In [28], Nguyen et al. present an automated repair method based on symbolic execution, constraint solving, and program synthesis; they call their method SemFix, on the grounds that it performs program repair by means of semantic analysis. This method combines three techniques: fault isolation by means of statistical analysis of the possible suspect statements; statement-level specification inference, whereby a local specification is inferred from the global specification and the product structure; and program synthesis, whereby a corrected statement is computed from the local specification inferred in the previous step. The method is organized in such a way that program synthesis is modeled as a search problem under constraints, and possible correct statements are inspected in order of increasing complexity. When programs are repaired by SemFix, they are tested for (absolute) correctness against some predefined test data suite; as we argue throughout this paper, it is not sensible to test a program for absolute correctness after a repair, unless we have reason to believe that the fault we have just repaired is the last fault of the program (how do we ever know that?). By advocating testing for relative correctness, we enable the tester to focus on one fault at a time, and ensure that other faults do not interfere with our assessment of whether the fault under consideration has or has not been repaired adequately.

In [30], Weimer et al. discuss an automated program repair method that takes as input a faulty program, along with a set of positive tests (i.e. test data on which the program is known to perform correctly) and a set of negative tests (i.e. test data on which the program is known to fail), and returns a set of possible patches. The proposed method proceeds by keeping track of the execution paths that are visited by successful executions and those that are visited by unsuccessful executions, and using this information to focus the search for repairs on those statements that appear in the latter paths and not in the former paths. Mutation

we have merely introduced some new definitions of old concepts, and shown the ramifications that stem from these definitions. Yet we feel that in doing so, we have opened up many new avenues of investigation, which we intend to explore:
• Debugging without Testing. Traditionally, it is so inconceivable to debug a program without testing it that these two words are often used interchangeably; yet section 6.2 shows precisely that this can be done, albeit (so far) in a special context; we plan to broaden the scope of this line of research.
• Programming without Refinement. In section 5.6, we argue that while refinement-based program derivation is a sufficient condition for producing correct programs, it may be viewed as unnecessarily strong; as a substitute, we show how we can derive a program by successive correctness-enhancing transformations rather than the traditional process of successive correctness-preserving transformations. We plan to elaborate on this idea.
• Mutation Testing with Relative Correctness. In light of the discussions of section 5.4, it appears that if we deploy mutation testing with relative correctness rather than absolute correctness, we may significantly improve the precision and recall of the technique; we plan to test this conjecture in practice.
• Measuring Faultiness with Fault Depth. The discussions of section 5.1.3 appear to show that fault depth is a better measure of product imperfection (failure rate, repair effort) than fault density; we plan to test this hypothesis.
• Testing for Relative Correctness. We plan to investigate test data generation strategies that are appropriate for relative correctness, and to generate broadly applicable conditions of relative correctness in the style of proposition 7.
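The contrast between testing a repair for absolute correctness and testing it for relative correctness can be illustrated with a small sketch. It rests on an assumption that is ours, not the paper's: relative correctness is approximated on a finite test suite by comparing the sets of test inputs on which each program satisfies an oracle, whereas the paper defines it over specifications and the whole input space. All names below (`competence_on`, `more_correct_on`, the toy programs and the oracle) are hypothetical and chosen for illustration only.

```python
from typing import Callable, Iterable, Set

Program = Callable[[int], int]
Oracle = Callable[[int, int], bool]  # oracle(input, output) -> acceptable?


def competence_on(program: Program, oracle: Oracle, tests: Iterable[int]) -> Set[int]:
    """Return the test inputs on which the program satisfies the oracle."""
    passed = set()
    for x in tests:
        try:
            if oracle(x, program(x)):
                passed.add(x)
        except Exception:
            pass  # an execution that raises is counted as a failure
    return passed


def absolutely_correct_on(program: Program, oracle: Oracle, tests: Iterable[int]) -> bool:
    """True if the program passes every test in the suite."""
    tests = list(tests)
    return competence_on(program, oracle, tests) == set(tests)


def more_correct_on(candidate: Program, original: Program,
                    oracle: Oracle, tests: Iterable[int]) -> bool:
    """True if the candidate passes every test that the original passes."""
    tests = list(tests)
    return competence_on(original, oracle, tests) <= competence_on(candidate, oracle, tests)


def strictly_more_correct_on(candidate: Program, original: Program,
                             oracle: Oracle, tests: Iterable[int]) -> bool:
    """True if the candidate passes all tests the original passes, and at least one more."""
    tests = list(tests)
    return competence_on(original, oracle, tests) < competence_on(candidate, oracle, tests)


# Toy specification (assumed for illustration): the output must equal abs(x).
def oracle(x: int, y: int) -> bool:
    return y == abs(x)


def original(x: int) -> int:
    if x < 0:
        return x      # fault 1: should return -x
    if x > 10:
        return 10     # fault 2: should return x
    return x


def repaired(x: int) -> int:
    if x < 0:
        return -x     # fault 1 removed
    if x > 10:
        return 10     # fault 2 still present
    return x


TESTS = range(-3, 14)

if __name__ == "__main__":
    print("repaired absolutely correct:", absolutely_correct_on(repaired, oracle, TESTS))
    print("repaired strictly more correct than original:",
          strictly_more_correct_on(repaired, original, oracle, TESTS))
```

In this toy setting the repaired program still fails the absolute-correctness check because of the remaining second fault, yet the relative-correctness check recognizes that the first fault has been removed; this is the sense in which one fault can be assessed at a time.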
Acknowledgements

The authors are grateful to Dr Kazunori Sakamoto, from the National Institute of Informatics, Tokyo, Japan, for feedback on an earlier version of this paper.

REFERENCES

[1] A. Arcuri and X. Yao. A novel co-evolutionary approach to automatic software bug fixing. In CEC 2008, 2008.
[2] Algirdas Avizienis, Jean Claude Laprie, Brian Randell, and Carl E. Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11–33, 2004.
[3] N. Boudriga, F. Elloumi, and A. Mili. The lattice of specifications: Applications to a specification methodology. Formal Aspects of Computing, 4:544–571, 1992.
[4] Ch. Brink, W. Kahl, and G. Schmidt. Relational Methods in Computer Science. Springer Verlag, January 1997.
[5] D. Kim, J. Nam, J. Song, and S. Kim. Automatic patch generation learned from human-written patches. In ICSE 2013, pages 802–811, 2013.
[6] Nafi Diallo, Wided Ghardallou, and Ali Mili. Program derivation by correctness enhancements. In Refinement 2015, Oslo, Norway, June 2015.
[7] E.W. Dijkstra. A Discipline of Programming. Prentice Hall, 1976.
[8] A. Gonzalez-Sanchez, R. Abreu, H-G. Gross, and A.J.C. van Gemund. Prioritizing tests for fault localization through ambiguity group reduction. In Proceedings, Automated Software Engineering, Lawrence, KS, 2011.
[9] Divya Gopinath, Mohammad Zubair Malik, and Sarfraz Khurshid. Specification based program repair using SAT. In Proceedings, TACAS, pages 173–188, 2011.
[10] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. GenProg: A generic method for automated software repair. IEEE Transactions on Software Engineering, 38(1), 2012.
[11] D. Gries. The Science of Programming. Springer Verlag, 1981.
[12] E.C.R. Hehner. A Practical Theory of Programming. Prentice Hall, 1992.
[13] C.A.R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–583, October 1969.
[14] Shuvendu K. Lahiri, Kenneth L. McMillan, Rahul Sharma, and Chris Hawblitzel. Differential assertion checking. In Proceedings, ESEC/SIGSOFT FSE, pages 345–355, 2013.
[15] J.C. Laprie. Dependability: its attributes, impairments and means. In Predictably Dependable Computing Systems, pages 1–19. Springer Verlag, 1995.
[16] Jean Claude Laprie. Dependability: Basic Concepts and Terminology: in English, French, German, Italian and Japanese. Springer Verlag, Heidelberg, 1991.
[17] Jean Claude Laprie. Dependable computing: Concepts, challenges, directions. In Proceedings, COMPSAC, 2004.
[18] Claire Le Goues, Stephanie Forrest, and Westley Weimer. Current challenges in automatic software repair. Software Quality Journal, 21(3):421–443, 2013.
[19] Francesco Logozzo and Thomas Ball. Modular and verified automatic program repair. In Proceedings, OOPSLA, pages 133–146, 2012.
[20] Francesco Logozzo, Shuvendu Lahiri, Manuel Fähndrich, and Sam Blackshear. Verification modulo versions: Towards usable verification. In Proceedings, PLDI, 2014.
[21] Asma Louhichi, Wided Ghardallou, Khaled Bsaies, Lamia Labed Jilani, Olfa Mraihi, and Ali Mili. Verifying loops with invariant relations. International Journal of Critical Computer Based Systems, 5(1/2):78–102, 2014.
[22] Yu Seung Ma, Jeff Offutt, and Yong Rae Kwon. MuJava: An automated class mutation system. Software Testing, Verification and Reliability, 15(2):97–133, June 2005.
[23] Z. Manna. A Mathematical Theory of Computation. McGraw Hill, 1974.
[24] Ali Mili, Marcelo Frias, and Ali Jaoua. On faults and faulty programs. In Peter Hoefner, Peter Jipsen, Wolfram Kahl, and Martin Eric Mueller, editors, Proceedings, RAMICS: 14th International Conference on Relational and Algebraic Methods in Computer Science, volume 8428 of Lecture Notes in Computer Science, Marienstatt, Germany, April 28–May 1st 2014. Springer.
[25] Ali Mili and Fairouz Tchier. Software Testing: Operations and Concepts. John Wiley and Sons, 2015.
[26] H.D. Mills, V.R. Basili, J.D. Gannon, and D.R. Hamlet. Structured Programming: A Mathematical Approach. Allyn and Bacon, Boston, MA, 1986.
[27] Olfa Mraihi, Asma Louhichi, Lamia Labed Jilani, Jules Desharnais, and Ali Mili. Invariant assertions, invariant relations, and invariant functions. Science of Computer Programming, 78(9):1212–1239, September 2013.
[28] Hoang Duong Thien Nguyen, DaWei Qi, Abhik Roychoudhury, and Satish Chandra. SemFix: Program repair via semantic analysis. In Proceedings, ICSE, pages 772–781, 2013.
[29] V. Debroy and W.E. Wong. Using mutation to automatically suggest fixes to faulty programs. In Proceedings, ICST 2010, pages 65–74, 2010.
[30] W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. Automatically finding patches using genetic programming. In Proceedings, ICSE 2009, pages 364–374, 2009.
[31] L. Zemín, S. Gutiérrez, S. Perez de Rosso, N. Aguirre, A. Mili, A. Jaoua, and M. Frias. Stryker: Scaling specification-based program repair by pruning infeasible mutants with SAT. Technical report, ITBA, Buenos Aires, Argentina, 2015.

Nafi Diallo earned a Bachelor's degree from Gunma University in Japan and a Master's degree from NJIT; she is currently a PhD student and teaching assistant at NJIT in Newark, NJ.

Wided Ghardallou earned the M.S. and PhD from the University of Tunis El Manar, and currently serves as an Assistant Professor at the MIS Institute of Kairouan, Tunisia.

Jules Desharnais holds an M.S. from Laval University and a PhD from McGill University, and currently serves on the faculty of Laval University in Quebec City, Canada.

Marcelo Frias holds an M.S. from the University of Buenos Aires and a PhD from PUC in Rio de Janeiro. He serves on the faculty of ITBA in Buenos Aires, Argentina.

Ali Jaoua holds an engineering degree from ENSEEIHT in Toulouse, France and a PhD from the University of Toulouse, France. He serves on the faculty of Qatar University in Doha, Qatar.

Ali Mili holds a PhD from the University of Illinois and a Doctorat d’Etat from the University of Grenoble. He is on the faculty of NJIT in Newark, NJ.
