What Is A Fault and Why Does It Matter
Abstract—Faults are an important concept in the study of system dependability, and most approaches to dependability can be characterized by the way in which they deal with faults (e.g., fault avoidance, fault removal, fault tolerance, fault forecasting, etc.). In their seminal work on modeling dependable computing, Laprie et al. define a fault as the adjudged or hypothesized cause of an error. In this paper, we propose a more formal definition of a fault in the context of software products, and discuss the diverse implications of our definition.
Index Terms—Correctness, Partial Correctness, Total Correctness, Relative Correctness, Absolute Correctness, Software
Fault, Fault Removal, Fault Density, Software Testing, Software Repair, Software Design.
For illustration, we consider the following program, say p, taken from [8] (with some modifications):

    #include <iostream> ... ... ...             // line 1
    void count (char q[])                       // 2
    {int let, dig, other, i, l; char c;         // 3
     i=0; let=0; dig=0; other=0; l=strlen(q);   // 4
     while (i<l) {                              // 5
       c = q[i];                                // 6
       if ('A'<=c && 'Z'>c) let+=2;             // 7
       else                                     // 8
         if ('a'<=c && 'z'>=c) let+=1;          // 9
         else                                   // 10
           if ('0'<=c && '9'>=c) dig+=1;        // 11
           else                                 // 12
             other+=1;                          // 13
       i++;}                                    // 14
     printf ("%d %d %d\n", let, dig, other);}   // 15

We let S be the space defined by the declarations of line 3, to which we add variable os which represents the output stream (in C++ parlance), and we let R be the following specification.

• The pair formed by (let+=2) and ('Z'>c) is a fault in p, and its substitution by (let+=1) and ('Z'>=c) is a fault removal, yielding the more-correct program p11.
• The program p11 is correct with respect to R.

Note that the statement ('Z'>c) is a fault in p01 but it is not a fault in p; also note that the statement ('Z'>c), in combination with the statement (let+=2), is a fault in p, but it is not a fault in p by itself.
4 VALIDATION OF RELATIVE CORRECTNESS

4.1 Litmus Tests

How do we know that our definition of relative correctness is sound? To answer this question, we list some properties that a definition of relative correctness ought to meet; then we check that our definition does satisfy them.

• Reflexivity and Transitivity, and non-Antisymmetry. Of ...
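As a sketch of what the first of these properties requires, write p ⊑R p′ for "p′ is more-correct than p with respect to R" (borrowing the symbol ⊑R from Figure 3), and assume, as in section 6.2, that the comparison is made on the domains dom(R ∩ P) and dom(R ∩ P′), i.e., p ⊑R p′ ≡ dom(R ∩ P) ⊆ dom(R ∩ P′). Under that assumption:

    Reflexivity:      p ⊑R p, since dom(R ∩ P) ⊆ dom(R ∩ P).
    Transitivity:     p ⊑R p′ ∧ p′ ⊑R p′′ ⇒ p ⊑R p′′, by transitivity of ⊆.
    Non-antisymmetry: p ⊑R p′ ∧ p′ ⊑R p does not entail p = p′;
                      it only entails dom(R ∩ P) = dom(R ∩ P′).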
... with respect to R. Even though it is grossly artificial, this example shows that the same fault may require more than one removal to be completely eliminated from a program. Hence we adopt fault depth as a measure of program faultiness. Unlike fault density, fault depth does decrease by one whenever we remove a fault that is in the minimal path. Given a faulty program p and a program p′ obtained from p by monotonic fault removal, if the fault removal is in a minimal sequence to a correct program, then we can write:

    depth(p) = 1 + depth(p′).
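For instance, if, as the example above suggests, the program p considered earlier requires two successive removals, p to p01 to p11, to reach a correct program (a hypothetical reading of that example, offered only for illustration), then applying this equation along the minimal sequence gives:

    depth(p11) = 0,   depth(p01) = 1 + depth(p11) = 1,   depth(p) = 1 + depth(p01) = 2.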
5.2 Monotonic Fault Removal

As programmers, we are all familiar with the frustration of trying to remove faults from a program, only to find that we are running in circles, patching the program at one end only to break it down at another. This would not happen if we restricted program transformations to provably monotonic fault removals; with such a discipline, we are assured that with each transformation, the program becomes more-correct. Of course, ensuring that a program transformation qualifies as a monotonic fault removal is generally a non-trivial exercise; we postpone the discussion of how to do this to section 6. Here we simply argue that in the same way that stepwise refinement provides a logical framework for software design, which proceeds monotonically from a specification to a program through correctness-preserving transformations, relative correctness provides a logical framework for stepwise fault removal, which starts from an incorrect program and proceeds monotonically towards a correct program through correctness-enhancing transformations. This process is illustrated in Figure 3. The concept of relative correctness ought to play for software fault removal the same role that refinement plays for software design: first, as a logical framework for reasoning about faults and fault removal; second, as an ideal process to be followed scrupulously when the stakes warrant it; and third, as a yardstick against which large scale methods can be evaluated.

Fig. 3. A Framework for Monotonic Fault Removal (the design side proceeds by correctness-preserving refinements, ⊑, from a specification to a program; the repair side proceeds from an incorrect program towards a correct program by correctness-enhancing transformations, ⊑R).

5.3 A Software Testing Lifecycle

The traditional lifecycle of software testing is triggered by an observation of failure and proceeds by analyzing the failure, tracing it back to a hypothetical fault, removing the fault, then testing the program for correctness. We argue that it is wrong to test the program for correctness at the end of this process, unless we have reason to believe that the fault we have just removed is the last fault of the program. Given that in general we have no way to check such an assumption, there is no reason we should expect the program to be correct, even if we assume that the fault was properly removed. Instead, the most we can hope for is that the new program is more-correct than the original, and we should be testing it for relative correctness rather than absolute correctness.

5.4 Software Repair

Most techniques for program repair [1], [5], [9], [10], [29], [31] proceed by applying transformations on an original faulty program. These transformations may be macro-transformations (including multi-site program modifications), or micro-modifications (intra-statement) using mutation operators such as those provided by the muJava [22] program mutation tool. Two main approaches exist towards assessing the suitability of the generated transformations: test-based techniques [1], [5], [10], [29] (which use the successful execution of the candidate program on a test suite as the acceptance criterion) or specification-based techniques [9], [31] (which use a specification and some sort of constraint solving to determine if the new code complies with the specification). Both techniques share the following feature: repair candidates that fail in some test, or that do not comply with the specification, are discarded and new candidates are examined, while those that succeed are considered as potential fixes.

We argue that when mutants are evaluated on the basis of their execution on a sample test data T, both the decision to retain successful mutants and the decision to reject unsuccessful mutants are wrong:

Fig. 4. Poor Precision, Poor Recall (two panels, (a) and (b), relating the test data T to the competence domain CD of the original program and the competence domain CD′ of the mutant).

• As Figure 4(a) shows (if CD is the competence domain of the original program and CD′ is the competence domain of the mutant), a mutant may pass the test T (since T ⊆ CD′) yet is not more-correct than the original (since CD is not a subset of CD′).
• As Figure 4(b) shows, a mutant may fail the test T (since T is not a subset of CD′) and yet still be more-correct than the original (since CD is a subset of CD′).

As a result, neither the precision nor the recall of the selection algorithm is assured.
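To make the two cases concrete, here is a small hypothetical instance (the sets below are ours, chosen only to illustrate the bullets above, with individual inputs written as integers):

    Case (a): CD = {1, 2}, CD′ = {2, 3}, T = {2, 3}. Then T ⊆ CD′, so the mutant passes
    the test; yet CD ⊄ CD′ (input 1 is lost), so the mutant is not more-correct than
    the original.
    Case (b): CD = {1}, CD′ = {1, 2}, T = {1, 3}. Then T ⊄ CD′, so the mutant fails the
    test (on input 3); yet CD ⊆ CD′, so the mutant is more-correct than the original.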
5.5 Multiple Mutation

Debroy and Wong [29] use a single muJava mutation in order to generate fix candidates. A clear limitation of such an approach is that many faults will not be fixed; this happens in the case of multi-site faults (that span through more than one program location), as well as whenever the program under analysis has multiple faults. The natural alternative is to apply multiple mutations. This is the case in tools such as those presented in [9] and [31]. The impact of relative correctness on multiple mutation testing depends on the reason for deploying multiple mutations; we see two possible scenarios, which we will discuss in turn.

Multiple mutations are deployed to repair multiple faults. When one uses a test of absolute correctness to assess the validity of program repairs, one has to remove all faults at once in order for the test to be meaningful. Multiple mutation proceeds by applying mutation operators at different places in the program then testing the resulting program for absolute correctness. We argue that with relative correctness, it is no longer necessary to consider several faults at once, since we can characterize fault removals one fault at a time. Managing faults one at a time offers many advantages: First and foremost, it spares us the massive combinatorial explosion that stems from applying several simultaneous mutations through the program; second, it spares us the trouble of dealing with many fault removals at once, when we do not know how each fault removal affects others.

Multiple mutations are deployed to repair multi-site elementary faults. In this case, it is sensible to deploy multiple mutations, but note that the multiplicity of the mutation is not the estimated number of faults we are trying to repair simultaneously but rather the multiplicity of the multi-site elementary faults we are trying to repair individually, usually a much smaller number.
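As a rough, hypothetical illustration of the combinatorial point made above (the figures are ours, not the paper's): if a repair tool considers m candidate mutations at each of k suspect sites, then exploring simultaneous multi-site mutants means examining on the order of (m + 1)^k − 1 variants, whereas treating fault removals one at a time means examining on the order of m × k variants per removal step; for m = 10 and k = 5, that is roughly 161,050 candidates versus 50.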
5.6 Software Design

In section 4.2.4, we have found that program P′ refines program P if and only if P′ is more-correct than P with respect to any specification. This sheds new light on program derivation by successive refinements, which requires that at each stage of this process, we transform a program into a more-refined program. According to our discussion of section 4.2.4, this process requires that at each stage, we transform a program, say P, into a program P′ that is more-correct than P with respect to any specification. But this raises the question: why should P′ be more-correct than P with respect to any specification when we are only interested in specification R? Is it possible that the requirement of refinement is too strong? To explore this venue, we revisit the process depicted in Figure 3, and imagine that instead of designing a (possibly incorrect) program then proceeding with correctness-enhancing transformations towards a correct program, we start with the (trivially incorrect) abort program, and transform it into increasingly more-correct (rather than more-refined) programs until we find a correct program. As an illustration of this process, we briefly present an example borrowed from [6] (to which the interested reader is referred for further details). We let S be the space defined by natural variables x, y and n, and we let R be the following specification (known as Fermat's factorization):

    R = {(s, s′) | n = x′² − y′² ∧ 0 ≤ y′ ≤ x′}.

To find a program that is correct with respect to this specification, we consider increasingly complex configurations of x and y, and derive the corresponding Fermat factorization; this yields the following sequence of programs, which are ranked by relative correctness with respect to R (though not by refinement), and culminate in a program that is absolutely correct with respect to R.

    p0: abort.
    p1: {int r; x=0; y=0; r=0;
         while (r<n) {r=r+2*x+1; x=x+1;}}
    p2: {int r; x=0; r=0;
         while (r<n) {r=r+2*x+1; x=x+1;}
         if (r>n) {y=0;
           while (r>n) {r=r-2*y-1; y=y+1;}}}
    p3: {int r; x=0; r=0;
         while (r<n) {r=r+2*x+1; x=x+1;}
         while (r>n) {int rsave=r; y=0;
           while (r>n) {r=r-2*y-1; y=y+1;}
           if (r<n) {r=rsave+2*x+1; x=x+1;}}}
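For readers who wish to run this derivation, the following is a self-contained rendering of p3 (the packaging is ours: the declarations, the input reading, the initialization of y, and the final check against R are added for illustration, and we assume an input n that admits a Fermat factorization, e.g. any odd natural number):

    #include <cstdio>
    #include <cassert>

    int main() {
      int n = 0, x = 0, y = 0;  // space S: natural variables x, y, n
                                // (y initialized so the perfect-square case n = x*x - 0 is well defined)
      if (scanf("%d", &n) != 1 || n < 0) return 1;
      // p3, as derived above
      int r; x = 0; r = 0;
      while (r < n) { r = r + 2*x + 1; x = x + 1; }    // exits with r == x*x and r >= n
      while (r > n) {
        int rsave = r; y = 0;
        while (r > n) { r = r - 2*y - 1; y = y + 1; }  // now r == rsave - y*y
        if (r < n) { r = rsave + 2*x + 1; x = x + 1; } // no exact y for this x: move to the next x
      }
      // the final state should satisfy R: n = x*x - y*y with 0 <= y <= x
      assert(n == x*x - y*y && 0 <= y && y <= x);
      printf("%d = %d*%d - %d*%d\n", n, x, x, y, y);
      return 0;
    }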
Imagine a scenario where our goal is not necessarily to produce a correct program, but rather to produce a sufficiently reliable program, for a prespecified reliability threshold. Now, consider that, according to proposition 4, relative correctness logically implies higher reliability. Hence the programs that we generate in this sequence are more and more reliable; if we can estimate the reliability of each program that we generate in this sequence, then we can imagine a scenario where this stepwise transformation concludes, not when we obtain a correct program, but rather when we obtain a program whose reliability equals or exceeds the pre-specified threshold. While we have not yet proven the viability of this approach, it certainly sounds like a worthwhile venue to pursue; as an exercise, we have found that under the hypothesis of uniform probability distribution of the inputs, the reliability of the sequence of programs given above (p0, p1, p2, p3) is, respectively, (0.0, 0.0133, 0.1328, 1.0).
6 ESTABLISHING RELATIVE CORRECTNESS

Given a program p and a specification R, how can we determine whether some program p′ is or is not more-correct than p with respect to R?

6.1 Testing for Relative Correctness

How do we test a program for relative correctness over another with respect to a given specification? How is that different from testing the program for absolute correctness? We argue that testing a program for relative correctness rather than absolute correctness affects two separate aspects of testing, namely test data generation and oracle design.

Test Data Generation. The essence of test data generation is to approximate an infinite or very large input space by a small representative test data set; clearly, what input space we are trying to approximate influences what test data we select, regardless of the selection criterion that we apply. When we test a program for absolute correctness with respect to a specification R, the relevant input space is dom(R). By contrast, when we test a program for relative correctness over program p with respect to specification R, the relevant input space is dom(R ∩ P).

Oracle Design. Let ω(s, s′) be the oracle that we use to test a program for absolute correctness with respect to specification R. To test a program p′ for relative correctness over program p, we need to check that oracle ω(s, s′) holds only for those inputs s on which program p runs successfully. Hence the oracle of relative correctness, Ω(s, s′), should be written as follows:

    Ω(s, s′) ≡ (ω(s, P(s)) ⇒ ω(s, s′)).

This formula shows how to derive the oracle of relative correctness (Ω) from the oracle of absolute correctness (ω); in [25] we discuss how to derive the oracle of absolute correctness (ω(s, s′)) from specification R.
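As an illustration of how this oracle might drive a test, here is a minimal, self-contained sketch in C++ (the state space, the specification, and the two programs below are toy placeholders of our own, not artifacts of the paper; only the shape of Omega mirrors the formula above):

    #include <cstdio>
    #include <vector>

    // Toy setting, for illustration only: the state is a single integer x, and
    // specification R requires the final x to be the absolute value of the initial x.
    struct State { int x; };

    // omega(s, s'): oracle of absolute correctness with respect to R
    bool omega(const State& s, const State& s2) {
      return s2.x == (s.x < 0 ? -s.x : s.x);
    }

    // hypothetical original program p (faulty) and candidate program p'
    State P(State s)      { if (s.x < 0) s.x = -s.x - 1; return s; }
    State Pprime(State s) { if (s.x < 0) s.x = -s.x;     return s; }

    // Omega(s, s') == (omega(s, P(s)) => omega(s, s')): oracle of relative correctness over p
    bool Omega(const State& s, const State& s2) {
      return !omega(s, P(s)) || omega(s, s2);
    }

    int main() {
      std::vector<State> T = { {-3}, {0}, {4} };  // sample test data T
                                                  // (inputs on which p fails are handled vacuously by Omega)
      bool ok = true;
      for (const State& s : T) ok = ok && Omega(s, Pprime(s));
      printf("p' passes the relative-correctness oracle on T: %s\n", ok ? "yes" : "no");
      return 0;
    }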
6.2 Proving Relative Correctness

6.2.1 Relative Correctness of Iterative Programs

In principle, we can prove a program p′ more-correct than a program p with respect to specification R by computing P and P′ then comparing dom(R ∩ P) and dom(R ∩ P′); of course, in practice this is usually very difficult. In this section, we briefly explore some preliminary results. We consider a while loop w on space S, of the form {while (t) {b}}, and we denote by B the function of the loop body b and by T the vector that represents the loop condition, T = {(s, s′) | t(s)}. An invariant relation of loop w is a reflexive transitive superset of (T ∩ B); the interested reader is referred to [27] for more details on invariant relations.
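For instance, for the loop of program p1 in section 5.6, while (r<n) {r=r+2*x+1; x=x+1;}, the body preserves the quantity r − x², so the following relation appears to qualify as an invariant relation in the sense above (our example, assuming the local variable r is taken to be part of the space):

    V = {(s, s′) | r′ − x′² = r − x²}.

Indeed V is reflexive and transitive, and it is a superset of (T ∩ B), since one execution of the body under the loop condition maps (r, x) to (r + 2x + 1, x + 1), and (r + 2x + 1) − (x + 1)² = r − x².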
The following proposition (due to [21]) shows how we can use invariant relations to prove the correctness or the incorrectness of a loop with respect to a specification.

Proposition 6. Let R be a specification on space S, let w be a while statement on S of the form w: {while (t) {b}}, which terminates normally for any state in S, and let V be an invariant relation of w.
Sufficient Condition of Correctness: If V satisfies the following condition, VT ∩ RL ∩ (R ∪ V ∩ T̂) = R, then w is correct with respect to R.
Necessary Condition of Correctness: If w is correct with respect to R then the following condition holds for invariant relation V: (R ∩ V)T = RL.

Intuitive interpretation: The sufficient condition of correctness means in effect that the invariant relation V captures enough information about the loop to subsume the specification R; the necessary condition of correctness means that no loop that admits an invariant relation that violates this condition can possibly be correct with respect to R. If we encounter an invariant relation V that does not satisfy the necessary condition of correctness, we conclude that the loop w is not correct with respect to specification R. We say about such invariant relations that they are incompatible with specification R; when a relation does satisfy the necessary condition of correctness, we say about it that it is compatible with the specification, even though not incompatible is a better characterization of such a relation. Given a while loop w and a specification R, we generate all the invariant relations of w and we divide them into two classes: compatible relations and incompatible relations. If at least one relation (say Q) is incompatible with specification R, then we conclude that the loop is incorrect, and we prepare to repair it; the following proposition provides the basis for doing so.

Proposition 7. Let R be a specification on space S and let w be a while loop on S of the form w: {while (t) {b}} which terminates for all s in S. Let Q be an invariant relation of w that is incompatible with R; and let C be the largest invariant relation of w such that W = (C ∩ Q) ∩ T̂. Let w′ be a while loop that has C as an invariant relation, terminates for all s in S, and admits an invariant relation Q′ that is compatible with R and satisfies the condition W′ = (C ∩ Q′) ∩ T̂. Then w′ is strictly more-correct than w.

... to modify x1 and x2 in the loop, we write the following condition on x1, x2, x′1, x′2:

    ∃x3, x4, ..., xn, x′3, x′4, ..., x′n : (⟨x1, x2, x3, ..., xn⟩, ⟨x′1, x′2, x′3, ..., x′n⟩) ∈ C,

where C is the intersection of all the compatible invariant relations.

Validation. Once we change variables x1 and x2, we recompute the new invariant relation Q′ involving these variables; if Q′ is compatible with R, then the new loop is strictly more-correct than the original loop. This process enables us to remove a fault without testing.
6.3 Stepwise Proof of Relative Correctness

In the previous section we have discussed how to prove relative correctness of p′ over p with respect to some specification R without having to compute P and P′ (as they may be too complex). In this section we turn our attention to the other potential source of complexity, which is specification R.

Proposition 8. Let p and p′ be two programs on space S and let R and Q be two specifications on S. If p′ is more-correct than p with respect to R and with respect to Q then it is more-correct than p with respect to (R ⊔ Q).

Proof. We introduce a lemma that will be useful for our proof:

    P̂P ⊆ I ∧ Q ⊆ P ⇒ (R ∩ P)L ∩ Q = R ∩ Q.

To this effect, we write:

    (R ∩ P)L ∩ Q = R ∩ Q
    ⇔ {(R ∩ P)L ∩ Q ⊆ Q, R ∩ Q ⊆ (R ∩ P)L, R ∩ Q ⊆ Q}
    (R ∩ P)L ∩ Q ⊆ R
    ⇐ {Dedekind, [4]}
    (R ∩ P ∩ QL)(L ∩ (R ∩ P)̂ Q) ⊆ R
    ⇐ {hypothesis: Q ⊆ P}
    (R ∩ P)(R ∩ P)̂ P ⊆ R
    ⇐ {monotonicity of intersection}
    R P̂P ⊆ R
    ⇐ {monotonicity of product}
    P̂P ⊆ I
    ⇐ {hypothesis: P̂P ⊆ I}
    true.

This proposition is interesting in practice, for the following reason: We had found in [3] that complex specifications can be composed from simpler specifications by means of the join operator; this proposition provides that in order to prove that a program p′ is more-correct than a program p with respect to a complex specification R = R′ ⊔ R′′, it is sufficient to prove that p′ is more-correct than p with respect to each component of R.

7 CONCLUDING REMARKS

7.1 Summary

In this paper we have introduced the concept of relative correctness, used it to propose a definition for program faults, then explored the implications of these two concepts on a variety of aspects of testing and fault removal. Among the most salient contributions of this paper, we cite the following:

• A definition of relative correctness, and an analysis of the proposed definition to ensure that it meets all the properties that one wants to see in such a concept.
• A definition of fault and fault removal, and the analysis of monotonic fault removal, as a process that transforms a faulty program into a correct program by a sequence of correctness-enhancing transformations.
• An analysis of mutation-based program repair, highlighting that when repair candidates are evaluated by testing them for absolute correctness rather than relative correctness, one runs the risk of selecting programs that are not adequate repairs, and rejecting programs that are.
• A critique of the concept of fault density, and the introduction of fault depth as perhaps a more meaningful measure of the degree of imperfection of a faulty program; also the observation that for a given fault depth, the higher the fault density the better (which is the opposite of what fault density purports to represent).
• An analysis of techniques for testing that a program is more-correct than another with respect to a specification, and discussion of the difference between testing a program for relative correctness and testing it for absolute correctness.
• A study of techniques for proving, by static analysis, that a program is more-correct than another with respect to a given specification, as well as techniques for decomposing a proof of relative correctness with respect to a compound specification into proofs of relative correctness with respect to its building components.
7.2 Related Work

In [20] Logozzo et al. introduce a technique for extracting and maintaining semantic information across program versions: specifically, they consider an original program P and a variation (version) P′ of P, and they explore the question of extracting semantic information from P, using it to instrument P′ (by means of executable assertions), then pondering what semantic guarantees they can infer about the instrumented version of P′. The focus of their analysis is the condition under which programs P and P′ can execute without causing an abort (due to attempting an illegal operation), which they approximate by sufficient conditions and necessary conditions. They implement their approach in a system called VMV (Verification Modulo Versions) whose goal is to exploit semantic information about P in the analysis of P′, and to ensure that the transition from P to P′ happens without regression; in that case, they say that P′ is correct relative to P. The definition of relative correctness of Logozzo et al. [20] is different from ours, for several reasons: whereas [20] talk about relative correctness between an original program and a subsequent version in the context of adaptive maintenance (where P and P′ may be subject to distinct requirements), we talk about relative correctness between an original (faulty) software product and a revised version of the program (possibly still faulty yet more-correct) in the context of corrective maintenance with respect to a fixed requirements specification; whereas [20] use a set of assertions inserted throughout the program as a specification, we use a relation that maps initial states to final states to specify the standards against which absolute correctness and relative correctness are defined; whereas [20] represent program executions by execution traces (snapshots of the program state at assertion sites), we represent program executions by functions mapping initial states into final states; finally, whereas Logozzo et al. define a successful execution as a trace that satisfies all the relevant assertions, we define it as an initial state/final state pair that falls within the relational specification.

In [14] Lahiri et al. introduce a technique called Differential Assertion Checking for verifying the relative correctness of a program with respect to a previous version of the program. Lahiri et al. explore applications of this technique as a tradeoff between soundness (which they concede) and lower costs (which they hope to achieve). Like the approach of Logozzo et al. [20] (from the same team), the work of Lahiri uses executable assertions as specifications, represents executions by traces, defines successful executions as traces that satisfy all the executable assertions, and targets abort-freedom as the main focus of the executable assertions. Also, they define relative correctness between programs P and P′ as the property that P′ has a larger set of successful traces and a smaller set of unsuccessful traces than P; and they introduce relative specifications as specifications that capture functionality of P′ that P does not have. By contrast, we use input/output (or initial state/final state) relations as specifications, we represent program executions by functions from initial states to final states, we characterize correct executions by initial state/final state pairs that belong to the specification, and we make no distinction between abort-freedom (a.k.a. safety, in [14]) and normal functional properties. Indeed, for us the function of a program is the function that the program defines between its initial states and its final states; the domain of this function is the set of states for which execution terminates normally and returns a well-defined final state. Hence execution of the program on a state s is abort-free if and only if the state is in the domain of the program function; the domain of the program function is part of the function rather than being an orthogonal attribute; hence we view abort-freedom as a special form of functional attribute, rather than as an orthogonal attribute. Another important distinction with [14] is that we do not view relative correctness as a compromise that we accept as a substitute for absolute correctness; rather we argue that in many cases, we ought to test programs for relative correctness rather than absolute correctness, regardless of cost. In other words, whereas Lahiri et al. argue in favor of relative correctness on the grounds that it optimizes a quality vs. cost ratio, we argue in favor on the grounds that it optimizes quality.
In [19], Logozzo and Ball introduce a definition of relative correctness whereby a program P′ is correct relative to P (an improvement over P) if and only if P′ has more good traces and fewer bad traces than P. Programs are modeled with trace semantics, and execution traces are compared in terms of executable assertions inserted into P and P′; in order for the comparison to make sense, programs P and P′ have to have the same (or similar) structure and/or there must be a mapping from traces of P to traces of P′. When P′ is obtained from P by a transformation, and when P′ is provably correct relative to P, the transformation in question is called a verified repair. Logozzo and Ball introduce an algorithm that specializes in deriving program repairs from a predefined catalog that is targeted to specific program constructs, such as: contracts, initializations, guards, floating point comparisons, etc. Like the work cited above ([14], [20]), Logozzo and Ball model programs by execution traces and distinguish between two types of failures: contract violations, when functional properties are not satisfied; and run-time errors, when the execution causes an abort; for the reasons we discuss above, we do not make this distinction, and model the two aspects with the same relational framework. Logozzo and Ball deploy their approach in an automated tool based on the static analyzer cccheck, and assess their tool for effectiveness and efficiency.

In [28], Nguyen et al. present an automated repair method based on symbolic execution, constraint solving, and program synthesis; they call their method SemFix, on the grounds that it performs program repair by means of semantic analysis. This method combines three techniques: fault isolation by means of statistical analysis of the possible suspect statements; statement-level specification inference, whereby a local specification is inferred from the global specification and the product structure; and program synthesis, whereby a corrected statement is computed from the local specification inferred in the previous step. The method is organized in such a way that program synthesis is modeled as a search problem under constraints, and possible correct statements are inspected in the order of increasing complexity. When programs are repaired by SemFix, they are tested for (absolute) correctness against some predefined test data suite; as we argue throughout this paper, it is not sensible to test a program for absolute correctness after a repair, unless we have reason to believe that the fault we have just repaired is the last fault of the program (how do we ever know that?). By advocating to test for relative correctness, we enable the tester to focus on one fault at a time, and ensure that other faults do not interfere with our assessment of whether the fault under consideration has or has not been repaired adequately.

In [30], Weimer et al. discuss an automated program repair method that takes as input a faulty program, along with a set of positive tests (i.e. test data on which the program is known to perform correctly) and a set of negative tests (i.e. test data on which the program is known to fail) and returns a set of possible patches. The proposed method proceeds by keeping track of the execution paths that are visited by successful executions and those that are visited by unsuccessful executions, and using this information to focus the search for repairs on those statements that appear in the latter paths and not in the former paths. Mutation operators are applied to these statements and the results are tested again against the positive and negative test data to narrow the set of eligible mutants.

In [18] Le Goues et al. survey existing technology in automated program repair and identify open research challenges; among the criteria for automated repair methods, they cite applicability (extent of real-world relevance), scalability (ability to operate effectively and efficiently for products of realistic size), generality (scope of application domain, types of faults repaired), and credibility (extent of confidence in the soundness of the repair tool). Among the research issues they identify, they cite mining specifications for extant software, introducing formal methods to improve repair quality and user trust, and modeling monotonic fault removal.

7.3 Assessment and Prospects

The research presented in this paper is clearly in its infancy; we have merely introduced some new definitions of old concepts, and shown the ramifications that stem from these definitions. Yet we feel that in doing so, we have opened up many new venues of investigation, which we envision to explore:

• Debugging without Testing. Traditionally, it is so inconceivable to debug a program without testing it that these two words are often used interchangeably; yet section 6.2 shows precisely that this can be done, albeit (so far) in a special context; we envision to broaden the scope of this line of research.
• Programming without Refinement. In section 5.6, we argue that while refinement-based program derivation is a sufficient condition for producing correct programs, it may be viewed as unnecessarily strong; as a substitute, we show how we can derive a program by successive correctness-enhancing transformations rather than the traditional process of successive correctness-preserving transformations. We envision to elaborate on this idea.
• Mutation Testing with Relative Correctness. In light of the discussions of section 5.4, it appears that if we deploy mutation testing with relative correctness rather than absolute correctness, we may significantly improve the precision and recall of the technique; we envision to test this conjecture in practice.
• Measuring Faultiness with Fault Depth. The discussions of section 5.1.3 appear to show that fault depth is a better measure of product imperfection (failure rate, repair effort) than fault density; we envision to test this hypothesis.
• Testing for Relative Correctness. We envision to investigate test data generation strategies that are appropriate for relative correctness, and to generate broadly applicable conditions of relative correctness in the style of proposition 7.
Acknowledgements

The authors are grateful to Dr Kazunori Sakamoto, from the National Institute of Informatics, Tokyo, Japan, for feedback on an earlier version of this paper.

REFERENCES

[1] A. Arcuri and X. Yao. A novel co-evolutionary approach to automatic software bug fixing. In CEC 2008, 2008.
[2] Algirdas Avizienis, Jean Claude Laprie, Brian Randell, and Carl E. Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11–33, 2004.
[3] N. Boudriga, F. Elloumi, and A. Mili. The lattice of specifications: Applications to a specification methodology. Formal Aspects of Computing, 4:544–571, 1992.
[4] Ch. Brink, W. Kahl, and G. Schmidt. Relational Methods in Computer Science. Springer Verlag, January 1997.
[5] Kim D., Nam J., Song J., and Kim S. Automatic patch generation learned from human-written patches. In ICSE 2013, pages 802–811, 2013.
[6] Nafi Diallo, Wided Ghardallou, and Ali Mili. Program derivation by correctness enhancements. In Refinement 2015, Oslo, Norway, June 2015.
[7] E.W. Dijkstra. A Discipline of Programming. Prentice Hall, 1976.
[8] A. Gonzalez-Sanchez, R. Abreu, H.-G. Gross, and A.J.C. van Gemund. Prioritizing tests for fault localization through ambiguity group reduction. In Proceedings, Automated Software Engineering, Lawrence, KS, 2011.
[9] Divya Gopinath, Mohammad Zubair Malik, and Sarfraz Khurshid. Specification based program repair using SAT. In Proceedings, TACAS, pages 173–188, 2011.
[10] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. GenProg: A generic method for automated software repair. IEEE Transactions on Software Engineering, 31(1), 2012.
[11] D. Gries. The Science of Programming. Springer Verlag, 1981.
[12] E.C.R. Hehner. A Practical Theory of Programming. Prentice Hall, 1992.
[13] C.A.R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–583, October 1969.
[14] Shuvendu K. Lahiri, Kenneth L. McMillan, Rahul Sharma, and Chris Hawblitzel. Differential assertion checking. In Proceedings, ESEC/SIGSOFT FSE, pages 345–455, 2013.
[15] J.C. Laprie. Dependability—its attributes, impairments and means. In Predictably Dependable Computing Systems, pages 1–19. Springer Verlag, 1995.
[16] Jean Claude Laprie. Dependability: Basic Concepts and Terminology: in English, French, German, Italian and Japanese. Springer Verlag, Heidelberg, 1991.
[17] Jean Claude Laprie. Dependable computing: Concepts, challenges, directions. In Proceedings, COMPSAC, 2004.
[18] Claire Le Goues, Stephanie Forrest, and Westley Weimer. Current challenges in automatic software repair. Software Quality Journal, 21(3):421–443, 2013.
[19] Francesco Logozzo and Thomas Ball. Modular and verified automatic program repair. In Proceedings, OOPSLA, pages 133–146, 2012.
[20] Francesco Logozzo, Shuvendu Lahiri, Manuel Faehndrich, and San Blackshear. Verification modulo versions: Towards usable verification. In Proceedings, PLDI, 2014.
[21] Asma Louhichi, Wided Ghardallou, Khaled Bsaies, Lamia Labed Jilani, Olfa Mraihi, and Ali Mili. Verifying loops with invariant relations. International Journal of Critical Computer Based Systems, 5(1/2):78–102, 2014.
[22] Yu Seung Ma, Jeff Offutt, and Yong Rae Kwon. MuJava: An automated class mutation system. Software Testing, Verification and Reliability, 15(2):97–133, June 2005.
[23] Z. Manna. A Mathematical Theory of Computation. McGraw Hill, 1974.
[24] Ali Mili, Marcelo Frias, and Ali Jaoua. On faults and faulty programs. In Peter Hoefner, Peter Jipsen, Wolfram Kahl, and Martin Eric Mueller, editors, Proceedings, RAMICS: 14th International Conference on Relational and Algebraic Methods in Computer Science, volume 8428 of Lecture Notes in Computer Science, Marienstatt, Germany, April 28–May 1st 2014. Springer.
[25] Ali Mili and Fairouz Tchier. Software Testing: Operations and Concepts. John Wiley and Sons, 2015.
[26] H.D. Mills, V.R. Basili, J.D. Gannon, and D.R. Hamlet. Structured Programming: A Mathematical Approach. Allyn and Bacon, Boston, MA, 1986.
[27] Olfa Mraihi, Asma Louhichi, Lamia Labed Jilani, Jules Desharnais, and Ali Mili. Invariant assertions, invariant relations, and invariant functions. Science of Computer Programming, 78(9):1212–1239, September 2013.
[28] Hoang Duong Thien Nguyen, DaWei Qi, Abhik Roychoudhury, and Satish Chandra. SemFix: Program repair via semantic analysis. In Proceedings, ICSE, pages 772–781, 2013.
[29] Debroy V. and Wong W.E. Using mutation to automatically suggest fixes to faulty programs. In Proceedings, ICST 2010, pages 65–74, 2010.
[30] Weimer W., Nguyen T., Le Goues C., and Forrest S. Automatically finding patches using genetic programming. In Proceedings, ICSE 2009, pages 364–374, 2009.
[31] L. Zemín, S. Guttiérrez, S. Perez de Rosso, N. Aguirre, A. Mili, A. Jaoua, and M. Frias. Stryker: Scaling specification-based program repair by pruning infeasible mutants with SAT. Technical report, ITBA, Buenos Aires, Argentina, 2015.

Nafi Diallo earned a Bachelor from Gunma University in Japan and a Master from NJIT; she is currently a PhD student and teaching assistant at NJIT in Newark, NJ.

Wided Ghardallou earned the M.S. and PhD from the University of Tunis El Manar, and serves currently as an assistant Professor at the MIS Institute of Kairoun, Tunisia.

Jules Desharnais holds a M.S. from Laval University and a PhD from McGill University, and serves currently on the faculty of Laval University in Quebec City, Canada.

Marcelo Frias holds a M.S. from the University of Buenos Aires and a PhD from PUC in Rio de Janeiro. He serves on the Faculty of ITBA in Buenos Aires, Argentina.

Ali Jaoua holds an engineering degree from ENSIIHT in Toulouse, France and a PhD from the University of Toulouse, France. He serves on the faculty of Qatar University in Doha, Qatar.

Ali Mili holds a PhD from the University of Illinois and a Doctorat d'Etat from the University of Grenoble. He is on the faculty of NJIT in Newark, NJ.