
DART: Directed Automated Random Testing

Patrice Godefroid and Nils Klarlund (Bell Laboratories, Lucent Technologies; {god,klarlund}@bell-labs.com)
Koushik Sen (Computer Science Department, University of Illinois at Urbana-Champaign; [email protected])

Abstract

We present a new tool, named DART, for automatically testing software that combines three main techniques: (1) automated extraction of the interface of a program with its external environment using static source-code parsing; (2) automatic generation of a test driver for this interface that performs random testing to simulate the most general environment the program can operate in; and (3) dynamic analysis of how the program behaves under random testing and automatic generation of new test inputs to direct systematically the execution along alternative program paths. Together, these three techniques constitute Directed Automated Random Testing, or DART for short. The main strength of DART is thus that testing can be performed completely automatically on any program that compiles – there is no need to write any test driver or harness code. During testing, DART detects standard errors such as program crashes, assertion violations, and non-termination. Preliminary experiments to unit test several examples of C programs are very encouraging.

Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification; D.2.5 [Software Engineering]: Testing and Debugging; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs

General Terms Verification, Algorithms, Reliability

Keywords Software Testing, Random Testing, Automated Test Generation, Interfaces, Program Verification

1. Introduction

Today, testing is the primary way to check the correctness of software. Billions of dollars are spent on testing in the software industry, as testing usually accounts for about 50% of the cost of software development [27]. It was recently estimated that software failures currently cost the US economy alone about $60 billion every year, and that improvements in software testing infrastructure might save one-third of this cost [31].

Among the various kinds of testing usually performed during the software development cycle, unit testing applies to the individual components of a software system. In principle, unit testing plays an important role in ensuring overall software quality since its role is precisely to detect errors in the component's logic, check all corner cases, and provide 100% code coverage. Yet, in practice, unit testing is so hard and expensive to perform that it is rarely done properly. Indeed, in order to be able to execute and test a component in isolation, one needs to write test driver/harness code to simulate the environment of the component. More code is needed to test functional correctness, for instance using assertions checking the component's outputs. Since writing all this testing code manually is expensive, unit testing is often either performed very poorly or skipped altogether. Moreover, subsequent phases of testing, such as feature, integration and system testing, are meant to test the overall correctness of the entire system viewed as a black-box, not to check the corner cases where bugs causing reliability issues are typically hidden. As a consequence, many software bugs that should have been caught during unit testing remain undetected until field deployment.

In this paper, we propose a new approach that addresses the main limitation hampering unit testing, namely the need to write test driver and harness code to simulate the external environment of a software application. We describe our tool DART, which combines three main techniques in order to automate unit testing of software:

1. automated extraction of the interface of a program with its external environment using static source-code parsing;
2. automatic generation of a test driver for this interface that performs random testing to simulate the most general environment the program can operate in; and
3. dynamic analysis of how the program behaves under random testing and automatic generation of new test inputs to direct systematically the execution along alternative program paths.

Together, these three techniques constitute Directed Automated Random Testing, or DART for short. Thus, the main strength of DART is that testing can be performed completely automatically on any program that compiles – there is no need to write any test driver or harness code. During testing, DART detects standard errors such as program crashes, assertion violations, and non-termination.

We have implemented DART for programs written in the C programming language. Preliminary experiments to unit test several examples of C programs are very encouraging. For instance, DART was able to find automatically attacks in various C implementations of a well-known flawed security protocol (Needham-Schroeder's). Also, DART found hundreds of ways to crash 65% of the about 600 externally visible functions provided in the oSIP library, an open-source implementation of the SIP protocol. These experimental results are discussed in detail in Section 4.

The idea of extracting automatically interfaces of software components via static analysis has been discussed before, for model-checking purposes (e.g., [8]), reverse engineering (e.g., [37]), and compositional verification (e.g., [1]). However, we are not aware of any tool like DART which combines automatic interface extraction with random testing and dynamic test generation. DART is complementary to test-management tools that take advantage of interface
definitions as part of programming languages, such as JUnit [20] for Java, but do not perform automatic test generation.

Random testing is a simple and well-known technique (e.g., [4]), which can be remarkably effective at finding software bugs [11]. Yet, it is also well-known that random testing usually provides low code coverage (e.g., [32]). For instance, the then branch of the conditional statement "if (x==10) then . . ." has only one chance to be exercised out of 2^32 if x is a 32-bit integer program input that is randomly initialized. The contributions of DART compared to random testing are twofold: DART makes random testing automatic by combining it with automatic interface extraction (in contrast with prior work which is API-specific, e.g., [11]), and also makes it much more effective in finding errors thanks to the use of dynamic test generation to drive the program along alternative conditional branches. For instance, the probability of taking the then branch of the statement "if (x==10) then . . ." can be viewed as 0.5 with DART. The novel dynamic test-generation techniques used in DART are presented in Section 2.

Besides testing, the other main way to check correctness during the software development cycle is code inspection. Over the last few years, there has been a renewed interest in static source-code analysis for building automatic code-inspection tools that are more practical and usable by the average software developer. Examples of such tools are Prefix/Prefast [6], MC [16], Klocwork [22], and Polyspace [33]. Earlier program static checkers like lint [19] usually generate an overly large number of warnings and false alarms, and are therefore rarely used by programmers on a regular basis. The main challenge faced by the new generation of static analyzers is thus to do a better job in dealing with false alarms (warnings that do not actually correspond to programming errors), which arise from the inherent imprecision of static analysis. There are essentially two main approaches to this problem: either report only high-confidence warnings (at the risk of missing some actual bugs), or report all of them (at the risk of overwhelming the user). Despite significant recent progress on techniques to separate false alarms from real errors (for instance, by using more precise analysis techniques to eliminate false alarms, or by using statistical classification techniques to rank warnings by their severity more accurately), analyzing the results of static analysis to determine whether a warning actually corresponds to an error still involves significant human intervention.

We believe DART provides an attractive alternative approach to static analyzers, because it is based on high-precision dynamic analysis instead, while being fully automated as static analysis. The main advantage of DART over static analysis is that every execution leading to an error that is found by DART is guaranteed to be sound. Two areas where we expect DART to compete especially well against static analyzers are the detection of interprocedural bugs and of bugs that arise through the use of library functions (which are usually hard to reason about statically), as will be discussed later in the paper. Of course, DART is overall complementary to static analysis since it has its own limitations, namely the computational expense of running tests and the sometimes limited effectiveness of dynamic test generation to improve over random testing. In any case, DART offers a new trade-off among existing static and dynamic analysis techniques.

The paper is organized as follows. Section 2 presents an overview of DART. Section 3 discusses implementation issues when dealing with programs written in the C programming language. In Section 4, experimental results are discussed. We compare DART with other related work in Section 5 and conclude with Section 6.

2. DART Overview

DART's integration of random testing and dynamic test generation using symbolic reasoning is best intuitively explained with an example.

2.1 An Introduction to DART

Consider the function h in the file below:

  int f(int x) { return 2 * x; }

  int h(int x, int y) {
    if (x != y)
      if (f(x) == x + 10)
        abort(); /* error */
    return 0;
  }

The function h is defective because it may lead to an abort statement for some value of its input vector, which consists of the input parameters x and y. Running the program with random values of x and y is unlikely to discover the bug. The problem is typical of random testing: it is difficult to generate input values that will drive the program through all its different execution paths.

In contrast, DART is able to dynamically gather knowledge about the execution of the program in what we call a directed search. Starting with a random input, a DART-instrumented program calculates during each execution an input vector for the next execution. This vector contains values that are the solution of symbolic constraints gathered from predicates in branch statements during the previous execution. The new input vector attempts to force the execution of the program through a new path. By repeating this process, a directed search attempts to force the program to sweep through all its feasible execution paths.

For the example above, the DART-instrumented h initially guesses the value 269167349 for x and 889801541 for y. As a result, h executes the then-branch of the first if-statement, but fails to execute the then-branch of the second if-statement; thus, no error is encountered. Intertwined with the normal execution, the predicates x0 ≠ y0 and 2 · x0 ≠ x0 + 10 are formed on-the-fly according to how the conditionals evaluate; x0 and y0 are symbolic variables that represent the values of the memory locations of variables x and y. Note the expression 2 · x0, representing f(x): it is defined through an interprocedural, dynamic tracing of symbolic expressions.

The predicate sequence ⟨x0 ≠ y0, 2 · x0 ≠ x0 + 10⟩, called a path constraint, represents an equivalence class of input vectors, namely all the input vectors that drive the program through the path that was just executed. To force the program through a different equivalence class, the DART-instrumented h calculates a solution to the path constraint ⟨x0 ≠ y0, 2 · x0 = x0 + 10⟩ obtained by negating the last predicate of the current path constraint.

A solution to this path constraint is (x0 = 10, y0 = 889801541) and it is recorded to a file. When the instrumented h runs again, it reads the values of the symbolic variables that have been solved from the file. In this case, the second execution then reveals the error by driving the program into the abort() statement as expected.
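To make the two runs concrete, here is a minimal, self-contained C sketch of the scenario just described (our illustration, not DART's generated driver): run 1 uses the random inputs, and run 2 uses the solved input x0 = 10 while y0 keeps its previous value.

  /* Sketch of the two DART runs on h (not DART's actual driver). */
  #include <stdlib.h>

  int f(int x) { return 2 * x; }

  int h(int x, int y) {
      if (x != y)
          if (f(x) == x + 10)
              abort();              /* error */
      return 0;
  }

  int main(void) {
      h(269167349, 889801541);      /* run 1: collects the path constraint
                                       <x0 != y0, 2*x0 != x0 + 10>        */
      h(10, 889801541);             /* run 2: solution of
                                       <x0 != y0, 2*x0 == x0 + 10>;
                                       reaches abort() as expected        */
      return 0;
  }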
2.2 Execution Model

DART runs the program P under test both concretely, executing the actual program with random inputs, and symbolically, calculating constraints on values at memory locations expressed in terms of input parameters. These side-by-side executions require the program P to be instrumented at the level of a RAM (Random Access Memory) machine.

The memory M is a mapping from memory addresses m to, say, 32-bit words. The notation + for mappings denotes updating;
for example, M′ := M + [m ↦ v] is the same map as M, except that M′(m) = v. We identify symbolic variables by their addresses. Thus in an expression, m denotes either a memory address or the symbolic variable identified by address m, depending on the context. A symbolic expression, or just expression, e can be of the form m, c (a constant), ∗(e′, e′′) (a dyadic term denoting multiplication), ≤(e′, e′′) (a term denoting comparison), ¬(e′) (a monadic term denoting negation), ∗e′ (a monadic term denoting pointer dereference), etc. Thus, the symbolic variables of an expression e are the set of addresses m that occur in it. Expressions have no side-effects.

The program P manipulates the memory through statements that are specially tailored abstractions of the machine instructions actually executed. There is a set of numbers that denote instruction addresses, that is, statement labels. If ℓ is the address of a statement (other than abort or halt), then ℓ + 1 is guaranteed to also be an address of a statement. The initial address is ℓ0. A statement can be a conditional statement c of the form if (e) then goto ℓ′ (where e is an expression over symbolic variables and ℓ′ is a statement label), an assignment statement a of the form m ← e (where m is a memory address), abort, corresponding to a program error, or halt, corresponding to normal termination.
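For concreteness, one possible C rendering of these statement forms is sketched below; the type and field names are our assumptions, not DART's actual data structures.

  /* Hypothetical C encoding of the RAM-machine statements above. */
  typedef struct Expr Expr;   /* symbolic expression, as defined above */

  typedef enum { ASSIGN, COND, ABORT, HALT } StmtKind;

  typedef struct {
      StmtKind kind;
      int      lhs;      /* ASSIGN: destination address m              */
      Expr    *e;        /* ASSIGN: right-hand side; COND: predicate   */
      int      target;   /* COND: label l' of "if (e) then goto l'"    */
  } Stmt;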
The concrete semantics of the RAM machine instructions of P is reflected in evaluate_concrete(e, M), which evaluates expression e in context M and returns a 32-bit value for e. Additionally, the function statement_at(ℓ, M) specifies the next statement to be executed. For an assignment statement, this function calculates, possibly involving address arithmetic, the address m of the left-hand side, where the result is to be stored; in particular, indirect addressing, e.g., stemming from pointers, is resolved at runtime to a corresponding absolute address.¹

A program P defines a sequence of input addresses M0, the addresses of the input parameters of P. An input vector I, which associates a value to each input parameter, defines the initial value of M0 and hence M.²

Let C be the set of conditional statements and A the set of assignment statements in P. A program execution w is a finite³ sequence in Execs := (A ∪ C)∗ (abort | halt). We prefer to view w as being of the form α1 c1 α2 c2 . . . ck αk+1 s, where αi ∈ A∗ (for 1 ≤ i ≤ k + 1), ci ∈ C (for 1 ≤ i ≤ k), and s ∈ {abort, halt}.

The concrete semantics of P at the RAM machine level allows us to define for each input vector I an execution sequence: the result of executing P on I (the details of this semantics is not relevant for our purposes). Let Execs(P) be the set of such executions generated by all possible I. By viewing each statement as a node, Execs(P) forms a tree, called the execution tree. Its assignment nodes have one successor; its conditional nodes have one or two successors; and its leaves are labeled abort or halt.

2.3 Test Driver and Instrumented Program

The goal of DART is to explore all paths in the execution tree Execs(P). To simplify the following discussion, we assume that we are given a theorem prover that decides, say, the theory of integer linear constraints. This will allow us to explain how we handle the transition from constraints within the theory to those that are outside.

DART maintains a symbolic memory S that maps memory addresses to expressions. Initially, S is a mapping that maps each m ∈ M0 to itself. Expressions are evaluated symbolically as described in Figure 1.

  evaluate_symbolic(e, M, S) =
    match e:
    case m: // the symbolic variable named m
      if m ∈ domain S then return S(m)
      else return M(m)
    case ∗(e′, e′′): // multiplication
      let f′ = evaluate_symbolic(e′, M, S)
      let f′′ = evaluate_symbolic(e′′, M, S)
      if not one of f′ or f′′ is a constant c then
        all_linear = 0
        return evaluate_concrete(e, M)
      if both f′ and f′′ are constants then
        return evaluate_concrete(e, M)
      if f′′ is a constant c then
        return ∗(f′, c)
      else return ∗(c, f′′)
    case ∗e′: // pointer dereference
      let f′ = evaluate_symbolic(e′, M, S)
      if f′ is a constant c then
        if ∗c ∈ domain S then return S(∗c)
        else return M(∗c)
      else
        all_locs_definite = 0
        return evaluate_concrete(e, M)
    etc.

  Figure 1. Symbolic evaluation

When an expression falls outside the theory, as in the multiplication of two non-constant sub-expressions, DART simply falls back on the concrete value of the expression, which is used as the result. In such a case, we also set a flag all_linear to 0, which we use to track completeness. Another case where DART's directed search is typically incomplete is when the program dereferences a pointer whose value depends on some input parameter; in this case, the flag all_locs_definite is set to 0 and the evaluation falls back again to the concrete value of the expression. With this evaluation strategy, symbolic variables of expressions in S are always contained in M0.
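The following compilable toy program exercises the multiplication case of Figure 1, including the fall-back to concrete evaluation that clears all_linear. It is our sketch under simplified assumptions; DART itself instruments real C programs rather than interpreting little expression trees like this one.

  /* Toy version of Figure 1's multiplication case (not DART's code). */
  #include <stdio.h>
  #include <stdlib.h>

  typedef enum { CONST, VAR, MUL } Kind;

  typedef struct Expr {
      Kind kind;
      int value;               /* CONST: the constant value           */
      int addr;                /* VAR: the memory address m           */
      struct Expr *l, *r;      /* MUL: sub-expressions e', e''        */
  } Expr;

  static int all_linear = 1;   /* completeness flag of Section 2.3    */
  static int M[4];             /* concrete memory                     */
  static Expr *S[4];           /* symbolic memory; NULL = no entry    */

  static Expr *mk(Kind k, int v, int a, Expr *l, Expr *r) {
      Expr *e = malloc(sizeof *e);
      e->kind = k; e->value = v; e->addr = a; e->l = l; e->r = r;
      return e;
  }

  static int evaluate_concrete(Expr *e) {
      switch (e->kind) {
      case CONST: return e->value;
      case VAR:   return M[e->addr];
      default:    return evaluate_concrete(e->l) * evaluate_concrete(e->r);
      }
  }

  static Expr *evaluate_symbolic(Expr *e) {
      if (e->kind == VAR)      /* S(m) if defined, else concrete M(m) */
          return S[e->addr] ? S[e->addr] : mk(CONST, M[e->addr], 0, 0, 0);
      if (e->kind == MUL) {
          Expr *f1 = evaluate_symbolic(e->l), *f2 = evaluate_symbolic(e->r);
          if (f1->kind != CONST && f2->kind != CONST) {
              all_linear = 0;                  /* non-linear: fall back */
              return mk(CONST, evaluate_concrete(e), 0, 0, 0);
          }
          if (f1->kind == CONST && f2->kind == CONST)
              return mk(CONST, evaluate_concrete(e), 0, 0, 0);
          return f2->kind == CONST ? mk(MUL, 0, 0, f1, f2)
                                   : mk(MUL, 0, 0, f2, f1);
      }
      return e;                                /* CONST                 */
  }

  int main(void) {
      M[0] = 7; M[1] = 5;
      S[0] = mk(VAR, 0, 0, 0, 0);              /* m0 is an input        */
      S[1] = mk(VAR, 0, 1, 0, 0);              /* m1 is an input        */
      Expr *r = evaluate_symbolic(mk(MUL, 0, 0, S[0], S[1]));
      printf("value=%d all_linear=%d\n", r->value, all_linear); /* 35 0 */
      return 0;
  }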
To carry out a search through the execution tree, our instrumented program is run repeatedly. Each run (except the first) is executed with the help of a record of the conditional statements executed in the previous run. For each conditional, we record a branch value, which is either 1 (the then branch is taken) or 0 (the else branch is taken), as well as a done value, which is 0 when only one branch of the conditional has executed in prior runs (with the same history up to the branch point) and is 1 otherwise. This information associated with each conditional statement of the last execution path is stored in a list variable called stack, kept in a file between executions. For i, 0 ≤ i < |stack|, stack[i] = (stack[i].branch, stack[i].done) is thus the record corresponding to the (i + 1)th conditional executed.

More precisely, our test driver run_DART is shown in Figure 2. This driver combines random testing (the repeat loop) with directed search (the while loop). If the instrumented program throws an exception, then a bug has been found. The two completeness flags, namely all_linear and all_locs_definite, each holds unless a "bad" situation possibly leading to incompleteness has occurred. Thus, if the directed search terminates—that is, if directed of the inner loop no longer holds—then the outer loop also terminates provided all of the completeness flags still hold. In this case, DART terminates and safely reports that all feasible program paths have been explored. But if just one of the completeness flags has been turned off at some point, then the outer loop continues forever (modulo resource constraints not shown here).

¹ We do this to simplify the exposition; left-hand sides could be made symbolic as well.
² To simplify the presentation, we assume that M0 is the same for all executions of P.
³ We thus assume that all program executions terminate; in practice, this can be enforced by limiting the number of execution steps.
  run_DART() =
    all_linear, all_locs_definite, forcing_ok = 1, 1, 1
    repeat
      stack = ⟨⟩; I = []; directed = 1
      while (directed) do
        try (directed, stack, I) = instrumented_program(stack, I)
        catch any exception →
          if (forcing_ok)
            print "Bug found"
            exit()
          else forcing_ok = 1
    until all_linear ∧ all_locs_definite

  Figure 2. Test driver

  instrumented_program(stack, I) =
    // Random initialization of uninitialized input parameters in M0
    for each input x with I[x] undefined do
      I[x] = random()
    Initialize memory M from M0 and I
    // Set up symbolic memory and prepare execution
    S = [m ↦ m | m ∈ M0]
    ℓ = ℓ0 // Initial program counter in P
    k = 0  // Number of conditionals executed
    // Now invoke P intertwined with symbolic calculations
    s = statement_at(ℓ, M)
    while (s ∉ {abort, halt}) do
      match (s)
      case (m ← e):
        S = S + [m ↦ evaluate_symbolic(e, M, S)]
        v = evaluate_concrete(e, M)
        M = M + [m ↦ v]; ℓ = ℓ + 1
      case (if (e) then goto ℓ′):
        b = evaluate_concrete(e, M)
        c = evaluate_symbolic(e, M, S)
        if b then
          path_constraint = path_constraint ^ ⟨c⟩
          stack = compare_and_update_stack(1, k, stack)
          ℓ = ℓ′
        else
          path_constraint = path_constraint ^ ⟨neg(c)⟩
          stack = compare_and_update_stack(0, k, stack)
          ℓ = ℓ + 1
        k = k + 1
      s = statement_at(ℓ, M) // End of while loop
    if (s == abort) then
      raise an exception
    else // s == halt
      return solve_path_constraint(k, path_constraint, stack)

  Figure 3. Instrumented program

  compare_and_update_stack(branch, k, stack) =
    if k < |stack| then
      if stack[k].branch ≠ branch then
        forcing_ok = 0
        raise an exception
      else if k = |stack| − 1 then
        stack[k].branch = branch
        stack[k].done = 1
    else stack = stack ^ ⟨(branch, 0)⟩
    return stack

  Figure 4. Compare and update stack

  solve_path_constraint(ktry, path_constraint, stack) =
    let j be the smallest number ≥ −1 such that
      for all h with j < h < ktry, stack[h].done = 1
    if j = −1 then
      return (0, ⟨⟩, []) // This directed search is over
    else
      path_constraint[j] = neg(path_constraint[j])
      stack[j].branch = ¬stack[j].branch
      if (path_constraint[0..j] has a solution I′) then
        return (1, stack[0..j], I + I′)
      else
        solve_path_constraint(j, path_constraint, stack)

  Figure 5. Solve path constraint

The instrumented program itself is described in Figure 3 (where ^ denotes list concatenation). It executes as the original program, but with interleaved gathering of symbolic constraints. At each conditional statement, it also checks by calling compare_and_update_stack, shown in Figure 4, whether the current execution path matches the one predicted at the end of the previous execution and represented in stack passed between runs. Specifically, our algorithm maintains the invariant that when instrumented_program is called, stack[|stack| − 1].done = 0 holds. This value is changed to 1 if the execution proceeds according to all the branches in stack as checked by compare_and_update_stack. If it ever happens that a prediction of the outcome of a conditional is not fulfilled, then the flag forcing_ok is set to 0 and an exception is raised to restart run_DART with a fresh random input vector. Note that setting forcing_ok to 0 can only be due to a previous incompleteness in DART's directed search, which was then (conservatively) detected and resulted in setting (at least) one of the completeness flags to 0. In other words, the following invariant always holds: all_linear ∧ all_locs_definite ⇒ forcing_ok.

When the original program halts, new input values are generated in solve_path_constraint, shown in Figure 5, to attempt to force the next run to execute the last⁴ unexplored branch of a conditional along the stack. If such a branch exists and if the path constraint that may lead to its execution has a solution I′, this solution is used to update the mapping I to be used for the next run; values corresponding to input parameters not involved in the path constraint are preserved (this update is denoted I + I′).
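As a concrete rendering of Figure 4's bookkeeping, the runnable sketch below (our simplification: fixed-size array, no exception mechanism) replays a run whose first conditional matches the prediction and whose second conditional is the flipped branch:

  /* Sketch of the per-conditional record keeping of Figure 4. */
  #include <stdio.h>

  typedef struct { int branch; int done; } Record;

  static Record stack[128];         /* stack passed between runs      */
  static int stack_len = 0;         /* |stack|                        */
  static int forcing_ok = 1;

  /* Returns 0 when the branch predicted for conditional k is not the
     one actually taken (Figure 4 raises an exception in that case). */
  static int compare_and_update_stack(int branch, int k) {
      if (k < stack_len) {
          if (stack[k].branch != branch) {
              forcing_ok = 0;       /* prediction not fulfilled       */
              return 0;
          }
          if (k == stack_len - 1)
              stack[k].done = 1;    /* both branches now explored     */
      } else {                      /* first visit: append (branch,0) */
          stack[stack_len].branch = branch;
          stack[stack_len].done = 0;
          stack_len++;
      }
      return 1;
  }

  int main(void) {
      /* Previous run left <(1,1),(1,0)>: conditional 1 was flipped.  */
      stack[0] = (Record){1, 1};
      stack[1] = (Record){1, 0};
      stack_len = 2;
      printf("cond 0 ok: %d\n", compare_and_update_stack(1, 0));
      printf("cond 1 ok: %d\n", compare_and_update_stack(1, 1));
      printf("forcing_ok=%d stack[1].done=%d\n", forcing_ok, stack[1].done);
      return 0;
  }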
The main property of DART is stated in the following theorem, which formulates (a) soundness (of errors found) and (b) a form of completeness.

THEOREM 1. Consider a program P as defined in Section 2.2. (a) If run_DART prints out "Bug found" for P, then there is some input to P that leads to an abort. (b) If run_DART terminates without printing "Bug found," then there is no input that leads to an abort statement in P, and all paths in Execs(P) have been exercised. (c) Otherwise, run_DART will run forever.

Proofs of (a) and (c) are immediate. The proof of (b) rests on the assumption that any potential incompleteness in DART's directed search is (conservatively) detected and recorded by setting at least one of the two flags all_linear and all_locs_definite to 0.

⁴ A depth-first search is used for exposition, but the next branch to be forced could be selected using a different strategy, e.g., randomly or in a breadth-first manner.

Since DART performs (typically partial) symbolic executions only as generalizations of concrete executions, a key difference between DART and static-analysis-based approaches to software verification is that any error found by DART is guaranteed to be sound (case (a) above) even when using an incomplete or wrong theory. In order to maximize the chances of termination in case (b) above, setting off completeness flags as described in evaluate_symbolic could be done less conservatively (i.e., more accurately) using various optimization techniques, for instance by distinguishing incompleteness in expressions used in assignments from those used in conditional statements, by refining after each conditional statement the constraints stored in S that are associated with symbolic variables involved in the conditional, by dealing with pointer dereferences in a more sophisticated way, etc.

2.4 Example

Consider the C program:

  int f(int x, int y) {
    int z;
    z = y;
    if (x == z)
      if (y == x + 10)
        abort();
    return 0;
  }

The input address vector is M0 = ⟨mx, my⟩ (where mx ≠ my are some memory addresses) for f's input parameters x, y. Let us assume that the first value for x is 123456 and that of y is 654321, that is, I = ⟨123456, 654321⟩. Then, the initial concrete memory becomes M = [mx ↦ 123456, my ↦ 654321], and the initial symbolic memory becomes S = [mx ↦ mx, my ↦ my]. During execution from this configuration, the else branch of the outer if statement is taken and, at the time halt is encountered, the path constraint is ⟨¬(mx = my)⟩. We have k = 1, stack = ⟨(0, 0)⟩, S = [mx ↦ mx, my ↦ my, mz ↦ my], M = [mx ↦ 123456, my ↦ 654321, mz ↦ 654321]. The subsequent call to solve_path_constraint results in an attempt to solve ⟨mx = my⟩, which leads to a solution [mx ↦ 0, my ↦ 0].

The updated input vector I + I′ is then ⟨0, 0⟩, the branch bit in stack has been flipped, and the assignment (directed, stack, I) = (1, ⟨(1, 0)⟩, ⟨0, 0⟩) is executed in run_DART. During the second call of instrumented_program, compare_and_update_stack will check that the actually executed branch of the outer if statement is now the then branch (which it is!). Next, the else branch of the inner if statement is executed. Consequently, the path constraint that is now to be solved is ⟨mx = my, my = mx + 10⟩. The run_DART driver then calls solve_path_constraint with (ktry, path_constraint, stack) = (2, ⟨mx = my, my = mx + 10⟩, ⟨(1, 1), (0, 0)⟩). Since this path constraint has no solution (substituting mx = my into the second predicate yields my = my + 10, a contradiction), and since the first conditional has already been covered (stack[0].done = 1), solve_path_constraint returns (0, ⟨⟩, []). In turn, run_DART terminates since all completeness flags are still set.

2.5 Advantages of the DART approach

Despite the limited completeness of DART when based on linear integer constraints, dynamic analysis often has an advantage over static analysis when reasoning about dynamic data. For example, to determine if two pointers point to the same memory location, DART simply checks whether their values are equal and does not require alias analysis. Consider the C program:

  struct foo { int i; char c; };
  bar (struct foo *a) {
    if (a->c == 0) {
      *((char *)a + sizeof(int)) = 1;
      if (a->c != 0)
        abort();
    }
  }

DART here treats the pointer input parameter by randomly initializing it to NULL or to a single heap-allocated cell of the appropriate type (see Section 3.2). For this example, a static analysis will typically not be able to report with high certainty that abort() is reachable. Sound static analysis tools will report "the abort might be reachable", and unsound ones (like BLAST [18] or SLAM [2]) will simply report "no bug found", because standard alias analysis is not able to guarantee that a->c has been overwritten. In contrast, DART finds a precise execution leading to the abort very easily by simply generating an input satisfying the linear constraint a->c == 0. This kind of code is often found in implementations of network protocols, where a buffer of type char * (e.g., representing a message) is occasionally cast into a struct (e.g., representing the different fields of the protocol encoded in the message) and vice versa.

The DART approach of intertwined concrete and symbolic execution has two important advantages. First, any execution leading to an error detected by DART is trivially sound. Second, it allows us to alleviate the limitations of the constraint solver/theorem prover. In particular, whenever we generate a symbolic condition at a branching statement while executing the program under test, and the theorem prover cannot decide whether that symbolic condition is true or false, we simply replace this symbolic condition by its concrete value, i.e., either true or false. This allows us to continue both the concrete and symbolic execution in spite of the limitation of the theorem prover. Note that static analysis tools using predicate abstraction [2, 18] will simply consider both branches from that branching point, which may result in unsound behaviors. A test-generation tool using symbolic execution [36], on the other hand, will stop its symbolic execution at that point and may miss bugs appearing down the branch. To illustrate this point, consider the following C program:

  1 foobar(int x, int y){
  2   if (x*x*x > 0){
  3     if (x>0 && y==10)
  4       abort();
  5   } else {
  6     if (x>0 && y==20)
  7       abort();
  8   }
  9 }

Given a theorem prover that cannot reason about non-linear arithmetic constraints, a static analysis tool using predicate abstraction [2, 18] will report that both aborts in the above code may be reachable, hence one false alarm since the abort in line 7 is unreachable. This would be true as well if the test (x*x*x > 0) is replaced by a library call or if it was dependent on a configuration parameter read from a file. On the other hand, a test-generation tool based on symbolic execution [36] will not be able to generate an input vector to detect any abort because its symbolic execution will be stuck at the branching point in line 2. In contrast, DART can generate randomly an input vector where x>0 and y!=10 with almost 0.5 probability; after the first execution with such an input, the directed search of DART will generate another input with the same positive value of x but with y==10, which will lead the program in its second run to the abort at line 4. Note that, if DART randomly
generates a negative value for x in the first run, then DART will generate in the next run inputs where x>0 and y==20 to satisfy the other branch at line 7 (it will do so because no constraint is generated for the branching statement in line 2 since it is non-linear); however, due to the concrete execution, DART will then not take the else branch at line 6 in such a second run. In summary, our mixed strategy of random and directed search along with simultaneous concrete and symbolic execution of the program will allow us to find the only reachable abort statement in the above example with high probability.

3. DART for C

We now discuss how to implement the algorithms presented in the previous section for testing programs written in the C programming language.

3.1 Interface Extraction

Given a program to test, DART first identifies the external interfaces through which the program can obtain inputs via uninitialized memory locations M0. In the context of C, we define the external interfaces of a C program as

• its external variables and external functions (reported as "undefined reference" at the time of compilation of the program), and
• the arguments of a user-specified toplevel function, which is a function of the program called to start its execution.

The main advantage of this definition is that the external interfaces of a C program can be easily determined and instrumented by a light-weight static parsing of the program's source code. Inputs to a C program are defined as memory locations which are dynamically initialized at runtime through the static external interface. This allows us to handle inputs which are dynamic in nature, such as lists and trees, in a uniform way. Considering inputs as uninitialized runtime memory locations, instead of syntactic objects exclusively such as program variables, also allows us to avoid expensive or imprecise alias analyses, which form the basis of many static analysis tools.

Note that the (simplified) formalization of Section 2.2 assumed that the input addresses M0 are the same for all executions of program P. However, our implementation of DART supports a more general model where multiple inputs can be mapped to a same address m when these are obtained by successively reading m during different successive calls to the toplevel function, as will be discussed later, as well as the possibility of a same input being mapped to different addresses in different executions, for instance when the input is provided through an address dynamically allocated with malloc().

For each external interface, we determine the type of the input that can be passed to the program via that interface. In C, a type is defined recursively as either a basic type (int, float, char, enum, etc.), a struct type composed of one or more fields of other types, an array of another type, or a pointer to another type.

Figure 6 shows a simple example of C program simulating a controller for an air-conditioning (AC) system. The toplevel function is ac_controller, and the external interface is simply its argument message, of basic type int.

  /* initially, */
  int is_room_hot=0;    /* room is not hot */
  int is_door_closed=0; /* and door is open */
  int ac=0;             /* so, ac is off */

  void ac_controller(int message) {
    if (message == 0) is_room_hot=1;
    if (message == 1) is_room_hot=0;
    if (message == 2) {
      is_door_closed=0;
      ac=0;
    }
    if (message == 3) {
      is_door_closed=1;
      if (is_room_hot) ac=1;
    }
    if (is_room_hot && is_door_closed && !ac)
      abort(); /* check correctness */
  }

  Figure 6. AC-controller example (C code)

It is worth emphasizing that we distinguish three kinds of C functions in this work.

• Program functions are functions defined in the program.
• External functions are functions controlled by the environment and hence part of the external interface of the program; they can nondeterministically return any value of their specified return type.
• Library functions are functions not defined in the program but controlled by the program, and hence considered as part of it. Examples of such functions are operating-system functions and functions defined in the standard C library. These functions are treated as unknown but deterministic "black-boxes" which we cannot instrument or analyze.

The ability of DART to handle deterministic but unknown (and arbitrarily complex) library functions by simply executing these makes it unique compared to standard symbolic-execution based frameworks, as discussed in Section 2.4. In practice, the user can adjust the boundary between library and external functions to simulate desired effects. For instance, errors in system calls can easily be simulated by considering the corresponding system functions as external functions instead of library functions.
3.2 Generation of Random Test Driver

Once the external interfaces of the C program are identified, we generate a nondeterministic/random test driver simulating the most general environment visible to the program at its interfaces. This test driver is itself a C program, which performs the random initialization abstractly described at the beginning of the function instrumented_program() in Section 2, and which is defined as follows:

• The test driver consists of a function main which initializes all external variables and all arguments of the toplevel function with random values by calling the function random_init defined below, and then calls the application's toplevel function. The user of DART specifies (using the parameter depth) the number of times the toplevel function is to be called iteratively in a single run.
• The test driver also contains code simulating each external function in such a way that, whenever an external function is called during the program execution, a random value of the function's return type is returned by the simulated function, as sketched below.
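As a self-contained illustration of these two bullets, here is a hypothetical program-plus-generated-driver pair; the function names and the use of rand() are our assumptions, and DART's real driver uses the random_init machinery of Figure 8 instead.

  /* Hypothetical program + generated driver pair (our sketch). */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int get_command(void);             /* external function: undefined
                                        in the program under test    */

  int toplevel(int arg) {            /* user-specified toplevel      */
      if (get_command() == 1 && arg == 42)
          return 1;
      return 0;
  }

  /* ---- what a generated test driver could look like ---- */
  int get_command(void) {            /* simulated external function:
                                        returns a random value of its
                                        return type                  */
      return rand();
  }

  int main(void) {
      srand((unsigned)time(NULL));
      int tmp = rand();              /* random_init(&tmp, int)       */
      printf("toplevel returned %d\n", toplevel(tmp));
      return 0;
  }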

For example, Figure 7 shows the test driver generated for the AC-controller example of Figure 6.

  void main() {
    for (i=0; i < depth; i++) {
      int tmp;
      random_init(&tmp, int);
      ac_controller(tmp);
    }
  }

  Figure 7. Test driver generated for the AC-controller example (C code)

The initialization of memory locations controlled by the external interface is performed using the procedure random_init shown in Figure 8. This procedure takes as arguments a memory location m and the type of the value to be stored at m, and initializes randomly the location m depending on its type.

  random_init(m, type) {
    if (type == pointer to type2) {
      if (fair coin toss == head) {
        *m = NULL;
      } else {
        *m = malloc(sizeof(type2));
        random_init(*m, type2);
      }
    } else if (type == struct) {
      for all fields f in struct
        random_init(&(m->f), typeof(f));
    } else if (type == array[n] of type3) {
      for (int i=0; i<n; i++)
        random_init((m+i), type3);
    } else if (type == basic type) {
      *m = random_bits(sizeof(type));
    }
  }

  Figure 8. Procedure for randomly initializing C variables of any type (in pseudo-C)
If m stores a value of basic type, its value *m⁵ is initialized with the auxiliary procedure random_bits which returns n random bits where n is its argument. If its type is a pointer, the value of location m is randomly initialized with either the value NULL (with a 0.5 probability) or with the address of a newly allocated memory location, whose value is in turn initialized according to its type following the same recursive rules. If type is a struct or an array, every sub-element is initialized recursively in the same way. Note that, when inputs are data structures defined with a recursive type (such as lists), this general procedure can thus generate data structures of unbounded sizes.

For each external variable or argument to the toplevel function, say v, DART generates a call to random_init(&v, typeof(v)) in the function main of the test driver before calling the toplevel function. For instance, in the case of the AC-controller program, the variable message forming the external interface is of type int, and therefore the corresponding initialization code random_init(&tmp, int)⁶ is generated (see Figure 7).

⁵ In C, *m denotes the value stored at m.
⁶ In C, &v gives the memory location of the variable v.
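A runnable specialization of Figure 8 for an input of type int* might look as follows; this is our sketch, with rand() standing in for random_bits and the fair coin toss:

  /* random_init specialized to int* (sketch, not DART's generator). */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  static void random_init_int_ptr(int **m) {
      if (rand() & 1) {              /* fair coin toss               */
          *m = NULL;
      } else {
          *m = malloc(sizeof(int));  /* fresh heap-allocated cell    */
          if (*m != NULL)
              **m = rand();          /* random_bits(sizeof(int))     */
      }
  }

  int main(void) {
      srand((unsigned)time(NULL));
      int *p;
      random_init_int_ptr(&p);
      if (p) printf("input: pointer to %d\n", *p);
      else   printf("input: NULL\n");
      free(p);                       /* free(NULL) is a no-op        */
      return 0;
  }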
Similarly, if the C program being tested can call an external function, say return_type some_fun(), then the test driver generated by DART will include a definition for this function, which is as follows:

  return_type some_fun() {
    return_type tmp;
    random_init(&tmp, return_type);
    return tmp;
  }

Once the test driver has been generated, it can be combined with the C program being tested to form a self-executable program, which can be compiled and executed automatically.

3.3 Implementation of Directed Search

A directed search can be implemented using a dynamic instrumentation as explained in Section 2. The main challenge when dealing with C is to handle all the possible types that C allows, as well as generate and manipulate symbolic constraints, especially across function boundaries (i.e., tracking inputs through function calls when a variable whose value depends on an input is passed as argument to another program function). This is tedious (because of the complexity of C) but conceptually not very hard.

In our implementation of DART for C, the code instrumentation needed to intertwine the concrete execution of the program P with the symbolic calculations performed by DART as described in function instrumented_program() (see Section 2) is performed using CIL [28], an OCAML application for parsing and analyzing C code. The constraint solver used by default in our implementation is lp_solve [26], which can solve efficiently any linear constraint using real and integer programming techniques.
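To give a feel for what such instrumentation does, the toy program below hand-inlines hypothetical shadow calls around one assignment and one conditional of the Section 2.4 example; dart_assign and dart_branch are invented logging stand-ins, not CIL's or DART's actual output:

  /* Hand-written sketch of instrumented code (hypothetical). */
  #include <stdio.h>

  static void dart_assign(const char *lhs, const char *rhs) {
      printf("S[%s] <- %s\n", lhs, rhs);        /* shadow of m <- e  */
  }

  static void dart_branch(int taken, const char *c) {
      printf("path_constraint ^ %s(%s)\n", taken ? "" : "neg", c);
  }

  int main(void) {
      int x = 123456, y = 654321, z;

      z = y;                           /* concrete assignment        */
      dart_assign("mz", "S(my)");      /* symbolic counterpart       */

      if (x == z) {
          dart_branch(1, "mx = mz");   /* then branch taken          */
      } else {
          dart_branch(0, "mx = mz");   /* records neg(mx = mz)       */
      }
      return 0;
  }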
3.4 Additional Remarks

For the sake of modeling "realistic" external environments, we have assumed in this work that the execution of external functions does not have any side effects on (i.e., does not change the value of) any previously-defined stack or heap allocated program variable, including those passed as arguments to the function. For instance, an external function returning a pointer to an int can only return NULL or a pointer to a newly allocated int, not a pointer to a previously allocated int. Note that this assumption does not restrict generality: external functions with side effects or returning previously defined heap-allocated objects can be simulated by adding interface code between the program and its environment.

Another assumption we made is that all program variables (i.e., all those not controlled by the environment) are properly initialized. Detecting uninitialized program variables can be done using other analyses and tools, either statically (e.g., with lint [19]) or dynamically (e.g., with Purify [17]) or both (e.g., with CCured [29]).

Instead of using a static definition of interface for C programs as done above in this section, we could have used a dynamic definition, such as considering any uninitialized variable (memory location) read by the program as an input. In general, detecting inputs with such a loose definition can only be done dynamically, using a dynamic program instrumentation similar to one for detecting uninitialized variables. Such instrumentations require a precise, hence expensive, tracking of memory accesses. Discovering and simulating external functions on-the-fly is also challenging. It would be worth exploring further how to deal with dynamic interface definitions.

4. Experimental Evaluation

In this section, we present the results of several experiments performed with DART. We first compare the efficiency of a purely random search with a directed search using two program examples. We then discuss the application of DART on a larger application. All experiments were performed on a Pentium III 800MHz processor running Linux. Runtime is user+system time as reported by the Unix time command and is always roughly equal to elapsed time.

4.1 AC-controller Example

Our first benchmark is the AC-controller program of Figure 6. If we set the depth to 1, the program does not have any execution leading to an assertion violation. For this example, a directed search

explores all execution paths up to that depth in 6 iterations and less than a second. In contrast, a random search would thus run forever without detecting any errors. If we set the depth to 2, there is an assertion violation if the first input value is 3 and the second input value is 0. This scenario is found by the directed search in DART in 7 iterations and less than a second. In contrast, a random search does not find the assertion violation after hours of search. Indeed, if message is a 32-bit integer, the probability for a random search to find the specific combination of inputs leading to this assertion violation is one out of 2^32 × 2^32 = 2^64, i.e., virtually zero in practice!

This explains why a directed search usually provides much better code coverage than a simple random search. Indeed, most applications contain input-filtering code that performs basic sanity checks on the inputs and discards the bad or irrelevant ones. Only inputs that satisfy these filtering tests are then passed to the core application and can influence its behavior. For instance, in the AC-controller program, only values 0 to 3 are meaningful inputs while all others are ignored; the directed mode is crucial to identify (iteratively) those meaningful input values.

It is worth observing how a directed search can learn through trial and error how to generate inputs that satisfy such filtering tests. Each way to pass these tests corresponds to an execution path through the input-filtering code that leads to the core application code. Every such path will eventually be discovered by the directed search provided it can reason about all the constraints along the path. When this happens, the directed search will reach and start exercising (in the same smart way) the core application code. In contrast, a purely random search will typically be stuck forever in the input-filtering code and will never exercise the code of the core application.
application.
needed by DART’s directed search to reach this conclusion. This
4.2 Needham-Schroeder Protocol number thus represents all possible execution paths of this protocol
implementation when executed once. When two input messages
Our second benchmark example is a C implementation of the are allowed, DART finds an assertion violation in 664 iterations
Needham-Schroeder public key authentication protocol [30]. This or about two seconds of search. In contrast, a random search is not
protocol aims at providing mutual authentication, so that two par- able to find any assertion violations after many hours of search.
ties can verify each other’s identity before engaging in a transac- An examination of the program execution leading to this asser-
tion. The protocol involves a sequence of message exchanges be- tion violation reveals that DART only finds part of Lowe’s attack: it
tween an initiator, a responder, and a mutually-trusted key server. finds the projection of the attack from B’s point of view, i.e., steps
The exact details of the protocol are not necessary for the dis- 2 and 6 above. In other words, DART finds that, when placed in
cussion that follows and are omitted here. An attack against the its most general environment, there exists a sequence of two input
original protocol involving six message exchanges was reported by messages that drives this code to an assertion violation. However,
Lowe in [24]: an intruder I is able to impersonate an initiator A the most general environment, which can generate any valid input
to set up a false session with responder B, while B thinks he is at any time, is too powerful to model a realistic intruder I: for in-
talking to A. The steps of Lowe’s attack are as follows: stance, given a conditional statement of the form if (input ==
1. A → I : {Na , A}Ki (A starts a normal session with I by my secret) then . . . , DART can set the value of this input to
sending it a nonce Na and its name A, both encrypted with I’s my secret to direct its testing, which is as powerful as being able
public key Ki ) to guess encryption keys hard-coded in the program. (Such a most
2. I(A) → B : {Na , A}Kb (the intruder I impersonates A to powerful intruder model is sometimes called a possibilistic attacker
try to establish a false session with B) model in the literature.)
3. B → I(A) : {Na , Nb }Ka (B responds by selecting a new To find the complete Lowe’s attack, it is necessary to use a more
nonce Nb and trying to return it with Na to A) constrained model of the environment that models more precisely
4. I → A : {Na , Nb }Ka (I simply forwards B’s last message the capabilities of the intruder I, namely I’s ability to only decrypt
to A; note that I does not know how to decrypt B’s message to messages encrypted with its own key Ki , to compose messages
A since it is encrypted with A’s key Ka ) with only nonces it already knows, and to forward only messages it
5. A → I : {Nb }Ki (A decrypts the last message to obtain Nb has previously seen. (Such a model is called a Dolev-Yao attacker
and returns it to I) model in the security literature.) We then augmented the original
6. I(A) → B : {Nb }Kb (I can then decrypt this message to code with such a model of I. We quickly discovered that there are
obtain Nb and returns it to B; after receiving this message, B many ways to model I and that each variant can have a significant
believes that A has correctly established a session with it) impact on the size of the resulting search space.
Figure 10 presents the results obtained with one of these models,
The C implementation of the Needham-Schroeder protocol we which is at least as unconstrained as the original intruder model
considered7 is described by about 400 lines of C code and is more of [24] yet results in the smallest state space we could get. In this

7 We thank John Havlicek for providing us this implementation. 8A C assertion violation (as defined in <assert.h>) triggers an abort().

version, the intruder model acts as an input filter for entities A and B. As shown in the Figure, the shortest sequence of inputs leading to an assertion violation is of length 4 and DART takes about 18 minutes of search to find it. This time, the corresponding execution trace corresponds to the full Lowe's attack:

• (After no specific input) A sends its first message as in Step 1 (depth 1).
• B receives an input and sends an output as in Step 3 (depth 2).
• A receives an input and sends an output as in Step 5 (depth 3).
• B receives an input as in Step 6, which then triggers an assertion violation (depth 4).

Note that, since the intruder I is modeled as an input filter, Steps 2 and 4 are not represented explicitly by additional messages.

The original code we started with contains a flag which, if turned on, implements Lowe's fix to the Needham-Schroeder protocol [25]. By curiosity, we also tested this version with DART and, to our surprise, DART found again an assertion violation after about 22 minutes of search! After examining the error trace produced by DART, we discovered that the implementation of Lowe's fix was incomplete. We contacted the author of the original code and he confirmed this was a bug he was not aware of. After fixing the code, DART was no longer able to find any assertion violation.

It is interesting to compare these results with the ones reported in [13] where the same C implementation of the Needham-Schroeder protocol was analyzed using state-space exploration techniques. Specifically, [13] studied the exploration of the (very large) state space formed by the product of this C implementation in conjunction with a nondeterministic C model of the intruder. The tool VeriSoft [12] was used to explore the product of these two interacting Unix processes. Several search techniques were experimented with. To summarize the results of [13], neither a systematic search nor a random search through that state space were able to detect the attack (within 8 hours of search). But a random search guided using application-independent heuristics (essentially maximizing the number of messages exchanged between the two processes) was able to find the attack after 50 minutes of search on average, on a comparable machine. So far, we have not explored the use of heuristics in the context of DART.

Because the intruder models in the implementations of the Needham-Schroeder protocol considered here and in [13] are different, a direct comparison between our results and the results of [13] is not possible. Yet, DART was able to find Lowe's attack using a systematic search (and to discover a previously-unknown bug in the implementation of Lowe's fix), while the experimental setup of [13] was not. This performance difference can perhaps be explained intuitively as follows. A standard model checking approach as taken in [13] (at a protocol implementation level) and also in [24] (at a more abstract protocol specification level) represents the program's environment (here the intruder) by a nondeterministic process that blindly guesses possible sequences of inputs (attacks), and then checks the effect of these on the program by performing state-space exploration. In contrast, a directed search as implemented in DART does not treat the program under test as a black-box. Instead, the directed search attempts to partition iteratively the program's input space into equivalence classes, and generate new inputs in order to exhibit new program responses – inputs that trigger a previously considered program behavior are not generated and re-tested over and over again. In this sense, a directed search can be viewed as a more white-box approach than traditional model checking since the observation of how the program reacts to specific inputs is used to generate the next test inputs. Since a directed search exploits more information about the program being tested, it is not surprising that it can (and should) be more effective.

4.3 A Larger Application: oSIP

In order to evaluate further the effectiveness and scalability of DART, we applied it to test a large application of industrial relevance: oSIP, an open-source implementation of the Session Initiation Protocol. SIP is a telephony protocol for call-establishment of multi-media sessions over IP networks (including Voice-over-IP). oSIP is a C library available at

http://www.gnu.org/software/osip/osip.html

The oSIP library (version 2.0.9) consists of about 30,000 lines of C code describing about 600 externally visible functions which can be used by higher-level applications. Two typical such applications are SIP clients (such as softphones to make calls over the internet from a PC) and servers (to route internet calls).

Our experimental setup was as follows. Since there is very little documentation on the API provided by the oSIP library other than the code itself, we considered one-by-one each of the about 600 externally visible functions as the toplevel function that DART calls. These function names were automatically extracted from the library using scripts. For each toplevel function, the inputs controlled by DART were the arguments of the function, and the search was limited to a maximum of 1,000 iterations (runs). In other words, if DART did not find any errors after 1,000 runs, the script would then move on to the next toplevel function, and so on. Since the oSIP code does not contain assertions, the search was limited to finding segmentation faults (crashes) and non-termination.⁹

⁹ Non-termination is reported by DART after a timer expiration triggered when the program under test does not call any DART instrumentation within a specific time delay.

The results obtained with DART were surprising to us: DART found hundreds of ways to crash externally visible oSIP functions. In fact, DART found a way to crash 65% of the oSIP functions within 1,000 attempts for each function. A closer analysis of the results revealed that most of these crashes share the same basic pattern: an oSIP function takes as argument a pointer to a data structure and then de-references later that pointer without checking first whether the pointer is non-NULL. It is worth noticing that some oSIP functions do contain code to test for NULL pointers, but most do not perform such tests consistently (i.e., for all execution paths), and the documentation does not distinguish the former category of functions from the latter. Also note that a simple visual code inspection would have revealed most of these problems.
the use of heuristics in the context of DART. Because DART reported so many errors and because of the
Because the intruder models in the implementations of the Needham-Schroeder protocol considered here and in [13] are different, a direct comparison between our results and the results of [13] is not possible. Yet, DART was able to find Lowe's attack using a systematic search (and to discover a previously unknown bug in the implementation of Lowe's fix), while the experimental setup of [13] was not. This performance difference can perhaps be explained intuitively as follows. A standard model-checking approach as taken in [13] (at a protocol implementation level) and also in [24] (at a more abstract protocol specification level) represents the program's environment (here the intruder) by a nondeterministic process that blindly guesses possible sequences of inputs (attacks), and then checks the effect of these on the program by performing state-space exploration. In contrast, a directed search as implemented in DART does not treat the program under test as a black box. Instead, the directed search attempts to iteratively partition the program's input space into equivalence classes and to generate new inputs in order to exhibit new program responses; inputs that trigger a previously considered program behavior are not generated and re-tested over and over again. In this sense, a directed search can be viewed as a more white-box approach than traditional model checking, since the observation of how the program reacts to specific inputs is used to generate the next test inputs. Since a directed search exploits more information about the program being tested, it is not surprising that it can (and should) be more effective.
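For illustration, here is a contrived, self-contained sketch of how one run's path constraint directs the next run on a toy function. In DART, the constraints are collected by instrumentation and solved automatically; here, each solving step is spelled out by hand in comments, purely for exposition.

    /* Toy illustration of directed search; the constraint "solving"
       is done by hand in the comments, purely for exposition. */
    #include <stdio.h>

    static int test_me(int x, int y) {
        if (x > 10)          /* branch 1: constraint  x > 10      */
            if (y == 2 * x)  /* branch 2: constraint  y == 2 * x  */
                return 1;    /* the buried "error" location       */
        return 0;
    }

    int main(void) {
        /* Run 1: random input, say (0, 0); path constraint: !(x > 10).  */
        printf("run 1: %d\n", test_me(0, 0));
        /* Negate the last constraint and solve x > 10: take x = 11.     */
        /* Run 2: path constraint: (x > 10) && !(y == 2 * x).            */
        printf("run 2: %d\n", test_me(11, 0));
        /* Negate the last constraint: (x > 10) && (y == 2 * x) yields,  */
        /* e.g., (11, 22); run 3 then reaches the error location.        */
        printf("run 3: %d\n", test_me(11, 22));
        return 0;
    }

By construction, each newly generated input lies in a different equivalence class of the input space than all previous runs.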
Another, larger application we tested with DART is oSIP, an open-source implementation of the SIP protocol, available at http://www.gnu.org/software/osip/osip.html. Version 2.0.9 of the library consists of about 30,000 lines of C code describing about 600 externally visible functions which can be used by higher-level applications. Two typical such applications are SIP clients (such as softphones to make calls over the internet from a PC) and servers (to route internet calls).

Our experimental setup was as follows. Since there is very little documentation on the API provided by the oSIP library other than the code itself, we considered, one by one, each of the about 600 externally visible functions as the toplevel function that DART calls. These function names were automatically extracted from the library using scripts. For each toplevel function, the inputs controlled by DART were the arguments of the function, and the search was limited to a maximum of 1,000 iterations (runs). In other words, if DART did not find any errors after 1,000 runs, the script would then move on to the next toplevel function, and so on. Since the oSIP code does not contain assertions, the search was limited to finding segmentation faults (crashes) and non-termination.9

9 Non-termination is reported by DART after the expiration of a timer, triggered when the program under test does not call any DART instrumentation within a specific time delay.

The results obtained with DART surprised us: DART found hundreds of ways to crash externally visible oSIP functions. In fact, DART found a way to crash 65% of the oSIP functions within 1,000 attempts per function. A closer analysis of the results revealed that most of these crashes share the same basic pattern: an oSIP function takes as argument a pointer to a data structure and later de-references that pointer without first checking whether it is non-NULL. It is worth noticing that some oSIP functions do contain code to test for NULL pointers, but most do not perform such tests consistently (i.e., on all execution paths), and the documentation does not distinguish the former category of functions from the latter. Also note that a simple visual code inspection would have revealed most of these problems.
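For illustration, the following fragment reconstructs this pattern in miniature; the type and function names are invented and are not part of the actual oSIP API.

    /* Hypothetical reconstruction of the crash pattern DART exposed;
       the type and function names are invented for illustration and
       are not the actual oSIP API. */
    #include <stdlib.h>

    typedef struct { char *name; } widget_t;

    /* Guarded variant: checks its pointer arguments on every path. */
    int widget_get_name(const widget_t *w, char **dest) {
        if (w == NULL || dest == NULL)
            return -1;
        *dest = w->name;
        return 0;
    }

    /* Unguarded variant: de-references its argument without a check. */
    void widget_free(widget_t *w) {
        free(w->name);   /* crashes when w == NULL */
        free(w);
    }

    int main(void) {
        return widget_get_name(NULL, NULL);  /* guarded: returns -1 */
        /* widget_free(NULL) would segfault; DART generates exactly
           such inputs within a few iterations. */
    }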
Because DART reported so many errors and because of the lack of a specification for the oSIP API, it is hard to evaluate how severe these problems really are. Perhaps the implicit assumption for higher-level applications is that they must always pass non-NULL pointers to the oSIP library, but then it is troubling to see that some of the oSIP functions do check their arguments for NULL pointers. All we can conclude with reasonable certainty is that, from the point of view of a higher-level application developer, there are many ways to misuse the API, and that programming errors in higher-level code (such as mistakenly passing a NULL pointer to some unguarded oSIP function) could result in dramatic failures (crashes).

Overwhelmed by the large number of potential problems reported by DART, we decided to focus on the oSIP functions called in a test driver provided with the oSIP library, and to analyze in detail the results obtained for these functions. In the process, we discovered what appears to be a significant security vulnerability in oSIP: we found an externally controllable way to crash the oSIP parser. Specifically, the attack is as follows:

• Build an (ASCII) SIP message containing no NULL (zero) or "|" characters, and of more than 2.5 Megabytes (for a cygwin environment; the size may vary on other platforms).
• Pass it to the oSIP parser using the oSIP function "osip_message_parse".
• One of the first things this function does is to copy this packet into stack space using the system call alloca(size). This system call returns a pointer to size bytes of uninitialized local stack space, or NULL if the allocation fails. Since 2.5 Megabytes is larger than the standard stack space available to cygwin processes, an error is reported and NULL is returned.
• The oSIP code does not check the success/failure of the call to alloca, and blindly passes the pointer to another oSIP function, which does not check this input argument either and then crashes because of the NULL pointer value (see the sketch below).
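A schematic reconstruction of this failure path (not the actual oSIP source) follows:

    /* Schematic reconstruction of the oSIP failure path; this is not
       the actual oSIP source. */
    #include <alloca.h>   /* alloca(); in <stdlib.h> on some platforms */
    #include <string.h>

    void parse_message(const char *msg, size_t size) {
        char *copy = alloca(size);  /* on cygwin, returns NULL once 'size'
                                       exceeds the available stack space  */
        memcpy(copy, msg, size);    /* unchecked: with copy == NULL, this
                                       copy (or an unguarded callee) crashes */
        /* ... parsing would continue on 'copy' ... */
    }

A guarded version would check the result of the allocation, or avoid alloca altogether for externally controlled sizes, before touching the buffer.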
By modifying the test driver that comes with oSIP and generating an input SIP message that satisfies these constraints, we were able to confirm this attack. This is a potentially very serious flaw in oSIP: it could be possible to remotely kill any SIP client or server that relies on the oSIP library for parsing SIP messages, simply by sending it a message satisfying the simple properties described above! However, we do not know whether existing SIP clients or servers (i.e., higher-level applications) built using oSIP are vulnerable to this attack. Note that, as of version 2.2.0 of the oSIP library (December 2004), this code has been fixed (see comments in the ChangeLog file).

5. Other Related Work

Automatically closing an open reactive program with its most general environment, to make it self-executable and to systematically explore all its possible behaviors, was already proposed in [8]. However, the approach taken there is to use static analysis and code transformation in order to eliminate the external interface of the open program and to replace all conditional statements whose outcome may depend on an input value with nondeterministic statements. The resulting closed program is a simplified version (abstraction) of the original open program that is guaranteed to simulate all its possible behaviors. In comparison, DART is more precise, both because it does not abstract the program under test and because it does not suffer from the inherent imprecision of static analysis. The article [35] explores how to achieve the same goal as [8] by partitioning the program's input domain using static analysis. Because [35] does not rely on abstraction (program simplification), it can be more precise than [8], but it still suffers from the cost and imprecision of static analysis compared to DART.

There is a rich literature on test-vector generation using symbolic execution (e.g., see [21, 27, 10, 3, 36, 38, 9]). Symbolic execution is limited in practice by the imprecision of static analysis and of theorem provers. As illustrated by the examples in Section 2, DART is able to alleviate some of the limitations of symbolic execution by exploiting dynamic information obtained from a concrete execution matching the symbolic constraints, by using dynamic test generation, and by instrumenting the program to check whether the input values generated next have the expected effect on the program. The ability of DART to handle complex or unknown code segments (including library functions) as black boxes, by simply executing them, makes it unique compared to standard symbolic-execution-based frameworks, which require some knowledge (minimally regarding termination) about all program parts or are inconclusive otherwise. Since most C code contains a system or library call every 10 lines or so on average, this distinguishing feature of DART is a significant practical advantage.
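As a schematic illustration (this fragment is ours, not one of the examples of Section 2), consider a branch guarded by a library call:

    /* Illustrative only: a branch guarded by a library call whose
       semantics a static symbolic executor may not know. DART executes
       strlen concretely and reuses the observed value in its path
       constraint, so the branch is not a dead end for test generation. */
    #include <string.h>

    int check_header(const char *input) {
        if (strlen(input) > 8)  /* value observed concretely at run-time */
            return 1;           /* "long header" path                    */
        return 0;               /* "short header" path                   */
    }

    int main(void) { return check_header("INVITE sip:x"); }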
The directed search algorithm introduced in Section 2 is closely related to prior work on dynamic test generation (e.g., [23, 15]). The algorithms discussed in these papers generate test inputs to exercise a specific program path or branch (to determine whether its execution is feasible), starting with some (possibly random) execution path. In contrast, DART attempts to cover all executable program paths, in a style similar to systematic testing and model checking (e.g., [12]). It therefore does not use branch/predicate classification techniques as in [23, 15]. Also, prior work on dynamic test generation does not deal with function calls and unknown code segments (such as library functions), does not check at run-time whether predictions about new test inputs are matched in the next run, and does not discuss completeness. Finally, to the best of our knowledge, dynamic test generation had never previously been implemented for a full-fledged programming language like C, nor applied to large examples like the Needham-Schroeder protocol and the oSIP library.

DART is also more loosely related to the following work. QuickCheck [7] is a tool for random testing of Haskell programs which supports a test specification language where the user can assign probabilities to inputs. Korat [5] is a tool that can analyze a Java method's precondition on its input and automatically generate all possible non-isomorphic inputs up to a given (small) size. Continuous testing [34] uses free cycles on a developer's machine to continuously run regression tests in the background, providing feedback about test failures as source code is edited. Random interpretation [14] is an approximate form of abstract interpretation where code fragments are interpreted over a probabilistic abstract domain and their abstract execution is sampled via random testing.

6. Conclusions

With DART, we have turned the conventional stance on the role of symbolic evaluation upside down: symbolic reasoning is an adjunct to real execution. Randomization helps us where automated reasoning is impossible or difficult. For example, when we encounter calls to malloc, we use randomization to guess the result of the allocation. Thus, symbolic execution degrades gracefully in the sense that randomization takes over, by suggesting concrete values, when automated reasoning fails to suggest how to proceed.

DART's ability to execute and test any program that compiles, without writing any test driver/harness code, is, we believe, a powerful new paradigm. Running a program for the first time usually brings interesting feedback (detects bugs), and DART makes this step almost effortless. We wrote "almost" because, in practice, the user is still responsible for defining what a suitable self-contained unit is: it makes little sense to test in isolation functions that are tightly coupled; instead, DART should be applied to program/application interfaces where pretty much any input can be expected and should be dealt with. The user can also restrict the most general environment, or test for functional correctness, by adding interface code to the program in order to filter inputs (i.e., enforce pre-conditions) and analyze outputs (i.e., test post-conditions), as sketched below. We plan to explore how to effectively present to the user the interface identified by DART and to let him/her specify constraints on inputs or outputs in a modular way.
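As a hypothetical sketch of such interface code (the unit under test and both conditions are invented for illustration):

    /* Hypothetical test harness: a pre-condition filter on inputs and
       an assertion serving as a post-condition; compute_discount is an
       invented stand-in for the unit under test. */
    #include <assert.h>

    static int compute_discount(int price, int rate) {
        return price * rate / 100;   /* stand-in implementation */
    }

    void test_compute_discount(int price, int rate) {
        /* Pre-condition filter: restricts the most general environment. */
        if (price < 0 || rate < 0 || rate > 100)
            return;                  /* discard inputs outside the contract */
        int d = compute_discount(price, rate);
        /* Post-condition: a correctness property for DART to falsify
           (a directed search could, e.g., expose the integer overflow
           of price * rate for large price values). */
        assert(0 <= d && d <= price);
    }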
Acknowledgments

We thank Dennis Dams, Cormac Flanagan, Alan Jeffrey, Rupak Majumdar, Darko Marinov, Kedar Namjoshi and Vic Zandy for helpful comments on this work. We are grateful to the anonymous reviewers for their comments on a preliminary version of this paper. This work was funded in part by NSF CCR-0341658. The work of Koushik Sen was done mostly while visiting Bell Laboratories, and we thank his advisor, Gul Agha, for the additional time and resources needed to complete this work.
References

[1] R. Alur, P. Cerny, G. Gupta, P. Madhusudan, W. Nam, and A. Srivastava. Synthesis of Interface Specifications for Java Classes. In Proceedings of POPL'05 (32nd ACM Symposium on Principles of Programming Languages), Long Beach, January 2005.
[2] T. Ball and S. Rajamani. The SLAM Toolkit. In Proceedings of CAV'2001 (13th Conference on Computer Aided Verification), volume 2102 of Lecture Notes in Computer Science, pages 260-264, Paris, July 2001. Springer-Verlag.
[3] D. Beyer, A. J. Chlipala, T. A. Henzinger, R. Jhala, and R. Majumdar. Generating Tests from Counterexamples. In Proceedings of ICSE'2004 (26th International Conference on Software Engineering). ACM, May 2004.
[4] D. Bird and C. Munoz. Automatic Generation of Random Self-Checking Test Cases. IBM Systems Journal, 22(3):229-245, 1983.
[5] C. Boyapati, S. Khurshid, and D. Marinov. Korat: Automated testing based on Java predicates. In Proceedings of ISSTA'2002 (International Symposium on Software Testing and Analysis), pages 123-133, 2002.
[6] W. Bush, J. Pincus, and D. Sielaff. A static analyzer for finding dynamic programming errors. Software Practice and Experience, 30(7):775-802, 2000.
[7] K. Claessen and J. Hughes. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of ICFP'2000, 2000.
[8] C. Colby, P. Godefroid, and L. J. Jagadeesan. Automatically Closing Open Reactive Programs. In Proceedings of PLDI'98 (1998 ACM SIGPLAN Conference on Programming Language Design and Implementation), pages 345-357, Montreal, June 1998. ACM Press.
[9] C. Csallner and Y. Smaragdakis. Check'n Crash: Combining Static Checking and Testing. In Proceedings of ICSE'2005 (27th International Conference on Software Engineering). ACM, May 2005.
[10] J. Edvardsson. A Survey on Automatic Test Data Generation. In Proceedings of the 2nd Conference on Computer Science and Engineering, pages 21-28, Linköping, October 1999.
[11] J. E. Forrester and B. P. Miller. An Empirical Study of the Robustness of Windows NT Applications Using Random Testing. In Proceedings of the 4th USENIX Windows System Symposium, Seattle, August 2000.
[12] P. Godefroid. Model Checking for Programming Languages using VeriSoft. In Proceedings of POPL'97 (24th ACM Symposium on Principles of Programming Languages), pages 174-186, Paris, January 1997.
[13] P. Godefroid and S. Khurshid. Exploring Very Large State Spaces Using Genetic Algorithms. In Proceedings of TACAS'2002 (8th Conference on Tools and Algorithms for the Construction and Analysis of Systems), Grenoble, April 2002.
[14] S. Gulwani and G. C. Necula. Precise Interprocedural Analysis using Random Interpretation. In Proceedings of POPL'05 (32nd ACM Symposium on Principles of Programming Languages), Long Beach, January 2005.
[15] N. Gupta, A. P. Mathur, and M. L. Soffa. Generating test data for branch coverage. In Proceedings of the 15th IEEE International Conference on Automated Software Engineering, pages 219-227, September 2000.
[16] S. Hallem, B. Chelf, Y. Xie, and D. Engler. A System and Language for Building System-Specific Static Analyses. In Proceedings of PLDI'02 (2002 ACM SIGPLAN Conference on Programming Language Design and Implementation), pages 69-82, 2002.
[17] R. Hastings and B. Joyce. Purify: Fast Detection of Memory Leaks and Access Errors. In Proceedings of the USENIX Winter 1992 Technical Conference, pages 125-138, Berkeley, January 1992.
[18] T. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy Abstraction. In Proceedings of the 29th ACM Symposium on Principles of Programming Languages, pages 58-70, Portland, January 2002.
[19] S. Johnson. Lint, a C program checker, 1978. Unix Programmer's Manual, AT&T Bell Laboratories.
[20] JUnit. Web page: http://www.junit.org/.
[21] J. C. King. Symbolic Execution and Program Testing. Communications of the ACM, 19(7):385-394, 1976.
[22] Klocwork. Web page: http://klocwork.com/index.asp.
[23] B. Korel. A Dynamic Approach of Test Data Generation. In IEEE Conference on Software Maintenance, pages 311-317, San Diego, November 1990.
[24] G. Lowe. An Attack on the Needham-Schroeder Public-Key Authentication Protocol. Information Processing Letters, 1995.
[25] G. Lowe. Breaking and Fixing the Needham-Schroeder Public-Key Protocol using FDR. In Proceedings of TACAS'1996 (Second International Workshop on Tools and Algorithms for the Construction and Analysis of Systems), volume 1055 of Lecture Notes in Computer Science, pages 147-166. Springer-Verlag, 1996.
[26] lp_solve. Web page: http://groups.yahoo.com/group/lp_solve/.
[27] G. J. Myers. The Art of Software Testing. Wiley, 1979.
[28] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. In Proceedings of the Conference on Compiler Construction, pages 213-228, 2002.
[29] G. C. Necula, S. McPeak, and W. Weimer. CCured: Type-Safe Retrofitting of Legacy Code. In Proceedings of POPL'02 (29th ACM Symposium on Principles of Programming Languages), pages 128-139, Portland, January 2002.
[30] R. Needham and M. Schroeder. Using Encryption for Authentication in Large Networks of Computers. Communications of the ACM, 21(12):993-999, 1978.
[31] The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology, Planning Report 02-3, May 2002.
[32] J. Offut and J. Hayes. A Semantic Model of Program Faults. In Proceedings of ISSTA'96 (International Symposium on Software Testing and Analysis), pages 195-200, San Diego, January 1996.
[33] Polyspace. Web page: http://www.polyspace.com.
[34] D. Saff and M. D. Ernst. Continuous testing in Eclipse. In Proceedings of the 2nd Eclipse Technology Exchange Workshop (eTX), Barcelona, March 2004.
[35] S. D. Stoller. Domain Partitioning for Open Reactive Programs. In Proceedings of ACM SIGSOFT ISSTA'02 (International Symposium on Software Testing and Analysis), 2002.
[36] W. Visser, C. Pasareanu, and S. Khurshid. Test Input Generation with Java PathFinder. In Proceedings of ACM SIGSOFT ISSTA'04 (International Symposium on Software Testing and Analysis), Boston, July 2004.
[37] J. Whaley, M. C. Martin, and M. S. Lam. Automatic Extraction of Object-Oriented Component Interfaces. In Proceedings of ACM SIGSOFT ISSTA'02 (International Symposium on Software Testing and Analysis), 2002.
[38] T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Proceedings of TACAS'05 (11th Conference on Tools and Algorithms for the Construction and Analysis of Systems), volume 3440 of LNCS, pages 365-381. Springer, 2005.