Practical Guide To Combinatorial Testing
INFORMATION SECURITY
October, 2010
The Information Technology Laboratory (ITL) at the National Institute of Standards and
Technology (NIST) promotes the U.S. economy and public welfare by providing technical
leadership for the Nation’s measurement and standards infrastructure. ITL develops tests,
test methods, reference data, proof of concept implementations, and technical analyses to
advance the development and productive use of information technology. ITL’s
responsibilities include the development of technical, physical, administrative, and
management standards and guidelines for the cost-effective security and privacy of
sensitive unclassified information in Federal computer systems. This Special Publication
800-series reports on ITL’s research, guidance, and outreach efforts in computer security,
and its collaborative activities with industry, government, and academic organizations.
Practical Combinatorial Testing
_______________________________________________________
Note to Readers
This document is a publication of the National Institute of Standards and Technology
(NIST) and is not subject to U.S. copyright. Certain commercial entities, equipment, or
materials may be identified in this document in order to describe an experimental procedure
or concept adequately. Such identification is not intended to imply recommendation or
endorsement by the National Institute of Standards and Technology, nor is it intended to
imply that the entities, materials, or equipment are necessarily the best available for the
purpose.
For questions or comments on this document, contact Rick Kuhn, [email protected] or 301-
975-3337.
Acknowledgements
Special thanks are due to Tim Grance, Jim Higdon, Eduardo Miranda, and Tom Wissink
for early support and evangelism of this work, and especially Jim Lawrence who has been
an integral part of the team since the beginning. We have benefitted tremendously from
interactions with researchers and practitioners including Renee Bryce, Myra Cohen,
Charles Colbourn, Mike Ellims, Vincent Hu, Justin Hunter, Aditya Mathur, Josh
Maximoff, Carmelo Montanez-Rivera, Jenise Reyes Rodriguez, Rick Rivello, Sreedevi
Sampath, Mike Trela, and Tao Xie. We also gratefully acknowledge NIST SURF students
Michael Forbes, William Goh, Evan Hartig, Menal Modha, Kimberley O’Brien-Applegate,
Michael Reilly, Malcolm Taylor and Bryan Wilkinson who contributed to the software and
methods described in this document.
Table of Contents
1 INTRODUCTION
1.1 Authority
1.2 Document Scope and Purpose
1.3 Audience and Assumptions
1.4 Organization: How to use this Document
2 COMBINATORIAL METHODS IN TESTING
2.1 Two Forms of Combinatorial Testing
2.2 The Test Oracle Problem
2.3 Chapter Summary
3 CONFIGURATION TESTING
3.1 Simple Application Platform Example
3.2 Smart Phone Application Example
3.3 Cost and Practical Considerations
3.4 Chapter Summary
4 INPUT PARAMETER TESTING
4.1 Example Access Control Module
4.2 Real-world Systems
4.3 Cost and Practical Considerations
4.4 Chapter Summary
5 SEQUENCE-COVERING ARRAYS
5.1 Constructing Sequence Covering Arrays
5.2 Using Sequence Covering Arrays
5.3 Cost and Practical Considerations
5.4 Chapter Summary
6 MEASURING COMBINATORIAL COVERAGE
6.1 Software Test Coverage
6.2 Combinatorial Coverage
6.3 Cost and Practical Considerations
6.4 Chapter Summary
7 COMBINATORIAL AND RANDOM TESTING
7.1 Coverage of Random Tests
7.2 Comparing Random and Combinatorial Coverage
7.3 Cost and Practical Considerations
7.4 Chapter Summary
Executive Summary
Software implementation errors are one of the most significant contributors to
information system security vulnerabilities, making software testing an essential part of
system assurance. In 2003 NIST published a widely cited report which estimated that
inadequate software testing costs the US economy $59.5 billion per year, even though 50%
to 80% of development budgets go toward testing. Exhaustive testing – testing all possible
combinations of inputs and execution paths – is impossible for real-world software, so high
assurance software is tested using methods that require extensive staff time and thus have
enormous cost. For less critical software, budget constraints often limit the amount of
testing that can be accomplished, increasing the risk of residual errors that lead to system
failures and security weaknesses.
Combinatorial testing is a method that can reduce cost and increase the effectiveness of
software testing for many applications. The key insight underlying this form of testing is
that not every parameter contributes to every failure and most failures are caused by
interactions between relatively few parameters. Empirical data gathered by NIST and
others suggest that software failures are triggered by only a few variables interacting (6 or
fewer). This finding has important implications for testing because it suggests that testing
combinations of parameters can provide highly effective fault detection. Pairwise (2-way
combinations) testing is sometimes used to obtain reasonably good results at low cost, but
pairwise testing may miss 10% to 40% or more of system bugs, and is thus not sufficient
for mission-critical software. Combinatorial testing beyond 2-way has been limited,
primarily due to a lack of good algorithms for higher interaction levels such as 4-way to 6-
way testing. New algorithms, however, have made combinatorial testing beyond pairwise
practical for industrial use.
1 INTRODUCTION
1.1 Authority
The National Institute of Standards and Technology (NIST) developed this document
in furtherance of its statutory responsibilities under the Federal Information Security
Management Act (FISMA) of 2002, Public Law 107-347.
This guideline has been prepared for use by Federal agencies. It may be used by
nongovernmental organizations on a voluntary basis and is not subject to copyright, though
attribution is desired.
Nothing in this document should be taken to contradict standards and guidelines made
mandatory and binding on Federal agencies by the Secretary of Commerce under statutory
authority, nor should these guidelines be interpreted as altering or superseding the existing
authorities of the Secretary of Commerce, Director of the OMB, or any other Federal
official.
1.2 Document Scope and Purpose
This publication introduces combinatorial testing and explains how to use it effectively
for system and software assurance.
1.3 Audience and Assumptions
This document assumes that the readers have experience with software development
and testing, some familiarity with scripting languages, and basic knowledge of
programming, logic, and discrete mathematics equivalent to what would be acquired in an
undergraduate computer science or engineering program. Most of the material should be
readily understood by an undergraduate student with some programming experience.
Because of the constantly changing nature of the information technology industry, readers
are strongly encouraged to take advantage of other resources (including those listed in this
document) for more current and detailed information.
Readers new to combinatorial testing may want to review the basics of combinatorics
in Appendix A and read chapters 2, 3, and 4. Other sections of the publication can be
reserved for later use as needed.
Combinatorial testing can help detect problems like this early in the testing life cycle.
The key insight underlying t-way combinatorial testing is that not every parameter
contributes to every failure and most failures are triggered by a single parameter value or
interactions between a relatively small number of parameters (for more on the number of
parameters interacting in failures, see Appendix B). To detect interaction failures, software
developers often use “pairwise testing”, in which all possible pairs of parameter values are
covered by at least one test. Its effectiveness is based on the observation that software
failures often involve interactions between parameters. For example, a router may be
observed to fail only for a particular protocol when packet volume exceeds a certain rate, a
2-way interaction between protocol type and packet rate. Figure 1 illustrates how such a 2-
way interaction may happen in code. Note that the failure will only be triggered when both
pressure < 10 and volume > 300 are true.
if (pressure < 10) {
    // do something
    if (volume > 300) {
        // faulty code! BOOM!
    }
    else {
        // good code, no problem
    }
}
else {
    // do something else
}
Figure 1. 2-way interaction failure triggered only when two conditions are
true.
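The effect of such a 2-way interaction on test design can be sketched in a few lines of Python. This is only an illustration: the function `process`, the return strings, and the representative values 5, 50, 100, and 500 are assumptions chosen to sit on either side of the thresholds in Figure 1.

```python
def process(pressure: int, volume: int) -> str:
    """Toy stand-in for Figure 1; returns "FAIL" only on the faulty branch."""
    if pressure < 10:
        if volume > 300:
            return "FAIL"  # faulty code reached
        return "ok"
    return "ok"


# Varying one parameter at a time from a passing baseline never reaches the fault.
baseline = {"pressure": 50, "volume": 100}
one_at_a_time = [
    {**baseline, "pressure": 5},
    {**baseline, "volume": 500},
]
single_results = [process(**t) for t in one_at_a_time]

# Covering every pair of the representative values does reach it.
pairs = [(p, v) for p in (5, 50) for v in (100, 500)]
pair_results = [process(p, v) for p, v in pairs]
```

In this sketch the one-at-a-time tests all pass, while exactly one of the four pairwise tests (pressure 5 with volume 500) triggers the failure.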
Pairwise testing can be highly effective and good tools are available to generate arrays
with all pairs of parameter value combinations. But until recently only a handful of tools
could generate combinations beyond 2-way, and most that did could require impractically
long times to generate 3-way, 4-way, or 5-way arrays because the generation process is
mathematically complex. Pairwise testing, i.e. 2-way combinations, has come to be
accepted as the common approach to combinatorial testing because it is computationally
tractable and reasonably effective.
[Figure: cumulative percentage (0 to 100) versus number of interactions (1 to 6), with
curves labeled Med. Devices, Browser, Server, and NASA Distributed DB.]
2^32 or more). These values must be discretized to a few distinct values. Most glaring of all
is the problem of determining the correct result that should be expected from the system
under test for each set of test inputs. Generating 1,000 test data inputs is of little help if we
cannot determine what the system under test (SUT) should produce as output for each of
the 1,000 tests.
selecting input values to exercise the application in each scenario, possibly supplementing
these tests with unusual or suspected problem cases. In the combinatorial approach to input
data selection, a test data generation tool is used to cover all combinations of input values
up to some specified limit. One such tool is ACTS (described in Appendix C), which is
available freely from NIST.
Many, if not most, software systems have a large number of configuration parameters.
Many of the earliest applications of combinatorial testing were in testing all pairs of system
configurations. For example, telecommunications software may be configured to work
with different types of call (local, long distance, international), billing (caller, phone card,
800), access (ISDN, VOIP, PBX), and server for billing (Windows Server, Linux/MySQL,
Oracle). The software must work correctly with all combinations of these, so a single test
suite could be applied to all pairwise combinations of these four major configuration items.
Any system with a variety of configuration options is a suitable candidate for this type of
testing.
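As a rough illustration of how few configurations pairwise coverage can need, the Python sketch below builds nine test configurations for the four telecommunications items above and checks that every pair of values appears in at least one of them. The construction used here is a standard 9-row orthogonal array for four 3-valued factors; it is an assumption chosen for brevity and is not the algorithm used by any particular tool. The helper name `uncovered_pairs` is hypothetical.

```python
from itertools import combinations, product

params = {
    "call": ["local", "long distance", "international"],
    "billing": ["caller", "phone card", "800"],
    "access": ["ISDN", "VOIP", "PBX"],
    "server": ["Windows Server", "Linux/MySQL", "Oracle"],
}
names = list(params)

# 9-row orthogonal array: row (a, b) -> (a, b, (a + b) mod 3, (a + 2b) mod 3).
index_rows = [(a, b, (a + b) % 3, (a + 2 * b) % 3)
              for a in range(3) for b in range(3)]
tests = [tuple(params[n][i] for n, i in zip(names, row)) for row in index_rows]


def uncovered_pairs(tests):
    """Return value pairs (over every pair of parameters) that no test contains."""
    missing = []
    for i, j in combinations(range(len(names)), 2):
        seen = {(t[i], t[j]) for t in tests}
        for pair in product(params[names[i]], params[names[j]]):
            if pair not in seen:
                missing.append((names[i], names[j], pair))
    return missing


exhaustive = len(list(product(*params.values())))  # 3^4 = 81 configurations
```

Here `uncovered_pairs(tests)` returns an empty list, so nine configurations cover all pairs, compared with 81 configurations for exhaustive coverage.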
2.1.2 Input Parameter Testing
Thorough testing requires that the font-processing function work correctly for all
valid combinations of these input settings. But with 10 binary inputs, there are 2^10 = 1,024
possible combinations. Fortunately, the empirical analysis reported above shows that failures
appear to involve a small number of parameters, and that testing all 3-way combinations
may detect 90% or more of bugs. For a word processing application, testing that detects
better than 90% of bugs may be a cost-effective choice, but we need to ensure that all 3-
way combinations of values are tested. To do this, we create a test suite to cover all 3-way
combinations (known as a covering array) [12, 14, 23, 26, 30, 43, 63].
Similar arrays can be generated to cover up to all 6-way combinations. In general, the
number of t-way combinatorial tests that will be required is proportional to v^t log n, for n
parameters with v possible values each.
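The practical meaning of the v^t log n relationship can be seen with a small calculation. The Python sketch below evaluates only the growth term itself; the constant of proportionality is unknown here, so the results are ratios, not test counts. The function name `growth_term` and the sample values (v = 4, n = 10 and 20) are assumptions for illustration.

```python
import math


def growth_term(v: int, t: int, n: int) -> float:
    """Value of v^t * log(n); test counts grow in proportion to this term."""
    return v ** t * math.log(n)


# Doubling the number of parameters adds only a logarithmic factor...
ratio_params = growth_term(4, 3, 20) / growth_term(4, 3, 10)

# ...whereas raising the interaction strength t by one multiplies the term by v.
ratio_strength = growth_term(4, 4, 10) / growth_term(4, 3, 10)
```

Under these assumptions, going from 10 to 20 parameters increases the term by roughly 30%, while going from 3-way to 4-way coverage multiplies it by 4.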
Figure 4 contrasts these two approaches. With the first approach, we may run the
same test set against all 3-way combinations of configuration options, while for the second
approach, we would construct a test suite that covers all 3-way combinations of input
transaction fields. Of course these approaches could be combined, with the combinatorial
tests (approach 2) run against all the configuration combinations (approach 1).
[Figure: inputs (Product, Amount, Quantity, Pmt method, Shipping method) to the System
Under Test.]
Figure 4. Two ways of using combinatorial testing
Even with efficient algorithms to produce covering arrays, the oracle problem
remains – testing requires both test data and results that should be expected for each data
input. High interaction strength combinatorial testing may require a large number of tests
in some cases, although not always. Approaches to solving the oracle problem for
combinatorial testing include:
Crash testing: the easiest and least expensive approach is to simply run tests
against the system under test (SUT) to check whether any unusual combination of input
values causes a crash or other easily detectable failure. This is essentially the same
procedure used in “fuzz testing”, which sends random values against the SUT. This form
of combinatorial testing could be regarded as a disciplined form of fuzz testing [59]. It
should be noted that although pure random testing will generally cover a high percentage of
t-way combinations, 100% coverage of combinations requires a random test set much
larger than a covering array. For example, all 3-way combinations of 10 parameters with 4
values each can be covered with 151 tests. Purely random generation requires over 900
tests to provide full 3-way coverage.
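The comparison between random tests and a covering array rests on measuring how many t-way combinations a test set actually covers. A minimal Python sketch of such a measurement follows; the function names, the fixed random seed, and the encoding of values as integers are assumptions, and the sketch makes no claim about the exact coverage any particular random test set will reach.

```python
import random
from itertools import combinations, product


def t_way_coverage(tests, values_per_param, t):
    """Fraction of all t-way value combinations that appear in at least one test."""
    n = len(values_per_param)
    covered = 0
    total = 0
    for cols in combinations(range(n), t):
        seen = {tuple(test[c] for c in cols) for test in tests}
        total_here = 1
        for c in cols:
            total_here *= values_per_param[c]
        total += total_here
        covered += len(seen)
    return covered / total


def random_tests(count, values_per_param, seed=0):
    """Generate `count` tests with each value drawn uniformly at random."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(v) for v in values_per_param) for _ in range(count)]


shape = [4] * 10  # 10 parameters with 4 values each, as in the example above
coverage_151 = t_way_coverage(random_tests(151, shape), shape, 3)

# Sanity check: an exhaustive test set covers every 2-way combination.
full = t_way_coverage(list(product(range(2), repeat=3)), [2, 2, 2], 2)
```

The value of `coverage_151` shows how much 3-way coverage 151 random tests achieve for this seed, which can be compared with the 100% coverage that a 151-test covering array provides by construction.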
Embedded assertions: An increasingly popular “light-weight formal methods”
technique is to embed assertions within code to ensure proper relationships between data,
for example as preconditions, postconditions, or input value checks. Tools such as the Java
Modeling Language (JML) can be used to introduce very complex assertions, effectively
embedding a formal specification within the code. The embedded assertions serve as an
executable form of the specification, thus providing an oracle for the testing phase. With
embedded assertions, exercising the application with all t-way combinations can provide
reasonable assurance that the code works correctly across a very wide range of inputs.
This approach has been used successfully for testing smart cards, with embedded JML
assertions acting as an oracle for combinatorial tests [25]. Results showed that 80% - 90%
of errors could be found in this way.
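The idea of embedded assertions acting as an oracle can be sketched without JML. The Python example below is a hypothetical illustration only: the function `apply_discount`, its discount rules, and the chosen input values are assumptions, and plain `assert` statements stand in for a specification language.

```python
from itertools import product


def apply_discount(price_cents: int, member: bool, coupon: bool) -> int:
    """Hypothetical function under test, with embedded pre/postcondition checks."""
    assert price_cents >= 0, "precondition: price must be non-negative"
    discount = 0
    if member:
        discount += 10
    if coupon:
        discount += 15
    result = price_cents * (100 - discount) // 100
    # The postcondition acts as the oracle: never negative, never above the input.
    assert 0 <= result <= price_cents, "postcondition violated"
    return result


def run_with_assertion_oracle(tests):
    """Run every test; report inputs for which an embedded assertion fails."""
    failures = []
    for args in tests:
        try:
            apply_discount(*args)
        except AssertionError as exc:
            failures.append((args, str(exc)))
    return failures


# All combinations of a few representative values for each input.
tests = list(product([0, 1, 999, 100_000], [False, True], [False, True]))
failures = run_with_assertion_oracle(tests)
```

No expected outputs are written by hand; the assertions decide whether each run is acceptable. Note that Python skips `assert` statements when run with the `-O` option, so a harness like this needs assertions enabled.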
1. Empirical data suggest that software failures are caused by the interaction of relatively
few parameter values, and that the proportion of failures attributable to t-way interactions
declines very rapidly with increase in t. That is, usually single parameter values or a pair of
values are the cause of a failure, but increasingly smaller proportions are caused by 3-way,
4-way, and higher order interactions.
2. Because a small number of parameters are involved in failures, we can attain a high
degree of assurance by testing all t-way interactions, for an appropriate interaction strength
t (2 to 6 usually). The number of t-way tests that will be required is proportional to v^t log n,
for n parameters with v values each.
3. Combinatorial methods can be applied to configurations or to input parameters, or in
some cases both.
4. As with all other types of testing, the oracle problem must be solved – i.e., for every
test input, the expected output must be determined in order to check if the application is
producing the correct result for each set of inputs. A variety of methods are available to
solve the oracle problem.
3 CONFIGURATION TESTING
Parameter           Values
Operating system    XP, OS X, RHL
Browser             IE, Firefox
Protocol            IPv4, IPv6
CPU                 Intel, AMD
DBMS                MySQL, Sybase, Oracle
Table 2. Simple example configuration options.
We can now generate test configurations using the ACTS tool. For simplicity of
presentation we illustrate usage of the command line version of ACTS, but an intuitive GUI
version is available that may be more convenient. This tool is summarized in Appendix C
and a comprehensive user manual is included with the ACTS download.
The first step in creating test configurations is to specify the parameters and
possible values in a file for input to ACTS, as shown in Figure 5:
[System]
[Parameter]
OS (enum): XP,OS_X,RHL
Browser (enum): IE, Firefox
Protocol(enum): IPv4,IPv6
CPU (enum): Intel,AMD
DBMS (enum): MySQL,Sybase,Oracle
[Relation]
[Constraint]
[Misc]
Note that most of the bracketed tags in the input file are optional, and not filled in
for this example. The essential part of the file is the [Parameter] specification, in the
format <parameter name> (<type>): <values>, where one or more values are listed
separated by commas. The tool can then be run at the command line:
A variety of options can be specified, but for this example we only use the “degree of
interaction” option to specify 2-way, 3-way, etc. coverage. Output can be created in a
convenient form shown below, or as a matrix of numbers, comma separated value, or Excel
spreadsheet form. If the output will be used by human testers rather than as input for
further machine processing, the format in Figure 6 is useful:
Degree of interaction coverage: 2
Number of parameters: 5
Maximum number of values per parameter: 3
Number of configurations: 10
-------------------------------------
Configuration #1:
1 = OS=XP
2 = Browser=IE
3 = Protocol=IPv4
4 = CPU=Intel
5 = DBMS=MySQL
-------------------------------------
Configuration #2:
1 = OS=XP
2 = Browser=Firefox
3 = Protocol=IPv6
4 = CPU=AMD
5 = DBMS=Sybase
-------------------------------------
Configuration #3:
1 = OS=XP
2 = Browser=IE
3 = Protocol=IPv6
4 = CPU=Intel
5 = DBMS=Oracle
-------------------------------------
Configuration #4:
1 = OS=OS_X
2 = Browser=Firefox
3 = Protocol=IPv4
4 = CPU=AMD
5 = DBMS=MySQL
. . .
The complete test set for 2-way combinations is shown in Table 1 in Section 2.1.1. Only
10 tests are needed. Moving to 3-way or higher interaction strengths requires more tests, as
shown in Table 3.
t    # Tests    % of Exhaustive
2    10         14
3    18         25
4    36         50
5    72         100
Table 3. Number of combinatorial tests for a simple example.
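The percentages in Table 3 follow directly from the number of values per parameter in Table 2. A short Python check, with variable names chosen here for illustration, is:

```python
from math import prod

# Number of values for each parameter in Table 2.
values = {"OS": 3, "Browser": 2, "Protocol": 2, "CPU": 2, "DBMS": 3}
exhaustive = prod(values.values())  # 3 * 2 * 2 * 2 * 3 = 72

# Test counts from Table 3, keyed by interaction strength t.
tests_by_strength = {2: 10, 3: 18, 4: 36, 5: 72}
percent = {t: round(100 * n / exhaustive) for t, n in tests_by_strength.items()}
```

With five parameters, 5-way coverage is the same as exhaustive testing, which is why the last row reaches 100%.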
int HARDKEYBOARDHIDDEN_NO;
int HARDKEYBOARDHIDDEN_UNDEFINED;
int HARDKEYBOARDHIDDEN_YES;
int KEYBOARDHIDDEN_NO;
int KEYBOARDHIDDEN_UNDEFINED;
int KEYBOARDHIDDEN_YES;
int KEYBOARD_12KEY;
int KEYBOARD_NOKEYS;
int KEYBOARD_QWERTY;
int KEYBOARD_UNDEFINED;
int NAVIGATIONHIDDEN_NO;
int NAVIGATIONHIDDEN_UNDEFINED;
int NAVIGATIONHIDDEN_YES;
int NAVIGATION_DPAD;
int NAVIGATION_NONAV;
int NAVIGATION_TRACKBALL;
int NAVIGATION_UNDEFINED;
int NAVIGATION_WHEEL;
int ORIENTATION_LANDSCAPE;
int ORIENTATION_PORTRAIT;
int ORIENTATION_SQUARE;
int ORIENTATION_UNDEFINED;
int SCREENLAYOUT_LONG_MASK;
int SCREENLAYOUT_LONG_NO;
int SCREENLAYOUT_LONG_UNDEFINED;
int SCREENLAYOUT_LONG_YES;
int SCREENLAYOUT_SIZE_LARGE;
int SCREENLAYOUT_SIZE_MASK;
int SCREENLAYOUT_SIZE_NORMAL;
int SCREENLAYOUT_SIZE_SMALL;
int SCREENLAYOUT_SIZE_UNDEFINED;
int TOUCHSCREEN_FINGER;
int TOUCHSCREEN_NOTOUCH;
int TOUCHSCREEN_STYLUS;
int TOUCHSCREEN_UNDEFINED;
Figure 7. Android resource configuration file.
Using Table 4, we can now calculate the total number of configurations:
3 ⋅ 3 ⋅ 4 ⋅ 3 ⋅ 5 ⋅ 4 ⋅ 4 ⋅ 5 ⋅ 4 = 172,800 configurations (i.e., a 3^3 4^4 5^2 system). As with many
applications, thorough testing will require some human intervention to run tests and verify results,
and a test suite will typically include many tests. If each test suite can be run in 15 minutes, it will
take roughly 24 staff-years to complete testing for an app. With salary and benefit costs for each
tester of $150,000, the cost of testing an app will be more than $3 million, making it virtually
impossible to return a profit for most apps. How can we provide effective testing for apps at a
reasonable cost?
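The staff-year and cost estimates can be reproduced with a few lines of arithmetic, sketched here in Python. The figure of 1,800 working hours per staff-year is an assumption made for this sketch; the text does not state the value it used.

```python
configurations = 3 * 3 * 4 * 3 * 5 * 4 * 4 * 5 * 4  # 172,800
minutes_per_suite = 15
hours_per_staff_year = 1800  # assumption; not stated in the text
cost_per_staff_year = 150_000  # salary and benefits per tester, in dollars

total_hours = configurations * minutes_per_suite / 60
staff_years = total_hours / hours_per_staff_year
total_cost = staff_years * cost_per_staff_year
```

Under that assumption the result is 24 staff-years and $3.6 million, consistent with the estimates above.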
Using the covering array generator, we can produce tests that cover t-way
combinations of values. Table 5 shows the number of tests required at several levels of t.
For many applications, 2-way or 3-way testing may be appropriate, and either of these will
require less than 1% of the time required to cover all possible test configurations.
t    # Tests    % of Exhaustive
2    29         0.02
3    137        0.08
4    625        0.4
5    2532       1.5
6    9168       5.3
Table 5. Number of combinatorial tests for Android example.
The system described in Section 3.1 illustrates a common situation in all types of
testing: some combinations cannot be tested because they don’t exist for the systems under
test. In this case, if the operating system is either OS X or Linux, Internet Explorer is not
available as a browser. Note that we cannot simply delete tests with these untestable
combinations, because that would result in losing other combinations that are essential to
test but are not covered by other tests. For example, deleting tests 5 and 7 in Section 2.1.1
would mean that we would also lose the test for Linux with the IPv6 protocol.
[Sidebar: Some combinations never occur in practice.]
One way around this problem is to delete tests and supplement the test suite with
manually constructed test configurations to cover the deleted combinations, but covering
array tools offer a better solution. With ACTS we can specify
constraints, which tell the tool not to include specified combinations in the generated test
configurations. ACTS supports a set of commonly used logic and arithmetic operators to
specify constraints. In this case, the following constraint can be used to ensure that invalid
combinations are not generated:
(OS != "XP" => Browser = "Firefox")
The covering array tool will then generate a set of test configurations that does not include
the invalid combinations, but does cover all those that are essential. The revised test
configuration array is shown in Figure 8 below. Parameter values that have changed from
the original configurations are underlined. Note that adding the constraint also resulted in
reducing the number of test configurations by one. This will not always be the case,
depending on the constraints used, but it illustrates how constraints can help reduce the
problem. Even if particular combinations are testable, the test team may consider some
combinations unnecessary, and constraints could be used to prevent these combinations,
possibly reducing the number of test configurations.
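To see which combinations the constraint rules out, the implication can be evaluated over the OS and Browser values directly. The Python sketch below is an illustration of the logic only, with a hypothetical helper `satisfies_constraint`; it does not reflect how a covering array tool handles constraints internally.

```python
from itertools import product

params = {
    "OS": ["XP", "OS_X", "RHL"],
    "Browser": ["IE", "Firefox"],
}


def satisfies_constraint(config: dict) -> bool:
    """OS != "XP" => Browser = "Firefox", written as (not A) or B."""
    return config["OS"] == "XP" or config["Browser"] == "Firefox"


all_configs = [dict(zip(params, combo)) for combo in product(*params.values())]
valid = [c for c in all_configs if satisfies_constraint(c)]
invalid = [c for c in all_configs if not satisfies_constraint(c)]
```

Of the six OS and Browser pairs, two are excluded (IE on OS_X and IE on RHL), and the remaining four must still be covered by the generated configurations.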
Using combinatorial methods to design test configurations is probably the most widely
used combinatorial approach because it is quick and easy to do and typically delivers
significant improvements to testing. Combinatorial testing for input parameters can
provide better test coverage at lower cost than conventional tests, and can be extended to
high strength coverage to provide much better assurance.
2. Because many systems have certain configurations that may not be of interest (such as
Internet Explorer browser on a Linux system), constraints are an important consideration in
any type of testing. With combinatorial methods, it is important that the covering array
generator allows for the inclusion of constraints so that all relevant interactions are tested,
and important information is not lost because a test contains an impossible combination.
The system under test is an access control module that implements the following
policy:
Our task is to develop a covering array of tests for these inputs. The first step will
be to develop a table of parameters and possible values, similar to that in Section 3.1 in the
previous chapter. The only difference is that in this case we are dealing with input
parameters rather than configuration options. For the most part, the task is simple: we just
take the values directly from the specifications or code, as shown in Figure 10. Several
parameters are boolean, and we will use 0 and 1 for false and true values respectively. For
day of the week, there are only seven values, so these can all be used. However, hour of
the day presents a problem. Recall that the number of tests generated for n parameters is
proportional to v^t, where v is the number of values and t is the interaction level (2-way to
6-way). For all boolean values and 4-way testing, therefore, the number of tests will be some
multiple of 2^4. But consider what happens with a large number of possible values, such as
24 hours. The number of tests will be proportional to 24^4 = 331,776. For this example,
time is given in minutes, so using every possible value would be completely intractable.
Therefore, we
must select representative values for the hour parameter. This problem occurs in all types
of testing, not just with combinatorial methods, and good methods have been developed to
deal with it. Most testers are already familiar with two of these: equivalence partitioning
and boundary value analysis. Additional background on these methods can be found in
software testing texts such as Ammann and Offutt [2], Beizer [4], Copeland [21], Mathur
[45], and Myers [52].
Parameter    Values
emp          0, 1
time         ??
day          m, tu, w, th, f, sa, su
auth         0, 1
aud          0, 1
Figure 10. Parameters and values for access control example.
Both of these intuitively obvious methods will produce a smaller set of values that
should be adequate for testing purposes, by dividing the possible values into partitions that
are meaningful for the program being tested. One value is selected for each partition. The
objective is to partition the input space such that any value selected from the partition will
affect the program under test in the same way as any other value in the partition. Thus,
ideally if a test case contains a parameter x which has value y, replacing y with any other
value from the partition will not affect the test case result. This ideal may not always be
achieved in practice.
How should the partitions be determined? One obvious, but not necessarily good,
approach is to simply select values from various points on the range of a variable. For
example, if capacity can range from 0 to 20,000, it might seem sensible to select 0, 10,000,
and 20,000 as possible values. But this approach is likely to miss important cases that
depend on the specific requirements of the system under test. Some judgment is involved,
but partitions are usually best determined from the specification. In this example, 9 am and
5 pm are significant, so 0540 (540 minutes, or 9 hours, past midnight) and 1020 (1,020
minutes, or 17 hours, past midnight) determine the appropriate partitions:
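As a minimal sketch of these partitions in Python, the function below maps a time in minutes past midnight to one of three partitions. The partition labels, the function name, and the treatment of the boundaries themselves (start inclusive, end exclusive) are assumptions; the governing specification would settle which side each boundary belongs to.

```python
def time_partition(minutes: int) -> str:
    """Map a time in minutes past midnight to a partition label.

    Boundary handling (inclusive start, exclusive end) is an assumption here.
    """
    if not 0 <= minutes <= 1440:
        raise ValueError("time must be between 0 and 1440 minutes")
    if minutes < 540:
        return "before 9 am"
    if minutes < 1020:
        return "9 am to 5 pm"
    return "after 5 pm"


# Any two values from the same partition should be interchangeable in a test.
same_early = time_partition(240) == time_partition(423)   # 4:00 am and 7:03 am
same_midday = time_partition(620) == time_partition(873)  # 10:20 am and 2:33 pm
```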
[Sidebar: Use a maximum of 8 to 10 values per parameter to keep testing tractable.]
Ideally, the program should behave the same for any of the times within the partitions;
it should not matter whether the time is 4:00 am or 7:03 am, for example, because the
specification treats both of these times the same. Similarly, it should not matter which
time between the hours of 9 am and 5 pm is chosen; the program should behave the same
for 10:20 am and 2:33 pm. One common strategy,
boundary value analysis, is to select test values at each boundary and at the smallest
possible unit on either side of the boundary, for three values per boundary. The intuition,
backed by empirical research, is that errors are more likely at boundary conditions because
errors in programming may be made at these points. For example, if the requirements for
automated teller machine software say that a withdrawal should not be allowed to exceed
$300, a programming error such as the following could occur:
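A hypothetical illustration of this kind of error is sketched below in Python. The function names and the specific mistake (writing `<` where `<=` was intended) are assumptions made for this sketch and are not taken from the original listing.

```python
LIMIT = 300


def withdrawal_allowed_buggy(amount: int) -> bool:
    """Hypothetical faulty check: '<' was written where '<=' was intended."""
    return 0 < amount < LIMIT


def withdrawal_allowed(amount: int) -> bool:
    """Intended behaviour: amounts up to and including the limit are allowed."""
    return 0 < amount <= LIMIT


# Boundary value analysis: the boundary and the smallest unit on either side.
boundary_values = [LIMIT - 1, LIMIT, LIMIT + 1]
disagreements = [a for a in boundary_values
                 if withdrawal_allowed_buggy(a) != withdrawal_allowed(a)]

# Values chosen from the middle of the range do not expose the error.
mid_range = [a for a in (100, 200)
             if withdrawal_allowed_buggy(a) != withdrawal_allowed(a)]
```

Only the test at exactly $300 distinguishes the faulty check from the intended one, which is why values at and next to each boundary are selected.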
It is generally also desirable to test the extremes of ranges. One possible selection
of values for the time parameter would then be: 0000, 0539, 0540, 0541, 1019, 1020, 1021,
and 1440. More values would be better, but the tester may believe that this is the most
effective set for the available time budget. With this selection, the total number of
combinations is 2 ⋅ 8 ⋅ 7 ⋅ 2 ⋅ 2 = 448.
t    # Tests
2    56
3    112
4    224
Figure 11. Number of tests for access control example.
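The time values and the total of 448 combinations given above can be reproduced with a short sketch. The helper name `boundary_values` is hypothetical, and the list of values per parameter simply mirrors the factors of the product 2 ⋅ 8 ⋅ 7 ⋅ 2 ⋅ 2; only the factor 8 (the time parameter) is derived here.

```python
from math import prod


def boundary_values(low, high, boundaries):
    """Range extremes plus each boundary and the smallest unit on either side."""
    values = {low, high}
    for b in boundaries:
        values.update((b - 1, b, b + 1))
    return sorted(v for v in values if low <= v <= high)


# Time in minutes past midnight; 540 (9 am) and 1020 (5 pm) are the boundaries.
time_values = boundary_values(0, 1440, [540, 1020])
print(time_values)

# Number of values per parameter, as in the product 2 x 8 x 7 x 2 x 2.
values_per_parameter = [2, len(time_values), 7, 2, 2]
print(prod(values_per_parameter))
```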
2^34 = 1.7 x 10^10 possible settings. We clearly cannot test 17 billion possible settings, but
all 3-way interactions can be tested with only 33 tests, and all 4-way interactions with only
85. This may seem surprising at first, but it results from the fact that every test of 34
parameters contains C(34, 3) = 5,984 3-way and C(34, 4) = 46,376 4-way combinations.
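The counts quoted above can be checked directly. This sketch assumes 34 two-valued parameters, which is what the figure of roughly 17 billion settings implies.

```python
from math import comb

parameters = 34

# Exhaustive testing: every setting of 34 two-valued parameters.
settings = 2 ** parameters

# Each single test assigns all 34 parameters, so it contains one value
# combination for every choice of 3 (or 4) parameters.
three_way_per_test = comb(parameters, 3)
four_way_per_test = comb(parameters, 4)

print(settings, three_way_per_test, four_way_per_test)
```

Because one test exercises thousands of t-way combinations at once, a few dozen well-chosen tests can cover them all.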
Combinatorial methods can be highly effective and reduce the cost of testing
substantially. For example, Justin Hunter has applied these methods to a wide variety of
test problems and consistently found both lower cost and more rapid error detection [30].
But as with most aspects of engineering, tradeoffs must be considered. Among the most
important is the question of when to stop testing, balancing the cost of testing against the
risk of failing to discover additional failures. An extensive body of research has been
devoted to this topic, and sophisticated models are available for determining when the cost
of further testing will exceed the expected benefits [10, 45]. Existing models for when to
stop testing can be applied to the combinatorial test approach also, but there is an additional
consideration: What is the appropriate interaction strength to use in this type of testing?
[Figure: number of tests (vertical axis, 0 to 12,000) by interaction strength (2-way through 6-way).]
Because the number of tests increases only logarithmically with the number of
parameters, test set size for a large problem may be only somewhat larger than for a much
smaller problem. For example, if a project uses combinatorial testing for a system that has
20 parameters and generates several hundred tests, a much larger system with 40 to 50
parameters may only require a few dozen more tests. Combinatorial methods may generate
the best cost-benefit ratio for large systems.
1. The key advantage of combinatorial testing derives from the fact that all, or nearly all,
software failures appear to involve interactions of only a few parameters. Generating a
covering array of input parameter values allows us to test all of these interactions, up to
a level of 5-way or 6-way combinations, depending on resources.
2. Practical testing often requires abstracting the possible values of a variable into a small
set of equivalence classes. For example, if a variable is a 32-bit integer, it is clearly not
possible to test the full range of values in +/- 2^31. This problem is not unique to
combinatorial testing, but occurs in most test methodologies. Simple heuristics and
engineering judgment are required to determine the appropriate partitioning of values
into equivalence classes, but once this is accomplished it is possible to generate
covering arrays of a few hundred to a few thousand tests for many applications. The
thoroughness of coverage will depend on resources and criticality of the application.
5 SEQUENCE-COVERING ARRAYS
In testing event-driven software, the critical condition for triggering failures often is
whether or not a particular event has occurred prior to a second one, not necessarily if they
are back to back. This situation reflects the fact that in many cases, a particular state must
be reached before a particular failure can be triggered. For example, a failure might occur
when connecting device A only if device B is already connected. The methods described
in this chapter were developed to solve a real problem in interoperability test and
evaluation, using combinatorial methods to provide efficient testing. Sequence covering
arrays, as defined here, ensure that any t events will be tested in every possible t-way order.
There are 6! = 720 possible sequences for these six events, and the system should
respond correctly and safely no matter the order in which they occur. Operators may be
instructed to use a particular order, but mistakes are inevitable, and should not result in
injury to users or compromise the enterprise. Because setup, connections and operation of
this component are manual, each test can take a considerable amount of time. It is not
uncommon for system-level tests such as this to take hours to execute, monitor, and
complete. We want to test this system as thoroughly as possible, but time and budget
constraints do not allow for testing all possible sequences, so we will test all 3-event
sequences.
With six events, a, b, c, d, e, and f, one subset of three is {b, d, e}, which can be
arranged in six permutations: [b d e], [b e d], [d b e], [d e b], [e b d], [e d b]. A test that
covers the permutation [d b e] is: [a d c f b e]; another is [a d c b e f]. A larger example
system may have 10 devices to connect, in which case the number of permutations is 10!,
or 3,628,800 tests for exhaustive testing. In that case, a 3-way sequence covering array
with 14 tests covering all 10 ⋅ 9 ⋅ 8 = 720 3-way sequences is a dramatic improvement, as is 72
tests for all 4-way sequences (see Table 8).
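The notion of a test covering a sequence can be made concrete with a small sketch. The function name `covers` is an assumption for illustration; tests and sequences are written as strings of single-letter events.

```python
from itertools import permutations


def covers(test, sequence):
    """True if the events of `sequence` occur in `test` in that order,
    not necessarily adjacent."""
    remaining = iter(test)
    # `event in remaining` consumes the iterator up to the first match,
    # so each later event must be found after the earlier ones.
    return all(event in remaining for event in sequence)


print(covers("adcfbe", "dbe"))  # the test [a d c f b e] and sequence [d b e]
print(covers("adcbef", "dbe"))  # the test [a d c b e f] and sequence [d b e]
print(covers("adcfbe", "ebd"))  # a sequence this test does not cover

# Number of 3-way sequences for 10 events: 10 * 9 * 8.
three_way_sequences = list(permutations("abcdefghij", 3))
print(len(three_way_sequences))
```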
Event Description
a connect air flow meter
b connect pressure gauge
c connect satellite link
d connect pressure readout
e engage drive motor
f engage steering control
Table 6. System events
Definition. We define a sequence covering array, SCA(N, S, t), as an N x s matrix where
entries are from a finite set S of s symbols, such that every t-way permutation of symbols
from S occurs in at least one row; the t symbols in the permutation are not required to be
adjacent. That is, for every t-way arrangement of symbols x1, x2, ..., xt, the regular
expression .*x1.*x2.* ... .*xt.* matches at least one row in the array. Sequence covering
arrays, as the name implies, are analogous to standard covering arrays, which include at
least one of every t-way combination of any n variables, where t<n. A variety of algorithms
are available for constructing covering arrays, but these are not usable for generating t-way
sequences because they are designed to cover combinations in any order.
Example 1. Consider the problem of testing four events, a, b, c, and d. For convenience, a
t-way permutation of symbols is referred to as a t-way sequence. There are 4! = 24 possible
permutations of these four events, but we can test all 3-way sequences of these events with
only six tests (see Table 7).
Test
1 a d b c
2 b a c d
3 b d c a
4 c a b d
5 c d b a
6 d a c b
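The six tests above can be checked against the regular-expression form of the definition. This is a minimal sketch; the helper name `is_covered` is hypothetical, and each test is written as a string of its four events.

```python
import re
from itertools import permutations

tests = ["adbc", "bacd", "bdca", "cabd", "cdba", "dacb"]


def is_covered(sequence, rows):
    # Build .*x1.*x2.* ... .*xt.* and look for a row that it matches.
    pattern = re.compile(".*" + ".*".join(map(re.escape, sequence)) + ".*")
    return any(pattern.fullmatch(row) for row in rows)


all_sequences = list(permutations("abcd", 3))
uncovered = [s for s in all_sequences if not is_covered(s, tests)]
print(len(all_sequences), len(uncovered))
```

Each test of four events contains four 3-way sequences, so six tests is the minimum possible for the 24 sequences.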
A 2-way sequence covering array can be constructed by listing the events in some order for
one test and in reverse order for the second test:
1 a b c d
2 d c b a
To see that the procedure in Example 2 generates tests that cover all 2-way sequences, note
that for 2-way sequence coverage, for every pair of variables x and y, both x..y and y..x must
be in some test (where a..b means that a is eventually followed by b). All variables are
included in each test, therefore any sequence x..y must be in either test 1 or test 2 and its
reverse y..x in the other test.
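The reversal argument can be checked mechanically. The function names in this sketch are assumptions for illustration.

```python
from itertools import permutations


def two_way_sequence_tests(events):
    """One test in the given order and one in reverse order."""
    order = list(events)
    return [order, order[::-1]]


def pair_covered(x, y, tests):
    # x..y is covered if x appears somewhere before y in at least one test.
    return any(test.index(x) < test.index(y) for test in tests)


tests = two_way_sequence_tests("abcd")
all_covered = all(pair_covered(x, y, tests) for x, y in permutations("abcd", 2))
print(tests, all_covered)
```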
For t-way sequence test generation, where t > 2, we use a greedy algorithm that
generates a large number of tests, scores each by the number of previously uncovered
sequences it covers, then chooses the highest scoring test. This simple approach produces
surprisingly good results,
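A minimal sketch of a greedy generator of this kind follows. It is an illustration under stated assumptions, not the authors' implementation: the candidate pool size, the fixed random seed, and the fallback step that guarantees progress are all choices made for this example.

```python
import random
from itertools import combinations, permutations


def sequences_in(test, t):
    # combinations() keeps the order of `test`, so each tuple is a t-way
    # sequence (not necessarily adjacent) that the test covers.
    return set(combinations(test, t))


def greedy_sequence_tests(events, t, candidates=100, seed=0):
    events = list(events)
    rng = random.Random(seed)
    uncovered = set(permutations(events, t))
    tests = []
    while uncovered:
        # Generate random candidate tests and keep the highest-scoring one.
        pool = [tuple(rng.sample(events, len(events))) for _ in range(candidates)]
        best = max(pool, key=lambda c: len(sequences_in(c, t) & uncovered))
        if not sequences_in(best, t) & uncovered:
            # Fallback (added here for guaranteed progress): build a test
            # that starts with one still-uncovered sequence.
            seq = min(uncovered)
            best = seq + tuple(e for e in events if e not in seq)
        tests.append(best)
        uncovered -= sequences_in(best, t)
    return tests


tests = greedy_sequence_tests("abcdef", 3)
print(len(tests))
```

The number of tests produced depends on the pool size and seed, so it is not fixed by this sketch.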
Sequence covering arrays have been incorporated into operational testing for a
mission-critical system that uses multiple devices with inputs and outputs to a laptop
computer. The test procedure has 8 steps: boot system, open application, run scan, connect
peripherals P-1 through P-5. It is expected that for some sequences, the system will not
function properly, thus the order of connecting peripherals is a critical aspect of testing. In
addition, there are constraints on the sequence of events: can't scan until the app is open;
can't open app until system is booted. There are 40,320 permutations of 8 steps, but some
are redundant (e.g., changing the order of peripherals connected before boot), and some are
invalid (violates a constraint). Around 7,000 are valid, and non-redundant, but this is far
too many to test for a system that requires manual, physical connections of devices.
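As a rough cross-check of these counts, the permutations that satisfy the two stated ordering constraints can be enumerated. The step labels are shorthand introduced for this sketch, and redundant orderings are not removed, so the result is only indicative of the "around 7,000" figure.

```python
from itertools import permutations

steps = ["boot", "open app", "run scan", "P-1", "P-2", "P-3", "P-4", "P-5"]


def satisfies_constraints(order):
    position = {step: i for i, step in enumerate(order)}
    # Can't open the app until booted; can't scan until the app is open.
    return position["boot"] < position["open app"] < position["run scan"]


total = 0
valid = 0
for order in permutations(steps):
    total += 1
    valid += satisfies_constraints(order)

print(total, valid)
```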
The system was tested using a seven-step sequence covering array, incorporating
the assumption that there is no need to examine strength-3 sequences that involve boot-up.
The initial test configuration (Figure 14) was drawn from the library of pre-computed
sequence tests. Some changes were made to the pre-computed sequences based on unique
requirements of the system test. If 6='Open App' and 5='Run Scan', then cases 1, 4, 6, 8,
10, and 12 are invalid, because the scan cannot be run before the application is started.
This was handled by 'swapping 0 and 1' when they are adjacent (1 and 4), out of order. For
the other cases, several cases were generated from each that were valid mutations of the
invalid case. A test was also embedded to see whether it mattered where each of three
USB connections were placed. The last test case ensures at least strength 2 (sequence of
length 2) for all peripheral connections and 'Boot', i.e., that each peripheral connection
occurs prior to boot. The final test array is shown in Table 9.
Test 1 0 1 2 3 4 5 6
Test 2 6 5 4 3 2 1 0
Test 3 2 1 0 6 5 4 3
Test 4 3 4 5 6 0 1 2
Test 5 4 1 6 0 3 2 5
Test 6 5 2 3 0 6 1 4
Test 7 0 6 4 5 2 1 3
Test 8 3 1 2 5 4 6 0
Test 9 6 2 5 0 3 4 1
Test 10 1 4 3 0 5 2 6
Test 11 2 0 3 4 6 1 5
Test 12 5 1 6 4 3 0 2
As with other forms of combinatorial testing, some combinations may be either impossible or
not exist on the system under test. For example, ‘receive message’ must occur before ‘process
message’. The algorithm we have developed makes it possible to specify pairs x,y, where the
sequence x..y is to be excluded from the generated covering array. Typically this will lead to extra
tests, but does not increase the test array significantly.
5.4 Chapter Summary
2. All 2-way sequences can be tested simply by listing the events to be tested in any order,
then reversing the order to create a second test. Algorithms have been developed to
create sequence covering arrays for higher strength interaction levels.
Original
Case Case Step1 Step2 Step3 Step4 Step5 Step6 Step7 Step8
1 1 Boot P-1 (USB-RIGHT) P-2 (USB-BACK) P-3 (USB-LEFT) P-4 P-5 Application Scan
2 2 Boot Application Scan P-5 P-4 P-3 (USB-RIGHT) P-2 (USB-BACK) P-1 (USB-LEFT)
3 3 Boot P-3 (USB-RIGHT) P-2 (USB-LEFT) P-1 (USB-BACK) Application Scan P-5 P-4
4 4 Boot P-4 P-5 Application Scan P-1 (USB-RIGHT) P-2 (USB-LEFT) P-3 (USB-BACK)
5 5 Boot P-5 P-2 (USB-RIGHT) Application P-1 (USB-BACK) P-4 P-3 (USB-LEFT) Scan
6A 6 Boot Application P-3 (USB-BACK) P-4 P-1 (USB-LEFT) Scan P-2 (USB-RIGHT) P-5
6B 7 Boot Application Scan P-3 (USB-LEFT) P-4 P-1 (USB-RIGHT) P-2 (USB-BACK) P-5
6C 8 Boot P-3 (USB-RIGHT) P-4 P-1 (USB-LEFT) Application Scan P-2 (USB-BACK) P-5
6D 9 Boot P-3 (USB-RIGHT) Application P-4 Scan P-1 (USB-BACK) P-2 (USB-LEFT) P-5
7 10 Boot P-1 (USB-RIGHT) Application P-5 Scan P-3 (USB-BACK) P-2 (USB-LEFT) P-4
8A 11 Boot P-4 P-2 (USB-RIGHT) P-3 (USB-LEFT) Application Scan P-5 P-1 (USB-BACK)
8B 12 Boot P-4 P-2 (USB-RIGHT) P-3 (USB-BACK) P-5 Application Scan P-1 (USB-LEFT)
9 13 Boot Application P-3 (USB-LEFT) Scan P-1 (USB-RIGHT) P-4 P-5 P-2 (USB-BACK)
10A 14 Boot P-2 (USB-BACK) P-5 P-4 P-1 (USB-LEFT) P-3 (USB-RIGHT) Application Scan
10B 15 Boot P-2 (USB-LEFT) P-5 P-4 P-1 (USB-BACK) Application Scan P-3 (USB-RIGHT)
11 16 Boot P-3 (USB-BACK) P-1 (USB-RIGHT) P-4 P-5 Application P-2 (USB-LEFT) Scan
12A 17 Boot Application Scan P-2 (USB-RIGHT) P-5 P-4 P-1 (USB-BACK) P-3 (USB-LEFT)
12B 18 Boot P-2 (USB-RIGHT) Application Scan P-5 P-4 P-1 (USB-LEFT) P-3 (USB-BACK)
NA 19 P-5 P-4 P-3 (USB-LEFT) P-2 (USB-RIGHT) P-1 (USB-BACK) Boot Application Scan
Test coverage is one of the most important topics in software assurance. Users would
like some quantitative measure to judge the risk in using a product. For a given test set,
what can we say about the combinatorial coverage it provides? With physical products,
such as light bulbs or motors, reliability engineers can provide a probability of failure
within a particular time frame. This is possible because the failures in physical products
are typically the result of natural processes, such as metal fatigue.
[Sidebar: Commonly used software coverage measures do not apply well to combinatorial
testing.]
With software the situation is more complex, and many different approaches have been
devised for determining test coverage. With millions of lines of code, or even with only a
few thousand, the number of paths through a program is so large that it is impossible to test
all paths. For each if statement, there are two possible branches, so a sequence of n if
statements will result in 2^n possible paths. Thus even a small program with only 270 if
statements in an execution trace may have more possible paths than there are atoms in the
universe, which is on the order of 10^80. With loops (while statements) the number of
possible paths is literally infinite. Thus a variety of measures have been developed to gauge
the degree of test coverage. The following are some of the better-known coverage metrics:
• Condition coverage: The percentage of conditions within decision expressions
that have been evaluated to both true and false. Note that 100% condition coverage does
not guarantee 100% decision coverage. For example, if “if (A || B) {do something}
else {do something else}” is tested with [0 1], [1 0], then A and B will both have
been evaluated to 0 and 1, but the else branch will not be taken because neither test leaves
both A and B false.
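The gap between condition coverage and decision coverage in this example can be shown with a small sketch. The function name is hypothetical, and the values of A and B are recorded for every test, ignoring short-circuit evaluation, as the example in the text does.

```python
def coverage_of(tests):
    a_values, b_values, branches = set(), set(), set()
    for a, b in tests:
        # Record each condition's value (short-circuit evaluation is ignored).
        a_values.add(a)
        b_values.add(b)
        # Record which branch of "if (A || B) ... else ..." is taken.
        branches.add("then" if (a or b) else "else")
    return a_values, b_values, branches


a_values, b_values, branches = coverage_of([(0, 1), (1, 0)])
condition_coverage_complete = a_values == {0, 1} and b_values == {0, 1}
decision_coverage_complete = branches == {"then", "else"}
print(condition_coverage_complete, decision_coverage_complete)
```

Adding the test [0 0] would exercise the else branch and complete decision coverage.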
Note that the coverage measures above depend on access to program source code.
Combinatorial testing, in contrast, is a black box technique. Inputs are specified and
expected results determined from some form of specification. The program is then treated
as simply a processor that accepts inputs and produces outputs, with no knowledge
expected of its inner workings.
Even in the absence of knowledge about a program’s inner structure, we can apply
combinatorial methods to produce precise and useful measures. In this case, we measure
the state space of inputs. Suppose we have a program that accepts two inputs, x and y, with
10 values each. Then the input state space consists of the 10^2 = 100 pairs of x and y values,
which can be pictured as a checkerboard square of 10 rows by 10 columns. With three
inputs, x, y, and z, we would have a cube with 10^3 = 1,000 points in its input state space.
Extending the example to n inputs we would have a (hard to visualize) hypercube of n
dimensions with 10^n points. Exhaustive testing would require inputs of all 10^n
combinations, but combinatorial testing could be used to reduce the size of the test set.
How should state space coverage be measured? Looking closely at the nature of
combinatorial testing leads to several measures that are useful. We begin by introducing
what will be called a variable-value configuration.
Example. Given four binary variables, a, b, c, and d, a=0, c=1, d=0 is a variable-value
configuration, and a=1, c=1, d=0 is a different variable-value configuration for the same
three variables a, c, and d.
is 100%, by definition, but many test sets not based on covering arrays may still provide
significant t-way coverage. If the test set is large, but not designed as a covering array, it is
very possible that it provides 2-way coverage or better. For example, random input
generation may have been used to produce the tests, and good branch or condition coverage
may have been achieved. In addition to the structural coverage figure, for software
assurance it would be helpful to know what percentage of 2-way, 3-way, etc. coverage has
been obtained.
Definition: For a given test set for n variables, simple t-way combination coverage is the
proportion of t-way combinations of n variables for which all variable-values
configurations are fully covered.
Example. Figure 15 shows an example with four binary variables, a, b, c, and d, where
each row represents a test. Of the six 2-way combinations, ab, ac, ad, bc, bd, cd, only bd
and cd have all four binary values covered, so simple 2-way coverage for the four tests in
Figure 15 is 1/3 = 33.3%. There are four 3-way combinations, abc, abd, acd, bcd, each
with eight possible configurations: 000, 001, 010, 011, 100, 101, 110, 111. Of the four
combinations, none has all eight configurations covered, so simple 3-way coverage for this
test set is 0%.
a b c d
0 0 0 0
0 1 1 0
1 0 0 1
0 1 1 1
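Simple t-way combination coverage for the four tests above can be computed with a short sketch. The function name `simple_coverage` is an assumption, and all variables are taken to be binary.

```python
from itertools import combinations


def simple_coverage(tests, t, n_values=2):
    """Proportion of t-way variable combinations with every configuration covered."""
    n = len(tests[0])
    combos = list(combinations(range(n), t))
    fully_covered = 0
    for columns in combos:
        seen = {tuple(row[c] for c in columns) for row in tests}
        if len(seen) == n_values ** t:
            fully_covered += 1
    return fully_covered / len(combos)


# The four tests on variables a, b, c, d.
four_tests = [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 1)]
print(simple_coverage(four_tests, 2))
print(simple_coverage(four_tests, 3))
```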
Definition. For a given test set for n variables, (t+k)-way combination coverage is the
proportion of (t+k)-way combinations of n variables for which all variable-values
configurations are fully covered. (Note that this measure would normally be applied only
to a t-way covering array, as a measure of coverage beyond t).
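Example. If the test set in Figure 15 is extended as shown in Figure 16, we can extend 3-way
coverage. For this test set, bcd is fully covered, out of the four 3-way combinations, so
(2+1)-way = 3-way coverage is 25%. Of the six 2-way combinations, five are fully covered;
only ab is missing a configuration (ab=11 appears in no test), so simple 2-way coverage is
5/6 = 83.3%.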
a b c d
0 0 0 0
0 1 1 0
1 0 0 1
0 1 1 1
0 1 0 1
1 0 1 1
1 0 1 0
0 1 0 0
Definition. For a given set of n variables, (p, t)-completeness is the proportion of the C(n,
t) combinations that have configuration coverage of at least p [50].
Example. For Figure 15 above, there are C(4, 2) = 6 possible variable combinations and
C(4, 2) x 2^2 = 24 possible variable-value configurations. Of these, 19 variable-value
configurations are covered and the only ones missing are ab=11, ac=11, ad=10, bc=01,
bc=10. But only two, bd and cd, are covered with all 4 value pairs. So for the basic
definition of simple t-way coverage, we have only 33% (2/6) coverage, but 79% (19/24) for
the configuration coverage metric. For a better understanding of this test set, we can
compute the configuration coverage for each of the six variable combinations, as shown in
Figure 17. So for this test set, one of the combinations (bc) is covered at the 50% level,
three (ab, ac, ad) are covered at the 75% level, and two (bd, cd) are covered at the 100%
level. And, as noted above, for the whole set of tests, 79% of variable-value
configurations are covered. All 2-way combinations have at least 50% configuration
coverage, so (.50, 2)-completeness for this set of tests is 100%.
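The figures in this example (19 of 24 configurations, and the 50%, 75%, and 100% levels) correspond to the four tests 0000, 0110, 1001, 0111, and can be recomputed with the sketch below. The function names are hypothetical and binary variables are assumed.

```python
from itertools import combinations

tests = [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 1)]
names = "abcd"


def configurations_seen(tests, t):
    """Map each t-way variable combination to the set of value tuples it takes."""
    n = len(tests[0])
    return {
        "".join(names[c] for c in columns): {tuple(row[c] for c in columns) for row in tests}
        for columns in combinations(range(n), t)
    }


def completeness(seen, p, t, n_values=2):
    """(p, t)-completeness: share of combinations with configuration coverage >= p."""
    levels = [len(values) / n_values ** t for values in seen.values()]
    return sum(level >= p for level in levels) / len(levels)


seen = configurations_seen(tests, 2)
covered = sum(len(values) for values in seen.values())
print(covered, "of", len(seen) * 4)
print({combo: len(values) / 4 for combo, values in seen.items()})
print(completeness(seen, 0.5, 2), completeness(seen, 1.0, 2))
```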
Although the example in Figure 17 uses variables with the same number of values,
this is not essential for the measurement. Coverage measurement tools that we have
developed compute coverage for test sets in which parameters have differing numbers of
values, as shown in Figure 18 and Figure 19.
[Figure: level of coverage (vertical axis, 0 to 1) for 2-way, 3-way, and 4-way combinations.]
[Figure: level of coverage (vertical axis, 0 to 1) versus percent of variable-value
configurations (horizontal axis, 0 to 1) for 2-way, 3-way, and 4-way combinations.]
1. Many coverage measures have been devised for code coverage, including
statement, branch or decision, condition, and modified condition decision coverage. These
measures are based on aspects of source code and are not suitable for combinatorial
coverage measurement.
7 COMBINATORIAL AND RANDOM TESTING
Now consider the size of a random test set required to provide 100% combination
coverage. With the most efficient covering array algorithms, the difficulty of finding tests
with high coverage increases as tests are generated. Thus even if a randomly generated test
set provides better than 99% of the coverage of an equal sized covering array, it should not
be concluded that only a few more tests are needed for the random set to provide 100%
coverage. Table 11 gives the sizes of randomly generated test sets required for 100%
combinatorial coverage at various configurations, and the ratio of these sizes to covering
arrays computed with ACTS. Although there is considerable variation among
configurations, note that the ratio of random to combinatorial test set size for 100%
coverage exceeds 3 in most cases, with average ratios of 3.9, 3.8, and 3.2 at t = 2, 3, and 4
respectively. Thus, combinatorial testing retains a significant advantage over random
testing if the goal is 100% combination coverage for a given value of t.
Table 11. Size of random test set required for 100% t-way combination
coverage.
Values per   Ratio,   Ratio,   Ratio,
variable     2-way    3-way    4-way
    2         2.14     2.54     2.57
    4         3.84     4.04     3.04
    6         4.16     3.59     3.12
    8         4.70     4.33     3.44
   10         4.68     4.59     3.86
Table 12. Average ratio of random/ACTS for covering arrays
by values per variable, variables = 10, 15, 20, 25
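The column means of Table 12 can be computed directly and agree with the average ratios of 3.9, 3.8, and 3.2 quoted earlier in this chapter. The dictionary below simply transcribes the table.

```python
# Table 12: values per variable -> (2-way, 3-way, 4-way) ratio of random/ACTS size.
ratios = {
    2: (2.14, 2.54, 2.57),
    4: (3.84, 4.04, 3.04),
    6: (4.16, 3.59, 3.12),
    8: (4.70, 4.33, 3.44),
    10: (4.68, 4.59, 3.86),
}

# Average each column (one column per interaction strength t = 2, 3, 4).
averages = [round(sum(column) / len(column), 1) for column in zip(*ratios.values())]
print(averages)
```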
7.2 Comparing Random and Combinatorial Coverage
• For binary variables (v=2), random tests compare reasonably well with covering
arrays (96% to 99% coverage) for all three values (2, 3, and 4) of t for 15 or more
variables. Thus random testing for a SUT with all or mostly binary variables may compare
favorably with combinatorial testing.
However there is a less obvious but important tradeoff regarding covering array size. An
algorithm that produces a very compact array, i.e., with few tests, for t-way combinations
may include fewer (t+1)-way combinations because there are fewer tests. Table 13 and
Table 14 illustrate this phenomenon for an example. Table 9 shows the percentage of t+1
up to t+3 combination coverage provided by the ACTS tests and in Table 10 the equivalent
number of random tests. Although ACTS pairwise tests provide better 3-way coverage than
the random tests, at other interaction strengths and values of t, the random tests are roughly
the same or slightly better in combination coverage than ACTS. Recall from Section 7.1
that pairwise combinatorial tests detected slightly fewer events than the equivalent number
of random tests. One possible explanation may be that the superior 4-way and 5-way
coverage of the random tests allowed detection of more events. Almost paradoxically, an
algorithm that produces a larger, sub-optimal covering array may provide better failure
detection because the larger array is statistically more likely to include t+1, t+2, and higher
degree interaction tests as a byproduct of the test generation. Again, however, the less
optimal covering array is likely to more closely resemble the random test suite in failure
detection.
[Sidebar: A less optimal (by size) array may provide better failure detection because it
includes more interactions at t+1, t+2, etc.]
Note also that the number of failures in the SUT can affect the degree to which random
testing approaches combinatorial testing effectiveness. For example, suppose the random
test set covers 99% of combinations for 4-way interactions, and the SUT contains only one
4-way interaction failure. Then there is a 99% probability that the random tests will
contain the 4-way interaction that triggers this failure. However, if the SUT contains m
independent failures, then the probability that combinations for all m failures are included
in the random test set is 0.99^m. Hence with multiple failures, random testing may be
significantly less effective, as its probability of detecting all failures will be c^m, for c =
percent coverage (expressed as a fraction) and m = number of failures.
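A short sketch shows how quickly this probability falls as the number of independent failures grows. The function name is an assumption for illustration, and coverage is expressed as a fraction.

```python
def probability_all_detected(coverage, failures):
    """Probability that the combinations for all independent failures are in the test set."""
    return coverage ** failures


for m in (1, 10, 50):
    print(m, round(probability_all_detected(0.99, m), 3))
```

With 99% coverage, the chance of including the triggering combinations for all 50 independent failures is only about 60%.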
[Figure: ratio bands from 2.50 to 5.00 plotted by interaction strength (2-way, 3-way, 4-way)
and number of values per variable (nval).]
Figure 25. Average ratio of random/ACTS for covering arrays by values per variable
7.3 Cost and Practical Considerations
The relationship between covering arrays and randomly generated tests presents some
interesting issues. Generating covering arrays for combinatorial tests is complex; it has
been shown to be an NP-hard problem. But generating tests randomly is trivial. Thus for
large problems, we can compare the cost and time of generating a covering array versus
producing tests randomly, measuring their coverage (Chapter 6), then adding tests as
needed to provide full combinatorial coverage. Notice the last column of Table 10. For 4-
way tests, once the number of parameters exceeds roughly 20, random generation will
cover 99% or more of 4-way combinations. If a problem requires tests for 100 parameters,
for example, covering array generators may require hours or days, or may simply be unable
to handle that many parameters, but random tests could be generated quickly and easily.
This is an option that may be cost effective even for smaller problems, and should be kept
in mind for test planning.
1. Existing research has shown either no difference (for some problems) or higher
failure detection effectiveness (for most problems) for combinatorial testing. Analyzing
random test sets suggests a number of reasons for this result. In particular, a highly
optimized t-way covering array may include fewer t+1, t+2, and higher degree interaction
tests than an equivalent sized random test set. Similarly, a covering array algorithm that
produces a larger, sub-optimal array may provide better failure detection because the larger
array is statistically more likely to include t+1, t+2, and higher degree interaction tests as a
byproduct of the test generation.
2. While the analysis reported here does not indicate that combinatorial testing is
uniformly better than random, it does support a preference for combinatorial methods if the
cost of applying the two test approaches is the same. This preference may be particularly
relevant if the SUT is likely to contain multiple failures (as is usually the case). Single
failures that depend on the interaction of two or more variables have a high likelihood of
being detected by random tests, because the random test set may cover a high percentage of
all t-way combinations. But the probability of detecting multiple failures declines rapidly
as c^m, for c = percent coverage (expressed as a fraction) and m = number of independent
failures.
Many programming languages include an assert feature that allows the programmer
to specify properties that are assumed true at a particular point in the program. For
example, a function that includes a division in which a particular parameter x will be used
as a divisor may require that this parameter never be zero. This function may include
the C statement assert(x != 0); as the first statement executed. Note that the
assertion is not the same as an input validity check that issues an error message if input is
not acceptable. The assertion gives conditions that must hold for the function to operate
properly, in this case a non-zero divisor. It is the responsibility of the programmer to ensure
that a zero divisor is never passed to the function. The distinction between assertions and
input validation code is that assertions are intended to catch programming mistakes, while
input validation detects errors in user or file/database input.
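The distinction can be sketched in Python, whose assert statement plays the same role as the C assert macro. Both function names are hypothetical.

```python
def safe_divide(total, x):
    # Assertion: a condition the calling code must guarantee. A failure here
    # points to a programming mistake, not to bad user input.
    assert x != 0, "programming error: zero divisor passed to safe_divide"
    return total / x


def parse_divisor(text):
    # Input validation: user or file input is checked and rejected with an error.
    try:
        value = float(text)
    except ValueError:
        raise ValueError(f"not a number: {text!r}")
    if value == 0:
        raise ValueError("divisor must be non-zero")
    return value


print(safe_divide(10, parse_divisor("2")))
```

Validation code runs on every input in production, while assertions are typically a development and testing aid that may be disabled in deployed builds.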
is needed to ensure that the screen information is properly displayed. However, we can do
very thorough testing of the most critical aspects of the withdrawal module.
The module has these inputs from the user after the user is authorized by another module:
How should these requirements be translated into assertions and used in testing? Consider
requirement 1: if minflag is set, then the balance before and after the withdrawal must be
no less than the minimum balance amount. This could be translated directly into logic for
assertions: minflag => balance >= min. If the assertion facility does not include
logical implication, then the equivalent expression can be used, for example, in C syntax:
!minflag || balance >= min.
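The equivalence between the implication and its rewritten form can be confirmed over all truth values. This sketch uses Python rather than C, and the helper names are assumptions; `minimum` stands in for min, which is a built-in name in Python.

```python
from itertools import product


def implies(p, q):
    # p => q is false only when p is true and q is false.
    return not (p and not q)


def min_balance_ok(minflag, balance, minimum):
    # Same form as the C expression: !minflag || balance >= min
    return (not minflag) or balance >= minimum


equivalent = all(
    implies(p, q) == ((not p) or q) for p, q in product([False, True], repeat=2)
)
print(equivalent)
print(min_balance_ok(True, 50, 100), min_balance_ok(False, 50, 100))
```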
However, we must also consider overdraft protection and withdrawal limits, so the
assertion above is not adequate. Collecting conditions, we can develop assertions for each
of the eight possible settings of minflag, odflag, and limflag. If there is a minimum
balance requirement, no overdraft protection, and a withdrawal limit below the default,
what is the relationship between balance and the other parameters?
This relation must hold after the withdrawal, so to develop an assertion that must hold
immediately before the withdrawal, substitute (balance – amt) for balance in the expression
above:
Assertions such as this would be placed immediately before the balance is modified,
not at the beginning of the code for the withdrawal function. Code prior to the subtraction
from balance should have ensured that properties encoded by assertions hold immediately
before the subtraction, thus any violation of the assertions indicates an error in the code (or
possibly in the assertions!) that must be investigated. This is illustrated in Figure 26, where
“wdl_init.c” and “wdl_final.c” are files containing assertions such as developed above.
Including the card number, there are 11 parameters for this module. We need to
partition the inputs to determine what values to use in generating a covering array.
Partitions should cover valid and invalid values, minimum and maximum for ranges, and
values at and on either side of boundaries. The bank uses a check digit scheme for card
numbers to detect errors such as digit transposition when numbers are entered manually. A
simple partition could be as follows:
Using the equivalence classes above, this is thus a 2^4 4^7 system, or 262,144 possible inputs.
If values on either side of boundaries are used, the number of possible input combinations
will be much larger, but using combinatorial methods we can cover 3-way or 4-way
combinations with only a few hundred tests.
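The input count can be checked by reading the configuration as four parameters with two values each and seven parameters with four values each, which is an interpretation consistent with the 11 parameters and the total given above.

```python
from math import prod

# Four 2-valued parameters and seven 4-valued parameters (11 in total).
values_per_parameter = [2] * 4 + [4] * 7
possible_inputs = prod(values_per_parameter)
print(len(values_per_parameter), possible_inputs)
```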
While the method described in the previous section can be very effective in testing,
notice that it will be inadequate for many problems, because basic assertion functions such
as those in the C standard library do not support important logic operators such as ∀ (for all)
and ∃ (for some). Thus simple properties such as “S is sorted in ascending order”, that is,
∀i : 0 ≤ i < n − 1 : S[i] ≤ S[i + 1], cannot be expressed without a good deal of additional coding.
While it would be possible to add code to handle these problems in assertions, a better
solution is to use an assertion language that is designed for the purpose and contains all the
necessary features.
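To make the "additional coding" concrete, the sketch below writes the two quantifiers as helper functions and uses one of them as a postcondition. Python is used purely for illustration; it is not one of the assertion languages discussed in this chapter, and the function names are hypothetical.

```python
from typing import Sequence


def is_sorted_ascending(s: Sequence[int]) -> bool:
    """Universal quantifier: for all i with 0 <= i < n-1, s[i] <= s[i+1]."""
    return all(s[i] <= s[i + 1] for i in range(len(s) - 1))


def contains(s: Sequence[int], x: int) -> bool:
    """Existential quantifier: there exists an i such that s[i] == x."""
    return any(item == x for item in s)


def sort_with_check(values: Sequence[int]) -> list:
    """Sort the input and assert the 'sorted' property as a postcondition."""
    result = sorted(values)
    assert is_sorted_ascending(result), "postcondition violated: not sorted"
    return result


print(sort_with_check([3, 1, 2]))
```

Each quantified property needs its own helper in this style, which is the overhead that dedicated assertion languages are meant to remove.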
Tools such as Anna [44] for Ada, the Java Modeling Language (JML) [42] and
iContract [28] for Java, and APP [57] or Nana [46] for C, can be used to introduce complex
assertions, effectively embedding a formal specification within the code. The embedded
assertions serve as an executable form of the specification, thus providing an oracle for the
testing phase. With embedded assertions, exercising the application with all t-way
combinations can provide reasonable assurance that the code works correctly across a very
wide range of inputs. This approach has been used successfully for testing smart cards,
with embedded JML assertions acting as an oracle for combinatorial tests [25]. Results
showed that 80% - 90% of errors could be found in this way.
8.3 Cost and Practical Considerations
More complex assertions can provide stronger assurance, but there are limits to their
effectiveness. For example, invariants (properties that are expected to hold throughout a
computation) cannot be assured without placing an assertion for every line of code. Since
assertions must be executed to show the presence or absence of a property at some point,
errors that prevent the assertion from being reached may not be detected. As an example,
consider the code in Figure 26. If a coding error in the first few lines of the function
prevents execution of the code at lines 15 and 17, the assertions will not be executed and it
may be assumed that the test was passed. In this case, an ERROR return for the particular
test case might trigger an investigation that would identify the faulty code, but this may not
happen with other applications.
Assertions are one of the easiest to use and most effective approaches to dealing with
the oracle problem. Properties ranging from simple parameter checks to effectively
embedded proofs can be encoded in assertions, but special language support is needed for
the stronger forms of assurance. This support may be provided as language preprocessors,
as in the case of Anna [44] and others. Placement within code is particularly important to
assertion effectiveness [60, 61], but if sufficiently strong assertions are embedded, the code
becomes self-checking for important properties. With self-checking code, thousands of
tests can be run at low cost in most cases, greatly improving the chances that faults will be
detected.
One of the most effective ways to produce test oracles is to use a model of the
system under test, and generate complete tests, including both input data and expected
results, directly from the model. The model in this case is exactly what the name implies:
it incorporates the most important aspects of the system, but not every detail such as the
location of an amount on a screen (if it did include all details, it would be equivalent to the
system itself). This chapter provides a step-by-step introduction to model-based automated
generation of tests that provide combinatorial coverage. Procedures introduced in this
tutorial will produce a set of complete tests, i.e., input values with the expected output for
each set of inputs.
In addition to the ACTS covering array generator (see Appendix C), we use
NuSMV [18], a variant of the original SMV model checker. NuSMV is freely available
and was developed by Carnegie Mellon University, Istituto per la Ricerca Scientifica e
Tecnologica (IRST), U. of Genova, and U. of Trento. NuSMV can be installed on either
UNIX/Linux or Windows systems running Cygwin. Links and instructions for
downloading NuSMV are included in the appendix.
9.1 Overview
1. Using ACTS, construct a set of tests that will cover all t-way combinations of
parameter values. The covering array specifies test data, where each row of the array can
be regarded as a set of parameter values for an individual test (see Chapter 4).
2. Determine what output should be produced by the SUT for each set of input parameter
values. The test data output from ACTS will be incorporated into SMV specifications that
can be processed by the NuSMV model checker for this step. In many cases, the
conversion to SMV will be straightforward. The example in Section 9.2.1 illustrates a
simple conversion of rules in the form “if condition then action” into the syntax used by the
model checker. The model checker will instantiate the specification with parameter values
from the covering array once for each test in the covering array. The resulting specification
is evaluated against a claim that negates each specified result Rj using a model checker, so
that the model checker evaluates claims in the following form: Ci => ~Rj, where Ci is a set
of parameter values in one row of the covering array in the form p1 = vi1 & p2 = vi2 & ... &
pn = vin, and Rj is one of the possible results. The output of this step is a set of
counterexamples that show how the SUT can reach the claimed result Rj from a given set
of inputs.
The example in the following sections illustrates how these counterexamples are converted
into tests. Other approaches to determining the correct output for each test can also be
used. For example, in some cases we can run a model checker in simulation mode,
producing expected results directly rather than through a counterexample.
The completed tests can be used to validate correct operation of the system for
interaction strengths up to some pre-determined level t. Depending on the system type and
level of effort, we may want to use pairwise (t=2) or higher strength tests, up to 6-way (t=6)
interactions. We do not claim this guarantees correctness of the system, as there may be
failures triggered only by interaction strengths greater than t. In addition, some of the
parameters are likely to have a large number of possible values, requiring that they be
abstracted into equivalence classes. If the abstraction does not faithfully represent the
range of values for a parameter, some flaws may not be detected by the equivalence classes
used.
Here we present a small example of a very simple access control system. The rules
of the system are those of a simplified multi-level security system, given below, followed
by a step-by-step construction of tests using a fully automated process.
Each subject (user) has a clearance level u_l, and each file has a classification level,
f_l. Levels are given as 0, 1, or 2, which could represent levels such as Confidential,
Secret, and Top Secret. A user u can read a file f if u_l ≥ f_l (the “no read up” rule), or
write to a file if f_l ≥ u_l (the “no write down” rule).
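As a point of reference, the two rules can be written as a small executable function. This is a sketch only: the function name and the string encoding of actions and results are assumptions made for illustration, chosen to mirror the parameter names used in this chapter.

```python
def access(u_l: int, f_l: int, act: str) -> str:
    """Return GRANT or DENY under the simplified multi-level security rules."""
    # "No read up": a read is allowed only if the user level is at least the file level.
    if act == "rd" and u_l >= f_l:
        return "GRANT"
    # "No write down": a write is allowed only if the file level is at least the user level.
    if act == "wr" and f_l >= u_l:
        return "GRANT"
    # Anything else is denied.
    return "DENY"


print(access(2, 0, "rd"), access(0, 2, "rd"), access(0, 2, "wr"), access(2, 0, "wr"))
```

A function like this could serve as a simple reference model, although the remainder of the chapter derives expected results from the SMV model instead.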
Tests produced will check that these rules are correctly implemented in a system.
This system is easily modeled in SMV as a simple two-state finite state machine. The
START state merely initializes the system (line 8, Figure 27), with the rule above used to
evaluate access as either GRANT or DENY (lines 9-13). For example, line 10 represents
the first line of the pseudo-code above: in the current state (always START for this simple
model), if u_l ≥ f_l and the action is a read, then the next state is GRANT. Each line of the case statement is
examined sequentially, as in a conventional programming language. Line 12 implements
the “else DENY” rule, since the predicate “1” is always true. SPEC clauses given at the
end of the model are simple “reflections” that duplicate the access control rules as temporal
logic statements. They are thus trivially provable, but we are interested in using them to
generate tests rather than to prove properties of the system.
1. MODULE main
2. VAR
--Input parameters
3. u_l: 0..2; -- user level
4. f_l: 0..2; -- file level
5. act: {rd,wr}; -- action
--output parameter
6. access: {START_, GRANT,DENY};
7. ASSIGN
8. init(access) := START_;
--if access is allowed under rules, then next state is GRANT
--else next state is DENY
9. next(access) := case
10. u_l >= f_l & act = rd : GRANT;
11. f_l >= u_l & act = wr : GRANT;
12. 1 : DENY;
13. esac;
14. next(u_l) := u_l;
15. next(f_l) := f_l;
16. next(act) := act;
Separate documentation on SMV should be consulted to fully understand the syntax used,
but specifications of the form “AG ((predicate 1) -> AX (predicate 2))” indicate
essentially that for all paths (the “A” in “AG”) for all states globally (the “G”), if predicate
1 holds then ( “->”) for all paths, in the next state (the “X” in “AX”) predicate 2 will hold.
In the next section we will see how this specification can be used to produce complete
tests, with test data input and the expected output for each set of input data.
true for the given model, the model checker reports this fact. What makes a model checker
particularly valuable for many applications, though, is that if the statement is false, the
model checker not only reports this, but also provides a “counterexample” showing how
the claim in the SPEC statement can be shown false. The counterexample will include
input data values and a trace of system states that lead to a result contrary to the SPEC
claim (Figure 28). In the process described in this section, the input data values will be the
covering array generated by ACTS.
Checking the properties in the SPEC statements shows that they match the access
control rules as implemented in the FSM, as expected. In other words, the claims we made
about the state machine in the SPEC clauses can be proven. This step is used to check that
the SPEC claims are valid for the model defined previously. If NuSMV is unable to prove
one of the SPECs, then either the spec or the model is incorrect. This problem must be
resolved before continuing with the test generation process. Once the model is correct and
SPEC claims have been shown valid for the model, counterexamples can be produced that
will be turned into test cases, by which we mean a set of test inputs with the expected result
for these inputs. In other words, ACTS is used to generate tests, then the model checker
determines expected results for each test.
-- specification AG((u_l >= f_l & act = rd) -> AX access = GRANT)
is true
-- specification AG((f_l >= u_l & act = wr) -> AX access = GRANT)
is true
-- specification AG(!((u_l >= f_l & act = rd)|(f_l >= u_l & act = wr))
-> AX access = DENY) is true
Figure 28. NuSMV output
We will compute covering arrays that give all t-way combinations, with degree of
interaction coverage = 2 for this example. This section describes the use of ACTS as a
standalone command line tool, using a text file input (see Section 3.1). The first step is to
define the parameters and their values in a system definition file that will be used as input
to ACTS. Call this file “in.txt”, with the following format:
[System]
[Parameter]
u_l: 0,1,2
f_l: 0,1,2
act: rd,wr
[Relation]
[Constraint]
[Misc]
For this application, the [Parameter] section of the file is all that is needed. Other tags refer
to advanced functions that will be explained in other documents. After the system
definition file is saved, run ACTS as shown below:
java -Ddoi=2 -jar acts_cmd.jar ActsConsoleManager in.txt out.txt
The “-Ddoi=2” argument sets the degree of interaction for the covering array that we want
ACTS to compute. In this case we are using simple 2-way, or pairwise, interactions. (For
a system with more parameters we would use a higher strength interaction, but with only
three parameters, 3-way interaction would be equivalent to exhaustive testing.) ACTS
produces the output shown in Figure 29.
Each test configuration defines a set of values for the input parameters u_l, f_l, and
act. The complete test set ensures that all 2-way combinations of parameter values have
been covered. If we had a larger number of parameters, we could produce test
configurations that cover all 3-way, 4-way, etc. combinations. ACTS may output “don’t
care” for some parameter values. This means that any legitimate value for that parameter
can be used and the full set of configurations will still cover all t-way combinations. Since
“don’t care” is not normally an acceptable input for programs being tested, a random value
for that parameter is substituted before using the covering array to produce tests.
Number of parameters: 3
Maximum number of values per parameter: 3
Number of configurations: 9
-------------------------------------
Configuration #1:
1 = u_l=0
2 = f_l=0
3 = act=rd
-------------------------------------
Configuration #2:
1 = u_l=0
2 = f_l=1
3 = act=wr
-------------------------------------
Configuration #3:
1 = u_l=0
2 = f_l=2
3 = act=rd
-------------------------------------
Configuration #4:
1 = u_l=1
2 = f_l=0
3 = act=wr
-------------------------------------
Configuration #5:
1 = u_l=1
2 = f_l=1
3 = act=rd
-------------------------------------
Configuration #6:
1 = u_l=1
2 = f_l=2
3 = act=wr
-------------------------------------
Configuration #7:
1 = u_l=2
2 = f_l=0
3 = act=rd
-------------------------------------
Configuration #8:
1 = u_l=2
2 = f_l=1
3 = act=wr
-------------------------------------
Configuration #9:
1 = u_l=2
2 = f_l=2
3 = (don't care)
Figure 29. ACTS output
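The coverage claim for Figure 29 can be checked mechanically. The sketch below transcribes the nine configurations, replaces the "don't care" entry with a randomly chosen legal value, and reports any 2-way combination that is not covered. The helper names are hypothetical and are not part of ACTS.

```python
import random
from itertools import combinations, product

PARAMETERS = {"u_l": [0, 1, 2], "f_l": [0, 1, 2], "act": ["rd", "wr"]}

# Rows transcribed from Figure 29; None marks the "don't care" entry.
ROWS = [
    (0, 0, "rd"), (0, 1, "wr"), (0, 2, "rd"),
    (1, 0, "wr"), (1, 1, "rd"), (1, 2, "wr"),
    (2, 0, "rd"), (2, 1, "wr"), (2, 2, None),
]


def fill_dont_care(rows, rng):
    """Replace each None with a random legal value for that parameter."""
    names = list(PARAMETERS)
    return [
        tuple(v if v is not None else rng.choice(PARAMETERS[n])
              for n, v in zip(names, row))
        for row in rows
    ]


def uncovered_pairs(rows):
    """Return every 2-way value combination that no row covers."""
    names = list(PARAMETERS)
    missing = set()
    for i, j in combinations(range(len(names)), 2):
        needed = set(product(PARAMETERS[names[i]], PARAMETERS[names[j]]))
        seen = {(row[i], row[j]) for row in rows}
        missing |= {(names[i], a, names[j], b) for a, b in needed - seen}
    return missing


tests = fill_dont_care(ROWS, random.Random(0))
print(len(tests), "tests; uncovered pairs:", uncovered_pairs(tests))
```

Because the first eight rows already cover every pair involving act, the result is the same whichever value is substituted for the "don't care" entry.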
The next step is to assign values from the covering array to parameters used in the
model. For each test, we claim that the expected result will not occur. The model checker
determines combinations that would disprove these claims, outputting these as
counterexamples. Each counterexample can then be converted to a test with known
expected result. Every test from the ACTS tool is used, with the model checker supplying
expected results for each test. (Note that the trivially provable positive claims have been
commented out. Here we are concerned with producing counterexamples.)
Recall the structure introduced in Section 9.1: Ci => ~Rj. Here Ci is the set of
parameter values from the covering array. For example, for configuration #1 of the
covering array, Ci is u_l = 0 & f_l = 0 & act = rd.
As can be seen below, for each of the 9 configurations in the covering array
we create a SPEC claim of the form SPEC AG((Ci) -> AX !(access = Rj));
This process is repeated for each possible result, in this case either “GRANT” or
“DENY”, so we have 9 claims for each of the two results. The model checker is able to
determine, using the model defined in Section 9.2.1, which result is the correct one for each
set of input values, producing a total of 9 tests.
Excerpt:
...
-- reflection of the assign for access
--SPEC AG ((u_l >= f_l & act = rd ) -> AX (access = GRANT));
--SPEC AG ((f_l >= u_l & act = wr ) -> AX (access = GRANT));
--SPEC AG (!((u_l >= f_l & act = rd ) | (f_l >= u_l & act = wr ))
-> AX (access = DENY));
SPEC AG((u_l = 0 & f_l = 0 & act = rd) -> AX !(access = GRANT));
SPEC AG((u_l = 0 & f_l = 1 & act = wr) -> AX !(access = GRANT));
SPEC AG((u_l = 0 & f_l = 2 & act = rd) -> AX !(access = GRANT));
SPEC AG((u_l = 1 & f_l = 0 & act = wr) -> AX !(access = GRANT));
SPEC AG((u_l = 1 & f_l = 1 & act = rd) -> AX !(access = GRANT));
SPEC AG((u_l = 1 & f_l = 2 & act = wr) -> AX !(access = GRANT));
SPEC AG((u_l = 2 & f_l = 0 & act = rd) -> AX !(access = GRANT));
SPEC AG((u_l = 2 & f_l = 1 & act = wr) -> AX !(access = GRANT));
SPEC AG((u_l = 2 & f_l = 2 & act = rd) -> AX !(access = GRANT));
SPEC AG((u_l = 0 & f_l = 0 & act = rd) -> AX !(access = DENY));
SPEC AG((u_l = 0 & f_l = 1 & act = wr) -> AX !(access = DENY));
SPEC AG((u_l = 0 & f_l = 2 & act = rd) -> AX !(access = DENY));
SPEC AG((u_l = 1 & f_l = 0 & act = wr) -> AX !(access = DENY));
SPEC AG((u_l = 1 & f_l = 1 & act = rd) -> AX !(access = DENY));
SPEC AG((u_l = 1 & f_l = 2 & act = wr) -> AX !(access = DENY));
SPEC AG((u_l = 2 & f_l = 0 & act = rd) -> AX !(access = DENY));
SPEC AG((u_l = 2 & f_l = 1 & act = wr) -> AX !(access = DENY));
SPEC AG((u_l = 2 & f_l = 2 & act = rd) -> AX !(access = DENY));
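Claims such as those in the excerpt are tedious to write by hand for larger covering arrays. The following Python sketch generates them from the rows of the covering array; it assumes the "don't care" entry has been replaced by rd, as in the excerpt, and the function name is hypothetical.

```python
# Covering array rows (u_l, f_l, act), with "don't care" replaced by rd.
ROWS = [
    (0, 0, "rd"), (0, 1, "wr"), (0, 2, "rd"),
    (1, 0, "wr"), (1, 1, "rd"), (1, 2, "wr"),
    (2, 0, "rd"), (2, 1, "wr"), (2, 2, "rd"),
]
RESULTS = ["GRANT", "DENY"]


def spec_claim(u_l, f_l, act, result):
    """Build one negated claim: these inputs never lead to this result."""
    return (f"SPEC AG((u_l = {u_l} & f_l = {f_l} & act = {act}) "
            f"-> AX !(access = {result}));")


# One claim per covering array row for each possible result.
claims = [spec_claim(*row, result) for result in RESULTS for row in ROWS]
print("\n".join(claims))
```

The generated lines can be appended to the SMV model file in place of the commented-out reflections.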
9.2.3 Generating Tests from Counterexamples
NuSMV produces counterexamples where the input values would disprove the
claims specified in the previous section. Each of these counterexamples is thus a set of test
data that would have the expected result of GRANT or DENY.
For each SPEC claim, if this set of values cannot in fact lead to the particular result
Rj, the model checker indicates that this is true. For example, for the configuration below,
the claim that access will not be granted is true, because the user’s clearance level (u_l =
0) is below the file’s level (f_l = 2):
-- specification AG (((u_l = 0 & f_l = 2) & act = rd) -> AX
!(access = GRANT)) is true
If the claim is false, the model checker indicates this and provides a trace of
parameter input values and states that will prove it is false. In effect this is a complete test
case, i.e., a set of parameter values and expected result. It is then simple to map these
values into complete test cases in the syntax needed for the system under test.
The model checker finds that 6 of the input parameter configurations produce a result of
GRANT and 3 produce a DENY result, so at the completion of this step we have
successfully matched up each input parameter configuration with the result that should be
produced by the SUT.
We now strip out the parameter names and values, giving tests that can be applied
to the system under test. This can be accomplished using a variety of methods; a simple
script used in this example is given in the appendix. The test inputs and expected results
produced are shown below:
u_l = 2 & f_l = 2 & act = rd -> access = GRANT
u_l = 0 & f_l = 2 & act = rd -> access = DENY
u_l = 1 & f_l = 0 & act = wr -> access = DENY
u_l = 2 & f_l = 1 & act = wr -> access = DENY
These test definitions can now be post-processed using simple scripts written in Perl,
Python, or a similar tool to produce a test harness that will execute the SUT with each input
and check the results. While tests for this trivial example could easily have been
constructed manually, the procedures introduced in this tutorial can be, and have been, used to
produce tens of thousands of complete test cases in a few minutes, once the SMV model
has been defined for the SUT.
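A minimal sketch of such a post-processing script is shown below. It parses test definitions in the format listed above and compares the expected result with the output of the system under test. The function stand_in_sut is a hypothetical placeholder; a real harness would invoke the actual SUT at that point.

```python
import re

TEST_LINE = re.compile(
    r"u_l = (\d+) & f_l = (\d+) & act = (\w+) -> access = (\w+)"
)


def parse_tests(text):
    """Turn each matching line into (u_l, f_l, act, expected_result)."""
    tests = []
    for line in text.splitlines():
        match = TEST_LINE.fullmatch(line.strip())
        if match:
            u_l, f_l, act, expected = match.groups()
            tests.append((int(u_l), int(f_l), act, expected))
    return tests


def stand_in_sut(u_l, f_l, act):
    """Hypothetical SUT; a real harness would call the system being tested."""
    if (act == "rd" and u_l >= f_l) or (act == "wr" and f_l >= u_l):
        return "GRANT"
    return "DENY"


def run_tests(tests, sut):
    """Return the tests whose actual result differs from the expected one."""
    failures = []
    for u_l, f_l, act, expected in tests:
        actual = sut(u_l, f_l, act)
        if actual != expected:
            failures.append(((u_l, f_l, act), expected, actual))
    return failures


TEST_DEFINITIONS = """\
u_l = 2 & f_l = 2 & act = rd -> access = GRANT
u_l = 0 & f_l = 2 & act = rd -> access = DENY
u_l = 1 & f_l = 0 & act = wr -> access = DENY
u_l = 2 & f_l = 1 & act = wr -> access = DENY
"""

tests = parse_tests(TEST_DEFINITIONS)
failures = run_tests(tests, stand_in_sut)
print(f"{len(tests)} tests run, {len(failures)} failed")
```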
Model-based test generation trades up-front analysis and specification time against the
cost of greater human interaction for analyzing test results. The model or formal
specification may be costly to produce, but once it is available, large numbers of tests can
be generated, executed, and analyzed without human intervention. This can be an
enormous cost savings, since testing usually requires 50% or more of the software
development budget. For example, suppose a $100,000 development project expects to
spend $50,000 on testing, because of the staff time required to code and run tests, and
analyze results. If a formal model can be created for $20,000, with complete tests generated
and analyzed automatically, and another $10,000 is spent on a smaller number of
human-involved tests and analysis, then testing costs $30,000 instead of $50,000 and the
project saves $20,000, or 20% of its total budget. One tradeoff for this savings is the
requirement for staff with skills in formal methods, but in some cases this approach may be
practical and highly cost-effective.
1. The oracle problem must be solved for any test methodology, and it is particularly
important for thorough testing that produces a large number of test cases. One
approach to determining expected results for each test input is to use a model of the
system that can be simulated or analyzed to compute output for each input.
2. Model checkers can be used to solve the oracle problem because whenever a specified
property for a model does not hold, the model checker generates a counterexample.
The counterexamples can be post-processed into a complete working test harness that
executes all tests from the covering array and checks results.
3. Several approaches are possible for integrating combinatorial testing with model
checkers, but some present practical problems. The method reported in this chapter can
be used to generate full combinatorial test suites, with expected results for each test, in
a cost-effective way.
10 FAULT LOCALIZATION
At first glance, fault localization may not appear to be a difficult problem, and in many
cases it will not be, but we want to automate the process as much as possible. To
understand the size of the problem, consider a module that has 20 input parameters. A set
of 3-way covering tests passes 100%, but several tests derived from a 4-way covering array
result in failure. (Therefore, at least four parameter values are involved in triggering the
failure. It is possible that a 5-way or higher combination caused the failure, since any set
of t-way tests also includes (t+1)-way and higher strength combinations.) A test
with 20 input parameters has C(20, 4) = 4,845 4-way combinations, yet presumably only
one (or just a few) of these triggered the failure. To determine the combination at fault, a
variety of strategies can be used.
The analysis presented here applies to a deterministic system, in which a particular set
of input values always results in the same processing and outputs. Let P = {combinations
in passing tests} and F = {combinations in failing tests} and C = {fault-triggering
combinations}. Then F \ P, the set of combinations in failing tests that are not in any passing test,
must contain the fault-triggering combinations C, because if any combination in C were in P,
then the passing test containing it would have failed. So in most cases, C ⊆ F \ P, as shown in Figure 30.
[Figure 30: diagram of the sets P and F, showing C ⊆ F \ P]
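The set difference F \ P is straightforward to compute once tests have been labeled as passing or failing. The sketch below illustrates the idea on a made-up system with four binary parameters and an assumed fault triggered by x0=0, x2=1, x3=1; the function names and the fault are hypothetical, and for simplicity it uses an exhaustive test set rather than a covering array.

```python
from itertools import combinations, product
from math import comb


def t_way_combinations(test, t):
    """Return every t-way combination in a test as a tuple of (index, value) pairs."""
    return {
        tuple((i, test[i]) for i in idx)
        for idx in combinations(range(len(test)), t)
    }


def suspect_combinations(passing, failing, t):
    """Compute F minus P: combinations in failing tests but in no passing test."""
    P = set().union(*(t_way_combinations(x, t) for x in passing))
    F = set().union(*(t_way_combinations(x, t) for x in failing))
    return F - P


def hypothetical_sut_fails(test):
    # Assumed fault, for illustration only: triggered by x0=0, x2=1, x3=1.
    return test[0] == 0 and test[2] == 1 and test[3] == 1


tests = list(product([0, 1], repeat=4))
failing = [x for x in tests if hypothetical_sut_fails(x)]
passing = [x for x in tests if not hypothetical_sut_fails(x)]

print(comb(20, 4))                                 # 4-way combinations among 20 parameters
print(suspect_combinations(passing, failing, 2))   # no 2-way suspects
print(suspect_combinations(passing, failing, 3))   # the single 3-way suspect
```

With a real covering array instead of an exhaustive test set, fewer passing tests are available to eliminate candidates, so more than one suspect combination typically remains.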
Continuing with the analysis in this manner, some properties become apparent. For the
discussion below, Pt = {combinations in t-way passing tests}, with Ft and Ct defined
analogously. Let Tt = {t-way tests} and f(x) be a function that indicates whether a test x
passes or fails for the system under test. Thus P4 = {combinations in 4-way passing tests},
T5= {5-way tests}, etc.
• Interaction level lower bound: If all t-way tests pass, then a t-way or lower strength
combination did not cause the failure. The failure must have been caused by a
(t+k)-way combination, for some k > 0. Note that the converse is not necessarily
true: if some t-way test fails, we cannot conclude that a t-way combination caused the
failure, because any t-way test set contains some k-way combinations, for k > t.
• Interaction continuity: Now consider Ct. Because t-way tests cover all
combinations of t-way or lower strength (e.g., 4-way tests also cover all 3-way
combinations), a combination that triggered the failure in Ft must also occur in
F(t+1), F(t+2), etc. Therefore we can further reduce the potential failure-triggering
combinations by computing Ft ∩ F(t+1) ∩ ... ∩ F(t+k) for whatever interaction
strengths, up to t+k, we have tests available.
• Value dependence: If tests in Ft cover all values for a t-way parameter combination
c, then the failure is independent of c; i.e., c is not a t-way failure-triggering
combination.
1 because “a && b” evaluates to 1, but p(0,1,1,1,0) will detect the error. A complete 3-
way covering test set will detect the error because it must include at least one test with
values 0,1,1,1,. and one with 1,0,1,1,. . Figure 31 shows tests for this example for t = 2, 3,
and 4. Failing tests are underlined.
A 2-way test may detect the error, since “c && d” is the condition necessary, but
this will only occur if line 3 is reached, which requires either a=0 or b=0. In the example
test set this occurs with the second test. So in this case, a full 2-way test set has detected
the error, and the heuristics above for 2-way combinations will find that tests with c=1 and
d=1 occur in both P and F. In this case, debugging may identify c=1, d=1 as a combination
that triggers the failure, but automated analysis using the heuristics will find two 3-way
combinations that occur in failing tests but not passing tests: a=0, c=1, d=1 and b=0, c=1,
d=1. As Figure 32 illustrates, in most cases we will find more than one combination
identified as possible causes of failure.
The heuristics above can be applied to combinations in the failed tests to identify possible
failure-triggering combinations, shown in Figure 32.
• The 1-way tests do not detect any failures, but the 2-way tests do, so t=2 is a lower
bound for the interaction level needed to detect a failure.
• The value dependence rule applies to combination “be” – since all four possible
values for this combination occur in failing tests, failure must be independent of
combination be. In other words, we do not consider the pair be to be a cause of
failure because it does not matter what value this pair has. Every test must have
some value for these parameters.
t=2   ab   ac   ad   ae   bc   bd   be   cd   ce   de
      01   01   01   01   11   11   11   11   11   11
      00   11   11   00   01   01   01        10   10
      10             11             00
                                    10

t=3   abc  abd  abe  acd  ace  ade  bcd  bce  bde  cde
      011  011  011  011  011  011  111  111  111  111
      001  001  001  111  010  010  011  011  011  110
      101  101  000       111  111       010  010
                101                      110  110
                010

t=4   abcd  abce  abde  bcde
      0111  0111  0111  1111
      0011  0011  0011  0111
      1011  0010  0010  0110
            1011  1011  1110
            0110  0110
Figure 32. Combinations in failing tests.
• The elimination rule can be applied to determine that there are no 1-way or 2-way
combinations that appear in failing tests but not in passing tests. Results for 3-way
and 4-way combinations are shown in Figure 33. These results were produced by
an analysis tool which outputs in the format <test number>:<t level> <parameter
numbers> = <parameter values>. Two different 3-way combinations are identified:
a=0, c=1, d=1 and b=0, c=1, d=1. A large number of 4-way combinations are also
identified, but we can use the interaction continuity rule to show that one of the two
3-way combinations occurs in all of the failing 4-way tests. Therefore we
can conclude that covering all 3-way parameter interactions would detect the error.
The situation is more complex with continuous variables. If, for example, a failure-
related branch is taken any time x > 100, y = 3, z < 1000, there may be many combinations
implicated in the failure. Analysis will show that [x = 200, y = 3, z = 120], [x = 201, y = 3,
z = 119], [x = 999, y = 3, z = 999], [x = 101, y = 3, z = 0], [x = 200, y = 3, z = 0] are all
combinations that trigger the failure. With more than three input parameters, there may be
dozens or hundreds of failure-triggering combinations, even though there is most likely a
single point in the code that is in error.
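The effect can be seen by writing the failure condition as a single predicate and checking the combinations listed above against it. This is a sketch in Python; the predicate simply restates the example condition and the function name is hypothetical.

```python
def triggers_failure(x: int, y: int, z: int) -> bool:
    """The single faulty branch condition from the example above."""
    return x > 100 and y == 3 and z < 1000


# The combinations listed in the text; every one satisfies the same condition.
observed = [
    (200, 3, 120),
    (201, 3, 119),
    (999, 3, 999),
    (101, 3, 0),
    (200, 3, 0),
]

for combo in observed:
    print(combo, triggers_failure(*combo))
```

All five distinct value combinations map to one branch in the code, which is why a combination-level analysis can report many suspects for what is a single error.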
When source code is available, the best way to identify the cause of a failure is with
conventional debugging techniques, since the error must be fixed in code anyway. With
pure black-box testing and no access to source code, the heuristics discussed in this chapter
may help to narrow down possible causes. Usually there will be many combinations
identified as possible causes, so substantial additional testing may be needed to determine
the exact cause.
This appendix reviews a few basic facts of combinatorics, regular expressions, and
mathematical logic that are necessary to understand the concepts in this publication.
Combinatorics
Permutations and Combinations
For n variables, there are n! permutations and

$$\binom{n}{t} = \frac{n!}{t!\,(n-t)!}$$

(“n choose t”) combinations of t variables, also written for convenience as C(n, t). To
exercise all of the t-way combinations of inputs to a program, we need to cover all t-way
combinations of variable values, and each combination of t variables can have $v^t$
configurations, where v is the number of values per variable. Thus the total number of
combinations instantiated with values that must be covered is

$$v^t \binom{n}{t} \qquad (1)$$
Fortunately, each test covers C(n, t) combination configurations. This fact is the source of
combinatorial testing’s power. For example, with 34 binary variables, we would need
$2^{34} \approx 1.7 \times 10^{10}$ tests to cover all possible configurations, but with only 33 tests we can cover all
3-way combinations of these 34 variables. This happens because each test covers C(34, 3)
combinations.
Example. If we have five binary variables, a, b, c, d, and e, then expression (1) says we
will need to cover $2^3 \times C(5, 3) = 8 \times 10 = 80$ configurations. For 3-way combinatorial
testing, we will need to take all 3-variable combinations, of which there are 10:
abc, abd, abe, acd, ace, ade, bcd, bce, bde, cde
Each of these will need to be instantiated with all 8 possible configurations of three binary
variables:
abc abd abe acd ace ade bcd bce bde cde
010 000 011 001 001 001 100 101 101 001
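The counts in this example can be reproduced with Python's standard library. The sketch below evaluates expression (1), enumerates the ten 3-variable combinations and the eight settings each must take, and recomputes the figures quoted for 34 binary variables. The function name is illustrative.

```python
from itertools import combinations, product
from math import comb


def configurations_to_cover(n: int, t: int, v: int) -> int:
    """Expression (1): v^t settings for each of the C(n, t) variable combinations."""
    return v ** t * comb(n, t)


variables = "abcde"
triples = ["".join(c) for c in combinations(variables, 3)]
settings = ["".join(str(b) for b in bits) for bits in product((0, 1), repeat=3)]

print(triples)                                   # the ten 3-variable combinations
print(settings)                                  # the eight settings for each one
print(configurations_to_cover(n=5, t=3, v=2))    # 80
print(2 ** 34, comb(34, 3))                      # exhaustive tests vs. combinations per test
```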
Orthogonal Arrays
Many software testing problems can be solved with an orthogonal array, a structure
that has been used for combinatorial testing in fields other than software for decades. An
orthogonal array, $OA_\lambda(N; t, k, v)$, is an N x k array. In every N x t subarray, each t-tuple
occurs exactly λ times. We refer to t as the strength of the coverage of interactions, k as the
number of parameters or components (degree), and v as the number of possible values for
each parameter or component (order).
Covering Arrays
The challenge in computing covering arrays is to find the smallest possible array that
covers all configurations of t variables. If every new test generated covered only previously
uncovered combinations, then the number of tests needed would be

$$\frac{v^t \binom{n}{t}}{\binom{n}{t}} = v^t$$

Since this is not generally possible, the covering array will be significantly larger
than $v^t$, but still a reasonable number for testing. It can be shown that the number of tests in
a t-way covering array will be proportional to

$$v^t \log n \qquad (2)$$
        t = 2      3      4      5      6
v = 2       4      8     16     32     64
v = 4      16     64    256   1024   4096
v = 6      36    216   1296   7776  46656
Table 1. Growth of $v^t$
Despite the possibly discouraging numbers in the table above, there is some good news.
Note that formula (2) grows only logarithmically with the number of variables, n. This is
fortunate for software testing. Early applications of combinatorial methods typically
involved small numbers of variables, such as a few different types of crops or
fertilizers, but for software testing we must deal with tens, or in some cases hundreds, of
variables.
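The contrast between the two growth rates can be illustrated numerically. This is a rough sketch only: formula (2) gives a proportionality, so the constant factor is unknown and omitted here, and the function name is an assumption made for illustration.

```python
import math

def growth_term(v, t, n):
    """Return v**t * log(n), the growth term of formula (2) without its constant."""
    return v ** t * math.log(n)

base = growth_term(v=2, t=3, n=10)
for n in (10, 100, 1000):
    # Ratio to the n = 10 case: grows with log(n), not with n.
    print(n, round(growth_term(v=2, t=3, n=n) / base, 2))

# Raising the interaction strength t by one multiplies the term by v.
print(growth_term(v=2, t=4, n=10) / base)
```

Under these assumptions, going from 10 to 1000 variables only triples the term, while each additional level of interaction strength multiplies it by v.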
Regular Expressions
Regular expressions are formal descriptions of strings of symbols, which may represent
text, events, characters, or other objects. They are developed within automata theory and
formal languages, where it is shown that there are direct mappings between expressions
and automata to process them, and are encountered in many areas within computer science.
In combinatorial testing they may be encountered in sequence covering or in processing
test input or output. Implementations vary, but standard syntax is explained below.
Expression Operators
Combining Operators
The operators above can be combined with symbols to create arbitrarily complex
expressions. Examples include:
Many regular expression utilities such as egrep support a broader range of operators and
features. Readers should consult documentation for grep, egrep, or other regular
expression processors for detailed coverage of the options available on particular tools.
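As one illustration of how regular expressions can arise in sequence covering, the sketch below uses Python's standard re module to check whether a test sequence contains a given set of events in a given order. It assumes each event is encoded as a single character; the function name and the event strings are hypothetical.

```python
import re

def covers_order(sequence, events):
    """Return True if the events occur in the given order, not necessarily adjacent."""
    # Build a pattern such as a.*b.*c from the ordered event symbols.
    pattern = ".*".join(re.escape(e) for e in events)
    return re.search(pattern, sequence) is not None

print(covers_order("dabec", "abc"))   # a, then b, then c all occur in order
print(covers_order("cba", "abc"))     # no b occurs after a
```

Escaping each symbol with re.escape keeps the check correct even if an event is encoded with a character that is itself a regular expression operator.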
One of the most important questions in software testing is "How much is enough?"
For combinatorial testing, this question includes determining the appropriate level of
interaction that should be tested. That is, if some failure is triggered only by an unusual
combination of more than two values, how many testing combinations are enough to detect
all errors? What degree of interaction occurs in real system failures? This section
summarizes what is known about these questions based on research by NIST and others [4,
7, 34, 35, 36, 65].
Vars    NVD cumulative %
1       93%
2       99%
3       100%
4       100%
5       100%
6       100%
Table 3. Cumulative percentage of denial-of-service vulnerabilities triggered by t-way interactions.
Why do the failure detection curves look this way? That is, why does the error rate
tail off so rapidly with more variables interacting? One possibility is that there are simply
few complex interactions in branching points in software. If few branches involve 4-way,
5-way, or 6-way interactions among variables, then this degree of interaction could be rare
for failures as well. The table below (Table 4 and Fig. 2) gives the number and percentage
of branches in avionics code triggered by one to 19 variables. This distribution was
developed by analyzing data in a report on the use of MCDC testing in avionics software
[16], which contains 20,256 logic expressions in five different airborne systems in two
different airplane models. The table below includes all 7,685 expressions from if and while
statements; expressions from assignment (:=) statements were excluded.
As shown in Fig. 2, most branching statement expressions are simple, with over 70%
containing only a single variable. Superimposing the curve from Fig. 2 on Fig. 1, we see
(Fig. 3) that most failures are triggered by more complex interactions among variables. It
is interesting that the NASA distributed database failures, from development-phase
software bug reports, have a distribution similar to expressions in branching statements.
This similarity may arise because the NASA software was still in development, whereas all
other types reported in Fig. 1 were fielded software. As failures are removed, the remaining failures may
be harder to find because they require the interaction of more variables. Thus testing and
use may push the curve down and to the right.
• ACTS covering array generator – produces compact arrays that will cover 2-way
through 6-way combinations. It also supports constraints that can make some
values dependent on others, and mixed level covering arrays which offer different
strength coverage for subsets of the parameters (e.g., 2-way coverage for one
subset but 4-way for another subset of parameters). Output can be exported in a
variety of formats, including human-readable, numeric, and spreadsheet. Either
“don’t care” or randomized output can be specified for tests that include
combinations already fully covered by previous tests.
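To make the idea of a compact covering array concrete, the following is a minimal greedy sketch for 2-way coverage. It is not the algorithm used by ACTS; it scores every possible test at each step, so it is practical only for very small inputs. The function names and the four binary parameters are assumptions made for illustration.

```python
from itertools import combinations, product

def pairs_of(test):
    """Return all ((column, value), (column, value)) pairs covered by one test."""
    return {((i, test[i]), (j, test[j]))
            for i, j in combinations(range(len(test)), 2)}

def greedy_pairwise(domains):
    """Build a 2-way covering array by repeatedly taking the test covering most new pairs."""
    candidates = list(product(*domains))
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    tests = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests

# Hypothetical input: four binary parameters.
suite = greedy_pairwise([(0, 1)] * 4)
print(len(suite), "tests instead of", 2 ** 4)
```

Once most pairs are covered, later tests in such a suite are forced only in a few positions; the remaining positions are the kind of values for which a generator can emit either "don't care" or a randomized value.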
Appendix D - REFERENCES
5. B. Beizer, Software Testing Techniques, Van Nostrand Reinhold, New York, 2nd
edition, 1990.
11. R. Bryce, C.J. Colbourn. The Density Algorithm for Pairwise Interaction Testing,
Journal of Software Testing, Verification and Reliability, August 2007.
13. R. Bryce, Y. Lei, D.R. Kuhn, R. Kacker, "Combinatorial Testing", Chap. 14,
Handbook of Research on Software Engineering and Productivity Technologies:
Implications of Globalization, Ramachandran, ed. , IGI Global, 2009.
14. K. Burr and W. Young, Combinatorial Test Techniques: Table-Based Automation,
Test Generation, and Test Coverage, International Conference on Software Testing,
Analysis, and Review (STAR), San Diego, CA, October, 1998.
23. S.R. Dalal, C.L. Mallows, Factor-covering Designs for Testing Software,
Technometrics, v. 40, 1998, pp. 234-243.
24. S. Dunietz, W. K. Ehrlich, B. D. Szablak, C. L. Mallows, A. Iannino. Applying
design of experiments to software testing, Proceedings of the Intl. Conf. on
Software Engineering, (ICSE ’97), 1997, pp. 205-215, New York
25. L. du Bousquet, Y. Ledru, O. Maury, C. Oriat, J.-L. Lanet, A case study in JML-
based software validation. Proceedings of 19th Int. IEEE Conf. on Automated
Software Engineering, pp. 294-297, Linz, Sep. 2004
27. C.A.R. Hoare, “Assertions, a Personal Perspective”, IEEE Annals of the History of
Computing, vol. 25, no. 2, pp. 14-25, 2003.
29. V. Hu, D.R. Kuhn, T. Xie, "Property Verification for Generic Access Control
Models", IEEE/IFIP International Symposium on Trust, Security, and Privacy for
Pervasive Applications, Shanghai, China, Dec. 17-20, 2008.
31. D.R. Kuhn, "Fault Classes and Error Detection Capability of Specification Based
Testing," ACM Transactions on Software Engineering and Methodology, Vol. 8,
No. 4 (October 1999).
33. D.R. Kuhn, R. Kacker, Y. Lei, "Automated Combinatorial Test Methods: Beyond
Pairwise Testing", Crosstalk, Journal of Defense Software Engineering, vol. 21, no.
6, June 2008
34. D.R. Kuhn and V. Okun, “Pseudo-exhaustive Testing for Software,” Proceedings of
30th NASA/IEEE Software Engineering Workshop, pp. 153-158, 2006
36. D.R. Kuhn, D.R. Wallace, and A. Gallo, “Software Fault Interactions and
Implications for Software Testing,” IEEE Transactions on Software Engineering,
30(6): 418-421, 2004
37. D.R. Kuhn, R. Kacker, Y. Lei, "Random vs. Combinatorial Methods for Discrete
Event Simulation of a Grid Computer Network", Proceedings, Mod Sim World
2009, Oct. 14-17 2009, Virginia Beach, pp. 83-88, NASA CP-2010-216205,
National Aeronautics and Space Administration.
38. D.R. Kuhn, R. Kacker, Y. Lei, "Combinatorial and Random Testing Effectiveness
for a Grid Computer Simulator" NIST Tech. Rpt. 24 Oct 2008.
40. D.R. Kuhn, J.M. Higdon, J.F. Lawrence, R. Kacker, Y. Lei, "Combinatorial
Methods for Event Sequence Testing", (to appear).
42. G.T. Leavens, A.L. Baker, and C. Ruby. JML: A notation for detailed design. In H.
Kilov, B. Rumpe, and I. Simmonds, editors, Behavioral Specifications of
Businesses and Systems. Kluwer, 1999
44. D.C. Luckham, F.W. von Henke. “Overview of Anna, a Specification Language
for Ada”, IEEE Software, vol. 2, no. 2, pp. 9-22, March 1985.
46. P.J. Maker, GNU Nana – User’s Guide (version 2.4). Technical report, School of
Information Technology – Northern Territory Univ., July 1998.
47. B.A. Malloy, J.M. Voas, “Programming with Assertions – a Prospectus”, IEEE IT
Professional, vol. 6, no. 5, pp. 53-59, Sept./Oct. 2004.
48. B. Marick, The Craft of Software Testing, Simon & Schuster, 1995.
49. A.P. Mathur, Foundations of Software Testing, Addison-Wesley, New York, 2008.
50. J.R. Maximoff, M.D. Trela, D.R. Kuhn, R. Kacker, “A Method for Analyzing
System State-space Coverage within a t-Wise Testing Framework”, IEEE
International Systems Conference 2010, Apr. 4-11, 2010, San Diego.
51. A.M. Memon and Q. Xie. Studying the fault-detection effectiveness of GUI test cases
for rapidly evolving software. IEEE Trans. Softw. Eng., 31(10):884–896, 2005.
53. G. Myers, The Art of Software Testing, John Wiley and Sons, New York, 1979.
55. V. Okun, "Specification Mutation for Test Generation and Analysis", PhD
Dissertation, U of Maryland Baltimore Co., 2004
56. Alexander Pretschner, Tejeddine Mouelhi, Yves Le Traon. Model Based Tests for
Access Control Policies, 2008 International Conference on Software Testing,
Verification, and Validation pp. 338-347
58. Patrick J. Schroeder, Pankaj Bolaki, and Vijayram Gopu. Comparing the fault
detection effectiveness of n-way and random test suites. In Proceedings of the IEEE
International Symposium on Empirical Software Engineering, pages 49–59, 2004.
60. J.M. Voas, K.W. Miller, “Putting Assertions in their Place”, Proceedings of
International Symposium on Software Reliability Engineering, IEEE, pp. 152-157,
1994.
61. J. Voas, M. Schatz, M. Schmid, "A Testability-based Assertion Placement Tool
for Object-Oriented Software," National Institute of Standards and Technology
NIST GCR 98-735, 1998.
63. X. Yuan, M.B. Cohen, A. Memon, “Covering Array Sampling of Input Event
Sequences for Automated GUI Testing”, November 2007 ASE '07: Proceedings of
the 22nd IEEE/ACM Intl. Conf. Automated Software Engineering, pp. 405-408.
64. X. Yuan and A. M. Memon. Using GUI run-time state as feedback to generate test
cases. In ICSE’07, Proceedings of the 29th International Conference on Software
Engineering, pages 396–405, Minneapolis, MN, USA, May 23–25, 2007.
65. D.R. Wallace, D.R. Kuhn, Failure Modes in Medical Device Software: an Analysis
of 15 Years of Recall Data, International Journal of Reliability, Quality, and Safety
Engineering, Vol. 8, No. 4, 2001.
66. A.W. Williams, R.L. Probert. A practical strategy for testing pair-wise coverage of
network interfaces, The Seventh International Symposium on Software Reliability
Engineering (ISSRE '96), p. 246.