EXP-9 Estimation of Test Coverage Metrics and Structural Complexity

The document discusses estimating test coverage metrics and structural complexity of programs. It defines key terms like control flow graph (CFG), basic blocks, paths, linearly independent paths, and McCabe's cyclomatic complexity metric. CFGs use nodes and edges to visually represent a program's control structure. Basic blocks group sequential statements to simplify large CFGs. Linearly independent paths provide a means to test all possible program executions. Cyclomatic complexity provides an upper bound on the number of independent paths and estimates test effort required. An example program is used to demonstrate these concepts.
Estimation of Test Coverage Metrics and Structural Complexity

Introduction

A visual representation of the flow of control within a program may help a
developer perform static analysis of the code. One could break a program down
into multiple basic blocks, and connect them with directed edges to draw a
Control Flow Graph (CFG). The CFG of a program helps in identifying how complex
the program is. It also helps to estimate the maximum number of test cases one
might require to test the code.

In this experiment, we will learn about basic blocks and how to draw a CFG using
them. We will look into paths and linearly independent paths in the context of a
CFG. Finally, we will learn about McCabe's cyclomatic complexity, and classify a
given program based on it.

Objectives

After completing this experiment you will be able to:

• Identify basic blocks in a program module, and draw its control flow graph (CFG)

• Identify the linearly independent paths from a CFG

• Determine the Cyclomatic complexity of a module in a program

Time Required

Around 3.00 hours

Control Flow Graph

A control flow graph (CFG) is a directed graph where the nodes represent
different instructions of a program, and the edges define the sequence of
execution of such instructions. Figure 1 shows a small snippet of code (compute
the square of an integer) along with its CFG. For simplicity, each node in the CFG
has been labeled with the line numbers of the program containing the
instructions. A directed edge from node #1 to node #2 in figure 1 implies that
after execution of the first statement, the control of execution is transferred to
the second instruction.

int x = 10, x_2 = 0;
x_2 = x * x;
return x_2;

Figure 1: A simple program and its CFG

A program, however, doesn't always consist of only sequential statements. There
could be branching and looping involved in it as well. Figure 2 shows what a CFG
looks like for sequential, selection and iteration statements, in that order.

Figure 2: CFG for different types of statements

A real-life application can seldom be written in a few lines. In fact, it might
consist of thousands of lines. A CFG for such a program is likely to become very
large, and it would contain mostly straight-line connections. To simplify such a
graph, consecutive sequential statements can be grouped together to form a basic
block. A basic block [ii, iii] is a maximal sequence of program instructions
I_1, I_2, ..., I_n such that for any two adjacent instructions I_k and I_(k+1),
the following holds true:

• I_k is executed immediately before I_(k+1)

• I_(k+1) is executed immediately after I_k

The size of a CFG could be reduced by representing each basic block with a node.
To illustrate this, let's consider the following example.

sum = 0;
i = 1;
while (i <= n) {
    sum += i;
    ++i;
}
printf("%d", sum);
if (sum > 0) {
    printf("Positive");
}

The CFG with basic blocks is shown for the above code in figure 3.

Figure 3: Basic blocks in a CFG

The first statement of a basic block is termed the leader. Any node x in a CFG is
said to dominate another node y (written as x dom y) if all possible execution
paths that go through node y must pass through node x. The node x is said to be
a dominator [ii]. In the above example, lines 1, 3, 4, 6, 7, 9 and 10 are leaders.
The node containing lines 7 and 8 dominates the node containing line 10. The
block containing lines 1 and 2 is said to be the entry block; the block
containing line 10 is said to be the exit block.
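
The dominator relation can also be checked mechanically. The following Python sketch is only an illustration: the edge list is an assumption reconstructed from the code listing (it is not read off figure 3), the block labels are the line numbers used above, and the function name `dominators` is our own.

```python
def dominators(edges, entry):
    """Return {node: set of nodes that dominate it} for a CFG given as edges."""
    nodes = {n for edge in edges for n in edge}
    preds = {n: set() for n in nodes}
    for src, dst in edges:
        preds[dst].add(src)

    # Start from the most pessimistic guess (everything dominates everything)
    # and shrink the sets until nothing changes any more.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            common = set.intersection(*incoming) if incoming else set()
            new = {n} | common
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Assumed edges between the basic blocks of the example above.
FIG3_EDGES = [
    ("1-2", "3"), ("3", "4-5"), ("4-5", "3"), ("3", "6"),
    ("6", "7-8"), ("7-8", "9"), ("7-8", "10"), ("9", "10"),
]

dom = dominators(FIG3_EDGES, "1-2")
print("7-8" in dom["10"])  # does block (7, 8) dominate block 10?
print("9" in dom["10"])    # does block 9 dominate block 10?
```

Under these assumed edges, block 9 does not dominate block 10, because block 10 can also be reached directly from block (7, 8).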

If any block (or sub-graph) in a CFG is not connected with the sub-graph
containing the entry block, the code in that block is unreachable when the
program is executed. Such unreachable code can be safely removed from the
program. To illustrate this, let's consider a modified version of our previous
code:

sum = 0;
i = 1;
while (i <= n) {
    sum += i;
    ++i;
}
return sum;
if (sum < 0) {
    return 0;
}

Figure 4 shows the corresponding CFG. The sub-graph containing lines 8, 9 and 10
is disconnected from the graph containing the entry block. The code in the
disconnected sub-graph would never get executed, and, therefore, could be
discarded.

Figure 4: CFG with unreachable blocks
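
Unreachable blocks can be found by a simple graph traversal from the entry block. The Python sketch below assumes an edge list reconstructed from the modified code (the figure itself is not reproduced here); the labels are line numbers and the function name `reachable` is illustrative.

```python
def reachable(edges, entry):
    """Return the set of nodes reachable from entry by following directed edges."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    seen, stack = {entry}, [entry]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Assumed edges: nothing leads from "return sum" (line 7) to line 8.
FIG4_EDGES = [
    ("1-2", "3"), ("3", "4-5"), ("4-5", "3"), ("3", "6"), ("6", "7"),
    ("8", "9"), ("8", "10"),
]

all_nodes = {n for edge in FIG4_EDGES for n in edge}
unreachable = all_nodes - reachable(FIG4_EDGES, "1-2")
print(sorted(unreachable))
```

Every block left in `unreachable` has no path from the entry block, so it is a candidate for removal.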

Terminologies

Path
A path in a CFG is a sequence of nodes and edges that starts from the initial node
(or entry block) and ends at a terminal node. The CFG of a program could have
more than one terminal node.

Linearly Independent Path


A linearly independent path is any path in the CFG of a program that
includes at least one new edge not present in any other linearly independent
path. A set of linearly independent paths gives a clear picture of all possible
paths that a program can take during its execution. Therefore, path-coverage
testing of a program can be limited to the linearly independent paths.
In figure 3 we can find three linearly independent paths. Note that the loop
body (4, 5) always returns control to the loop condition at line 3:

1 - 3 - 6 - (7, 8) - 10

1 - 3 - 6 - (7, 8) - 9 - 10

1 - 3 - (4, 5) - 3 - 6 - (7, 8) - 10

Note that 1 - 3 - (4, 5) - 3 - 6 - (7, 8) - 9 - 10, for instance, won't qualify as a
linearly independent path because it contains no edge that is not already present
in the three paths above. The same holds for paths that repeat the loop body,
such as 1 - 3 - (4, 5) - 3 - (4, 5) - 3 - 6 - (7, 8) - 10.
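
The "new edge" rule can be applied mechanically to a list of candidate paths. The Python sketch below uses a small hypothetical CFG (entry node a, loop test b, loop body c, exit node d) rather than the one in figure 3; the function names are our own, and the result depends on the order in which candidates are examined.

```python
def path_edges(path):
    """Edges traversed by a path given as a sequence of node labels."""
    return set(zip(path, path[1:]))


def select_independent(candidates):
    """Keep each candidate that traverses at least one edge not yet covered."""
    covered, chosen = set(), []
    for path in candidates:
        new_edges = path_edges(path) - covered
        if new_edges:
            chosen.append(path)
            covered |= new_edges
    return chosen


candidates = [
    ["a", "b", "d"],                      # skip the loop
    ["a", "b", "c", "b", "d"],            # run the loop body once
    ["a", "b", "c", "b", "c", "b", "d"],  # run it twice: no new edge
]
chosen = select_independent(candidates)
for path in chosen:
    print(" - ".join(path))
```

The third candidate is rejected because all of its edges already occur in the first two paths.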

McCabe's Cyclomatic Complexity

McCabe applied graph-theoretic analysis to determine the complexity of a
program module [vi]. The Cyclomatic complexity metric, as proposed by McCabe,
provides an upper bound for the number of linearly independent paths that could
exist through a given program module. The complexity of a module increases as the
number of such paths in the module increases. Thus, if the Cyclomatic complexity
of a program module is 7, there could be up to seven linearly independent paths
in the module. For complete testing, each of those possible paths should be
tested.

Computing Cyclomatic Complexity

Let G be a given CFG. Let E denote the number of edges, and N denote the
number of nodes. Let V(G) denote the Cyclomatic complexity of the
CFG. V(G) can be obtained in any of the following three ways:

• Method #1: V(G) = E - N + 2

• Method #2: V(G) can be computed directly by a visual inspection of the CFG:
V(G) = Total number of bounded areas + 1
It may be noted here that structured programming would always lead to a planar CFG.

• Method #3: If LN is the total number of loops and decision statements in a
program, then V(G) = LN + 1
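
Method #1 is easy to automate once the CFG is available as a list of edges. The following Python sketch assumes a connected CFG given as (source, target) pairs; the function name and the if-else "diamond" graph are hypothetical and serve only as an illustration.

```python
def cyclomatic_complexity(edges):
    """V(G) = E - N + 2 for a connected CFG given as (source, target) pairs."""
    nodes = {n for edge in edges for n in edge}
    return len(edges) - len(nodes) + 2


# Hypothetical CFG of a single if-else statement: 4 nodes, 4 edges.
DIAMOND = [
    ("cond", "then"), ("cond", "else"),
    ("then", "join"), ("else", "join"),
]

v = cyclomatic_complexity(DIAMOND)
print(v)  # 4 - 4 + 2 = 2
```

The other two methods agree for this graph: it encloses one bounded area (1 + 1 = 2) and contains one decision statement (1 + 1 = 2).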

In case of object-oriented programming, the above equations apply to methods of
a class [viii]. Also, the value of V(G) so obtained is incremented by 1 considering
the entry point of the method. A quick summary of how different types of
statements affect V(G) can be found in [ix]. Once the complexities of the
individual modules of a program are known, the complexity of the program (or
class) can be determined by [4], [ix]:

V(G) = SUM( V(Gi) ) - COUNT( V(Gi) ) + 1

where COUNT( V(Gi) ) gives the total number of procedures (methods) in the
program (class).
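
As a quick arithmetic check of this formula, the sketch below combines the complexities of individual procedures. The function name and the sample values are hypothetical.

```python
def aggregate_complexity(method_complexities):
    """V(G) = SUM(V(Gi)) - COUNT(V(Gi)) + 1 over the procedures of a program."""
    values = list(method_complexities)
    return sum(values) - len(values) + 1


# Three hypothetical procedures with V(Gi) = 3, 2 and 4.
total = aggregate_complexity([3, 2, 4])
print(total)  # (3 + 2 + 4) - 3 + 1 = 7
```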

Optimum Value of Cyclomatic Complexity

A set of threshold values for Cyclomatic complexity has been presented in [vii],
which we reproduce below.

V(G)     Module Category   Risk
1-10     Simple            Low
11-20    More complex      Moderate
21-50    Complex           High
> 50     Unstable          Very high

It has been suggested that the Cyclomatic complexity of any module should not
exceed 10 [vi], [4]. A module that exceeds this limit becomes difficult for
humans to understand. If any module is found to have Cyclomatic complexity
greater than 10, the module should be considered for redesign. Note that a high
value of V(G) is possible for a given module if it contains a C-like switch-case
statement with multiple cases. McCabe had exempted such modules from the limit
of V(G) as 10 [vi].
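
The thresholds listed above translate directly into a small lookup. This Python sketch is only an illustration of the table; the function name is our own, and it assumes V(G) is a positive integer.

```python
def classify_module(v_g):
    """Map a cyclomatic complexity value to (module category, risk)."""
    if v_g < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    if v_g <= 10:
        return ("Simple", "Low")
    if v_g <= 20:
        return ("More complex", "Moderate")
    if v_g <= 50:
        return ("Complex", "High")
    return ("Unstable", "Very high")


for v_g in (3, 15, 27, 60):
    category, risk = classify_module(v_g)
    print(v_g, category, risk)
```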

Merits

McCabe's Cyclomatic complexity has certain advantages:

• Independent of programming language

• Helps in risk analysis during development or maintenance phase

• Gives an idea about the maximum number of test cases to be executed
(hence, the required effort) for a given module

Demerits

Cyclomatic complexity doesn't reflect the cohesion and coupling of modules.

McCabe's Cyclomatic complexity was originally proposed for procedural
languages. One may look in [xi] to get an idea of how the complexity calculation
could be modified for object-oriented languages. In fact, one may also wish to
make use of Chidamber-Kemerer metrics [x] (or any other similar metrics), which
have been designed for object-oriented programming.

Code for GCD computation by Euclid's method

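/* Subtraction-based form of Euclid's method; assumes x and y are positive. */
int gcd(int x, int y) {
    while (x != y) {
        if (x > y)
            x = x - y;
        else
            y = y - x;
    }
    return x;
}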

Determining McCabe's Cyclomatic Complexity

Method #1
N = No. of nodes = 7
E = No. of edges = 8
V(G) = E - N + 2 = 8 - 7 + 2 = 3

Method #2
V(G) = Total no. of non-overlapping areas + 1 = 2 + 1 = 3

Method #3
V(G) = Total no. of decision statements and loops + 1 = 1 + 1 + 1 = 3
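
Method #3 can be approximated by counting decision keywords in the source text. The Python sketch below is deliberately naive and rests on several assumptions: it treats the code as plain text, it ignores comments and string literals, and it does not count short-circuit operators such as && and ||. The keyword list and the function name are our own.

```python
import re

# Keywords treated as decision points in C-like code (an assumed, minimal list).
DECISION_KEYWORDS = ("if", "while", "for", "case")


def decision_count(source):
    """Count whole-word occurrences of the decision keywords in source text."""
    pattern = r"\b(?:" + "|".join(DECISION_KEYWORDS) + r")\b"
    return len(re.findall(pattern, source))


GCD_SOURCE = """
while (x != y) {
    if (x > y)
        x = x - y;
    else
        y = y - x;
}
return x;
"""

v = decision_count(GCD_SOURCE) + 1
print(v)  # one loop + one decision statement + 1 = 3
```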

Let us determine the Cyclomatic complexity of the "ReissueBook" method
shown below:

public ID ReissueBook(ID userID, ID bookID) {
    Member user = Member.GetMember(userID);
    ID transactionID = null;
    if ( user.canIssueNow() && Book.IsAvailable(bookID) ) {
        // # of times this book has been reissued after its recent issue by the user
        Integer count = user.getReissueCountFor(bookID);
        if ( count < REISSUE_LIMIT ) {
            user.incrementReissueCount(bookID);
            BookTransaction transaction = new BookTransaction(userID, bookID);
            transaction.save();
            transactionID = transaction.getID();
        }
    }
    return transactionID;
}

The Control Flow Graph for the above module is shown in figure 1. The CFG has
six nodes and seven edges. So, the Cyclomatic complexity is V(G) = 7 - 6 + 2 = 3. It
can be verified with the other two formulae as well: # of regions + 1 = 2 + 1 = 3.
Also, # of decision points = 2. So, V(G) = 2 + 1 = 3. However, as mentioned in the
theory section, for methods of classes we add an extra 1 to V(G). So, the
Cyclomatic complexity of this method becomes 4, which is well within the
suggested limit of 10.

Figure 1: CFG for "ReissueBook" method

Note that the condition of the first if statement (line # 3 of the method body)
combines two decisions with the short-circuit operator &&. Taking this into
account, V(G) for the module becomes 5, which is still acceptable. This implies
that the method could have up to five linearly independent paths. By looking at
figure 1 we can easily identify three such paths. However, since line # 3
consists of two decision points, there is another "implicit" path. Based on
these, we can design four test cases, each fixing the Boolean values of the
sequence { user.canIssueNow, Book.IsAvailable, count < REISSUE_LIMIT }. The four
cases are shown below:

• { true, true, true } : Output should be a valid ID

• { false, true, true } : Output would be null

• { true, false, true } : Output would be null

• { true, true, false } : Output would be null
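
The four cases can be written down as a table-driven check. The Python sketch below does not call the real method: `reissue_book` is a hypothetical stand-in that models only the control flow of "ReissueBook", taking the three Boolean outcomes as inputs and returning a placeholder ID or None.

```python
def reissue_book(can_issue_now, is_available, count_below_limit):
    """Model of the branching in ReissueBook; returns a placeholder ID or None."""
    transaction_id = None
    if can_issue_now and is_available:
        if count_below_limit:
            transaction_id = "TXN-1"  # placeholder for transaction.getID()
    return transaction_id


# (canIssueNow, IsAvailable, count < REISSUE_LIMIT) -> is a valid ID expected?
CASES = [
    ((True, True, True), True),
    ((False, True, True), False),
    ((True, False, True), False),
    ((True, True, False), False),
]

results = [reissue_book(*inputs) is not None for inputs, _ in CASES]
for (inputs, expected), got in zip(CASES, results):
    print(inputs, "valid ID" if got else "null", "ok" if got == expected else "MISMATCH")
```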


Now let us focus on the "IssueManager" class. For simplicity, let's assume it has
only two methods: IssueBook and ReissueBook, as shown below.

public class IssueManager {

    public ID IssueBook(ID userID, ID bookID) {
        Member user = Member.GetMember(userID);
        ID transactionID = null;
        if ( user.canIssueNow() && Book.IsAvailable(bookID) ) {
            Book.SetStatusIssued(bookID);
            user.incrementIssueCount(bookID);
            BookTransaction transaction = new BookTransaction(userID, bookID);
            transaction.save();
            transactionID = transaction.getID();
        }
        return transactionID;
    }

    public ID ReissueBook(ID userID, ID bookID) {
        Member user = Member.GetMember(userID);
        ID transactionID = null;
        if ( user.canIssueNow() && Book.IsAvailable(bookID) ) {
            // # of times this book has been reissued after its recent issue by the user
            Integer count = user.getReissueCountFor(bookID);
            if ( count < REISSUE_LIMIT ) {
                user.incrementReissueCount(bookID);
                BookTransaction transaction = new BookTransaction(userID, bookID);
                transaction.save();
                transactionID = transaction.getID();
            }
        }
        return transactionID;
    }
}

"IssueBook" has two decision points (if and &&). So, V(G_IssueBook) = (2 + 1) + 1 = 4.
We have already determined V(G_ReissueBook) to be 5. So, the total Cyclomatic
complexity of this class (having two methods) becomes

V(G) = (4 + 5) - 2 + 1 = 8
