
736 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 23, NO. 12, NOVEMBER 1997

Estimating Software Project Effort Using Analogies

Martin Shepperd and Chris Schofield

Abstract—Accurate project effort prediction is an important goal for the software engineering community. To date most work has
focused upon building algorithmic models of effort, for example COCOMO. These can be calibrated to local environments. We
describe an alternative approach to estimation based upon the use of analogies. The underlying principle is to characterize projects
in terms of features (for example, the number of interfaces, the development method or the size of the functional requirements
document). Completed projects are stored and then the problem becomes one of finding the most similar projects to the one for
which a prediction is required. Similarity is defined as Euclidean distance in n-dimensional space where n is the number of project
features. Each dimension is standardized so all dimensions have equal weight. The known effort values of the nearest neighbors to
the new project are then used as the basis for the prediction. The process is automated using a PC-based tool known as ANGEL.
The method is validated on nine different industrial datasets (a total of 275 projects) and in all cases analogy outperforms
algorithmic models based upon stepwise regression. From this work we argue that estimation by analogy is a viable technique that,
at the very least, can be used by project managers to complement current estimation techniques.

Index Terms—Effort prediction, estimation process, empirical investigation, analogy, case-based reasoning.

—————————— ✦ ——————————

1 INTRODUCTION

An important aspect of any software development project is to know how much it will cost. In most cases the major cost factor is labor. For this reason estimating development effort is central to the management and control of a software project.

A fundamental question that needs to be asked of any estimation method is how accurate are the predictions. Accuracy is usually defined in terms of mean magnitude of relative error (MMRE) [6], which is the mean of absolute percentage errors:

    \mathrm{MMRE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert E_i - \hat{E}_i \rvert}{E_i}        (1)

where there are n projects, E_i is the actual effort and \hat{E}_i is the predicted effort. There has been some criticism of this measure, in particular that it is unbalanced and penalizes overestimates more than underestimates. For this reason Miyazaki et al. [19] propose a balanced mean magnitude of relative error measure as follows:

    \mathrm{BMMRE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert E_i - \hat{E}_i \rvert}{\min(E_i, \hat{E}_i)}        (2)

This approach has been criticized by Hughes [8], among others, as effectively being two distinct measures that should not be combined.

Other workers have used the adjusted R squared or coefficient of determination to indicate the percentage of variation in the dependent variable that can be "explained" in terms of the independent variables. Unfortunately, this is not always an adequate indicator of prediction quality where there are outlier or extreme values. Yet another approach is to use Pred(25), which is the percentage of predictions that fall within 25 percent of the actual value. Clearly the choice of accuracy measure to a large extent depends upon the objectives of those using the prediction system. For example, MMRE is fairly conservative with a bias against overestimates, while Pred(25) will identify those prediction systems that are generally accurate but occasionally wildly inaccurate. In this paper we have decided to adopt MMRE and Pred(25) as prediction performance indicators since these are widely used, thereby rendering our results more comparable with those of other workers.

The remainder of this paper reviews work to date in the field of effort prediction (both algorithmic and nonalgorithmic) before going on to describe an alternative approach to effort prediction based upon the use of analogy. Results from this approach are compared with traditional statistical methods using nine datasets. The paper then discusses the results of a sensitivity analysis of the analogy method. An estimation process is then presented. The paper concludes by discussing the strengths and limitations of analogy as a means of predicting software project effort.

2 A BRIEF HISTORY OF EFFORT PREDICTION

Over the past two decades there has been considerable activity in the area of effort prediction, with most approaches being typified as algorithmic in nature. Well known examples include COCOMO [4] and function points [2].¹ Whatever the exact niceties of the model, the general form tends to be:

————————————————
• M. Shepperd and C. Schofield are with the Department of Computing, Bournemouth University, Talbot Campus, Poole, BH12 5BB United Kingdom. E-mail: {mshepper, cschofie}@bournemouth.ac.uk.
Manuscript received 10 Feb. 1997. Recommended for acceptance by D.R. Jeffery. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 104091.
1. We include function points as an algorithmic method since they are dimensionless and therefore need to be calibrated in order to estimate effort.
0098-5589/97/$10.00 © 1997 IEEE
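The accuracy indicators discussed above can be sketched in code as follows. This is an illustrative sketch of Eqs. (1) and (2) and of Pred(25), not code from the paper; the function and variable names are our own.

```python
# Sketch of the accuracy indicators: MMRE (Eq. 1), Miyazaki et al.'s
# balanced BMMRE (Eq. 2), and Pred(25). Actual and predicted efforts
# are parallel sequences of positive numbers (e.g., person-months).

def mmre(actual, predicted):
    """Mean magnitude of relative error, Eq. (1), as a percentage."""
    errors = [abs(e - p) / e for e, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def bmmre(actual, predicted):
    """Balanced MMRE, Eq. (2): the denominator is min(actual, predicted),
    so over- and underestimates are penalized symmetrically."""
    errors = [abs(e - p) / min(e, p) for e, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def pred(actual, predicted, level=25.0):
    """Pred(level): percentage of predictions within level% of actual."""
    hits = sum(1 for e, p in zip(actual, predicted)
               if abs(e - p) / e * 100.0 <= level)
    return 100.0 * hits / len(actual)

if __name__ == "__main__":
    actual = [100.0, 200.0, 400.0]
    predicted = [110.0, 150.0, 400.0]   # relative errors 10%, 25%, 0%
    print(mmre(actual, predicted))
    print(bmmre(actual, predicted))
    print(pred(actual, predicted))
```

Note how the two measures diverge on the same data: MMRE divides the second error by the actual effort (200), while BMMRE divides by the smaller of the two values (150), so the underestimate weighs more heavily in BMMRE.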

    E = aS^{b}        (3)

where E is effort, S is size, typically measured as lines of code (LOC) or function points, a is a productivity parameter, and b is an economies or diseconomies of scale parameter. COCOMO represents an approach that could be regarded as "off the shelf." Here the estimator hopes that the equations contained in the cost model adequately represent their development environment and that any variations can be satisfactorily accounted for in terms of cost drivers or parameters built into the model. For instance, COCOMO has 15 such drivers. Unfortunately, there is considerable evidence that this "off the shelf" approach is not always very successful. Kemerer [12] reports average errors (in terms of the difference between predicted and actual project effort) of over 600 percent in his independent study of COCOMO. Other independent studies [14], [18] have also reported high error rates.

Another algorithmic approach is to calibrate a model by estimating values for the parameters (a and b in the case of (3)). However, the most straightforward method is to assume a linear model, that is, set b to unity, and then use regression analysis to estimate the slope (parameter a) and possibly introduce an intercept, so the model becomes:

    E = a_1 + a_2 S        (4)

so that a_1 represents fixed development costs (for example, regression testing will consume a fixed amount of effort irrespective of the size of the software) and a_2 represents productivity. Kok et al. [15] describe how this approach has been successfully utilized on the Esprit MERMAID Project. Function points [2] are also often calibrated to local environments in order to convert size in function points to predicted effort. Again, as with COCOMO, quite mixed results have been reported [9], [10], [12], [17]. Kitchenham and Kansala [13] also note that better results can be obtained through disaggregating the components of function points and using stepwise regression to reestimate weights and determine the significant components.

Although most research into project effort estimation has adopted an algorithmic approach, there has been limited exploration of machine learning or nonalgorithmic methods. For example, Karunanithi et al. [11] report the use of neural nets for predicting software reliability, and conclude that both feed forward and Jordan networks with a cascade correlation learning algorithm outperform traditional statistical models. More recently, Wittig and Finnie [28] describe their use of back propagation learning algorithms on a multilayer perceptron in order to predict development effort. An overall error rate (MMRE) of 29 percent was obtained, which compares favorably with other methods. However, it must be stressed that the datasets were large (81 and 136 projects, respectively) and that only a very small number of projects were withdrawn for validation purposes. Some outliers also appear to have been removed. This tends to confirm the findings from Serluca [25] that neural nets seem to require large training sets in order to give good predictions.

Another study by Samson et al. [24] uses an Albus multilayer perceptron in order to predict software effort. In this instance they use Boehm's COCOMO dataset. The work compares linear regression with a neural net approach using the COCOMO dataset. Both approaches seem to perform badly, with MMREs of 520.7 and 428.1 percent, respectively.

Srinivasan and Fisher [27] also report on the use of a neural net with a back propagation learning algorithm. They found that the neural net outperformed other techniques and gave results of MMRE = 70 percent. However, it is unclear exactly how the dataset was divided up for training and validation purposes. Unfortunately, they also found that the results were sensitive to the number of hidden units and layers. Results to date suggest that accuracy is sensitive to decisions regarding the topology of the net, the number of learning epochs, and the initial random weights of the neurons within the net. In addition, there is little explanation value in a neural net; that is, such models do not help us understand software project development effort.

There have been a number of attempts to use regression and decision trees to predict aspects of software engineering. Srinivasan and Fisher [27] describe the use of a regression tree to predict effort using the Kemerer dataset [12]. They found that although it outperformed COCOMO and SLIM, the results were less good than using either a statistical model derived from function points or a neural net. Briand et al. [5] obtained rather better results (MMRE = 94 percent) from their tree induction analysis. In this case they used a combination of the Kemerer and COCOMO datasets. Porter and Selby [21], [22] describe the use of decision or classification trees in predicting aspects of the software development process. Results from this approach seem to be quite mixed and, as with the neural net approach, results are quite sensitive to aspects such as the choice of algorithm to derive the tree and tree depth.

Finally, Mukhopadhyay et al. [20] describe some early work using a hybrid case based reasoning (CBR) and rule based system. They report encouraging results based, again, upon the dataset collected by Kemerer; however, their approach requires access to an expert in order to derive estimation rules and create a case base. Our work differs in that no expert is used and a pure CBR strategy is adopted.

Although the results from nonalgorithmic approaches seem quite mixed, they are sufficiently encouraging to warrant further investigation. However, we wish to stress that we do not propose that algorithmic approaches be rejected, merely that we search for additional and complementary methods of software project effort prediction. The reason for this is that in situations where pronounced linear or curvilinear relationships are to be found, the ability to model this in terms of algorithms is important. In addition, the use of multiple techniques can be used as a "sanity check" upon any prediction generated.

3 ANALOGY

Estimation by analogy is a form of CBR. Cases are defined as abstractions of events that are limited in time and space. It is argued that estimation by analogy offers some distinct advantages.

• It avoids the problems associated both with knowledge elicitation and with extracting and codifying the knowledge.
• Analogy-based systems only need deal with those problems that actually occur in practice, while generative (i.e., algorithmic) systems must handle all possible problems.
• Analogy-based systems can also handle failed cases (i.e., those cases for which an accurate prediction was not made). This is useful as it enables users to identify potentially high-risk situations.
• Analogy is able to deal with poorly understood domains (such as software projects) since solutions are based upon what has actually happened, as opposed to chains of rules in the case of rule based systems.
• Users may be more willing to accept solutions from analogy based systems since they are derived from a form of reasoning more akin to human problem solving, as opposed to the somewhat arcane chains of rules or neural nets. This final advantage is particularly important if systems are to be not only deployed but also have reliance placed upon them.

The key activities for estimating by analogy are the identification of a problem as a new case, the retrieval of similar cases from a repository, the reuse of knowledge derived from previous cases, and the suggestion of a solution for the new case. This solution may be revised in the light of actual events and the outcome retained to augment the repository of completed cases. This approach to prediction poses two problems. First, how do we characterize cases? Second, how do we retrieve similar cases; indeed, how do we measure similarity?

Characterization of cases is largely a pragmatic issue of what information is available. Variables can be continuous (i.e., interval, ratio, or absolute scale measures) or categorical (i.e., nominal or ordinal measures). When designing a new CBR system, experts should be consulted to try to establish those features of a case that are believed to be significant in determining similarity, or otherwise, of cases. Rich and Knight [23] describe the problem of choosing insufficiently general features. Again the solution appears to be to use an expert.

Assessing similarity is the other problem. There are a variety of approaches, including a number of preference heuristics proposed by Kolodner [16]:

• Nearest neighbor algorithms. These are the most popular and are either based upon straightforward distance measures or the sum of squares of the differences for each variable. In either case each variable must first be standardized (so that it has an equal influence) and then weighted according to the degree of importance attached to the feature. A common algorithm is given by Aha [1]:

    SIM(C_1, C_2, P) = 1 \Big/ \sum_{j \in P} Feature\_dissimilarity(C_{1j}, C_{2j})

  where P is the set of n features, C_1 and C_2 are cases, and

    Feature\_dissimilarity(C_{1j}, C_{2j}) = \begin{cases} (C_{1j} - C_{2j})^2 \\ 0 \\ 1 \end{cases}

  where 1) the features are numeric, 2) the features are categorical and C_{1j} = C_{2j}, or 3) the features are categorical and C_{1j} \neq C_{2j}, respectively.
• Manually guided induction. Here an expert manually identifies key features, although this reduces some of the advantages of using a CBR system in that an expert is required.
• Template retrieval. This functions in a similar fashion to query by example database interfaces, that is, the user supplies values for ranges, and all cases that match are retrieved.
• Goal directed preference. Select cases that have the same goal as the current case.
• Specificity preference. Select cases that match features exactly over those that match generally.
• Frequency preference. Select cases that are most frequently retrieved.
• Recency preference. Choose recently matched cases over those that have not been matched for a period of time.
• Fuzzy similarity. Where concepts such as at-least-as-similar and just-noticeable-difference are employed.

The similarity measures suffer from a number of disadvantages. First, they tend to be computationally intensive, although Aha [1] has proposed a number of more efficient algorithms that are only marginally less accurate. However, efficiency is not an issue for project effort estimation as typically one is dealing with less than 100 cases. Second, the algorithms are intolerant of noise and of irrelevant features. One strategy to overcome this problem is to build in learning so that the algorithm learns the importance of the various features. Essentially, weights are increased for matching features for successful predictions and diminished for unsuccessful predictions. Third, symbolic or categorical features are problematic. Although there are several algorithms that have been proposed to accommodate such features, they are all fairly crude in that they adopt a Boolean approach: features match or fail to match with no middle ground. A fourth criticism of these similarity measures is that they fail to take into account information which can be derived from the structure of the data; thus, they are weak for higher order feature relationships such as one might expect to see exhibited in legal systems.

Our approach has been guided by the twin aims of expediency and simplicity. In essence we take a new project, one for which we wish to predict effort, and attempt to find other similar completed projects. Since these projects are completed, development effort will be known and can be used as a basis for estimating effort for the new project. Similarity is defined in terms of project features, such as number of interfaces, development method, application domain, and so forth. Clearly the features used will depend upon what data is available to characterize projects. The number of features is also flexible. We have analyzed datasets with as few as one feature and as many as 29 features. Features may be either categorical or continuous.

Similarity, defined as proximity in n-dimensional space (where each dimension corresponds to a different feature), is most intuitively appealing; hence we use unweighted Euclidean distance. The most similar projects will be closest to each other. Note that each dimension is standardized (between 0 and 1) so that it has the same degree of influence and the method is immune to the choice of units. Moreover, the notion of distance gives an indication of the degree of similarity. Once the analogous projects have been found, the known effort can be used in a variety of ways. We use the weighted or unweighted average of up to three analogies. No one approach is consistently more accurate, so the decision requires a certain amount of experimentation on the part of the estimators. Because of the small datasets, we cope with noise (that is, unhelpful features that do not aid in the process of finding good analogies) by means of an exhaustive search of all possible subsets of the project features so as to obtain the optimum predictions for projects with known effort. The whole method, from storing analogies through eliminating redundant features to finding analogies, is automated by a PC-based software tool known as ANGEL (ANaloGy Estimation tool).² A fuller description is to be found in Shepperd et al. [26].

4 COMPARING ESTIMATION BY ANALOGY WITH REGRESSION MODELS

Next, we compared the accuracy of software project effort prediction using analogy with an algorithmic approach based upon equations derived through stepwise regression analysis.

Table 1 summarizes the datasets that were used for our comparison of analogy based estimation with stepwise regression. As can be seen from the table, the datasets are quite diverse and are drawn from many different application domains, ranging from telecommunications to commercial information systems. All the data was taken from industrial projects; that is, no academic or student projects are included. The projects range in size from a few person months to over 1,000 person months. It is also important to stress that none of the data was collected with estimation by analogy in mind; instead we were able to exploit data that was already available. The final point is that we only utilized information that would be available at the time the prediction would be made, so we avoided project features such as LOC. This is important if we wish to avoid creating a false impression as to the efficacy of different prediction methods.

Table 2 shows the accuracy of the respective methods using the MMRE and Pred(25) values. A jack-knifing procedure³ was adopted for the analogy-based predictions, since this could be automated in the ANGEL tool; the regression models were generated using the entire dataset. This means the results are likely to be biased in favor of the regression models. Note that we use two slightly different regression analysis techniques. Both regression 1 and 2 use stepwise regression; however, regression 1 restricts the procedure to the three variables most highly correlated with the dependent variable (i.e., effort). Not surprisingly, the results are in general similar; however, occasional differences are due to the fact that the regression procedure attempts to minimize the sum of the squares of the residuals, whereas MMRE is based upon the mean of the sum of the unsquared residuals.

Each dataset is treated separately since each one has different project features available and therefore we are not able to merge all the data into a single all encompassing dataset. This is appropriate since it is unlikely that an organization would have access to such large volumes of data, and there seems some merit in estimating using smaller, more homogenous datasets, a point we will return to.

From Table 2 we see that for all datasets the MMRE performance of estimating by analogy is better than that of the regression methods. This suggests that analogy is capable of yielding more accurate predictions, at least for these datasets. An interesting problem occurs for the Real-time 1 dataset. Here it was not possible to develop an algorithmic model or to use regression analysis since the dataset comprises only categorical data, with the exception of actual project effort. Indeed, the dataset was very sparse and was made up of only three distinguishing project features. Yet even in these highly unpropitious circumstances the analogy method was able to yield a predictive accuracy of 74 percent. This is indicative of the possibility of being able to use analogy based estimation at an extremely early stage of a project, when other estimation techniques may not be possible, for the reason that analogy does not require quantitative data. Similarly, an accuracy of 39 percent was obtained for the dataset Telecom 1 despite the fact that only a single distinguishing feature was available. Again, stepwise regression only achieves a result of MMRE = 86 percent by method 1 or 2.

The Pred(25) results from Table 2 are slightly more mixed. Recall that, unlike MMRE, a higher score implies better predictive accuracy. Two datasets (Atkinson and Desharnais) yield a higher Pred(25) score for the regression model. In general, the results are closer than for the MMRE analysis. One explanation lies in the fact that the ANGEL tool explicitly tries to optimize the MMRE result, so that it is not surprising that it performs best in terms of this indicator. A second explanation lies in the fact that MMRE and Pred(25) are assessing slightly different characteristics of a prediction system. MMRE is conservative and looks at the mean absolute percentage error, whereas Pred(25) is optimistic and focuses upon the best predictions (i.e., those within 25 percent of actual) and ignores all other predictions. The choice of indicator to some extent depends upon the objectives of the user. Nevertheless, the overall picture suggests that estimation by analogy tends to be the more accurate prediction method.

————————————————
2. The authors are happy to provide a simple version of ANGEL at no cost. The zip files may be downloaded from http://xanadu.bournemouth.ac.uk/ComputingResearch/ChrisSchofield/Angel/AngelPage.html
3. Jack knifing is a validation technique whereby each case is removed from the dataset and the remainder of the cases are used to predict the removed case. The case is then returned to the dataset and the next case removed. This procedure is repeated until all cases have been covered.
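The core analogy procedure described in Section 3 can be sketched as follows. This is our own simplified illustration, not the ANGEL implementation: each feature is rescaled to [0, 1], similarity is unweighted Euclidean distance, and the estimate is the unweighted mean effort of the nearest completed projects.

```python
import math

def standardize(rows):
    """Rescale each feature column across all rows to [0, 1], so every
    dimension has equal influence regardless of its original units."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0          # guard against a constant feature
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def predict_effort(new_project, cases, efforts, k=3):
    """Estimate effort for `new_project` as the mean known effort of its
    k nearest analogies among the completed `cases`."""
    rows = standardize(cases + [new_project])
    target, rest = rows[-1], rows[:-1]
    by_distance = sorted(range(len(rest)),
                         key=lambda i: math.dist(target, rest[i]))
    nearest = by_distance[:k]
    return sum(efforts[i] for i in nearest) / len(nearest)
```

For example, with completed projects `[[10, 2], [12, 2], [50, 9]]` and efforts `[100, 120, 500]`, a new project `[11, 2]` is closest to the first two cases, so with `k=2` the estimate is their mean effort, 110. Categorical features would first be recoded as 0/1 dummy variables, as Section 6 describes for programming languages.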

TABLE 1
DATASETS USED TO COMPARE EFFORT PREDICTION METHODS

Name | Source | n | Features | Description
Albrecht | [2] | 24 | 5 | IBM DP Services projects
Atkinson | [3] | 21 | 12 | Builds to a large telecommunications product at U.K. company X
Desharnais | [7] | 77 | 9 | Canadian software house—commercial projects
Finnish | Dataset made available to the ESPRIT Mermaid Project by the TIEKE organization | 38 | 29 | Data collected by the TIEKE organization from IS projects from nine different Finnish companies
Kemerer | [12] | 15 | 2 | Large business applications
Mermaid | MM2 dataset, made available to the ESPRIT Mermaid Project anonymously | 28 | 17 | New and enhancement projects
Real-time 1 | not in public domain | 21 | 3 | Real-time projects at U.K. company Z
Telecom 1 | Appendix A | 18 | 1 | Enhancements to a U.K. telecommunication product
Telecom 2 | not in public domain | 33 | 13 | Telecommunication product at U.K. company Y

TABLE 2
RELATIVE ACCURACY LEVELS OF EFFORT ESTIMATION FOR ANALOGY AND REGRESSION

Dataset | Analogy MMRE (%) | Regression 1 MMRE (%) | Regression 2 MMRE (%) | Analogy Pred(25) (%) | Regression 1 Pred(25) (%) | Regression 2 Pred(25) (%)
Albrecht | 62 | 90 | 90 | 33 | 33 | 33
Atkinson | 39 | 45 | 40 | 38 | 43 | 38
Desharnais | 64 | 66 | 66 | 36 | 42 | 42
Finnish⁴ | 41 | 101 | 128 | 39 | 21 | 29
Kemerer | 62 | 107 | 107 | 40 | 13 | 13
Mermaid | 78 | 252 | 226 | 21 | 14 | 14
Real-time 1 | 74 | N/A | N/A | 23 | N/A | N/A
Telecom 1 | 39 | 86 | 86 | 44 | 44 | 44
Telecom 2 | 37 | 142 | 72 | 51 | 27 | 42
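The jack-knifing procedure used to produce the analogy columns of Table 2 (described in footnote 3) can be sketched as follows. This is our own illustration: `predict` stands in for any prediction system, and the single-nearest-neighbor function below is only a placeholder, not ANGEL's actual predictor.

```python
def jackknife_mmre(features, efforts, predict):
    """Hold each project out in turn, predict its effort from the
    remaining projects, and return the MMRE (%) over all hold-outs."""
    errors = []
    for i in range(len(features)):
        rest_f = features[:i] + features[i + 1:]
        rest_e = efforts[:i] + efforts[i + 1:]
        estimate = predict(rest_f, rest_e, features[i])
        errors.append(abs(efforts[i] - estimate) / efforts[i])
    return 100.0 * sum(errors) / len(errors)

def one_nearest(cases, known_efforts, target):
    """Placeholder predictor: effort of the single closest case by
    squared Euclidean distance (features assumed pre-standardized)."""
    best = min(range(len(cases)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(cases[i], target)))
    return known_efforts[best]
```

Because every case is predicted from a model that never saw it, the resulting MMRE is an out-of-sample figure; the regression results in Table 2, fitted on the entire dataset, enjoy no such handicap, which is why the comparison is described as biased in the regressions' favor.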

TABLE 3
RELATIVE ACCURACY LEVELS OF HOMOGENIZED DATASETS

Dataset | Analogy MMRE (%) | Regression 1 MMRE (%) | Regression 2 MMRE (%) | Analogy Pred(25) (%) | Regression 1 Pred(25) (%) | Regression 2 Pred(25) (%)
Desharnais 1 | 37 | 41 | 41 | 47 | 45 | 45
Desharnais 2 | 29 | 29 | 29 | 47 | 48 | 48
Desharnais 3 | 26 | 36 | 49 | 70 | 30 | 50
Mermaid E | 53 | 62 | 62 | 39 | 27 | 27
Mermaid N | 60 | – | – | 25 | – | –
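The exhaustive search over feature subsets that Section 4 uses to discard unhelpful features can be sketched as follows. This is an assumed reading of the procedure, not ANGEL's code; with n features there are 2^n − 1 candidate subsets, which is tractable only because the datasets are small.

```python
from itertools import combinations

def best_feature_subset(features, efforts, evaluate):
    """Try every non-empty subset of feature indices and keep the one
    with the lowest score from `evaluate` (e.g., a jack-knifed MMRE
    computed on the projects with known effort)."""
    n = len(features[0])
    best_subset, best_score = None, float("inf")
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # Project every case down to just the candidate features.
            projected = [[row[j] for j in subset] for row in features]
            score = evaluate(projected, efforts)
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```

Plugging a jack-knifed MMRE in as `evaluate` reproduces the overall shape of the method: the subset that predicts the already-completed projects best is the one used for new predictions.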

In general, the best results seem to be achieved where the data is drawn from many builds or enhancements to an existing system, for example the Atkinson, Telecom 1, and Telecom 2 datasets. The poorest results occur when the data is drawn from a wide range of projects from more than one organisation, such as the Mermaid dataset. This tendency appears to be true for both analogy and regression analysis.

Table 3 shows the results of dividing the Desharnais and Mermaid datasets into more homogenous subsets. The Desharnais dataset is divided on the basis of differing development environments. The Mermaid data is divided into enhancement (E) and new (N) projects. We observe that this division leads to enhanced accuracy for all estimation methods. Overall, analogy has equal or superior performance to regression based prediction for seven out of eight comparisons, the only exception being the Desharnais 2 dataset, which reveals fractionally superior performance for regression based prediction when using the Pred(25) indicator. The Mermaid N dataset is particularly interesting, as it shows a dataset for which no statistically significant relationships could be found between any of the independent variables and effort; hence no statistically significant regression equation can be derived. By contrast, the analogy method is able to produce an overall estimation accuracy of MMRE = 60 percent.

Finally, we note that the procedure to search for optimum subsets of features for predicting effort reduced the set of features for every dataset studied excepting, of course, Telecom 1, which only had a single feature in the first place. This procedure has a significant impact upon the levels of accuracy that we were able to obtain.

4. In a previous paper [26] we reported an accuracy level of MMRE = 62 percent. The improvement is due to the use of additional project features with which to find analogies that were not utilized during our earlier work.

5 SENSITIVITY ANALYSIS

An important question to ask about any prediction method is how sensitive is it to any peculiar characteristics of the data and how will it behave over time. All the datasets we studied were historical in the sense that they described completed

projects and we conducted the analysis after the event. This section explores the dynamic behavior of effort prediction by simulating the growth of a dataset over time. This enables us to answer questions such as how many data points do we need for estimation by analogy to be viable, and how stable are the results (in other words, are the accuracy levels vulnerable to the addition of a single rogue project)?

Figs. 1 and 2 show the trends in estimation accuracy as the datasets grow. The Albrecht dataset (Fig. 1) was selected as an example of a dataset for which a comparatively low level of accuracy was achieved and, in contrast, the Telecom 2 dataset (Fig. 2) showed the highest level of accuracy. The procedure was to randomly number the projects from 1 to n (where n is the number of projects in the dataset). Projects are added to the dataset, one at a time, in the random number order. Thus, the dataset grows until all projects are added. The optimum subset of features was recalculated as each new project was added. This involved, for each partial dataset (starting from two projects), jack knifing the dataset by holding out each project, one at a time, and using the remaining projects to predict effort. The average absolute prediction error for all projects contained in the partial dataset gives the MMRE of that partial dataset. This procedure was repeated three times for each dataset (hence, A1, A2, and A3 and T1, T2, and T3).

Fig. 1. Estimation accuracy over time (Albrecht dataset).

Fig. 2. Estimation accuracy over time (Telecom 2 dataset).

risk technique at below this number of projects. The Telecom 2 dataset shows little improvement beyond 15 projects. On this theme, it is interesting to note that, overall, it is not the largest datasets, such as the Desharnais dataset, that have the lowest MMREs, and clearly other factors, over and above size, such as homogeneity, also have an impact.

An interesting feature of Fig. 1 is the sharp rise in the MMRE values that occurs after 10 projects have been added for random sequence A1 and 16 added for random sequence A2. Further investigation reveals that both of these anomalies are linked to the introduction of the same project. The project is third in sequence A3, when predictions are still very poor. This suggests that the results from estimating by analogy, like regression, can be influenced by outlying projects. However, A2 demonstrates that the effect of a rogue project is ameliorated as the size of the dataset increases. Superficially there appears to be a similar effect in Fig. 2 for sequences T1 and T3 and projects 4 and 7, respectively. In this case, however, the peaks are caused by different projects and the most likely explanation is the vulnerability of finding analogous cases from very small datasets.

6 AN ESTIMATION PROCESS

This section considers how estimation by analogy can be introduced into software development organizations. The following are the main stages in setting up an estimation by analogy program:

• identify the data or features to collect
• agree data definitions and collection mechanisms
• populate the case base
• tune the estimation method
• estimate the effort for a new project

The first stage, that of identifying what data to collect, will be very dependent upon the nature of the projects for which estimates are required. Because of these variations, our software tool ANGEL is designed to be very flexible in the data that is used to characterize analogies, and the user is able to define a template describing the data that will be supplied. Factors to be taken into account include beliefs as to what features significantly impact development effort (and are measurable at the time the estimate is required) and what features can easily be collected. There is little sense in identifying huge numbers of variables that cannot be easily or reliably collected in practice. Estimation by analogy can cope with both continuous and categorical data, although categorical data has to be held as binary values. For instance, programming language would be represented as a series of truth valued variables, e.g., COBOL, 4GL, C++, etc. The reason for this is that the similarity measure treats categorical features as either being the same or different: there are no degrees of difference.

The second stage is to agree definitions as to what is being collected. Even within an organization there may be no shared understanding of what is meant by effort. Any esti-
Overall, Figs. 1 and 2 show that the MMRE decreases as mation program will be flawed, possibly fatally, if different
the size of the dataset grows. There is a tendency for the projects are measuring the same features in different ways. It
MMRE to start to stabilize at approximately 10 projects is also important to identify who is responsible for the data
which suggests that estimation by analogy may be a high collection and when they should collect the data. Sometimes it
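The binary encoding of categorical features and the Euclidean similarity measure described above can be sketched as follows. This is a minimal illustration rather than the ANGEL implementation, and the feature template (size, interfaces, language) is a hypothetical example:

```python
from math import sqrt

# Hypothetical feature template: two continuous features plus one
# categorical feature ("language") expanded into truth-valued columns.
LANGUAGES = ["COBOL", "4GL", "C++"]

def encode(project):
    """Map a project onto a point in n-dimensional feature space.

    Continuous features are assumed normalized to [0, 1] so that no
    single feature dominates the distance; the categorical feature
    contributes one 0/1 dimension per possible value, so two projects
    either match exactly or differ -- there are no degrees of difference.
    """
    vector = [project["size"], project["interfaces"]]
    vector += [1.0 if project["language"] == lang else 0.0 for lang in LANGUAGES]
    return vector

def distance(a, b):
    """Euclidean distance between two encoded projects."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(target, completed, k=2):
    """Return the k completed projects most similar to the target."""
    t = encode(target)
    return sorted(completed, key=lambda p: distance(t, encode(p)))[:k]
```

A prediction could then be taken from the known efforts of the k nearest projects, for example their mean.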
Sometimes it can be beneficial to have the same person collecting the data across projects in order to increase the level of consistency.

Next, the case base must be populated. Like all estimation methods, other than inspired guesswork, analogy requires some data collection. Our experience suggests that a minimum of 10-12 projects is required in order to provide a stable basis for estimation. In general, more data is preferable, although, in most cases, data collection will be an on-going process as projects are completed and their effort data becomes available. However, there appear to exist some trade-offs between the size of the dataset and homogeneity. Again, our experience suggests there is merit in the strategy of dividing highly distinct projects into separate datasets. Often this separation is quite straightforward using such distinguishing features as application type or development site.

The penultimate stage is to tune the estimation method. The user will also need to experiment with the optimum number of analogies searched for, and whether to use a subset of variables, since some features may not usefully contribute to the process of finding effective analogies. Tuning can make quite a difference to the quality of predictions (typically it can yield a twofold improvement in performance), and for this reason the ANGEL tool provides automated support for this process.

The last stage is to estimate the effort for a new project. It must be possible to characterize the project in terms of the variables that have been identified at the first stage of the estimation process. From these variables, ANGEL can be used to find similar projects, and the user can make a subjective judgment as to the value of the analogies. Where they are believed to be trustworthy, the prediction can be relied on to a greater extent than where they are thought to be doubtful. Here we wish to sound a note of caution: the value of estimation by analogy as an independent source of prediction will be somewhat reduced if users discount values that are not consistent with their prior beliefs, and for this reason there was no expert intervention or manipulation in any of the foregoing analysis. Another indicator of likely prediction quality is the average MMRE figure obtained through jackknifing the dataset. Again, a low figure will indicate more confidence than a high figure.

7 CONCLUSIONS

Accurate estimation of software project effort at an early stage in the development process is a significant challenge for the software engineering community. This paper has described a technique based upon the use of analogy, sometimes referred to as case-based reasoning. We have compared the use of analogy with prediction models based upon stepwise regression analysis for nine datasets, a total of 275 projects. A striking pattern emerges in that estimation by analogy produces a superior predictive performance in all cases when measured by MMRE, and in seven out of nine cases for the Pred(25) indicator. Moreover, estimation by analogy is able to operate in circumstances where it is not possible to generate an algorithmic model, such as the dataset Real-time 1, where all the data was categorical in nature, or the Mermaid N dataset, where no statistically significant relationships could be found. We believe this type of situation may be quite common, particularly at a very early stage in a project, for example in response to an invitation to tender. This makes analogy an attractive method for producing very early estimates.

Estimation by analogy also offers an advantage in that it is a very intuitive method. There is some evidence to suggest that practitioners use analogies when making estimates by means of informal methods [8]. Our approach allows users to assess the reasoning process behind a prediction by identifying the most analogous projects, thereby increasing, or reducing, their confidence in the prediction.

Many experts have suggested that it is appropriate to use more than one method when predicting software development effort. We believe that estimation by analogy is a viable technique and can usefully contribute to this process. This is not to suggest that it is without weakness, but on the empirical evidence presented in this paper it is certainly worthy of further consideration.

APPENDIX A

ACT      ACT_DEV  ACT_TST  CHNGS  FILES
305.22   250.49   54.73    218    105
330.29   225.4    104.89   357    237
333.96   177.35   156.61   136    98
150.4    114.7    35.7     25     24
544.61   357.49   187.12   263    197
117.87   71.5     46.37    39     39
1115.54  833.05   267.09   377    284
158.56   130.4    28.16    48     37
573.71   372.15   201.56   118    53
276.95   232.7    44.25    178    116
97.45    68.55    28.9     59     38
374.34   275.64   98.7     200    180
167.12   100.83   66.29    53     43
358.37   281.18   77.19    143    84
123.1    87.7     35.4     257    257
23.54    16.42    7.12     6      6
34.25    27.5     6.75     5      5
31.8     24.2     7.6      3      3

The above data is drawn from the dataset Telecom 1. ACT is actual effort; ACT_DEV and ACT_TST are actual development and testing effort, respectively. CHNGS is the number of changes made as recorded by the configuration management system, and FILES is the number of files changed by the particular enhancement project. Only FILES can be used for predictive purposes, since none of the other information would be available at the time of making the prediction.

ACKNOWLEDGMENTS

The authors are grateful to the Finnish TIEKE organization for granting the authors leave to use the Finnish dataset; to Barbara Kitchenham for supplying the Mermaid dataset; to Bob Hughes for supplying the dataset Telecom 2; and to anonymous staff for the provision of datasets Telecom 1 and Real-time 1. Many improvements have been suggested by Dan Diaper, Pat Dugard, Bob Hughes, Barbara Kitchenham, Steve MacDonell, Austen Rainer, and Bill Samson. This work has been supported by British Telecom, the U.K. Engineering and Physical Sciences Research Council under Grant GR/L37298, and the Defence Research Agency.
REFERENCES

[1] D.W. Aha, "Case-Based Learning Algorithms," Proc. 1991 DARPA Case-Based Reasoning Workshop, Morgan Kaufmann, 1991.
[2] A.J. Albrecht and J.R. Gaffney, "Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation," IEEE Trans. Software Eng., vol. 9, no. 6, pp. 639-648, 1983.
[3] K. Atkinson and M.J. Shepperd, "The Use of Function Points to Find Cost Analogies," Proc. European Software Cost Modelling Meeting, Ivrea, Italy, 1994.
[4] B.W. Boehm, "Software Engineering Economics," IEEE Trans. Software Eng., vol. 10, no. 1, pp. 4-21, 1984.
[5] L.C. Briand, V.R. Basili, and W.M. Thomas, "A Pattern Recognition Approach for Software Engineering Data Analysis," IEEE Trans. Software Eng., vol. 18, no. 11, pp. 931-942, 1992.
[6] S. Conte, H. Dunsmore, and V.Y. Shen, Software Engineering Metrics and Models. Menlo Park, Calif.: Benjamin Cummings, 1986.
[7] J.M. Desharnais, "Analyse statistique de la productivitie des projets informatique a partie de la technique des point des fonction," masters thesis, Univ. of Montreal, 1989.
[8] R.T. Hughes, "Expert Judgement as an Estimating Method," Information and Software Technology, vol. 38, no. 2, pp. 67-75, 1996.
[9] D.R. Jeffery, G.C. Low, and M. Barnes, "A Comparison of Function Point Counting Techniques," IEEE Trans. Software Eng., vol. 19, no. 5, pp. 529-532, 1993.
[10] R. Jeffery and J. Stathis, "Specification Based Software Sizing: An Empirical Investigation of Function Metrics," Proc. NASA Goddard Software Eng. Workshop, Greenbelt, Md., 1993.
[11] N. Karunanithi, D. Whitley, and Y.K. Malaiya, "Using Neural Networks in Reliability Prediction," IEEE Software, vol. 9, no. 4, pp. 53-59, 1992.
[12] C.F. Kemerer, "An Empirical Validation of Software Cost Estimation Models," Comm. ACM, vol. 30, no. 5, pp. 416-429, 1987.
[13] B.A. Kitchenham and K. Kansala, "Inter-Item Correlations among Function Points," Proc. First Int'l Symp. Software Metrics, Baltimore, Md.: IEEE CS Press, 1993.
[14] B.A. Kitchenham and N.R. Taylor, "Software Cost Models," ICL Technology J., vol. 4, no. 3, pp. 73-102, 1984.
[15] P. Kok, B.A. Kitchenham, and J. Kirakowski, "The MERMAID Approach to Software Cost Estimation," Proc. ESPRIT Technical Week, 1990.
[16] J.L. Kolodner, Case-Based Reasoning. Morgan Kaufmann, 1993.
[17] J.E. Matson, B.E. Barrett, and J.M. Mellichamp, "Software Development Cost Estimation Using Function Points," IEEE Trans. Software Eng., vol. 20, no. 4, pp. 275-287, 1994.
[18] Y. Miyazaki and K. Mori, "COCOMO Evaluation and Tailoring," Proc. Eighth Int'l Software Eng. Conf., London: IEEE CS Press, 1985.
[19] Y. Miyazaki et al., "Method to Estimate Parameter Values in Software Prediction Models," Information and Software Technology, vol. 33, no. 3, pp. 239-243, 1991.
[20] T. Mukhopadhyay, S.S. Vicinanza, and M.J. Prietula, "Examining the Feasibility of a Case-Based Reasoning Model for Software Effort Estimation," MIS Quarterly, vol. 16, pp. 155-171, June 1992.
[21] A. Porter and R. Selby, "Empirically Guided Software Development Using Metric-Based Classification Trees," IEEE Software, vol. 7, pp. 46-54, 1990.
[22] A. Porter and R. Selby, "Evaluating Techniques for Generating Metric-Based Classification Trees," J. Systems Software, vol. 12, pp. 209-218, 1990.
[23] E. Rich and K. Knight, Artificial Intelligence, second edition. McGraw-Hill, 1995.
[24] B. Samson, D. Ellison, and P. Dugard, "Software Cost Estimation Using an Albus Perceptron (CMAC)," Information and Software Technology, vol. 39, nos. 1/2, 1997.
[25] C. Serluca, "An Investigation into Software Effort Estimation Using a Back Propagation Neural Network," MSc dissertation, Bournemouth Univ., 1995.
[26] M.J. Shepperd, C. Schofield, and B.A. Kitchenham, "Effort Estimation Using Analogy," Proc. 18th Int'l Conf. Software Eng., Berlin: IEEE CS Press, 1996.
[27] K. Srinivasan and D. Fisher, "Machine Learning Approaches to Estimating Development Effort," IEEE Trans. Software Eng., vol. 21, no. 2, pp. 126-137, 1995.
[28] G.E. Wittig and G.R. Finnie, "Using Artificial Neural Networks and Function Points to Estimate 4GL Software Development Effort," Australian J. Information Systems, vol. 1, no. 2, pp. 87-94, 1994.

Martin Shepperd received a BSc degree (honors) in economics from Exeter University, an MSc degree from Aston University, and the PhD degree from the Open University, the latter two in computer science. He has a chair in software engineering at Bournemouth University. Professor Shepperd has written three books and published more than 50 papers in the areas of software metrics and process modeling.

Chris Schofield received a BSc degree (honors) in software engineering management from Bournemouth University, where he is presently studying for his PhD. His research interests include software metrics and cost estimation.
