
International Journal of Advanced Studies in Computer Science and Engineering
IJASCSE, Volume 3, Issue 8, 2014

Assessment of classification techniques on predicting success or failure of software reusability
Nahid Hajizadeh, Manijeh Keshtgari, Marzieh Ahmadzadeh
Department of Computer Engineering and IT
Shiraz University of Technology
Shiraz, Iran

Abstract— Software reusability means the use of reusable modules during software development, so that code need not be generated from scratch. This capability not only reduces software development time and cost but also improves software productivity. Predicting whether a software product under development will serve as a reusable module in the future helps software project managers ensure that the product is reusable, or otherwise correct its weak points. In this paper, to predict success or failure of software reusability, 42 classification algorithms are applied to a specific software reuse data set (from the PROMISE Software Engineering Repository [1]). By comparing 8 conventional evaluation metrics obtained from each of the 42 classification-algorithm implementations, the results show that the "SGD or SMO" and "MultiClassClassifierUpdateable" algorithms perform better than the other algorithms.

Keywords- Data mining, classification, software reuse, assessment.

I. INTRODUCTION
Software reusability, besides reducing the time and cost of software development processes, increases productivity. Software reusability not only stands for using part of one system in another system; reusing the whole of another system is also welcomed [2], [3]. Ramamoorthy et al. [4] noted that in addition to source code, other software project assets such as requirements, design, test cases and documents can be reused. Software project managers are extremely interested in ensuring, while their software products are being developed and before they are offered to the market, that the products are reusable. Since the main application of classification algorithms is prediction, we used classification algorithms to predict the success or failure of software reusability prior to marketing. Classification algorithms are a main branch of data mining. Data mining is the main step in the Knowledge Discovery in Databases (KDD) process. Though KDD is often used synonymously with data mining, the two are actually different: some preprocessing steps must be performed before data mining to transform the raw data into useful knowledge. The steps of knowledge discovery are, in order: selection of the data set, data preprocessing, data transformation, data mining, and knowledge extraction (Figure 1 [5]).

Figure 1. Data mining process


II. RELATED WORK
Until now, much research on software reusability based on data mining has been done by various authors. Morisio et al. [6] introduced a number of the key factors in predicting software reusability. Clustering methods for predicting the reusability of object-oriented software systems were applied by Shri et al. [7]. An artificial neural network method was suggested by Boetticher et al. [8] to rank the reusability of software products. In [9], Basili et al. indicated that most of the metrics proposed by Chidamber and Kemerer in [10] are useful for predicting the fault-proneness of classes during the design phase of object-oriented systems [11]. Sonia et al. [12] proposed a framework for evaluating the reusability of procedure-oriented systems using a metrics-based approach.

III. METHODOLOGY
The data set employed in this paper is provided by Morisio et al. [6] and is part of the PROMISE Software Engineering Repository [1]. This data set represents an interesting SE problem: how to make sense of software development when very little data (only 24 instances) is available. The data set contains 27 attributes and a class variable.

IV. DATA PREPROCESSING PHASE
Because of the missing values in the data set, preprocessing is unavoidable. In this paper, missing attribute values are handled with the "mode" technique: the value of a missing attribute is set to the majority value of the same attribute. For example, the value of 'Development Approach' for instance 24 is missing (Table 1). The most probable value for the attribute 'Development Approach' is "OO", so the value "OO" is selected for this missing value (Table 2). A sketch of this imputation in WEKA is shown after Table 2.

TABLE 1. 'DEVELOPMENT APPROACH' FOR INSTANCE 24 (BEFORE PREPROCESS)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
OO OO OO OO OO OO OO OO OO proc proc proc proc OO proc proc proc OO OO OO OO OO proc not_available

TABLE 2. 'DEVELOPMENT APPROACH' FOR INSTANCE 24 (AFTER PREPROCESS)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
OO OO OO OO OO OO OO OO OO proc proc proc proc OO proc proc proc OO OO OO OO OO proc OO
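By way of illustration only, the following is a minimal sketch of this mode imputation using WEKA's ReplaceMissingValues filter, which substitutes the attribute's mode for missing nominal values (and the mean for missing numeric values), matching the technique above. The file name reuse.arff is an assumption, not part of the PROMISE distribution:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class ModeImputation {
    public static void main(String[] args) throws Exception {
        // Load the 24-instance reuse data set (file name is hypothetical).
        Instances data = DataSource.read("reuse.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // ReplaceMissingValues fills each missing nominal value with the
        // attribute's mode, e.g. 'Development Approach' of instance 24 -> "OO".
        ReplaceMissingValues filter = new ReplaceMissingValues();
        filter.setInputFormat(data);
        Instances cleaned = Filter.useFilter(data, filter);

        System.out.println(cleaned);
    }
}
```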

V. APPLYING DATA MINING
The WEKA toolkit [13], version 3.7.11, with 10-fold cross-validation, is employed to build models with 42 classification algorithms drawn from 7 main groups. These 42 classification algorithms are listed in Table 3, and a sketch of running one of them under cross-validation is shown after the table.
TABLE 3. CLASSIFICATION ALGORITHMS

Group     | Algorithms
Bayes     | BayesNet, NaiveBayes, NaiveBayesMultinomialText, NaiveBayesUpdateable
Functions | Logistic, MultilayerPerceptron, SGD, SGDText, SimpleLogistic, SMO, VotedPerceptron
Trees     | DecisionStump, HoeffdingTree, J48, LMT, RandomForest, RandomTree, REPTree
Lazy      | IB1, KStar, LWL
Rules     | DecisionTable, JRip, OneR, PART, ZeroR
Meta      | AdaBoostM1, AttributeSelectedClassifier, Bagging, ClassificationViaRegression, CVParameterSelection, FilteredClassifier, LogitBoost, MultiClassClassifier, MultiClassClassifierUpdateable, MultiScheme, RandomCommittee, RandomizableFilteredClassifier, RandomSubSpace, Stacking, Vote
Misc      | InputMappedClassifier
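As a concrete illustration (ours, not the paper's), this minimal sketch evaluates one of the 42 classifiers (SMO) with 10-fold cross-validation through the WEKA Java API and prints several of the 8 metrics; swapping the classifier object covers the other algorithms. The file name reuse.arff is assumed:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("reuse.arff");  // hypothetical file name
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation, as applied to all 42 classifiers.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new SMO(), data, 10, new Random(1));

        System.out.printf("Accuracy : %.1f%%%n", eval.pctCorrect());
        System.out.printf("ICI      : %.0f%n",   eval.incorrect());
        System.out.printf("MAE      : %.4f%n",   eval.meanAbsoluteError());
        System.out.printf("RAE      : %.3f%%%n", eval.relativeAbsoluteError());
        System.out.printf("Precision: %.3f%n",   eval.weightedPrecision());
        System.out.printf("Recall   : %.3f%n",   eval.weightedRecall());
    }
}
```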

VI. EMPIRICAL DATA ANALYSIS
To assess the models made by those 42 different classifiers, a Kiviat diagram [14] is applied, so normalization of the metric values, scaling the data to fall within a specified range, is necessary. In this work, "normalization by decimal scaling" [15] is used, as shown in equation (1):

V' = V / 10^j    (1)

where j is the smallest integer such that max(|V'|) < 1. A small sketch of this normalization appears below.
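A minimal sketch of decimal scaling (our illustration, not code from the paper; the sample values are the RAE column of Table 4):

```java
public class DecimalScaling {
    // Decimal-scaling normalization (equation 1): v' = v / 10^j, where j is
    // the smallest integer such that max(|v'|) < 1.
    public static double[] normalize(double[] values) {
        double max = 0;
        for (double v : values) max = Math.max(max, Math.abs(v));
        int j = 0;
        while (max / Math.pow(10, j) >= 1) j++;  // smallest j with max(|v'|) < 1
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = values[i] / Math.pow(10, j);
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] rae = {12.795, 8.773, 16.71, 100.0};  // RAE values from Table 4
        for (double v : normalize(rae)) System.out.println(v);  // j = 3, all < 1
    }
}
```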


The metrics applied to assess the performance of these models are as follows:

Incorrectly Classified Instances (ICI) is the number of instances that are incorrectly classified.

Accuracy measures what percentage of the total test-set records are correctly classified. Equation (2) shows the calculation of accuracy:

Accuracy = (TN + TP) / (TN + FN + TP + FP)    (2)

Precision for a class is the number of true positives (TP) (i.e. the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class [16], as indicated in equation (3) [11]:

Precision = TP / (TP + FP)    (3)

Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of the true positives and the false negatives (FN), which are items that were not labeled as belonging to the positive class but should have been). Recall is calculated as in equation (4) [11]:

Recall = TP / (TP + FN)    (4)

To calculate the Cost, each element of the confusion matrix [17] is multiplied by the corresponding entry in the cost matrix and the results are added together.

Gain ratio is used to determine the goodness of a split [18].

Mean absolute error (MAE) is the average of the differences between the predicted and actual values over all test cases; it is the average prediction error. The formula for calculating MAE is given in equation (5) [11]:

MAE = (|p1 - a1| + |p2 - a2| + ... + |pn - an|) / n    (5)

The final metric is Relative Absolute Error (RAE), which expresses the error rate in relative terms. The actual target values (ai) and the predicted target values (pi) are the two parameters used to calculate RAE, as given in equation (6):

RAE = (|p1 - a1| + ... + |pn - an|) / (|ā - a1| + ... + |ā - an|)    (6)

where ā is the mean of the actual target values. A sketch computing these metrics directly is shown below.
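To make equations (2)-(6) and the Cost computation concrete, here is a small self-contained sketch (our illustration, not code from the paper):

```java
public class Metrics {
    // Equation (2): accuracy from binary confusion-matrix counts.
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tn + tp) / (tn + fn + tp + fp);
    }

    // Equation (3): precision = TP / (TP + FP).
    static double precision(int tp, int fp) { return (double) tp / (tp + fp); }

    // Equation (4): recall = TP / (TP + FN).
    static double recall(int tp, int fn) { return (double) tp / (tp + fn); }

    // Cost: each confusion-matrix element multiplied by the corresponding
    // cost-matrix entry, with the results added together.
    static double cost(int[][] confusion, double[][] costMatrix) {
        double total = 0;
        for (int i = 0; i < confusion.length; i++)
            for (int j = 0; j < confusion[i].length; j++)
                total += confusion[i][j] * costMatrix[i][j];
        return total;
    }

    // Equation (5): mean absolute error over predicted/actual pairs.
    static double mae(double[] p, double[] a) {
        double sum = 0;
        for (int i = 0; i < p.length; i++) sum += Math.abs(p[i] - a[i]);
        return sum / p.length;
    }

    // Equation (6): relative absolute error, the total absolute error
    // normalized by the error of always predicting the mean actual value.
    static double rae(double[] p, double[] a) {
        double mean = 0;
        for (double v : a) mean += v;
        mean /= a.length;
        double num = 0, den = 0;
        for (int i = 0; i < a.length; i++) {
            num += Math.abs(p[i] - a[i]);
            den += Math.abs(mean - a[i]);
        }
        return num / den;
    }
}
```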
VII. RESULTS
The 42 classifier algorithms from the 7 groups were analyzed, and the best classifier of each group is shown in Table 4.
TABLE 4. BEST CLASSIFIER FROM EACH OF THE 7 CATEGORIES

Group (best classifier)               | Accuracy | Cost | Recall | ICI | Gain | MAE    | Precision | RAE
Bayes (NaiveBayes)                    | 95.8%    | 1    | 0.958  | 1   | 10   | 0.0608 | 0.961     | 12.795%
Functions (SGD or SMO)                | 95.8%    | 1    | 0.958  | 1   | 10   | 0.0417 | 0.961     | 8.773%
Trees (HoeffdingTree)                 | 95.8%    | 1    | 0.958  | 1   | 10   | 0.0608 | 0.961     | 12.795%
Lazy (LWL)                            | 95.8%    | 1    | 0.958  | 1   | 10   | 0.0503 | 0.961     | 10.592%
Rules (JRip, PART)                    | 95.8%    | 1    | 0.958  | 1   | 10   | 0.0794 | 0.961     | 16.71%
Meta (MultiClassClassifierUpdateable) | 95.8%    | 1    | 0.958  | 1   | 10   | 0.0417 | 0.961     | 8.773%
Misc (InputMappedClassifier)          | 62.5%    | 9    | 0.625  | 9   | 0    | 0.474  | 0.391     | 100%

The Kiviat diagrams of these 7 best classifiers are illustrated in Figure 2.


Figure 2. Kiviat diagram of 7 best classifiers: (a) NaiveBayes, (b) SGD, (c) HoeffdingTree, (d) LWL, (e) JRip, (f) MultiClassClassifierUpdateable, (g)
InputMappedClassifier.


With the exception of InputMappedClassifier, which has extremely weak performance in comparison with the other types of classifiers, the other 6 classifiers have similar values on most metrics and differ only in Mean Absolute Error (MAE) and Relative Absolute Error (RAE). In order to show the differences more clearly and choose the best classifier, Figure 3 plots a comparison of those 7 classifiers based on the two metrics MAE and RAE.

Figure 3. Comparison between the 7 best classifiers based on the MAE and RAE metrics.

"SGD or SMO" and "MultiClassClassifierUpdateable" show the same, and the best, performance among the classifiers (according to their lower error values).

VIII. CONCLUSION
In the comparison made between the different classifier models, in general the "Functions" and "Meta" categories showed better performance than the other groups, and among these categories the "SGD or SMO" and "MultiClassClassifierUpdateable" classifiers are the most appropriate for predicting the success or failure of software reusability.

IX. FUTURE WORK
The data set used in this paper does not cover recent software development technologies such as component-based development and agent-based development, although these techniques play an important role in increasing software reusability. As future work, to generalize our results, we need to analyze other data sets and representative case studies.

X. REFERENCES
[1] Sayyad Shirabad, J. and Menzies, T.J., "The PROMISE Repository of Software Engineering Databases," School of Information Technology and Engineering, University of Ottawa, Canada, 2005. Available: http://promise.site.uottawa.ca/SERepository
[2] J. McCall et al., "Factors in Software Quality," Rome Air Development Center, Griffiss Air Force Base, N.Y., 1977.
[3] N. S. Gill, "Reusability Issues in Component-Based Development," SIGSOFT Softw. Eng. Notes, vol. 28, pp. 4-4, 2003.
[4] C. V. Ramamoorthy et al., "Support for Reusability in Genesis," IEEE Transactions on Software Engineering, vol. 14, pp. 1145-1154, 1988.
[5] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery in Databases," AI Magazine, vol. 17, no. 3, pp. 37-54, 1996.
[6] M. Morisio, M. Ezran, and C. Tully, "Success and Failure Factors in Software Reuse," IEEE Transactions on Software Engineering, vol. 28, no. 4, pp. 340-357, 2002.
[7] A. Shri, P. S. Sandhu, V. Gupta, and S. Anand, "Prediction of Reusability of Object Oriented Software Systems," World Academy of Science, Engineering and Technology, pp. 853-858, 2010.
[8] G. Boetticher and D. Eichmann, "A Neural Network Paradigm for Characterising Reusable Software," Proceedings of the 1st, pp. 18-19, November 1993.
[9] V. R. Basili, L. Briand, and W. Melo, "How Reuse Influences Productivity in Object-Oriented Systems," Communications of the ACM, vol. 30, no. 10, pp. 104-114, 1996.
[10] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476-493, 1994.
[11] A. Prakash, Ashoka, and M. Aradhya, "Application of Data Mining Techniques for Software Reuse," Procedia Technology, no. 4, pp. 384-389, 2012.
[12] W. Li and S. Henry, "Object-Oriented Metrics That Predict Maintainability," Journal of Systems and Software, vol. 23, no. 2, pp. 111-122, 1993.
[13] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, San Francisco, 2005.

[14] J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. A. Tukey, "Star plot," in Graphical Methods for Data Analysis, Wadsworth, 1983.
[15] L. Al Shalabi, "Normalization as a Preprocessing Engine for Data Mining and the Approach of Preference Matrix," IEEE Dependability of Computer Systems, pp. 207-214, 2006.
[16] http://www.wikipedia.org/wiki/Precision_and_recall
[17] R. Kohavi and F. Provost, "Special Issue on Applications of Machine Learning and the Knowledge Discovery Process," [Online]. Available: http://robotics.stanford.edu/~ronnyk/glossary.html
[18] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2006.

