iii. The attribute with the largest SDR is selected as the decision node.
iv. The dataset is divided based on the values of the selected attribute. A branch set is split further if its standard deviation is greater than 0.
v. The process runs recursively until all the data has been processed. A small sketch of the SDR computation is given after this list.
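As a rough illustration of the steps above (a minimal Python sketch, not the WEKA implementation evaluated later in the paper), the SDR of one candidate attribute can be computed as the standard deviation of the target minus the weighted standard deviation within each branch; the small "hours played / outlook" sample is an assumed toy dataset:

import numpy as np

def sdr(target, attribute):
    """Standard Deviation Reduction: std of the target minus the
    weighted std of the target within each branch of the attribute."""
    target = np.asarray(target, dtype=float)
    attribute = np.asarray(attribute)
    total_std = target.std()
    weighted_std = 0.0
    for value in np.unique(attribute):
        branch = target[attribute == value]
        weighted_std += (len(branch) / len(target)) * branch.std()
    return total_std - weighted_std

# Assumed toy sample: hours played (target) split by outlook (attribute).
hours = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
print(sdr(hours, outlook))  # the attribute with the largest SDR becomes the decision node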
iv. Random Forest Regression:
It adds an extra layer of randomness to bagging. Unlike a normal tree, a random forest splits each node using the best among a subset of predictors chosen randomly at that node. It is an easy technique to operate because it has very few parameters – essentially the number of trees in the forest and the number of predictors sampled at each node [15]. A minimal sketch follows.
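As a hedged illustration of those two parameters (using scikit-learn on assumed synthetic data, rather than the WEKA implementation used in the experiments):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Assumed synthetic regression data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                       # 200 samples, 8 predictors
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)

# n_estimators = number of trees in the forest,
# max_features = size of the random subset of predictors tried at each node split.
forest = RandomForestRegressor(n_estimators=100, max_features=3, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))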
Figure 2: The steepness of the curve. Source: Logistic regression [18]

ii. K Nearest Neighbors:
This method is known for its simplicity, owing to factors such as ease of interpretation and low calculation time. It simply stores the available cases and categorizes new cases on the basis of a similarity measure such as a distance function. An object is categorized by a majority vote of its neighbors, and the result is usually a class membership: the object is assigned to the class that is most common among its K nearest neighbors [19]. Some distance functions are:

Euclidean: sqrt( Σi (xi − yi)² )
Manhattan: Σi |xi − yi|
Minkowski: ( Σi |xi − yi|^q )^(1/q)

In the case of categorical variables, the Hamming distance must be used. A minimal sketch of these distances and the majority vote is given below.
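The following Python sketch (an assumed toy implementation with made-up 2-D data, not the WEKA classifier evaluated later) shows the distance functions and the majority vote:

import numpy as np
from collections import Counter

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, q=3):
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

def knn_predict(X_train, y_train, x_new, k=3, distance=euclidean):
    """Classify x_new by a majority vote of its k nearest neighbors."""
    distances = [distance(x, x_new) for x in X_train]
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Assumed toy data: two well-separated 2-D classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0]), k=3))  # -> 1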
iii. Naïve Bayesian:
It is based on the probabilistic model of Bayes' theorem and is easy to set up, since almost no complex iterative parameter estimation is required, which makes it viable for large data sets [20]. Given the class variable, naïve Bayesian classifiers assume that the value of a certain characteristic is independent of the value of any other characteristic. We can calculate the posterior probability P(A|B) as:

P(A|B) = P(B|A) · P(A) / P(B)

A worked numerical sketch is given below.
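As a hedged, self-contained illustration of that posterior calculation (a toy spam-filter example with assumed probabilities, unrelated to the credit data used later):

# Bayes' theorem on a toy example with assumed probabilities.
# A = "message is spam", B = "message contains the word 'offer'".
p_A = 0.30           # prior P(A): assumed fraction of messages that are spam
p_B_given_A = 0.60   # likelihood P(B|A): assumed fraction of spam containing 'offer'
p_B_given_notA = 0.05

# Total probability of the evidence B.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # ≈ 0.837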
Figure 3: Entropy of a decision tree. Source: Entropy of a DT algorithm [22]

Algorithm for Information Gain:
i. Calculate the entropy of the target.
ii. Split the dataset on the different attributes and calculate the entropy of each branch; adding these proportionally gives the total entropy after the split. The total entropy after the split is subtracted from the total entropy before the split, and the result is the Information Gain.
iii. The attribute with the largest Information Gain is the decision node.
iv. A branch with entropy 0 is a leaf node.
v. A branch with entropy greater than 0 needs further splitting.
A short sketch of this calculation is given after the list.
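A minimal Python sketch of the entropy and Information Gain calculation, using the classic "play/outlook" weather sample as an assumed toy dataset:

import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    total = entropy(labels)
    after = 0.0
    for value in set(attribute):
        branch = [l for l, a in zip(labels, attribute) if a == value]
        after += (len(branch) / len(labels)) * entropy(branch)
    return total - after

# Assumed toy data: the 'play' decision split on the 'outlook' attribute.
play    = ["no", "no", "yes", "yes", "yes", "no", "yes",
           "no", "yes", "yes", "yes", "yes", "yes", "no"]
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
print(information_gain(play, outlook))  # ≈ 0.247 for this classic sample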
v. One-R:
One-R (One Rule) builds one rule for every attribute in the training data from that attribute's value frequencies and keeps the single rule with the smallest error rate [23].

vi. Zero-R:
Zero-R ignores the attributes altogether and always predicts the majority class, so it mainly serves as a baseline against which the other classifiers are compared. Minimal sketches of both are given below.
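A hedged sketch of both baselines (toy data assumed; a full One-R would evaluate every attribute and keep the best single-attribute rule):

from collections import Counter, defaultdict

def zero_r(y_train):
    """Zero-R: always predict the most frequent class in the training labels."""
    return Counter(y_train).most_common(1)[0][0]

def one_r(column, y_train):
    """One-R for a single attribute: map each attribute value to its
    most frequent class, and report the rule's training error rate."""
    rule, errors = {}, 0
    by_value = defaultdict(list)
    for value, label in zip(column, y_train):
        by_value[value].append(label)
    for value, labels in by_value.items():
        majority, count = Counter(labels).most_common(1)[0]
        rule[value] = majority
        errors += len(labels) - count
    return rule, errors / len(y_train)

# Assumed toy labels: 70% "good" and 30% "bad" credit, mirroring the 70%
# Zero-R accuracy reported for this dataset in Table 1.
y = ["good"] * 7 + ["bad"] * 3
print(zero_r(y))                        # -> 'good'
print(one_r(["A"] * 5 + ["B"] * 5, y))  # rule per value of one attribute, plus its error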
III. Methodologies carried out for the Experiment
The German credit data set has been used. This data set was selected keeping in mind the diversity of the data and of the data types to be considered during the application of the methodologies carried out in the test, including categorical data, integer data, nominal data, and in some cases a mixture of these data types.
The WEKA tool is used for the implementation. It is an open-source tool available freely over the internet. WEKA provides a data mining package which includes all the features required for the data mining process to be carried out, including a very good implementation of all the classifier algorithms concluded so far, so it has very wide application and functionality. It is a Java-based tool which allows the user to work through either a graphical user interface or a simple command prompt.
Credit data-set: This dataset is for the classification of credit risk in Germany. The dataset contains 1,000 observations on 20 attributes (7 numerical and 13 categorical). The target variable consists of 2 classes: 1 for creditworthy and 0 for not creditworthy. The dataset is from Daimler-Benz, Forschung Ulm, Germany.
1. Title: German Credit data
2. Number of Instances: 1000
3. Number of Attributes german: 20 (7 numerical, 13 categorical)
   Number of Attributes german.numer: 24 (24 numerical)
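For readers not using WEKA, a hedged Python sketch of loading this dataset is given below; the file name "german.data" and its whitespace-separated layout are assumptions based on the common UCI distribution of the Statlog German credit data, not something stated in the paper:

import pandas as pd

# Assumed local copy of the UCI Statlog German credit file ("german.data"),
# whitespace-separated, with 20 attribute columns and one class column.
cols = [f"attr_{i}" for i in range(1, 21)] + ["credit_class"]
credit = pd.read_csv("german.data", sep=r"\s+", header=None, names=cols)

print(credit.shape)                           # expected (1000, 21)
print(credit["credit_class"].value_counts())  # class distribution of the 1,000 observations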
IV. Parameters used for evaluating tool performance:
Considering only the accuracy value obtained by a classifier as the measure of its performance is not the correct way to evaluate a classifying algorithm. The accuracy of the classifier is just the proportion of instances classified as belonging to their actual class. It does not capture the other specifics of the classifier, such as the relation between the data attributes, the measure of the correct distribution of data instances to each and every possible class, and the number of positive cases identified correctly.
i) Accuracy, confusion matrix and Recall:
A confusion matrix is a table which records the number of data instances that are wrongly classified and the number that are truly classified. It is an n×n matrix, where 'n' is the number of classes defined for the data set.

Figure 4: Confusion Matrix

This confusion matrix serves as the basis for deriving the values of almost every other parameter used. The columns represent the values predicted by the classifier and the rows represent the actual values, i.e. the class labels to which the data objects actually belong. The values in the cells are as follows:

True Negative: the proportion of negative cases that were classified correctly. Calculated as: TN / (TN + FP)

False Positive: the proportion of negative cases that were incorrectly classified as positive. Calculated as: FP / (TN + FP)

False Negative: the proportion of positive cases that were incorrectly classified as negative. Calculated as: FN / (FN + TP)

True Positive or Recall: the proportion of positive cases that were correctly classified. Calculated as: TP / (FN + TP)

Accuracy: the proportion of correct predictions made by the classifier. Calculated as: (TN + TP) / (TN + FP + FN + TP)

Precision (Confidence): the rate of positives that were predicted positive and were actually positive. Calculated as: TP / (FP + TP)

A sketch that computes these quantities is given below.
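A minimal Python sketch of the cell counts and the derived measures defined above (the toy labels are assumed purely to exercise the formulas):

import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """Cell counts of the 2x2 confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return tp, tn, fp, fn

# Assumed toy labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

recall    = tp / (fn + tp)                 # True Positive rate
precision = tp / (fp + tp)
accuracy  = (tn + tp) / (tn + fp + fn + tp)
fp_rate   = fp / (tn + fp)
print(tp, tn, fp, fn, recall, precision, accuracy, fp_rate)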
Figure 7: Lift curves for K nearest neighbor for the Credit dataset
Figure 10: Lift curves for Support Vector Machine over the Credit dataset
Figure 11: ROC curves for the credit data set for NB
Figure 13: ROC curves for Decision tree for the credit data set
Figure 14: ROC curves for the credit data set for SVM
VI. Conclusion:
This paper discusses the commonly used supervised learning algorithms. The primary goal was to prepare a comprehensive review of the key ideas and present the different techniques for every supervised learning method. The paper makes it clear that every algorithm differs according to its area of application and that no single algorithm is more powerful than the others in every scenario. The choice of an algorithm should be made depending on the type of problem given to us and the data available. The accuracy can be increased by using two or more algorithms together in suitable conditions.
Table 1: Parameters used for supervised algorithms

Parameter / Algorithm   Logistic     Support Vector  Random   Naive      One-R   Zero-R
                        Regression   Machine         Forest   Bayesian
Accuracy                75.2         75.1            76.4     75.4       66.1    70.0
Precision               0.741        0.738           0.751    0.743      0.608   0.490
Recall                  0.752        0.751           0.764    0.754      0.661   0.700
F-measure               0.744        0.741           0.744    0.746      0.620   0.576
TP-Rate                 0.752        0.751           0.764    0.754      0.661   0.700
FP-Rate                 0.398        0.410           0.440    0.393      0.614   0.700
MCC                     0.379        0.371           0.386    0.385      0.061   0.000
ROC area                0.785        0.671           0.791    0.787      0.524   0.500
PRC area                0.798        0.681           0.810    0.797      0.591   0.580
VII. References:

[1] Domingos, P. "A few useful things to know about machine learning", Communications of the ACM, 55(10), 2012, pp. 1.
[2] Mohri, M., Rostamizadeh, A. and Talwalkar, A. "Foundations of Machine Learning", Cambridge, MA: MIT Press, 2012.
[3] Nguyen, T. and Shirai, K. "Text Classification of Technical Papers Based on Text Segmentation", Natural Language Processing and Information Systems, 2013, pp. 278-284.
[4] Deng, L. and Li, X. "Machine Learning Paradigms for Speech Recognition: An Overview", IEEE Transactions on Audio, Speech, and Language Processing, 21(5), 2013, pp. 1060-1089.
[5] Siswanto, A., Nugroho, A. and Galinium, M. "Implementation of face recognition algorithm for biometrics based time attendance system", 2014 International Conference on ICT For Smart Society (ICISS).
[6] Chen, Z. and Huang, X. "End-to-end learning for lane keeping of self-driving cars", 2017 IEEE Intelligent Vehicles Symposium (IV).
[7] Yong, S., Hagenbuchner, M. and Tsoi, A. "Ranking Web Pages Using Machine Learning Approaches", 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.
[8] Wei, Z., Qu, L., Jia, D., Zhou, W. and Kang, M. "Research on the collaborative filtering recommendation algorithm in ubiquitous computing", 2010 8th World Congress on Intelligent Control and Automation.
[9] Kononenko, I. "Machine learning for medical diagnosis: history, state of the art and perspective", Artificial Intelligence in Medicine, 23(1), 2011, pp. 89-109.
[10] Jordan, M. "Statistical Machine Learning and Computational Biology", IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2007).
[11] Thangavel, S., Bkaratki, P. and Sankar, A. "Student placement analyzer: A recommendation system using machine learning", 4th International Conference on Advanced Computing and Communication Systems (ICACCS-2017).
[12] Byun, H. and Lee, S. "Applications of Support Vector Machines for Pattern Recognition: A Survey", Pattern Recognition with Support Vector Machines, 2002, pp. 214-215.
[13] Support vector machine regression algorithm [Online], http://chem-eng.utoronto.ca/~datamining/dmc/support_vector_machine_reg.htm, last accessed 22.08.2017.
[14] Kotsiantis, S. "Decision trees: a recent overview", Artificial Intelligence Review, 39(4), 2011, pp. 262-267.
[15] Liaw, A. and Wiener, M. "Classification and Regression by randomForest", R News, ISSN 1609-363, vol. 2/3, December 2002, pp. 18-22.
[16] Tibshirani, R. "Regression shrinkage and selection via the lasso: a retrospective", Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 2011, pp. 273-282.
[17] Brownlee, J. "Logistic Regression for Machine Learning - Machine Learning Mastery", [online] Machine Learning Mastery. Available at: http://machinelearningmastery.com/logistic-regression-for-machine-learning/ [Accessed 12 Aug. 2017].
[18] The steepness of the curve of logistic regression [Online], http://chem-eng.utoronto.ca/~datamining/dmc/logistic_regression.htm, last accessed 22.08.2017.
[19] Bicego, M. and Loog, M. "Weighted K-Nearest Neighbor revisited", 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 1642-1647.
[20] Ting, K. and Zheng, Z. "Improving the Performance of Boosting for Naive Bayesian Classification", Methodologies for Knowledge Discovery and Data Mining, 1999, pp. 296-298.
[21] Peng Ye, "The decision tree classification and its application research in personnel management", Proceedings of 2011 International Conference on Electronics and Optoelectronics, 2011, pp. 1-4.
[22] Entropy of a decision tree classification algorithm [Online], http://chem-eng.utoronto.ca/~datamining/dmc/decision_tree.htm, last accessed 22.08.2017.
[23] Muda, Z., Yassin, W., Sulaiman, M. and Udzir, N. "Intrusion detection based on k-means clustering and OneR classification", 2011 7th International Conference on Information Assurance and Security (IAS).
[24] Kerdegari, H., Samsudin, K., Ramli, A. and Mokaram, S. "Evaluation of fall detection classification approaches", 2012 4th International Conference on Intelligent and Advanced Systems (ICIAS 2012).