Comparative Analysis of Classification Models On Income Prediction
1Bhavin Patel, 2V. Kakulapati, 3VVSSS Balaram
1,2,3Sreenidhi Institute of Science & Technology,
Yamnampet, Ghatkesar, Hyderabad, India
Abstract: Predictive analytics can be described simply as an approach that scientifically uses the past to predict the future in order to
achieve desired results. It is the branch of advanced analytics used to make predictions about unknown events. Predictive analytics draws on
techniques from data mining, statistics, modeling, machine learning and artificial intelligence. It involves extracting information from data
and is used to predict trends and behavior patterns. It can be applied to any unknown event of interest, whether past, present or future. It
uses data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data.
Income determination is an important application of predictive analytics in which customers are segmented on the basis of demographic data.
In this paper, we address this task with a novel approach that uses different classification techniques to minimize the risk and cost involved
in predicting income levels. We demonstrate the performance of each algorithm, particularly on the identification of customers using
classification techniques. In addition, we provide an analysis of true positives, false negatives, scored labels and scored probabilities.
__________________________________________________*****_________________________________________________
IJRITCC | April 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 4 451-455
______________________________________________________________________________________________
implementation of the MART GBS. Gradient boosting is a machine learning technique for regression problems. It builds each regression tree in a stage-wise fashion, using a predefined loss function to measure the error at each step and correct for it in the next. The prediction model is therefore an ensemble of weaker prediction models. In regression problems, boosting builds a series of trees in a step-wise fashion and then selects the optimal tree using an arbitrary differentiable loss function.
In this paper, we use Regression where we are concerned with predicting values and estimating the relationship between variables, and Classification where we need to predict categories and identify which category new information belongs to.
Here we need to select the column to be predicted based on the other columns. We then train the model and determine its suitability for the solution. We later visualize the newly trained data.
False Negative: 722
False Positive: 498
True Negative: 6918
Positive-Label: >50K
Negative-Label: <=50K
Accuracy: 0.90
Confusion Matrix: The following matrix is known as the Confusion Matrix. It tabulates the Scored Labels against the actual class. For instance, this matrix indicates the accuracy of the Multi-Classification Decision Forest Algorithm against the actual classes, i.e. High, Medium and Low.
In short, it predicts the Scored Probabilities of the predicted class.
The Average-Accuracy of the Multi-Classification Decision Forest on the sample dataset turned out to be 0.7.
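The relationship between the confusion-matrix counts and the reported metrics can be illustrated with a short Python sketch. This is not the tooling used in the paper: the function names and the toy label lists are hypothetical and serve only as an illustration. The true-positive count does not appear in this section, so the only quantity derived from the reported figures is one that does not need it, namely the false positive rate computed from the 498 false positives and 6918 true negatives.

```python
from collections import Counter

POSITIVE = ">50K"    # positive label, as stated in the text
NEGATIVE = "<=50K"   # negative label, as stated in the text


def confusion_counts(actual, scored, positive=POSITIVE):
    """Count TP, FP, TN, FN by comparing scored labels with actual labels."""
    counts = Counter()
    for a, s in zip(actual, scored):
        if s == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "TN", "FN")}


def accuracy(c):
    """Fraction of all cases whose scored label matches the actual label."""
    total = c["TP"] + c["FP"] + c["TN"] + c["FN"]
    return (c["TP"] + c["TN"]) / total


def false_positive_rate(fp, tn):
    """Fraction of actual negatives that were scored as positive."""
    return fp / (fp + tn)


# Hypothetical toy labels (made up for illustration, not the paper's data).
toy_actual = [">50K", ">50K", "<=50K", "<=50K", "<=50K", ">50K"]
toy_scored = [">50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K"]
toy_counts = confusion_counts(toy_actual, toy_scored)
print(toy_counts, accuracy(toy_counts))

# Derived only from the counts reported in the text (FP = 498, TN = 6918).
reported_fpr = false_positive_rate(fp=498, tn=6918)
print(round(reported_fpr, 4))
```

Accuracy needs all four cells of the matrix, which is why the sketch computes it only for the toy labels and not for the reported counts.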