
Data Mining Approach For Predicting Student and Institution's Placement Percentage


Ashok M V
Professor, Dept. of Computer Science
Teacher's Academy
Bangalore, India
[email protected]

Apoorva A
Assistant Professor, Dept. of MCA
GIMS
Bangalore, India
[email protected]

Abstract- Placement of students is one of the most important activities in educational institutions. Admission and reputation of institutions mainly depend on placements. Hence all institutions strive to strengthen their placement departments. In this study, the objective is to analyze previous years' historical student data and predict the placement chance of current students and the percentage placement chance of the institution. A model is proposed along with an algorithm to predict the placement chance of students. Data pertaining to the study were collected, for the years 2006 to 2015, from the same institution for which the placement chance prediction and percentage placement need to be found. The collected data are divided into historic data (2006 to 2014) and test data (2015); 2016 data are considered as current data. Suitable data pre-processing methods are applied. Students having a better chance of placement are characterized as good, otherwise as bad. The proposed model is compared with other classification algorithms such as Naïve Bayes, decision tree, and neural network with respect to accuracy, precision, and recall. From the results obtained it is found that the proposed algorithm predicts better than the other algorithms.

Keywords- data mining; prediction; placement; classification; Naïve Bayes; decision tree

I. INTRODUCTION

It is a well-known fact around the world that admission of students to an educational institution depends on placements. Placement is one of the factors considered for determining the quality of an institution. Hence every institution strives hard to provide better placements for its students. An educational institution holds a large number of student records. This data is a wealth of information, but is too large for any one person to understand in its entirety. Finding characteristics in this data is an essential task in education research. It does not make sense to find the placement possibility of all the students in the institution, as not all students will have a good KSA (knowledge, skill and attitude) score. Hence there is a need to identify, among the whole set of students, those who have a good KSA score; finding the placement chance for them achieves the objective and saves a lot of time. The input for the study is therefore the best cluster of students, those with better KSA scores who have a good chance of placement, obtained by applying a clustering algorithm and other necessary data pre-processing techniques.

II. PROBLEM STATEMENT

Every student dreams of being successful in life. The onus is on the institution to help them by providing good placement opportunities. Not every student can be placed; hence the intention of this study is to predict the placement chance of the clustered students who have a better chance of placement, and thus find the percentage placement of the institution for the current academic year. This would help the institution analyze its status in comparison with other institutions and take appropriate measures to improve it. A prediction model is proposed. Various mining algorithms are applied on the processed data, tested, and compared with the proposed model based on criteria such as accuracy, precision, and recall.

III. RELATED WORKS

Many scientists have been working to explore the best mining techniques for solving placement chance prediction problems. Various works have been done in this regard. A few of the related works are listed below:
Jae H. Min et al., 2005 [1] applied support vector machines (SVMs) to the bankruptcy prediction problem, used a grid-search technique with 5-fold cross-validation to find the optimal parameter values of the SVM kernel function, and showed its attractive prediction power compared to existing methods. J.A.K. Suykens et al., 2001 [2] discussed a least squares version of support vector machine classifiers and illustrated that a least squares SVM with RBF kernel readily achieves excellent generalization performance at low computational cost. Tung-Kuang Wu et al., 2008 [3] applied two well-known artificial intelligence techniques, artificial neural networks (ANN) and support vector machines (SVM), to the learning disability (LD) diagnosis problem. S. Guha et al., 1999 [4] proposed a new concept of links to measure the similarity/proximity between a pair of data points with categorical attributes and developed a robust hierarchical clustering algorithm. Kakoti Mahanta et al., 2005 [5] proved that under certain conditions, the final clusters obtained by the algorithm are nothing but the connected components of a certain graph with the input data points as vertices. Agnieszka Prusiewicz et al., 2010 [6] introduced a proposal for services recommendation in online educational systems based on service-oriented architecture.
Christian Borgelt, 2005 [7] proposed a new data structure for frequent item set mining algorithms. Balazs Racz, 2004 [8] described an implementation of a pattern-growth-based frequent item set mining algorithm; the data structure presented there can accommodate the top-down recursion approach, thereby further reducing memory need and computation time. Ke Wang, Liu Tang et al., 2002 [9] proposed an efficient algorithm, called TD-FP-Growth (shorthand for Top-Down FP-Growth), to mine frequent patterns. Sudheep Elayidom et al., 2011 [10] attempted to help prospective students make wise career decisions using data mining technologies such as decision trees, Naïve Bayes, and artificial neural networks. Ajay Kumar Pal et al., 2013 [11] suggested that, among all the machine learning algorithms tested, the Naïve Bayes classifier has the potential to significantly improve on conventional classification methods for use in placement. K. Pal et al., 2013 [12] described the use of data mining techniques to improve the efficiency of academic performance in educational institutions. B.K. Bharadwaj et al., 2011 [13] used the classification task on a student database to predict students' division on the basis of a previous database. S. K. Yadav et al., 2012 [14] focused on methodologies for extracting useful knowledge from data, noting that there are several useful KDD tools for extracting such knowledge.
IV. PROPOSED MODEL

The proposed model, shown in Fig 1, consists of three stages: data collection (experimental data obtained from the educational institution), prediction of placement probability using the proposed algorithm, and evaluation of results, in which the proposed algorithm is compared against decision tree, Naïve Bayes, and neural network classifiers.

Fig 1: Proposed Model

The algorithm of the proposed model, along with its computational processes for predicting placement chance, is outlined below:
Step 1: Data collection.
The goal is to find the proficient students in the college under consideration, viz. XX, for the year 2016. In this college there were 1,434 students. These students hailed from the various courses operative in the college: MBA, MCA, BCA, B.Com, and BBA.
Step 2: Predict placement chance.
This step predicts the placement chance of each student, and also the percentage placement of the institution, using the proposed classification algorithm.
Step 3: Evaluate the result.
The obtained result is compared in terms of precision, accuracy, and variance with other algorithms such as decision tree, Naïve Bayes, and neural network.
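To make the flow concrete, the following minimal Python sketch wires the three stages together; the function names, data layout, and sample values are illustrative assumptions, not the implementation used in the study.

# Minimal sketch of the three-stage model in Fig 1 (all names and sample
# values are hypothetical, for illustration only).

def collect_data():
    # Stage 1: experimental data obtained from the educational institution.
    # Each historic record is (effective_score, placed).
    historic = [(104, True), (72, False), (146, True)]
    current = [104, 131]  # effective scores of current students
    return historic, current

def predict_placement(historic, current):
    # Stage 2: placement probability via the proposed algorithm (Section VI).
    results = {}
    for score in current:
        matches = [placed for s, placed in historic if s == score]
        p = 0.5 if not matches else sum(matches) / len(matches)
        results[score] = "Good" if p >= 0.4 else "Bad"
    return results

def evaluate(results):
    # Stage 3: evaluate results against other classifiers (Section VIII).
    print(results)

historic, current = collect_data()
evaluate(predict_placement(historic, current))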

V. DATA DESCRIPTION

The objective is to predict the placement chance of students identified as proficient students in the college identified as XX. The basic requirement of any prediction problem is the existence of previous or past data on the basis of which the future is predicted. Data is collected from the college XX identified above, which offers various courses.
The collected data is divided into three types.
Historic data: collected for a duration of 10 years, from 2006 to 2014.
Test data: collected for the year 2015.
Current data: students identified as proficient students for the year 2016.

Table I: Data Description

Variable         Description                                      Possible values
Year             Year for which the data is entered               {int}
Reg-no           Register number of the student                   {int}
Branch           Branch (MCA, MBA, BSc, etc.) of the student      {1, 2, 3, 4, 5, ...}
Percent          Overall percentage of the student                {65, 71, 82, ..., 100}
Skills           Knowledge, skills and ability                    {1, 2, 3, ..., 10}
Effective-score  Effective-score = percent + skills * 10; shows   {0, 50, 99, 154, ..., 200}
                 the overall performance of the student
Placed           Whether the student was placed, based on her     {Text}
                 performance

Year: year in which the student completed education. Data collected were from 2006 to 2015.
Reg-no: register number of the student. It takes integer values.
Branch: name of the branch. It takes only text values ranging from A to Z.
Percent: marks scored by the student across subjects. It takes only numeric values from 0 to 100.
Skills: the overall skills of the student. It takes only numeric values from 0 to 10.
Effective-score: the overall performance of the student, calculated as Effective-score = percent + skills * 10. It takes only numeric values from 0 to 200.
Placed: whether the student was placed, based on her performance. The value is taken in the form Yes/No: if Yes, the student was placed; else the student has not been placed.
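As a quick illustration of the Effective-score formula above, a minimal Python helper (the function name and sample values are assumptions for illustration):

def effective_score(percent, skills):
    # Effective-score = percent + skills * 10 (Table I).
    # percent lies in 0-100 and skills in 0-10, so the result lies in 0-200.
    return percent + skills * 10

print(effective_score(74, 3))  # prints 104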
VI. PROPOSED ALGORITHM

An algorithm is proposed to achieve the objective of the study. The algorithm is as follows.

Input: currentStudent, oldStudentList
Output: placement chance

1. Read currentStudent.
2. Read oldStudentList.
3. Count all students in oldStudentList whose score equals the score of currentStudent. Store the result as countSelected.
4. Count all placed students in oldStudentList whose score equals the score of currentStudent. Store the result as countPlacedAndSelected.
5. If countSelected == 0 then
6.     chance = 0.5
7. else
8.     chance = countPlacedAndSelected / countSelected
   [End If of step 5]
9. If chance >= 0.4 (i.e. Excellent/Good/Average)
10.    set placement chance to Good
   else
11.    set placement chance to Bad
12. [End If of step 9]
13. Write placement chance.
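The pseudocode translates directly into a short routine. Below is a non-authoritative Python sketch, assuming a student's score is the Effective-score of Table I and that each historic record carries a placed flag; all names are illustrative.

def placement_chance(current_score, old_student_list):
    # old_student_list: (effective_score, placed) tuples from the historic data.
    # Steps 3-4: count matching students and matching placed students.
    count_selected = sum(1 for s, _ in old_student_list if s == current_score)
    count_placed_and_selected = sum(1 for s, placed in old_student_list
                                    if s == current_score and placed)
    # Steps 5-8: fall back to 0.5 when no historic instance matches,
    # otherwise take the fraction of matching students who were placed.
    if count_selected == 0:
        chance = 0.5
    else:
        chance = count_placed_and_selected / count_selected
    # Steps 9-13: threshold at 0.4 (Excellent/Good/Average versus Poor).
    return "Good" if chance >= 0.4 else "Bad"

historic = [(72, False), (104, True), (129, False)]
print(placement_chance(104, historic))  # prints Good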

VII. EXPERIMENTAL EVALUATION

Table II represents the output of the clustering algorithm, which is used as the input to the proposed algorithm, with the attributes as shown above.

TABLE II: Input data (output of the clustering algorithm)

reg_no  branch  effective_score  centroid of cluster
1       MCA     53               55.33
2       MCA     72               55.33
3       MCA     110              105.0
4       MCA     41               55.33
5       MCA     129              146.66
6       MCA     146              146.66
7       MCA     106              105.0
8       MCA     100              105.0
9       MCA     104              105.0
10      MCA     165              146.66

An algorithm was proposed to estimate the number of clusters and to find the elements of each cluster using centroids based on Euclidean distance.

For each student in the selected clusters the following operations are performed. Store the student in a variable S.

Step 3 of the algorithm:

    f(x, y) = 1 if x = y; 0 otherwise

    c = sum of f(x, y) for x = 1 to n

where x is the effective score of each historic student record, y is the effective score of the current student, and c is the count of selected historic records: it counts all effective scores in the historic data equal to the effective score of S.
Illustration: if the effective score of the current student is 104, this value is searched for in the historic data and the count of matching records is returned, e.g.

    c = f(72,104)+f(121,104)+f(146,104)+f(106,104)+f(104,104)+f(100,104)+f(165,104)+f(83,104)+f(129,104)+f(110,104)
      = 0+0+0+0+1+0+0+0+0+0
      = 1

Step 4 of the algorithm:

    f(x, y) = 1 if x = y and x is flagged as placed; 0 otherwise

    d = sum of f(x, y) for x = 1 to n

where x is the effective score of each historic student record, y is the effective score of the current student, and d is the count of selected and placed historic records: it counts all effective scores in the historic data equal to the effective score of S that also carry the placed flag.
Illustration: if the effective score of the current student is 104, this value is searched for only among the historic records flagged as placed, and the count of matching records is returned, e.g.

    d = f(72,104)+f(121,104)+f(146,104)+f(106,104)+f(104,104)+f(100,104)+f(165,104)+f(83,104)+f(129,104)+f(110,104)
      = 0+0+0+0+1+0+0+0+0+0
      = 1

Step 8: p = d / c

where p is the probability of placement, d is the count of selected historic students who have been placed, and c is the count of selected historic students. If c = 0, p is taken as 0.5 according to step 6: with c = 0 the ratio d / c is undefined (division by zero). Practically, this is the unique case where there is no matching instance in the historic data; to avoid this situation the value of p is set to 0.5.
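The indicator sums above can be checked mechanically. The Python sketch below reproduces the worked example; the historic scores are those listed in the illustration, while the placed flags are assumptions chosen only so that the counts come out as above.

# (effective_score, placed) pairs matching the illustration's historic scores;
# the placed flags are illustrative assumptions, not the study's dataset.
historic = [(72, False), (121, False), (146, True), (106, False), (104, True),
            (100, False), (165, True), (83, False), (129, False), (110, True)]
y = 104  # effective score of the current student

c = sum(1 for x, _ in historic if x == y)                  # step 3: c = 1
d = sum(1 for x, placed in historic if x == y and placed)  # step 4: d = 1
p = 0.5 if c == 0 else d / c                               # step 8: p = 1.0
print(c, d, p)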

Steps 9, 10 and 11:

TABLE III: Probability Ranges

Range of probability   Remark      Value
p >= 0.9               Excellent   Good
0.9 > p >= 0.6         Good        Good
0.6 > p >= 0.4         Average     Good
p < 0.4                Poor        Bad

According to step 8, various values of p are obtained. These values are classified as above: if p >= 0.9 the chance is considered excellent, if it lies between 0.9 and 0.6 it is good, if it lies between 0.6 and 0.4 it is average, and if it is less than 0.4 it is poor.
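The ranges of Table III and the final Good/Bad label can be expressed as a small helper function (a sketch; the name and return shape are illustrative):

def classify(p):
    # Map a probability p to the remark and final value of Table III.
    if p >= 0.9:
        return "Excellent", "Good"
    elif p >= 0.6:
        return "Good", "Good"
    elif p >= 0.4:
        return "Average", "Good"
    else:
        return "Poor", "Bad"

print(classify(0.95))  # ('Excellent', 'Good')
print(classify(0.30))  # ('Poor', 'Bad')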

TABLE IV: Placement Chance for input data

reg_no  branch  effective_score  centroid of cluster  placement chance
1       MCA     53               55.33                Bad (not selected)
2       MCA     72               55.33                Bad (not selected)
3       MCA     110              105.0                Bad
4       MCA     41               55.33                Bad (not selected)
5       MCA     129              146.66               Good
6       MCA     146              146.66               Good
7       MCA     106              105.0                Bad
8       MCA     100              105.0                Good
9       MCA     104              105.0                Good
10      MCA     165              146.66               Good

Let us explain the first instance of Table IV. Since the centroid of reg_no 1 falls in the cluster eliminated in module 1, this student will not be placed; similarly, the student with reg_no 2 is not selected. Consider the student with reg_no 3: its centroid value is 105.0, and the placement chance is concluded to be bad because its p value falls in the poor row (p < 0.4) of Table III. The same explanation can be given for the student with reg_no 5, whose p value falls in the excellent row with p >= 0.9.
The placement percentage is calculated as

    Placement percentage = (number of good students * 100) / total number of students

As per the above calculations, the percentage placement chance is 50%.
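A quick check of this percentage for the labels of Table IV (a minimal sketch):

# Placement chance labels of Table IV, reg_no 1 to 10.
labels = ["Bad", "Bad", "Bad", "Bad", "Good",
          "Good", "Bad", "Good", "Good", "Good"]
placement_percentage = labels.count("Good") * 100 / len(labels)
print(placement_percentage)  # prints 50.0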
Fig 2: Screenshot of placement chance prediction in percentage

The above screenshot represents the percentage placements of the institution considered.

VIII. RESULTS

CONFUSION MATRIX

Data mining algorithms such as decision tree, Naïve Bayes, and neural network, along with the proposed algorithm, were applied on the same dataset and the tests were conducted separately. The results obtained after the tests for each algorithm were modeled as confusion matrices.
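For a binary confusion matrix, the metrics reported in the next table follow the standard definitions. A brief Python sketch, with placeholder counts rather than the study's actual matrices:

# Entries of a binary confusion matrix (placeholder counts, not the study's).
tp, fp, fn, tn = 45, 9, 3, 43

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # true positive rate (TPR)
print(round(accuracy, 2), round(precision, 2), round(recall, 2))  # 0.88 0.83 0.94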
TABLE V: Comparison of the proposed algorithm with other algorithms

Algorithm           Accuracy  TPR   Precision
Decision Tree       0.84      0.95  0.74
Naïve Bayes         0.87      0.93  0.78
Neural Networks     0.83      0.88  0.69
Proposed Algorithm  0.92      0.96  0.87

Table V above gives the accuracy, true positive rate (recall), and precision of the different algorithms compared with the proposed algorithm. The precision and accuracy of the proposed algorithm are high compared with the other classification algorithms; since its true positive rate is also the highest, its false negative rate (1 - TPR) is the lowest of all the algorithms.

Fig 3: Comparison of algorithms with the proposed algorithm

The above graph represents the accuracy, recall, and precision of the various classification algorithms; the proposed algorithm has the highest precision, accuracy, and recall. Decision tree, Naïve Bayes, neural network, and the proposed algorithm are represented by blue, red, green, and purple respectively.

IX. CONCLUSION

Data mining techniques applied to educational data are concerned with developing methods for exploring the unique types of data arising in the educational domain; each educational problem has specific objectives and unique characteristics that require different approaches to solve. In this study, a model was proposed along with an algorithm and compared with three other classification algorithms, decision tree, Naïve Bayes, and neural network, in terms of accuracy, precision, and true positive rate (recall). The proposed model proved to be the best predicting model for solving placement chance prediction problems compared with all the other algorithms. With the information generated through this study, an institution would be able to design strategies to overcome lacunae and improve placements, giving its students the best chance of getting placed; thus admissions can be increased.

REFERENCES

[1] Jae H. Min and Young-Chan Lee, "Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters", Volume 28, Issue 4, May 2005, pages 603–614.
[2] J.A.K. Suykens and J. Vandewalle, "Least Squares Support Vector Machine Classifiers", Volume 308, Issue 2, 27 April 2001, pages 397–407.
[3] Tung-Kuang Wu and Shian-Chang Huang, "Evaluation of ANN and SVM classifiers as predictors to the diagnosis of students with learning disabilities", Volume 34, Issue 3, April 2008, pages 1846–1856.
[4] S. Guha, R. Rastogi and Kyuseok Shim, "ROCK: a robust clustering algorithm for categorical attributes", pages 512–521.
[5] Kakoti Mahanta and Arun K. Pujari, "QROCK: A quick version of the ROCK algorithm for clustering of categorical data", Volume 26, Issue 15, November 2005, pages 2364–2373.
[6] Agnieszka Prusiewicz and Maciej Zięba, "Services Recommendation in Systems Based on Service Oriented Architecture by Applying Modified ROCK Algorithm", Volume 88, 2010, pages 226–238.
[7] Christian Borgelt, "An implementation of the FP-growth algorithm", pages 1–5, 2005.
[8] Balazs Racz, "An FP-Growth Variation without Rebuilding the FP-Tree", 2004.
[9] Ke Wang, Liu Tang, Jiawei Han and Junqiang Liu, "Top Down FP-Growth for Association Rule Mining", Volume 2336, 2002, pages 334–340.
[10] Sudheep Elayidom, Suman Mary Idikkula and Joseph Alexander, "A Generalized Data Mining Framework for Placement Chance Prediction Problems", International Journal of Computer Applications (0975-8887), Volume 31, No. 3, October 2011.
[11] Ajay Kumar Pal and Saurabh Pal, "Classification Model of Prediction for Placement of Students", I.J. Modern Education and Computer Science, 2013, 11, pp. 49–56.
[12] K. Pal and S. Pal, "Analysis and Mining of Educational Data for Predicting the Performance of Students", International Journal of Electronics Communication and Computer Engineering (IJECCE), Vol. 4, Issue 5, pp. 1560–1565, ISSN: 2278-4209, 2013.
[13] B.K. Bharadwaj and S. Pal, "Mining Educational Data to Analyze Students' Performance", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 2, No. 6, pp. 63–69, 2011.
[14] S. K. Yadav, B.K. Bharadwaj and S. Pal, "Data Mining Applications: A comparative study for Predicting Student's Performance", International Journal of Innovative Technology and Creative Engineering (IJITCE), Vol. 1, No. 12, pp. 13–19, 2011.
