
2019 6th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS)

Predicting Students’ Employability using Machine Learning Approach
Cherry D. Casuat
Graduate School
Technological Institute of the Philippines
Quezon City , Philippines
[email protected]

Enrique D. Festijo
Graduate School
Technological Institute of the Philippines
Manila, Philippines
[email protected]

Abstract—This study applies a machine learning approach to predicting students’ employability. The researchers conducted a case study involving 27,000 data points (3,000 observations and 9 features) drawn from students’ Mock Job Interview Evaluation Results, On-the-Job Training (OJT) Student Performance Ratings, and General Point Average (GPA), for students enrolled in the OJT course from School Year 2015 to School Year 2018. Three learning algorithms, Decision Trees (DT), Random Forest (RF), and Support Vector Machine (SVM), were used to understand how students get employed. The three algorithms were evaluated using accuracy, precision, recall, F1-score, and support as performance measures. In the experiments, SVM obtained 91.22% accuracy, significantly higher than the other learning algorithms (DT 85%, RF 84%). The learning curve produced during the experiment shows the training curve lying above the validation curve, while the validation curve shows that gamma was best at 10 to 100 in gamma 5. The researchers conclude that the model produced with SVM was neither underfit nor overfit. The results are promising and motivate the researchers to enhance the process and to validate the predictive model in further study.

Keywords—Classification, Machine Learning, SVM, Random Forest, Decision Trees, Employment prediction

I. INTRODUCTION
Higher Education (HE) plays an important role in the development of every nation’s economy [1]. It strengthens the country by contributing a reliable and qualified workforce to society. Higher Education is the foundation for several benefits: it encourages and nurtures talent, it increases the quality of national human capital, and it is a core means of upgrading a nation’s competitive status. This leads educational institutions to find better ways of improving the employability of their students [2]. In most institutions, employability is a major concern of every student, and foreseeing students’ employability prior to job application can increase the institutional placement rate. Identifying weaknesses before an interview with any company can help students improve the areas identified as needing improvement [3]. Education is becoming more and more employment-oriented. The employment of graduating students has become a trend in every institution and a key factor in the reputation of every institution in the Philippines. Foreseeing the employability of each student before graduation can direct students to the areas where they need to work extensively.

Most published research has used data mining techniques to predict employability. These techniques include Decision Tree, Naïve Bayes, and Support Vector Machine [4]. Logistic Regression, K-Nearest Neighbor, Random Forest, SVM (LinearSVC), Quadratic Discriminant Analysis (QDA), and multi-class AdaBoost have also been used to predict employability [5].

This paper aims to develop a machine learning approach to predicting students’ employability and to analyze its skillset signals. To achieve this objective, the paper presents the initial stage, which consists of the pre-processing techniques applied before the learning algorithms. The work is at the development stage of a machine-learning-based model to predict students’ employability. The researchers were motivated to conduct the study in the context of emergent areas such as institutional intelligence and academic analytics, for the enhancement and promotion of the identified skill sets that will contribute to better employment of engineering students of the Technological Institute of the Philippines.

The remainder of the paper is organized as follows: Section 2 gives the Background, Section 3 presents the Proposed Method, Section 4 discusses the Case Study of Students’ Employability, Section 5 gives the Results & Analysis, Section 6 is the Conclusion, and Section 7 is the Acknowledgement.

II. BACKGROUND
This study was developed for the benefit of the career center of the Technological Institute of the Philippines (TIP). TIP is one of the leading engineering schools in the Philippines. The TIP career center is a model career center in the Philippines that supports students’ employability development and career linkages. The career center conducts the Students Development Program (SDP) and Mock Job Interview evaluations for engineering students who intend to enroll in the On-the

978-1-7281-4082-7 /19/$31.00 ©2019 IEEE

Authorized licensed use limited to: University of Exeter. Downloaded on June 24,2020 at 16:29:43 UTC from IEEE Xplore. Restrictions apply.
Job Training (OJT) course for engineering programs. The Mock Job Interview is one of the requirements for engineering students prior to deployment to a company for their internship. Internship datasets increase year after year, contributing to the growing volume of educational datasets [6].

As Higher Education Institutions (HEIs) are increasingly held accountable for students’ career outcomes, and as competition for jobs in the labor market increases, institutions need to determine which students are most likely to be employed and which need improvement, and to determine the areas a student needs to improve before undergoing an interview with any company [5].

Different studies have been conducted on predicting the employability of students. Most were done using data mining techniques, of which the most widely used are classification and prediction techniques. The study of Sapaat [7] designed a predictive model for graduate employability using data mining techniques, with datasets gathered from the tracer study of the Ministry of Higher Education of Malaysia covering all graduates of public and private institutions and polytechnics. Algorithms such as Decision Tree, Naive Bayes, Neural Network, Logistic Regression, and Random Forest were commonly used in employment classification and prediction. WEKA was also used as a data mining tool to build predictive models. CRISP-DM (Cross Industry Standard Process for Data Mining) and KDD (Knowledge Discovery in Databases) were the prediction analysis methods applied. In that study the decision tree was concluded to be the most appropriate algorithm for predicting the employability of students. The highest accuracy was obtained by the J48 classifier (a variant of decision trees) [8]. Tajul [12] used five data mining algorithms (Naive Bayes, Logistic Regression, Multilayer Perceptron, K-Nearest Neighbor, and J48 Decision Tree) to determine whether graduates are employed, unemployed, or in some instances undetermined six months after graduation; the proponent used the Bayesian and Decision Tree algorithms [12]. A novel Neural Network (NN) was also used to predict the unemployment rate based on information gathered from the web. Another study by Sapaat used datasets collected from the Alumni Unit (AU), Examinations Unit (EU), and Curriculum Unit (CU), and found logistic regression to be the best classifier for predicting whether graduates will be employed in the private or public sector, unemployed, or continuing their education. Most of the studies conducted used data mining techniques to predict employability [9].

Mishra’s study [1] covered a predictive model built for the employment status of graduates of Khon Kaen University. In that study the proponents used Averaged One-Dependence Estimators (AODE) and Averaged One-Dependence Estimators with subsumption resolution (AODEsr). Bayesian methods were also used, such as Naïve Bayesian Simple, Naïve Bayesian, Bayesian networks, and Naïve Bayesian Updateable. The highest accuracy was obtained by the AODEsr algorithm at 98.3%, followed by AODE at 96.1% [1].

In the study conducted by Gao [10], employment information was used as attributes. Classic decision tree classifiers were evaluated through analysis and comparison on different criteria. Weka was also used to build the data mining models analyzed in that work. The study concludes that the students’ place of origin and type of work differ by gender, and it shows that 50% of the graduates do not select the educational institution [10].

III. PROPOSED METHOD
In this paper, the researchers followed a new approach to predicting the employability of engineering students. In this method, predictive models open a new way to manage large amounts of data.

Generally, the proposed machine-learning-based method for predicting students’ employability follows the data-structuring and orderly data-handling principles that are also used in data science [11]. Since the work is at a preliminary stage, the proponents present the study in the most comprehensible way to explain its appropriateness [12].

The researchers propose a new way of predicting students’ employability using a machine learning approach. Figure 1 explains the workflow of this proposal:

Fig. 1. Workflow of Experimental Set-Up

Figure 1 presents the workflow of this study. The first step was the collection of the dataset of On-the-Job Training (OJT) students enrolled from the 1st Semester of SY 2015-2016 up to Summer SY 2018-2019, with 3,000 observations and nine (9) features, for a total of 27,000 data points. Of these, 18,000 data points were collected from the Mock Job Interview Results conducted by the career center; 3,000 were student performance ratings collected with the student assessment tool, which is completed by the student’s supervisor during on-the-job training (OJT) in the company; and another 3,000 came from the General Point Average of the students, collected from the Registrar’s office. Data cleaning was then performed, removing all observations with only partial information. After that, the researchers normalized the data

before applying the learning algorithms. The datasets collected from the different offices were then merged. The merged datasets were used in the case study, while the data collected from the survey of students who graduated in SY 2015-2018 were used to create the class label. The learning algorithms used were supervised learning algorithms: Decision Trees (DT), Random Forest (RF), and Support Vector Machine (SVM) were used by the researchers to predict students’ employability. The performance measures used to assess the models were accuracy, precision, recall, and F1-score. The researchers compared the results of each learning algorithm and chose the model with the highest accuracy among the three. The best predictive model built, which can predict whether a student is employable or less employable, was the Support Vector Machine (SVM).

IV. CASE STUDY: STUDENTS’ EMPLOYABILITY
The data used in this study were gathered from the available information on students who completed a degree in Engineering programs from SY 2015-2016 to SY 2018-2019. The researchers conducted a survey of the students who graduated in those school years, and the class label was created from the survey results. The dataset consists of 27,000 data points, with 3,000 observations and 9 features per student. The researchers gathered the information through two inputs: the datasets given by the career center of the Technological Institute of the Philippines, and the raw data from the survey conducted by the researchers among graduates of engineering programs who are currently employed. The researchers used Google Forms with a privacy consent statement to deploy the survey. The process followed by the researchers is described below:

A. Data Collection
The datasets were collected from the Career Center of the Technological Institute of the Philippines-Manila and consist of Mock Job Interview Results with three thousand (3,000) observations and nine (9) features, the Student Performance Ratings of the OJT students collected by the On-the-Job Training (OJT) Faculty In-charge, and the General Point Average from the Registrar’s Office. The datasets collected were compliant with the Data Privacy Act of the Philippines.

TABLE 1
Students’ Employability Dataset

B. Data Pre-Processing
The following techniques were used in pre-processing the datasets:

1. Merging of Datasets: The datasets collected from the career center, the Registrar’s office, and the OJT Faculty In-charge were merged into one dataset to be used in the experiments [3].
2. Data Normalization: When an attribute (column) had missing values, they were filled with the median value of that attribute. When an observation (row) had missing values, they were filled with the mean of that observation [3].

C. Learning Algorithms
After performing feature selection, the proponents split the data into seventy percent (70%) training data and thirty percent (30%) testing data [1]. The researchers then trained on the dataset using the following learning algorithms:
1. Decision Trees: An algorithm that uses a tree-like graph model of decisions and their possible consequences, including chance event outcomes, utility, and resource costs. This type of algorithm is built from conditional control statements [3].
2. Random Forest: An algorithm that builds numerous decision trees at training time, where each decision tree produces an individual output. These outputs are combined into the resultant output [3].
3. Support Vector Machine: This algorithm uses the concept of decision planes that define decision boundaries [3].

D. Performance Evaluation

Table 2: Predictive Model Decision Trees Result

Table 3: Predictive Model Random Forest Result

Table 4: Predictive Model Support Vector Machine Result

The following performance measures were used for performance evaluation of the three learning algorithms:

1. Precision: The classifier’s ability not to label as positive a sample that is negative. The score has its best value at 1 and its worst at 0. It is the ratio of true positives to all predicted positives, which can be written as [12]:
P = tp / (tp + fp) [12]

Where:
tp = true positives
fp = false positives

2. Recall: The ability of the classifier to find all the positive samples. The score has its best value at 1 and its worst at 0. It is the ratio of true positives to all actual positives, which can be written as [12]:

R = tp / (tp + fn) [12]

Where:
tp = true positives
fn = false negatives

3. F1-score: The F1 score has its best value at 1 and its worst at 0. The relative contributions of precision and recall to the F1 score are equal. It is the harmonic mean of precision and recall, which can be written as [12]:

F1-score = 2PR / (P + R) [12]

4. Support: The number of occurrences of each class among the true labels [12].

The cross-evaluation performance measures used consist of the following:

1) Accuracy: The proportion of all predictions that are correct. This can be stated mathematically as:
Accuracy = (TP + TN) / (TP + FN + TN + FP) [3] (1)
2) Sensitivity or Recall: The proportion of actually positive instances that are correctly classified as positive. This can be stated mathematically as:
Sensitivity = TP / (TP + FN) [3] (2)
3) Specificity: The proportion of actually negative instances that are correctly classified as negative. This can be stated mathematically as:
Specificity = TN / (TN + FP) [3] (3)
4) Positive Predictive Value (PPV): The proportion of instances predicted positive that are actually positive. This can be stated mathematically as:
PPV = TP / (TP + FP) [3] (4)
5) Negative Predictive Value (NPV): The proportion of instances predicted negative that are actually negative. This can be stated mathematically as:
NPV = TN / (TN + FN) [1] (5)

Table 5: Comparison of Learning Models Performance

Model       DT      RF     SVM
Accuracy    84.5%   84%    91.22%
Recall      85%     84%    91.15%
Precision   85%     84%    91.00%
F1 score    85%     84%    91.00%

Table 5 displays the comparison of the performance measures, where accuracy, recall, precision, and F1 score were used to evaluate the models. Among the learning algorithms used, SVM obtained the highest accuracy, recall, precision, and F1-score: 91.22% accuracy, 91.15% recall, 91% precision, and 91% F1 score. Since the SVM model had the highest accuracy, the proponents plotted its learning curve and validation curve.

1. Learning Curve graphs

Fig. 2. Learning curve with SVM in gamma 5

Fig. 3. Learning curve with SVM in gamma 4

Figure 2 illustrates the learning curve in gamma 5, where the maximum mean training score was 0.952 and the maximum mean cross-validation score was 0.901; the maximum training score was 0.994 and the maximum cross-validation score was 0.811. The training curve was above the validation curve. The accuracy measure describes how good the model is, while the MSE describes how bad the model is. The irreducible error gives an upper bound.

2. Learning Validation graphs

Fig. 4. Validation curve with SVM in gamma 5
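
As an illustration of the evaluation measures used in this section, the following is a minimal sketch, not the researchers’ code, that computes accuracy, recall (sensitivity), specificity, precision (PPV), NPV, and F1 from confusion-matrix counts. The names `Measures` and `classification_measures` and the counts in the usage lines are hypothetical, chosen only for illustration; F1 is computed here with its standard definition, the harmonic mean of precision and recall.

```python
from dataclasses import dataclass


@dataclass
class Measures:
    accuracy: float
    recall: float       # also called sensitivity
    specificity: float
    precision: float    # also called positive predictive value (PPV)
    npv: float          # negative predictive value
    f1: float


def classification_measures(tp: int, fp: int, tn: int, fn: int) -> Measures:
    """Compute binary classification measures from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return Measures(
        accuracy=ratio(tp + tn, tp + fp + tn + fn),
        recall=recall,
        specificity=ratio(tn, tn + fp),
        precision=precision,
        npv=ratio(tn, tn + fn),
        # Harmonic mean of precision and recall.
        f1=ratio(2 * precision * recall, precision + recall),
    )


# Hypothetical counts for illustration only (not taken from the study).
m = classification_measures(tp=40, fp=10, tn=45, fn=5)
print(m)
```

With these hypothetical counts the accuracy is 85/100 and the precision is 40/50; the zero-denominator guard is a design choice of this sketch so that a class with no predictions yields 0 rather than an error.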

Fig. 5. Validation curve with SVM in gamma 4

Figure 4 shows the validation curve with SVM in gamma 5, where the maximum training R-squared score was 0.918 and the maximum cross-validation score was 0.857, while Figure 5 shows the validation curve with SVM in gamma 4, where the maximum training R-squared score was 0.912 and the maximum cross-validation score was 0.909. The validation curve with SVM in gamma 5 shows that gamma is best at 10 to 100.

All models were evaluated, and the results of the analysis are shown in Tables 2-5. The proponents produced these figures while training on the dataset with the different classifiers. On the accuracy measure, the Support Vector Machine (SVM) obtained 91.22%, significantly better than the other learning algorithms (DT 85%, RF 84%), as shown in Tables 2-5. On the recall measure, SVM obtained 91.15%, evidently better than the other base classifiers (DT 85%, RF 78%), as shown in Tables 2-5. On the F1-score measure, SVM obtained 91%, evidently better than the base classifiers DT (85%) and RF (84%).

V. CONCLUSION AND FUTURE WORK

Higher Education Institutions (HEIs) are becoming more accountable for students’ career outcomes, and as competition for jobs in the labor market increases, institutions need to identify students’ employability. This study was a preparation for a machine learning approach to predicting students’ employability. The preparation stage had a great effect on the results of the learning algorithms. The researchers therefore conclude that the Support Vector Machine (SVM) produces the predictive model with the highest accuracy, 91.22%, with a recall of 91.15% and a precision of 91%. The researchers found that gamma is best at 10 to 100 in gamma 5, as shown in Figure 3. The researchers conclude from the learning curve and validation curve that the model was neither overfit nor underfit. For future work, the researchers will analyze which skillsets have higher importance scores in predicting students’ employability. The researchers will use the results of the experiments in a real scenario to validate the model’s accuracy. The model will be used to create a system that predicts the employability of engineering students and determines which skillsets need to be improved.

VI. ACKNOWLEDGMENT
The proponents would like to thank the Career Center of the Technological Institute of the Philippines, especially the SDP Officer and Career Adviser, for their unwavering support in making this study possible.

REFERENCES
[1] T. Mishra, “Students’ Performance and Employability Prediction through Data Mining: A Survey”, 2017.
[2] J. K. Chen, “A pro-performance appraisal system for the University”, Expert Systems with Applications, 20100315.
[3] Pushpendra Singh Rajawat, Deepak Kumar Gupta, Santosh Singh Rathore, Avtar Singh, “Predictive Analysis of Medical Data using a Hybrid Machine Learning Technique”, 2018 First International Conference on Secure Cyber Computing and Communication (ICSCCC).
[4] Wilton W.T. Fok, Y.S. He, H.H. Au Yeung, K.Y. Law, K.H. Cheung, Y.Y. Ai, P. Ho, “Prediction model for students’ future development by deep learning and tensorflow artificial intelligence engine”, 2018 4th International Conference on Information Management (ICIM), 2018.
[5] Yogesh Bharambe, Nikita Mored, Manisha Mulchandani, Radha Shankarmani, Sameer Ganesh Shinde, “Assessing employability of students using data mining techniques”, 2017.
[6] L. Verecio Rommel, “Predicting Employability Skills among Information Technology Graduates of Philippine State University in their On-the Job Training using J48 Algorithm”, Indian Journal of Science and Technology, 2018.
[7] Rahman, N., Tan, K. L., et al., “Predictive analysis and data mining among the employment of fresh graduate students in HEI”, 2017, http://dx.doi.org/10.1063/1.5005340.
[8] Xu, W., Li, Z., Cheng, C., & Zheng, T. (2012). “Data mining for unemployment rate prediction using search engine query data”, Service Oriented Computing and Applications, 7(1), 33–42. https://doi.org/10.1007/s11761-012-0122-2
[9] Sapaat, M. A., Mustapha, A., Ahmad, J., & Chamili, K. (2011). A Data Mining Approach to Construct Graduates Employability Model in Malaysia, 1(4), 1086–1098.
[10] Gao, L. (2015). “Analysis of Employment Data Mining for University Student based on Weka Platform”, 2(4), 130–133. “A Comparison”, International Journal of Innovative Research in Computer and Communication Engineering (An ISO Certified Organization), 3297(6), 4584–4588. Retrieved from www.ijircce.com
[11] Francisco J. García-Peñalvo, Juan Cruz-Benito, Martín Martín-González, Andrea Vázquez-Ingelmo, José Carlos Sánchez-Prieto, Roberto Theró, “Proposing Machine Learning Approach to Analyze and Predict Employment and its factors”, 2017.
[12] M. K. Joyo et al., “Optimized Proportional-Integral-Derivative Controller for Upper Limb Rehabilitation Robot,” Electronics, vol. 8, no. 8, p. 826, 2019.
