A Novel Optimized Approach For Machine Learning Techniques For Predicting Employee Attrition
A Novel Optimized Approach For Machine Learning Techniques For Predicting Employee Attrition
Abstract—As there are numerous opportunities for domain have not discussed about optimizing the results of
competent people throughout the world, workers frequently best performing algorithm. This paper attempts to optimize
switch employers to take advantage of these opportunities, the results using ROC curve method.
which causes a high attrition rate inside the organization. All
firms increasingly view employee attrition as a major problem The cost of hiring, training, and supporting employees
because of its negative impact on workplace productivity and can outweigh the value of human capital investment. This
the timely achievement of corporate goals and vision. paper provides a systematic survey of machine learning
Businesses are using machine learning technology to estimate techniques that are used in employee attrition prediction
worker turnover rates in an effort to solve this problem. In models. These techniques aim to predict whether an
order to most precisely anticipate employee attrition, various individual will leave his/her current place of employment
machine learning algorithms are investigated and their results soon or stay longer. LR algorithm in Machine Learning is
are compared in this study. The current study also optimizes used to analyze categorical target data. LR can be classified
the results of the most efficient machine learning algorithm for either as binomial, multinomial or ordinal. Based on the
the given data using the ROC method. Notably, optimization of provided dataset of independent variables, LR evaluates the
machine learning algorithms has not been studied in earlier likelihood of an event occurring, such as whether an
research works related to employee attrition. In the current employee will quit the organization or not. As the sigmoid
study, an attempt is made to optimize the performance of the
function is used to model the data in logistic regression, only
selected algorithm and a model is proposed.
when a decision threshold is introduced into the equation,
Keywords—AUC, confusion matrix, F1-score, employee logistic regression becomes a classification approach [1].
attrition, machine learning, Precision, Recall, ROC curve The threshold value is an important feature of Logistic
regression and is determined by the classification issue itself.
I. INTRODUCTION The precision and recall levels have a large influence on the
threshold value determination. DT has a tree-like structure,
Machine learning has been a central part of business for
with the core having multiple dataset attributes, branches
decades. It is used for innumerable tasks in different
providing a rule base, and each child node representing the
industries and every day across the world. In recent years,
result. The judgments or tests are constructed using the
machine-learning techniques have become more prevalent in
attributes of the provided dataset. It only poses a Yes/No
tactical decision making and prediction.
question and splits into subtrees in accordance with the
Employee attrition is a key downside risk to response. It is a visual representation of each potential
organizations and is difficult to predict. In recent years, the solution to a dilemma or choice under consideration [2]. On
use of machine learning algorithms in the prediction of the other side, Random Forest averages many decision trees
employee turnover has been explored to arrive at promising applied to various subsets of a given dataset to enhance the
results. This paper aims to develop a machine learning model projected efficiency of the prediction on that dataset. The
to predict employee attrition based on the data that has been random forest gathers projections from each decision tree
gathered by human resources personnel and compares the structure and forecasts the ultimate result based on the
performance of the six machine learning techniques majority vote of predictions, as opposed to relying just on
commonly used to predict employee attrition. The six one decision tree [3]. The accuracy increases and the
techniques assessed in this study are Logistic Regression possibility of overfitting decrease as the size of the forest
(LR), Decision Tree Classifiers (DT), Support Vector increases. Both continuous data, as in regression, and
Machines (SVM), Naïve Bayes Classifiers (NB), K-Nearest discrete data, as in classification, may be handled by the
Neighbors Classifiers (KNN), and Random Forests (RF). Random Forest Algorithm. In categorization tasks, it
The previous studies conducted in this performs better than other algorithms [4]. The Naïve Bayes
strategy is a supervised approach to learning that addresses
______________ classification challenges by applying the Bayes theorem. As
*Corresponding author a probabilistic classifier, it makes predictions based on the
likelihood of an item [5]. The Bayes theorem can be used to
2
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.
was predicted using tree-based models. These models and efficiencies. The accuracy was compared before and
contain random forests and light gradient enhanced trees, after feature selection and parameter tuning, However no
which performed the best. They utilized their own dataset significant change was observed and moreover the execution
with 5550 samples which was generated from anonymously time recorded for these algorithms was also considerably
submitted resumes on Glassdoor. The comparison of the high, hence, fine tuning in the algorithms should be done to
performances is based only on the ROC curves and hence it improve the performance. In [22], the authors compared 4
may not be comprehensive, also the ROC performance Machine Learning algorithms on an employee dataset and
recorded for light gradient enhanced trees of 76% which can observed that KNN and Random Forest returned the most
further be improved and enhanced. accurate results, and random forest algorithm having the
highest precision of 87%. Although the data has been
In [16], the authors performed the EDA of the IBM
visualized and preprocessed, there is still bias in the data
dataset and balanced the dataset, after which they compared
with the number of records for both classes which could have
the results from various classification algorithms namely
resulted in poor results.
random forest, Decision tree, K-Nearest neighbor (KNN),
Logistic regression, and Stochastic Gradient Descent over [23] discovered the disparity in the retrieved data in a
evaluation metrics like Accuracy, Precision, Recall, and F1 study utilizing the IBM HR Employee Attrition &
score with random forest having the most accurate results. Performance dataset. During the data exploration stage, the
The overall comparison has been made on all the algorithms correlation plot and histogram visualization were used to
that were taken into consideration by the authors and it was show the correlation between the continuous variables in the
very comprehensive by considering all the parameters for model. To balance the Attrition class, the SMOTE (Synthetic
accuracy. The authors in [17] selected supervised learning Minority Oversampling Technique) was used. The authors in
techniques to build the ML model to predict the employee [23] concluded that among the five methods, Logistic
attrition and evaluate the performance based on the Regression showed the highest precision accuracy of 87%,
confusion matrix and pseudo R square estimate of error rate however the accuracy could be improved by accurate feature
and concluded that random forest is the ideal model to selection and appropriate scaling.
predict attrition. They have compared various supervised
models for prediction, however they have not compared all Another study was conducted by [16]which trained and
the metrics before arriving at a conclusion. compared various machine learning models including
Decision trees, KNN, SVC and Light GBM and recorded the
In another study [18], the authors proposed a model highest accuracy with Light GBM of 99.13%. The attributes
comparing the performance analysis of 6 ML techniques which impacted the most to the dependent variable were
namely ANN, SVM, Bagging, GBT, RF and DT after carefully selected which led to the overall accuracy of all the
preprocessing and using chi-square feature selection. It was algorithms. They have not only compared the algorithms
observed that GBT performed consistently better than the based on the accuracy that was recorded but also highlighted
other techniques at predicting the employee attrition. The the benefits and tradeoffs of each algorithm. The accuracy
ML techniques were measured for their performance across for all the algorithms discussed was very high although most
various feature selection methods to achieve a clear of them were prone to overfitting.
understanding of the features that are actually contributing to
the accurate prediction. The authors in [19]investigated the A study was conducted in [24] on the data collected from
usage of the Extreme Gradient Boosting (XGBoost) the personnel records of employees in one of the Higher
approach, which is more resilient due to its regularization Institutions in South-West Nigeria which upon cleaning and
formulation. Using data from a multinational retailer's HRIS preprocessing was utilized to predict the employee attrition,
and BLS, XGBoost is compared to six previously used various decision tree models were used and compared
supervised machine learning classifiers, demonstrating much including C4.5 (J48), REPTree, and CART. WEKA
greater accuracy in predicting staff turnover. The findings of Classifier was used to compare the performance based on the
this [19] study show that the XGBoost classifier is a superior TP and FP Rate, precision, recall, F-measure and ROC Area
method for predicting turnover in terms of much greater and concluded that C4.5 performed better than the others
accuracy, comparatively short runtimes, and efficient taken into consideration. Although the data is preprocessed
memory use. In comparison to other classifiers, the before training the models the accuracy recorded is still
formulation of its regularization makes it a robust approach pretty low at 67% for C4.5 and even lower for the other,
capable of tolerating noise in HRIS data, meeting the which could be improved by making minor adjustments
primary issue in this area. thereby improving the prediction accuracy.
In a similar study [20], the authors applied data mining Employee attrition was compared and predicted on
techniques and compared the performance of Decision trees varying sizes of the data using all machine learning
(C4.5, Random Forest) and Neural Networks (MLP, Radial algorithms in [25]. The authors considered two different
Basic Function Network ) and recommended using C4.5 dataset, one collected primarily from a bank and the second
Decision trees with the highest accuracy of 95.14%. The data one being the IBM Watson dataset, and upon preprocessing
was modelled and tested using limited classifiers, and could and feature scaling the models were validated. The effects of
be implemented on other classifiers to obtain the best changes the size of the dataset in accurate prediction and
accuracy and feature selection or attribute reduction could be correlation was studied. The approach towards the problem
performed to ignore the irrelevant attributes. In [21], of employee attrition is very unique and reliable. The
conducted a comparative study to develop machine learning problem is addressed taking all the possible scenarios into
models, i.e., J84 Decision Trees, SVM, and Artificial Neural consideration. Multiple data was considered to provide a
Networks, for predicting probable employee attrition and better understanding of the accuracy of the models.
compare between the algorithms in terms of their accuracy
3
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.
In [26], the authors aimed at optimizing the K-Nearest strong correspondence with ‘YearsAtCompany’. Likewise,
Neighbor classifier to predict employee attrition more ‘YearsAtCompany’ and ‘YearsWithCurrentManager’ are
accurately and precisely. The correlation among the features also strongly associated as well. Another observation from
of the dataset was explored using the Pearson correlation Fig.4 is that the ‘EmployeeCount’ and ‘StandardHours' have
coefficient method. The strongest correlation was recorded no correlation with the other attributes in the dataset as they
with distance from home, the authors proposed the improved contain a constant value, hence these attributes are dropped
KNN algorithm and upon validating the model it yielded an and finally label encoding is applied to the remaining
accuracy of 86.7%. The authors have tried to improve the categorical attributes before feature selection and training the
performance of the existing KNN classifier to predict the model.
employee attrition by proposing the optimized algorithm,
various metrics were also considered to accurately measure
the performance of the proposed algorithm like the k-value,
ROC Curve, reliability and cumulative curve.
III. METHODOLOGY
A. Dataset
The population that was considered under this study was
employees who had been working in organizations, and their
personal as well as professional details were recorded to
predict the attrition, the data was compiled and made
available by IBM HR Analytics, the dataset contained unique
records of 1470 employees and had no missing values[27].
Each employee had 35 attributes including details about their
current job, their personal background and educational
details as well as their personal details including marital
status and relationship satisfaction based on which the target
variable (i.e. Attrition) which is categorical and binary in
nature is generated. Out of the total of 1470 observations, Fig. 1. Employee Attrition Distribution
1233 is No whereas 167 is Yes for attrition attribute. There
are 588 females and 883 male employee in data. The
attributes like geography, domicile, size of industry were not
available in the dataset which can impact the employee
attrition too.
B. Data Preprocessing and Representation
As discussed the dataset contains no missing values, the
categorical attributes containing a constant value (over18)
are identified and dropped from the data. Since
EmployeeNumber contains discrete values it is dropped as
well because it is supposed not to contribute in the
prediction.
Since this problem falls under Binary Classification
problem, i.e. whether the employee would leave the
company or not the distribution is visualized in Fig. 1. As Fig. 2. Job Satisfaction
observed the number of people who actually left the
company are very less compared to those who didn’t, which
further led to understanding how many employees were
actually satisfied working in their current role. As observed
in Fig. 2 almost 80% of employees that forms the majority of
the population considered in the study are actually satisfied
which could result in lower attrition rate in the data.
As per Fig. 3 the employees in the age group of 25-40
have shown the highest attrition rates. To get a better
understanding of the features, a heat map is also generated to
depict the correlation among the attributes. As observed from
Fig. 4, the ‘MonthlyIncome’ and ‘JobLevel’ have a strong
positive correlation as the job level increases the monthly
income increases as well. ‘SalaryHike’ and
‘PerformanceRating' are also correlated, higher the rating Fig. 3. Age-wise Attrition
higher the hike in salary, ‘TotalWorkingYears' has a weak
correlation with the ‘NumCompaniesWorked’ but has a
4
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.
Fig. 4. Correlation Heat-map
5
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.
Fig. 6. Experiment Design
D. Experiment Design
The proposed methodology in Fig. 6 involves splitting
the dataset 80:20 into training and testing/validation data
keeping a constant random state as 0 after preprocessing,
exploratory data analysis, and feature selection. The training
subset of the data is used to train the various algorithms and
generate a model. Upon successful training, the model is
checked for accuracy against the testing data. The other
evaluation parameters used in the study are precision, recall,
F1-score, and AUC. Upon testing the models and recording
the predictions for each model, the predicted value and the
actual value is compared and the results are analyzed using Fig. 7. Optimum k- Value
confusion matrix and ROC curve. Since a variety of machine
highest precision of 84.21%. The algorithm that performed
learning algorithms and techniques can be applied to solve
the lowest was Decision Tree classifier, it not only had the
the employee attrition prediction problem, the classification
lowest accuracy but also showed very poor precision in
techniques considered for comparison in this study are
terms of prediction and underperformed in the other metrics
Logistic-Regression, Decision-Trees, Random-Forest-
as well. The confusion-matrix was created for all the
Classifier, Naïve-Bayes, K-Nearest-Neighbor and Support-
classifiers taken into consideration in this study in order to
Vector-Machines. The main goal is to identify the optimum
account for the amount of incorrect positive and
classifier with the most accurate results. The value of k in K-
incorrect negative results, which is crucial to assessing the
Nearest-Neighbor will be determined by the input data, since
performance of the model in its purest meaning [30][28]. The
data with more outliers or noise would most likely perform
confusion-matrix for each of the methods employed in this
better with larger values of k. Fig. 7 depicts the change in
study is shown from Fig. 8 to Fig. 13.
error with respect to k-values for training and testing data, As
observed, for k=5, the test error is the minimum, hence k is
considered as 5 in this study.
IV.RESULTS AND DISCUSSIONS
After exploration and preprocessing of the data, the most
impactful features were selected, the data was trained and
tested and the accuracy was generated of each model,
Logistic Regression had the highest accuracy among all the
classifiers studied, It was fine tuned to arrive at an accuracy
of 87.76%, Other algorithms also gave promising results, K-
Nearest Neighbor recorded an accuracy of 87.75% even
Fig. 9. Decision Tree Confusion
though Logistic Regression it is prone to overfitting, the Fig. 8. Logistic Regression
Matrix
results recorded are realistic with a precision of 78.26% and Confusion Matrix
KNN was the most precise in terms of prediction and
recorded the
6
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.
Similarly, Decision Tree classifier has recorded the
lowest performance in all the metrics considered in this study
and is therefore, also rejected. Based on the metrics
considered, SVM with the linear kernel also shows
promising results when tested on the data, however, Logistic
Regression is considered as the ideal classifier for the data
under study as it outperforms the other classifiers on all the
validation metrics in terms of accuracy, precision, recall, F1
score and AUC and can accurately predict the employee
Fig. 10. Naïve Bayes Confusion attrition in an organization.
Matrix Fig. 11. Random Forest Confusion
Matrix
TABLE I. RESULT METRICS
Accu- F-1
Precision Recall
Classifier racy Score AUC
(%) (%)
(%) (%)
Logistic
87.76 78.26 36.73 50.00 0.8440
Regression
Decision
77.55 33.33 34.69 34.00 0.6041
Trees
Random
85.03 69.23 18.36 29.03 0.7830
Forest
Fig 12. K-Nearest Neighbour Fig 13. Scalar Vector Machine Naïve
80.27 40.82 40.82 40.82 0.7551
Confusion Matrix Confusion Matrix Bayes
K-Nearest
Fig. 14 shows the combined ROC curve of all the 87.75 84.21 32.65 47.06 0.6788
Neighbor
algorithms with their respective AUC scores. AUC is a
metric that aggregates the performance across all Support
categorization criteria. AUC may be interpreted as the Vector 84.35 55.17 32.65 41.03 0.7637
likelihood that the model rates a random positive case higher Machines
than a random negative example. It is observed that similar
to the above metrics, Logistic Regression has the highest A. Optimising the Model
AUC score of 0.8440. However, SVM and Random Forest As Logistic Regression outperformed the other
shows improved AUC score of 0.7637 and 0.7863 which algorithms considered in this study, using the ROC curve in
implies that they have potential in predicting employee Fig. 14, an attempt is made to optimize and overall improve
attrition correctly. the performance and efficiency of logistic-regression model.
As observed in Table 1, confusion matrices and Fig.14, Since the default threshold value considered while
LR shows the best results, with the highest AUC score of training the logistic regression algorithm is 0.5, based on the
0.844, followed by Random Forest Classifier with the AUC ROC curve and AUC value as observed in Fig. 14, the
score of 0.6788. KNN model trained with k=5 can be threshold value was changed to 0.4 in order to optimize the
considered as an equally efficient model just like Logistic results and performance of the running model. As observed
Regression, since it has similar accuracy to Logistic in Table 2 and Fig.15.
Regression and better precision of 84.21%, Recall and F-1
scores are also comparable, however, the AUC score of The accuracy has reduced upon changing the threshold
KNN is one of the lowest after Decision Tree classifier value, from 87.76% to 86.73%, however, the count of false-
among all the algorithms considered. negative values which are most critical in the study where
the true label is 1, but the predicted label is 0, i.e. the
employee would leave the organization, but the model
predicted that the employee would not leave, has reduced
from 31 to 26, which can be considered as an improvement
of the model. Hence, even though the accuracy has very
slightly reduced, the wrong or incorrect classifications has
improved by optimizing the threshold value.
7
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.
opportunities and hence this is also an important factor while
predicting attrition.
B. Discussion
A company should focus on both its organizational
culture and the work environment if they want to avoid
turnover. It is important to create a motivating work
environment that gives meaning and purpose so that
employees know what their contributions mean. When Fig. 16. Major factors that contribute towards attrition
employees get a feel like their input matters, they feel more Fig. 16 depicts the major factors that contribute towards
invested in staying with the organization for the long-term. attrition of an employee which should be the primarily
At any given time, a business can be losing more than 50% considered during hiring to prevent attrition. Employees in
of its employees. This can lead to inefficiencies, an inability the age group of 25-40 years having the monthly income
to meet goals and deadlines, and lackluster productivity. The between $2000-$3000 with 0-5 years of total working
causes of employee attrition are as varied as the individuals experience and 0-2 years at the company and 6-8 years in the
themselves. current role have a higher tendency to leave the organization.
In this paper, the factors that contributed the most
towards attrition was MonthlyIncome, other features that V. CONCLUSION
played a significant role in predicting attrition of an The current study helps to generate new insights
employee were DailyRate, Age, TotalWorkingYears, regarding employee attrition which cannot be generated by
YearsAtCompany, YearsInCurrentRole, DistanceFromHome merely conducting exit interviews with employee. Income,
and Overtime. Income acts as the biggest motivation for years in current role, age, total working years, and number of
employees working in an organization to continue working years in the same company were the most import factors for
there, if they are satisfied with their income the attrition employee attrition. Employee Attrition Prediction is a
would be reduced and they would be retained, Age also business strategy concerned with the processes of predicting
impacts attrition, as observed in Fig. 3, employees who are how many employees will leave an enterprise over the next
currently in the ages 25 to 40 are more prone to leave the year, and which groups of employees are more likely to be
organization as in the initial years of their career they are on their way out. This information can help companies
filled with the zeal to take on new challenges and risks evaluate their workforce, monitor the effectiveness of
thereby switch companies more often, also the retention programs, and plan for workforce reductions.
responsibilities are comparatively less which makes it easier
for them to leave more easily. This paper identified the factors that strongly affected the
attrition of an employee and the selected features were used
It is also observed from this study that, the more work to train the algorithms and test them against various inputs
experience an individual has in terms of number of years, which were alien to the model and the performance was
and the years they have spent in their current organization recorded to arrive at the optimum method to tackle the
and role would affect their chances of leaving the problem of employee attrition and aid in early prediction of
organization as well which is why it is also an important the same.
feature to consider during hiring and later for retention as
well. Various machine learning algorithms were implemented
out of which Logistic regression gave the most accurate
More often, an employee is unsatisfied if the organization results and outperformed the other methods on all the metrics
they are working for is geographically located far from their considered and the selected model was optimized to reduce
place of residence, as the time spent in commuting increases the misclassifications and can therefore be used for
which also affects the overall productivity of the employee predicting employee attrition most accurately.
and can also sometimes be considered as an added expense.
An employee does not voluntarily choose to work overtime The first step to reduce employee turnover is to find out
unless there is a personal motive in terms of gaining extra where attrition starts. When answering this question it's
remuneration, however if the employee is working overtime important for companies to make sure they're looking at all
but not out of choice it could lead to a poor productivity and possible sources that could lead to attrition- both voluntary
with time the employee would consider looking for other and involuntary- because otherwise they won't know where
they should focus their efforts. A more accurate and
8
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.
insightful prediction of employee attrition relies on a variety Eng. Technol., vol. 11, no. 12, pp. 3329–3341, 2020, [Online].
of factors. It is possible to predict the probability that an Available: https://doi.org/10.34218/IJARET.11.12.2020.313
[19] R. Punnoose and P. Ajit, “Prediction of Employee Turnover in
employee will leave their current job based on their Organizations using Machine Learning Algorithms”, Int. J. Adv.
personality, background, and professional values. It also Res. Artif. Intell., vol. 5, no. 9, pp. 22–26, 2016, doi:
relies on workplace strife, employer prestige, and other non- 10.14569/ijarai.2016.050904.
personal factors. [20] J. Hamidah, H. AbdulRazak, and A. O. Zulaiha, “Towards applying
data mining techniques for talent managements”, 2009 Int. Conf.
With the available dataset, this study optimized the Comput. Eng. Appl. IPCSIT, vol. 2, no. March 2015, pp. 476–481,
results of best performing model out of the six algorithms 2011.
used in the current study. In future with larger datasets, [21] N. Mansor, N. S. Sani, and M. Aliff, “Machine Learning for
Predicting Employee Attrition”, Int. J. Adv. Comput. Sci. Appl., vol.
employee segmentation can be studied to develop ‘at risk’ 12, no. 11, pp. 435–445, 2021, doi:
categories of employees and deep learning algorithm can 10.14569/IJACSA.2021.0121149.
also be employed to study employee attrition. [22] A. Patel, N. Pardeshi, S. Patil, S. Sutar, R. Sadafule, and S. Bhat,
“Employee Attrition Predictive Model Using Machine Learning”,
REFERENCES Int. Res. J. Eng. Technol., no. May, pp. 3855–3859, 2020, [Online].
Available: www.irjet.net
[1] IBM, “What is logistic regression?”, 2022.
[23] K. K. Mohbey, “Employee’s attrition prediction using machine
https://www.ibm.com/topics/logistic-regression (accessed Sep. 20,
learning approaches”, Mach. Learn. Deep Learn. Real-Time Appl.,
2022).
no. January, pp. 121–128, 2020, doi: 10.4018/978-1-7998-3095-
[2] H. H. Patel and P. Prajapati, “Study and Analysis of Decision Tree
5.ch005.
Based Classification Algorithms”, Int. J. Comput. Sci. Eng., vol. 6,
[24] A.A.D. Alao, “Analyzing Employee Attrition using Decision Tree
no. 10, pp. 74–78, 2018, doi: 10.26438/ijcse/v6i10.7478.
Algorithms”, Inf. Syst. Dev. Informatics, vol. 4, no. 1, pp. 17–28,
[3] L. Breiman, “Random Forests”, Mach. Learn., vol. 45, no. 1, pp. 5–
2013, [Online]. Available:
32, 2001, doi: 10.1023/A:1010933404324.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1012.294
[4] A. Sarica, A. Cerasa, and A. Quattrone, “Random Forest Algorithm
7&rep=rep1&type=pdf
for the Classification of Neuroimaging Data in Alzheimer’s Disease:
[25] Y. Zhao, M. K. Hryniewicki, F. Cheng, B. Fu, and X. Zhu,
A Systematic Review”, Front. Aging Neurosci., vol. 9, 2017, doi:
"Employee Turnover Prediction with Machine Learning: A Reliable
10.3389/fnagi.2017.00329.
Approach", vol. 869. Springer International Publishing, 2018. doi:
[5] I. Rish, “An Empirical Study of the Naïve Bayes Classifier An
10.1007/978-3-030-01057-7_56.
empirical study of the naive Bayes classifier”, Cc.Gatech.Edu, no.
[26] T.A. Assegie, “A Predictive Model for Improving Employee
January 2001, pp. 41–46, 2014, [Online]. Available:
Attrition Rate With K-Nearest Neighbor Classifier”, Int. J. of
https://www.cc.gatech.edu/~isbell/reading/papers/Rish.pdf
Research and Reviews in App. Sci., vol. 46, no. 1, pp. 78–84, 2021,
[6] S. Raschka, “Naive Bayes and Text Classification I - Introduction
[Online]. Available:
and Theory.” arXiv, 2014. doi: 10.48550/ARXIV.1410.5329.
www.arpapress.com/Volumes/Vol46Issue1/IJRRAS_46_1_09.pdf
[7] Z. Zhang, “Introduction to machine learning: K-nearest neighbors”
[27] IBM, IBM HR Analytics Employee Attrition & Performance. 2019.
Ann. Transl. Med., vol. 4, no. 11, pp. 1–7, 2016, doi:
[Online]. Available: https://github.com/IBM/employee-attrition-
10.21037/atm.2016.03.37.
aif360
[8] D. SBoswell, “An Introduction to Support Vector Machines”,
[28] M. Hasnain, M. F. Pasha, I. Ghani, M. Imran, M. Y. Alzahrani, and
Recent Adv. Trends Nonparametric Stat., pp. 3–17, 2002, doi:
R. Budiarto, “Evaluating Trust Prediction and Confusion Matrix
10.1016/B978-044451378-6/50001-6.
Measures for Web Services Ranking”, IEEE Access, vol. 8, pp.
[9] D. Singh, “A Literature Review on Employee Retention with Focus
90847–90861, 2020, doi: 10.1109/ACCESS.2020.2994222.
on Recent Trends”, Int. J. Sci. Res. Sci. Eng. Technol., no. May, pp.
425–431, 2019, doi: 10.32628/ijsrst195463.
[10] M. Pratt, M. Boudhane, and S. Cakula, “Employee Attrition
Estimation Using Random Forest Algorithm”, Balt. J. Mod.
Comput., vol. 9, no. 1, pp. 49–66, 2021, doi:
10.22364/BJMC.2021.9.1.04.
[11] F. Fallucchi, M. Coladangelo, R. Giuliano, and E. W. De Luca,
“Predicting Employee Attrition Using Machine Learning
Techniques”, Computers, vol. 9, no. 4, pp. 1–17, 2020, doi:
10.3390/computers9040086.
[12] R. Yedida, R. Reddy, R. Vahi, R. Jana, A. GV, and D. Kulkarni,
“Employee Attrition Prediction”, arXiv, 2018. doi:
10.48550/ARXIV.1806.10480.
[13] S. Al-Darraji, D. G. Honi, F. Fallucchi, A. I. Abdulsada, R.
Giuliano, and H. A. Abdulmalik, “Employee Attrition Prediction
Using Deep Neural Networks Salah”, Computers, vol. 10, no. 11,
pp. 1–11, 2021, doi: 10.3390/computers10110141.
[14] S. Gupta, “Employee Attrition Rate Prediction Using Machine
Learning”, Code Algorithms Pvt. Ltd, 2022.
https://www.enjoyalgorithms.com/blog/attrition-rate-prediction-
using-ml
[15] F. K. Alsheref, I. E. Fattoh, and W. M Ead, “Automated Prediction
of Employee Attrition Using Ensemble Model Based on Machine
Learning Algorithms”, Comput. Intell. Neurosci., vol. 2022, p.
7728668, 2022, doi: 10.1155/2022/7728668.
[16] S. Aggarwal, M. Singh, S. Chauhan, M. Sharma, and D. Jain,
“Employee Attrition Prediction Using Machine Learning
Comparative Study”, Smart Innov. Syst. Technol., vol. 265, no. 9,
pp. 453–466, 2022, doi: 10.1007/978-981-16-6482-3_45.
[17] D. R. S. Kamath, D. S. S. Jamsandekar, and D. P. G. Naik,
“Machine Learning Approach for Employee Attrition Analysis”, Int.
J. Trend Sci. Res. Dev., vol. Special Is, no. Special Issue-
FIIIIPM2019, pp. 62–67, 2019, doi: 10.31142/ijtsrd23065.
[18] M. Subhashini and R. Gopinath, “Employee Attrition Prediction in
Industry Using Machine Learning Techniques”, Int. J. Adv. Res.
9
Authorized licensed use limited to: Dr. D. Y. Patil Educational Complex Akurdi. Downloaded on July 23,2024 at 05:38:42 UTC from IEEE Xplore. Restrictions apply.