Cdu 1121 09
[email protected], [email protected], [email protected]
Keywords: prediction, employee attrition, class imbalance, data cleaning, feature selection,
classification
1. Introduction
Employees are a crucial part of any organization; they are the key asset responsible for its growth. Without employees, projects can neither be scaled up nor completed within deadlines. Companies therefore increasingly invest in their employees by providing a good working atmosphere and benefits such as insurance, paid leave, and transport.
Today, many organizations face a serious problem referred to as employee attrition: a worker retiring from or resigning from a firm. Various factors influence employees to quit a company, such as work pressure, low growth, and poor job satisfaction. In one survey of 1,000 employees, 31% reported quitting a job within their very first six months at the company [1].
Reports indicate that over 50% of companies globally face issues with employee retention. Corporations spend considerable resources and time recruiting and training new employees, and replacing an experienced workforce with freshers is an expensive, lengthy process. Publicly available estimates suggest it takes around 1-2 years for a newly hired employee to match the speed, productivity, and knowledge level of an existing one. If this problem continues to grow, it will over time result in substantial knowledge loss, and also financial loss, for companies [2].
* Rakshith.A.C, Student, Dept. of Computer Applications, JSS Science and Technology University, Mysuru, India. [email protected]
To overcome the employee attrition problem, employers have in recent times started estimating which employees are likely to resign, which helps reduce non-scheduled staff-replacement costs. By predicting employee attrition at an early stage, corporations can start hiring new candidates in advance, helping them cope with project deadlines [3].
For this work, multiple machine learning models, namely Naive Bayes, Random Forest, Support Vector Classifier (SVC), k-Nearest Neighbours (kNN), Decision Trees, and eXtreme Gradient Boosting (XGBoost), are used to predict employee attrition by classifying the employee data into two categories: Employee will stay and Employee will exit.
2. Literature Survey
One study used monthly work reports of 3,638 software developers from two IT companies. Three datasets were considered for the experiment: the dataset of company C1, the dataset of company C2, and the combination of both companies' reports into a single dataset. 67 features along six dimensions were extracted from each employee's first six months of report data; the six dimensions include total hours worked per month, complete statistics of hours worked and projects done, task-report statistics, readability of task reports, and monthly project statistics. According to this study, the main reasons developers left were heavy workload and tight deadlines, and the Random Forest classifier gave good results [3].
Another work used the dataset provided by IBM analytics, which consists of around 1,500 samples and 35 features. The dataset was imbalanced, with 84% of the workers staying and 16% leaving. The study found "Monthly Income" to be the major factor leading workers to quit. Gaussian Naive Bayes, the Naive Bayes classifier for multivariate Bernoulli models, the Logistic Regression classifier, kNN, the Decision Tree classifier, Random Forest, SVM classification, and Linear SVM were used, and the Gaussian Naive Bayes algorithm gave the best result [2].
A new ERP (Employee Resignation Predictor) approach has been reported, in which publicly available professional profiles on LinkedIn were mined and features then extracted. Using data mining, the crawler collected 120,000 expert profiles, from which 11 features were selected. Three classification algorithms, Decision Trees, Back Propagation, and Self-Organizing Maps, were used to evaluate the ERP approach. The work demonstrated that the Decision Tree algorithm outperformed Back Propagation and Self-Organizing Maps with good accuracy [4].
Various data pre-processing techniques have also been discussed, focusing on outlier detection; handling missing values (ignoring instances, mean substitution, hot-deck imputation, and more); discretization; data normalization (min-max and z-score normalization); feature selection (based on distance, information, dependence, or consistency, and, by selection movement, Sequential Backward Floating Selection (SBFS) and Sequential Forward Floating Selection (SFFS)); and feature construction performed by the GALA algorithm [5].
A study has described several techniques for handling imbalanced datasets, discussing previously proposed solutions to class imbalance at both the algorithmic and the data levels. At the data level, random undersampling and random oversampling are non-heuristic methods: random oversampling balances the classes by randomly duplicating minority-class examples, while random undersampling does so by eliminating random examples of the majority class. At the algorithmic level, the solutions include the threshold method, one-class learning, and cost-sensitive learning, which requires defining fixed, unequal misclassification costs between the classes [6].
Different variants of the Decision Tree algorithm were used in another work. The dataset consisted of 309 worker records from a reputed company in Nigeria; 9 features were extracted from the records, of which only 6 (Sex, State of Origin, Length of Service, Rank, Salary, and Reason for Leaving) were used for modelling. Two tools were used for the experiments. The first was WEKA (Waikato Environment for Knowledge Analysis), developed by the University of Waikato in New Zealand, which provides a complete suite for machine learning, including visualization tools and ML models for predictive modelling and data analysis, along with a GUI. The second was See5, used for discovering patterns in the data. The classifiers C4.5 (J48), REPTree, and CART were used; among these, the See5 decision tree gave favourable results compared to the other algorithms, and the attributes that contributed most to employees' decisions to leave the organization were Salary and Length of Service [7].
3. Dataset
In this work, we used two datasets: the Generic Employee dataset and the HR Analytics: Job Change of Data Scientists dataset. Both contain historical employee data labelled with two categories: Employee will stay and Employee will leave.
The Generic Employee dataset was collected from GitHub and comprises 14,999 samples along with 11 (9+2) features [8].
The HR Analytics: Job Change of Data Scientists dataset is available on Kaggle; it has 19,159 entries and 14 parameters [9].
Sample records from the Data Scientist dataset (empty cells shown as "-"):

major_discipline  experience  company_size  company_type    last_new_job  training_hours  target
STEM              >20         -             -               1             36              1
STEM              15          50-99         Pvt Ltd         >4            47              0
STEM              5           -             -               never         83              0
Business Degree   <1          -             Pvt Ltd         never         52              1
STEM              >20         50-99         Funded Startup  4             8               0
4. Proposed System
In this work, we propose a new way to handle the employee attrition problem; it addresses the limitations of existing systems and is trained and evaluated on larger data. Two different datasets are used for the experiments: one covers generic employee departments such as sales, accounting, and support, while the other covers only data scientists. Employee satisfaction is considered, since it is one of the key factors determining whether an employee will depart the company/organization or not.
Multiple classifiers: kNN, Decision Trees, SVC, Random Forest, Naïve Bayes, and
XGBoost are used.
(Figure: system workflow comprising Exploratory Data Analysis (EDA), Data Preprocessing, and Feature Selection.)
The Generic Employee dataset is available as two separate sets, one with 9 attributes and the other with 2. In this phase, the separate sets are combined using the set_index() and join() functions, and the features satisfaction_level and last_evaluation, which contained null values, are filled with the mean of the respective feature. Two attempts are made on the Data Scientist dataset: in Attempt 1 the missing values are filled with the mean of the respective features, while in Attempt 2 all rows with even a single missing value are deleted.
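As a minimal sketch of this merge-and-impute step (the tiny frames, file contents, and values below are illustrative stand-ins, not the real data):

```python
import numpy as np
import pandas as pd

# Two parts of the Generic Employee dataset (values are illustrative).
part_a = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "satisfaction_level": [0.38, np.nan, 0.72],
    "last_evaluation": [0.53, 0.86, np.nan],
})
part_b = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "department": ["sales", "support", "accounting"],
})

# Combine the two sets on the shared key, as described above.
merged = part_a.set_index("employee_id").join(part_b.set_index("employee_id"))

# Fill nulls in the two affected features with the feature mean.
for col in ["satisfaction_level", "last_evaluation"]:
    merged[col] = merged[col].fillna(merged[col].mean())

print(merged.isna().sum().sum())  # no missing values remain
```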
In this step, the data is explored in depth, both statistically and visually. The datasets are checked for missing values, class imbalance, and categorical features. The visualization libraries Matplotlib and Seaborn are used to examine the data in many ways. Plots such as the correlation plot are used for feature selection, while the pairplot technique produces a grid of scatterplots showing the pairwise bivariate distributions for every (n, 2) combination of variables as a matrix of plots, with the univariate plots along the diagonal.
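The correlation matrix that underlies such plots can be computed directly with pandas; a small sketch on toy stand-in data follows (rendering it visually is then a single Seaborn call, e.g. sns.heatmap(corr) or sns.pairplot(df)):

```python
import pandas as pd

# Toy frame standing in for the employee data (values are illustrative;
# left = 1 means the employee exited).
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72, 0.37],
    "average_montly_hours": [157, 262, 272, 223, 159],
    "left": [1, 0, 1, 0, 1],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Features most (anti-)correlated with attrition surface immediately.
print(corr["left"].sort_values())
```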
Features that are irrelevant or only partly relevant can negatively impact a model's performance, so feature selection is performed to retain only those features which contribute most to the output variable. Three feature-selection methods are employed: Correlation Matrix with Heatmap, Feature Importance, and Univariate feature selection. However, only SelectKBest, a univariate feature-selection method, is ultimately used because of its accuracy. This method scores the features with a statistical test between X and y and selects the k highest-scoring features; its built-in Chi-Square (chi2) test is used here. The employee_id feature of the Generic Employee dataset, and the three features gender, company_size, and enrolled_id of the Data Scientist dataset, are deleted since they are of no use for prediction [10].
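A sketch of SelectKBest with the chi2 scoring function, on a toy non-negative feature matrix invented for illustration (chi2 requires non-negative inputs):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Illustrative features (e.g. years at company, hours, a binary flag)
# and binary attrition labels.
X = np.array([
    [1, 30, 0],
    [9,  2, 1],
    [2, 28, 0],
    [8,  3, 1],
    [1, 25, 0],
    [9,  1, 1],
])
y = np.array([0, 1, 0, 1, 0, 1])

# Score every feature with the chi-squared test and keep the best k.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X_new.shape)             # (6, 2): only the top-2 features remain
print(selector.get_support())  # boolean mask of the selected columns
```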
At this stage, the dataset is prepared for the classification algorithms. First, the categorical variables are converted to numerical features: the Generic Employee dataset had 2 categorical features, which are converted using the LabelEncoder() function, while the Data Scientist dataset had 10, which are transformed manually through user-defined methods. The datasets are then split into training and testing sets using the train_test_split() function. For a few models such as kNN and SVC, data normalization is necessary, since they work on the basis of distances between data points, and large distances may lead to inaccuracies. The StandardScaler() function is used to bring the features onto a comparable, smaller scale.
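The encode-split-scale sequence might look like the following sketch (toy values; note the scaler is fitted on the training split only, so no information leaks from the test set):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Illustrative data: one categorical and one numeric feature.
dept = np.array(["sales", "support", "sales", "accounting", "support", "sales"])
hours = np.array([[157.0], [262.0], [272.0], [223.0], [159.0], [240.0]])
y = np.array([1, 0, 1, 0, 1, 0])

# 1) Categorical -> numeric with LabelEncoder.
dept_enc = LabelEncoder().fit_transform(dept)
X = np.column_stack([dept_enc, hours])

# 2) Train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# 3) Standardize: fit on the training set, apply to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.mean(axis=0))  # ~0 per feature after scaling
```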
Class imbalance is a problem where the distribution of samples across the classes is unequal. It is described as a ratio: a slight class imbalance of 4:6 is negligible, whereas a severe imbalance of 1:3, 1:5, or 1:100 is not, as it biases models towards the class with more samples (the majority class); it should therefore be treated with resampling techniques. Two methods from the imblearn library, random oversampling and NearMiss undersampling, are employed to handle class imbalance. RandomOverSampler() randomly duplicates samples of the minority class until the dataset is balanced, while the NearMiss() undersampling method selects which majority-class entries to keep. The latter is an efficient method, since it keeps the essential samples and deletes the less important records [11].
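Random oversampling is simple enough to sketch with the standard library alone (imblearn's RandomOverSampler provides the equivalent, plus bookkeeping); the rows and labels below are illustrative:

```python
import random
from collections import Counter

random.seed(0)

# Illustrative imbalanced data: label 1 ("will leave") is the minority.
samples = [("row%d" % i, 0) for i in range(10)] + [("row10", 1), ("row11", 1)]

counts = Counter(label for _, label in samples)
minority = min(counts, key=counts.get)
deficit = max(counts.values()) - counts[minority]

# Random oversampling: duplicate randomly chosen minority-class rows
# until both classes have the same number of samples.
minority_rows = [s for s in samples if s[1] == minority]
balanced = samples + [random.choice(minority_rows) for _ in range(deficit)]

print(Counter(label for _, label in balanced))  # both classes now equal
```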
The two datasets are experimented on with the six classification algorithms in three settings: the imbalanced dataset, the dataset balanced with oversampling, and the dataset balanced with undersampling. The ML models are then trained and tested for their performance, and hyperparameter tuning is done for three of the models used on the Data Scientist dataset.
Two metrics are used for evaluation: the F1 score and accuracy. The F1 score measures the performance of models fed with the imbalanced dataset [12]; it is the harmonic mean of recall and precision, since it internally combines those two metrics. The accuracy score measures the performance of models fed with a dataset having equal class distribution; accuracy is the proportion of all accurately classified cases [13].
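The two metrics can be worked through by hand on a small illustrative set of true labels and predictions (1 = employee will exit):

```python
# Toy labels and predictions, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 2
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1

precision = tp / (tp + fp)  # 2/3
recall = tp / (tp + fn)     # 2/3

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy: fraction of all correctly classified cases.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(round(f1, 4), round(accuracy, 4))  # 0.6667 0.75
```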
5. Results and Discussion

Sl. No.  Model          Imbalanced (F1 %)  Oversampled (Acc. %)  Undersampled (Acc. %)
3        Random Forest  98                 99                    98
4        SVC            92                 96                    93
5        XGBoost        95                 97                    96
6        kNN            94                 97                    93
In the experiments performed, all the models except Naive Bayes performed well on both the balanced and the imbalanced datasets, and Random Forest gave the highest accuracy in every case.
On the Generic Employee dataset, all models excluding Naive Bayes have an average accuracy/F1 score of 94 across all three cases (imbalanced, oversampled, and undersampled), so no hyperparameter tuning was needed. Moreover, the dataset was clean, with minimal missing values, which allowed the models to be highly accurate. The best-performing classifier on this dataset is Random Forest.
In contrast, hyperparameter tuning was required to increase the accuracy of the models used on the Data Scientist dataset. Two attempts, Attempt 1 and Attempt 2, were made to improve accuracy; in both, the models performed better only on the oversampled dataset. In Attempt 1, the models with the highest accuracies were Random Forest at 78% and XGBoost at 77%. Hyperparameter tuning was applied to SVC, Random Forest, and XGBoost. After tuning, the XGBoost model overfitted, while SVC's accuracy increased by 2%, from 74% to 76%. The Random Forest classifier overfitted with an accuracy of 78% before tuning; after tuning it no longer overfitted and gave the best result. The hyperparameters used for the Random Forest model were criterion set to 'gini', n_estimators of 500, and max_depth of 7.
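With those stated hyperparameters, the tuned model can be sketched in scikit-learn (the synthetic data below merely stands in for the oversampled Data Scientist features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic stand-in: 200 samples, 5 features, a separable-ish label.
rng = np.random.default_rng(42)
X = rng.random((200, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Random Forest with the hyperparameters reported above.
clf = RandomForestClassifier(
    criterion="gini", n_estimators=500, max_depth=7, random_state=42)
clf.fit(X, y)

print(round(clf.score(X, y), 3))  # training accuracy on the toy data
```

Capping max_depth at 7 is what curbs the overfitting described above: shallower trees generalize better than fully grown ones, while the 500-tree ensemble keeps the aggregated prediction stable.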
The Data Scientist dataset had too many missing values, due to which the models could not perform well despite hyperparameter tuning. Hence another attempt (Attempt 2) was made on this dataset: accuracy was improved by deleting all rows with even a single missing value, and the results were positive. This time, the hyperparameter-tuned Random Forest and SVC models achieved accuracies of 86% and 85% respectively, followed by the vanilla XGBoost classifier at 83%. The models' performance is therefore dependent on the dataset, and a good dataset can increase the performance of the algorithms.
6. Conclusion
The application of machine learning is expanding rapidly across domains, thanks to the availability of large data and the growth of data science, which together enable accurate, objective, data-driven decisions.
In this application of machine learning to predicting employee attrition, the Random Forest classifier performed best on both the balanced and the imbalanced datasets, owing to its ensemble of many decision trees whose predictions are aggregated, while the weakest results came from the Naive Bayes model. The algorithms gave their best results on the oversampled datasets. Therefore, apart from good classification algorithms and hyperparameter tuning, it is equally important to have a clean dataset; a poor one leads to poor model performance.
7. References
[1] 20 Surprising Employee Retention Statistics You Need to Know, https://blog.bonus.ly/surprising-
employee-retention-statistics.
[2] Fallucchi, Francesca, Marco Coladangelo, Romeo Giuliano, and Ernesto William De Luca. "Predicting
Employee Attrition Using Machine Learning Techniques." Computers 9, no. 4 (2020): 86.
[3] Bao, Lingfeng, Zhenchang Xing, Xin Xia, David Lo, and Shanping Li. "Who will leave the company?: a large-scale industry study of developer turnover by mining monthly work report." In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp. 170-181. IEEE, 2017.
[4] de Jesus, Ana Carolina C., Márcio Enio GD Júnior, and Wladmir C. Brandao. "Exploiting linkedin to
predict employee resignation likelihood." In Proceedings of the 33rd Annual ACM Symposium on
Applied Computing, pp. 1764-1771. 2018.
[5] Kotsiantis, Sotiris B., Dimitris Kanellopoulos, and Panagiotis E. Pintelas. "Data preprocessing for supervised leaning." International Journal of Computer Science 1, no. 2 (2006): 111-117.
[6] Kotsiantis, Sotiris, Dimitris Kanellopoulos, and Panayiotis Pintelas. "Handling imbalanced datasets: A
review." GESTS International Transactions on Computer Science and Engineering 30, no. 1 (2006): 25-
36.
[7] Alao, D. A. B. A., and A. B. Adeyemo. "Analyzing employee attrition using decision tree algorithms."
Computing, Information Systems, Development Informatics and Allied Research Journal 4, no. 1 (2013):
17-28.
[8] Dataset 1: Generic Employee, https://github.com/pydeveloperashish/Predicting-which-of-your-Employee-will-Quit-your-Company-Data-Science-Project and https://www.kaggle.com/arvindbhatt/hrcsv.
[9] Dataset 2: DataScientist, https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists.
[10] Feature Selection For Machine Learning in Python, https://machinelearningmastery.com/feature-selection-machine-learning-python.
[11] 10 Techniques to deal with Imbalanced Classes in Machine Learning,
https://www.analyticsvidhya.com/blog/2020/07/10-techniques-to-deal-with-class-imbalance-in-machine-
learning.
[12] F1 Score – Classification Error Metric, https://www.journaldev.com/45165/f1-score-in-python.
[13] Accuracy vs F1 – Score, https://medium.com/analytics-vidhya/accuracy-vs-f1-score-6258237beca2.