DATA4800 Report

Uploaded by

Liza Sengupta
Copyright © All Rights Reserved

Machine Learning/AI for a Business Problem

Student Name : Anil Kumar


Kaplan Business School
https://docs.google.com/document/d/1i2WFImiJEOsHlrmflBVA1Js5KjP0QwuKti3L8z0_xL0/edit?usp=sharing

Abstract

This report studies how people analytics can be used to reduce the employee churn rate. The goal is to forecast the probability that an active employee will leave and to determine the attributes driving turnover. This is a supervised classification problem in which machine learning techniques are used to identify patterns and build predictive models. The results inform recommendations to increase staff retention. The report covers raw data transformations, preprocessing, data exploration, and model selection (Lazzari, Alvarez and Ruggieri, 2022).

1. Introduction

1.1. About Organisation

This organization is a division of IBM that deals with software solutions. It is involved in a number of activities such as cloud, artificial intelligence and analytics. It has employees from all around the world, so managing human resources well is important to its success.

1.2. Current Operations

As previously mentioned, the activities of IBM include cloud computing, artificial intelligence, and analytics. The organization has numerous employees around the world, making proper human resource management necessary. However, the company still employs traditional practices in the HR department, which can be inefficient and inaccurate. The inability to forecast employee turnover has contributed to escalating costs and to unhappy, less productive employees (Bahadır et al., 2021).

1.3. Application of AI

The application of artificial intelligence in this organization would transform how employee turnover is handled. At the moment, the company seems to use a traditional manual approach that can be slow and that mostly works by responding to problems as they occur. AI can analyze a large amount of data within a short time and derive factors that go unnoticed by individuals scrutinizing the data. It helps to identify possible risks of attrition, thus enabling HR departments to intervene before damage is done (Wang and Zhi, 2021). AI improves efficiency with less manual effort and makes it possible to employ appropriate strategies to increase retention. Furthermore, AI can improve its prediction capabilities by learning from new data (Chakraborty et al., 2021).

2. Problem Statement

High employee turnover is one of the main issues observed in organizations. Turnover refers to the rate at which employees quit their jobs and are replaced by new personnel. Typically, high turnover is disadvantageous to an organization as it increases hiring and training costs, staff attrition, and interruptions to productivity. Understanding why employees leave is crucial to formulating and implementing retention strategies. Some of the possible drivers of turnover are job satisfaction, pay structure, promotion, and flexibility of working hours (Anh et al., 2020).

3. Methodology

Predictive analytics can be a valuable tool to identify employees who may be thinking about quitting and to analyze the root of their discontent. By analyzing employee demographics, roles, performance indices, and other related factors, organizational leaders can develop ways of improving employee engagement and retention. This report applies machine learning techniques using Orange Data Mining (ODM) to extract insights about employee turnover and to determine the probability of active employees leaving the company. The ultimate aim is to build strategies to maximize staff retention (El-Rayes et al., 2020).

3.1. Data Source

The dataset used in this case study is publicly available on Kaggle under the name IBM HR Analytics Employee Attrition & Performance. It contains numerous variables describing employees, including age, seniority, gender, occupation, salary, and productivity. The target variable is ‘Attrition’, a binary variable where ‘Yes’ means the employee has left and ‘No’ means the employee is still with the company.
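
As a minimal sketch of how the binary target described in Section 3.1 could be read outside Orange, the Python snippet below parses a few rows shaped like the Kaggle CSV and maps ‘Attrition’ to 1/0. The sample rows are made up for illustration, and the helper names `encode_attrition` and `load_rows` and the added field `AttritionFlag` are hypothetical, not part of the dataset or of the report's workflow.

```python
import csv
import io

# Made-up rows shaped like the Kaggle CSV (not real employee records).
SAMPLE_CSV = """Age,Attrition,Department,MonthlyIncome
34,Yes,Sales,4200
45,No,Research & Development,6100
29,Yes,Human Resources,2800
"""


def encode_attrition(value):
    """Map the 'Yes'/'No' target label to 1/0 (hypothetical helper)."""
    mapping = {"Yes": 1, "No": 0}
    return mapping[value.strip()]


def load_rows(text):
    """Parse CSV text into dicts and add a numeric 'AttritionFlag' field."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["AttritionFlag"] = encode_attrition(row["Attrition"])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
attrition_rate = sum(r["AttritionFlag"] for r in rows) / len(rows)
print(f"{len(rows)} rows, attrition rate {attrition_rate:.2%}")
```

An unexpected label (anything other than ‘Yes’ or ‘No’) raises a `KeyError` here, which is a deliberate choice in this sketch so that dirty target values are noticed rather than silently encoded.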

3.2. Orange Configuration

Fig 1 : Orange WorkFlow

3.3. Exploratory Data Analysis (EDA)

EDA involves summarizing and visualizing the main characteristics of the dataset. Before performing EDA, however, the data must be preprocessed. Preprocessing is essential to ensure the quality and usability of the dataset. It involves handling missing values, encoding categorical variables, and scaling numeric features.

Our dataset is quite clean and contains no missing values. However, it contains some categorical features that need to be encoded, and the numeric data needs to be scaled. After that we perform Exploratory Data Analysis.

Fig 2: Business Travel

The majority of employees fall under the Travel-Rarely category, which has a 14.96% attrition rate. The other two categories, Travel-Frequently and Non-Travel, have attrition rates of 24.91% and 8% respectively.

Fig 3: Department

Research & Development is the department with the most employees, though it has a moderate level of attrition. The attrition rate is higher in the Sales department, implying that there may be problems retaining employees within this department. The Human Resources department has the highest retention rate, which means that employees in this department are likely to remain in the company.

Fig 4: Education

Fig 4 shows that employees with level 3 education, i.e., mid-level education, are more likely to leave than those with lower or higher education.
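
The per-category attrition rates reported in this section are simple group-wise proportions. A minimal Python sketch of that calculation follows; the records are made up for illustration, the column name `BusinessTravel` is an assumption about how the field is labelled, and `attrition_rate_by` is a hypothetical helper rather than anything provided by Orange.

```python
from collections import defaultdict


def attrition_rate_by(rows, column):
    """Return {category: share of rows in that category with Attrition == 'Yes'}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        if row["Attrition"] == "Yes":
            leavers[key] += 1
    return {key: leavers[key] / totals[key] for key in totals}


# Made-up records for illustration only (not drawn from the dataset).
sample = [
    {"BusinessTravel": "Travel-Rarely", "Attrition": "No"},
    {"BusinessTravel": "Travel-Rarely", "Attrition": "No"},
    {"BusinessTravel": "Travel-Rarely", "Attrition": "Yes"},
    {"BusinessTravel": "Travel-Rarely", "Attrition": "No"},
    {"BusinessTravel": "Travel-Frequently", "Attrition": "Yes"},
    {"BusinessTravel": "Travel-Frequently", "Attrition": "No"},
    {"BusinessTravel": "Non-Travel", "Attrition": "No"},
    {"BusinessTravel": "Non-Travel", "Attrition": "No"},
]

rates = attrition_rate_by(sample, "BusinessTravel")
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.2%}")
```

The same helper can be pointed at any categorical column, for example department or education level, by changing the `column` argument.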

Fig 5: Years with Current Manager

Fig 5 reveals that employees with less than two years under their current manager show a higher attrition rate, implying that the time spent with the current manager influences the decision to leave the organization. The attrition rate is lower for employees who have worked with their current manager for a medium to long time (3 to 9+ years).

3.4. Model 1 : Random Forest

This model performs exceptionally well on training data, with AUC, CA, F1, Precision, Recall, and MCC close to 1.000, indicating that it can capture complex patterns effectively.

However, its performance declines on test data, with AUC at 0.759, CA at 0.861, and MCC at 0.345, which indicates that the model has overfit. The low MCC value on test data indicates a poor balance among true and false positives and negatives. (Pratt, Boudhane and Cakula, 2021)

Fig 6 : Random Forest Configuration

3.5. Model 2 : Gradient Boosting

This model scores 1.000 on every metric on the training data, which indicates that it fits the training data very closely. However, there is a significant gap between its performance on training data and on test data: the AUC and MCC are 0.781 and 0.478 respectively on the test data. (Wang and Zhi, 2021)

Fig 7: Gradient Boosting Configuration

3.6. Model 3 : Neural Network

The Neural Network performs well on both training and testing data. The AUC, CA, F1 and MCC for training data are 0.938, 0.926, 0.920, and 0.704. For testing data, these are 0.817, 0.871, 0.858, and 0.447. This indicates that it has identified the underlying patterns without being overly affected by noise. The comparatively small gap between training and test metrics suggests a balanced model with less overfitting. (Wang and Zhi, 2021)
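
The CA, precision, recall, F1 and MCC values quoted for each model can all be derived from a confusion matrix. The Python sketch below shows the standard positive-class definitions. The function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the report's Orange results. A tool may also report class-averaged rather than positive-class precision and recall, so values computed this way need not match the figures quoted in this report.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy (CA), precision, recall, F1 and MCC from confusion-matrix counts.

    The positive class is taken to be "employee left" (an assumption of this sketch).
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews correlation coefficient uses all four cells of the matrix.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "CA": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1": f1,
        "MCC": mcc,
    }


# Hypothetical counts: 50 actual leavers among 300 employees.
metrics = classification_metrics(tp=20, fp=10, fn=30, tn=240)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, CA is about 0.87 while MCC is about 0.45, which illustrates how a model can score well on accuracy yet only moderately on MCC when leavers are the minority class.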

Fig 8 : Neural Network Configuration

3.7. Model 4 : Logistic Regression

Logistic Regression model has a stable performance on training and test data. The AUC, CA, and MCC values for training data are 0.838, 0.874, and 0.438 respectively. For testing data, these are 0.847, 0.864, and 0.376. This indicates that the model has generalized the patterns effectively without overfitting. (Chakraborty et al., 2021)

Fig 9 : Logistic Regression Configuration

3.8. Performance Summary

We implemented Random Forest, Gradient Boosting, Neural Network, and Logistic Regression. The metrics for these on training and testing data are as follows:

Training Data

Metric      Random Forest   Gradient Boosting   Neural Network   Logistic Regression
AUC         1.000           1.000               0.938            0.838
CA          0.981           1.000               0.926            0.874
F1 Score    0.981           1.000               0.920            0.850
Precision   0.982           1.000               0.924            0.865
Recall      0.981           1.000               0.926            0.874
MCC         0.930           1.000               0.704            0.438

Test Data

Metric      Random Forest   Gradient Boosting   Neural Network   Logistic Regression
AUC         0.817           0.781               0.817            0.847
CA          0.864           0.878               0.871            0.864
F1 Score    0.830           0.865               0.858            0.839
Precision   0.852           0.865               0.856            0.846
Recall      0.864           0.878               0.871            0.864
MCC         0.358           0.478               0.447            0.376

Fig 10: ROC Curve
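
One way to read the training and test tables in Section 3.8 side by side is to compute, per model, the drop from training to test score and the best test score per metric. The Python sketch below does this for AUC, CA and MCC using the values from those tables; the names `TRAIN`, `TEST` and `generalization_gap` are hypothetical, and a large positive gap is treated here only as a rough indicator of overfitting.

```python
# AUC, CA and MCC copied from the training and test tables in Section 3.8.
TRAIN = {
    "Random Forest": {"AUC": 1.000, "CA": 0.981, "MCC": 0.930},
    "Gradient Boosting": {"AUC": 1.000, "CA": 1.000, "MCC": 1.000},
    "Neural Network": {"AUC": 0.938, "CA": 0.926, "MCC": 0.704},
    "Logistic Regression": {"AUC": 0.838, "CA": 0.874, "MCC": 0.438},
}
TEST = {
    "Random Forest": {"AUC": 0.817, "CA": 0.864, "MCC": 0.358},
    "Gradient Boosting": {"AUC": 0.781, "CA": 0.878, "MCC": 0.478},
    "Neural Network": {"AUC": 0.817, "CA": 0.871, "MCC": 0.447},
    "Logistic Regression": {"AUC": 0.847, "CA": 0.864, "MCC": 0.376},
}


def generalization_gap(train, test):
    """Return training score minus test score for every model and metric."""
    return {
        model: {
            metric: round(train[model][metric] - test[model][metric], 3)
            for metric in train[model]
        }
        for model in train
    }


gaps = generalization_gap(TRAIN, TEST)
for model, gap in gaps.items():
    print(model, gap)

# Model with the highest test score for each metric.
best_on_test = {
    metric: max(TEST, key=lambda model: TEST[model][metric])
    for metric in ("AUC", "CA", "MCC")
}
print(best_on_test)
```

Reading the two outputs together separates two questions that are easy to conflate: which model scores highest on unseen data, and which model's training scores are the least inflated relative to its test scores.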

3.8.1. Confusion Matrix

Fig 11: Random Forest

Fig 12 : Gradient Boosting

Fig 13: Neural Network

Fig 14: Logistic Regression

According to the performance measures on the test set, the Gradient Boosting model proves to be the best one because of its better performance in CA, F1, Recall, Precision and MCC. It achieves a fairly good trade-off between precision and recall.

3.9. Analysis and Recommendations

The Gradient Boosting model proves to be the best model for predicting employee turnover for the following reasons:

● Balanced Performance: The training and test results indicate that Gradient Boosting performs strongly. It has the highest test CA, F1, Precision, Recall and MCC of the four models, although the gap to its perfect training scores points to some overfitting.
● Effective Pattern Recognition: This model is suited to cases where numerous factors contribute to employee turnover and interact in complex ways.
● Robustness to Noise: The Gradient Boosting model appears less sensitive to noise and is hence a good tool for predicting employee attrition.
● Overall Utility: Because its CA is higher than that of the other classification models, the model can separate employees who are potential leavers from those who will not leave. The HR department can then target retention efforts at specific groups of individuals, thereby increasing the effectiveness of retention strategies.

3.10. Development Considerations for Model
While developing an AI-driven employee churn prediction model, the following should be considered:

● Integration with HR Systems: Ensure that the model is compatible with the other Human Resource applications in use.
● Real-Time Predictions: Employ real-time processing of prepared data for up-to-date predictions.
● Scalability: Ensure that the tool operates with large data sets and can scale with the growth of the organization.
● Data Security and Privacy: Comply with the applicable legal requirements and take proper measures to protect employees' information.
● Monitoring and Maintenance: Periodically revisit the model and retrain it with new data, confirming that it remains accurate and suitable.

3.11. Business Improvement

Utilizing an AI-driven employee churn prediction system can provide the following advantages:

● Reduced Turnover Cost: A company can cut the recruitment expenses caused by churn by reducing churn itself. Using AI-based technology, the HR department can obtain a list of employees who are at risk of leaving and act to prevent the departure before it actually happens.
● Improved Satisfaction: The HR department can apply targeted incentive measures to retain workers who might leave the organization. By identifying the key drivers of an employee's churn and offering suitable packages to that employee, the company can decrease employee attrition.
● Enhanced Productivity: If the HR department succeeds in retaining experienced employees, this helps maintain consistent output and keeps important expertise in-house.

We can evaluate the expected profits or ROI. The organization gains substantial monetary benefits from a predictive model that centers on minimizing employee turnover. For 500 employees, a decrease in turnover equal to 5% of headcount means that 25 fewer employees quit their jobs every year, and approximately $500,000 (an implied cost of about $20,000 per replacement) will be cut from the expenses linked to hiring and training new employees. With a $200K investment for the development and deployment of the model, net savings come to $300,000 in the first year, meaning an ROI of 150%. After the first year the model maintains the $500,000 annual savings and reduces staff turnover, thus increasing the quality of work and the level of satisfaction of key employees, which makes it a very beneficial investment for the company.

4. References

1. Anh, N.T., Tu, N.D., Solanki, V.K., Giang, N.L., Thu, V.H., Son, L.N., Loc, N.D. and Nam, V.T., 2020. Integrating employee value model with churn prediction. International Journal of Sensors Wireless Communications and Control, 10(4), pp.484-493.
2. Bahadır, M.B., Bayrak, A.T., Yücetürk, G. and Ergun, P., 2021. A comparative study for employee churn prediction. In: 2021 29th Signal Processing and Communications Applications Conference (SIU).
3. Pratt, M., Boudhane, M. and Cakula, S., 2021. Employee attrition estimation using random forest algorithm. Baltic Journal of Modern Computing, 9(1), pp.49-66.
4. Chakraborty, R., Mridha, K., Shaw, R.N. and Ghosh, A., 2021. Study and prediction analysis of the employee turnover using machine learning approaches. In: 2021 IEEE 4th International Conference on Computing, Power and Communication Technologies (GUCON). IEEE.
5. El-Rayes, N., Fang, M., Smith, M. and Taylor, S.M., 2020. Predicting employee attrition using tree-based models. International Journal of Organizational Analysis, 28(6), pp.1273-1291.
6. Wang, X. and Zhi, J., 2021. A machine learning-based analytical framework for employee turnover prediction. Journal of Management Analytics, 8(3), pp.351-370.
7. Lazzari, M., Alvarez, J.M. and Ruggieri, S., 2022. Predicting and explaining employee turnover intention. International Journal of Data Science and Analytics, 14(3), pp.279-292.
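
The first-year ROI figures in Section 3.11 can be reproduced with a few lines of arithmetic. The Python sketch below assumes a replacement cost of $20,000 per employee, which is not stated in the report but is implied by $500,000 in savings for 25 retained employees; the function name `first_year_roi` and its parameter names are hypothetical.

```python
def first_year_roi(headcount, turnover_reduction, cost_per_leaver, investment):
    """Return (retained employees, gross savings, net savings, ROI) for year one.

    turnover_reduction is the reduction in leavers as a fraction of headcount.
    ROI is net savings divided by the investment.
    """
    retained = headcount * turnover_reduction
    gross_savings = retained * cost_per_leaver
    net_savings = gross_savings - investment
    roi = net_savings / investment
    return retained, gross_savings, net_savings, roi


# Inputs from Section 3.11; cost_per_leaver is the implied assumption noted above.
retained, gross, net, roi = first_year_roi(
    headcount=500,
    turnover_reduction=0.05,
    cost_per_leaver=20_000,
    investment=200_000,
)
print(f"Retained employees: {retained:.0f}")
print(f"Gross savings: ${gross:,.0f}")
print(f"Net savings: ${net:,.0f}")
print(f"First-year ROI: {roi:.0%}")
```

Because the result scales linearly with the per-employee replacement cost, that one assumed input largely determines the outcome, so it is worth replacing with the organization's own figure before relying on the estimate.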
