analyzing-ibm-hr-data-employee-attrition-and-performance-insights
All content following this page was uploaded by Fatbardha Maloku on 26 August 2024.
Master of Science in Business Analytics, Ageno School of Business, Golden Gate University, San Francisco, California 94105, USA
ABSTRACT
Employee turnover is often perceived as detrimental to organizational efficiency, but is this always the case? This research explores the multifaceted factors
influencing employee attrition and examines whether lower turnover invariably leads to greater efficiency. Utilizing the "IBM HR Analytics Employee Attrition
and Performance" dataset, which includes variables such as employee age, department, education level, job satisfaction, gender, job role, marital status, and
overtime hours, we conduct a comprehensive descriptive analysis to predict employee retention. By understanding these factors, HR and management can
make informed decisions to mitigate attrition. This study aims to identify novel strategies to reduce employee turnover, providing actionable insights and
predictions to help IBM retain talent and maintain productivity and success.
*Corresponding author
Fatbardha Maloku, Master of Science in Business Analytics, Ageno School of Business, Golden Gate University, San Francisco, California 94105,
USA.
Received: August 09, 2024; Accepted: August 12, 2024; Published: August 25, 2024
Project Aim
The aim of this project is to analyze the factors influencing employee attrition using the "IBM HR Analytics Employee Attrition and Performance" dataset. By examining variables such as employee demographics, job characteristics, and work-related factors, the study seeks to identify key determinants of turnover. Ultimately, the project aims to provide actionable insights and predictive models that can assist IBM in implementing effective strategies to reduce employee turnover, enhance retention rates, and sustain organizational productivity and success.
Data Collection
The dataset used in this analysis was sourced from Kaggle.com and comprises a total of 35 columns. For this study, our focus is on key variables believed to influence employee attrition rates. The following essential variables are considered:
• Age: Numeric variable indicating the age of employees.
• Attrition: Categorical variable with options "Yes" and "No".
• Education: Categorical variable with options "Below College", "College", "Bachelor", "Master", and "Doctor".
• Employee Satisfaction: Categorical variable with options "Low", "Medium", "High", and "Very High".
• Job Involvement: Categorical variable describing the level of employee engagement, with options "Low", "Medium", "High", and "Very High".
• Work Life Balance: Categorical variable assessing work-life balance, categorized as "Bad", "Good", "Better", and "Best".
• Performance Rating: Categorical variable indicating performance level, with options "Low", "Good", "Excellent", and "Outstanding".

In addition to these variables, the dataset includes several others that will be explored to uncover factors strongly associated with employee attrition. This research aims to identify significant predictors and further investigate their relationships to inform strategies for reducing attrition rates and enhancing organizational retention practices.
Model Selection
During our analysis of each variable in the dataset, we focused on the fundamental question: Is there a strategy to reduce employee attrition rates? In essence, we sought to uncover trends that may contribute to both employee attrition and performance. Any patterns identified are likely indicative of broader trends affecting organizations where employees choose to leave, rather than isolated factors specific to individual cases. It is crucial to acknowledge that historical data, while informative, does not guarantee future outcomes. These patterns may have evolved over time, and their persistence in the future is uncertain. Based on these considerations, the research proceeded with the following methodologies. Initially, a descriptive analysis model was applied to Kaggle.com's historical dataset. This stage involved a thorough examination of each variable to identify key factors influencing employee attrition rates, providing insights for organizations aiming to retain their top talent over extended periods. Subsequently, a predictive analytics model was implemented to assess how findings from the descriptive study could forecast future employee attrition and performance rates within a company.
Solution Process
The initial phase of our analysis will involve conducting a descriptive analysis of the dataset. This includes checking for missing data, assessing the current data types of each variable, and generating summary statistics to gain a comprehensive understanding of the dataset. Visualizations will be utilized to better understand the distribution of each variable. Subsequently, we will implement predictive modeling to identify highly correlated factors that predict employee attrition rates within the company. Additionally, ANOVA tests will be conducted across multiple models to determine which model yields optimal results aligned with our research objectives.
Descriptive Analysis
The distribution of employees’ ages is shown below.
Figure 1: Distribution of Employees Age
Most IBM employees in this graph are between thirty and fifty years old. A few individuals are in their twenties, and a few others are over the age of sixty.

The distribution of the number of employees per business travel sector is shown below. The bulk of employees belong to the “Travel Rarely” group. Some individuals belong to the “Travel Frequently” group. Lastly, the individuals who travel less than the previous two groups belong to the “Non-Traveler” group.
Figure 2: Distribution of the Number of Employees per Business Travel Sector

The distribution of employees per department is shown below.
Figure 3: Distribution of the Number of Employees per Department

We examined the impact of the Gender variable on attrition. The graph below illustrates the distribution of attrition by gender and
Figure 6: Box Plot Distribution of the Attrition Rate Over Age
Younger employees are more likely to leave a job than older employees, as shown in the graph above. Next, we make a single-variable plot of the Age variable. The graph looks like this after graphing the values:
Predictive Analysis
The predictive analysis will be carried out using the two steps listed below:
• Identify the correlation coefficient values between variables.
• Create a regression model to predict the attrition rate based on the provided variables.

The correlation and regression models were used to test the relationship between variables such as Age, Employee Number, Environment Satisfaction, Monthly Income, Performance Rating, Work Life Balance, Years at Company, Years in Current Role, Years Since Last Promotion, Years with Current Manager, and Stock Option Level. By exploring these potential explanatory variables, we can learn which variables are associated with the attrition rate in an organization.
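Both steps need a complete numeric table with Attrition coded as a number. The sketch below illustrates the preparatory checks named under Solution Process (missing data, data types, summary statistics) plus a Yes/No to 1/0 encoding. The rows, the values, and the helper names are hypothetical stand-ins, not the Kaggle data or the authors' code.

```python
import statistics

# Hypothetical rows standing in for the Kaggle file (values invented for illustration).
rows = [
    {"Age": "29", "Attrition": "Yes", "MonthlyIncome": "3100", "YearsAtCompany": "1"},
    {"Age": "45", "Attrition": "No", "MonthlyIncome": "8200", "YearsAtCompany": "12"},
    {"Age": "34", "Attrition": "Yes", "MonthlyIncome": "2700", "YearsAtCompany": "2"},
    {"Age": "52", "Attrition": "No", "MonthlyIncome": "", "YearsAtCompany": "20"},
    {"Age": "38", "Attrition": "No", "MonthlyIncome": "5400", "YearsAtCompany": "7"},
]


def missing_counts(rows):
    """Count empty cells per column."""
    return {col: sum(1 for r in rows if r[col].strip() == "") for col in rows[0]}


def is_numeric(value):
    """True if the text parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_types(rows):
    """Label a column 'numeric' if every non-empty cell parses as a number."""
    types = {}
    for col in rows[0]:
        cells = [r[col] for r in rows if r[col].strip() != ""]
        numeric = bool(cells) and all(is_numeric(c) for c in cells)
        types[col] = "numeric" if numeric else "categorical"
    return types


def summarize(rows, col):
    """Summary statistics for one numeric column, skipping empty cells."""
    values = [float(r[col]) for r in rows if r[col].strip() != ""]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def encode_attrition(rows):
    """Map 'Yes'/'No' to 1/0 so attrition can enter a correlation or regression."""
    return [1 if r["Attrition"] == "Yes" else 0 for r in rows]


print(missing_counts(rows))
print(column_types(rows))
print(summarize(rows, "Age"))
print(encode_attrition(rows))
```

Rows with missing cells would need to be dropped or imputed before the correlation and regression steps; which of the two is appropriate depends on how much data is missing.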
Heatmap
Below we have created a heatmap of the correlation between the attrition values and the other explanatory variables, such as Age, Total Number of Employees, Environment Satisfaction, Monthly Income, Work Life Balance, Years at the Company, Years Since Last Promotion, and Stock Option Level.
As demonstrated in the heatmap above, the attrition rate is most strongly connected with years with current manager, years since last promotion, years in current role, and total number of years at the company. The attrition rate is less connected with the age variable and the employee satisfaction variable. We may also observe that the variables with lighter colors in the heatmap are less highly associated with one another than those with deeper colors.
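The values behind such a heatmap are pairwise correlation coefficients. As a minimal sketch, assuming Attrition is coded 1 for "Yes" and 0 for "No" and using a small invented sample (not the Kaggle data), the matrix and a ranking of variables by the strength of their correlation with attrition could be computed as follows. The helper names `pearson` and `correlation_matrix` are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Nested dict holding the correlation of every column with every other column."""
    cols = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in cols} for a in cols}


# Invented toy sample; Attrition is coded 1 = "Yes", 0 = "No".
data = {
    "Attrition": [1, 0, 1, 0, 0, 1, 0, 0],
    "Age": [24, 45, 29, 38, 51, 31, 42, 36],
    "YearsWithCurrManager": [1, 7, 0, 4, 9, 2, 6, 3],
    "YearsSinceLastPromotion": [0, 3, 1, 1, 5, 0, 2, 4],
}

matrix = correlation_matrix(data)

# Rank the explanatory variables by the absolute size of their correlation with attrition.
ranked = sorted(
    (col for col in data if col != "Attrition"),
    key=lambda col: abs(matrix["Attrition"][col]),
    reverse=True,
)
for col in ranked:
    print(f"{col:<26}{matrix['Attrition'][col]:+.2f}")
```

With a 0/1 variable on one side, the Pearson coefficient is the point-biserial correlation, so the sign shows whether higher values of a variable go with more or less attrition. A plotting library would then map each cell of `matrix` to a color.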
Regression
The Regression Model was chosen for the following reasons:
• It helps identify the variables most strongly connected with IBM employee attrition rates.
• It helps us focus our efforts on areas that will boost the likelihood of retaining talented employees within a company.
• It can be effective and may save a company from losing productivity if employee retention is maintained.
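As an illustration of the kind of fit involved, the sketch below regresses a 0/1 attrition indicator on a single predictor with ordinary least squares and reports R-squared. The source does not specify the model form. The R-squared values reported in the Conclusion suggest a linear fit, so that is assumed here. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Invented toy sample; attrition is coded 1 = "Yes", 0 = "No".
years_with_manager = [1, 7, 0, 4, 9, 2, 6, 3]
attrition = [1, 0, 1, 0, 0, 1, 0, 0]

intercept, slope = fit_line(years_with_manager, attrition)
r2 = r_squared(years_with_manager, attrition, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r2:.3f}")
```

In a one-predictor fit with an intercept, R-squared equals the square of the Pearson correlation between predictor and outcome. An R-squared of 0.02, as reported for YearsInCurrentRole in the Conclusion, therefore corresponds to a correlation of about 0.14 in absolute value.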
Residuals vs Fitted
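A residuals-versus-fitted plot pairs each fitted value with its residual (observed minus fitted). The sketch below computes those pairs for a one-predictor least-squares fit, assuming attrition coded 0/1 and an invented sample. It illustrates the diagnostic and is not the authors' model.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals_vs_fitted(x, y):
    """Return the fitted values and the residuals (observed minus fitted)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    return fitted, residuals


# Invented toy sample; attrition is coded 1 = "Yes", 0 = "No".
years_since_promotion = [0, 3, 1, 1, 5, 0, 2, 4]
attrition = [1, 0, 1, 0, 0, 1, 0, 0]

fitted, residuals = residuals_vs_fitted(years_since_promotion, attrition)
for f, r in zip(fitted, residuals):
    print(f"fitted={f:+.3f} residual={r:+.3f}")
```

Because the outcome takes only the values 0 and 1, every point lies on one of two parallel lines: residual = -fitted for employees who stayed and residual = 1 - fitted for those who left. This banding is expected for a binary outcome and does not by itself indicate a modeling error.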
Conclusion
In summary, we utilized strategic tools to assess the current market landscape and potential factors influencing future employee attrition rates using the dataset sourced from Kaggle.com. Through our analysis, several key insights into the determinants of attrition have emerged. Employee turnover poses a significant challenge for organizations, but proactive decisions by HR and management can mitigate this issue. This holds true for IBM as well. Our dataset analysis allowed us to delve into explanatory variables and their impact on attrition rates. Notably, YearsWithCurrManager emerged as the most influential variable affecting attrition, based on our investigation. Additionally, YearsInCurrentRole exhibited the second highest R-squared value at 0.02, followed by YearsSinceLastPromotion at 0.01. When combining YearsInCurrentRole, YearsWithCurrManager, YearsSinceLastPromotion, and YearsAtCompany, the cumulative R-squared of 0.03 underscores the multifaceted nature of factors influencing an employee's decision to remain with or depart from the company [1].
References
1. Pavansubhash (2017) IBM HR Analytics Employee Attrition & Performance. Kaggle https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset.