DATA4800 Report
3.2. Orange Configuration
Fig 1: Orange Workflow
3.3. Exploratory Data Analysis (EDA)
EDA involves summarizing and visualizing the main characteristics of the dataset. Before performing EDA, however, the data must be preprocessed. Preprocessing is essential to ensure the quality and usability of the dataset and involves handling missing values, encoding categorical variables, and scaling numeric features.
Fig 3: Department
The department with the most employees is Research & Development, which shows a moderate level of attrition. The attrition rate is higher in the Sales department, implying that there may be problems retaining employees in this department. The Human Resources department has the highest retention rate, meaning that employees in this department are the most likely to remain with the company.
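The three preprocessing steps named in Section 3.3 were carried out in Orange. As a rough, library-free illustration of what they do, the sketch below applies mean imputation, z-score standardization, and one-hot encoding to a list of records. The function name `preprocess`, the column names `Department` and `YearsWithManager`, and the specific choices (mean imputation, population standard deviation) are assumptions for illustration, not necessarily the settings used in the Orange workflow.

```python
from statistics import mean, pstdev


def preprocess(rows, numeric_cols, categorical_cols):
    """Impute, standardize, and one-hot encode a list of dict records.

    Assumed choices (illustrative, not necessarily Orange's settings):
    - missing numeric values are None and are replaced by the column mean;
    - numeric columns are standardized to zero mean and unit population SD;
    - categorical columns are one-hot encoded, one 0/1 feature per level.
    """
    # 1. Mean imputation for numeric columns (ignoring missing entries).
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in numeric_cols}
    imputed = []
    for r in rows:
        filled = dict(r)
        for c in numeric_cols:
            if filled[c] is None:
                filled[c] = means[c]
        imputed.append(filled)

    # 2. Standardization parameters, computed after imputation.
    scale = {}
    for c in numeric_cols:
        values = [r[c] for r in imputed]
        sd = pstdev(values)
        scale[c] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns

    # 3. Category levels, sorted so the output columns are deterministic.
    levels = {c: sorted({r[c] for r in imputed}) for c in categorical_cols}

    features = []
    for r in imputed:
        row = {}
        for c in numeric_cols:
            mu, sd = scale[c]
            row[c] = (r[c] - mu) / sd
        for c in categorical_cols:
            for level in levels[c]:
                row[f"{c}={level}"] = 1 if r[c] == level else 0
        features.append(row)
    return features


# Hypothetical toy records; column names are illustrative only.
rows = [
    {"Department": "Sales", "YearsWithManager": 1},
    {"Department": "Research & Development", "YearsWithManager": None},
    {"Department": "Human Resources", "YearsWithManager": 5},
]
result = preprocess(rows, ["YearsWithManager"], ["Department"])
```

In a real pipeline the means, standard deviations, and category levels would be computed on the training split only and then reused on the test split, so that no information from the test data leaks into training.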
Fig 4: Education
Fig 6: Random Forest Configuration
The graph of attrition by years with the current manager shows that employees who have spent less than two years with their current manager have a higher attrition rate, implying that the time spent with the current manager influences the decision to leave the organization. The attrition rate decreases for employees who have worked with their current manager for a medium to long period (3 to 9+ years).
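The attrition and retention rates discussed for departments and for years with the current manager are per-group proportions: the share of employees in a group whose attrition label is "Yes" (retention is one minus that share). The sketch below computes both groupings. The records are invented toy data and do not reproduce the report's figures; the helper names `attrition_rate_by` and `manager_band`, the field names, and the band boundaries (chosen to mirror the "less than two years" versus "3 to 9+ years" split in the text) are hypothetical.

```python
from collections import defaultdict


def attrition_rate_by(records, key):
    """Share of employees in each group whose Attrition label is "Yes".

    `key` maps a record to its group label; retention rate = 1 - attrition rate.
    """
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for r in records:
        group = key(r)
        totals[group] += 1
        if r["Attrition"] == "Yes":
            leavers[group] += 1
    return {g: leavers[g] / totals[g] for g in totals}


def manager_band(years):
    """Hypothetical tenure bands mirroring the '<2' vs '3 to 9+' split in the text."""
    if years < 2:
        return "<2"
    if years == 2:
        return "2"
    return "3+"


# Invented toy records (not the report's dataset).
records = [
    {"Department": "Sales", "YearsWithManager": 0, "Attrition": "Yes"},
    {"Department": "Sales", "YearsWithManager": 4, "Attrition": "No"},
    {"Department": "Research & Development", "YearsWithManager": 1, "Attrition": "Yes"},
    {"Department": "Research & Development", "YearsWithManager": 7, "Attrition": "No"},
    {"Department": "Research & Development", "YearsWithManager": 3, "Attrition": "No"},
    {"Department": "Human Resources", "YearsWithManager": 2, "Attrition": "No"},
]

by_department = attrition_rate_by(records, lambda r: r["Department"])
by_manager_tenure = attrition_rate_by(records, lambda r: manager_band(r["YearsWithManager"]))
```

Passing the grouping as a function keeps one routine for any categorical column or derived band, which is essentially what a bar chart of attrition by category visualizes.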
However, its performance declines on the test data, with an AUC of 0.759, a CA of 0.861, and an MCC of 0.345, which indicates that the model has overfit. The low MCC on the test data indicates weak agreement between predicted and actual classes once all four cells of the confusion matrix are taken into account, which a high CA can mask when one class dominates. (Pratt, Boudhane and Cakula, 2021)
Fig 7: Gradient Boosting Configuration
3.6. Model 3: Neural Network
Fig 8: Neural Network Configuration
3.7. Model 4: Logistic Regression
The Logistic Regression model performs consistently on training and test data. The AUC, CA, and MCC values on the training data are 0.838, 0.874, and 0.438 respectively; on the test data they are 0.847, 0.864, and 0.376. This indicates that the model has generalized the patterns effectively without overfitting. (Chakraborty et al., 2021)
The models compared are Random Forest, Gradient Boosting, Neural Networks, and Logistic Regression. Their metrics on the training and test data are as follows:
Training Data
Test Data
Metric      Random Forest   Gradient Boosting   Neural Network   Logistic Regression
AUC         0.817           0.781               0.817            0.847
CA          0.864           0.878               0.871            0.864
F1 Score    0.830           0.865               0.858            0.839
Precision   0.852           0.865               0.856            0.846
Recall      0.864           0.878               0.871            0.864
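The comparison can be read programmatically. The sketch below transcribes the test-set scores from the table and the Logistic Regression train/test values from Section 3.7, then reports the best model per metric and the train-minus-test gap. Two observations follow from the numbers themselves: Logistic Regression leads on AUC while Gradient Boosting leads on the threshold-dependent metrics, and the Recall row equals the CA row for every model, which is what one would expect if Precision, Recall, and F1 are support-weighted averages over both classes (weighted recall reduces algebraically to accuracy). The helper names `best_model_per_metric` and `generalization_gap` are hypothetical.

```python
# Test-set scores transcribed from the comparison table above.
TEST_SCORES = {
    "Random Forest":       {"AUC": 0.817, "CA": 0.864, "F1": 0.830, "Precision": 0.852, "Recall": 0.864},
    "Gradient Boosting":   {"AUC": 0.781, "CA": 0.878, "F1": 0.865, "Precision": 0.865, "Recall": 0.878},
    "Neural Network":      {"AUC": 0.817, "CA": 0.871, "F1": 0.858, "Precision": 0.856, "Recall": 0.871},
    "Logistic Regression": {"AUC": 0.847, "CA": 0.864, "F1": 0.839, "Precision": 0.846, "Recall": 0.864},
}

# Logistic Regression train/test values quoted in Section 3.7.
LR_TRAIN = {"AUC": 0.838, "CA": 0.874, "MCC": 0.438}
LR_TEST = {"AUC": 0.847, "CA": 0.864, "MCC": 0.376}


def best_model_per_metric(scores):
    """Highest-scoring model for each metric (higher is better).

    Ties resolve to the first model listed, because max() keeps the first maximum.
    """
    metrics = next(iter(scores.values())).keys()
    return {m: max(scores, key=lambda model: scores[model][m]) for m in metrics}


def generalization_gap(train, test):
    """Train minus test per shared metric; a positive value means a drop on test data."""
    return {m: round(train[m] - test[m], 3) for m in train if m in test}


best = best_model_per_metric(TEST_SCORES)
lr_gap = generalization_gap(LR_TRAIN, LR_TEST)
```

For Logistic Regression the gaps are small for AUC and CA, with the largest change in MCC (0.062), which is consistent with the stable performance described in Section 3.7.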
3.8.1. Confusion Matrix
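CA and MCC, the metrics used in the model evaluations above, are both computed from the four cells of a binary confusion matrix (TP, FP, FN, TN, here assuming Attrition = "Yes" is the positive class). The sketch below uses invented counts (1,000 employees, 160 leavers; not the report's confusion matrix) to show how a model can reach a CA of 0.86 while catching only a quarter of the leavers and scoring an MCC of roughly 0.35, and how a trivial "everyone stays" predictor still achieves a CA of 0.84 with an MCC of 0. The function name `confusion_metrics` is hypothetical.

```python
from math import sqrt


def confusion_metrics(tp, fp, fn, tn):
    """CA, MCC, TPR and FPR from binary confusion-matrix counts.

    "Positive" is assumed to mean Attrition = "Yes". MCC is set to 0.0 when
    any row or column total is zero (denominator of zero), a common convention.
    """
    total = tp + fp + fn + tn
    ca = (tp + tn) / total
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall on leavers
    fpr = fp / (fp + tn) if fp + tn else 0.0  # stayers wrongly flagged as leavers
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"CA": ca, "MCC": mcc, "TPR": tpr, "FPR": fpr}


# Invented counts: 1,000 employees, 160 of whom left (not the report's data).
model = confusion_metrics(tp=40, fp=20, fn=120, tn=820)
always_stay = confusion_metrics(tp=0, fp=0, fn=160, tn=840)  # predicts "No" for everyone
```

Because MCC uses all four cells, it penalizes a model that earns its accuracy mainly from the majority "No" class, which is why it is a useful companion to CA on an attrition problem.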
compatible with the model.
● Real-Time Predictions: Use real-time processing of the prepared data to produce up-to-date predictions.
● Scalability: Ensure that the tool handles large datasets and can scale with the growth of the organization.
● Data Security and Privacy: Comply with the relevant legal requirements to protect employees, applying appropriate measures to safeguard their personal information.
● Monitoring and Maintenance: Periodically revisit the model and retrain it with new data to confirm that it remains accurate and suitable.
3.11. Business Improvement
Using an AI-driven employee churn prediction system can provide the following advantages:
● Reduced Turnover Cost: A company can cut the recruitment costs associated with churn by identifying and reducing it. Using AI-based technology, the HR department can obtain a list of employees who are likely to leave and act to prevent turnover before it happens.
● Improved Satisfaction: The HR department can apply targeted incentives to workers who might leave the organization. By identifying the key drivers of employee churn and offering suitable packages to at-risk employees, the company can decrease attrition.
● Enhanced Productivity: If the HR department succeeds in retaining experienced employees, this helps maintain consistent productivity and keeps important expertise in-house.
4. References
1. Anh, N.T., Tu, N.D., Solanki, V.K., Giang, N.L., Thu, V.H., Son, L.N., Loc, N.D. and Nam, V.T., 2020. Integrating employee value model with churn prediction. International Journal of Sensors Wireless Communications and Control, 10(4), pp.484-493.
2. Bahadır, M.B., Bayrak, A.T., Yücetürk, G. and Ergun, P., 2021. A comparative study for employee churn prediction. In: 2021 29th Signal Processing and Communications Applications Conference (SIU).
3. Pratt, M., Boudhane, M. and Cakula, S., 2021. Employee attrition estimation using random forest algorithm. Baltic Journal of Modern Computing, 9(1), pp.49-66.
4. Chakraborty, R., Mridha, K., Shaw, R.N. and Ghosh, A., 2021. Study and prediction analysis of the employee turnover using machine learning approaches. In: 2021 IEEE 4th International Conference on Computing, Power and Communication Technologies (GUCON). IEEE.
5. El-Rayes, N., Fang, M., Smith, M. and Taylor, S.M., 2020. Predicting employee attrition using tree-based models. International Journal of Organizational Analysis, 28(6), pp.1273-1291.
6. Wang, X. and Zhi, J., 2021. A machine learning-based analytical framework for employee turnover prediction. Journal of Management Analytics, 8(3), pp.351-370.
7. Lazzari, M., Alvarez, J.M. and Ruggieri, S., 2022. Predicting and explaining employee turnover intention. International Journal of Data Science and Analytics, 14(3), pp.279-292.