
EMPLOYEE PERFORMANCE ANALYSIS

As the name suggests, employee performance analysis is the process of analyzing employee data to identify patterns and trends that can help improve employee productivity, engagement, and retention. It is an excellent practice area because you will deal with data containing different data types, like numerical (attendance, turnover rates, etc.) and categorical (job satisfaction, feedback, etc.).

In such a project, you will need to:

● Set goals and decide on performance metrics,
● Collect feedback data,
● Preprocess and analyze this data,
● Infer who performs best.

BUSINESS CASE & GOAL OF PROJECT

Based on the given features of the dataset, we need to predict the performance rating of each employee. The project deliverables are:

● Department-wise performance analysis.
● The top 3 important factors affecting employee performance.
● A trained model which can predict employee performance based on factors given as inputs. This will be used in hiring employees.
● Recommendations to improve employee performance based on insights from the analysis.

The given employee dataset consists of 1200 rows and 28 columns, so the shape of the dataset is 1200x28. The 28 features are classified into quantitative and qualitative: 19 features are quantitative (11 columns contain numeric data & 8 columns contain ordinal data) and 8 features are qualitative. EmpNumber contains alphanumeric data (distinct values) and does not play a role as a relevant feature for performance rating.
● Correlation gives us important aspects of the data. Correlation is a statistical measure that expresses the extent to which two variables are linearly related; here we examine the correlation between each feature and the performance rating. The analysis for this project went through the stages of univariate, bivariate & multivariate analysis, correlation analysis, and analysis by each department to satisfy the project goal.
● The dataset consists of categorical data and numerical data. The target variable contains ordinal data, so this is a classification problem. The machine learning models used in this project are the Support Vector Classifier, Random Forest Classifier & Artificial Neural Network (Multilayer Perceptron). Of these models, the Artificial Neural Network (Multilayer Perceptron) achieved the highest accuracy, 95.80%.
● One of the important goals of this project is to find the important features affecting the performance rating. The important features were identified using the feature importance technique of the machine learning models. The main preprocessing techniques used were the manual & frequency encoding methods, which convert string categorical data into numerical data, because most machine learning methods are numerical and do not support strings. The overall project was performed and its goals achieved using machine learning models and visualization techniques.
1. Analysis

The data were analyzed by describing the features present in the dataset. The features play the bigger part in the analysis: they tell us the relation between the dependent and independent variables. Pandas also helps describe the dataset and answer the following questions early in our project. The data present in the dataset are divided into numerical and categorical data.
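As a minimal sketch of this first step, assuming the data is loaded from a CSV file (the file name employee_data.csv is hypothetical), pandas can describe the dataset directly:

# Load the dataset and get an early picture of its shape and dtypes.
import pandas as pd

df = pd.read_csv("employee_data.csv")

print(df.shape)        # expected (1200, 28) for this dataset
df.info()              # column names, dtypes, non-null counts
print(df.describe())   # summary statistics for numerical features

# Separate numerical and categorical columns for later analysis.
numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(include="object").columns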

2. Univariate, Bivariate & Multivariate Analysis

● Univariate Analysis: In univariate analysis we get the unique labels of categorical features, as well as the range & density of numerical features.
● Bivariate Analysis: In bivariate analysis we check each feature's relationship with the target variable.
● Multivariate Analysis: In multivariate analysis we check the relationship between two variables with respect to the target variable.
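A minimal sketch of these three levels with seaborn, continuing from the earlier pandas sketch; the column names used here (Gender, Age, EmpDepartment, EmpLastSalaryHikePercent, PerformanceRating) are assumptions based on the dataset description:

import seaborn as sns
import matplotlib.pyplot as plt

# Univariate: unique labels of a categorical feature, and the
# range/density of a numerical feature.
print(df["Gender"].unique())
sns.histplot(df["Age"], kde=True)
plt.show()

# Bivariate: a single feature against the target variable.
sns.countplot(data=df, x="EmpDepartment", hue="PerformanceRating")
plt.show()

# Multivariate: two variables with respect to the target variable.
sns.scatterplot(data=df, x="Age", y="EmpLastSalaryHikePercent",
                hue="PerformanceRating")
plt.show()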
3. Exploratory Data Analysis
Distribution of Continuous Features:
● In general, one of the first steps in exploring the data is to get a rough idea of how the features are distributed. To do so, we invoke the familiar distplot function from the Seaborn plotting library. The distribution is plotted for each numerical feature; it gives an overall idea of the density and of where the majority of the data lies.
● The age distribution runs from 18 to 60, with most employees lying between 30 and 40. Employees have worked in up to 8 companies, with most having worked in up to 2 companies before joining here. The hourly rate is between 65 and 95 for the majority of employees in this company. In general, most employees work up to 5 years in this company, and most employees get an 11% to 15% salary hike.
Check Skewness and Kurtosis of Numerical Features: checking whether the data is normally distributed or not using skewness and kurtosis.
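A sketch of both checks, reusing df and numeric_cols from the earlier sketches; note that seaborn's distplot is deprecated in recent releases, so histplot(..., kde=True) stands in for it here:

import seaborn as sns
import matplotlib.pyplot as plt

# Plot the distribution of every numerical feature with a density curve.
for col in numeric_cols:
    sns.histplot(df[col], kde=True)
    plt.title(col)
    plt.show()

# pandas uses Fisher's definition of kurtosis (normal -> 0), so skewness
# and kurtosis values near 0 suggest an approximately normal feature.
print(df[numeric_cols].skew())
print(df[numeric_cols].kurtosis())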

4. Data Pre-Processing

a. Check Missing Values
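A one-line check, assuming the df DataFrame from the earlier sketch:

# Count missing values per column; all zeros means nothing to impute.
print(df.isnull().sum())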


b. Categorical Data Conversion: Handle categorical data with the help of frequency and manual encoding, because some features contain lots of labels.

● Manual Encoding: Manual encoding is a good technique for handling a categorical feature with the help of the map function, mapping the labels based on frequency.
● Frequency Encoding: Frequency encoding is an encoding technique that transforms an original categorical variable into a numerical variable by considering the frequency distribution of the data, obtained from value counts.
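A sketch of both encodings; the column names and the manual mapping are illustrative assumptions:

# Manual encoding: map each label to an integer with the map function
# (the labels and the integers assigned to them are assumed here).
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})

# Frequency encoding: replace each label with its value count.
freq = df["EmpJobRole"].value_counts()
df["EmpJobRole"] = df["EmpJobRole"].map(freq)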

c. Outlier Handling: Some features contain outliers, so we impute these outliers with the help of the IQR method, because the data in these features is not normally distributed.
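A sketch of IQR-based handling; capping values to the IQR fences is one common way to impute outliers, and treating it as the method used here is an assumption:

# Clip values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] to the fence values.
def cap_outliers_iqr(series):
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Apply it to one feature that contains outliers (column name assumed).
df["TotalWorkExperienceInYears"] = cap_outliers_iqr(df["TotalWorkExperienceInYears"])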
d. Feature Transformation: YearsSinceLastPromotion shows some skewness & kurtosis, so we use the Square Root Transformation technique.
● Square Root Transformation: The square root transformation is one of the many standard transformations. It is used for count data (data that follow a Poisson distribution) or small whole numbers. Each data point is replaced by its square root; negative data is first made positive by adding a constant, and then transformed.
● Q-Q Plot: A Q-Q plot is a probability plot, a graphical method for comparing two probability distributions by plotting their quantiles against each other.
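A sketch of the transformation and the Q-Q plot check, using numpy and scipy:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# YearsSinceLastPromotion holds non-negative counts, so no shift
# constant is needed before taking the square root.
df["YearsSinceLastPromotion"] = np.sqrt(df["YearsSinceLastPromotion"])

# Q-Q plot against a normal distribution; points near the reference
# line indicate approximate normality after the transformation.
stats.probplot(df["YearsSinceLastPromotion"], dist="norm", plot=plt)
plt.show()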

e. Scaling the Data: We scale the data with the help of StandardScaler.

● Standard Scaling: Standardization is the process of scaling a feature assuming it follows a normal distribution, rescaling it using its mean and standard deviation so that the result has mean 0 and standard deviation 1.
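A sketch with scikit-learn's StandardScaler, excluding the target column from scaling (the target name PerformanceRating is an assumption):

from sklearn.preprocessing import StandardScaler

# z = (x - mean) / std, giving each scaled feature mean 0 and std 1.
feature_cols = [c for c in numeric_cols if c != "PerformanceRating"]
scaler = StandardScaler()
df[feature_cols] = scaler.fit_transform(df[feature_cols])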

5. Machine Learning Model Creation & Evaluation

a. Define Dependent and Independent Features.

b. Balancing the Data: The data is imbalanced, so we need to balance it with the help of SMOTE.

SMOTE: SMOTE (Synthetic Minority Oversampling Technique) is one of the most commonly used oversampling methods to solve the imbalance problem. It aims to balance the class distribution by increasing the number of minority-class examples; rather than simply replicating them, SMOTE synthesises new minority instances between existing minority instances.

c. Splitting Training and Testing Data: 80% of the data is used for training & 20% for testing.
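A sketch of steps a–c; SMOTE comes from the third-party imbalanced-learn package (pip install imbalanced-learn), the target column name is assumed, and, following the write-up, the data is balanced before splitting:

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# a. Dependent (y) and independent (X) features; EmpNumber is dropped
#    because it is not a relevant feature for performance rating.
X = df.drop(columns=["PerformanceRating", "EmpNumber"])
y = df["PerformanceRating"]

# b. Balance the class distribution with SMOTE.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)

# c. 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, random_state=42)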

d. Model Creation: The following classifiers were trained and compared:

1. Support Vector Machine
2. Random Forest
3. Artificial Neural Network [MLP Classifier]
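A sketch that trains and compares the three classifiers; the hyperparameters shown are illustrative defaults, not the settings that produced the reported 95.80% accuracy:

from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

models = {
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Artificial Neural Network (MLP)": MLPClassifier(max_iter=1000, random_state=42),
}

# Fit each model on the training split and report its test accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, model.predict(X_test)):.2%}")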
