Coding

Uploaded by

Soniya Singh


import pandas as pd

# Load the dataset from the uploaded file

file_path = '/content/employee_attrition_data.csv'
employee_attrition_data = pd.read_csv(file_path)

# Display the first few rows of the dataset

print("First few rows of the dataset:")
print(employee_attrition_data.head())

# Display summary information about the dataset

print("\nSummary Information of the dataset:")
print(employee_attrition_data.info())

# Calculate basic statistics for numerical columns

print("\nBasic Statistics of the dataset:")
print(employee_attrition_data.describe())

First few rows of the dataset:
   Employee_ID  Age  Gender   Department Job_Title  Years_at_Company  \
0            0   27    Male    Marketing   Manager                 9
1            1   53  Female        Sales  Engineer                10
2            2   59  Female    Marketing   Analyst                 8
3            3   42  Female  Engineering   Manager                 1
4            4   44  Female        Sales  Engineer                10

   Satisfaction_Level  Average_Monthly_Hours  Promotion_Last_5Years  Salary  \
0            0.586251                    151                      0   60132
1            0.261161                    221                      1   79947
2            0.304382                    184                      0   46958
3            0.480779                    242                      0   40662
4            0.636244                    229                      1   74307

   Attrition
0          0
1          0
2          1
3          0
4          0

Summary Information of the dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Employee_ID 1000 non-null int64
1 Age 1000 non-null int64
2 Gender 1000 non-null object
3 Department 1000 non-null object
4 Job_Title 1000 non-null object
5 Years_at_Company 1000 non-null int64
6 Satisfaction_Level 1000 non-null float64
7 Average_Monthly_Hours 1000 non-null int64
8 Promotion_Last_5Years 1000 non-null int64
9 Salary 1000 non-null int64
10 Attrition 1000 non-null int64
dtypes: float64(1), int64(7), object(3)
memory usage: 86.1+ KB
None

Basic Statistics of the dataset:
       Employee_ID          Age  Years_at_Company  Satisfaction_Level  \
count  1000.000000  1000.000000       1000.000000         1000.000000
mean    499.500000    42.205000          5.605000            0.505995
std     288.819436    10.016452          2.822223            0.289797
min       0.000000    25.000000          1.000000            0.001376
25%     249.750000    33.000000          3.000000            0.258866
50%     499.500000    43.000000          6.000000            0.505675
75%     749.250000    51.000000          8.000000            0.761135
max     999.000000    59.000000         10.000000            0.999979

       Average_Monthly_Hours  Promotion_Last_5Years        Salary    Attrition
count            1000.000000            1000.000000   1000.000000  1000.000000
mean              199.493000               0.486000  64624.980000     0.495000
std                29.631908               0.500054  20262.984333     0.500225
min               150.000000               0.000000  30099.000000     0.000000
25%               173.000000               0.000000  47613.500000     0.000000
50%               201.000000               0.000000  64525.000000     0.000000
75%               225.000000               1.000000  81921.000000     1.000000
max               249.000000               1.000000  99991.000000     1.000000

import pandas as pd

file_path = '/content/employee_attrition_data.csv'
employee_attrition_data = pd.read_csv(file_path)

# Check for missing values

missing_values = employee_attrition_data.isnull().sum()
print("Missing values in each column:")
print(missing_values)

# One-hot encode categorical variables

encoded_data = pd.get_dummies(employee_attrition_data,
                              columns=['Gender', 'Department', 'Job_Title'])

# Display the first few rows of the encoded dataset

print("First few rows of the encoded dataset:")
print(encoded_data.head())

Missing values in each column:

Employee_ID 0
Age 0
Gender 0
Department 0
Job_Title 0
Years_at_Company 0
Satisfaction_Level 0
Average_Monthly_Hours 0
Promotion_Last_5Years 0
Salary 0
Attrition 0
dtype: int64
First few rows of the encoded dataset:
   Employee_ID  Age  Years_at_Company  Satisfaction_Level  \
0            0   27                 9            0.586251
1            1   53                10            0.261161
2            2   59                 8            0.304382
3            3   42                 1            0.480779
4            4   44                10            0.636244

   Average_Monthly_Hours  Promotion_Last_5Years  Salary  Attrition  \
0                    151                      0   60132          0
1                    221                      1   79947          0
2                    184                      0   46958          1
3                    242                      0   40662          0
4                    229                      1   74307          0

   Gender_Female  Gender_Male  Department_Engineering  Department_Finance  \
0          False         True                   False               False
1           True        False                   False               False
2           True        False                   False               False
3           True        False                    True               False
4           True        False                   False               False

   Department_HR  Department_Marketing  Department_Sales  \
0          False                  True             False
1          False                 False              True
2          False                  True             False
3          False                 False             False
4          False                 False              True

   Job_Title_Accountant  Job_Title_Analyst  Job_Title_Engineer  \
0                 False              False               False
1                 False              False                True
2                 False               True               False
3                 False              False               False
4                 False              False                True

   Job_Title_HR Specialist  Job_Title_Manager
0                    False               True
1                    False              False
2                    False              False
3                    False               True
4                    False              False
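The encoded output above keeps one indicator column for every category level, so the columns within each group always sum to one (for example, Gender_Female and Gender_Male). A minimal sketch of one common alternative, assuming a small toy frame built from three rows of the head() output above: passing `drop_first=True` to `pd.get_dummies` drops the first level of each categorical column. The names `toy` and `encoded_toy` are illustrative only and do not appear in the original notebook.

```python
import pandas as pd

# Toy frame: three rows copied from the head() output above (rows 0, 1 and 3),
# restricted to a few columns for illustration.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Department": ["Marketing", "Sales", "Engineering"],
    "Salary": [60132, 79947, 40662],
})

# drop_first=True removes the first (sorted) level of each encoded column,
# so the remaining indicator columns are no longer perfectly collinear.
encoded_toy = pd.get_dummies(toy, columns=["Gender", "Department"], drop_first=True)

print(list(encoded_toy.columns))
```

Whether to drop a level depends on the model: unregularized linear models are the ones most affected by the redundancy, while tree-based models and clustering usually are not.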

import matplotlib.pyplot as plt
import seaborn as sns

# Generate summary statistics for all variables
summary_statistics = encoded_data.describe()
print("Summary Statistics:")
print(summary_statistics)

# Histograms for numerical variables
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
sns.histplot(encoded_data['Age'], kde=True, ax=axes[0])
axes[0].set_title('Age Distribution')
sns.histplot(encoded_data['Satisfaction_Level'], kde=True, ax=axes[1])
axes[1].set_title('Satisfaction Level Distribution')
sns.histplot(encoded_data['Salary'], kde=True, ax=axes[2])
axes[2].set_title('Salary Distribution')
plt.show()

# Count plots for original categorical variables
fig, axes = plt.subplots(1, 2, figsize=(18, 5))
sns.countplot(data=employee_attrition_data, x='Department', ax=axes[0])
axes[0].set_title('Department Count')
sns.countplot(data=employee_attrition_data, x='Job_Title', ax=axes[1])
axes[1].set_title('Job Title Count')
plt.show()

# Generate a correlation matrix
correlation_matrix = encoded_data.corr()

# Plot the correlation matrix
plt.figure(figsize=(16, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm',
            fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()

Summary Statistics:
       Employee_ID          Age  Years_at_Company  Satisfaction_Level  \
count  1000.000000  1000.000000       1000.000000         1000.000000
mean    499.500000    42.205000          5.605000            0.505995
std     288.819436    10.016452          2.822223            0.289797
min       0.000000    25.000000          1.000000            0.001376
25%     249.750000    33.000000          3.000000            0.258866
50%     499.500000    43.000000          6.000000            0.505675
75%     749.250000    51.000000          8.000000            0.761135
max     999.000000    59.000000         10.000000            0.999979

Average_Monthly_Hours Promotion_Last_5Years Salary

from sklearn.cluster import KMeans

# Select features for clustering (excluding the target variable
# 'Attrition' and identifier 'Employee_ID')
features = encoded_data.drop(columns=['Employee_ID', 'Attrition'])

# Apply K-means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
encoded_data['Cluster'] = kmeans.fit_predict(features)

# Visualize the clusters
plt.figure(figsize=(12, 6))
sns.scatterplot(data=encoded_data, x='Satisfaction_Level',
                y='Average_Monthly_Hours', hue='Cluster', palette='viridis')
plt.title('K-means Clustering of Employees')
plt.show()

/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
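The warning above goes away once `n_init` is passed explicitly. Separately, K-means relies on Euclidean distance, so a column with a large numeric range (such as Salary) can dominate columns that lie between 0 and 1 (such as Satisfaction_Level). The sketch below illustrates both points on synthetic data; the arrays `satisfaction` and `salary` are hypothetical stand-ins generated here, not the notebook's dataset, and standardizing with `StandardScaler` is an assumption about one reasonable preprocessing step rather than something the original notebook does.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic features on very different scales, standing in for
# Satisfaction_Level (0 to 1) and Salary (tens of thousands).
rng = np.random.default_rng(42)
satisfaction = rng.uniform(0.0, 1.0, size=200)
salary = rng.uniform(30_000, 100_000, size=200)
features = np.column_stack([satisfaction, salary])

# Standardize so each column has zero mean and unit variance; otherwise the
# distances used by K-means are driven almost entirely by the salary column.
scaled = StandardScaler().fit_transform(features)

# Passing n_init explicitly avoids the FutureWarning about its default value.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(scaled)

# Number of points assigned to each of the three clusters.
print(np.bincount(labels))
```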

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Select features and target

X = encoded_data.drop(columns=['Employee_ID', 'Attrition', 'Cluster'])
y = encoded_data['Attrition']

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Apply logistic regression

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Predict on the test set

y_pred = logreg.predict(X_test)

# Evaluate the model

classification_report_logreg = classification_report(y_test, y_pred)
confusion_matrix_logreg = confusion_matrix(y_test, y_pred)

print("Classification Report:")
print(classification_report_logreg)
print("\nConfusion Matrix:")
print(confusion_matrix_logreg)

Classification Report:
              precision    recall  f1-score   support

           0       0.51      0.59      0.55       102
           1       0.49      0.41      0.44        98

    accuracy                           0.50       200
   macro avg       0.50      0.50      0.49       200
weighted avg       0.50      0.50      0.50       200


Confusion Matrix:
[[60 42]
 [58 40]]
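The figures in the report can be recomputed directly from the confusion matrix printed above, which also makes it easy to compare them with a trivial baseline. The sketch below uses scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Confusion matrix as printed above: rows are true classes, columns are predictions.
cm = np.array([[60, 42],
               [58, 40]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)

# Accuracy obtained by always predicting the more frequent class in the test set.
majority_baseline = cm.sum(axis=1).max() / cm.sum()

print(f"accuracy={accuracy:.2f}, precision(1)={precision_1:.2f}, recall(1)={recall_1:.2f}")
print(f"majority-class baseline={majority_baseline:.2f}")
```

With 102 of the 200 test rows in class 0, always predicting class 0 would score 0.51, so the model's accuracy of 0.50 is no better than that baseline on this split.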
