DSBDA_prac2

The document outlines a data analysis process using a dataset of student performance, including loading the data, checking for missing values, and filling them with mean values. It also describes the creation of boxplots to visualize the data, the calculation of z-scores to identify outliers, and the removal of these outliers from the dataset. Additionally, it includes steps for installing necessary libraries and applying statistical methods to clean and analyze the data.

Uploaded by

Manasi Deshmukh

In [70]: import pandas as pd

In [71]: df = pd.read_csv("D:\\Jupyter notebook\\datasets_74977_169835_StudentsPerformance.csv")

In [72]: df.head()

Out[72]:
   gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score
0  female  group B         bachelor's degree            standard      none                     72.0        72.0           74.0
1  female  group C         some college                 standard      completed                69.0        90.0           88.0
2  female  group B         master's degree              standard      none                     90.0        95.0           93.0
3  male    group A         associate's degree           free/reduced  none                     47.0        57.0           44.0
4  male    group C         some college                 standard      none                     76.0        78.0           75.0

In [73]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 1000 non-null object
1 race/ethnicity 1000 non-null object
2 parental level of education 1000 non-null object
3 lunch 1000 non-null object
4 test preparation course 1000 non-null object
5 math score 997 non-null float64
6 reading score 997 non-null float64
7 writing score 998 non-null float64
dtypes: float64(3), object(5)
memory usage: 62.6+ KB

In [74]: df.isnull()

Out[74]:
     gender  race/ethnicity  parental level of education  lunch  test preparation course  math score  reading score  writing score
0    False   False           False                        False  False                    False       False          False
1    False   False           False                        False  False                    False       False          False
2    False   False           False                        False  False                    False       False          False
3    False   False           False                        False  False                    False       False          False
4    False   False           False                        False  False                    False       False          False
...  ...     ...             ...                          ...    ...                      ...         ...            ...
995  False   False           False                        False  False                    False       False          False
996  False   False           False                        False  False                    False       False          False
997  False   False           False                        False  False                    False       False          False
998  False   False           False                        False  False                    False       False          False
999  False   False           False                        False  False                    False       False          False

1000 rows × 8 columns

In [75]: df.isnull().sum()

Out[75]: gender 0
race/ethnicity 0
parental level of education 0
lunch 0
test preparation course 0
math score 3
reading score 3
writing score 2
dtype: int64

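In [76]: # Fill missing scores with the column mean. Assigning the result back avoids
# the chained inplace fillna, which newer pandas versions warn about.
df['reading score'] = df['reading score'].fillna(df['reading score'].mean())
df['math score'] = df['math score'].fillna(df['math score'].mean())
df['writing score'] = df['writing score'].fillna(df['writing score'].mean())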

In [77]: df.isnull().sum()

Out[77]: gender 0
race/ethnicity 0
parental level of education 0
lunch 0
test preparation course 0
math score 0
reading score 0
writing score 0
dtype: int64
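The three per-column fills can also be written as a single call. The following is a minimal sketch on a made-up toy frame (the values are illustrative, not taken from the student dataset); it relies on `DataFrame.fillna` accepting a Series of per-column fill values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the student data; the values are made up.
toy = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "math score": [70.0, np.nan, 50.0, 60.0],
    "reading score": [80.0, 60.0, np.nan, 70.0],
})

# mean() skips NaN, so each gap is filled with the mean of the
# observed values in its own column. Non-numeric columns are ignored.
numeric_means = toy.mean(numeric_only=True)
filled = toy.fillna(numeric_means)

print(filled)
print(filled.isnull().sum())
```

Mean imputation keeps the column mean unchanged but slightly shrinks its variance, which is usually acceptable when only a few values are missing, as is the case here (3, 3, and 2 of 1000).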

In [78]: df.boxplot()

Out[78]: <Axes: >

In [79]: newdf = df[df["math score"]>20]

In [80]: !pip install matplotlib

Defaulting to user installation because normal site-packages is not writeable

Requirement already satisfied: matplotlib in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (3.8.2)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: cycler>=0.10 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (4.47.2)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (1.4.5)
Requirement already satisfied: numpy<2,>=1.21 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (1.26.3)
Requirement already satisfied: packaging>=20.0 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (23.2)
Requirement already satisfied: pillow>=8 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (10.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\users\manasi deshmukh\appdata\roaming\python\python312\site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip

In [81]: import matplotlib.pyplot as plt

In [82]: newdf.boxplot()
plt.show()

In [83]: newdf = df[df["writing score"]>20]

In [84]: newdf.boxplot()
plt.show()

In [85]: pip install scipy

Requirement already satisfied: scipy in c:\users\manasi deshmukh\appdata\local\programs\python\python39\lib\site-packages (1.12.0)

Requirement already satisfied: numpy<1.29.0,>=1.22.4 in c:\users\manasi deshmukh\appdata\local\programs\python\python39\lib\site-packages (from scipy) (1.26.1)
Note: you may need to restart the kernel to use updated packages.
WARNING: Ignoring invalid distribution -illow (c:\users\manasi deshmukh\appdata\local\programs\python\python39\lib\site-packages)
WARNING: You are using pip version 22.0.4; however, version 23.3.2 is available.
You should consider upgrading via the 'C:\Users\Manasi Deshmukh\AppData\Local\Programs\Python\Python39\python.exe -m pip install --upgrade pip' command.

In [86]: from scipy.stats import zscore

In [87]: df['z_scores_math'] = zscore(df['math score'])

In [88]: df

Out[88]:
     gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score  z_scores_math
0    female  group B         bachelor's degree            standard      none                     72.0        72.0           74.0            0.390843
1    female  group C         some college                 standard      completed                69.0        90.0           88.0            0.192706
2    female  group B         master's degree              standard      none                     90.0        95.0           93.0            1.579670
3    male    group A         associate's degree           free/reduced  none                     47.0        57.0           44.0           -1.260305
4    male    group C         some college                 standard      none                     76.0        78.0           75.0            0.655027
...  ...     ...             ...                          ...           ...                      ...         ...            ...            ...
995  female  group E         master's degree              standard      completed                88.0        99.0           95.0            1.447578
996  male    group C         high school                  free/reduced  none                     62.0        55.0           55.0           -0.269616
997  female  group C         high school                  free/reduced  completed                59.0        71.0           65.0           -0.467754
998  female  group D         some college                 standard      completed                68.0        78.0           77.0            0.126660
999  female  group D         some college                 free/reduced  none                     77.0        86.0           86.0            0.721073

1000 rows × 9 columns
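For reference, a z-score is the distance of a value from the column mean, measured in standard deviations. The sketch below reproduces that calculation by hand on a small made-up series, assuming the population standard deviation (`ddof=0`), which is the default of `scipy.stats.zscore`; note that pandas' `Series.std()` defaults to `ddof=1` instead.

```python
import pandas as pd

# Made-up scores, chosen so the arithmetic is easy to follow.
scores = pd.Series([40.0, 50.0, 60.0, 70.0, 80.0], name="math score")

mean = scores.mean()        # 60.0
std = scores.std(ddof=0)    # population standard deviation, sqrt(200)
z = (scores - mean) / std   # z = (x - mean) / std

print(z.round(4).tolist())
```

By construction the resulting z-scores have mean 0 and population standard deviation 1, so a value of 1.58 (row 2 above) means that score lies about 1.58 standard deviations above the mean.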

In [89]: outliers = (df["z_scores_math"]> 1) | (df["z_scores_math"] < -1)

In [90]: outliers

Out[90]: 0 False
1 False
2 True
3 True
4 False
...
995 True
996 False
997 False
998 False
999 False
Name: z_scores_math, Length: 1000, dtype: bool

In [91]: df_no_math_score_outiler=df[(df.z_scores_math >-1) & (df.z_scores_math <1)]

df_no_math_score_outiler

Out[91]:
     gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score  z_scores_math
0    female  group B         bachelor's degree            standard      none                     72.0        72.0           74.0            0.390843
1    female  group C         some college                 standard      completed                69.0        90.0           88.0            0.192706
4    male    group C         some college                 standard      none                     76.0        78.0           75.0            0.655027
5    female  group B         associate's degree           standard      none                     71.0        83.0           78.0            0.324798
8    male    group D         high school                  free/reduced  completed                64.0        64.0           67.0           -0.137524
...  ...     ...             ...                          ...           ...                      ...         ...            ...            ...
994  male    group A         high school                  standard      none                     63.0        63.0           62.0           -0.203570
996  male    group C         high school                  free/reduced  none                     62.0        55.0           55.0           -0.269616
997  female  group C         high school                  free/reduced  completed                59.0        71.0           65.0           -0.467754
998  female  group D         some college                 standard      completed                68.0        78.0           77.0            0.126660
999  female  group D         some college                 free/reduced  none                     77.0        86.0           86.0            0.721073

697 rows × 9 columns

In [92]: df_no_math_score_outiler.boxplot()
plt.show()
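A cutoff of |z| < 1 is strict: for roughly normal data only about 68% of values lie within one standard deviation of the mean, which is in line with the 697 of 1000 rows kept in Out[91]. A cutoff of 3 is a more common convention for flagging outliers. The sketch below compares cutoffs on synthetic, normally distributed scores (an assumption for illustration; this is not the student dataset, and the mean and spread are made up).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic, roughly normal scores; not the student dataset.
scores = pd.Series(rng.normal(loc=66, scale=15, size=10_000))
z = (scores - scores.mean()) / scores.std(ddof=0)

# Share of rows that survive each cutoff on |z|.
kept_share = {cutoff: (z.abs() < cutoff).mean() for cutoff in (1, 2, 3)}

for cutoff, share in kept_share.items():
    print(f"|z| < {cutoff}: keeps {share:.1%} of rows")
```

Using `z.abs() < cutoff` is equivalent to the two-sided comparison `(z > -cutoff) & (z < cutoff)` and makes the threshold easy to change in one place.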

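In [93]: def RemoveOutlier(df, var):
    """Return the rows of df whose `var` value lies within the 1.5 * IQR fences."""
    Q1 = df[var].quantile(0.25)
    Q3 = df[var].quantile(0.75)
    IQR = Q3 - Q1
    high = Q3 + 1.5 * IQR  # upper fence
    low = Q1 - 1.5 * IQR   # lower fence
    # Keep values on or inside both fences.
    df = df[(df[var] >= low) & (df[var] <= high)]
    print('Outliers removed in', var)
    return df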

In [94]: data = RemoveOutlier(df,'math score')

Outliers removed in math score
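To make the IQR rule concrete, here is a small worked example on made-up values with one obvious outlier. It is a sketch of the same 1.5 * IQR fences (Tukey's rule), using `Series.between`, which is inclusive on both ends by default.

```python
import pandas as pd

# Made-up values; 100 is the obvious outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype="float64")

q1 = values.quantile(0.25)   # 3.25
q3 = values.quantile(0.75)   # 7.75
iqr = q3 - q1                # 4.5

low = q1 - 1.5 * iqr         # -3.5
high = q3 + 1.5 * iqr        # 14.5

# Keep only the values on or inside the fences.
inliers = values[values.between(low, high)]
print(low, high, inliers.tolist())
```

Unlike the z-score, the fences depend only on quartiles, so a single extreme value such as 100 barely moves them. These are also the limits that a default boxplot uses for its whiskers.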

In [95]: col= "math score"

data.boxplot(col)
plt.show()

In [96]: data.boxplot()
plt.show()

In [99]: data = RemoveOutlier(data,'reading score')

Outliers removed in reading score

In [100… data.boxplot()
plt.show()

In [101… data = RemoveOutlier(data,'math score')

Outliers removed in math score
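Re-running the filter on `math score` is reasonable because dropping rows changes the quartiles, so new points can fall outside the recomputed fences. A hedged sketch of automating this is shown below; `iqr_filter` and `filter_until_stable` are hypothetical helper names, and the toy frame is made up rather than taken from the dataset.

```python
import pandas as pd


def iqr_filter(frame, column):
    """Keep rows whose `column` value lies within the 1.5 * IQR fences."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    return frame[frame[column].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]


def filter_until_stable(frame, columns, max_passes=10):
    """Apply iqr_filter to each column repeatedly until no row is removed."""
    for _ in range(max_passes):
        before = len(frame)
        for column in columns:
            frame = iqr_filter(frame, column)
        if len(frame) == before:
            break
    return frame


# Made-up scores with a few extreme values in each column.
toy = pd.DataFrame({
    "math score": [10.0, 55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0, 99.0],
    "reading score": [58.0, 12.0, 61.0, 63.0, 66.0, 67.0, 71.0, 73.0, 74.0, 76.0],
})

cleaned = filter_until_stable(toy, ["math score", "reading score"])
print(len(toy), "->", len(cleaned), "rows")
```

Repeated filtering can remove far more data than intended on skewed columns, so the `max_passes` cap and a check of the remaining row count are worth keeping.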

In [102… data.boxplot()
plt.show()
In [ ]:
