
DSBDA2 - Jupyter Notebook

The document is a Jupyter Notebook containing a practical exercise by Maithili Kishor Narkhede, focusing on data analysis using a dataset titled 'Student Performance.csv'. It includes various operations such as checking for null values, dropping missing data, and visualizing scores using boxplots and scatter plots. The notebook also demonstrates the use of NumPy and Pandas for data manipulation and analysis.

3/25/25, 1:29 AM DSBDA2 - Jupyter Notebook

Name :- Maithili Kishor Narkhede


Roll No. :- COTA28

# Practical 2
In [10]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [11]: sp=pd.read_csv("Student Performance.csv")

In [12]: sp.isnull()

Out[12]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
0   False   False       False          False          False            False           False                  False
1   False   False       False          False          False            False           False                  False
2   False   False       False          False          False            False           False                  False
3   False   False       False          False          False            False           False                  False
4   False   False       False          False          False            False           False                  False
5   False   False       False          False          False            False           False                  False
6   False   False       False          False          False            False           False                  False
7   False   False       False          False          False            False           False                  False
8   False   False       False          False          False            False           False                  False
9   False   False       False          False          False            False           False                  False
10  False   False       False          False          False            False           False                  False
11  False   False       False          False          False            False           False                  False
12  False   False       False          False          False            False           False                  False
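When every cell is `False`, the full mask is hard to scan row by row. A more compact check is to sum the mask per column, because `True` counts as 1. The sketch below runs it on a small hypothetical frame (`demo`, with one `NaN` injected on purpose), not on the real CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of 'Student Performance.csv',
# with one missing Math Score injected so the counts are not all zero.
demo = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Math Score": [91, np.nan, 69],
    "Reading Score": [70, 60, 50],
})

# isnull() gives a boolean mask; summing it counts missing values per column.
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# Summing twice gives one total for the whole frame.
total_missing = int(demo.isnull().sum().sum())
print("total missing:", total_missing)
```

On `sp` itself, `sp.isnull().sum()` would report 0 for every column, matching the all-`False` table.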

In [13]: series = pd.isnull(sp["Math Score"])  # True where Math Score is missing
sp[series]  # no row is missing a Math Score, so only the header is shown

Out[13]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
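Because `Math Score` has no gaps, `sp[series]` comes back empty and the masking step is easy to miss. The sketch below repeats it on a hypothetical three-row frame in which one score is missing, and also shows the complementary `~mask`, which selects the same rows that `pd.notnull` would:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is missing its Math Score.
demo = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Math Score": [91, np.nan, 69],
})

# Boolean mask: True where Math Score is missing.
missing_mask = pd.isnull(demo["Math Score"])

# Indexing a DataFrame with a boolean Series keeps only the True rows.
rows_missing = demo[missing_mask]
print(rows_missing)

# Negating the mask keeps the complete rows instead.
rows_present = demo[~missing_mask]
print(rows_present)
```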

localhost:8891/notebooks/DSBDA/DSBDA2.ipynb 1/7

In [14]: sp.notnull()

Out[14]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
0   True    True        True           True           True             True            True                   True
1   True    True        True           True           True             True            True                   True
2   True    True        True           True           True             True            True                   True
3   True    True        True           True           True             True            True                   True
4   True    True        True           True           True             True            True                   True
5   True    True        True           True           True             True            True                   True
6   True    True        True           True           True             True            True                   True
7   True    True        True           True           True             True            True                   True
8   True    True        True           True           True             True            True                   True
9   True    True        True           True           True             True            True                   True
10  True    True        True           True           True             True            True                   True
11  True    True        True           True           True             True            True                   True
12  True    True        True           True           True             True            True                   True

In [15]: series1 = pd.notnull(sp["Math Score"])
sp[series1]

Out[15]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
0   Male    91          70             90             70               2019            1                      Urban
1   Female  80          60             95             60               2020            5                      Rular
2   Male    69          50             100            50               2021            2                      Urban
3   Female  58          40             105            40               2022            2                      Rular
4   Male    47          30             110            30               2023            6                      Urban
5   Female  36          20             115            20               2019            3                      Rular
6   Male    25          10             120            10               2020            3                      Urban
7   Female  14          0              125            0                2021            7                      Rular
8   Male    3           70             130            70               2022            4                      Urban
9   Male    91          60             135            60               2023            4                      Urban
10  Female  80          50             140            50               2019            8                      Rular
11  Male    69          40             145            40               2020            5                      Urban
12  Female  58          30             150            30               2021            5                      Rular

In [16]: from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()  # created here, but not applied to any column later in this notebook
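The encoder above is instantiated but never used. A minimal sketch of how it could encode the two text columns, assuming a hypothetical four-row sample of `Gender` and `Region` (values copied from the tables, including the dataset's own `Rular` spelling). `LabelEncoder` assigns integer codes in sorted label order:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the categorical columns shown in the tables.
demo = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Region": ["Urban", "Rular", "Urban", "Rular"],
})

encoded = demo.copy()
for column in ["Gender", "Region"]:
    le = LabelEncoder()
    # fit_transform learns the sorted unique labels and maps each to 0..n-1.
    encoded[column] = le.fit_transform(demo[column])
    print(column, le.classes_.tolist())

print(encoded)
```

Using one encoder per column keeps each column's `classes_`, so `inverse_transform` could map the codes back later; reusing a single `le` for both columns would overwrite the first mapping.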


In [17]: sp.dropna()

Out[17]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
0   Male    91          70             90             70               2019            1                      Urban
1   Female  80          60             95             60               2020            5                      Rular
2   Male    69          50             100            50               2021            2                      Urban
3   Female  58          40             105            40               2022            2                      Rular
4   Male    47          30             110            30               2023            6                      Urban
5   Female  36          20             115            20               2019            3                      Rular
6   Male    25          10             120            10               2020            3                      Urban
7   Female  14          0              125            0                2021            7                      Rular
8   Male    3           70             130            70               2022            4                      Urban
9   Male    91          60             135            60               2023            4                      Urban
10  Female  80          50             140            50               2019            8                      Rular
11  Male    69          40             145            40               2020            5                      Urban
12  Female  58          30             150            30               2021            5                      Rular

In [18]: sp.dropna(how = 'all')

Out[18]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
0   Male    91          70             90             70               2019            1                      Urban
1   Female  80          60             95             60               2020            5                      Rular
2   Male    69          50             100            50               2021            2                      Urban
3   Female  58          40             105            40               2022            2                      Rular
4   Male    47          30             110            30               2023            6                      Urban
5   Female  36          20             115            20               2019            3                      Rular
6   Male    25          10             120            10               2020            3                      Urban
7   Female  14          0              125            0                2021            7                      Rular
8   Male    3           70             130            70               2022            4                      Urban
9   Male    91          60             135            60               2023            4                      Urban
10  Female  80          50             140            50               2019            8                      Rular
11  Male    69          40             145            40               2020            5                      Urban
12  Female  58          30             150            30               2021            5                      Rular
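`dropna()` and `dropna(how = 'all')` print identical tables here only because nothing is missing. On a hypothetical frame with one partly empty row and one fully empty row, the two settings diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is partly missing, row 2 is entirely missing.
demo = pd.DataFrame({
    "Math Score": [91, np.nan, np.nan],
    "Reading Score": [70, 60, np.nan],
})

# Default how='any': drop a row if ANY of its values is missing.
any_dropped = demo.dropna()
# how='all': drop a row only if ALL of its values are missing.
all_dropped = demo.dropna(how="all")

print(any_dropped)
print(all_dropped)
```

With `how='any'` only row 0 survives; with `how='all'` rows 0 and 1 survive, because row 1 still has a Reading Score.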


In [19]: sp.dropna(axis = 1)

Out[19]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
0   Male    91          70             90             70               2019            1                      Urban
1   Female  80          60             95             60               2020            5                      Rular
2   Male    69          50             100            50               2021            2                      Urban
3   Female  58          40             105            40               2022            2                      Rular
4   Male    47          30             110            30               2023            6                      Urban
5   Female  36          20             115            20               2019            3                      Rular
6   Male    25          10             120            10               2020            3                      Urban
7   Female  14          0              125            0                2021            7                      Rular
8   Male    3           70             130            70               2022            4                      Urban
9   Male    91          60             135            60               2023            4                      Urban
10  Female  80          50             140            50               2019            8                      Rular
11  Male    69          40             145            40               2020            5                      Urban
12  Female  58          30             150            30               2021            5                      Rular
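`axis = 1` switches `dropna` from removing rows to removing columns, but with no missing values the output above is unchanged. A small sketch with a hypothetical gap in `Writing Score` shows the difference between the two axes:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the Writing Score column has a gap.
demo = pd.DataFrame({
    "Math Score": [91, 80, 69],
    "Writing Score": [90, np.nan, 100],
})

# axis=1 drops every column that contains a missing value, keeping all rows.
columns_kept = demo.dropna(axis=1)
# axis=0 (the default) drops the row that holds the missing value instead.
rows_kept = demo.dropna(axis=0, how="any")

print(columns_kept.columns.tolist())
print(rows_kept.index.tolist())
```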

In [20]: new_data = sp.dropna(axis = 0, how ='any')
new_data

Out[20]:
    Gender  Math Score  Reading Score  Writing Score  Placement Score  Club Join Year  Placement Offer Count  Region
0   Male    91          70             90             70               2019            1                      Urban
1   Female  80          60             95             60               2020            5                      Rular
2   Male    69          50             100            50               2021            2                      Urban
3   Female  58          40             105            40               2022            2                      Rular
4   Male    47          30             110            30               2023            6                      Urban
5   Female  36          20             115            20               2019            3                      Rular
6   Male    25          10             120            10               2020            3                      Urban
7   Female  14          0              125            0                2021            7                      Rular
8   Male    3           70             130            70               2022            4                      Urban
9   Male    91          60             135            60               2023            4                      Urban
10  Female  80          50             140            50               2019            8                      Rular
11  Male    69          40             145            40               2020            5                      Urban
12  Female  58          30             150            30               2021            5                      Rular

In [21]: print(np.where(sp['Math Score']>90))

(array([0, 9], dtype=int64),)
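With a single condition, `np.where` returns a tuple holding one array of *positions*, not the rows themselves. The sketch below rebuilds the `Math Score` column from the printed tables as a standalone Series (`math` is a hypothetical name) and relates those positions to a boolean mask. The two agree here only because `sp` uses the default 0..12 index:

```python
import numpy as np
import pandas as pd

# Math Score values as printed in the tables above.
math = pd.Series([91, 80, 69, 58, 47, 36, 25, 14, 3, 91, 80, 69, 58], name="Math Score")

# np.where(condition) returns a 1-tuple of position arrays; unpack it.
(positions,) = np.where(math > 90)
print(positions)

# The same filter as a boolean mask keeps the pandas index labels.
high = math[math > 90]
print(high)

# iloc turns the positions back into the matching rows.
same_rows = math.iloc[positions]
```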


In [22]: print(np.where(sp['Reading Score']<25))
print(np.where(sp['Writing Score']<30))

(array([5, 6, 7], dtype=int64),)
(array([], dtype=int64),)

In [36]: col = ['Math Score', 'Reading Score', ' Placement Score']  # ' Placement Score' keeps the leading space used in the CSV header
sp.boxplot(col)  # one box per listed column
plt.show()


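In [41]: fig, axes = plt.subplots(figsize = (18,10))
# Placement score on x, number of placement offers on y
axes.scatter(sp[' Placement Score'], sp[' Placement Offer Count'])
axes.set_xlabel('Placement Score')
axes.set_ylabel('Placement Offer Count')
plt.show()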


In [42]: print(np.where((sp[' Placement Score']<50) & (sp[' Placement Offer Count']>1)))
print(np.where((sp[' Placement Score']>85) & (sp[' Placement Offer Count']<3)))

(array([ 3, 4, 5, 6, 7, 11, 12], dtype=int64),)
(array([], dtype=int64),)
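The parentheses around each comparison in `In [42]` are required, not stylistic: `&` binds more tightly than `<` and `>`. A sketch on a hypothetical frame `df` rebuilt from the tables (column names written without the leading space the CSV headers carry):

```python
import numpy as np
import pandas as pd

# Placement data rebuilt from the tables above.
df = pd.DataFrame({
    "Placement Score": [70, 60, 50, 40, 30, 20, 10, 0, 70, 60, 50, 40, 30],
    "Placement Offer Count": [1, 5, 2, 2, 6, 3, 3, 7, 4, 4, 8, 5, 5],
})

# Each comparison is parenthesised, then combined element-wise with &.
mask = (df["Placement Score"] < 50) & (df["Placement Offer Count"] > 1)
positions = np.where(mask)[0]
print(positions)

# Without parentheses Python parses a chained comparison around
# (50 & df["Placement Offer Count"]), which needs the truth value of a
# whole Series and therefore raises ValueError.
raised = False
try:
    df["Placement Score"] < 50 & df["Placement Offer Count"] > 1
except ValueError as err:
    raised = True
    print("without parentheses:", err)
```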

In [43]: sorted_rscore= sorted(sp['Reading Score'])

In [44]: sorted_rscore

Out[44]: [0, 10, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70]

In [45]: q1 = np.percentile(sorted_rscore, 25)
q3 = np.percentile(sorted_rscore, 75)
print(q1,q3)

30.0 60.0
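By default `np.percentile` interpolates linearly between sorted values at position p/100 * (n - 1). For the 13 Reading Scores the positions are 3 and 9, which is why Q1 and Q3 land exactly on 30 and 60. (It also sorts its input internally, so the explicit `sorted` call is not strictly needed.) The helper `linear_percentile` below is hypothetical, written only to mirror that default rule:

```python
import numpy as np

# Reading Score values, already sorted (as in Out[44]).
scores = [0, 10, 20, 30, 30, 40, 40, 50, 50, 60, 60, 70, 70]

def linear_percentile(sorted_values, p):
    """Percentile via linear interpolation at position p/100 * (n - 1),
    the same rule np.percentile uses by default."""
    position = p / 100 * (len(sorted_values) - 1)
    lower = int(np.floor(position))
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

q1 = linear_percentile(scores, 25)  # position 3.0 -> scores[3]
q3 = linear_percentile(scores, 75)  # position 9.0 -> scores[9]
print(q1, q3)
```

For a percentile that falls between two values, such as the 10th (position 1.2), the helper returns 10 + 0.2 * (20 - 10) = 12, which matches `np.percentile(scores, 10)`.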

In [46]: IQR = q3-q1

In [47]: lwr_bound = q1-(1.5*IQR)
upr_bound = q3+(1.5*IQR)
print(lwr_bound, upr_bound)

-15.0 105.0
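The notebook stops after computing the bounds. One way to apply them is to flag values outside these fences (Tukey's 1.5 × IQR rule), sketched here as a hypothetical helper `iqr_outliers`. With the Reading Scores from the tables nothing is flagged, consistent with every score lying inside [-15, 105]:

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return (lower, upper, outliers) using the fences q1 - k*IQR and q3 + k*IQR."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = data[(data < lower) | (data > upper)]
    return lower, upper, outliers

# Reading Score values in the order they appear in the tables.
reading = [70, 60, 50, 40, 30, 20, 10, 0, 70, 60, 50, 40, 30]
lower, upper, outliers = iqr_outliers(reading)
print(lower, upper, outliers)

# A hypothetical extra score of 200 would fall above the upper fence.
_, _, with_extra = iqr_outliers(reading + [200])
print(with_extra)
```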
