DSBDA_prac2
In [72]: df.head()
Out[72]: gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
In [73]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   997 non-null    float64
 6   reading score                997 non-null    float64
 7   writing score                998 non-null    float64
dtypes: float64(3), object(5)
memory usage: 62.6+ KB
In [74]: df.isnull()
Out[74]: gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
In [75]: df.isnull().sum()
Out[75]: gender                         0
         race/ethnicity                 0
         parental level of education    0
         lunch                          0
         test preparation course        0
         math score                     3
         reading score                  3
         writing score                  2
         dtype: int64
In [77]: df.isnull().sum()
Out[77]: gender                         0
         race/ethnicity                 0
         parental level of education    0
         lunch                          0
         test preparation course        0
         math score                     0
         reading score                  0
         writing score                  0
         dtype: int64
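The cell that ran between In [75] and In [77] is not shown. The null counts fall from 3, 3 and 2 to 0 while the frame still has 1000 rows later on, so the missing scores were most likely filled rather than dropped. A minimal sketch, assuming mean imputation (the fill value actually used cannot be recovered from this printout); `toy` and `score_cols` are hypothetical names and the values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; values are illustrative only.
toy = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "math score": [72.0, np.nan, 90.0, 47.0],
    "reading score": [72.0, 90.0, np.nan, 57.0],
    "writing score": [74.0, 88.0, 93.0, np.nan],
})

score_cols = ["math score", "reading score", "writing score"]

# Assumption: each numeric column is filled with its own mean.
# fillna() with a Series matches the Series index against column labels.
toy[score_cols] = toy[score_cols].fillna(toy[score_cols].mean())

# Every column should now report zero missing values.
print(toy.isnull().sum())
```

Median imputation (`.median()` in place of `.mean()`) is a common alternative when the column already contains extreme values.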
In [78]: df.boxplot()
In [82]: newdf.boxplot()
plt.show()
In [84]: newdf.boxplot()
plt.show()
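The plots themselves did not survive the export, and the step that produced `newdf` is not shown. For reference, a boxplot with default settings draws whiskers out to 1.5 times the interquartile range beyond the quartiles and marks anything further as an outlier. The sketch below applies that same rule numerically; it is an illustration on made-up values, not necessarily how `newdf` was built, and `scores`, `lower` and `upper` are hypothetical names:

```python
import pandas as pd

# Made-up scores; 8.0 is deliberately far below the rest.
scores = pd.Series([72.0, 69.0, 90.0, 47.0, 76.0, 8.0], name="math score")

# Quartiles and interquartile range.
q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

# The usual 1.5 * IQR fences.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Boolean mask: True where a value falls outside the fences.
is_outlier = (scores < lower) | (scores > upper)

# Keep only the values inside the fences.
trimmed = scores[~is_outlier]
print(trimmed)
```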
In [88]: df
Out[88]:
     gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score  z_scores_math
0    female  group B         bachelor's degree            standard      none                     72.0        72.0           74.0           0.390843
1    female  group C         some college                 standard      completed                69.0        90.0           88.0           0.192706
2    female  group B         master's degree              standard      none                     90.0        95.0           93.0           1.579670
3    male    group A         associate's degree           free/reduced  none                     47.0        57.0           44.0           -1.260305
4    male    group C         some college                 standard      none                     76.0        78.0           75.0           0.655027
     ...     ...             ...                          ...           ...                      ...         ...            ...            ...
995  female  group E         master's degree              standard      completed                88.0        99.0           95.0           1.447578
996  male    group C         high school                  free/reduced  none                     62.0        55.0           55.0           -0.269616
997  female  group C         high school                  free/reduced  completed                59.0        71.0           65.0           -0.467754
998  female  group D         some college                 standard      completed                68.0        78.0           77.0           0.126660
999  female  group D         some college                 free/reduced  none                     77.0        86.0           86.0           0.721073
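The cell that created `z_scores_math` is not shown. A z-score is the value minus the column mean, divided by the column standard deviation. A minimal sketch on the first five math scores from the table above; whether the original used the population (`ddof=0`) or sample (`ddof=1`) standard deviation is not known, so `ddof=0` here is an assumption, and `toy` is a hypothetical name:

```python
import pandas as pd

# First five math scores from the output above, in a stand-in frame.
toy = pd.DataFrame({"math score": [72.0, 69.0, 90.0, 47.0, 76.0]})

col = toy["math score"]

# z = (x - mean) / std. ddof=0 (population std) is an assumption;
# pandas defaults to ddof=1 when ddof is not passed.
toy["z_scores_math"] = (col - col.mean()) / col.std(ddof=0)

print(toy)
```

The numbers will differ from the table because the mean and standard deviation here come from five rows rather than all 1000.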
In [90]: outliers
Out[90]: 0      False
         1      False
         2      True
         3      True
         4      False
         ...
         995    True
         996    False
         997    False
         998    False
         999    False
         Name: z_scores_math, Length: 1000, dtype: bool
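The cell that defined `outliers` is not shown. In the output above, rows with z-scores of 1.579670, -1.260305 and 1.447578 are flagged, while 0.721073 and smaller magnitudes are not, which is consistent with a cutoff on the absolute z-score somewhere between roughly 0.72 and 1.26. A sketch assuming a cutoff of 1.0 (an assumption; `threshold` is a hypothetical name), using the first five z-scores from the table:

```python
import pandas as pd

# z-scores for rows 0-4, copied from the DataFrame output above.
z = pd.Series(
    [0.390843, 0.192706, 1.579670, -1.260305, 0.655027],
    name="z_scores_math",
)

# Assumed cutoff; the value actually used is not shown in the notebook.
threshold = 1.0

# True where the score lies more than `threshold` standard deviations
# from the mean, in either direction.
outliers = z.abs() > threshold

print(outliers.tolist())  # [False, False, True, True, False]
```

A cutoff of 3 is the more conventional choice for z-score outlier detection; a cutoff near 1 discards a large share of ordinary observations.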
Out[91]:
     gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score  z_scores_math
0    female  group B         bachelor's degree            standard      none                     72.0        72.0           74.0           0.390843
1    female  group C         some college                 standard      completed                69.0        90.0           88.0           0.192706
4    male    group C         some college                 standard      none                     76.0        78.0           75.0           0.655027
5    female  group B         associate's degree           standard      none                     71.0        83.0           78.0           0.324798
8    male    group D         high school                  free/reduced  completed                64.0        64.0           67.0           -0.137524
     ...     ...             ...                          ...           ...                      ...         ...            ...            ...
994  male    group A         high school                  standard      none                     63.0        63.0           62.0           -0.203570
996  male    group C         high school                  free/reduced  none                     62.0        55.0           55.0           -0.269616
997  female  group C         high school                  free/reduced  completed                59.0        71.0           65.0           -0.467754
998  female  group D         some college                 standard      completed                68.0        78.0           77.0           0.126660
999  female  group D         some college                 free/reduced  none                     77.0        86.0           86.0           0.721073
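The input for Out[91] is missing, but the result keeps rows 0, 1 and 4 and drops rows 2 and 3, which is what filtering on the inverse of a boolean outlier mask would produce. A minimal sketch under that assumption; `toy` and `kept` are hypothetical names, and the cutoff of 1.0 is assumed rather than taken from the notebook:

```python
import pandas as pd

# Rows 0-4 from the earlier output: math score and its z-score.
toy = pd.DataFrame({
    "math score": [72.0, 69.0, 90.0, 47.0, 76.0],
    "z_scores_math": [0.390843, 0.192706, 1.579670, -1.260305, 0.655027],
})

# Assumed outlier rule: absolute z-score above 1.0.
outliers = toy["z_scores_math"].abs() > 1.0

# ~ inverts the mask, so only non-outlier rows are kept.
# The original index labels are preserved, as in Out[91].
kept = toy[~outliers]

print(kept.index.tolist())  # [0, 1, 4]
```

Calling `kept.reset_index(drop=True)` afterwards would renumber the rows from 0 if contiguous labels are wanted.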
In [92]: df_no_math_score_outiler.boxplot()
plt.show()
In [96]: data.boxplot()
plt.show()
In [100]: data.boxplot()
plt.show()
In [102]: data.boxplot()
plt.show()