Exp5_naive.ipynb - Colab
EXPERIMENT 5
Write a program to implement the naïve Bayesian classifier for a sample training data set stored
as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Mounted at /content/drive
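import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # used by the plotting cell
import seaborn as sns  # used by the plotting cell
from sklearn.model_selection import train_test_split  # used for the train/test split
# dataset_path must hold the location of the CSV on the mounted drive (its value is not shown in this export)
df = pd.read_csv(dataset_path)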
df.head()
Serial No. GRE Score TOEFL Score University Rating SOP LOR CGPA Research Chance of Admit
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Serial No. 500 non-null int64
1 GRE Score 500 non-null int64
2 TOEFL Score 500 non-null int64
3 University Rating 500 non-null int64
4 SOP 500 non-null float64
5 LOR 500 non-null float64
6 CGPA 500 non-null float64
7 Research 500 non-null int64
8 Chance of Admit 500 non-null float64
dtypes: float64(4), int64(5)
memory usage: 35.3 KB
df.describe()
Serial No. GRE Score TOEFL Score University Rating SOP LOR CGPA Research Chance of Admit
count 500.000000 500.000000 500.000000 500.000000 500.000000 500.00000 500.000000 500.000000 500.00000
mean 250.500000 316.472000 107.192000 3.114000 3.374000 3.48400 8.576440 0.560000 0.72174
std 144.481833 11.295148 6.081868 1.143512 0.991004 0.92545 0.604813 0.496884 0.14114
min 1.000000 290.000000 92.000000 1.000000 1.000000 1.00000 6.800000 0.000000 0.34000
25% 125.750000 308.000000 103.000000 2.000000 2.500000 3.00000 8.127500 0.000000 0.63000
50% 250.500000 317.000000 107.000000 3.000000 3.500000 3.50000 8.560000 1.000000 0.72000
75% 375.250000 325.000000 112.000000 4.000000 4.000000 4.00000 9.040000 1.000000 0.82000
max 500.000000 340.000000 120.000000 5.000000 5.000000 5.00000 9.920000 1.000000 0.97000
l = df.columns
print('The columns are: ',l)
The columns are: Index(['Serial No.', 'GRE Score', 'TOEFL Score', 'University Rating', 'SOP',
'LOR ', 'CGPA', 'Research', 'Chance of Admit'],
dtype='object')
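The index printed above shows the label 'LOR ' with a trailing space, so a lookup such as df['LOR'] would raise a KeyError. A minimal sketch of normalising the labels is given here; the `demo` frame is a hypothetical stand-in that only mimics the stray space, not the admissions data itself.

```python
import pandas as pd

# Hypothetical stand-in frame that mimics the trailing space seen in 'LOR '.
demo = pd.DataFrame({"GRE Score": [320, 310], "LOR ": [4.5, 3.0]})

# Strip leading and trailing whitespace from every column label.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```

Applying the same one-line strip to df.columns right after reading the CSV would let later cells refer to the column as 'LOR'.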
print(df.isnull().sum())
print('\n\nNo null values')
Serial No. 0
GRE Score 0
TOEFL Score 0
University Rating 0
SOP 0
LOR 0
CGPA 0
Research 0
Chance of Admit 0
dtype: int64
No null values
df.describe().T #transpose
count mean std min 25% 50% 75% max
Serial No. 500.0 250.50000 144.481833 1.00 125.7500 250.50 375.25 500.00
GRE Score 500.0 316.47200 11.295148 290.00 308.0000 317.00 325.00 340.00
TOEFL Score 500.0 107.19200 6.081868 92.00 103.0000 107.00 112.00 120.00
University Rating 500.0 3.11400 1.143512 1.00 2.0000 3.00 4.00 5.00
Chance of Admit 500.0 0.72174 0.141140 0.34 0.6300 0.72 0.82 0.97
plt.rcParams['axes.facecolor'] = "#ffe5e5"
plt.rcParams['figure.facecolor'] = "#ffe5e5"
plt.figure(figsize=(6,6))
plt.subplot(2, 1, 1)
sns.histplot(df['GRE Score'],bins=34,color='Red', kde=True, line_kws={"color": "y", "lw": 3, "label": "KDE"}, linewidth=2, alpha=0.3)
plt.subplot(2, 1, 2)
sns.histplot(df['TOEFL Score'],bins=12,color='Blue' ,kde=True, line_kws={"color": "k", "lw": 3, "label": "KDE"}, linewidth=7, alpha=0.3)
plt.show()
co_gre=df[df["GRE Score"]>=300]
co_toefel=df[df["TOEFL Score"]>=100]
Serial No. GRE Score TOEFL Score University Rating SOP LOR CGPA Research Chance of Admit
X=df.drop('Chance of Admit',axis=1)
y=df['Chance of Admit']
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=101)
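The split above keeps 'Chance of Admit' as a continuous value, which a classifier cannot use directly as a class label, and it leaves 'Serial No.' among the features. The sketch below shows one possible way to complete the experiment, not necessarily the one this notebook used: it binarises the target at a cut-off, drops the serial number, fits scikit-learn's GaussianNB, and reports test accuracy. The 0.75 threshold, the helper name `naive_bayes_accuracy`, and the synthetic `demo` frame are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def naive_bayes_accuracy(frame, target="Chance of Admit", threshold=0.75,
                         drop=("Serial No.",), seed=101):
    """Binarise the target, fit Gaussian naive Bayes, return test accuracy."""
    # Remove the target and any identifier columns from the feature matrix.
    unwanted = [target] + [c for c in drop if c in frame.columns]
    features = frame.drop(columns=unwanted)
    # Assumed labelling rule: 1 if the admit chance reaches the threshold, else 0.
    labels = (frame[target] >= threshold).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.20, random_state=seed)
    model = GaussianNB().fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))


# Synthetic stand-in data (NOT the admissions CSV), so the sketch runs alone.
rng = np.random.default_rng(0)
n = 200
cgpa = rng.uniform(6.8, 9.9, n)
demo = pd.DataFrame({
    "Serial No.": np.arange(1, n + 1),
    "GRE Score": rng.integers(290, 341, n),
    "CGPA": cgpa,
    "Chance of Admit": np.clip((cgpa - 6.8) / 3.1, 0, 1),
})

acc = naive_bayes_accuracy(demo)
print(f"accuracy: {acc:.2f}")
```

With the real data, calling naive_bayes_accuracy(df) would follow the same path; the threshold is a modelling choice and changes both the class balance and the reported accuracy.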