
Exp5_naive.ipynb - Colab

This notebook implements a Gaussian naïve Bayes classifier on a sample admissions dataset (500 rows) to predict whether an applicant's chance of admission exceeds 0.8, based on academic metrics such as GRE score, TOEFL score, and CGPA. It covers reading the CSV file, checking for missing values, exploring the data, and splitting it 80/20 into training and test sets. A row-normalized copy of the features is computed, but the final model is trained on the unnormalized features. The classifier reaches an accuracy of 0.89 on the test split.

Uploaded by

Jahnvi Kedia

4-5 February 2025

EXPERIMENT 5
Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
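
For reference, the decision rule that a Gaussian naïve Bayes classifier applies can be written as below. The symbols are notation introduced here for illustration, not taken from the notebook: $c$ is a class, $x_j$ is the $j$-th of $d$ features, and $\mu_{jc}$, $\sigma_{jc}^2$ are the mean and variance of feature $j$ within class $c$, estimated from the training data.

```latex
\[
\hat{y} \;=\; \arg\max_{c}\; P(c) \prod_{j=1}^{d} P(x_j \mid c),
\qquad
P(x_j \mid c) \;=\; \frac{1}{\sqrt{2\pi\sigma_{jc}^{2}}}
\exp\!\left( -\frac{(x_j - \mu_{jc})^{2}}{2\sigma_{jc}^{2}} \right)
\]
```

The "naïve" part is the product over features, which assumes the features are conditionally independent given the class.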

from google.colab import drive


drive.mount('/content/drive')

Mounted at /content/drive

import pandas as pd
import numpy as np

# Specify the full path to the dataset


dataset_path = '/content/drive/MyDrive/Datasets_ml/Copy of Admission_Predict_Ver1.1.csv'
df = pd.read_csv(dataset_path, sep=",")

# it may be needed in the future.


serialNo = df["Serial No."].values

df.drop(["Serial No."], axis=1, inplace=True)

df = df.rename(columns={'Chance of Admit ': 'Chance of Admit'})

import matplotlib.pyplot as plt #data visualization


import seaborn as sns #statistical data visualisation

# Reload the raw CSV for inspection; this restores "Serial No." and the original column names
df = pd.read_csv(dataset_path)
df.head()

   Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0           1        337          118                  4  4.5  4.5  9.65         1             0.92
1           2        324          107                  4  4.0  4.5  8.87         1             0.76
2           3        316          104                  3  3.0  3.5  8.00         1             0.72
3           4        322          110                  3  3.5  2.5  8.67         1             0.80
4           5        314          103                  2  2.0  3.0  8.21         0             0.65


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Serial No.         500 non-null    int64
 1   GRE Score          500 non-null    int64
 2   TOEFL Score        500 non-null    int64
 3   University Rating  500 non-null    int64
 4   SOP                500 non-null    float64
 5   LOR                500 non-null    float64
 6   CGPA               500 non-null    float64
 7   Research           500 non-null    int64
 8   Chance of Admit    500 non-null    float64
dtypes: float64(4), int64(5)
memory usage: 35.3 KB

df=df.rename(columns = {'Chance of Admit ':'Chance of Admit'})

df.describe()

       Serial No.   GRE Score  TOEFL Score  University Rating         SOP        LOR        CGPA    Research  Chance of Admit
count  500.000000  500.000000   500.000000         500.000000  500.000000  500.00000  500.000000  500.000000        500.00000
mean   250.500000  316.472000   107.192000           3.114000    3.374000    3.48400    8.576440    0.560000          0.72174
std    144.481833   11.295148     6.081868           1.143512    0.991004    0.92545    0.604813    0.496884          0.14114
min      1.000000  290.000000    92.000000           1.000000    1.000000    1.00000    6.800000    0.000000          0.34000
25%    125.750000  308.000000   103.000000           2.000000    2.500000    3.00000    8.127500    0.000000          0.63000
50%    250.500000  317.000000   107.000000           3.000000    3.500000    3.50000    8.560000    1.000000          0.72000
75%    375.250000  325.000000   112.000000           4.000000    4.000000    4.00000    9.040000    1.000000          0.82000
max    500.000000  340.000000   120.000000           5.000000    5.000000    5.00000    9.920000    1.000000          0.97000

l = df.columns
print('The columns are: ',l)

The columns are: Index(['Serial No.', 'GRE Score', 'TOEFL Score', 'University Rating', 'SOP',
'LOR ', 'CGPA', 'Research', 'Chance of Admit'],
dtype='object')

print(df.isnull().sum())
print('\n\nNo null values')

Serial No. 0
GRE Score 0
TOEFL Score 0
University Rating 0
SOP 0
LOR 0
CGPA 0
Research 0
Chance of Admit 0
dtype: int64

No null values

df.describe().T #transpose

                   count       mean         std     min       25%     50%     75%     max
Serial No.         500.0  250.50000  144.481833    1.00  125.7500  250.50  375.25  500.00
GRE Score          500.0  316.47200   11.295148  290.00  308.0000  317.00  325.00  340.00
TOEFL Score        500.0  107.19200    6.081868   92.00  103.0000  107.00  112.00  120.00
University Rating  500.0    3.11400    1.143512    1.00    2.0000    3.00    4.00    5.00
SOP                500.0    3.37400    0.991004    1.00    2.5000    3.50    4.00    5.00
LOR                500.0    3.48400    0.925450    1.00    3.0000    3.50    4.00    5.00
CGPA               500.0    8.57644    0.604813    6.80    8.1275    8.56    9.04    9.92
Research           500.0    0.56000    0.496884    0.00    0.0000    1.00    1.00    1.00
Chance of Admit    500.0    0.72174    0.141140    0.34    0.6300    0.72    0.82    0.97


plt.rcParams['axes.facecolor'] = "#ffe5e5"
plt.rcParams['figure.facecolor'] = "#ffe5e5"
plt.figure(figsize=(6,6))
plt.subplot(2, 1, 1)
sns.histplot(df['GRE Score'],bins=34,color='Red', kde=True, line_kws={"color": "y", "lw": 3, "label": "KDE"}, linewidth=2, alpha=0.3)
plt.subplot(2, 1, 2)
sns.histplot(df['TOEFL Score'],bins=12,color='Blue' ,kde=True, line_kws={"color": "k", "lw": 3, "label": "KDE"}, linewidth=7, alpha=0.3)
plt.show()
co_gre=df[df["GRE Score"]>=300]
co_toefel=df[df["TOEFL Score"]>=100]

print("Average GRE Score :{0:.2f} out of 340".format(df['GRE Score'].mean()))


print('Average TOEFL Score:{0:.2f} out of 120'.format(df['TOEFL Score'].mean()))
print('Average CGPA:{0:.2f} out of 10'.format(df['CGPA'].mean()))
print('Average Chance of getting admitted:{0:.2f}%'.format(df['Chance of Admit'].mean()*100))

Average GRE Score :316.47 out of 340


Average TOEFL Score:107.19 out of 120
Average CGPA:8.58 out of 10
Average Chance of getting admitted:72.17%

toppers=df[(df['GRE Score']>=330) & (df['TOEFL Score']>=115) & (df['CGPA']>=9.5)].sort_values(by=['Chance of Admit'],ascending=False)


toppers.head()

     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
202         203        340          120                  5  4.5  4.5  9.91         1             0.97
143         144        340          120                  4  4.5  4.0  9.92         1             0.97
24           25        336          119                  5  4.0  3.5  9.80         1             0.97
203         204        334          120                  5  4.0  5.0  9.87         1             0.97
213         214        333          119                  5  5.0  4.5  9.78         1             0.96


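# Features are all columns except the target; "Serial No." is still included because df was reloaded from the CSV
X=df.drop('Chance of Admit',axis=1)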
y=df['Chance of Admit']

from sklearn.model_selection import train_test_split


from sklearn import preprocessing

# Normalisation (row-wise L2 scaling here) works slightly better for regression.


X_norm=preprocessing.normalize(X)
X_train,X_test,y_train,y_test=train_test_split(X_norm,y,test_size=0.20,random_state=101)
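
With its default arguments, `preprocessing.normalize` rescales each row (each applicant) to unit Euclidean length; it does not rescale each column. The following is a minimal stdlib-only sketch of that operation, included to make the effect visible. The helper name `l2_normalize_rows` is made up for this illustration, and the two sample rows reuse the GRE, TOEFL, and CGPA values of rows 0 and 4 from the `df.head()` output.

```python
import math

def l2_normalize_rows(rows):
    """Scale each row (sample) to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        normalized.append([v / norm for v in row] if norm > 0 else list(row))
    return normalized

# [GRE Score, TOEFL Score, CGPA] for rows 0 and 4 of df.head()
samples = [[337.0, 118.0, 9.65], [314.0, 103.0, 8.21]]
unit_rows = l2_normalize_rows(samples)
for row in unit_rows:
    print([round(v, 4) for v in row])
```

Because the GRE score is numerically the largest entry in every row, it dominates the normalized vector. If the goal is to put each column on a comparable scale, a column-wise scaler such as `sklearn.preprocessing.StandardScaler` is the more usual choice.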

from sklearn.naive_bayes import GaussianNB


from sklearn.metrics import accuracy_score,mean_squared_error

# Re-split using the raw (unnormalised) features; this overwrites the split made on X_norm
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=101)

# If Chance of Admit is greater than 0.8 (80%), label the sample 1; otherwise label it 0


y_train_c = [1 if each > 0.8 else 0 for each in y_train]
y_test_c = [1 if each > 0.8 else 0 for each in y_test]
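
Since the threshold turns a continuous target into two classes, it can help to check how balanced those classes are before reading an accuracy figure. The sketch below is stdlib-only; the function name `binarize` is made up for this illustration, and the five values are the `Chance of Admit` entries shown by `df.head()`, not the full dataset.

```python
from collections import Counter

def binarize(values, threshold=0.8):
    """Return 1 for values strictly above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

# Chance of Admit for the first five rows shown by df.head()
chances = [0.92, 0.76, 0.72, 0.80, 0.65]
labels = binarize(chances)
counts = Counter(labels)

# Accuracy obtained by always predicting the most frequent class
baseline = max(counts.values()) / len(labels)

print(labels)    # [1, 0, 0, 0, 0]  (0.80 is not strictly greater than 0.8)
print(counts)    # Counter({0: 4, 1: 1})
print(baseline)  # 0.8
```

Comparing a classifier's accuracy against this majority-class baseline shows how much of the score comes from the model rather than from class imbalance.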

from sklearn.naive_bayes import GaussianNB


from sklearn.metrics import accuracy_score

# Initialize the classifier


classifiers = GaussianNB()
# Fit the classifier and make predictions


classifiers.fit(X_train, y_train_c)
predictions = classifiers.predict(X_test)

# Calculate and print the accuracy


print("GaussianNB Accuracy:", accuracy_score(y_test_c, predictions))

GaussianNB Accuracy: 0.89
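
The notebook relies on scikit-learn's `GaussianNB`. To show what such a classifier computes, here is a minimal from-scratch sketch using only the standard library. It is an illustration under stated assumptions, not scikit-learn's implementation: the class name `TinyGaussianNB`, the fixed variance floor `var_floor`, and the toy data are all made up for this example (scikit-learn uses a different, data-dependent variance smoothing).

```python
import math
from collections import defaultdict

class TinyGaussianNB:
    """Gaussian naive Bayes for small numeric datasets (illustrative only)."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # added to every variance to keep it positive

    def fit(self, X, y):
        # Group the training rows by class label
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        n_total = len(y)
        self.stats_ = {}
        for label, rows in grouped.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / n + self.var_floor
                for col, m in zip(columns, means)
            ]
            # Store log prior, per-feature means, per-feature variances
            self.stats_[label] = (math.log(n / n_total), means, variances)
        return self

    def _log_posterior(self, row, label):
        # Unnormalised log posterior: log prior + sum of Gaussian log densities
        log_prior, means, variances = self.stats_[label]
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
        return log_prior + log_lik

    def predict(self, X):
        return [max(self.stats_, key=lambda c: self._log_posterior(row, c)) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data: [GRE Score, CGPA] with a 0/1 label
X_toy = [[300, 7.5], [305, 7.8], [310, 8.0], [330, 9.4], [335, 9.6], [338, 9.8]]
y_toy = [0, 0, 0, 1, 1, 1]
model = TinyGaussianNB().fit(X_toy, y_toy)
toy_pred = model.predict([[302, 7.6], [336, 9.7]])
print(toy_pred, accuracy([0, 1], toy_pred))  # [0, 1] 1.0
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.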
