Exp5_naive.ipynb - Colab

The document outlines the implementation of a naïve Bayesian classifier using a sample dataset for predicting admission chances based on various academic metrics. It includes data preprocessing steps, such as reading a CSV file, handling missing values, and normalizing features, followed by training and testing the model. The classifier achieved an accuracy of 89% on the test dataset.

Uploaded by Jahnvi Kedia


4-5 February 2025

EXPERIMENT 5
Write a program to implement the naïve Bayesian classifier for a sample training data set stored
as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
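The classifier code itself does not appear in this part of the notebook. As a minimal sketch of the technique the task names, the block below implements Gaussian naive Bayes with NumPy, assuming every feature is continuous and modeled as a per-class normal distribution, and that the label is already discrete. The helper names `fit_gaussian_nb`, `predict_gaussian_nb` and `accuracy` are hypothetical, the two-class data are synthetic, and "a few test data sets" is interpreted here as three random 80/20 train/test splits.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class feature means and variances (hypothetical helper)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # A tiny epsilon keeps every variance strictly positive
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Return the class with the highest log posterior for each row (hypothetical helper)."""
    classes, priors, means, variances = model
    # Log Gaussian density per (row, class, feature), summed over features:
    # summing is the "naive" conditional-independence assumption
    log_lik = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :]
        + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    ).sum(axis=2)
    log_post = log_lik + np.log(priors)[None, :]
    return classes[np.argmax(log_post, axis=1)]


def accuracy(y_true, y_pred):
    """Fraction of test rows whose predicted class matches the true class."""
    return float(np.mean(y_true == y_pred))


# Synthetic two-class data, for illustration only
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Evaluate on three different random 80/20 splits
accuracies = []
for seed in range(3):
    idx = np.random.default_rng(seed).permutation(len(y))
    split = int(0.8 * len(y))
    train, test = idx[:split], idx[split:]
    model = fit_gaussian_nb(X[train], y[train])
    accuracies.append(accuracy(y[test], predict_gaussian_nb(model, X[test])))

print(accuracies)
```

Working with log probabilities rather than multiplying raw densities avoids numerical underflow when many features are combined. To apply the sketch to the admission data, X would be the feature columns and y a discretized version of 'Chance of Admit', since naive Bayes classification needs class labels rather than a continuous target.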

from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive

import pandas as pd
import numpy as np

# Specify the full path to the dataset

dataset_path = '/content/drive/MyDrive/Datasets_ml/Copy of Admission_Predict_Ver1.1.csv'
df = pd.read_csv(dataset_path, sep=",")

# Keep the serial numbers; they may be needed later.

serialNo = df["Serial No."].values

df.drop(["Serial No."], axis=1, inplace=True)

# Remove the trailing space from the target column name
df = df.rename(columns={'Chance of Admit ': 'Chance of Admit'})

import matplotlib.pyplot as plt #data visualization

import seaborn as sns #statistical data visualisation

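# Note: re-reading the CSV undoes the earlier drop and rename, so 'Serial No.' reappears in the output
df = pd.read_csv(dataset_path)
df.head()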

   Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0  1           337        118          4                  4.5  4.5  9.65  1         0.92
1  2           324        107          4                  4.0  4.5  8.87  1         0.76
2  3           316        104          3                  3.0  3.5  8.00  1         0.72
3  4           322        110          3                  3.5  2.5  8.67  1         0.80
4  5           314        103          2                  2.0  3.0  8.21  0         0.65


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Serial No. 500 non-null int64
1 GRE Score 500 non-null int64
2 TOEFL Score 500 non-null int64
3 University Rating 500 non-null int64
4 SOP 500 non-null float64
5 LOR 500 non-null float64
6 CGPA 500 non-null float64
7 Research 500 non-null int64
8 Chance of Admit 500 non-null float64
dtypes: float64(4), int64(5)
memory usage: 35.3 KB

df = df.rename(columns={'Chance of Admit ': 'Chance of Admit'})

df.describe()

       Serial No.  GRE Score   TOEFL Score  University Rating  SOP         LOR        CGPA        Research    Chance of Admit
count  500.000000  500.000000  500.000000   500.000000         500.000000  500.00000  500.000000  500.000000  500.00000
mean   250.500000  316.472000  107.192000   3.114000           3.374000    3.48400    8.576440    0.560000    0.72174
std    144.481833  11.295148   6.081868     1.143512           0.991004    0.92545    0.604813    0.496884    0.14114
min    1.000000    290.000000  92.000000    1.000000           1.000000    1.00000    6.800000    0.000000    0.34000
25%    125.750000  308.000000  103.000000   2.000000           2.500000    3.00000    8.127500    0.000000    0.63000
50%    250.500000  317.000000  107.000000   3.000000           3.500000    3.50000    8.560000    1.000000    0.72000
75%    375.250000  325.000000  112.000000   4.000000           4.000000    4.00000    9.040000    1.000000    0.82000
max    500.000000  340.000000  120.000000   5.000000           5.000000    5.00000    9.920000    1.000000    0.97000
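The summary at the top says the features are normalized, but the scaler used is not shown here. Assuming min-max scaling, x' = (x - min) / (max - min), the minimum and maximum rows of the table above are all that is needed; for example, a GRE score of 337 becomes (337 - 290) / (340 - 290) = 0.94. The sketch below hard-codes those min/max values and the first two rows from the df.head() output.

```python
import pandas as pd

# First two rows of three features, copied from the df.head() output
sample = pd.DataFrame(
    {"GRE Score": [337, 324], "TOEFL Score": [118, 107], "CGPA": [9.65, 8.87]}
)

# Per-column minimum and maximum, copied from the df.describe() output
col_min = pd.Series({"GRE Score": 290.0, "TOEFL Score": 92.0, "CGPA": 6.80})
col_max = pd.Series({"GRE Score": 340.0, "TOEFL Score": 120.0, "CGPA": 9.92})

# Min-max scaling maps each feature onto [0, 1]; subtraction aligns on column names
normalized = (sample - col_min) / (col_max - col_min)
print(normalized.round(3))
```

On the full frame the same idea is (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min()) for a chosen list of feature columns cols. Gaussian naive Bayes predictions are essentially unchanged by a positive linear rescaling of each feature, so the step matters more for readability and for other models than for this classifier.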

l = df.columns
print('The columns are: ',l)

The columns are: Index(['Serial No.', 'GRE Score', 'TOEFL Score', 'University Rating', 'SOP',
'LOR ', 'CGPA', 'Research', 'Chance of Admit'],
dtype='object')
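The printed index shows that 'LOR ' still carries a trailing space, because only 'Chance of Admit ' was renamed. A minimal sketch of cleaning every column name in one step with pandas' `str.strip`, using a one-row frame built from the first row of the df.head() output:

```python
import pandas as pd

# Raw CSV header names; note the trailing spaces in 'LOR ' and 'Chance of Admit '
df = pd.DataFrame(
    [[1, 337, 118, 4, 4.5, 4.5, 9.65, 1, 0.92]],
    columns=["Serial No.", "GRE Score", "TOEFL Score", "University Rating", "SOP",
             "LOR ", "CGPA", "Research", "Chance of Admit "],
)

# Strip leading and trailing whitespace from every column name at once
df.columns = df.columns.str.strip()
print(list(df.columns))
```

Stripping all names up front removes the need for per-column rename calls and avoids KeyError surprises such as df["LOR"] failing on the raw frame.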

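print(df.isnull().sum())
# Print the message only when no column actually contains missing values
if df.isnull().sum().sum() == 0:
    print('\n\nNo null values')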

Serial No. 0
GRE Score 0
TOEFL Score 0
University Rating 0
SOP 0
LOR 0
CGPA 0
Research 0
Chance of Admit 0
dtype: int64

No null values
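This dataset has no missing values, so nothing needs to be filled. For a CSV that does contain gaps, one common option is to replace each missing numeric entry with its column median. The sketch below assumes that strategy (the notebook does not specify one) and uses a small toy frame with one missing CGPA value.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing CGPA value (illustrative only)
df = pd.DataFrame({"GRE Score": [337, 324, 316], "CGPA": [9.65, np.nan, 8.00]})

missing = df.isnull().sum()
if missing.sum() == 0:
    print("No null values")
else:
    # Assumed strategy: fill each gap with the median of its column
    df = df.fillna(df.median(numeric_only=True))
    print("Filled", int(missing.sum()), "missing value(s)")
```

The median is less sensitive to outliers than the mean; dropping incomplete rows with df.dropna() is the simpler alternative when only a few rows are affected.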

df.describe().T  # transpose

                   count  mean       std         min     25%       50%     75%     max
Serial No.         500.0  250.50000  144.481833  1.00    125.7500  250.50  375.25  500.00
GRE Score          500.0  316.47200  11.295148   290.00  308.0000  317.00  325.00  340.00
TOEFL Score        500.0  107.19200  6.081868    92.00   103.0000  107.00  112.00  120.00
University Rating  500.0  3.11400    1.143512    1.00    2.0000    3.00    4.00    5.00
SOP                500.0  3.37400    0.991004    1.00    2.5000    3.50    4.00    5.00
LOR                500.0  3.48400    0.925450    1.00    3.0000    3.50    4.00    5.00
CGPA               500.0  8.57644    0.604813    6.80    8.1275    8.56    9.04    9.92
Research           500.0  0.56000    0.496884    0.00    0.0000    1.00    1.00    1.00
Chance of Admit    500.0  0.72174    0.141140    0.34    0.6300    0.72    0.82    0.97
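'Chance of Admit' is continuous (0.34 to 0.97), while a naive Bayes classifier predicts discrete classes, so the target has to be turned into labels first. The threshold used in the notebook is not shown in this excerpt; the sketch below assumes the median of 0.72 from the table above and applies it to the five values printed by df.head(), then computes the class priors P(y) that the classifier would estimate.

```python
import pandas as pd

# 'Chance of Admit' values from the first five rows shown by df.head()
chance = pd.Series([0.92, 0.76, 0.72, 0.80, 0.65], name="Chance of Admit")

# Assumed threshold: the median (0.72) reported by df.describe()
THRESHOLD = 0.72
labels = (chance >= THRESHOLD).astype(int)  # 1 = likely admit, 0 = unlikely

# Class priors P(y): the fraction of rows in each class
priors = labels.value_counts(normalize=True).sort_index()
prior_dict = {int(k): round(float(v), 3) for k, v in priors.items()}
print(labels.tolist())  # [1, 1, 1, 1, 0]
print(prior_dict)       # {0: 0.2, 1: 0.8}
```

A median split keeps the two classes roughly balanced on the full 500 rows; a higher cut-off such as 0.8 would instead single out strong applicants at the cost of a smaller positive class.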

df.describe()