
Machine Learning Lab (IT804) Jan-Jun 2021

Experiment No: 5
Objective: Write a program to implement the naïve Bayesian classifier for a sample training
data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test
data sets.
Explanation:
Bayes’ Theorem is stated as:

    P(h|D) = P(D|h) · P(h) / P(D)
Where,
P(h|D) is the probability of hypothesis h given the data D. This is called the posterior
probability.
P(D|h) is the probability of data D given that the hypothesis h was true.
P(h) is the probability of hypothesis h being true. This is called the prior probability of h.
P(D) is the probability of the data. This is called the prior probability of D.

After calculating the posterior probability for a number of different hypotheses h, we are
interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such
maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.

Using Bayes’ theorem to calculate the posterior probability of each candidate hypothesis,
hMAP is a MAP hypothesis provided:

    hMAP = argmax h∈H P(h|D) = argmax h∈H P(D|h) · P(h)

(ignoring P(D), since it is a constant)
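As a toy illustration (the numbers are made up, not taken from the data set below), selecting
hMAP reduces to picking the hypothesis with the largest unnormalized posterior P(D|h) · P(h):

# hypothetical priors and likelihoods for two candidate hypotheses
priors = {'h1': 0.6, 'h2': 0.4}          # P(h)
likelihoods = {'h1': 0.3, 'h2': 0.8}     # P(D|h)

# unnormalized posteriors; P(D) is constant across hypotheses and can be ignored
posteriors = {h: likelihoods[h] * priors[h] for h in priors}
print(posteriors, '->', max(posteriors, key=posteriors.get))
# {'h1': 0.18, 'h2': 0.32} -> h2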

Gaussian Naive Bayes


A Gaussian Naive Bayes algorithm is a special type of Naïve Bayes algorithm. It is specifically
used when the features have continuous values, and it is assumed that all the features follow a
Gaussian (i.e., normal) distribution.


Representation for Gaussian Naive Bayes

We calculate the probabilities of the input values for each class using their frequency. With
real-valued inputs, we can calculate the mean and standard deviation of the input values (x) for
each class to summarize the distribution.
This means that in addition to the probabilities for each class, we must also store the mean and
standard deviation of each input variable for each class.

Gaussian Naive Bayes Model from Data
The probability density function for the normal distribution is defined by two parameters (mean
μ and standard deviation σ), so building the model amounts to calculating the mean and standard
deviation of each input variable (x) for each class value:

    f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
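The hand-written calculateprobability() function in the program below implements exactly this
density. A quick way to sanity-check it, assuming SciPy is available, is scipy.stats.norm.pdf
(the numbers here are illustrative only):

from scipy.stats import norm

mu, sigma = 71.5, 10.2   # made-up mean and stdev of one attribute for one class
print(norm.pdf(73.0, loc=mu, scale=sigma))   # P(x = 73.0 | class) under the Gaussian assumption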

Sample Examples:
Examples | Pregnancies | Glucose | Blood Pressure | Skin Thickness | Insulin | BMI  | Diabetic Pedigree Function | Age | Outcome
1        | 6           | 148     | 72             | 35             | 0       | 33.6 | 0.627                      | 50  | 1
2        | 1           | 85      | 66             | 29             | 0       | 26.6 | 0.351                      | 31  | 0
3        | 8           | 183     | 64             | 0              | 0       | 23.3 | 0.672                      | 32  | 1
4        | 1           | 89      | 66             | 23             | 94      | 28.1 | 0.167                      | 21  | 0
5        | 0           | 137     | 40             | 35             | 168     | 43.1 | 2.288                      | 33  | 1
6        | 5           | 116     | 74             | 0              | 0       | 25.6 | 0.201                      | 30  | 0
7        | 3           | 78      | 50             | 32             | 88      | 31   | 0.248                      | 26  | 1
8        | 10          | 115     | 0              | 0              | 0       | 35.3 | 0.134                      | 29  | 0
9        | 2           | 197     | 70             | 45             | 543     | 30.5 | 0.158                      | 53  | 1
10       | 8           | 125     | 96             | 0              | 0       | 0    | 0.232                      | 54  | 1
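For example, for the Glucose attribute, the Outcome = 1 examples above are {148, 183, 137, 78,
197, 125}, giving mean = 868/6 ≈ 144.67 and sample standard deviation = √(9109.33/5) ≈ 42.68.
The Gaussian model stores one such (mean, stdev) pair per attribute per class.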

Program:
import csv
import random
import math

def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitdataset(dataset, splitratio):
    # 67% training size
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        # generate random indices into the dataset to pick elements for the training data
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def separatebyclass(dataset):
    # creates a dictionary of classes 1 and 0 where the values are
    # the instances belonging to each class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    # creates a list of (mean, stdev) tuples, one per attribute
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]  # excluding the class label column (+ve or -ve)
    return summaries

def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    # summaries is a dictionary of (mean, stdev) tuples for each class value
    summaries = {}
    for classvalue, instances in separated.items():
        summaries[classvalue] = summarize(instances)
    return summaries

def calculateprobability(x, mean, stdev):
    # Gaussian probability density function
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    # probabilities holds the probability of every class for the test vector
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            # take the mean and stdev of every attribute for class 0 and 1 separately
            mean, stdev = classsummaries[i]
            x = inputvector[i]
            probabilities[classvalue] *= calculateprobability(x, mean, stdev)
    return probabilities

def predict(summaries, inputvector):
    probabilities = calculateclassprobabilities(summaries, inputvector)
    # assign the class that has the highest probability
    bestLabel, bestProb = None, -1
    for classvalue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classvalue
    return bestLabel

def getpredictions(summaries, testset):
    predictions = []
    for i in range(len(testset)):
        result = predict(summaries, testset[i])
        predictions.append(result)
    return predictions

def getaccuracy(testset, predictions):
    correct = 0
    for i in range(len(testset)):
        if testset[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testset))) * 100.0

def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)
    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingset), len(testset)))
    # prepare model
    summaries = summarizebyclass(trainingset)
    # test model: find the predictions on the test data using the trained summaries
    predictions = getpredictions(summaries, testset)
    accuracy = getaccuracy(testset, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
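For a quick cross-check of the hand-written classifier, the same experiment can be reproduced
with scikit-learn's GaussianNB (a sketch; it assumes the same naivedata.csv layout, with the
class label in the last column):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

data = np.loadtxt('naivedata.csv', delimiter=',')   # same file as above
X, y = data[:, :-1], data[:, -1]
Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=0.67)
model = GaussianNB().fit(Xtr, ytr)
print('Accuracy of the classifier is : {0}%'.format(model.score(Xte, yte) * 100))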

Output:
Split 768 rows into train=514 and test=254 rows
Accuracy of the classifier is: 71.65354330708661%


Experiment No: 6
Objective: Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.
Explanation:
Naive Bayes algorithms for learning and classifying text
LEARN_NAIVE_BAYES_TEXT (Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible
target values. This function learns the probability terms P(wk|vj), describing the probability that
a randomly drawn word from a document in class vj will be the English word wk. It also learns
the class prior probabilities P(vj).
1. Collect all words, punctuation, and other tokens that occur in Examples
   • Vocabulary ← the set of all distinct words and other tokens occurring in any text
     document from Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms
   • For each target value vj in V do
     • docsj ← the subset of documents from Examples for which the target value is vj
     • P(vj) ← |docsj| / |Examples|
     • Textj ← a single document created by concatenating all members of docsj
     • n ← total number of distinct word positions in Textj
     • For each word wk in Vocabulary
       • nk ← number of times word wk occurs in Textj
       • P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)

CLASSIFY_NAIVE_BAYES_TEXT (Doc)
Return the estimated target value for the document Doc. ai denotes the word found in the ith
position within Doc.
   • positions ← all word positions in Doc that contain tokens found in Vocabulary
   • Return vNB, where

         vNB = argmax vj∈V  P(vj) · ∏ i∈positions P(ai|vj)
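As a concrete illustration, here is a minimal pure-Python sketch of these two procedures (the
function and variable names are hypothetical; the lab program below uses scikit-learn instead):

import math
from collections import Counter

def learn_naive_bayes_text(examples, V):
    # examples: list of (document, target) pairs; V: set of all target values
    vocabulary = {w for doc, _ in examples for w in doc.split()}
    priors, cond = {}, {}
    for vj in V:
        docs_j = [doc for doc, v in examples if v == vj]
        priors[vj] = len(docs_j) / len(examples)   # P(vj)
        text_j = ' '.join(docs_j).split()          # concatenate all docs of class vj
        n, counts = len(text_j), Counter(text_j)
        # Laplace-smoothed estimate P(wk|vj) = (nk + 1) / (n + |Vocabulary|)
        cond[vj] = {wk: (counts[wk] + 1) / (n + len(vocabulary)) for wk in vocabulary}
    return vocabulary, priors, cond

def classify_naive_bayes_text(doc, vocabulary, priors, cond):
    positions = [w for w in doc.split() if w in vocabulary]
    # vNB = argmax over vj of log P(vj) + sum over positions of log P(ai|vj)
    return max(priors, key=lambda vj: math.log(priors[vj])
               + sum(math.log(cond[vj][w]) for w in positions))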


Program:
import pandas as pd

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)

msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
Y = msg.labelnum
print(X)
print(Y)

# splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, Y)
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)

# the output of CountVectorizer is a sparse document-term matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print('\n The words or Tokens in the text documents \n')
print(count_vect.get_feature_names())  # get_feature_names_out() in newer scikit-learn

df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())

# training the multinomial Naive Bayes (NB) classifier on the training data
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# printing accuracy, confusion matrix, precision and recall
from sklearn import metrics
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))

Data set:
Text Documents Label
1 I love this sandwich pos
2 This is an amazing place pos
3 I feel very good about these beers pos
4 This is my best work pos
5 What an awesome view pos
6 I do not like this restaurant neg
7 I am tired of this stuff neg
8 I can't deal with this neg
9 He is my sworn enemy neg
10 My boss is horrible neg
11 This is an awesome place pos
12 I do not like the taste of this juice neg
13 I love to dance pos
14 I am sick and tired of this place neg
15 What a great holiday pos
16 That is a bad locality to stay neg
17 We will have good fun tomorrow pos
18 I went to my enemy's house today neg

Output:


The dimensions of the dataset (18, 2)


0 I love this sandwich
1 This is an amazing place
2 I feel very good about these beers
3 This is my best work
4 What an awesome view
5 I do not like this restaurant
6 I am tired of this stuff
7 I can't deal with this
8 He is my sworn enemy
9 My boss is horrible
10 This is an awesome place
11 I do not like the taste of this juice
12 I love to dance
13 I am sick and tired of this place
14 What a great holiday
15 That is a bad locality to stay
16 We will have good fun tomorrow
17 I went to my enemy's house today
Name: message, dtype: object
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
Name: labelnum, dtype: int64
The total number of Training Data: (13,)
The total number of Test Data: (5,)

The words or Tokens in the text documents


['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'can', 'deal', 'do', 'enemy', 'feel', 'fun',
'good', 'great', 'have', 'he', 'holiday', 'house', 'is', 'like', 'love', 'my', 'not', 'of', 'place', 'restaurant',
'sandwich', 'sick', 'sworn', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very', 'view', 'we', 'went',
'what', 'will', 'with', 'work']

Accuracy of the classifier is 0.8


Confusion matrix
[[2 1]
[0 2]]
The value of Precision 0.6666666666666666
The value of Recall 1.0
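Reading these numbers off the confusion matrix (in scikit-learn, rows are the true labels and
columns the predictions, so [[2 1], [0 2]] means TN = 2, FP = 1, FN = 0, TP = 2): precision =
TP/(TP + FP) = 2/3 ≈ 0.67, recall = TP/(TP + FN) = 2/2 = 1.0, and accuracy = (TN + TP)/5 =
4/5 = 0.8, matching the values printed above.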
