MLlab Manual LIET
Lab Manual
Machine Learning
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint ai in h
      If the constraint ai is satisfied by x
         Then do nothing
      Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Examples:
Program:
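The manual's original listing is not reproduced on this page; the following is a minimal sketch of FIND-S, assuming the training data is stored in 'enjoysport.csv' with no header row and the class label ('yes'/'no') in the last column:

import csv

# read the training data from the .CSV file (assumed file name, no header row)
with open('enjoysport.csv', 'r') as f:
    data = [row for row in csv.reader(f)]

num_attributes = len(data[0]) - 1
print("The total number of training instances are :", len(data))

# Step 1: initialise h to the most specific hypothesis
hypothesis = ['0'] * num_attributes

# Step 2: generalise h over every positive training instance
for i, instance in enumerate(data):
    if instance[-1].lower() == 'yes':
        for j in range(num_attributes):
            if hypothesis[j] == '0':
                hypothesis[j] = instance[j]
            elif hypothesis[j] != instance[j]:
                hypothesis[j] = '?'
        print("The hypothesis for the training instance", i + 1, "is :", hypothesis)

# Step 3: output the final hypothesis
print("The maximally specific hypothesis for the training instances is :", hypothesis)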
Data Set:
Output:
The total number of training instances are : 4
The hypothesis for the training instance 1 is : ['sunny', 'warm', 'normal', 'strong',
'warm', 'same']
The hypothesis for the training instance 2 is : ['sunny', 'warm', '?', 'strong', 'warm',
'same']
The hypothesis for the training instance 3 is : ['sunny', 'warm', '?', 'strong', 'warm',
'same']
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
CANDIDATE-ELIMINATION Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
• Remove s from S
• Add to S all minimal generalizations h of s such that
• h is consistent with d, and some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
• Remove g from G
• Add to G all minimal specializations h of g such that
• h is consistent with d, and some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in G
Training Examples:
Program:
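The manual's original listing is not reproduced on this page; the following is a simplified sketch of the version-space boundary update, assuming the same 'enjoysport.csv' file as in the previous experiment and that the first training example is positive:

import csv

with open('enjoysport.csv', 'r') as f:
    data = [row for row in csv.reader(f)]

concepts = [row[:-1] for row in data]
target = [row[-1] for row in data]

# initialise S to the first (positive) example and G to the most general hypotheses
specific_h = concepts[0][:]
general_h = [['?' for _ in specific_h] for _ in specific_h]

for i, instance in enumerate(concepts):
    if target[i].lower() == 'yes':
        # positive example: generalise S and relax the matching constraints in G
        for j in range(len(specific_h)):
            if instance[j] != specific_h[j]:
                specific_h[j] = '?'
                general_h[j][j] = '?'
    else:
        # negative example: specialise G against the current S
        for j in range(len(specific_h)):
            if instance[j] != specific_h[j]:
                general_h[j][j] = specific_h[j]
            else:
                general_h[j][j] = '?'

# drop the rows of G that remained completely general
general_h = [h for h in general_h if h != ['?'] * len(specific_h)]
print("Final Specific_h:", specific_h)
print("Final General_h:", general_h)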
Data Set:
Output:
Final Specific_h:
['sunny' 'warm' '?' 'strong' '?' '?']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'],
['?', 'warm', '?', '?', '?', '?']]
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
ID3 Algorithm
Examples are the training examples. Target_attribute is the attribute whose value is to be
predicted by the tree. Attributes is a list of other attributes that may be tested by the
learned decision tree. Returns a decision tree that correctly classifies the given Examples.
Create a Root node for the tree
If all Examples are positive, Return the single-node tree Root with label = +
If all Examples are negative, Return the single-node tree Root with label = -
If Attributes is empty, Return the single-node tree Root with label = most common value of Target_attribute in Examples
Otherwise Begin
A ← the attribute from Attributes that best* classifies Examples
The decision attribute for Root ← A
For each possible value, vi, of A,
Add a new tree branch below Root, corresponding to the test A = vi
Let Examples_vi be the subset of Examples that have value vi for A
If Examples_vi is empty
Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
Else below this new branch add the subtree
ID3(Examples_vi, Target_attribute, Attributes – {A})
End
Return Root
ENTROPY:
Entropy measures the impurity of a collection of examples.
INFORMATION GAIN:
Information gain measures the expected reduction in entropy caused by partitioning the examples according to a given attribute.
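For a collection S containing positive and negative examples of the target concept (in proportions p+ and p−) and an attribute A, the standard definitions used by ID3 are:

Entropy(S) = − p+ log2(p+) − p− log2(p−)

Gain(S, A) = Entropy(S) − Σ (|Sv| / |S|) · Entropy(Sv), summed over all values v of A, where Sv is the subset of S for which A has value v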
Training Dataset:
Test Dataset:
Program:
import math
import csv
def load_csv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic
def entropy(S):
    attr = list(set(S))
    if len(attr) == 1:        # pure collection: entropy is zero
        return 0
    counts = [0, 0]
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)
    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    # information gain = entropy of the whole set minus the weighted
    # entropy of each subset produced by splitting on this column
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy
def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if len(set(lastcol)) == 1:       # all examples have the same label: leaf node
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))  # attribute with the highest information gain
    node = Node(features[split])
    fea = features[:split] + features[split+1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node
def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main program'''
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print_tree(node1, 0)
Output:
Outlook
 rain
  Wind
   strong
    no
   weak
    yes
 overcast
  yes
 sunny
  Humidity
   normal
    yes
   high
    no
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
BACKPROPAGATION Algorithm
Create a feed-forward network with ni inputs, nhidden hidden units, and nout output
units.
Initialize all network weights to small random numbers
Until the termination condition is met, Do
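For reference, the standard steps carried out for each training example (x, t) inside this loop, for a network of sigmoid units with learning rate η, are:

1. Propagate the input forward through the network and compute the output ou of every unit u.
2. For each output unit k, compute its error term: δk = ok (1 − ok) (tk − ok)
3. For each hidden unit h, compute its error term: δh = oh (1 − oh) Σ wkh δk, summed over the output units k
4. Update each network weight: wji ← wji + η δj xji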
Training Examples:
Example    Sleep    Study    Expected % in Exams
1          2        9        92
2          1        5        86
3          3        6        89
Program:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)   # normalise each column of X by its maximum
y = y/100                  # scale the expected output to the range 0-1

#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Derivative of sigmoid (used while backpropagating the error)
def derivatives_sigmoid(x):
    return x * (1 - x)

#Variable initialization
epoch = 5000               #Setting training iterations
lr = 0.1                   #Setting learning rate
inputlayer_neurons = 2     #number of features in the data set
hiddenlayer_neurons = 3    #number of hidden layer neurons
output_neurons = 1         #number of neurons in the output layer

#weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    #Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    #Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89726759]
[0.87196896]
[0.9000671]]
5. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data
sets.
Bayes theorem provides a way to calculate the posterior probability of a hypothesis h from its prior probability and the observed data D:
P(h|D) = P(D|h) P(h) / P(D)
Where,
P(h|D) is the probability of hypothesis h given the data D. This is called the posterior probability.
P(D|h) is the probability of data D given that the hypothesis h was true.
P(h) is the probability of hypothesis h being true. This is called the prior probability of h.
P(D) is the probability of the data. This is called the prior probability of D.
After calculating the posterior probability for a number of different hypotheses h, the learner is interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis. Using Bayes theorem to calculate the posterior probability of each candidate hypothesis, hMAP is a MAP hypothesis provided
hMAP = argmax over h ∈ H of P(h|D) = argmax over h ∈ H of P(D|h) P(h)
A Gaussian Naive Bayes algorithm is a special type of Naïve Bayes algorithm. It is specifically used when the features have continuous values. It is also assumed that all the features follow a Gaussian distribution, i.e., a normal distribution.
This means that in addition to the probabilities for each class, we must also store the mean and
standard deviations for each input variable for each class.
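As an illustration, the class-conditional likelihood of a continuous attribute value x can then be computed from the stored mean and standard deviation using the Gaussian density. A minimal sketch (the function name is chosen here for illustration):

import math

def calculateprobability(x, mean, stdev):
    # Gaussian (normal) probability density of x for a class with the given mean and stdev
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

For example, calculateprobability(71.5, 73, 6.2) evaluates to about 0.0625.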
Examples:
The data set used in this program is the Pima Indians Diabetes problem.
This data set comprises 768 observations of medical details for Pima Indian patients.
The records describe instantaneous measurements taken from the patient, such as their age,
the number of times pregnant, and blood workup. All patients are women aged 21 or older.
All attributes are numeric, and their units vary from attribute to attribute.
The attributes are Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI,
DiabeticPedigreeFunction, Age, Outcome
Each record has a class value that indicates whether the patient suffered an onset of
diabetes within 5 years of when the measurements were taken (1) or not (0)
Sample Examples:
Examples Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabeticPedigreeFunction Age Outcome
1 6 148 72 35 0 33.6 0.627 50 1
2 1 85 66 29 0 26.6 0.351 31 0
3 8 183 64 0 0 23.3 0.672 32 1
4 1 89 66 23 94 28.1 0.167 21 0
5 0 137 40 35 168 43.1 2.288 33 1
6 5 116 74 0 0 25.6 0.201 30 0
7 3 78 50 32 88 31 0.248 26 1
8 10 115 0 0 0 35.3 0.134 29 0
9 2 197 70 45 543 30.5 0.158 53 1
10 8 125 96 0 0 0 0.232 54 1
Program:
import csv
import random
import math

def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        #converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitdataset(dataset, splitratio):
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    #generate indices for the dataset list randomly to pick elements for training data
    while len(trainset) < trainsize:
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def separatebyclass(dataset):
    separated = {}  #dictionary of classes 1 and 0
    #creates a dictionary of classes 1 and 0 where the values are
    #the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)
def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    #print(separated)
    summaries = {}
    for classvalue, instances in separated.items():
        #for key, value in dic.items()
        #summaries is a dic of tuples (mean, std) for each class value
        summaries[classvalue] = summarize(instances)  #summarize is used to calculate mean and std
    return summaries
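The summarize helper called above is not shown in this excerpt; a minimal sketch consistent with the mean and stdev functions defined earlier would be:

def summarize(instances):
    # one (mean, stdev) pair per attribute, computed column-wise
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*instances)]
    del summaries[-1]   # drop the statistics of the class (last) column
    return summaries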
def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)
Output:
6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.
LEARN_NAIVE_BAYES_TEXT (Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible
target values. This function learns the probability terms P(wk | vj), describing the probability
that a randomly drawn word from a document in class vj will be the English word wk. It also
learns the class prior probabilities P(vj).
1. collect all words, punctuation, and other tokens that occur in Examples
Vocabulary ← the set of all distinct words and other tokens occurring in any text
document from Examples
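The probability terms learned by this procedure are estimated in the standard way (the same estimates are used in the worked example later in this experiment):

P(vj) = |docsj| / |Examples|, where docsj is the subset of documents from Examples whose target value is vj

P(wk | vj) = (nk + 1) / (n + |Vocabulary|), where n is the total number of word positions in all training documents of class vj and nk is the number of times the word wk occurs in those positions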
CLASSIFY_NAIVE_BAYES_TEXT (Doc)
Return the estimated target value for the document Doc. ai denotes the word found in the ith
position within Doc.
positions ← all word positions in Doc that contain tokens found in Vocabulary
Return vNB, where
vNB = argmax over vj ∈ V of P(vj) · Π P(ai | vj), the product taken over all i in positions
Data set:
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])  # assumed file of message,label pairs
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)

count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
Output:
Basic knowledge
Confusion Matrix
True positives: data points labelled as positive that are actually positive
False positives: data points labelled as positive that are actually negative
True negatives: data points labelled as negative that are actually negative
False negatives: data points labelled as negative that are actually positive
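From these four counts, the metrics requested in this experiment are computed as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)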
Example:
Unique word
< I, loved, the, movie, hated, a, great, good, poor, acting>
Doc I loved the movie hated a great good poor acting Class
1 1 1 1 1 +
2 1 1 1 1 -
3 2 1 1 1 +
4 1 1 -
5 1 1 1 1 +
Doc I loved the movie hated a great good poor acting Class
1 1 1 1 1 +
3 2 1 1 1 +
5 1 1 1 1 +
P(+) = 3/5 = 0.6
P(I | +) = (1+1)/(14+10) = 0.0833
P(loved | +) = (1+1)/(14+10) = 0.0833
P(the | +) = (1+1)/(14+10) = 0.0833
P(movie | +) = (4+1)/(14+10) = 0.2083
P(hated | +) = (0+1)/(14+10) = 0.0416
P(a | +) = (1+1)/(14+10) = 0.0833
P(great | +) = (2+1)/(14+10) = 0.125
P(good | +) = (2+1)/(14+10) = 0.125
P(poor | +) = (0+1)/(14+10) = 0.0416
P(acting | +) = (1+1)/(14+10) = 0.0833
Doc I loved the movie hated a great good poor acting Class
2 1 1 1 1 -
4 1 1 -
P(−) = 2/5 = 0.4
P(I | −) = (1+1)/(6+10) = 0.125
P(loved | −) = (0+1)/(6+10) = 0.0625
P(the | −) = (1+1)/(6+10) = 0.125
P(movie | −) = (1+1)/(6+10) = 0.125
P(hated | −) = (1+1)/(6+10) = 0.125
P(a | −) = (0+1)/(6+10) = 0.0625
P(great | −) = (0+1)/(6+10) = 0.0625
P(good | −) = (0+1)/(6+10) = 0.0625
P(poor | −) = (1+1)/(6+10) = 0.125
P(acting | −) = (1+1)/(6+10) = 0.125
Let’s classify the new document
7. Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data
Set. You can use Java/Python ML library classes/API
Theory
A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional
dependency, and each node corresponds to a unique random variable.
A Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional probability distributions.
The directed acyclic graph is a set of random variables represented by nodes.
The conditional probability distribution of a node (random variable) is defined for every
possible outcome of the preceding causal node(s).
For illustration, consider the following example. Suppose we attempt to turn on our computer,
but the computer does not start (observation/evidence). We would like to know which of the
possible causes of computer failure is more likely. In this simplified illustration, we assume
only two possible causes of this misfortune: electricity failure and computer malfunction.
The corresponding directed acyclic graph is depicted in below figure.
Fig: Directed acyclic graph representing two independent possible causes of a computer failure.
The goal is to calculate the posterior conditional probability distribution of each of the possible
unobserved causes given the observed evidence, i.e. P [Cause | Evidence].
Data Set:
Title: Heart Disease Databases
The Cleveland database contains 76 attributes, but all published experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one that has been used
by ML researchers to this date. The "Heartdisease" field refers to the presence of heart disease
in the patient. It is integer valued from 0 (no presence) to 4.
Database: 0 1 2 3 4 Total
Cleveland: 164 55 36 35 13 303
Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
Value 1: typical angina
Value 2: atypical angina
Value 3: non-anginal pain
Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholestoral in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
Value 0: normal
Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation
or depression of > 0.05 mV)
Value 2: showing probable or definite left ventricular hypertrophy by Estes'
criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
Value 1: upsloping
Value 2: flat
Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
13. Heartdisease: It is integer valued from 0 (no presence) to 4.
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Heartdisease
63 1 1 145 233 1 2 150 0 2.3 3 0 6 0
67 1 4 160 286 0 2 108 1 1.5 2 3 3 2
67 1 4 120 229 0 2 129 1 2.6 2 2 7 1
41 0 2 130 204 0 2 172 0 1.4 1 0 3 0
62 0 4 140 268 0 2 160 0 3.6 3 2 3 3
60 1 4 130 206 0 2 132 1 2.4 2 2 7 4
Program:
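The manual's original listing is not reproduced on this page; the following is a minimal sketch using the pgmpy library. The file name 'heart.csv', the choice of parent attributes, and the query evidence are illustrative assumptions, not the manual's exact program:

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# read the Cleveland heart disease data (assumed file name, with the 14 columns listed above)
heartDisease = pd.read_csv('heart.csv')

# assumed structure: a few attributes modelled as direct parents of 'Heartdisease'
model = BayesianNetwork([('age', 'Heartdisease'),
                         ('sex', 'Heartdisease'),
                         ('cp', 'Heartdisease'),
                         ('exang', 'Heartdisease')])

# learn the conditional probability tables from the data
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# diagnosis query: probability of heart disease given evidence, e.g. chest pain type 2
infer = VariableElimination(model)
q = infer.query(variables=['Heartdisease'], evidence={'cp': 2})
print(q)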
Output:
9. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.
Training algorithm:
For each training example (x, f (x)), add the example to the list training examples
Classification algorithm:
Given a query instance xq to be classified,
Let x1 . . .xk denote the k instances from training examples that are nearest to xq
Return
f̂(xq) ← argmax over v ∈ V of Σ (from i = 1 to k) δ(v, f(xi)), where δ(a, b) = 1 if a = b and 0 otherwise
i.e. the predicted class is the most common value of f among the k nearest training examples (if f is real-valued, the mean of the k values is returned instead).
Data Set:
Iris Plants Dataset: Dataset contains 150 instances (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the Class
Program:
""" Iris Plants Dataset, dataset contains 150 (50 in each of three
classes)Number of Attributes: 4 numeric, predictive attributes and
the Class
"""
iris=datasets.load_iris()
""" The x variable contains the first four columns of the dataset
(i.e. attributes) while y contains the labels.
"""
x = iris.datay =
iris.target
""" Splits the dataset into 70% train data and 30% test data. This
means that out of total 150 records, the training set will contain
105 records and the test set contains 45 of those records
"""
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3)
Output:
Confusion Matrix
[[20 0 0]
[ 0 10 0]
[ 0 1 14]]
Accuracy Metrics
Basic knowledge
Confusion Matrix
True positives: data points labelled as positive that are actually positive
False positives: data points labelled as positive that are actually negative
True negatives: data points labelled as negative that are actually negative
False negatives: data points labelled as negative that are actually positive
F1-Score: the harmonic mean of precision and recall, F1 = 2 × (Precision × Recall) / (Precision + Recall)
Example:
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
Regression:
Regression is a technique from statistics that is used to predict values of a desired
target quantity when the target quantity is continuous.
In regression, we seek to identify (or estimate) a continuous variable y associated with
a given input vector x.
y is called the dependent variable.
x is called the independent variable.
Loess/Lowess Regression:
Loess regression is a nonparametric technique that uses local weighted regression to fit a
smooth curve through points in a scatter plot.
Lowess Algorithm:
Locally weighted regression is a very powerful nonparametric model used in statistical
learning.
Given a dataset X, y, we attempt to find a model parameter β(x) that minimizes
residual sum of weighted squared errors.
The weights are given by a kernel function (k or w) which can be chosen arbitrarily
Algorithm
1. Read the given data sample to X and the curve (linear or non-linear) to Y
2. Set the value for the smoothening parameter or free parameter, say τ
3. Set the bias / point of interest x0, which is a subset of X
4. Determine the weight matrix using the Gaussian kernel:
   w(x, x0) = exp(−(x − x0)² / (2τ²))
5. Determine the value of the model parameter β using:
   β(x0) = (XᵀWX)⁻¹ XᵀWy
6. Prediction = x0 * β
Program
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # add bias term
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    # fit model: solve the weighted normal equations
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y
    # predict value
    return x0 @ beta  # @ Matrix Multiplication or Dot Product for prediction

def radial_kernel(x0, X, tau):
    # Weight or Radial Kernel Bias Function
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set (10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])
# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])

def plot_lwr(tau):
    # prediction through regression at evenly spaced points of the domain
    domain = np.linspace(-3, 3, num=300)
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(title='tau=%g' % tau)
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))
Output
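The listing below calls a kernel() helper and uses np1 as an alias for NumPy; those definitions are not reproduced here, so the following is a minimal sketch of them (using numpy.matrix to match the .I inverse and * matrix-product syntax that follows):

import numpy as np1
import pandas as pd
import matplotlib.pyplot as plt

def kernel(point, xmat, k):
    # Gaussian weights: one weight per training point, placed on a diagonal matrix
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights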
def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    # weighted normal equations: beta = (X'WX)^-1 X'Wy
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred
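The bill/tip data and the sorted prediction used by the plotting code that follows are likewise not reproduced here; a minimal sketch, assuming the usual restaurant tips data in 'tips.csv' with 'total_bill' and 'tip' columns and a bandwidth of k = 0.5:

# load data points (assumed file and column names)
data = pd.read_csv('tips.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# build the design matrix [1, bill] as numpy matrices
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))

# predict a tip for every bill value and sort by bill for plotting
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]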
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()