ML Manual
EX NO: 1 IMPLEMENTATION OF CANDIDATE ELIMINATION ALGORITHM
Date:
AIM:
To implement Candidate Elimination Algorithm using python script.
ALGORITHM:
Step 1: Initialize the version space.
➢ Initialize the most general hypothesis (h_G) to the maximally general hypothesis (all
attributes set to '?').
➢ Initialize the most specific hypothesis (h_S) to the maximally specific hypothesis (all
attributes set to specific values or 'null' if not possible).
Step 2: Iterate through the training examples.
➢ For each positive example (see the short illustration after Step 5):
➢ Remove from h_G any hypothesis that is inconsistent with the example.
➢ Generalize h_S minimally so that it covers the example, i.e., replace every attribute of h_S that disagrees with the example by '?'.
➢ For each negative example:
➢ Remove from h_S any hypothesis that is inconsistent with the example.
➢ Specialize h_G minimally so that it excludes the example while remaining more general than h_S.
Step 3: Refine the version space.
➢ Remove from h_G any hypothesis that is less general than another hypothesis in h_G, and remove from h_S any hypothesis that is more general than another hypothesis in h_S.
Step 4: Repeat Steps 2 and 3 until convergence.
Keep iterating through the training examples and refining the version space until all training examples have been processed. The final version space is bounded by h_S and h_G; it may converge to a single hypothesis that correctly classifies all the training examples.
Step 5: Output the final hypothesis.
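For instance, a single positive training example generalises the specific boundary attribute by attribute. The snippet below is only an illustrative sketch; the attribute values are hypothetical and not taken from the data set used in the program:

specific_h = ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']   # current specific hypothesis
example    = ['Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same']   # a positive example
for x in range(len(specific_h)):
    if example[x] != specific_h[x]:
        specific_h[x] = '?'          # generalise only where the attributes disagree
print(specific_h)                    # ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']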
PROGRAM:
import numpy as np
import pandas as pd
data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:,0:-1])
target = np.array(data.iloc[:,-1])
def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            # positive example: generalise specific_h wherever it disagrees with the example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            # negative example: specialise general_h wherever the example disagrees with specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i + 1)
        print("Specific_h ", i + 1, "\n ")
        print(specific_h)
        print("general_h ", i + 1, "\n ")
        print(general_h)
    # discard the rows of general_h that remained fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
OUTPUT:
initialization of specific_h and general_h
['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']
general_h 8
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['?' '?' '?' 'Strong' '?' '?']
Final General_h:
[['?', '?', '?', 'Strong', '?', '?']]
RESULT:
Thus the Candidate Elimination algorithm has been implemented successfully.
Ex.No: 2 IMPLEMENTATION OF DECISION TREE BASED ID3 ALGORITHM
Date:
AIM:
To implement Decision Tree Based ID3 Algorithm using python script.
ALGORITHM:
Step 1: Start the program
Step 2: Load the dataset and organize it into a table, with rows representing instances and
columns representing features. The last column should contain the class labels.
Step 3: Define a function to calculate the entropy of the dataset. Entropy measures the
uncertainty in the dataset based on class distribution.
Step 4: For each feature, calculate the information gain. Information gain measures how much
a feature contributes to reducing the uncertainty in the dataset (a small worked entropy example follows this list).
Step 5: Select the feature with the highest information gain as the best feature to split the
dataset.
Step 6: Divide the dataset into subsets based on the values of the best feature found in Step 5.
Step 7: Repeat Steps 3 to 6 recursively on each subset until every subset is pure or no features remain.
Step 8: Build the decision tree by assigning the best feature as the splitting criterion at
each internal node and the majority class as the class label for each leaf node.
Step 9: Use the created decision tree to classify new instances by traversing the tree from
the root to the appropriate leaf node based on their feature values.
Step 10: Evaluate the Model
Step 11: Stop the program
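As a small worked illustration of Steps 3 and 4 (the class column below is hypothetical), the entropy of a 3-versus-2 class split can be computed directly from the formula entropy = -sum(p * log2(p)):

import numpy as np

labels = np.array(['Yes', 'Yes', 'No', 'Yes', 'No'])   # hypothetical class column
_, counts = np.unique(labels, return_counts=True)
probs = counts / counts.sum()                          # class probabilities 0.6 and 0.4
entropy = -np.sum(probs * np.log2(probs))
print(round(entropy, 3))                               # 0.971 for this 3/5 vs 2/5 split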
PROGRAM:
import pandas as pd
import numpy as np
dataset=pd.read_csv('playtennis.csv',names=['outlook','temperature','humidity','wind','class',])
def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    # entropy = -sum(p * log2(p)) over the class distribution
    entropy = np.sum([(-counts[i]/np.sum(counts)) * np.log2(counts[i]/np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    # weighted average entropy of the subsets produced by the split
    Weighted_Entropy = np.sum([(counts[i]/np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # all remaining instances share one class: return that class
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # no data left: return the majority class of the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # no features left: return the parent node's majority class
    elif len(features) == 0:
        return parent_node_class
    else:
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # information gain of every remaining feature
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [i for i in features if i != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])
print(' \nDisplay Tree\n',tree)
OUTPUT:
Display Tree
{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'humidity': {'High': 'No', 'Normal': 'Yes'}}}}
RESULT:
Thus, the Decision Tree based ID3 algorithm has been implemented successfully.
EX NO.3 IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK USING
BACK PROPAGATION ALGORITHM
Date:
AIM:
To implement Artificial Neural Network using back Propagation Algorithm using
python script.
ALGORITHM:
Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using real weights W; the weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer.
Step 4: Calculate the error in the outputs
Step 5: From the output layer, go back to the hidden layer and adjust the weights to reduce the
error (see the note on the sigmoid derivative after this list).
Step 6: Repeat the process until the desired output is achieved.
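The weight adjustment in Step 5 uses the derivative of the activation function. For the sigmoid this derivative can be written in terms of the activation itself, sigma'(z) = sigma(z) * (1 - sigma(z)), which is why the program below applies derivatives_sigmoid to layer outputs rather than to raw inputs. A short numerical check (the test value 0.5 is arbitrary):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = 0.5                                                      # arbitrary test input
a = sigmoid(z)
analytic = a * (1 - a)                                       # derivative written with the activation
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6     # finite-difference check
print(round(analytic, 6), round(numeric, 6))                 # both approximately 0.235004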
PROGRAM:
import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)  # normalise X column-wise by its maximum
y = y/100                 # scale target marks to the range 0-1

#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Derivative of Sigmoid Function (x is already a sigmoid activation)
def derivatives_sigmoid(x):
    return x * (1 - x)

#Variable initialization
epoch = 7000              #Setting training iterations
lr = 0.1                  #Setting learning rate
inputlayer_neurons = 2    #number of features in data set
hiddenlayer_neurons = 3   #number of hidden layer neurons
output_neurons = 1        #number of neurons at output layer

#weight and bias initialization (uniform random values of dimension x*y)
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    #Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    #Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)   #how much hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr        #dot product of next-layer error and current-layer output
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" ,output)
OUTPUT:
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual Output:
[[ 0.92]
[ 0.86]
[ 0.89]]
Predicted Output:
[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]
RESULT:
Thus, the implementation of the backpropagation algorithm has been done successfully.
EX.NO 4: IMPLEMENTATION OF NAIVE BAYESIAN CLASSIFIER
Date:
AIM:
To implement Naïve Bayesian Classifier using python script.
ALGORITHM:
Step 1: Data Pre-processing step
Step 2: Fitting Naive Bayes to the Training set
Step 3: Predicting the test result
Step 4: Test the accuracy of the result (creation of the confusion matrix; illustrated after this list).
Step 5: Visualizing the test set result.
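Step 4 produces a confusion matrix, and precision and recall follow directly from its entries. The sketch below uses the layout produced by scikit-learn (rows are actual classes, columns are predicted classes); the numbers are the ones shown in the sample output further down:

import numpy as np

cm = np.array([[3, 1],       # confusion matrix from the sample output: rows = actual, columns = predicted
               [0, 1]])
tn, fp, fn, tp = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
precision = tp / (tp + fp)   # 1 / (1 + 1) = 0.5
recall = tp / (tp + fn)      # 1 / (1 + 0) = 1.0
print(precision, recall)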
PROGRAM:
import pandas as pd
msg=pd.read_csv('naivetext1.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))
OUTPUT:
The dimensions of the dataset (18, 2)
0 I love this sandwich
1 This is an amazing place
2 I feel very good about these beers
3 This is my best work
4 What an awesome view
5 I do not like this restaurant
6 I am tired of this stuff
7 I can't deal with this
8 He is my sworn enemy
9 My boss is horrible
10 This is an awesome place
11 I do not like the taste of this juice
12 I love to dance
13 I am sick and tired of this place
14 What a great holiday
15 That is a bad locality to stay
16 We will have good fun tomorrow
17 I went to my enemy's house today
Name: message, dtype: object
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
Name: labelnum, dtype: int64
(5,)
(13,)
(5,)
(13,)
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
[[3 1]
[0 1]]
Recall and Precision
1.0
0.5
RESULT:
Thus, the implementation of Naive Bayesian Classifier algorithm has been done successfully.
EX NO 5: IMPLEMENTATION OF NAIVE BAYESIAN CLASSIFIER MODEL
TO CLASSIFY A SET OF DOCUMENTS
Date:
AIM:
To implement the Naïve Bayesian Classifier Model to Classify the document set using python.
ALGORITHM:
Step 1: Input the total Number of Documents from the user.
Step 2: Input the text and class of Each document and split it into a List.
Step 3: Create a 2D array and append each document list into an array
Step 4: Using a Set data structure, store all the keywords in a list.
Step 5: Input the text to be classified by the user.
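The classifier below summarises every attribute by its per-class mean and standard deviation and scores a test value with the normal density P(x | class) = (1 / (sqrt(2*pi)*sigma)) * exp(-(x - mu)^2 / (2*sigma^2)), implemented in the calculateProbability function. A tiny check with made-up numbers:

import math

def gaussian(x, mean, stdev):
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

# hypothetical attribute summarised as mean=5.0, stdev=1.0; test value x=6.0
print(round(gaussian(6.0, 5.0, 1.0), 4))   # about 0.2420, the normal density one sigma from the mean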
PROGRAM:
import csv
import random
import math
def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        #converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    #67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        #generate indices for the dataset list randomly to pick elements for the training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    #creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers)/float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x-avg, 2) for x in numbers])/float(len(numbers)-1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        #summaries is a dict of (mean, std) tuples for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x-mean, 2)/(2*math.pow(stdev, 2))))
    return (1 / (math.sqrt(2*math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        #class and attribute information as mean and sd
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]   #mean and sd of every attribute for class 0 and 1 separately
            x = inputVector[i]                #test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)   #use normal distribution
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        #assigns the class which has the highest probability
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
OUTPUT:
Confusion matrix is as follows
[[17 0 0]
[ 0 17 0]
[ 0 0 11]]
Accuracy metrics
precision recall f1-score support
0 1.00 1.00 1.00 17
1 1.00 1.00 1.00 17
2 1.00 1.00 1.00 11
avg / total 1.00 1.00 1.00 45
RESULT:
Thus, the implementation of Naïve Bayesian Classifier model has been done successfully.
EX NO 6: CONSTRUCTING A BAYESIAN NETWORK TO DIAGNOSE AN
INFECTION USING WHO DATA SET.
Date:
AIM:
To write and execute a program to construct a Bayesian network considering medical data.
ALGORITHM:
Step 1: Open Java/Python ML library classes/API
Step 2: Bayesian network consists of two major parts: a directed acyclic graph and a set of
conditional probability distributions.
Step 3: The directed acyclic graph is a set of random variables represented by nodes
Step 4: The conditional probability distribution of a node (random variable) is defined for
every possible outcome of its preceding causal node(s) (see the small sketch after these steps).
Step 5: Data set is given as an input to the model.
Step 6: The model is used to demonstrate the diagnosis of heart patients using standard Heart
Disease Data Set.
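Before the full program, a minimal two-node sketch of Steps 2 to 4 is given below. It assumes pgmpy is installed (recent pgmpy releases rename BayesianModel to BayesianNetwork), and the probability values are invented purely for illustration:

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

toy = BayesianModel([('diet', 'cholestrol')])            # one edge: diet -> cholestrol
cpd_diet = TabularCPD(variable='diet', variable_card=2, values=[[0.6], [0.4]])
cpd_chol = TabularCPD(variable='cholestrol', variable_card=2,
                      values=[[0.8, 0.3],                 # P(cholestrol=0 | diet=0), P(cholestrol=0 | diet=1)
                              [0.2, 0.7]],                # P(cholestrol=1 | diet=0), P(cholestrol=1 | diet=1)
                      evidence=['diet'], evidence_card=[2])
toy.add_cpds(cpd_diet, cpd_chol)
print(toy.check_model())                                  # True if the CPDs are consistent with the graph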
PROGRAM:
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
data = pd.read_csv("ds4.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)
model = BayesianModel([
    ('age', 'Lifestyle'),
    ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'),
    ('diet', 'cholestrol'),
    ('Lifestyle', 'diet'),
    ('cholestrol', 'heartdisease')
])
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)
HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
'age': int(input('Enter Age: ')),
'Gender': int(input('Enter Gender: ')),
'Family': int(input('Enter Family History: ')),
'diet': int(input('Enter Diet: ')),
'Lifestyle': int(input('Enter Lifestyle: ')),
'cholestrol': int(input('Enter Cholestrol: '))
})
print(q)
OUTPUT:
+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 0.5000 |
+-----------------+---------------------+
| heartdisease(1) | 0.5000 |
+-----------------+---------------------+
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
RESULT:
Thus the implementation of a Bayesian network to diagnose an infection using the WHO data set
has been done successfully.
EX NO: 7 IMPLEMENTATION OF EM ALGORITHM TO CLUSTER A SET OF
DATA
Date:
AIM:
To implement EM algorithm to cluster a data set using python.
ALGORITHM:
Step 1: Load the data set and select the features to be clustered.
Step 2: Choose the number of clusters (Gaussian components) and initialise their means, covariances and mixing weights.
Step 3: E-step: compute the responsibility of every component for every data point (see the sketch after these steps).
Step 4: M-step: re-estimate the means, covariances and mixing weights from the responsibilities.
Step 5: Repeat Steps 3 and 4 until the log-likelihood converges, then assign each point to its most probable component and compare the clustering with K-Means.
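scikit-learn's GaussianMixture runs the E and M steps internally. The short hedged sketch below (with made-up one-dimensional points) shows how the responsibilities from Step 3 and the fitted means from Step 4 can be inspected:

import numpy as np
from sklearn.mixture import GaussianMixture

points = np.array([[1.0], [1.2], [0.8], [8.0], [8.3], [7.9]])   # hypothetical 1-D data
gmm = GaussianMixture(n_components=2, random_state=0).fit(points)
resp = gmm.predict_proba(points)   # E-step output: one row of component probabilities per point
print(np.round(resp, 3))
print(gmm.means_.ravel())          # M-step output: the fitted component means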
PROGRAM:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import pandas as pd
X=pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
#code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n",gmm.means_)
print('\n')
print("Covariances\n",gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:,0], X[:,1],c=em_predictions,s=50)
plt.show()
#code for Kmeans
import matplotlib.pyplot as plt1
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt1.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt1.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], color='black')
plt1.show()
OUTPUT:
EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491]
 [52.12044022 22.46250453]
 [46.4364858  39.43288647]]

Covariances
[[[83.51878796 14.926902  ]
  [14.926902    2.70846907]]

 [[29.95910352 15.83416554]
  [15.83416554 67.01175729]]

 [[79.34811849 29.55835938]
  [29.55835938 18.17157304]]]
[[71.24 28.  ] [52.53 25.  ] [64.54 27.  ] [55.69 22.  ] [54.58 25.  ] [41.91 10.  ]
 [58.64 20.  ] [52.02  8.  ] [31.25 34.  ] [44.31 19.  ] [49.35 40.  ] [58.07 45.  ]
 [44.22 22.  ] [55.73 19.  ] [46.63 43.  ] [52.97 32.  ] [46.25 35.  ] [51.55 27.  ] [57.05 26.  ]
RESULT:
Thus the EM Algorithm to cluster a data set has been implemented successfully.
EX NO 8: IMPLEMENTATION OF K-NEAREST NEIGHBOUR ALGORITHM TO
CLASSIFY IRIS DATASET
Date:
AIM:
To implement the K-Nearest Neighbour Algorithm to classify the Dataset using python
ALGORITHM:
Step 1: Start the Program
Step 2: Importing the Modules.
Step 3: Creating dataset, scikit_learn has a lot of tools for creating synthetic datasets.
Step 4: Visualize the dataset
Step 5: Splitting data into training and testing dataset.
Step 6: Build a KNN classifier object for the implementation.
Step 7: Predictions for the KNN classifier: forecast the target values for the test set and
compare them with the actual values (a small voting sketch follows this list).
Step 8: Compute the accuracy of the classifier.
Step 9: Visualize Predictions
Step 10: Stop the Program.
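The prediction in Step 7 is a majority vote over the k nearest training points. The sketch below uses a toy two-feature data set invented for illustration and makes the vote explicit through the classifier's kneighbors method:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

Xtoy = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])   # hypothetical training points
ytoy = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(Xtoy, ytoy)
query = np.array([[1.5, 1.5]])
dist, idx = knn.kneighbors(query)   # the 3 closest training points to the query
print(ytoy[idx])                    # their labels; the majority label is the prediction
print(knn.predict(query))           # [0]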
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.datasets import load_iris
iris=load_iris()
iris.keys()
df=pd.DataFrame(iris['data'])
X=df
y=iris['target']
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=3).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)
i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if (label == ypred[i]):
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")
OUTPUT:
     0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2
-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
0 2 Correct
1 1 Correct
2 2 Correct
3 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
2 2 Correct
0 0 Correct
0 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
1 1 Correct
1 1 Correct
Confusion Matrix:
[[5 0 0]
[0 5 0]
[0 0 5]]
precision recall f1-score support
0 1.00 1.00 1.00 5
1 1.00 1.00 1.00 5
2 1.00 1.00 1.00 5
accuracy 1.00 15
macro avg 1.00 1.00 1.00 15
weighted avg 1.00 1.00 1.00 15
-------------------------------------------------------------------------
Accuracy of the classifier is 1.00
-------------------------------------------------------------------------
RESULT:
Thus the K-Nearest Neighbour Algorithm to classify the data set using Python has been
implemented successfully.
EX NO 9: IMPLEMENTATION OF NON-PARAMETRIC LOCALLY
WEIGHTED REGRESSION ALGORITHM
Date:
AIM:
To implement the non-parametric locally weighted regression algorithm using python.
ALGORITHM:
Step 1: Import all the necessary library functions; the data points are fitted with a local polynomial of degree 1.
Step 2: Initialise the number of x points.
Step 3: Compute the residual of the smoothing functions.
Step 4: Calculate the weight function.
Step 5: Set the value of the smoothing span f.
Step 6: Initialise the predictor and delta.
Step 7: Calculate the cumulative weights.
Step 8: Find the median of the residuals.
Step 9: Perform the sinusoidal fitting and the straight line fitting (a sketch of one local fit follows this list).
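Each predicted value in the program below comes from a separate weighted least-squares line fitted around the query point, using tricube weights that fall to zero outside the local neighbourhood. A small sketch of one such local fit (the sample points are invented):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # hypothetical sample points
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])
x0, h = 2.0, 2.5                          # query point and local bandwidth
w = np.clip(np.abs(x - x0) / h, 0.0, 1.0)
w = (1 - w ** 3) ** 3                     # tricube weights
A = np.array([[np.sum(w), np.sum(w * x)],
              [np.sum(w * x), np.sum(w * x * x)]])
b = np.array([np.sum(w * y), np.sum(w * x * y)])
beta = np.linalg.solve(A, b)              # coefficients of the local straight line
print(beta[0] + beta[1] * x0)             # prediction at the query point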
PROGRAM:
from math import ceil
import numpy as np
from scipy import linalg
def lowess(x, y, f=2. / 3., iter=3):
    n = len(x)                                            # Number of x points
    r = int(ceil(f * n))                                  # Number of neighbours used in each local fit
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]  # Local bandwidth for every point
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)   # Scaled distances
    w = (1 - w ** 3) ** 3                                 # Tricube Weight Function
    ypred = np.zeros(n)                                   # Initialisation of predictor
    delta = np.ones(n)                                    # Initialisation of delta (robustness weights)
    for iteration in range(iter):
        for i in range(n):
            weights = delta * w[:, i]                     # Cumulative weights for the local fit at x[i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])   # Matrix A
            beta = linalg.solve(A, b)                     # Beta, solution of the equation A*beta = b
            ypred[i] = beta[0] + beta[1] * x[i]
        residuals = y - ypred                             # Finding Residuals
        s = np.median(np.abs(residuals))                  # Median of Residuals
        delta = np.clip(residuals / (6.0 * s), -1, 1)     # Delta
        delta = (1 - delta ** 2) ** 2                     # Robustness weights for the next iteration
    return ypred
if __name__ == '__main__':   # Main Function
    import math
    n = 100                  # Number of data points
    # Case 1: Sinusoidal Fitting
    x = np.linspace(0, 2 * math.pi, n)
    print(x)
    y = np.sin(x) + 0.3 * np.random.randn(n)
    # Case 2: Straight Line Fitting
    # x = np.linspace(0, 2.5, n)          # For Linear
    # y = 1 + 0.25*np.random.randn(n)     # For Linear
    f = 0.25
    ypred = lowess(x, y, f=f, iter=3)
    import pylab as pl
    pl.clf()
    pl.plot(x, y, label='Y NOISY')
    pl.plot(x, ypred, label='Y PREDICTED')
    pl.legend()
    pl.show()
OUTPUT:
[ 0. 0.06346652 0.12693304 0.19039955 0.25386607 0.31733259
0.38079911 0.44426563 0.50773215 0.57119866 0.63466518 0.6981317
0.76159822 0.82506474 0.88853126 0.95199777 1.01546429 1.07893081
1.14239733 1.20586385 1.26933037 1.33279688 1.3962634 1.45972992
1.52319644 1.58666296 1.65012947 1.71359599 1.77706251 1.84052903
1.90399555 1.96746207 2.03092858 2.0943951 2.15786162 2.22132814
2.28479466 2.34826118 2.41172769 2.47519421 2.53866073 2.60212725
2.66559377 2.72906028 2.7925268 2.85599332 2.91945984 2.98292636
3.04639288 3.10985939 3.17332591 3.23679243 3.30025895 3.36372547
3.42719199 3.4906585 3.55412502 3.61759154 3.68105806 3.74452458
3.8079911 3.87145761 3.93492413 3.99839065 4.06185717 4.12532369
4.1887902 4.25225672 4.31572324 4.37918976 4.44265628 4.5061228
4.56958931 4.63305583 4.69652235 4.75998887 4.82345539 4.88692191
4.95038842 5.01385494 5.07732146 5.14078798 5.2042545 5.26772102
5.33118753 5.39465405 5.45812057 5.52158709 5.58505361 5.64852012
5.71198664 5.77545316 5.83891968 5.9023862 5.96585272 6.02931923
6.09278575 6.15625227 6.21971879 6.28318531]
RESULT:
Thus the non-parametric locally weighted regression algorithm has been implemented
successfully.
EX NO 10: IMPLEMENTATION OF REGRESSION ALGORITHM
AIM:
To implement the Regression algorithm using python script.
ALGORITHM:
Step 1: Import the libraries for NumPy and Matplotlib.
Step 2: estimate the total number of points or observation
Step 3: calculate the mean of a and b vector
Step 4: calculate the cross deviation and deviation about a
Step 5: calculate the regression coefficients
Step 6: plot the actual points or observation as scatter plot
Step 7: calculate the predicted response vector
Step 8: plot the regression line
PROGRAM:
import numpy as nmp
import matplotlib.pyplot as mtplt
def estimate_coeff(p, q):
    # Here, we will estimate the total number of points or observations
    n1 = nmp.size(p)
    # Now, we will calculate the mean of the p and q vectors
    m_p = nmp.mean(p)
    m_q = nmp.mean(q)
    # Here, we will calculate the cross deviation and the deviation about p
    SS_pq = nmp.sum(q * p) - n1 * m_q * m_p
    SS_pp = nmp.sum(p * p) - n1 * m_p * m_p
    # Here, we will calculate the regression coefficients
    b_1 = SS_pq / SS_pp
    b_0 = m_q - b_1 * m_p
    return (b_0, b_1)

def plot_regression_line(p, q, b):
    # Now, we will plot the actual points or observations as a scatter plot
    mtplt.scatter(p, q, color="m", marker="o", s=30)
    # Here, we will calculate the predicted response vector
    q_pred = b[0] + b[1] * p
    # Here, we will plot the regression line
    mtplt.plot(p, q_pred, color="g")
    # Here, we will put the labels
    mtplt.xlabel('p')
    mtplt.ylabel('q')
    # Here, we will show the plot
    mtplt.show()

def main():
    # entering the observation points or data
    p = nmp.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
    q = nmp.array([11, 13, 12, 15, 17, 18, 18, 19, 20, 22])
    # now, we will estimate the coefficients
    b = estimate_coeff(p, q)
    print("Estimated coefficients are :\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # Now, we will plot the regression line
    plot_regression_line(p, q, b)

if __name__ == "__main__":
    main()
OUTPUT:
Estimated coefficients are :
b_0 = -0.4606060606060609
b_1 = 1.1696969696969697
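As a quick cross-check (assuming only NumPy), np.polyfit with degree 1 should reproduce the same coefficients:

import numpy as np

p = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
q = np.array([11, 13, 12, 15, 17, 18, 18, 19, 20, 22])
slope, intercept = np.polyfit(p, q, 1)
print(intercept, slope)   # approximately -0.4606 and 1.1697, matching b_0 and b_1 above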
RESULT:
Thus the Regression Algorithm using Python has been implemented successfully.
EX NO 11: IMPLEMENTATION OF FIND S ALGORITHM
AIM:
To implement the Find-S Algorithm using python script.
ALGORITHM:
Step 1: Initialize ‘h’ with the most specific hypothesis, generally, it is the first positive example
in the data set.
Step 2: For each training example: if the example is negative, ignore it and move on to the next
example; if it is positive, consider it for the next step.
Step 3: Check if each attribute in the example is equal to the hypothesis value.
Step 4: If the value matches, then no changes are made.
Step 5: If the value does not match, the value is changed to ‘?’.
Step 6: Do this until we reach the last positive example in the data set.
PROGRAM:
import pandas as pd
import numpy as np

def train(c, t):
    # initialise with the first positive example
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # generalise any attribute that disagrees with a later positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis
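The listing above ends with the trained hypothesis; a short driver sketch is given below. The file name finds1.csv and its layout (attributes in every column except the last, and a Yes/No target in the last column) are assumptions made for illustration:

# Hedged usage sketch: assumes 'finds1.csv' holds the attributes in all but the
# last column and the Yes/No target in the last column.
data = pd.read_csv('finds1.csv')
concepts = np.array(data)[:, :-1]
target = np.array(data)[:, -1]
print("The final specific hypothesis is:", train(concepts, target))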
OUTPUT:
RESULT:
Thus Find S Algorithm using Python has been implemented successfully.