ML RECORD
REG No : ………………………….
Website : www.dnrcet.org
D.N.R. COLLEGE OF ENGINEERING AND TECHNOLOGY
(APPROVED BY AICTE, AFFILIATED TO JNTU KAKINADA & GOVT. OF A.P.)
CERTIFICATE
S.No   Exp. No        Name of the Experiment
1      Experiment-1   Linear Regression and Logistic Regression
2      Experiment-2   Classification on the breast cancer dataset
3      Experiment-3   Perceptron network for pattern classification and function approximation
4      Experiment-4   Decision tree based ID3 algorithm
5      Experiment-5   Artificial Neural Network with Backpropagation
6      Experiment-6   Naive Bayesian classifier for documents
7      Experiment-7   k-Nearest Neighbour on the iris dataset
Experiment-1
Exercises to solve real-world problems using the following machine learning methods:
a) Linear Regression
b) Logistic Regression.
A) Linear Regression:
Source Code:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # least-squares estimates of intercept b_0 and slope b_1
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    SS_xy = np.sum(y * x) - n * m_y * m_x   # cross-deviation about the means
    SS_xx = np.sum(x * x) - n * m_x * m_x   # deviation of x about its mean
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector and regression line
    y_pred = b[0] + b[1] * x
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    # observations
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output:
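For the observations above, the closed-form least-squares estimates work out to b_0 ≈ 1.2364 and b_1 ≈ 1.1697, which is what the program prints.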
Graph: scatter plot of the observations with the fitted regression line.
B) Logistic Regression:
Data – User_Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Step 3: Build a data frame and create the model in Python (in this example, Logistic Regression).
dataset = pd.read_csv('...\\User_Data.csv')
# input: Age and EstimatedSalary (columns 2 and 3 of User_Data)
x = dataset.iloc[:, [2, 3]].values
# output: Purchased (column 4)
y = dataset.iloc[:, 4].values
Step 4: Split the dataset into train and test sets. 75% of the data is used to train the model and the remaining 25% to test its performance.
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(
x, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
xtrain = sc_x.fit_transform(xtrain)
xtest = sc_x.transform(xtest)
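The model-fitting step is missing at this point; a minimal sketch, assuming scikit-learn's LogisticRegression on the scaled training data:

from sklearn.linear_model import LogisticRegression

# fit a logistic regression classifier on the scaled training set
classifier = LogisticRegression(random_state=0)
classifier.fit(xtrain, ytrain)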
Step 5: Predict on the test set and evaluate the accuracy.
Confusion Matrix:
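A sketch of the prediction and evaluation, assuming the classifier fitted above:

from sklearn.metrics import confusion_matrix, accuracy_score

# predict the test-set labels and tabulate the results
y_pred = classifier.predict(xtest)
cm = confusion_matrix(ytest, y_pred)
print("Confusion Matrix:\n", cm)
print("Accuracy:", accuracy_score(ytest, y_pred))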
Output:
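The plotting fragment below needs a decision surface to draw over; a sketch of that setup, assuming the classifier and the scaled test set from above:

from matplotlib.colors import ListedColormap

# build a fine mesh over the feature plane and colour it by the predicted class
X_set, y_set = xtest, ytest
X1, X2 = np.meshgrid(np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
                     np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))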
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# overlay the actual test points, coloured by their true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Output :
Experiment-2
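This experiment works with the scikit-learn breast cancer data; a minimal sketch of the loading step the walkthrough below assumes:

from sklearn import datasets

# load the built-in breast cancer (Wisconsin) dataset
cancer_data = datasets.load_breast_cancer()
# print one record to see what the features look like
print(cancer_data.data[5])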
Output:
After this, we explore the data: take a look at various values in the data-set, check the target variable, and so on.
Explore Data
The shape tells us that this data-set has 569 rows and 30 columns.
print(cancer_data.data.shape)
# target set
print(cancer_data.target)
Output:
Splitting Data
We will divide the data-set into a training set and a test set to get reliable results, using the train_test_split() function. It takes three arguments: the features to train the model, the target, and the test-set size; the split call itself is sketched after the snippet below.
from sklearn.model_selection import train_test_split
cancer_data = datasets.load_breast_cancer()
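A sketch of the split call, with an assumed test_size of 0.4 and random_state of 109 (the original values were not preserved):

# hold out 40% of the records for testing
x_train, x_test, y_train, y_test = train_test_split(
    cancer_data.data, cancer_data.target, test_size=0.4, random_state=109)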
Output:
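A sketch of a model-building step consistent with this walkthrough, assuming scikit-learn's SVC with a linear kernel (the original classifier choice was not preserved):

from sklearn import svm
from sklearn import metrics

# train a support-vector classifier and measure its accuracy on the held-out set
clf = svm.SVC(kernel='linear')
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))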
Output:
Experiment-3
Write a program to simulate a perceptron network for pattern classification and function approximation.
Source Code:
# Python program to implement a
# single-neuron neural network
from numpy import array, random, dot, tanh

# Class to create a neural network with a single neuron
class NeuralNetwork():
    def __init__(self):
        random.seed(1)  # seed the generator so runs are reproducible
        # 3x1 weight matrix with values in [-1, 1)
        self.weight_matrix = 2 * random.random((3, 1)) - 1

    # derivative of tanh, used for the gradient computation
    def tanh_derivative(self, x):
        return 1.0 - tanh(x) ** 2

    # forward propagation
    def forward_propagation(self, inputs):
        return tanh(dot(inputs, self.weight_matrix))

    # adjust the weights by the gradient of the error
    def train(self, train_inputs, train_outputs, num_iterations):
        for _ in range(num_iterations):
            output = self.forward_propagation(train_inputs)
            error = train_outputs - output
            self.weight_matrix += dot(train_inputs.T,
                                      error * self.tanh_derivative(output))

# Driver Code
if __name__ == "__main__":
    neural_network = NeuralNetwork()
    train_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    train_outputs = array([[0, 1, 1, 0]]).T
    neural_network.train(train_inputs, train_outputs, 10000)
    print("Weights after training:\n", neural_network.weight_matrix)
    print("Testing network on new example [1, 0, 0] ->")
    print(neural_network.forward_propagation(array([1, 0, 0])))
Output :
Experiment-4
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample
ID3 Algorithm:
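In brief, ID3 grows the tree greedily: at each node it computes the information gain of every remaining attribute, splits on the attribute with the highest gain, and recurses until every row in a subset carries the same class label, which then becomes a leaf. The gain computed by compute_gain in the program below is

    Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v| / |S|) * Entropy(S_v)
    Entropy(S) = - sum over classes c of p_c * log2(p_c)

where S_v is the subset of S whose value for attribute A is v, and p_c is the proportion of class c in S.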
Training Dataset:
Test Dataset:
import math
import csv

def load_csv(filename):
    # read the CSV file; the first row holds the attribute names
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []   # list of (attribute value, child node) pairs
        self.answer = ""     # class label, set only on leaf nodes

def subtables(data, col, delete):
    # partition the rows by the values appearing in column col
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    # assumes a two-class problem, as in the original program
    attr = list(set(S))
    if len(attr) == 1:   # all labels identical: zero entropy
        return 0
    counts = [0, 0]
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)
    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    # information gain of splitting on column col
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if len(set(lastcol)) == 1:
        # all remaining rows share one label: make a leaf
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))   # attribute with the highest gain
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node
def print_tree(node, level):
    if node.answer != "":
        print("   " * level, node.answer)
        return
    print("   " * level, node.attribute)
    for value, n in node.children:
        print("   " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main program'''
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node1, 0)
testdata, features = load_csv("data3_test.csv")
for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for test instance:", end=" ")
    classify(node1, xtest, features)
Output :
Experiment-5
Build an Artificial Neural Network by implementing the Backpropagation algorithm, and test the same using appropriate data sets.
BACKPROPAGATION ALGORITHM:
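In brief, each training pass propagates the inputs forward through the network, measures the error at the output, and propagates that error backward to adjust the weights. With the sigmoid activation used in the program below, the key quantities are

    d_output = (y - output) * output * (1 - output)
    d_hidden = (d_output . wout^T) * hlayer_act * (1 - hlayer_act)
    wout += hlayer_act^T . d_output * lr
    wh   += X^T . d_hidden * lr

where lr is the learning rate.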
Program:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalise each column of X by its maximum
y = y / 100                 # scale the targets into [0, 1]

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    # derivative of the sigmoid, expressed in terms of its output
    return x * (1 - x)

# Variable initialization
epoch = 5000  # number of training iterations
lr = 0.1      # learning rate
inputlayer_neurons = 2   # number of features in the data set
hiddenlayer_neurons = 3  # number of hidden-layer neurons
output_neurons = 1       # number of neurons at the output layer

# weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # weight updates
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output:
Experiment-6
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.
Naive Bayes algorithms for learning and classifying text:
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# load the labelled documents; the filename is assumed, the original was not preserved
msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
xtrain, xtest, ytrain, ytest = train_test_split(msg.message, msg.labelnum)
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)

# bag-of-words features for the train and test documents
cv = CountVectorizer()
xtrain_dtm = cv.fit_transform(xtrain)
xtest_dtm = cv.transform(xtest)
df = pd.DataFrame(xtrain_dtm.toarray(), columns=cv.get_feature_names_out())
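The fitting and evaluation steps are a minimal sketch, assuming the xtrain_dtm, xtest_dtm, ytrain and ytest built above:

# train a multinomial naive Bayes classifier and evaluate it on the test documents
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))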
Output:
Experiment-7
Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
Data Set:
Iris Plants Dataset: contains 150 instances (50 in each of three classes). Number of attributes: 4 numeric, predictive attributes, plus the class.
Program:
""" Iris Plants Dataset, dataset contains 150 (50 in each of three classes)Number of Attributes: 4
numeric, predictive attributes and the Class
"""
iris=datasets.load_iris()
""" The x variable contains the first four columns of thedataset (i.e. attributes) while y contains the
labels.
"""
x = iris.data y =
iris.target
""" Splits the dataset into 70% train data and 30% test data. This means that out of total 150
records, the training set will contain
105 records and the test set contains 45 of those records """
""" For evaluating an algorithm, confusion matrix, precision, recall and f1 score are the most
commonly used metrics.
"""
Output: