ml final

The document presents nine laboratory experiments demonstrating machine learning algorithms: FIND-S, Candidate-Elimination, the ID3 decision tree, Backpropagation for neural networks, the Naïve Bayes classifier (on both tabular and text data), k-Nearest Neighbour, and EM/k-Means clustering, plus a heart-disease diagnosis model. Each experiment includes the implementing code and its results, covering hypothesis generation, classification, neural network training, clustering, and model evaluation with accuracy metrics.

EXPERIMENT-1

Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.
import pandas as pd
import numpy as np

data = pd.DataFrame([
    ['Sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'],
    ['Sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes'],
    ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'],
    ['Sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
], columns=['sky', 'temp', 'humidity', 'wind', 'water', 'forecast', 'enjoy'])
data
Output:
sky temp humidity wind water forecast enjoy
0 Sunny warm normal strong warm same yes
1 Sunny warm high strong warm same yes
2 rainy cold high strong warm change no
3 Sunny warm high strong cool change yes

concepts = np.array(data)[:, :-1]
target = np.array(data)[:, -1]

def train(con, tar):
    # Initialise the hypothesis with the first positive example
    for i, val in enumerate(tar):
        if val == 'yes':
            specific_h = con[i].copy()
            break
    # Generalise attribute-by-attribute over the remaining positive examples
    for i, val in enumerate(con):
        if tar[i] == 'yes':
            for x in range(len(specific_h)):
                if val[x] != specific_h[x]:
                    specific_h[x] = '?'
    return specific_h

hypothesis = train(concepts, target)
hypothesis
Output:
array(['Sunny', 'warm', '?', 'strong', '?', '?'], dtype=object)
def predict_output(hypothesis, input_example):
    for i in range(len(hypothesis)):
        if hypothesis[i] != '?' and hypothesis[i] != input_example[i]:
            return 'No'   # a specific attribute mismatches, so predict No
    return 'Yes'          # every attribute matches or is generalised

# Example input to predict
input_example = ['Sunny', 'warm', 'high', 'strong', 'warm', 'same']

# Predict the output
output = predict_output(hypothesis, input_example)
print(f"Predicted Output: {output}")
Output:
Predicted Output: Yes
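
The aim calls for reading the training data from a .CSV file, while the listing above hardcodes it in a DataFrame. A minimal sketch of the CSV variant, assuming a hypothetical file enjoysport.csv whose columns match the DataFrame above and whose last column holds the target:

import pandas as pd
import numpy as np

# enjoysport.csv is an assumed file name; its last column holds 'yes'/'no' labels
data = pd.read_csv('enjoysport.csv')
concepts = np.array(data)[:, :-1]
target = np.array(data)[:, -1]
print(train(concepts, target))   # train() as defined above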
EXPERIMENT-2
For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
import pandas as pd
import numpy as np
data = pd.DataFrame([
    ['Sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'],
    ['Sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes'],
    ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'],
    ['Sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
], columns=['sky', 'temp', 'humidity', 'wind', 'water', 'forecast', 'enjoy'])
data
Output:
sky temp humidity wind water forecast enjoy
0 Sunny warm normal strong warm same yes
1 Sunny warm high strong warm same yes
2 rainy cold high strong warm change no
3 Sunny warm high strong cool change yes

concepts = np.array(data)[:, :-1]
target = np.array(data)[:, -1]

def learn(concepts, target):
    # S starts as the first example; G starts maximally general
    specific_h = concepts[0].copy()
    print("initialization of specific_h \n", specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("initialization of general_h \n", general_h)

    for i, h in enumerate(concepts):
        if target[i] == "yes":
            print("If instance is Positive ")
            # Generalise S and retract the matching G entries
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            print("If instance is Negative ")
            # Specialise G just enough to exclude the negative example
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" step {}".format(i + 1))
        print(specific_h)
        print(general_h)
        print("\n")

    # Drop the G rows that remained fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
Output:
initialization of specific_h
['Sunny' 'warm' 'normal' 'strong' 'warm' 'same']
initialization of general_h
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
If instance is Positive
step 1
['Sunny' 'warm' 'normal' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
If instance is Positive
step 2
['Sunny' 'warm' '?' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
If instance is Negative
step 3
['Sunny' 'warm' '?' 'strong' 'warm' 'same']
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'same']]
If instance is Positive
step 4
['Sunny' 'warm' '?' 'strong' '?' '?']
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['Sunny' 'warm' '?' 'strong' '?' '?']
Final General_h:
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
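
As a quick sanity check (not part of the original program), one can verify that the final S and G boundaries classify every training example correctly. A minimal sketch, assuming concepts, target, s_final, and g_final from above:

def consistent(h, example):
    # an example satisfies h when every non-'?' attribute matches it
    return all(a == '?' or a == v for a, v in zip(h, example))

for ex, label in zip(concepts, target):
    assert consistent(s_final, ex) == (label == 'yes')                  # S boundary
    assert any(consistent(g, ex) for g in g_final) == (label == 'yes')  # G boundary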
EXPERIMENT-3
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
# Import necessary libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import pandas as pd
# Prepare the dataset
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
                    'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak',
             'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
                   'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
# Convert the dataset to a pandas DataFrame
df = pd.DataFrame(data)
df
# Convert categorical attributes to numerical data
df['Outlook'] = df['Outlook'].map({'Sunny': 0, 'Overcast': 1, 'Rain': 2})
df['Temperature'] = df['Temperature'].map({'Hot': 0, 'Mild': 1, 'Cool': 2})
df['Humidity'] = df['Humidity'].map({'High': 0, 'Normal': 1})
df['Wind'] = df['Wind'].map({'Weak': 0, 'Strong': 1})
df['PlayTennis'] = df['PlayTennis'].map({'No': 0, 'Yes': 1})
# Separate features (X) and target (y)
X = df[['Outlook', 'Temperature', 'Humidity', 'Wind']]
y = df['PlayTennis']
# Train the Decision Tree Classifier with the entropy criterion
# (scikit-learn builds binary CART trees, so this approximates ID3 rather than
# implementing it exactly, but it uses the same information measure)
clf = DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(X, y)
# Visualize the Decision Tree
tree.plot_tree(clf, feature_names=['Outlook', 'Temperature', 'Humidity', 'Wind'],
class_names=['No', 'Yes'], filled=True)
print("Decision Tree Trained Successfully")
# Classify a new sample: Outlook=Sunny, Temperature=Cool, Humidity=Normal, Wind=Strong
new_sample = pd.DataFrame([[0, 2, 1, 1]], columns=['Outlook', 'Temperature', 'Humidity', 'Wind'])
prediction = clf.predict(new_sample)
print("Predicted class for the new sample: ", 'Yes' if prediction[0] == 1 else 'No')
print(df)
from sklearn.metrics import accuracy_score

# Accuracy on training data (accuracy_score expects y_true first, then y_pred)
X_train_prediction = clf.predict(X)
training_data_accuracy = accuracy_score(y, X_train_prediction)
print(f"Accuracy on Training data: {training_data_accuracy * 100}")
Output:
Decision Tree Trained Successfully
Predicted class for the new sample: Yes
Outlook Temperature Humidity Wind PlayTennis
0 0 0 0 0 0
1 0 0 0 1 0
2 1 0 0 0 1
3 2 1 0 0 1
4 2 2 1 0 1
5 2 2 1 1 0
6 1 2 1 1 1
7 0 1 0 0 0
8 0 2 1 0 1
9 2 1 1 0 1
10 0 1 1 1 1
11 1 1 0 1 1
12 1 0 1 0 1
13 2 1 0 1 0
Accuracy on Training data: 100.0
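
For reference, ID3 picks each split by maximising information gain. A minimal sketch of that computation on the same data (assumes the numerically encoded df from above):

import numpy as np

def entropy(labels):
    # H(S) = -sum_i p_i * log2(p_i) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(df, attribute, target='PlayTennis'):
    # Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)
    total = entropy(df[target])
    weighted = sum(len(sub) / len(df) * entropy(sub[target])
                   for _, sub in df.groupby(attribute))
    return total - weighted

for col in ['Outlook', 'Temperature', 'Humidity', 'Wind']:
    print(col, round(information_gain(df, col), 3))
# Outlook yields the largest gain (about 0.247 on this data), so it is the root split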
EXPERIMENT-4
Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
import numpy as np
# Activation function: Sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function (for backpropagation)
def sigmoid_derivative(x):
    return x * (1 - x)
# Input dataset (XOR problem)
# Each row is an input (X1, X2)
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# Output dataset (Expected outputs for XOR)
outputs = np.array([[0], [1], [1], [0]])
# Set a random seed for reproducibility
np.random.seed(42)
# Initialize weights randomly with values between 0 and 1
# Weights between input and hidden layer (2 inputs, 2 neurons in hidden layer)
weights_input_hidden = np.random.rand(2, 2)
# Weights between hidden layer and output (2 neurons, 1 output)
weights_hidden_output = np.random.rand(2, 1)
# Learning rate
learning_rate = 0.1
# Number of iterations for training
epochs = 10000
# Training the network
for epoch in range(epochs):
    # ---- Forward Propagation ----
    # Step 1: Input to hidden layer
    hidden_layer_input = np.dot(inputs, weights_input_hidden)  # Linear sum
    hidden_layer_output = sigmoid(hidden_layer_input)
    # Step 2: Hidden layer to output
    final_input = np.dot(hidden_layer_output, weights_hidden_output)  # Linear sum
    final_output = sigmoid(final_input)
    error = outputs - final_output  # Difference between actual and predicted output

    # ---- Backward Propagation ----
    d_output = error * sigmoid_derivative(final_output)
    error_hidden_layer = d_output.dot(weights_hidden_output.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)
    # Gradient-descent weight updates
    weights_hidden_output += hidden_layer_output.T.dot(d_output) * learning_rate
    weights_input_hidden += inputs.T.dot(d_hidden_layer) * learning_rate

    if epoch % 1000 == 0:
        mse = np.mean(np.square(error))
        print(f"Epoch {epoch}, Error: {mse}")
        print(final_output)
# Final predicted output after training
print("Final output after training:")
print(final_output)
Output:
Epoch 0, Error: 0.2520513692725072
[[0.53892274]
[0.55132394]
[0.5510619 ]
[0.56117033]]
Final output after training:
[[0.20369158]
[0.73603066]
[0.73604444]
[0.34370702]]
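
The continuous outputs have not fully converged (this network has no bias terms, which slows learning on XOR), but thresholding at 0.5 already recovers the XOR truth table. A minimal sketch, assuming final_output from the run above:

# Threshold the sigmoid outputs to obtain binary class predictions
predictions = (final_output > 0.5).astype(int)
print(predictions.ravel())   # [0 1 1 0], matching the XOR targets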
EXPERIMENT-5
Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Load the dataset
data = pd.read_csv("heart.csv")
# Check for missing values and handle them
print(data.isnull().sum())
Output:
age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64
# Split the dataset into features (X) and labels (y)
X = data.drop(columns='target') # Features: all columns except the target
y = data['target'] # Labels: target column
# Split the dataset into training and testing sets (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a Gaussian Naive Bayes model
model = GaussianNB()
# Train the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Display confusion matrix and classification report
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
Output:
Accuracy: 81.49%
Confusion Matrix:
[[118 41]
[ 16 133]]
Classification Report:
precision recall f1-score support
0 0.88 0.74 0.81 159
1 0.76 0.89 0.82 149
accuracy 0.81 308
macro avg 0.82 0.82 0.81 308
weighted avg 0.82 0.81 0.81 308
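
The aim asks for accuracy over a few test data sets; a minimal sketch using 5-fold cross-validation, which evaluates the classifier on five different train/test splits (assumes X and y from above; fold scores will vary with the data):

from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Each fold holds out a different 20% of the data as the test set
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy: %.2f%%" % (scores.mean() * 100))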
EXPERIMENT-6
AIM: Write a program using the k-Nearest Neighbour algorithm to classify the Iris data set. Print both correct and
wrong predictions.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import numpy as np
dataset=load_iris()

X_train,X_test,y_train,y_test=train_test_split(dataset["data"],dataset["target"],random_state=0)
kn=KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train,y_train)
y_pred=kn.predict(X_test)

print(classification_report(y_test, y_pred))
print("Test Accuracy:", kn.score(X_test, y_test))

for i in range(len(X_test)):
    x = X_test[i]
    x_new = np.array([x])
    prediction = kn.predict(x_new)
    print("TARGET=", y_test[i], dataset["target_names"][y_test[i]],
          "PREDICTED=", prediction, dataset["target_names"][prediction])

print(kn.score(X_test, y_test))

Output:

Test Accuracy: 0.9736842105263158
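
The loop above prints every test example; to flag correct and wrong predictions explicitly, as the aim requires, a minimal sketch (assumes kn, X_test, y_test, and dataset from above):

for xi, yi in zip(X_test, y_test):
    pred = kn.predict([xi])[0]
    status = "CORRECT" if pred == yi else "WRONG"
    print(status, "TARGET=", dataset["target_names"][yi],
          "PREDICTED=", dataset["target_names"][pred])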


EXPERIMENT-7
AIM: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering
using k-Means algorithm. Compare the results of these two algorithms and comment on the quality of
clustering.
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = load_iris()
x = pd.DataFrame(dataset.data)
x.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['targets']
plt.figure(figsize=(14,7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(x.Petal_Length, x.Petal_Width, c=colormap[y.targets],s=40)
plt.title('Real Clusters')

# K-MEANS PLOT
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(x)
# np.choose maps cluster ids through [0, 1, 2]; an identity mapping here,
# but the list can be permuted to align cluster labels with the true classes
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(x.Petal_Length, x.Petal_Width, c=colormap[predY], s=40)
plt.title('K-Means Clustering')

# GMM PLOT
scaler = preprocessing.StandardScaler()
scaler.fit(x)
xsa = scaler.transform(x)
xs = pd.DataFrame(xsa, columns = x.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(x.Petal_Length, x.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title("Gaussian Mixture Model")
Output: three side-by-side scatter plots of petal length vs. petal width, titled 'Real Clusters', 'K-Means Clustering', and 'Gaussian Mixture Model'.
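
The aim asks for a comparison of the two clusterings; a minimal quantitative sketch using the metrics module already imported as sm (assumes model, y_cluster_gmm, and y from above):

# Adjusted Rand index: 1.0 means perfect agreement with the true classes
print("K-Means ARI:", sm.adjusted_rand_score(y.targets, model.labels_))
print("GMM ARI:    ", sm.adjusted_rand_score(y.targets, y_cluster_gmm))
# On the iris data the GMM typically scores higher, since the overlapping
# classes are captured better by soft, elliptical clusters than by k-means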
EXPERIMENT-8
AIM: Write a program to construct a Machine Learning algorithm considering medical data. Use this
model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("heart.csv")
df
df.describe()
df.hist()

df["target"].value_counts()
X = df.drop(columns = "target", axis=1)
Y = df["target"]
X.shape

from sklearn.model_selection import train_test_split


X_train,X_test,y_train,y_test=train_test_split(X, Y, test_size = 0.40, stratify = Y,random_state=2)

from sklearn.preprocessing import StandardScaler


scalar = StandardScaler()
X_train = scalar.fit_transform(X_train)
X_test = scalar.transform(X_test)  # reuse the training statistics; refitting on test data would leak
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix,accuracy_score,roc_curve,classification_report
ml="Logistic Regression"
lr_model = LogisticRegression(solver="liblinear", random_state = 2)
lr_model.fit(X_train, y_train)

X_train_prediction = lr_model.predict(X_train)
training_data_accuracy = accuracy_score(y_train, X_train_prediction)
print(f"Accuracy on Training data: {training_data_accuracy * 100}")

Output:
TARGET VALUE:
1 526
0 499
Name: target, dtype: int64
SHAPE: (1025, 13)
LogisticRegression(random_state=2, solver='liblinear')
Accuracy on Training data: 88.13008130081302
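
Training accuracy alone can overstate performance; a minimal sketch that also scores the held-out test split (assumes lr_model, X_test, and y_test from above):

# Evaluate on the 40% test split created earlier
X_test_prediction = lr_model.predict(X_test)
test_data_accuracy = accuracy_score(y_test, X_test_prediction)
print("Accuracy on Test data:", test_data_accuracy * 100)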
EXPERIMENT-9
AIM: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Calculate the accuracy, precision, and recall for your data set.
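
The listing below assumes a headerless two-column text.csv, with a sentence in the first column and a pos/neg label in the second. A few illustrative rows in that format (assumed examples, not the original file):

This is an awesome place,pos
We will have fun tomorrow,pos
What a great holiday,pos
This is a horrible place,neg
I am sick and tired of this place,neg
My boss is my enemy,neg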
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

msg=pd.read_csv('text.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg

msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum

#spliting the dataset into train and test data


xtrain,xtest,ytrain,ytest=train_test_split(X,y,test_size=0.33)
print ('\n the total number of Training Data :',ytrain.shape)
print ('\n the total number of Test Data :',ytest.shape)

#output the words or Tokens in the text documents


cv = CountVectorizer()
xtrain_dtm = cv.fit_transform(xtrain)
xtest_dtm=cv.transform(xtest)
print('\n The words or Tokens in the text documents \n')
print(cv.get_feature_names_out())
df=pd.DataFrame(xtrain_dtm.toarray(),columns=cv.get_feature_names_out())
df

xtrain_dtm.shape
# Training Naive Bayes (NB) classifier on training data.
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)

#printing accuracy, Confusion matrix, Precision and Recall


print('\n Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('\n The value of Precision', metrics.precision_score(ytest,predicted))
print('\n The value of Recall', metrics.recall_score(ytest,predicted))
Output:

the total number of Training Data : (12,)


the total number of Test Data : (6,)
The words or Tokens in the text documents
['about' 'am' 'an' 'and' 'awesome' 'bad' 'beers' 'boss' 'can' 'dance'
'deal' 'do' 'enemy' 'feel' 'fun' 'good' 'great' 'have' 'holiday'
'horrible' 'house' 'is' 'juice' 'like' 'locality' 'love' 'my' 'not' 'of'
'place' 'restaurant' 'sick' 'stay' 'taste' 'that' 'the' 'these' 'this'
'tired' 'to' 'today' 'tomorrow' 'very' 'we' 'went' 'what' 'will' 'with']

SHAPE: (12, 44)


Accuracy of the classifier is 0.8333333333333334
Confusion matrix
[[4 0]
[1 1]]
The value of Precision 1.0
The value of Recall 0.5
