Remaining ML Program

The document describes multiple experiments involving machine learning techniques, including the implementation of a Naïve Bayesian classifier, clustering with k-Means and Gaussian Mixture Models, and building an Artificial Neural Network using backpropagation. Each experiment includes source code and explanations of the steps taken, such as data loading, preprocessing, model training, and evaluation. The final experiment demonstrates the Candidate-Elimination algorithm for hypothesis generation based on training data.

Experiment-5

Objective: Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Source Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load dataset
data = pd.read_csv("PlayTennis.csv")

# Encoding categorical features
label_encoders = {}
for column in data.columns[:-1]:  # Excluding target column
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le

target_encoder = LabelEncoder()
data['PlayTennis'] = target_encoder.fit_transform(data['PlayTennis'])

# Splitting dataset into train and test sets
X = data.drop(columns=['PlayTennis'])
y = data['PlayTennis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naïve Bayes classifier
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predictions
y_pred = classifier.predict(X_test)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Correct and incorrect classifications
correct = (y_pred == y_test).sum()
incorrect = (y_pred != y_test).sum()
print(f"Correct Classifications: {correct}")
print(f"Incorrect Classifications: {incorrect}")

Explanation:
1. Load the Dataset
The script reads a CSV file (PlayTennis.csv) containing categorical features like Outlook,
Temperature, Humidity, Wind, and the target variable PlayTennis.
2. Encode Categorical Features
Since GaussianNB expects numerical input, Label Encoding converts the categorical
features into integers. Each categorical column (except the target) is encoded with
its own LabelEncoder(), and the target column (PlayTennis) is encoded separately.
A small illustration appears after this list.
3. Split Dataset into Training & Testing Sets
The dataset is split into 80% training and 20% testing using train_test_split(),
after separating X (features) from y (target). The classic PlayTennis set has only
14 rows, so the test set holds just a few examples and the reported accuracy is
correspondingly noisy.
4. Train the Naïve Bayes Classifier
A Gaussian Naïve Bayes model (GaussianNB()) is trained on the training data. For
each class, it estimates a Gaussian distribution over every feature and combines
them under the naïve conditional-independence assumption.

5. Make Predictions
The classifier predicts outcomes on the test dataset.
6. Compute Accuracy
The model's accuracy is calculated using accuracy_score(y_test, y_pred).
This gives the percentage of correctly classified instances.
7. Generate Confusion Matrix
The confusion matrix (confusion_matrix(y_test, y_pred)) shows the number of:
True Positives (correct Yes)
True Negatives (correct No)
False Positives (incorrect Yes)
False Negatives (incorrect No)
With labels encoded 0 = No, 1 = Yes, sklearn places actual classes on the rows and
predicted classes on the columns, so the matrix reads [[TN, FP], [FN, TP]].
8. Count Correct & Incorrect Classifications
The script calculates and prints the number of correctly and incorrectly classified instances.
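
As a quick illustration of step 2 (hedged: the exact integer codes depend on the CSV contents, since LabelEncoder() assigns codes in alphabetical order of the values; 'Sunny', 'Overcast', and 'Rain' are the classic Outlook values):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Alphabetical order gives: 'Overcast' -> 0, 'Rain' -> 1, 'Sunny' -> 2
print(le.fit_transform(['Sunny', 'Overcast', 'Rain', 'Sunny']))  # prints [2 0 1 2]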

Experiment-8
Objective: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data
set for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You may use Java/Python ML library classes/APIs
in the program.

Source Code:
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the dataset
dataset = load_iris()
X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']

# Plotting
plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# Real Plot
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')

# KMeans Plot
plt.subplot(1, 3, 2)
model = KMeans(n_clusters=3)
model.fit(X)
# k-Means cluster labels are arbitrary; np.choose with [0, 1, 2] is an identity
# mapping here, and the permutation can be reordered so that the cluster colors
# line up with the Real plot.
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')

# GMM Plot
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')

plt.show()
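
The objective also asks for a comparison of the two clusterings. Here is a minimal sketch (an addition, not part of the original listing) that scores both against the true species labels with the Adjusted Rand Index, reusing the sklearn.metrics import (sm) already present:

# ARI = 1 means a perfect match with the true labels; values near 0 mean
# chance-level agreement.
print("k-Means Adjusted Rand Index:", sm.adjusted_rand_score(y.Targets, model.labels_))
print("GMM Adjusted Rand Index:", sm.adjusted_rand_score(y.Targets, y_cluster_gmm))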

Output:
[Figure: three scatter plots of Petal_Length vs Petal_Width showing the Real labels, the KMeans clustering, and the GMM Classification]

Explanation:
1. Data Loading:
• The script loads the Iris dataset, which includes the features Sepal_Length,
Sepal_Width, Petal_Length, and Petal_Width, plus the target variable Targets
(the flower species).
2. k-Means Clustering:
• The k-Means algorithm is used to cluster the data into 3 clusters (since there are 3
species in the Iris dataset).
• The predicted cluster labels are used to color the data points in the plot.
3. Gaussian Mixture Model (GMM) Clustering:
• The data is standardized using StandardScaler (z = (x - mean) / std) so that all
features contribute equally to the clustering.
• The GMM is fitted with 3 Gaussian components; the fitting procedure is the EM
algorithm named in the objective.
• The predicted cluster labels from GMM are used to color the data points in the plot.

4. Visualization:
• Three subplots are created:
• The first subplot shows the real data distribution colored by the true species
labels.
• The second subplot shows the clustering result of k-Means.
• The third subplot shows the clustering result of GMM.
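5. Comparison:
• On this data, GMM usually matches the true species more closely than k-Means,
because its full covariance matrices can follow the elongated, overlapping
versicolor/virginica clusters, while k-Means favours roughly spherical clusters
of similar size.
• The Adjusted Rand Index snippet after the plotting code makes this comparison
quantitative; exact numbers vary from run to run, since both algorithms start
from random initialisations.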

Experiment-9
Objective: Build an Artificial Neural Network by implementing the Backpropagation algorithm
and test the same using appropriate datasets.
Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
from google.colab import files

# Upload the dataset
file_upload = files.upload()

# Read the dataset
df = pd.read_csv('Celsius_to_Fahrenheit.csv')
print(df.head())

# Prepare the data
X = df['Celsius']
y = df['Fahrenheit']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

# Plot the data
sns.scatterplot(x=df['Celsius'], y=df['Fahrenheit'], marker='.', s=20, color='b')
plt.show()

# Initialize the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1, input_shape=[1]))
model.summary()

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(0.5), loss='mean_squared_error')

# Train the model
epochs_hist = model.fit(X_train, y_train, epochs=500)

# Get the model weights
print(model.get_weights())
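# Sanity check: for F = 1.8 * C + 32 the learned kernel should approach ~1.8
# and the bias ~32 once training converges (approximate; depends on the data
# and the number of epochs).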

# Make a prediction (wrapped in a NumPy array so the input shape is unambiguous)
celsius_temp = 100
print(f'Prediction from our perceptron model is: {model.predict(np.array([[celsius_temp]]))}')

# Calculate the actual value
F = 9/5 * celsius_temp + 32
print(f'Prediction from actual formula is: {F}')

# Plot the loss progression
plt.plot(epochs_hist.history['loss'])
plt.xlabel('Number of epochs')
plt.ylabel('loss')
plt.title('Loss progression during training')
plt.show()

# Plot the regression line on training data
plt.scatter(X_train, y_train, c='b', marker='.')
plt.plot(X_train, model.predict(X_train), c='g')
plt.xlabel('Celsius')
plt.ylabel('Fahrenheit')
plt.title('Regression line on training data')
plt.show()

# Plot the regression line on test data
plt.scatter(X_test, y_test, c='b', marker='.')
plt.plot(X_test, model.predict(X_test), c='r')
plt.xlabel('Celsius')
plt.ylabel('Fahrenheit')
plt.title('Regression line on test data')
plt.show()

Output:
[Output figures: loss progression during training, regression line on training data, regression line on test data]

Experiment-10
Objective: For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Candidate-Elimination algorithm to output a description of the set
of all hypotheses consistent with the training examples.

Source Code:
import numpy as np
import pandas as pd

# Loading data from a CSV file
data = pd.read_csv('trainingdata.csv')
print(data)

# Separating concept features from the target
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)

# Isolating the target (last column) into a separate array
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    '''
    learn() implements the learning step of the Candidate-Elimination algorithm.
    Arguments:
        concepts - array of all the feature vectors
        target   - array of the corresponding output values
    '''
    # Initialise S0 (specific_h) with the first instance from concepts
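    # The source listing is truncated at this point; the lines below are a
    # standard completion of the Candidate-Elimination updates (an assumed
    # reconstruction, not from the original), taking "Yes" as the positive label.
    specific_h = concepts[0].copy()
    general_h = [['?' for _ in range(len(specific_h))]
                 for _ in range(len(specific_h))]

    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            # Positive example: generalise specific_h, prune general_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        else:
            # Negative example: specialise general_h against specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

    # Drop rows of general_h that stayed maximally general
    general_h = [g for g in general_h
                 if g != ['?' for _ in range(len(specific_h))]]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final)
print("Final General_h:", g_final)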
