Remaining ML Program
Objective: Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Source Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix
# Load dataset
data = pd.read_csv("PlayTennis.csv")
# Encode each categorical feature column (all except the target)
for col in data.columns.drop('PlayTennis'):
    data[col] = LabelEncoder().fit_transform(data[col])
# Encode the target column separately
target_encoder = LabelEncoder()
data['PlayTennis'] = target_encoder.fit_transform(data['PlayTennis'])
# Separate features and target, then split 80% / 20%
X = data.drop('PlayTennis', axis=1)
y = data['PlayTennis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the Gaussian Naive Bayes classifier
model = GaussianNB()
model.fit(X_train, y_train)
# Predictions on the test set
y_pred = model.predict(X_test)
# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
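For reference, assuming the classic PlayTennis data from Mitchell's textbook, the first rows of PlayTennis.csv would look like this (illustrative; your file may differ):
Outlook,Temperature,Humidity,Wind,PlayTennis
Sunny,Hot,High,Weak,No
Sunny,Hot,High,Strong,No
Overcast,Hot,High,Weak,Yes
Rain,Mild,High,Weak,Yes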
Explanation:
1. Load the Dataset
The script reads a CSV file (PlayTennis.csv) containing categorical features like Outlook,
Temperature, Humidity, Wind, and the target variable PlayTennis.
2. Encode Categorical Features
Since scikit-learn's GaussianNB expects numerical input, Label Encoding is used to convert the categorical features into integer codes. Each categorical column (except the target) is encoded using
LabelEncoder().
The target column (PlayTennis) is also encoded separately.
3. Split Dataset into Training & Testing Sets
The dataset is split into 80% training and 20% testing using train_test_split().
X (features) and y (target) are separated before splitting.
4. Train the Naïve Bayes Classifier
A Gaussian Naïve Bayes model (GaussianNB()) is trained on the training data.
The model estimates the mean and variance of each feature for every class; these define the Gaussian likelihoods used when predicting.
5. Evaluate the Model
Predictions on the test set are compared with the true labels: accuracy_score() reports the overall accuracy, and confusion_matrix() shows the per-class breakdown of correct and incorrect predictions.
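To make the training step concrete, the fitted model's per-class statistics can be inspected directly; a short sketch, assuming the variables from the source code above (the variance attribute is var_ in current scikit-learn, sigma_ in versions before 1.0):
# Inspect what GaussianNB learned: one mean and one variance per feature, per class
print("Classes:", target_encoder.inverse_transform(model.classes_))
print("Per-class feature means:\n", model.theta_)
print("Per-class feature variances:\n", model.var_)  # 'sigma_' in scikit-learn < 1.0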
Source Code:
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the Iris dataset into pandas objects
iris = load_iris()
X = pd.DataFrame(iris.data, columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
y = pd.DataFrame(iris.target, columns=['Targets'])
# Plotting
plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])
# Real Plot
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')
# KMeans Plot
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
plt.subplot(1, 3, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[kmeans.labels_], s=40)
plt.title('KMeans Classification')
# GMM Plot
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')
plt.show()
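The script imports sklearn.metrics as sm but never uses it; a short sketch of how it could quantify how well each clustering recovers the true species (cluster numbering is arbitrary, so a permutation-invariant score such as the adjusted Rand index is appropriate):
# Compare cluster assignments against the true species labels
print("KMeans ARI:", sm.adjusted_rand_score(y.Targets, kmeans.labels_))
print("GMM ARI:", sm.adjusted_rand_score(y.Targets, y_cluster_gmm))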
Explanation:
1. Data Loading:
• The script loads the Iris dataset, which includes the features Sepal_Length, Sepal_Width, Petal_Length, and Petal_Width, and the target variable Targets (the species of the flower).
2. k-Means Clustering:
• The k-Means algorithm is used to cluster the data into 3 clusters (since there are 3
species in the Iris dataset).
• The predicted cluster labels are used to color the data points in the plot.
3. Gaussian Mixture Model (GMM) Clustering:
• The data is standardized using StandardScaler to ensure all features contribute
equally to the clustering.
• The GMM algorithm is applied to fit the data into 3 Gaussian distributions
(components).
• The predicted cluster labels from GMM are used to color the data points in the plot.
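The prediction snippet below belongs to a separate perceptron exercise (Celsius-to-Fahrenheit regression) whose training code is not included in this section. A minimal sketch of the missing context, assuming a single-neuron Keras model as in the classic tutorial; the training pairs and model definition here are illustrative, not from the original:
import numpy as np
import tensorflow as tf  # assumption: the "perceptron" is a single-neuron Keras model

# Illustrative training pairs: Fahrenheit = Celsius * 1.8 + 32
celsius = np.array([-40, -10, 0, 8, 15, 22, 38], dtype=float)
fahrenheit = np.array([-40, 14, 32, 46.4, 59, 71.6, 100.4], dtype=float)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(units=1),  # one linear unit acts as the "perceptron"
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss='mean_squared_error')
model.fit(celsius.reshape(-1, 1), fahrenheit, epochs=500, verbose=0)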
# Make a prediction for a new Celsius value (input shaped as one sample, one feature)
celsius_temp = 100.0
print(f'Prediction from our perceptron model is: {model.predict(np.array([[celsius_temp]]))}')
Output:
Source Code:
import numpy as np
import pandas as pd
'''
learn() function implements the learning method of the Candidate elimination
algorithm.
Arguments:
concepts - a data frame with all the features
target - a data frame with corresponding output values
'''
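The function body itself does not appear in this section. Below is a sketch of the standard Candidate Elimination implementation matching that docstring, with an illustrative driver; the file name EnjoySport.csv and the positive label "Yes" are assumptions, and the DataFrame is converted to NumPy arrays for element-wise comparison:
def learn(concepts, target):
    # S starts as the first positive example; G starts maximally general
    specific_h = concepts[0].copy()
    general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    for i, h in enumerate(concepts):
        if target[i] == "Yes":               # positive example: generalise S
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        else:                                # negative example: specialise G
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
    # Drop hypotheses in G that remained fully general
    general_h = [h for h in general_h if h != ['?'] * len(specific_h)]
    return specific_h, general_h

# Illustrative driver (file name is an assumption)
data = pd.read_csv("EnjoySport.csv")
concepts = np.array(data.iloc[:, :-1])
target = np.array(data.iloc[:, -1])
s_final, g_final = learn(concepts, target)
print("Final Specific Hypothesis:", s_final)
print("Final General Hypotheses:", g_final)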