ML LAB34

The document describes a series of experiments involving data processing and machine learning techniques applied to housing data and a linear regression model. It includes steps for data imputation, anomaly detection, standardization, normalization, and encoding, followed by the implementation of a gradient descent algorithm for linear regression. The final results include predictions from the model and statistical calculations related to height and weight data.


Experiment 1

import pandas as pd
data = pd.read_csv('large_housing_data_mumbai.csv')
print("Original Data:")
print(data.head())

Original Data:
House_ID Bedrooms Size (sq ft) Price (INR) Location Year_Built
0 1 4.0 855.0 31356226.0 Juhu 2002.0
1 2 5.0 1847.0 27775439.0 Andheri 2004.0
2 3 NaN 2363.0 37325149.0 Bandra 2000.0
3 4 5.0 626.0 6147116.0 South Mumbai 2002.0
4 5 5.0 NaN 49899606.0 Worli NaN

# Imputation
# Handle missing values using the median for numerical columns and the most
# frequent value for categorical columns.
from sklearn.impute import SimpleImputer
num_features = ['Bedrooms', 'Size (sq ft)', 'Price (INR)', 'Year_Built']
cat_features = ['Location']
num_imputer = SimpleImputer(strategy='median')
data[num_features] = num_imputer.fit_transform(data[num_features])
cat_imputer = SimpleImputer(strategy='most_frequent')
data[cat_features] = cat_imputer.fit_transform(data[cat_features])
print("\nData After Imputation:")
print(data.head())

Data After Imputation:


House_ID Bedrooms Size (sq ft) Price (INR) Location Year_Built
0 1 4.0 855.0 31356226.0 Juhu 2002.0
1 2 5.0 1847.0 27775439.0 Andheri 2004.0
2 3 3.0 2363.0 37325149.0 Bandra 2000.0
3 4 5.0 626.0 6147116.0 South Mumbai 2002.0
4 5 5.0 1702.5 49899606.0 Worli 2012.0

# Anomaly Detection
# Detect anomalies using Z-scores on the numerical columns; a row is flagged
# when any feature's |z| exceeds 3.
from scipy import stats
z_scores = stats.zscore(data[num_features])
data['Anomaly'] = (abs(z_scores) > 3).any(axis=1) # Mark anomalies
print("\nData After Anomaly Detection:")
print(data.head())
# Rule-Based Anomaly Detection
# Simple rules:
#   - A house with less than 1000 sq ft should have 1 to 2 bedrooms.
#   - A house with 1000-2000 sq ft should have 2 to 4 bedrooms.
#   - A house with more than 2000 sq ft should have 3 or more bedrooms.
def is_bedroom_size_reasonable(row):
    if row['Size (sq ft)'] < 1000:
        return 1 <= row['Bedrooms'] <= 2
    elif row['Size (sq ft)'] <= 2000:
        return 2 <= row['Bedrooms'] <= 4
    else:
        return row['Bedrooms'] >= 3

data['Bed_Size_Anomaly'] = ~data.apply(is_bedroom_size_reasonable, axis=1)
print("\nData After Rule-Based Anomaly Detection:")
print(data.head())

Data After Anomaly Detection:


House_ID Bedrooms Size (sq ft) Price (INR) Location Year_Built \
0 1 4.0 855.0 31356226.0 Juhu 2002.0
1 2 5.0 1847.0 27775439.0 Andheri 2004.0
2 3 3.0 2363.0 37325149.0 Bandra 2000.0
3 4 5.0 626.0 6147116.0 South Mumbai 2002.0
4 5 5.0 1702.5 49899606.0 Worli 2012.0

Anomaly
0 False
1 False
2 False
3 False
4 False

Data After Rule-Based Anomaly Detection:


House_ID Bedrooms Size (sq ft) Price (INR) Location Year_Built \
0 1 4.0 855.0 31356226.0 Juhu 2002.0
1 2 5.0 1847.0 27775439.0 Andheri 2004.0
2 3 3.0 2363.0 37325149.0 Bandra 2000.0
3 4 5.0 626.0 6147116.0 South Mumbai 2002.0
4 5 5.0 1702.5 49899606.0 Worli 2012.0

Anomaly Bed_Size_Anomaly
0 False True
1 False True
2 False False
3 False True
4 False True

# Standardization
# Standardize numerical features so they have a mean of 0 and a standard
# deviation of 1.
from sklearn.preprocessing import StandardScaler
# Standardize numericals
scaler = StandardScaler()
data[num_features] = scaler.fit_transform(data[num_features])
print("\nData After Standardization:")
print(data.head())

Data After Standardization:


House_ID Bedrooms Size (sq ft) Price (INR) Location Year_Built \
0 1 0.710719 -1.231366 -0.017529 Juhu -1.432248
1 2 1.432261 0.194650 -0.110953 Andheri -1.124866
2 3 -0.010823 0.936408 0.138203 Bandra -1.739631
3 4 1.432261 -1.560557 -0.675243 South Mumbai -1.432248
4 5 1.432261 -0.013071 0.466275 Worli 0.104664

Anomaly Bed_Size_Anomaly
0 False True
1 False True
2 False False
3 False True
4 False True

#Normalization
#Normalize numerical features to fit within the range [0, 1]
from sklearn.preprocessing import MinMaxScaler

normalizer = MinMaxScaler()
data[num_features] = normalizer.fit_transform(data[num_features])
print("\nData After Normalization:")
print(data.head())

Data After Normalization:


House_ID Bedrooms Size (sq ft) Price (INR) Location Year_Built \
0 1 0.75 0.142055 0.058226 Juhu 0.090909
1 2 1.00 0.540128 0.050309 Andheri 0.181818
2 3 0.50 0.747191 0.071422 Bandra 0.000000
3 4 1.00 0.050161 0.002493 South Mumbai 0.090909
4 5 1.00 0.482143 0.099222 Worli 0.545455

Anomaly Bed_Size_Anomaly
0 False True
1 False True
2 False False
3 False True
4 False True
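For reference, the two rescalings applied above are (with $\mu$ and $\sigma$ the column mean and standard deviation):

$x_{\text{std}} = \frac{x - \mu}{\sigma}, \qquad x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$

Note that MinMaxScaler here runs on the already-standardized values; the output still lands in [0, 1] because min-max scaling maps whatever range it receives onto that interval.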

#Encoding
#One-Hot Encode the categorical feature Location.
from sklearn.preprocessing import OneHotEncoder
# One-Hot Encoding for 'Location'
encoder = OneHotEncoder(sparse_output=False)  # 'sparse' was renamed to 'sparse_output' in scikit-learn 1.2
encoded_location = encoder.fit_transform(data[['Location']])
encoded_df = pd.DataFrame(encoded_location,
columns=encoder.get_feature_names_out(['Location']))

data_encoded = pd.concat([data, encoded_df], axis=1).drop('Location', axis=1)

print("\nData After Encoding:")


print(data_encoded.head())

Data After Encoding:


House_ID Bedrooms Size (sq ft) Price (INR) Year_Built Anomaly \
0 1 0.75 0.142055 0.058226 0.090909 False
1 2 1.00 0.540128 0.050309 0.181818 False
2 3 0.50 0.747191 0.071422 0.000000 False
3 4 1.00 0.050161 0.002493 0.090909 False
4 5 1.00 0.482143 0.099222 0.545455 False

Bed_Size_Anomaly Location_Andheri Location_Bandra Location_Borivali \


0 True 0.0 0.0 0.0
1 True 1.0 0.0 0.0
2 False 0.0 1.0 0.0
3 True 0.0 0.0 0.0
4 True 0.0 0.0 0.0

Location_Juhu Location_Malad Location_Pali Hill Location_South Mumbai


\
0 1.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 1.0
4 0.0 0.0 0.0 0.0

Location_Worli
0 0.0
1 0.0
2 0.0
3 0.0
4 1.0

Experiment 2
import numpy as np

class GradientDescentMSE:
    def __init__(self, lr=0.01, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters
        self.x1 = None
        self.x2 = None

    def fit(self, X, y):
        # Initialize parameters randomly
        self.x1 = np.random.randn()
        self.x2 = np.random.randn()

        for _ in range(self.n_iters):
            # Compute predictions
            y_pred = self.x1 * X[:, 0] + self.x2 * X[:, 1]

            # Compute gradients of the MSE with respect to x1 and x2
            grad_x1 = (2 / len(y)) * np.sum((y_pred - y) * X[:, 0])
            grad_x2 = (2 / len(y)) * np.sum((y_pred - y) * X[:, 1])

            # Update parameters
            self.x1 = self.x1 - self.lr * grad_x1
            self.x2 = self.x2 - self.lr * grad_x2

        return self.x1, self.x2

    def objective_function(self, X):
        return self.x1 * X[:, 0] + self.x2 * X[:, 1]

# Example dataset with two features
X = np.array([[0.5, 1.0], [1.0, 2.0], [1.5, 2.5], [2.0, 3.5]])  # Features
y = np.array([1.5, 2.5, 3.0, 4.0])  # True values

# Initialize and run gradient descent
gd_mse = GradientDescentMSE(lr=0.01, n_iters=1000)
final_x1, final_x2 = gd_mse.fit(X, y)
final_predictions = gd_mse.objective_function(X)

print(f"Final x1: {final_x1}, Final x2: {final_x2}")
print(f"Final Predictions: {final_predictions}")

Final x1: -1.3189740745133114, Final x2: 1.9351908822144432


Final Predictions: [1.27570384 2.55140769 2.85951609 4.13521994]
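As a cross-check (a minimal sketch, not part of the original lab), this two-parameter model has a closed-form least-squares solution that the gradient-descent estimates should approach as n_iters grows; with these nearly collinear features the loss surface is ill-conditioned, so after only 1000 iterations the two answers can differ noticeably:

w_closed_form, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||X w - y||^2 exactly
print(f"Closed-form (x1, x2): {w_closed_form}")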
Experiment 3
import pandas as pd
import numpy as np
file_path = '/ml-linear-reg.csv'
data = pd.read_csv(file_path)
#display data
print(data)

Height Weight
0 151 63
1 174 81
2 138 56
3 186 91
4 128 47
5 136 57
6 179 76
7 163 72
8 152 62
9 131 48

#mean of X (Height) and Y (Weight)


x_mean = np.mean(data['Height'])
y_mean = np.mean(data['Weight'])

#Display
print(f"Mean of Height (x_mean): {x_mean}")
print(f"Mean of Weight (y_mean): {y_mean}")

Mean of Height (x_mean): 153.8


Mean of Weight (y_mean): 65.3

# Calculate xi - x_bar and yi - y_bar


data['xi-xbar'] = data['Height'] - x_mean
data['yi-ybar'] = data['Weight'] - y_mean

#Display xi - x_bar and yi - y_bar


print(data[['Height', 'Weight', 'xi-xbar', 'yi-ybar']])

Height Weight xi-xbar yi-ybar


0 151 63 -2.8 -2.3
1 174 81 20.2 15.7
2 138 56 -15.8 -9.3
3 186 91 32.2 25.7
4 128 47 -25.8 -18.3
5 136 57 -17.8 -8.3
6 179 76 25.2 10.7
7 163 72 9.2 6.7
8 152 62 -1.8 -3.3
9 131 48 -22.8 -17.3
# Calculate product of (xi - x_bar) and (yi - y_bar)
data['(xi-xbar)*(yi-ybar)'] = data['xi-xbar'] * data['yi-ybar']

# Display product of (xi - x_bar) and (yi - y_bar)


print(data[['Height', 'Weight', 'xi-xbar', 'yi-ybar', '(xi-xbar)*(yi-ybar)']])

Height Weight xi-xbar yi-ybar (xi-xbar)*(yi-ybar)


0 151 63 -2.8 -2.3 6.44
1 174 81 20.2 15.7 317.14
2 138 56 -15.8 -9.3 146.94
3 186 91 32.2 25.7 827.54
4 128 47 -25.8 -18.3 472.14
5 136 57 -17.8 -8.3 147.74
6 179 76 25.2 10.7 269.64
7 163 72 9.2 6.7 61.64
8 152 62 -1.8 -3.3 5.94
9 131 48 -22.8 -17.3 394.44

# Calculate square of (xi - x_bar)


data['sq(xi-xbar)'] = data['xi-xbar'] ** 2

# Display square of (xi - x_bar)


print(data[['Height', 'Weight', 'xi-xbar', 'yi-ybar', '(xi-xbar)*(yi-ybar)',
'sq(xi-xbar)']])

Height Weight xi-xbar yi-ybar (xi-xbar)*(yi-ybar) sq(xi-xbar)


0 151 63 -2.8 -2.3 6.44 7.84
1 174 81 20.2 15.7 317.14 408.04
2 138 56 -15.8 -9.3 146.94 249.64
3 186 91 32.2 25.7 827.54 1036.84
4 128 47 -25.8 -18.3 472.14 665.64
5 136 57 -17.8 -8.3 147.74 316.84
6 179 76 25.2 10.7 269.64 635.04
7 163 72 9.2 6.7 61.64 84.64
8 152 62 -1.8 -3.3 5.94 3.24
9 131 48 -22.8 -17.3 394.44 519.84

# Calculate sum of square of (xi - x_bar)


sum_sq_xi_xbar = np.sum(data['sq(xi-xbar)'])

# Display sum of square of (xi - x_bar)


print(f"Sum of square of (xi - x_bar): {sum_sq_xi_xbar}")

Sum of square of (xi - x_bar): 3927.6000000000004

# Calculate sum of (xi - x_bar) * (yi - y_bar)


sum_xiyi_xbar_ybar = np.sum(data['(xi-xbar)*(yi-ybar)'])

# Display sum of (xi - x_bar) * (yi - y_bar)


print(f"Sum of (xi - x_bar) * (yi - y_bar): {sum_xiyi_xbar_ybar}")
Sum of (xi - x_bar) * (yi - y_bar): 2649.6
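These two sums are exactly the ingredients of the ordinary least-squares estimates computed next:

$b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$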

# Calculate b1 (slope)
b1 = sum_xiyi_xbar_ybar / sum_sq_xi_xbar

# Display b1 (slope)
print(f"Slope (b1): {b1}")

Slope (b1): 0.6746104491292392

# Calculate b0 (intercept)
b0 = y_mean - b1 * x_mean

# Display b0 (intercept)
print(f"Intercept (b0): {b0}")

Intercept (b0): -38.45508707607699

# Define a function to predict weight from height


def predict(height):
return b1 * height + b0

# Example prediction
height_new = 160
weight_prediction = predict(height_new)
print(f'Predicted weight for height {height_new} cm is {weight_prediction:.2f} kg')

Predicted weight for height 160 cm is 69.48 kg
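As a sanity check (a sketch assuming scikit-learn is available; not part of the original lab), the hand-computed slope and intercept can be compared against sklearn's built-in estimator, which solves the same least-squares problem:

from sklearn.linear_model import LinearRegression

# coef_ and intercept_ should match b1 ≈ 0.6746 and b0 ≈ -38.455 computed above
reg = LinearRegression().fit(data[['Height']], data['Weight'])
print(reg.coef_[0], reg.intercept_)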


Experiment 4
import numpy as np
The sigmoid function is defined as $\phi(z) = \frac{1}{1 + e^{-z}}$

$\therefore \phi(\hat{y}) = \frac{1}{1 + e^{-\hat{y}}}$

$\therefore \hat{y} = \frac{1}{1 + e^{-(wx + b)}}$
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class LogisticRegression():

    def __init__(self, lr=0.01, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iters):
            linear_pred = np.dot(X, self.weights) + self.bias
            # The sigmoid is the logistic addition; the rest mirrors linear regression.
            y_pred = sigmoid(linear_pred)

            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)

            self.weights = self.weights - self.lr * dw
            self.bias = self.bias - self.lr * db

    def predict(self, X):
        linear_pred = np.dot(X, self.weights) + self.bias
        y_pred = sigmoid(linear_pred)
        class_pred = [0 if y <= 0.5 else 1 for y in y_pred]
        return class_pred

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt

bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=1234)

model = LogisticRegression(lr=0.01)
model.fit(X_train,y_train)
y_pred = model.predict(X_test)

def accuracy(y_pred, y_test):
    return np.sum(y_pred == y_test) / len(y_test)

acc = accuracy(y_pred, y_test)


print(acc)

0.9210526315789473

C:\Users\rohra\AppData\Local\Temp\ipykernel_19392\4033946986.py:2:
RuntimeWarning: overflow encountered in exp
return 1/(1+np.exp(-x))
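The overflow warning appears because np.exp overflows for large-magnitude arguments on the unscaled breast-cancer features; it does not stop the run, since the saturated values still round to 0 or 1. A numerically stable variant (a sketch, not used in the original run) evaluates exp only on non-positive arguments:

def stable_sigmoid(x):
    # Split by sign so np.exp never sees a large positive argument
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1 + exp_x)
    return out

Standardizing the features before fitting would also keep the linear predictions in a safe range.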

import pandas as pd

results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

print(results)

Actual Predicted
0 1 1
1 1 1
2 1 1
3 1 1
4 1 1
.. ... ...
109 1 1
110 0 0
111 1 0
112 0 0
113 0 0

[114 rows x 2 columns]


Experiment 5
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("penguins_size.csv")
df.head()

# EDA
# Missing Data
df.info()

df.isna().sum()

# What percentage are we dropping?


100*(10/344)
df = df.dropna()
df.info()

df.head()

df['sex'].unique()

df['island'].unique()

df = df[df['sex']!='.']
# Feature Engineering
pd.get_dummies(df)
pd.get_dummies(df.drop('species',axis=1),drop_first=True)

# Train and Test split


X = pd.get_dummies(df.drop('species',axis=1),drop_first=True)
y = df['species']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train,y_train)
base_pred = model.predict(X_test)
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Generate confusion matrix


y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
# Display the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()

plt.show()

print(classification_report(y_test,base_pred))

model.feature_importances_
pd.DataFrame(index=X.columns, data=model.feature_importances_, columns=['Feature Importance'])

# Visualize the tree


from sklearn.tree import plot_tree
plt.figure(figsize=(12,8))
plot_tree(model);

from sklearn.tree import plot_tree


import matplotlib.pyplot as plt
# Convert X.columns to a list
plt.figure(figsize=(12, 8), dpi=150)
plot_tree(model, filled=True, feature_names=X.columns.tolist())

plt.show()

def report_model(model):
    model_preds = model.predict(X_test)
    print(classification_report(y_test, model_preds))
    print('\n')
    plt.figure(figsize=(12, 8), dpi=150)
    plot_tree(model, filled=True, feature_names=X.columns.tolist());
pruned_tree = DecisionTreeClassifier(max_depth=2)
pruned_tree.fit(X_train,y_train)
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

def report_model(model):
    # Print the classification report here if needed (e.g., precision, recall, etc.)
    print('\n')

    # Convert X.columns to a list before passing to plot_tree
    plt.figure(figsize=(12, 8), dpi=150)
    plot_tree(model, filled=True, feature_names=X.columns.tolist())
    plt.show()

pruned_tree = DecisionTreeClassifier(max_leaf_nodes=3)
pruned_tree.fit(X_train,y_train)
report_model(pruned_tree)

entropy_tree = DecisionTreeClassifier(criterion='entropy')
entropy_tree.fit(X_train,y_train)
report_model(entropy_tree)
Experiment 6
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Load dataset from CSV


data = pd.read_csv('iris.csv')

print(data)

# Use only the first two features for training and visualization
X = data.iloc[:, :2].values # First two features
y = data.iloc[:, -1].values # Target variable (last column)

# Encode target labels (species) to numeric values


label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Split the data into training and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3,
random_state=42)

# Create and train the SVM model with RBF kernel


svm_rbf = SVC(kernel='rbf', gamma='auto')
svm_rbf.fit(X_train, y_train)

# Make predictions
y_pred = svm_rbf.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred) * 100
print(f"Accuracy: {accuracy:.4f}\n")

# Classification report (formatted as a DataFrame)


report_dict = classification_report(y_test, y_pred,
target_names=label_encoder.classes_, output_dict=True)
report_df = pd.DataFrame(report_dict).transpose()
print("Classification Report:\n", report_df)

# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(conf_matrix)
print()

# Visualize the Confusion Matrix as a heatmap


plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.title('Confusion Matrix Heatmap')
plt.show()

print()

# Visualize the decision boundary (for 2D data)


def plot_decision_boundary(X, y, model):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Predict over the mesh and reshape back onto the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM Decision Boundary with RBF Kernel')
    plt.show()

# Plot decision boundary using the test set


plot_decision_boundary(X_test, y_test, svm_rbf)
Experiment 7
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset from CSV


data = pd.read_csv('iris.csv')

# Use only the first two features for training and visualization
X = data.iloc[:, :2].values # First two features
y = data.iloc[:, -1].values # Target variable (last column)

# Encode target labels (species) to numeric values


label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Split the data into training and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3,
random_state=42)

# 1. SVM Model
svm_rbf = SVC(kernel='rbf', gamma='auto', probability=True)
svm_rbf.fit(X_train, y_train)
y_pred_svm = svm_rbf.predict(X_test)

# 2. Random Forest Model (Bagging)


rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)

# 3. AdaBoost Model (Boosting)


ada = AdaBoostClassifier(estimator=SVC(kernel='linear', probability=True),
                         n_estimators=50, random_state=42)  # 'base_estimator' was renamed to 'estimator' in scikit-learn 1.2
ada.fit(X_train, y_train)
y_pred_ada = ada.predict(X_test)

# Combine predictions using majority voting


y_pred_ensemble = np.array([y_pred_svm, y_pred_rf, y_pred_ada])
y_pred_final = np.array([np.bincount(x).argmax() for x in y_pred_ensemble.T])

# Accuracy for each model


for model_name, y_pred in zip(['SVM', 'Random Forest', 'AdaBoost', 'Ensemble'],
[y_pred_svm, y_pred_rf, y_pred_ada, y_pred_final]):
accuracy = accuracy_score(y_test, y_pred)
print(f"{model_name} Accuracy: {accuracy:.4f}\n")

# Classification report for the ensemble model


report_dict = classification_report(y_test, y_pred_final, target_names=label_encoder.classes_,
output_dict=True)
report_df = pd.DataFrame(report_dict).transpose()
print("Ensemble Classification Report:\n", report_df)

# Confusion matrix for the ensemble model


conf_matrix = confusion_matrix(y_test, y_pred_final)
print("\nEnsemble Confusion Matrix:")
print(conf_matrix)

# Visualize the Confusion Matrix as a heatmap for the ensemble model


plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.title('Ensemble Confusion Matrix Heatmap')
plt.show()

# Visualize the decision boundary for the ensemble model


def plot_decision_boundary(X, y, model):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Predict over the mesh and reshape back onto the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Ensemble Decision Boundary with Bagging and Boosting')
    plt.show()

# Since the majority-vote ensemble above is not a single fitted model object,
# we plot the decision boundary using the SVM model instead.
plot_decision_boundary(X_test, y_test, svm_rbf)
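If a single fitted ensemble object is wanted, scikit-learn's VotingClassifier wraps the same base models behind one predict method, so it can be passed to plot_decision_boundary directly (a minimal sketch of that alternative, not part of the original lab):

from sklearn.ensemble import VotingClassifier

# Hard voting reproduces the majority-vote scheme used above
voting = VotingClassifier(estimators=[
    ('svm', SVC(kernel='rbf', gamma='auto')),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('ada', ada),
], voting='hard')
voting.fit(X_train, y_train)
plot_decision_boundary(X_test, y_test, voting)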
Experiment 8
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Step 1: Load the Iris dataset from a CSV file


data = pd.read_csv('iris.csv')

# Assuming the last column is the target label and the rest are features
X = data.iloc[:, :-1].values # Features (all rows, all columns except the last)
y = data.iloc[:, -1].values # Target (all rows, last column)

# Map string labels to integers


label_mapping = {'setosa': 0, 'versicolor': 1, 'virginica': 2}
y_numeric = np.array([label_mapping[label] for label in y])

# Step 2: Standardize the data


scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Step 3: Calculate the covariance matrix


cov_matrix = np.cov(X_std.T)

# Step 4: Calculate the eigenvalues and eigenvectors


eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Step 5: Sort the eigenvalues and eigenvectors


sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues_sorted = eigenvalues[sorted_indices]
eigenvectors_sorted = eigenvectors[:, sorted_indices]

# Step 6: Select the number of principal components


n_components = 2
eigenvectors_subset = eigenvectors_sorted[:, :n_components]

# Step 7: Transform the data


X_pca = X_std.dot(eigenvectors_subset)

# Step 8: Visualize the results


plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_numeric, cmap='viridis', edgecolor='k', s=100)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Iris Dataset')
plt.grid()
plt.show()
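As a short extension (not in the original lab), the proportion of variance captured by the retained components can be read off the eigenvalues; np.linalg.eigh is also the safer routine for a symmetric covariance matrix, since it guarantees real eigenvalues:

# eigh returns real eigenvalues in ascending order for symmetric matrices
evals, evecs = np.linalg.eigh(cov_matrix)
explained_ratio = evals[::-1][:n_components] / evals.sum()
print("Explained variance ratio:", explained_ratio)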
