ML-LAB-Manual
Student Name :
Register Number :
Signature                                   Signature
Lab Coordinator                             Head of the Department
CONTENT LIST

SL.NO.  EXPERIMENT NAME                                                         PAGE NO.
1.      Introduction                                                            3-4
2.      Program 1: Develop a program to create histograms for all numerical
        features and analyze the distribution of each feature. Generate box
        plots for all numerical features and identify any outliers. Use
        California Housing dataset.                                             5
3.      Program 2: Develop a program to compute the correlation matrix to
        understand the relationships between pairs of features. Visualize the
        correlation matrix using a heatmap to know which variables have strong
        positive/negative correlations. Create a pair plot to visualize
        pairwise relationships between features. Use California Housing
        dataset.                                                                6
4.      Program 3: Implement Principal Component Analysis (PCA) for reducing
        the dimensionality of the Iris dataset from 4 features to 2.            7
5.      Program 4: Implement and demonstrate the Find-S algorithm for training
        data examples stored in a .CSV file.                                    8
6.      Program 5: Implement the k-Nearest Neighbour algorithm to classify
        randomly generated values of x in the range [0,1].                      9
7.      Program 6: Implement the non-parametric Locally Weighted Regression
        algorithm to fit data points.                                           10
8.      Program 7: Demonstrate the working of Linear Regression and
        Polynomial Regression.                                                  11-12
9.      Program 8: Demonstrate the working of the decision tree algorithm
        using the Breast Cancer dataset.                                        13
10.     Program 9: Implement the Naive Bayesian classifier using the Olivetti
        Face dataset.                                                           14-15
11.     Program 10: Implement k-means clustering using the Wisconsin Breast
        Cancer dataset.                                                         16-17
12.     Viva Questions                                                          18
INTRODUCTION
Machine Learning
Machine learning is used everywhere, from automating mundane tasks to offering
intelligent insights, and industries in every sector try to benefit from it. You may
already be using a device that utilizes it, for example a wearable fitness tracker
like Fitbit or an intelligent home assistant like Google Home. There are many more
examples of ML in use:
• Prediction: Machine learning can be used in prediction systems. Considering a loan
example, to compute the probability of a default, the system needs to classify the
available data into groups.
• Image recognition: Machine learning can be used for face detection in an image as
well. There is a separate category for each person in a database of several people.
• Speech recognition: This is the translation of spoken words into text. It is used in
voice searches and more. Voice user interfaces include voice dialing, call routing,
and appliance control. It can also be used for simple data entry and the preparation
of structured documents.
• Medical diagnosis: ML models can be trained to recognize cancerous tissue.
• Financial industry and trading: Companies use ML in fraud investigations and credit
checks.
Types of Machine Learning
Machine learning can be classified into three types of algorithms:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
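A minimal scikit-learn sketch (illustrative only, not one of the lab programs) contrasting the first two types:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression  # supervised: trains on labelled data
from sklearn.cluster import KMeans                   # unsupervised: groups data without labels

X, y = load_iris(return_X_y=True)
print(LogisticRegression(max_iter=1000).fit(X, y).score(X, y))  # uses the labels y
print(KMeans(n_clusters=3, n_init=10).fit_predict(X)[:10])      # never sees y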
Anaconda Navigator
Anaconda Navigator is a desktop graphical user interface (GUI) included in
Anaconda distribution that allows users to launch applications and manage conda
packages, environments and channels without using command-line commands.
Navigator can search for packages on Anaconda Cloud or in a local Anaconda
Repository, install them in an environment, run the packages and update them. It is
available for Windows, macOS and Linux.
The following applications are available by default in Navigator:
JupyterLab, Jupyter Notebook, QtConsole, Spyder, Glue, Orange, RStudio, Visual Studio Code.
Conda
Conda is an open source cross-platform, language-agnostic package manager and
environment management system that installs, runs, and updates packages and
their dependencies. It was created for Python programs, but it can package and
distribute software for any language (e.g., R), including multi-language projects. The
conda package and environment manager is included in all versions of Anaconda,
Miniconda, and Anaconda Repository.
Jupyter Notebook
Jupyter Notebook can colloquially refer to two different concepts, either the user-
facing application to edit code and text, or the underlying file format which is
interoperable across many implementations.
1. Develop a program to create histograms for all numerical features and analyze the
distribution of each feature. Generate box plots for all numerical features and identify any
outliers. Use California Housing dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

housing_data = fetch_california_housing()
df = pd.DataFrame(housing_data.data, columns=housing_data.feature_names)

# Histograms for all numerical features
df.hist(figsize=(12, 8), bins=30)
plt.suptitle("Histograms of Numerical Features", fontsize=16)
plt.show()

# Box plots for all numerical features
plt.figure(figsize=(12, 8))
df.boxplot(rot=45)
plt.title("Box Plots of Numerical Features", fontsize=16)
plt.show()

# Count outliers per feature using the 1.5 * IQR rule
def detect_outliers(df):
    outliers_dict = {}
    for column in df.columns:
        Q1 = df[column].quantile(0.25)
        Q3 = df[column].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
        outliers_dict[column] = outliers.shape[0]
    return outliers_dict

outliers = detect_outliers(df)
print("Outlier count per feature:", outliers)
OUTPUT:-
Outlier count per feature: {'MedInc': 681, 'HouseAge': 0, 'AveRooms': 511, 'AveBedrms': 1424, 'Population': 1196, 'AveOccup': 711, 'Latitude': 0, 'Longitude': 0}
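As a quick sanity check of the 1.5 * IQR rule used above, a toy example (values are illustrative):

import numpy as np

values = np.array([1, 2, 3, 4, 100])                 # 100 is an obvious outlier
q1, q3 = np.percentile(values, [25, 75])             # q1 = 2.0, q3 = 4.0
iqr = q3 - q1
print(values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)])  # -> [100]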
2. Develop a program to compute the correlation matrix to understand the relationships
between pairs of features. Visualize the correlation matrix using a heatmap to know which
variables have strong positive/negative correlations. Create a pair plot to visualize pairwise
relationships between features. Use California Housing dataset.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the dataset as a DataFrame (features plus the MedHouseVal target)
housing = fetch_california_housing(as_frame=True)
data = housing.frame

# Compute and visualize the correlation matrix
correlation_matrix = data.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of California Housing Features')
plt.show()

# Pair plot of pairwise relationships (a sample keeps rendering fast)
sns.pairplot(data.sample(500, random_state=42), diag_kind='kde')
plt.show()
OUTPUT:-
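As a numeric companion to the heatmap, the correlations with the target column (MedHouseVal in the as_frame DataFrame) can be printed directly; a one-line sketch continuing the listing above:

print(correlation_matrix['MedHouseVal'].sort_values(ascending=False))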
3. Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
data = iris.data
labels = iris.target
label_names = iris.target_names

# Convert to a DataFrame for better visualization
iris_df = pd.DataFrame(data, columns=iris.feature_names)

# Perform PCA to reduce dimensionality to 2
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)

# Create a DataFrame for the reduced data
reduced_df = pd.DataFrame(data_reduced, columns=['Principal Component 1', 'Principal Component 2'])
reduced_df['Label'] = labels

# Plot the reduced data
plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b']
for i, label in enumerate(np.unique(labels)):
    plt.scatter(
        reduced_df[reduced_df['Label'] == label]['Principal Component 1'],
        reduced_df[reduced_df['Label'] == label]['Principal Component 2'],
        label=label_names[label],
        color=colors[i])
plt.title('PCA on Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()
OUTPUT:-
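The viva questions below reference computing PCA by hand from the covariance matrix; a minimal NumPy sketch of that equivalent route:

import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data
X_centered = X - X.mean(axis=0)            # center each feature
cov = np.cov(X_centered, rowvar=False)     # 4x4 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]      # sort components by variance, largest first
X_reduced = X_centered @ eigenvectors[:, order[:2]]   # same subspace as PCA(n_components=2), up to sign
print(eigenvalues[order] / eigenvalues.sum())         # explained variance ratio per component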
4. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Find-S algorithm to output a description of the set of all hypotheses consistent with the
training examples.
import pandas as pd

def find_s_algorithm(file_path):
    data = pd.read_csv(file_path)
    print("Training data:")
    print(data)
    attributes = data.columns[:-1]
    class_label = data.columns[-1]
    hypothesis = [None] * len(attributes)  # start with the most specific hypothesis
    for _, row in data.iterrows():
        if row[class_label] == 'Yes':  # only positive examples update the hypothesis
            for i, attribute in enumerate(attributes):
                if hypothesis[i] is None:
                    hypothesis[i] = row[attribute]
                elif hypothesis[i] != row[attribute]:
                    hypothesis[i] = '?'  # generalize mismatching attributes
    return hypothesis

file_path = 'sample.csv'
hypothesis = find_s_algorithm(file_path)
print("\nThe final hypothesis is:", hypothesis)
OUTPUT:-
Training data:
id first last gender Marks selected
1 John Doe M 85 Yes
2 Jane Smith F 90 No
3 Jim Brown M 75 Yes
4 Jill White F 88 No
The final hypothesis is: ['?', '?', '?', 'M', '?']
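The program expects a sample.csv file in the working directory; a minimal sketch that writes one matching the training data shown above:

import pandas as pd

rows = [[1, "John", "Doe", "M", 85, "Yes"],
        [2, "Jane", "Smith", "F", 90, "No"],
        [3, "Jim", "Brown", "M", 75, "Yes"],
        [4, "Jill", "White", "F", 88, "No"]]
cols = ["id", "first", "last", "gender", "Marks", "selected"]
pd.DataFrame(rows, columns=cols).to_csv("sample.csv", index=False)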
5. Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly
generated values of x in the range [0,1]. Perform the following based on the dataset
generated.
a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2
b. Classify the remaining points, x51,……,x100 using KNN. Perform this for k=1,2,3,4,5,20,30
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

x_values = np.random.rand(100)
y_labels = np.array([1 if x <= 0.5 else 2 for x in x_values[:50]])
x_train = x_values[:50].reshape(-1, 1)
y_train = y_labels
x_test = x_values[50:].reshape(-1, 1)

def classify_and_plot(k_values):
    plt.figure(figsize=(10, 6))
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(x_train, y_train)
        y_pred = knn.predict(x_test)
        plt.scatter(x_test, y_pred, label=f'k={k}')
    plt.xlabel('x values')
    plt.ylabel('Predicted Class')
    plt.title('KNN Classification of Random Points')
    plt.legend()
    plt.show()

k_values = [1, 2, 3, 4, 5, 20, 30]
classify_and_plot(k_values)
OUTPUT:-
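Because the true class of every point follows the same x ≤ 0.5 rule, the test accuracy per k can also be measured; a sketch that reuses the variables from the listing above:

from sklearn.metrics import accuracy_score

y_true = np.array([1 if x <= 0.5 else 2 for x in x_values[50:]])  # labels implied by the rule
for k in [1, 2, 3, 4, 5, 20, 30]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    print(f"k={k}: accuracy = {accuracy_score(y_true, knn.predict(x_test)):.2f}")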
6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, x_query, tau):
    # weight each training point by its distance to the query point
    return np.exp(-np.sum((x - x_query) ** 2, axis=1) / (2 * tau ** 2))

def locally_weighted_regression(x_query, X, y, tau):
    W = np.diag(gaussian_kernel(X, x_query, tau))
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y  # pinv handles near-singular matrices
    return x_query @ theta

np.random.seed(42)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)
X_bias = np.c_[np.ones(X.shape), X]  # add a bias term (1) to each input

plt.figure(figsize=(12, 8))
for i, tau in enumerate([0.1, 0.5, 1.0, 5.0], start=1):
    y_pred = np.array([locally_weighted_regression(np.r_[1, x], X_bias, y, tau) for x in X])
    plt.subplot(2, 2, i)  # one subplot per bandwidth tau
    plt.scatter(X, y, alpha=0.5, label='Data')
    plt.plot(X, y_pred, color='red', label=f'tau={tau}')
    plt.legend()
plt.suptitle('Locally Weighted Regression')
plt.show()
OUTPUT:-
7. Develop a program to demonstrate the working of Linear Regression and Polynomial
Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for
vehicle fuel efficiency prediction) for Polynomial Regression.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score

def linear_regression_california():
    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["AveRooms"]]
    y = housing.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.plot(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Average number of rooms (AveRooms)")
    plt.ylabel("Median value of homes ($100,000)")
    plt.title("Linear Regression - California Housing Dataset")
    plt.legend()
    plt.show()
    print("Linear Regression - California Housing Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))

def polynomial_regression_auto_mpg():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
    column_names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
                    "acceleration", "model_year", "origin"]
    data = pd.read_csv(url, sep=r'\s+', names=column_names, na_values="?")
    data = data.dropna()
    X = data["displacement"].values.reshape(-1, 1)
    y = data["mpg"].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    poly_model = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(), LinearRegression())
    poly_model.fit(X_train, y_train)
    y_pred = poly_model.predict(X_test)
    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.scatter(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Displacement")
    plt.ylabel("Miles per gallon (mpg)")
    plt.title("Polynomial Regression - Auto MPG Dataset")
    plt.legend()
    plt.show()
    print("Polynomial Regression - Auto MPG Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))

if __name__ == "__main__":
    print("Demonstrating Linear Regression and Polynomial Regression\n")
    linear_regression_california()
    polynomial_regression_auto_mpg()
OUTPUT:-
Polynomial Regression - Auto MPG Dataset
Mean Squared Error: 0.743149055720586
R^2 Score: 0.7505650609469626
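A hypothetical extension, placed inside polynomial_regression_auto_mpg() so it can reuse that function's train/test split, compares polynomial degrees by test R^2:

for degree in [1, 2, 3]:
    m = make_pipeline(PolynomialFeatures(degree=degree), StandardScaler(), LinearRegression())
    m.fit(X_train, y_train)
    print("degree", degree, "R^2:", r2_score(y_test, m.predict(X_test)))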
8. Develop a program to demonstrate the working of the decision tree algorithm. Use Breast
Cancer Data set for building the decision tree and apply this knowledge to classify a new
sample.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

data = load_breast_cancer()
X = data.data
y = data.target

# Train the decision tree and evaluate it on a held-out test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Classify a new sample (here, the first test sample)
new_sample = X_test[0].reshape(1, -1)
prediction = clf.predict(new_sample)
prediction_class = "Benign" if prediction == 1 else "Malignant"
print("Predicted class for the new sample:", prediction_class)

plt.figure(figsize=(12, 8))
tree.plot_tree(
    clf,
    filled=True,
    feature_names=data.feature_names.tolist(),  # ensure it's a list
    class_names=data.target_names.tolist()      # convert to list
)
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()
OUTPUT:-
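As a small follow-up (a sketch assuming the fitted clf from the listing above), the features that drive the tree's splits can be ranked:

import pandas as pd

importances = pd.Series(clf.feature_importances_, index=data.feature_names)
print(importances.sort_values(ascending=False).head(5))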
9. Develop a program to implement the Naive Bayesian classifier considering Olivetti Face
Data set for training. Compute the accuracy of the classifier, considering a few test data sets.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt

# Load the Olivetti faces (400 images of 40 people, flattened to 4096 features)
faces = fetch_olivetti_faces()
X = faces.data
y = faces.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=1))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Show a few test faces with their true and predicted labels
fig, axes = plt.subplots(1, 5, figsize=(12, 3))
for ax, img, t, p in zip(axes, X_test, y_test, y_pred):
    ax.imshow(img.reshape(64, 64), cmap='gray')
    ax.set_title(f"True: {t}\nPred: {p}")
    ax.axis('off')
plt.show()
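The listing imports cross_val_score; a minimal sketch of how it can be used to evaluate the model (assuming 5-fold cross-validation on the full data):

scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(f"Cross-validation accuracy: {scores.mean() * 100:.2f}% (+/- {scores.std() * 100:.2f})")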
OUTPUT:-
Accuracy: 80.83%
Classification Report:
10. Develop a program to implement k-means clustering using Wisconsin Breast Cancer data
set and visualize the clustering result.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report

data = load_breast_cancer()
X = data.data
y = data.target

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Cluster into two groups (malignant / benign)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
y_kmeans = kmeans.fit_predict(X_scaled)

print("Confusion Matrix:")
print(confusion_matrix(y, y_kmeans))
print("\nClassification Report:")
print(classification_report(y, y_kmeans))

# Project to 2D for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
                edgecolor='black', alpha=0.7)
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label', palette='coolwarm', s=100,
                edgecolor='black', alpha=0.7)
plt.title('True Labels of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="True Label")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
                edgecolor='black', alpha=0.7)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()
OUTPUT:-
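One caveat: K-Means assigns arbitrary cluster ids, so the confusion matrix above may appear inverted. A sketch (reusing y and y_kmeans from the listing) that flips the ids when they disagree with the majority of labels:

# if the cluster ids mostly disagree with the true labels, swap them before reporting
if np.mean(y_kmeans == y) < 0.5:
    y_kmeans = 1 - y_kmeans
print(confusion_matrix(y, y_kmeans))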
VIVA QUESTIONS
1. What does the fetch_california_housing() function do?
2. Why are histograms and boxplots used in the code?
3. What is the purpose of the detect_outliers() function?
4. Why is plt.figure() used before the boxplot?
5. Why is it important to detect outliers in a dataset?
6. What is the purpose of the fetch_california_housing(as_frame=True) function?
7. What does the data.corr() function return?
8. What is the significance of using a heatmap in this context?
9. Why is pairplot() used in data analysis?
10. What does diag_kind='kde' do in the pairplot function?
11. What is the purpose of using StandardScaler() in PCA?
12. What does the covariance matrix represent in PCA?
13. Why are eigenvalues and eigenvectors computed from the covariance matrix?
14. What does np.argsort(eigenvalues)[::-1] achieve in the code?
15. What does the final scatter plot represent?
16. What is the main goal of the Find-S algorithm?
17. Why do we initialize the hypothesis with None values?
18. What is the role of the condition if row[class_label] == 'Yes'?
19. Why do we replace values with '?' in the hypothesis?
20. What does the final hypothesis represent?
21. What is the purpose of using the KNeighborsClassifier in this code?
22. How are class labels assigned to the training data?
23. Why is the data reshaped using reshape(-1, 1) before fitting the model?
24. What is the effect of changing the value of k in KNN?
25. What does the final scatter plot represent?
26. What is the role of the gaussian_kernel function in the code?
27. Why is a bias term (1) added to the input features in the locally weighted regression function?
28. What is the purpose of using np.linalg.pinv in the locally weighted regression function?
29. What do the different subplots in the final plot represent?
30. What does the tau parameter control in the locally weighted regression model?
31. What is the purpose of the DecisionTreeClassifier in this code?
32. Why is the dataset split into training and testing sets using train_test_split?
33. How is the accuracy of the decision tree model evaluated?
34. What does the line prediction_class = "Benign" if prediction == 1 else "Malignant" do?
35. What is the purpose of plotting the decision tree using tree.plot_tree?
36. What is the purpose of the LinearRegression model in the linear_regression_california() function?
37. Why is train_test_split used in both regression functions?
38. What does the mean_squared_error metric tell you about the model's performance?
39. How does polynomial regression differ from linear regression in the polynomial_regression_auto_mpg() function?
40. What is the significance of using make_pipeline in polynomial regression?
41. What is the role of the GaussianNB model in this code?
42. Why is the dataset split into training and testing sets using train_test_split?
43. What does the accuracy_score metric measure in this case?
44. How does cross_val_score help evaluate the model?
45. What is the purpose of displaying the images of test samples with true and predicted labels?
46. What is the purpose of using StandardScaler in this code?
47. What does PCA (Principal Component Analysis) do in this code?
48. Why is the confusion_matrix used here?
49. What is the significance of the scatter plots in the visualization?
50. Why are the cluster centroids marked in the final scatter plot?