
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnana Sangama, Belagavi-590018

Government Engineering College


Chamarajanagara - 571313
Department Of Computer Science And Engineering

Practice Lab Report

Machine Learning LAB (BCSL606)


Academic Year: 2024-25

Student Name :
Register Number :

Signature Signature
Lab Coordinator Head of the Department

CONTENT LIST

1. Introduction
2. Program 1: Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.
3. Program 2: Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to know which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.
4. Program 3: Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.
5. Program 4: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S algorithm to output a description of the set of all hypotheses consistent with the training examples.
6. Program 5: Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Label the first 50 points {x1,…,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2. Classify the remaining points x51,…,x100 using KNN for k = 1, 2, 3, 4, 5, 20, 30.
7. Program 6: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
8. Program 7: Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use the Boston Housing dataset for Linear Regression and the Auto MPG dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.
9. Program 8: Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer data set for building the decision tree and apply this knowledge to classify a new sample.
10. Program 9: Develop a program to implement the Naive Bayesian classifier considering the Olivetti Face data set for training. Compute the accuracy of the classifier, considering a few test data sets.
11. Program 10: Develop a program to implement k-means clustering using the Wisconsin Breast Cancer data set and visualize the clustering result.
12. Viva Questions
INTRODUCTION
Machine Learning
Machine Learning is used everywhere, from automating mundane tasks to offering intelligent insights, and industries in every sector try to benefit from it. You may already be using a device that utilizes it, for example a wearable fitness tracker like Fitbit or an intelligent home assistant like Google Home. There are many more examples of ML in use:
• Prediction: Machine learning can be used in prediction systems. Considering the loan example, to compute the probability of a default, the system needs to classify the available data into groups.
• Image recognition: Machine learning can be used for face detection in an image. There is a separate category for each person in a database of several people.
• Speech recognition: the translation of spoken words into text. It is used in voice searches and more. Voice user interfaces include voice dialing, call routing, and appliance control. It can also be used for simple data entry and the preparation of structured documents.
• Medical diagnosis: ML models can be trained to recognize cancerous tissue.
• Financial industry and trading: companies use ML in fraud investigations and credit checks.
Types of Machine Learning
Machine learning algorithms can be classified into three types (a short illustrative sketch follows the list below):
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
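
As a rough illustration of the difference between the first two categories, here is a minimal sketch (added for clarity, not one of the prescribed experiments) using scikit-learn's Iris data:

# Minimal sketch: supervised vs. unsupervised learning on the Iris data
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier   # supervised: learns from labelled examples
from sklearn.cluster import KMeans                    # unsupervised: finds structure without labels

X, y = load_iris(return_X_y=True)

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # trained with the true labels
print("Supervised accuracy on training data:", clf.score(X, y))

km = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X)  # the labels y are never used
print("Unsupervised cluster sizes:", [list(km.labels_).count(c) for c in range(3)])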

Anaconda Navigator
Anaconda Navigator is a desktop graphical user interface (GUI) included in
Anaconda distribution that allows users to launch applications and manage conda
packages, environments and channels without using command-line commands.
Navigator can search for packages on Anaconda Cloud or in a local Anaconda
Repository, install them in an environment, run the packages and update them. It is
available for Windows, macOS and Linux.
The following applications are available by default in Navigator:
JupyterLab, Jupyter Notebook, QtConsole, Spyder, Glue, Orange, RStudio, Visual Studio Code

Conda
Conda is an open source cross-platform, language-agnostic package manager and
environment management system that installs, runs, and updates packages and
their dependencies. It was created for Python programs, but it can package and
distribute software for any language (e.g., R), including multi-language projects. The
conda package and environment manager is included in all versions of Anaconda,
Miniconda, and Anaconda Repository.

Jupyter Notebook
Jupyter Notebook can colloquially refer to two different concepts: either the user-facing application to edit code and text, or the underlying file format, which is interoperable across many implementations.

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computational environment for creating notebook documents. Jupyter Notebook is built using several open-source libraries, including IPython, ZeroMQ, Tornado, jQuery, Bootstrap, and MathJax. A Jupyter Notebook application is a browser-based REPL containing an ordered list of input/output cells which can contain code, text (using GitHub Flavored Markdown), mathematics, plots and rich media.

Jupyter Notebook is similar to the notebook interface of other programs such as Maple, Mathematica, and SageMath, a computational interface style that originated with Mathematica in the 1980s.

1. Develop a program to create histograms for all numerical features and analyze the
distribution of each feature. Generate box plots for all numerical features and identify any
outliers. Use California Housing dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing

# Load the California Housing data into a DataFrame
housing_data = fetch_california_housing()
df = pd.DataFrame(housing_data.data, columns=housing_data.feature_names)

# Histograms of all numerical features
df.hist(figsize=(12, 8), bins=30, edgecolor='black')
plt.suptitle("Histograms of Numerical Features", fontsize=16)

# Box plots of all numerical features
plt.figure(figsize=(12, 8))
df.boxplot(rot=45)
plt.title("Box Plots of Numerical Features", fontsize=16)
plt.show()

# Count outliers per feature using the 1.5 * IQR rule
def detect_outliers(df):
    outliers_dict = {}
    for column in df.columns:
        Q1 = df[column].quantile(0.25)
        Q3 = df[column].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
        outliers_dict[column] = outliers.shape[0]
    return outliers_dict

outliers = detect_outliers(df)
print("Outlier count per feature:", outliers)

OUTPUT:-

Outlier count per feature: {'MedInc': 681, 'HouseAge': 0, 'AveRooms': 511, 'AveBedrms': 1424, 'Population': 1196, 'AveOccup': 711, 'Latitude': 0, 'Longitude': 0}

2. Develop a program to compute the correlation matrix to understand the relationships
between pairs of features. Visualize the correlation matrix using a heatmap to know which
variables have strong positive/negative correlations. Create a pair plot to visualize pairwise
relationships between features. Use California Housing dataset.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the California Housing data as a pandas DataFrame
california_data = fetch_california_housing(as_frame=True)
data = california_data.frame

# Correlation matrix and heatmap
correlation_matrix = data.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix of California Housing Features')
plt.show()

# Pair plot of all features (KDE on the diagonal)
sns.pairplot(data, diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle('Pair Plot of California Housing Features', y=1.02)
plt.show()
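
As an optional addition (appended to the program above, not part of the original manual), the strongest correlations can also be listed numerically instead of being read off the heatmap:

# Optional: list the most strongly correlated feature pairs numerically
corr_pairs = correlation_matrix.unstack().sort_values(ascending=False)
corr_pairs = corr_pairs[corr_pairs < 1.0]   # drop self-correlations (each remaining pair still appears twice)
print("Strongest positive correlations:\n", corr_pairs.head(6))
print("Strongest negative correlations:\n", corr_pairs.tail(6))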

OUTPUT:-

3. Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
data = iris.data
labels = iris.target
label_names = iris.target_names

# Convert to a DataFrame for better visualization
iris_df = pd.DataFrame(data, columns=iris.feature_names)

# Perform PCA to reduce dimensionality to 2
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)

# Create a DataFrame for the reduced data
reduced_df = pd.DataFrame(data_reduced, columns=['Principal Component 1', 'Principal Component 2'])
reduced_df['Label'] = labels

# Plot the reduced data
plt.figure(figsize=(8, 6))
colors = ['r', 'g', 'b']
for i, label in enumerate(np.unique(labels)):
    plt.scatter(
        reduced_df[reduced_df['Label'] == label]['Principal Component 1'],
        reduced_df[reduced_df['Label'] == label]['Principal Component 2'],
        label=label_names[label],
        color=colors[i])
plt.title('PCA on Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()
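
Optionally (an addition, not part of the original program), the explained variance ratio can be printed to check how much of the original information the two components retain:

# Proportion of the total variance captured by each of the two principal components
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())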

OUTPUT:-

4. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Find-S algorithm to output a description of the set of all hypotheses consistent with the
training examples.

import pandas as pd

def find_s_algorithm(file_path):
    data = pd.read_csv(file_path)

    print("Training data:")
    print(data)

    attributes = data.columns[:-1]
    class_label = data.columns[-1]

    # Initialise the hypothesis with '?' placeholders; the first positive example fills them in
    hypothesis = ['?' for _ in attributes]

    for index, row in data.iterrows():
        if row[class_label] == 'Yes':
            for i, value in enumerate(row[attributes]):
                if hypothesis[i] == '?' or hypothesis[i] == value:
                    hypothesis[i] = value
                else:
                    # Attribute values disagree across positive examples, so generalise
                    hypothesis[i] = '?'

    return hypothesis

file_path = 'sample.csv'
hypothesis = find_s_algorithm(file_path)
print("\nThe final hypothesis is:", hypothesis)

OUTPUT:-
Training data:
id first last gender Marks selected
1 John Doe M 85 Yes
2 Jane Smith F 90 No
3 Jim Brown M 75 Yes
4 Jill White F 88 No
The final hypothesis is: ['?', '?', '?', 'M', '?']

5. Develop a program to implement k-Nearest Neighbour algorithm to classify the randomly
generated 100 values of x in the range of [0,1]. Perform the following based on dataset
generated.

a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1, else xi ∊ Class2

b. Classify the remaining points, x51,……,x100 using KNN. Perform this for k=1,2,3,4,5,20,30

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Generate 100 random values in [0, 1]; label the first 50 by the rule x <= 0.5 -> Class 1, else Class 2
x_values = np.random.rand(100)
y_labels = np.array([1 if x <= 0.5 else 2 for x in x_values[:50]])

x_train = x_values[:50].reshape(-1, 1)
y_train = y_labels
x_test = x_values[50:].reshape(-1, 1)

def classify_and_plot(k_values):
    plt.figure(figsize=(10, 6))
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(x_train, y_train)
        y_pred = knn.predict(x_test)
        plt.scatter(x_test, y_pred, label=f'k={k}')

    plt.xlabel('x values')
    plt.ylabel('Predicted Class')
    plt.title('KNN Classification of Random Points')
    plt.legend()
    plt.show()

k_values = [1, 2, 3, 4, 5, 20, 30]
classify_and_plot(k_values)
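
Because the true labelling rule (xi ≤ 0.5 → Class 1) is also known for the test points, the accuracy for each k can optionally be checked; a small sketch appended to the program above (not part of the original manual):

# Optional: compare predictions against the known rule-based labels of the test points
y_true_test = np.array([1 if x <= 0.5 else 2 for x in x_values[50:]])
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    print(f"k={k}: accuracy = {knn.score(x_test, y_true_test):.2f}")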

OUTPUT:-

6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.

import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, xi, tau):
    # Weight of training point xi for query point x (bandwidth tau)
    return np.exp(-np.sum((x - xi) ** 2) / (2 * tau ** 2))

def locally_weighted_regression(x, X, y, tau):
    m = X.shape[0]
    weights = np.array([gaussian_kernel(x, X[i], tau) for i in range(m)])
    W = np.diag(weights)
    X_transpose_W = X.T @ W
    # Pseudo-inverse for numerical robustness when the weighted matrix is ill-conditioned
    theta = np.linalg.pinv(X_transpose_W @ X) @ X_transpose_W @ y
    return x @ theta

np.random.seed(42)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)
X_bias = np.c_[np.ones(X.shape), X]

x_test = np.linspace(0, 2 * np.pi, 200)
x_test_bias = np.c_[np.ones(x_test.shape), x_test]
tau = 0.5
y_pred = np.array([locally_weighted_regression(xi, X_bias, y, tau) for xi in x_test_bias])

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', label='Training Data', alpha=0.7)
plt.plot(x_test, y_pred, color='blue', label=f'LWR Fit (tau={tau})', linewidth=2)
plt.xlabel('X', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Locally Weighted Regression', fontsize=14)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.show()
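
Viva question 30 asks about the effect of tau; a small optional sketch (appended to the program above, not part of the original manual) that overlays fits for several bandwidths:

# Smaller tau -> more local weighting and a wigglier fit; larger tau -> smoother, closer to ordinary regression
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='red', alpha=0.4, label='Training Data')
for tau_value in [0.1, 0.5, 1.0, 5.0]:
    y_fit = np.array([locally_weighted_regression(xi, X_bias, y, tau_value) for xi in x_test_bias])
    plt.plot(x_test, y_fit, label=f'tau={tau_value}')
plt.legend()
plt.title('Effect of tau on the LWR fit')
plt.show()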

OUTPUT:-

7. Develop a program to demonstrate the working of Linear Regression and Polynomial
Regression. Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for
vehicle fuel efficiency prediction) for Polynomial Regression.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score

def linear_regression_california():
    # Note: the Boston Housing dataset has been removed from recent scikit-learn releases,
    # so the California Housing dataset is used for the linear regression part instead.
    housing = fetch_california_housing(as_frame=True)
    X = housing.data[["AveRooms"]]
    y = housing.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.plot(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Average number of rooms (AveRooms)")
    plt.ylabel("Median value of homes ($100,000)")
    plt.title("Linear Regression - California Housing Dataset")
    plt.legend()
    plt.show()
    print("Linear Regression - California Housing Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))

def polynomial_regression_auto_mpg():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
    column_names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
                    "acceleration", "model_year", "origin"]
    data = pd.read_csv(url, sep='\s+', names=column_names, na_values="?")
    data = data.dropna()
    X = data["displacement"].values.reshape(-1, 1)
    y = data["mpg"].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    poly_model = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(), LinearRegression())
    poly_model.fit(X_train, y_train)
    y_pred = poly_model.predict(X_test)
    plt.scatter(X_test, y_test, color="blue", label="Actual")
    plt.scatter(X_test, y_pred, color="red", label="Predicted")
    plt.xlabel("Displacement")
    plt.ylabel("Miles per gallon (mpg)")
    plt.title("Polynomial Regression - Auto MPG Dataset")
    plt.legend()
    plt.show()
    print("Polynomial Regression - Auto MPG Dataset")
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("R^2 Score:", r2_score(y_test, y_pred))

if __name__ == "__main__":
    print("Demonstrating Linear Regression and Polynomial Regression\n")
    linear_regression_california()
    polynomial_regression_auto_mpg()

OUTPUT:-

Demonstrating Linear Regression and Polynomial Regression

Linear Regression - California Housing Dataset
Mean Squared Error: 1.2923314440807299
R^2 Score: 0.013795337532284901

Polynomial Regression - Auto MPG Dataset
Mean Squared Error: 0.743149055720586
R^2 Score: 0.7505650609469626

8. Develop a program to demonstrate the working of the decision tree algorithm. Use Breast
Cancer Data set for building the decision tree and apply this knowledge to classify a new
sample.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

# Classify a new sample (here, the first test sample)
new_sample = np.array([X_test[0]])
prediction = clf.predict(new_sample)
prediction_class = "Benign" if prediction == 1 else "Malignant"
print(f"Predicted Class for the new sample: {prediction_class}")

plt.figure(figsize=(12, 8))
tree.plot_tree(
    clf,
    filled=True,
    feature_names=data.feature_names.tolist(),  # ensure it's a list
    class_names=data.target_names.tolist()      # convert to list
)
plt.title("Decision Tree - Breast Cancer Dataset")
plt.show()
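
The full tree grown above can be hard to read when plotted; an optional variant (an assumption, not part of the original program) limits the depth for a more legible plot:

# A shallower tree is easier to interpret; accuracy may drop slightly
clf_small = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_small.fit(X_train, y_train)
print("Accuracy of depth-3 tree:", accuracy_score(y_test, clf_small.predict(X_test)))
plt.figure(figsize=(12, 8))
tree.plot_tree(clf_small, filled=True,
               feature_names=data.feature_names.tolist(),
               class_names=data.target_names.tolist())
plt.title("Depth-limited Decision Tree - Breast Cancer Dataset")
plt.show()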

OUTPUT:-

9. Develop a program to implement the Naive Bayesian classifier considering Olivetti Face
Data set for training. Compute the accuracy of the classifier, considering a few test data sets

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

data = fetch_olivetti_faces(shuffle=True, random_state=42)
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=1))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

cross_val_accuracy = cross_val_score(gnb, X, y, cv=5, scoring='accuracy')
print(f'\nCross-validation accuracy: {cross_val_accuracy.mean() * 100:.2f}%')

# Display a few test images with their true and predicted labels
fig, axes = plt.subplots(3, 5, figsize=(12, 8))
for ax, image, label, prediction in zip(axes.ravel(), X_test, y_test, y_pred):
    ax.imshow(image.reshape(64, 64), cmap=plt.cm.gray)
    ax.set_title(f"True: {label}, Pred: {prediction}")
    ax.axis('off')

plt.show()

OUTPUT:-
Accuracy: 80.83%

Classification Report:

precision recall f1-score support

0 0.67 1.00 0.80 2
1 1.00 1.00 1.00 2
2 0.33 0.67 0.44 3
3 1.00 0.00 0.00 5
4 1.00 0.50 0.67 4
5 1.00 1.00 1.00 2
7 1.00 0.75 0.86 4
8 1.00 0.67 0.80 3
9 1.00 0.75 0.86 4
10 1.00 1.00 1.00 3
11 1.00 1.00 1.00 1
12 0.40 1.00 0.57 4
13 1.00 0.80 0.89 5
14 1.00 0.40 0.57 5
15 0.67 1.00 0.80 2
16 1.00 0.67 0.80 3
17 1.00 1.00 1.00 3
18 1.00 1.00 1.00 3
19 0.67 1.00 0.80 2
20 1.00 1.00 1.00 3
21 1.00 0.67 0.80 3
22 1.00 0.60 0.75 5
23 1.00 0.75 0.86 4
24 1.00 1.00 1.00 3
25 1.00 0.75 0.86 4
26 1.00 1.00 1.00 2
27 1.00 1.00 1.00 5
28 0.50 1.00 0.67 2
29 1.00 1.00 1.00 2
30 1.00 1.00 1.00 2
31 1.00 0.75 0.86 4
32 1.00 1.00 1.00 2
34 0.25 1.00 0.40 1
35 1.00 1.00 1.00 5
36 1.00 1.00 1.00 3
37 1.00 1.00 1.00 1
38 1.00 0.75 0.86 4
39 0.50 1.00 0.67 5
accuracy 0.81 120
macro avg 0.89 0.85 0.83 120
weighted avg 0.91 0.81 0.81 120
Confusion Matrix:
[[2 0 0 ... 0 0 0]
[0 2 0 ... 0 0 0]
[0 0 2 ... 0 0 1]
...
[0 0 0 ... 1 0 0]
[0 0 0 ... 0 3 0]
[0 0 0 ... 0 0 5]]

Cross-validation accuracy: 87.25%

10. Develop a program to implement k-means clustering using Wisconsin Breast Cancer data
set and visualize the clustering result.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report

data = load_breast_cancer()
X = data.data
y = data.target

# Standardise the features before clustering
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=2, random_state=42)
y_kmeans = kmeans.fit_predict(X_scaled)

print("Confusion Matrix:")
print(confusion_matrix(y, y_kmeans))
print("\nClassification Report:")
print(classification_report(y, y_kmeans))

# Project the data onto two principal components for visualisation
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
                edgecolor='black', alpha=0.7)
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label', palette='coolwarm', s=100,
                edgecolor='black', alpha=0.7)
plt.title('True Labels of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="True Label")
plt.show()

plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
                edgecolor='black', alpha=0.7)
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()
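
K-means cluster indices are arbitrary, so the confusion matrix above may look "flipped". A small optional sketch (appended to the program above, not part of the original manual) that maps each cluster to its majority true label before reporting accuracy:

# Optional: align arbitrary cluster indices with the majority true label, then score
from sklearn.metrics import accuracy_score

aligned = np.zeros_like(y_kmeans)
for cluster in np.unique(y_kmeans):
    mask = (y_kmeans == cluster)
    aligned[mask] = np.bincount(y[mask]).argmax()   # majority vote of true labels inside this cluster
print("Accuracy after aligning cluster labels:", accuracy_score(y, aligned))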

OUTPUT:-

VIVA QUESTIONS
1. What does the fetch_california_housing() function do?
2. Why are histograms and boxplots used in the code?
3. What is the purpose of the detect_outliers() function?
4. Why is plt.figure() used before the boxplot?
5. Why is it important to detect outliers in a dataset?
6. What is the purpose of the fetch_california_housing(as_frame=True) function?
7. What does the data.corr() function return?
8. What is the significance of using a heatmap in this context?
9. Why is pairplot() used in data analysis?
10. What does diag_kind='kde' do in the pairplot function?
11. What is the purpose of using StandardScaler() in PCA?
12. What does the covariance matrix represent in PCA?
13. Why are eigenvalues and eigenvectors computed from the covariance matrix?
14. What does np.argsort(eigenvalues)[::-1] achieve in the code?
15. What does the final scatter plot represent?
16. What is the main goal of the Find-S algorithm?
17. Why do we initialize the hypothesis with None values?
18. What is the role of the condition if row[class_label] == 'Yes'?
19. Why do we replace values with '?' in the hypothesis?
20. What does the final hypothesis represent?
21. What is the purpose of using the KNeighborsClassifier in this code?
22. How are class labels assigned to the training data?
23. Why is the data reshaped using reshape(-1, 1) before fitting the model?
24. What is the effect of changing the value of k in KNN?
25. What does the final scatter plot represent?
26. What is the role of the gaussian_kernel function in the code?
27. Why is a bias term (1) added to the input features in the locally weighted regression function?
28. What is the purpose of using np.linalg.pinv in the locally weighted regression function?
29. What do the different subplots in the final plot represent?
30. What does the tau parameter control in the locally weighted regression model?
31. What is the purpose of the DecisionTreeClassifier in this code?
32. Why is the dataset split into training and testing sets using train_test_split?
33. How is the accuracy of the decision tree model evaluated?
34. What does the line prediction_class = "Benign" if prediction == 1 else "Malignant" do?
35. What is the purpose of plotting the decision tree using tree.plot_tree?
36. What is the purpose of the LinearRegression model in the linear_regression_california() function?
37. Why is train_test_split used in both regression functions?
38. What does the mean_squared_error metric tell you about the model's performance?
39. How does polynomial regression differ from linear regression in the polynomial_regression_auto_mpg() function?
40. What is the significance of using make_pipeline in polynomial regression?
41. What is the role of the GaussianNB model in this code?
42. Why is the dataset split into training and testing sets using train_test_split?
43. What does the accuracy_score metric measure in this case?
44. How does cross_val_score help evaluate the model?
45. What is the purpose of displaying the images of test samples with true and predicted labels?
46. What is the purpose of using StandardScaler in this code?
47. What does PCA (Principal Component Analysis) do in this code?
48. Why is the confusion_matrix used here?
49. What is the significance of the scatter plots in the visualization?
50. Why are the cluster centroids marked in the final scatter plot?

