
ARASU ENGINEERING COLLEGE

(Approved by AICTE, New Delhi | Affiliated to Anna University, Chennai | NAAC Accredited | ISO 9001:2008 Certified)
Chennai Main Road, Kumbakonam – 612501

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

(NBA Accredited)

CP4252 – MACHINE LEARNING COMPONENT LAB
(PRACTICAL EXERCISES)

II SEMESTER – R 2021

Prepared By:
Mr. S. Alaguganesan, M.E.
Mr. M. A. Mohamed Aslam, M.E.
Assistant Professors, Department of CSE

Approved By:
Dr. Kalaimani Shanmugam,
Professor & Head, Department of CSE

EX.NO: 1

Implement a linear regression with a real dataset. Experiment with different features in building a model. Tune the model hyperparameters.

AIM:

To implement a linear regression model with a real dataset and experiment with different features, as well as tune
the model hyperparameters, we will use the "Housing Prices" dataset from Kaggle.

PROCEDURE:

 Step 1: Import the necessary libraries and load the Housing Prices dataset

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the Housing Prices dataset (replace 'housing_prices.csv' with the actual file path)
data_url = "https://example.com/housing_prices.csv"
df = pd.read_csv(data_url)

 Step 2: Prepare the data and define the features and target
# Select the features and target variable
features = ['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF']
target = 'SalePrice'
# Separate the features (X) and the target (y)
X = df[features]
y = df[target]
 Step 3: Split the data into training and test sets

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 Step 4: Train and evaluate the linear regression model

# Create a linear regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model using mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
 Step 5: Experiment with different features and hyperparameter tuning
 You can experiment with different combinations of features by modifying the features list. Additionally,
you can tune the hyperparameters of the linear regression model using techniques such as grid search or
randomized search. Here's an example of tuning the fit_intercept hyperparameter:
# Perform hyperparameter tuning
fit_intercept_values = [True, False]

for fit_intercept in fit_intercept_values:
    # Create a linear regression model with the selected hyperparameter value
    model = LinearRegression(fit_intercept=fit_intercept)

    # Train the model on the training data
    model.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = model.predict(X_test)

    # Evaluate the model using mean squared error
    mse = mean_squared_error(y_test, y_pred)
    print(f"Fit Intercept: {fit_intercept}")
    print(f"Mean Squared Error: {mse}")
    print("------------------------------")

RESULTS:

In this code, we iterate over different values of the fit_intercept hyperparameter (True and False) and create a
linear regression model with each value. We then train the model, make predictions on the test set, and evaluate
the model's performance using mean squared error. This allows you to assess how the hyperparameter influences
the model's performance.

3
🐉

EX.NO: 2

Implement a binary classification model. That is, it answers a binary question such as "Are houses in this neighborhood above a certain price?" (use data from exercise 1). Modify the classification threshold and determine how that modification influences the model. Experiment with different classification metrics to determine your model's effectiveness.

AIM:

To implement a binary classification model that answers a question such as "Are houses in this neighborhood
above a certain price?" using the California Housing Dataset from exercise 1, and experiment with different
classification thresholds and metrics.

PROCEDURE:

 Step 1: Import the necessary libraries and load the California Housing Dataset (from exercise 1)

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Load the California Housing Dataset (assuming it is loaded as 'df' from exercise 1)
 Step 2: Prepare the data and define the binary classification target

# Assume the target is whether the median house value is above a certain threshold (e.g., $200,000)
threshold = 200000
df['above_threshold'] = (df['median_house_value'] > threshold).astype(int)

# Separate the features (X) and the binary target (y)
X = df.drop(['median_house_value', 'above_threshold'], axis=1)
y = df['above_threshold']

 Step 3: Split the data into training and test sets

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

 Step 4: Train a binary classification model (e.g., logistic regression)

# Create a logistic regression model


model = LogisticRegression()

# Train the model on the training data
model.fit(X_train, y_train)

 Step 5: Evaluate the model's performance with different classification thresholds and metrics

# Predict probabilities for the test set
y_pred_probs = model.predict_proba(X_test)[:, 1]  # Probability of class 1 (above threshold)

# Define different classification thresholds
thresholds = [0.25, 0.5, 0.75]

for threshold in thresholds:
    # Convert probabilities to binary predictions based on the threshold
    y_pred = (y_pred_probs > threshold).astype(int)

    # Calculate and print classification metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_pred_probs)

    print(f"Threshold: {threshold}")
    print(f"Accuracy: {accuracy}")
    print(f"Precision: {precision}")
    print(f"Recall: {recall}")
    print(f"F1 Score: {f1}")
    print(f"ROC AUC: {roc_auc}")
    print("------------------------------")

RESULTS:

In this code, we train a binary classification model using logistic regression, define a price threshold to create the binary target (above or below the specified house price), and vary the classification threshold to observe how it influences accuracy, precision, recall, F1 score, and ROC AUC. The model was implemented successfully.


EX.NO: 3

Classification with Nearest Neighbors. In this question, you will use scikit-learn's KNN classifier to classify real vs. fake news headlines.

AIM:

To classify real vs. fake news headlines using scikit-learn's KNN classifier and perform a training/validation split,
we will need to use a dataset related to news headlines.

PROCEDURE:

This exercise follows the general process of using scikit-learn's KNN classifier with a dataset of real vs. fake news headlines. Let's assume we have a suitable dataset called the "News Headlines Dataset" for this task.

 Step 1: Import the necessary libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

 Step 2: Load and preprocess the News Headlines Dataset

# Load the dataset (replace 'news_headlines.csv' with the actual file path or URL)
data_url = "https://example.com/news_headlines.csv"
df = pd.read_csv(data_url)

# Separate the features (headline) and target (real vs. fake)
X = df["headline"].values
y = df["label"].values

# Preprocess the data if necessary (e.g., text cleaning)
# The headline text must be converted into numerical features (e.g., using TF-IDF, word embeddings, etc.); see Step 4

 Step 3: Split the data into training and validation sets

# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

 Step 4: Vectorize the headlines

# KNN cannot be trained on raw text, so convert the headlines into numerical
# features; TF-IDF is used here, but alternatives include scikit-learn's
# CountVectorizer or pre-trained word embeddings like Word2Vec, GloVe, etc.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()

# Fit the vectorizer on the training headlines only, then transform both sets
X_train = vectorizer.fit_transform(X_train)
X_val = vectorizer.transform(X_val)

 Step 5: Train and evaluate the KNN classifier

# Create a KNN classifier object
knn = KNeighborsClassifier(n_neighbors=5)

# Train the KNN classifier on the training data
knn.fit(X_train, y_train)

# Make predictions on the validation data
y_val_pred = knn.predict(X_val)

# Evaluate the accuracy of the KNN classifier on the validation data
accuracy = accuracy_score(y_val, y_val_pred)
print(f"Validation Accuracy: {accuracy}")

 Step 6: Perform further analysis and improvements

# Analyze the validation results and make improvements to the model if necessary
# You can try different values of 'n_neighbors' or explore other hyperparameters and techniques to improve the model's performance (see the sketch below)
# You can also perform additional preprocessing steps, feature engineering, or use more advanced models if needed
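As one hedged sketch of the hyperparameter search suggested above, assuming the vectorized X_train and y_train from Steps 4 and 5, a cross-validated grid search over n_neighbors:

from sklearn.model_selection import GridSearchCV

# Candidate neighborhood sizes to try (an illustrative grid)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 15]}

grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print(f"Best n_neighbors: {grid.best_params_['n_neighbors']}")
print(f"Best cross-validated accuracy: {grid.best_score_}")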

RESULTS:

The code performs a training/validation split, trains the KNN classifier on the training data, makes predictions on
the validation data, and evaluates the accuracy of the classifier.


EX.NO: 4

Experiment with validation sets and test sets using the dataset. Split a training set into a smaller training set and a validation set. Analyze deltas between training set and validation set results. Test the trained model with a test set to determine whether your trained model is overfitting. Detect and fix a common training problem.

AIM:

To experiment with validation sets and test sets, and analyze the deltas between training set and validation set
results, as well as detect and fix overfitting, we can modify the previous implementation.

PROCEDURE:

 Step 1: Import the necessary libraries (same as before)

import numpy as np

import pandas as pd

from sklearn.cluster import KMeans

from sklearn.preprocessing import LabelEncoder, StandardScaler

from sklearn.model_selection import train_test_split

 Step 2: Load and preprocess the dataset (same as before)

# Load the dataset
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00323/data.zip"
df = pd.read_csv(data_url, compression='zip')

# Drop irrelevant columns
df = df.drop(["Kingdom", "DNAtype", "Species"], axis=1)

# Convert categorical data to numerical using LabelEncoder
label_encoder = LabelEncoder()
df_encoded = df.apply(label_encoder.fit_transform)

# Scale the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_encoded)

 Step 3: Split the data into training, validation, and test sets

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df_scaled, df_encoded, test_size=0.2, random_state=42)

# Split the training set into smaller training and validation sets
X_train_small, X_val, y_train_small, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

 Step 4: Perform k-means clustering on the smaller training set and evaluate on the validation set

# Set the number of clusters (k)
k = 3

# Create a k-means object
kmeans = KMeans(n_clusters=k, random_state=42)

# Fit the k-means model to the smaller training set
kmeans.fit(X_train_small)

# Get the cluster labels for the validation set
val_labels = kmeans.predict(X_val)

# Get the cluster labels for the smaller training set
train_labels = kmeans.labels_

 Step 5: Analyze the deltas between training set and validation set results

# The training and validation sets have different sizes, so their label arrays
# cannot be compared element-wise; instead, compare the proportion of points
# assigned to each cluster in the two sets
train_dist = np.bincount(train_labels, minlength=k) / len(train_labels)
val_dist = np.bincount(val_labels, minlength=k) / len(val_labels)

# Total variation distance between the two cluster-assignment distributions
train_val_delta = 0.5 * np.abs(train_dist - val_dist).sum()
print(f"Training vs. Validation set delta: {train_val_delta}")

 Step 6: Test the trained model on the test set

# Get the cluster labels for the test set
test_labels = kmeans.predict(X_test)

# Compare the cluster-assignment distributions between training and test sets
test_dist = np.bincount(test_labels, minlength=k) / len(test_labels)
train_test_delta = 0.5 * np.abs(train_dist - test_dist).sum()
print(f"Training vs. Test set delta: {train_test_delta}")


RESULT:

Thus the training set was split into a smaller training set and a validation set, the deltas between the training set and validation set results were analyzed, and the trained model was tested on a separate test set to check for overfitting. The exercise was implemented successfully.


EX.NO: 5

Implement the k-means algorithm using the https://archive.ics.uci.edu/ml/datasets/Codon+usage dataset

AIM:

To implement the k-means algorithm using the https://archive.ics.uci.edu/ml/datasets/Codon+usage dataset.

PROCEDURE:

 The k-means algorithm is an unsupervised machine learning algorithm used for clustering. This exercise implements k-means using the Codon usage dataset from the UCI Machine Learning Repository.

 Note that the Codon usage dataset contains categorical data, and k-means is typically used for numerical data. However, k-means can still be applied by converting the categorical data into numerical representations.

 Let's proceed with the implementation:

 Step 1: Import the necessary libraries

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder, StandardScaler

 Step 2: Load and preprocess the dataset

# Load the dataset
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00323/data.zip"
df = pd.read_csv(data_url, compression='zip')

# Drop irrelevant columns
df = df.drop(["Kingdom", "DNAtype", "Species"], axis=1)

# Convert categorical data to numerical using LabelEncoder
label_encoder = LabelEncoder()
df_encoded = df.apply(label_encoder.fit_transform)

# Scale the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_encoded)

 Step 3: Perform k-means clustering

# Set the number of clusters (k)
k = 3

# Create a k-means object
kmeans = KMeans(n_clusters=k, random_state=42)

# Fit the k-means model to the scaled data
kmeans.fit(df_scaled)

# Get the cluster labels
labels = kmeans.labels_
 Step 4: Analyze the results

# Add the cluster labels to the original dataframe
df["Cluster"] = labels

# Print the count of samples in each cluster
print(df["Cluster"].value_counts())
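The choice of k = 3 above is arbitrary. A minimal sketch of the elbow method for choosing k, assuming df_scaled from Step 2, inspects how the within-cluster sum of squares (inertia) decreases as k grows:

# Try several values of k and record the inertia for each
inertias = []
k_values = range(2, 11)
for candidate_k in k_values:
    km = KMeans(n_clusters=candidate_k, random_state=42)
    km.fit(df_scaled)
    inertias.append(km.inertia_)

# Look for the "elbow" where the decrease in inertia flattens out
for candidate_k, inertia in zip(k_values, inertias):
    print(f"k={candidate_k}: inertia={inertia}")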

RESULT:
The k-means algorithm on the Codon usage dataset was implemented successfully.


EX.NO: 6

Implement the Naïve Bayes Classifier using the https://archive.ics.uci.edu/ml/datasets/Gait+Classification dataset

AIM:

To implement the Naïve Bayes Classifier using the Gait Classification dataset, we'll need to perform the following
steps:

PROCEDURE:

 Load the dataset: Download the dataset from the provided URL (https://archive.ics.uci.edu/ml/datasets/Gait+Classification) and load it into your programming environment. The dataset contains both the training and testing data.
 Preprocess the data: Preprocess the dataset to prepare it for the Naïve Bayes Classifier. This may involve handling missing values, normalizing the data, and converting categorical variables into numerical representations if necessary.
 Train the Naïve Bayes Classifier: Implement the training phase of the Naïve Bayes Classifier using the training data. Calculate the class priors and class conditional probabilities based on the training samples (see the sketch after this list).
 Classify test samples: Use the trained model to classify the test samples by calculating the posterior probability for each class given the test sample. The class with the highest probability will be assigned as the predicted class for that sample.
 Evaluate the model: Compare the predicted classes with the true labels of the test samples to evaluate the performance of the Naïve Bayes Classifier. Calculate metrics such as accuracy, precision, recall, or F1 score to assess the classifier's effectiveness.
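As a hedged sketch of what "class priors and class conditional probabilities" means for a Gaussian Naïve Bayes model, using made-up toy arrays rather than the actual Gait data:

import numpy as np

# Toy data: two numerical features, two classes (illustrative only)
X = np.array([[1.0, 2.1], [0.9, 1.9], [3.2, 4.0], [3.0, 4.2]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
priors = {c: np.mean(y == c) for c in classes}              # P(class)
means = {c: X[y == c].mean(axis=0) for c in classes}        # per-class feature means
vars_ = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}  # per-class feature variances

def log_posterior_scores(x):
    # log P(class) + sum of log Gaussian likelihoods over the features
    scores = {}
    for c in classes:
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * vars_[c]) + (x - means[c]) ** 2 / vars_[c])
        scores[c] = np.log(priors[c]) + log_lik
    return scores

# Classify a new sample: pick the class with the highest posterior score
sample = np.array([1.1, 2.0])
scores = log_posterior_scores(sample)
print(max(scores, key=scores.get))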

PROGRAM:

# Step 1: Load the dataset
# (Assuming the dataset is stored in a CSV file named "gait_classification.csv")
import pandas as pd

# Load the dataset into a DataFrame
data = pd.read_csv("gait_classification.csv")

# Step 2: Preprocess the data
# (Perform any necessary data preprocessing steps here)

# Step 3: Train the Naïve Bayes Classifier

# Separate features (X) and target variable (y)


X = data.drop("target_class", axis=1)
y = data["target_class"]

from sklearn.naive_bayes import GaussianNB

# Initialize the Naïve Bayes Classifier
nb_classifier = GaussianNB()

# Train the classifier
nb_classifier.fit(X, y)
# Step 4: Classify test samples

# Load the test data (assuming it's stored in a separate CSV file named "test_data.csv")
test_data = pd.read_csv("test_data.csv")

# Separate features (X_test) and true labels (y_true) of the test data
X_test = test_data.drop("target_class", axis=1)
y_true = test_data["target_class"]

# Predict the classes of the test samples
y_pred = nb_classifier.predict(X_test)

# Step 5: Evaluate the model

from sklearn.metrics import accuracy_score, classification_report

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

# Generate a classification report
report = classification_report(y_true, y_pred)
print("Classification Report:\n", report)

RESULTS:
Thus the Naïve Bayes Classifier was implemented successfully on the Gait Classification dataset. The code assumes that the necessary libraries, such as pandas and scikit-learn, are installed; adjust the preprocessing steps as required for the specific dataset.
