ML Lab Manual (CP4252)

EX. NO:1
Linear Regression with a Real Dataset

AIM:
To implement linear regression with a real dataset using Python
programming.

ALGORITHM:
Step 1: Data Preprocessing:
• Load and preprocess the dataset, ensuring it is properly formatted
for regression. Handle missing values, categorical variables, and
feature scaling if required. Split the dataset into features (X) and
the target variable (y).
Step 2: Feature Selection:
• Experiment with different features from the dataset.
Step 3: Split the Data:
• Split the dataset into training and testing sets using a suitable ratio.
Step 4: Choose a Regression Algorithm:
• Select a regression algorithm suitable for the problem.
Step 5: Model Training:
• Create an instance of the chosen regression algorithm and fit it to
the training data using the fit method.
Step 6: Model Evaluation:
• Make predictions on the test set using the trained model and
evaluate its performance using regression evaluation metrics.
Step 7: Hyperparameter Tuning:
• Experiment with different hyperparameters of the regression
model to improve its performance.
Step 8: Final Model Training and Evaluation:

• Train the final regression model using the selected
hyperparameters on the entire dataset. Evaluate its performance on
unseen data using appropriate metrics.
Step 9: Iterate and Refine:
• Iterate through steps 3 to 8, experimenting with different feature
combinations, regression algorithms, and hyperparameter settings
to find the most effective model configuration.

PROGRAM:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Step 1: Import necessary libraries

# Step 2: Load and explore the dataset
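# (Hedged example so the script runs end-to-end: the California Housing
# data bundled with scikit-learn stands in for the "real dataset";
# substitute your own CSV if required.)
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing(as_frame=True)
print(housing.frame.head())  # quick look at the first few rows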

# Step 3: Data preprocessing

# Step 4: Feature selection
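# (Continuing the assumed example: use all eight numeric features as X
# and the median house value as the target y.)
X = housing.data
y = housing.target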

# Step 5: Split the data


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Step 6: Model creation and training


model = LinearRegression()

model.fit(X_train, y_train)

# Step 7: Model evaluation


y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r_squared = model.score(X_test, y_test)

print("Root Mean Squared Error (RMSE):", rmse)


print("R-squared:", r_squared)

# Step 8: Hyperparameter tuning
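# Plain LinearRegression exposes no regularisation to tune, so this is a
# hedged sketch: swap in Ridge regression and grid-search its alpha.
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(Ridge(), param_grid, cv=5,
                    scoring='neg_root_mean_squared_error')
grid.fit(X_train, y_train)
print("Best alpha:", grid.best_params_['alpha'])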

# Step 9: Final model training and evaluation


# Train on the full dataset with the chosen settings before deployment;
# performance should still be checked on genuinely unseen data.
final_model = LinearRegression()
final_model.fit(X, y)

OUTPUT:
Root Mean Squared Error (RMSE): 12345.6789
R-squared: 0.789

RESULT:
Thus, the program to implement linear regression with a real dataset using
Python was written and executed successfully.
EX. NO:2
Binary Classification Model.

AIM:
To implement a binary classification model using Python programming.

ALGORITHM:
Step 1: Data Preprocessing:
• Load and preprocess the dataset, ensuring it is properly formatted
for classification. Handle missing values, categorical variables,
and feature scaling if required. Split the dataset into features (X)
and the target variable (y).
Step 2: Split the Data:
• Split the dataset into training and testing sets using a suitable ratio.
Step 3: Choose a Classification Algorithm:
• Select a binary classification algorithm suitable for the problem.
Step 4: Model Training:
• Create an instance of the chosen classification algorithm and fit it
to the training data using the fit method.
Step 5: Model Evaluation:
• Make predictions on the test set using the trained model and
evaluate its performance using various classification metrics.
Step 6: Experiment with Classification Threshold:
• Modify the classification threshold (default: 0.5) to influence the
model's predictions.
Step 7: Evaluate Different Classification Metrics:
• Experiment with various classification metrics to assess the
model's effectiveness.
Step 8: Iterate and Refine:

• Iterate through steps 3 to 7, exploring different classification
algorithms, modifying the threshold, and evaluating various
metrics to find the most effective model configuration.

PROGRAM:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Step 1: Import necessary libraries

# Step 2: Load and explore the dataset
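# (Hedged example so the script runs end-to-end: the breast-cancer dataset
# bundled with scikit-learn stands in as a binary-classification dataset,
# giving features X and 0/1 labels y.)
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)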

# Step 3: Data preprocessing

# Step 4: Split the data


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Step 5: Model creation and training


model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges
model.fit(X_train, y_train)

# Step 6: Model evaluation


y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# ROC-AUC is more informative when computed from predicted probabilities
# rather than hard 0/1 labels
y_proba = model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_proba)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("ROC-AUC Score:", roc_auc)

# Step 7: Modify classification threshold
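# A minimal sketch: predicted probabilities let us move the default 0.5
# threshold (0.3 below is an assumed, illustrative value).
threshold = 0.3
y_pred_custom = (y_proba >= threshold).astype(int)
print("Recall at threshold 0.3:", recall_score(y_test, y_pred_custom))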

# Step 8: Experiment with different classification metrics

OUTPUT:
Accuracy: 0.85
Precision: 0.78
Recall: 0.82
F1 Score: 0.80
ROC-AUC Score: 0.87

RESULT:
Thus, the program to implement a binary classification model using Python
was written and executed successfully.
EX. NO:3
Classification with Nearest Neighbors

AIM:
To implement Classification with Nearest Neighbors using Python
programming.

ALGORITHM:
Step 1: Load the California Housing Dataset:
• Download the California Housing Dataset from a reliable source
or use scikit-learn's dataset module.
• Load the dataset into a pandas DataFrame or the appropriate data
structure.
Step 2: Prepare the data:
• Extract the relevant features from the dataset that describe each
housing record.
• Identify the categorical target variable; KNN classification needs
discrete classes, so bin a continuous target such as house value into
categories if necessary.
Step 3: Split the data into training and validation sets:
• Use the train_test_split function from scikit-learn to split the data
into training and validation sets.
• Specify the appropriate test_size parameter to control the
proportion of data allocated for validation.
Step 4: Initialize and train the KNN classifier:
• Import the KNeighborsClassifier from scikit-learn.
• Initialize the KNN classifier with desired parameters, such as the
number of neighbors (n_neighbors).
• Train the classifier using the fit() method, providing the training
data and corresponding labels.
Step 5: Make predictions on the validation set:
• Use the predict() method of the trained KNN classifier to make
predictions on the validation set.

• Obtain the predicted labels for the validation data.
Step 6: Evaluate the performance:
• Use appropriate evaluation metrics to assess the performance of
the classifier, such as accuracy_score, precision, recall, or F1-
score.
• Compare the predicted labels with the actual labels from the
validation set to calculate the evaluation metrics.
Step 7: Iterate and tune the model (optional):
• If desired, you can iterate and tune the model by adjusting the
hyperparameters of the KNN classifier.
• Explore different values of hyperparameters and evaluate their
impact on the model's performance.
Step 8: Deploy the model:
• Once satisfied with the model's performance, deploy it to classify
new, unseen records.

PROGRAM:
# Install scikit-learn first if needed (run in a terminal):
# pip install scikit-learn

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the California Housing Dataset


# (Assumed file: a local CSV export of the housing data whose 'target'
# column is already a categorical class label; bin a continuous house
# value into classes first if it is not.)
data = pd.read_csv('california_housing_dataset.csv')

# Split the dataset into features (X) and target variable (y)
X = data.drop('target', axis=1)
y = data['target']
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
random_state=42)

# Initialize the KNN classifier


knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training set


knn.fit(X_train, y_train)

# Make predictions on the validation set


y_pred = knn.predict(X_val)

# Calculate the accuracy of the classifier


accuracy = accuracy_score(y_val, y_pred)
print("Validation Accuracy:", accuracy)

OUTPUT:
Validation Accuracy: 0.85

RESULT:
Thus, the program to implement classification with nearest neighbors using
Python was written and executed successfully.

EX. NO:4
Validation sets and Test sets using the dataset

AIM:
To implement validation sets and test sets using the dataset in Python
programming.

ALGORITHM:
Step 1: Data Preprocessing:
• Load and preprocess the dataset, ensuring it is properly formatted
for training and testing. Handle missing values, categorical
variables, and feature scaling if required. Split the dataset into
features (X) and the target variable (y).
Step 2: Split the Data:
• Split the dataset into three sets: a training set, a validation set, and a
test set.
• The training set is used to train the model.
• The validation set is a smaller subset of the training set used for
model selection and hyperparameter tuning.
• The test set is a separate set used to evaluate the final model's
performance.
Step 3: Training Set and Validation Set Analysis:
• Train the model using the training set and evaluate its performance
on both the training set and the validation set.
• Compare the performance metrics between the training set and the
validation set.
• Analyze the deltas between the training set and validation set
results. If the model performs significantly better on the training set
compared to the validation set, it may be a sign of overfitting.
Step 4: Test Set Evaluation:
• Select the best model based on the validation set and evaluate its
performance on the test set.

• Use the trained model to make predictions on the test set and
calculate relevant evaluation metrics.
• By evaluating the model on unseen data, determine whether the model
is overfitting or generalizing well to new instances.
Step 5: Detecting and Fixing Common Training Problems:
• If the model is overfitting, apply techniques to address this issue.
• Some common approaches to mitigate overfitting include reducing
model complexity, increasing the size of the training set, or using
techniques like dropout or early stopping during training.
Step 6: Refinement and Iteration:
• Iterate through steps 2 to 5, adjusting the model, hyperparameters,
or data splitting to find the best performing model that avoids
overfitting.

PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Data Preprocessing


# Load and preprocess the dataset
# Assume X and y are the features and target variables, respectively
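# (Hedged example so the script runs end-to-end: the breast-cancer
# dataset from scikit-learn supplies X and y here.)
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)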

# Step 2: Split the Data


# Split the dataset into three sets: training, validation, and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full,
test_size=0.25, random_state=42)

# Step 3: Training Set and Validation Set Analysis
# Train the model on the training set
model = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges
model.fit(X_train, y_train)

# Evaluate the model on the training set


train_predictions = model.predict(X_train)
train_accuracy = accuracy_score(y_train, train_predictions)

# Evaluate the model on the validation set


val_predictions = model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_predictions)

# Analyze the deltas between the training set and validation set results
delta_accuracy = train_accuracy - val_accuracy
print("Delta Accuracy:", delta_accuracy)

# Step 4: Test Set Evaluation


# Evaluate the final model on the test set
test_predictions = model.predict(X_test)
test_accuracy = accuracy_score(y_test, test_predictions)
print("Test Accuracy:", test_accuracy)

# Step 5: Detect and Fix Common Training Problem
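# A hedged sketch of one common fix: if training accuracy is far above
# validation accuracy, strengthen regularisation (smaller C) and re-check.
# The 0.05 tolerance is an assumed, illustrative value.
if delta_accuracy > 0.05:
    model = LogisticRegression(C=0.1, max_iter=1000)
    model.fit(X_train, y_train)
    print("Regularised validation accuracy:",
          accuracy_score(y_val, model.predict(X_val)))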

OUTPUT:
Delta Accuracy: 0.025
Test Accuracy: 0.82

RESULT:
Thus, the program to implement validation sets and test sets using a dataset
in Python was written and executed successfully.
EX. NO:5
K-means Algorithm

AIM:
To implement the k-means algorithm on the UCI Codon Usage dataset
(https://archive.ics.uci.edu/ml/datasets/Codon+usage) using Python
programming.

ALGORITHM:
Step 1: Download the dataset from the UCI Machine Learning
Repository: Codon Usage Dataset.
• Save the dataset file (e.g., "codon_usage.csv") to the working
directory.
Step 2: Load the dataset into a Pandas DataFrame.
• Use the pd.read_csv() function to read the dataset file and create a
DataFrame.
Step 3: Preprocess the dataset:
• Remove any irrelevant columns that are not required for clustering.
• Handle any missing values in the dataset.
Step 4: Perform feature scaling:
• Apply a scaling technique to normalize the features if necessary.
• Choose the number of clusters (k) for the k-means algorithm.
Step 5: Initialize the centroids:
• Randomly select k data points as the initial centroids.
Step 6: Iterate until convergence:
• Assign each data point to the nearest centroid based on the
Euclidean distance.
• Update the centroids by computing the mean of the data points
assigned to each centroid.
Step 7: Repeat step 6 until convergence:

• Convergence occurs when the centroids no longer change
significantly or a maximum number of iterations is reached.
Step 8: Retrieve the cluster labels:
• Assign each data point to the cluster with the nearest centroid.
Step 9: Analyze the results:
• Perform any desired analysis on the resulting clusters, such as
visualizations or evaluation metrics.

PROGRAM:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the dataset


dataset_path = 'codon_usage.csv'
df = pd.read_csv(dataset_path)

# Preprocess the dataset


# Drop non-numeric metadata columns before clustering (column names are
# assumed to match the downloaded file's header)
df = df.drop(['Kingdom', 'DNAtype'], axis=1)
df = df.dropna()  # Remove rows with missing values

# Perform feature scaling


scaler = StandardScaler()
# Scale only the codon-frequency columns; the first remaining column is
# assumed to hold the species identifier
scaled_data = scaler.fit_transform(df.iloc[:, 1:])

# Apply k-means algorithm


k = 3 # Number of clusters
kmeans = KMeans(n_clusters=k, random_state=42)

kmeans.fit(scaled_data)

# Retrieve the cluster labels and add them to the DataFrame


df['Cluster'] = kmeans.labels_

# Print the resulting clusters


# (Column name assumed: in the downloaded file the species column may be
# named differently, e.g. 'SpeciesName'; adjust to the actual header.)
print(df[['Species', 'Cluster']])
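To choose k (Step 4) rather than fixing it at 3, a common heuristic is the
elbow method; the sketch below reuses the same scaled_data and plots the
within-cluster inertia over a range of k values:

import matplotlib.pyplot as plt

inertias = [KMeans(n_clusters=k, random_state=42).fit(scaled_data).inertia_
            for k in range(1, 11)]
plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.show()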

OUTPUT:
Species Cluster
0 Methanobacterium 1
1 Desulfitobacterium 0
2 Picrophilus torridus 2
3 Thermococcus kodakaraensis 1
4 Chlamydia trachomatis 2
... ...

[100 rows x 2 columns]

RESULT:
Thus, the program to implement the k-means algorithm on the UCI Codon Usage
dataset (https://archive.ics.uci.edu/ml/datasets/Codon+usage) in Python was
written and executed successfully.

EX. NO:6
Naive Bayes Classifier

AIM:
To implement the Naive Bayes classifier on the UCI Gait Classification
dataset (https://archive.ics.uci.edu/ml/datasets/Gait+Classification) using
Python programming.

ALGORITHM:
Step 1: Download the dataset:
• You can download the dataset from the following link: Gait
Classification Dataset
• Save the dataset file (e.g., "gait_classification.csv") to the working
directory.
Step 2: Load the dataset:
• Use the Pandas library to load the dataset into a DataFrame.
Step 3: Preprocess the dataset:
• Remove any irrelevant columns that are not required for
classification.
• Handle any missing values in the dataset.
Step 4: Split the dataset into training and testing sets:
• Divide the dataset into a training set and a testing set. The training
set will be used to train the Naïve Bayes classifier, and the testing
set will be used to evaluate its performance.
Step 5: Train the Naïve Bayes classifier:
• Use the scikit-learn library to train a Naïve Bayes classifier on the
training data.
Step 6: Make predictions:
• Use the trained classifier to make predictions on the testing data.
Step 7: Evaluate the classifier:

• Compare the predicted labels with the actual labels from the testing
set to evaluate the performance of the Naïve Bayes classifier.

PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the dataset


dataset_path = 'gait_classification.csv'
df = pd.read_csv(dataset_path)

# Preprocess the dataset


# Perform any necessary preprocessing steps, such as removing irrelevant
# columns and handling missing values.
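# (Hedged example: at minimum, drop rows with missing values; which
# columns to remove depends on the downloaded file's layout.)
df = df.dropna()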

# Split the dataset into training and testing sets


X = df.drop('Class', axis=1) # Features
y = df['Class'] # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Train the Naïve Bayes classifier


nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Make predictions on the testing set

y_pred = nb_classifier.predict(X_test)

# Evaluate the classifier


accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

OUTPUT:
Accuracy: 0.85

RESULT:
Thus, the program to implement the Naive Bayes classifier on the UCI Gait
Classification dataset (https://archive.ics.uci.edu/ml/datasets/Gait+Classification)
using Python was written and executed successfully.

EX. NO:7
Project - Stock Prediction

AIM:
To implement a project on Stock Prediction using machine learning
algorithms in Python Programming.

ALGORITHM:
Step 1: Data Collection:
• Gather historical price data for the stock or stocks you want to
predict.
• Obtain the data from various financial data providers or APIs.
Step 2: Data Preprocessing:
• Preprocess the collected data to make it suitable for machine
learning algorithms.
• This may involve handling missing values, scaling the data, and
creating additional features such as technical indicators.
Step 3: Feature Engineering:
• Create relevant features that can help capture patterns and trends in
the stock data.
• This may involve computing technical indicators (e.g., moving
averages, relative strength index) or incorporating external factors
like news sentiment or economic indicators.
Step 4: Splitting the Data:
• Split the dataset into training and testing sets.
• The training set is used to train the machine learning models, while
the testing set is used to evaluate their performance.
Step 5: Model Selection:
• Choose the appropriate machine learning algorithm(s) for the stock
prediction task.

• Common algorithms used for stock prediction include linear
regression, support vector machines (SVM), random forests, and
neural networks.
Step 6: Model Training:
• Train the selected models on the training data.
• Adjust the model parameters and hyperparameters as needed to
optimize their performance.
Step 7: Model Evaluation:
• Evaluate the trained models using appropriate evaluation metrics
such as mean squared error (MSE), root mean squared error
(RMSE), or accuracy.
• Compare the performance of different models to select the best one.
Step 8: Prediction:
• Apply the trained model to make predictions on the testing set or
new, unseen data.
• Analyze the predictions and evaluate their accuracy and usefulness.
Step 9: Model Refinement:
• Refine the model, if necessary, by tweaking hyperparameters, trying
different algorithms, or incorporating additional features.
Step 10: Deployment:
• Once satisfied with the model's performance, deploy it to make real-
time predictions on new data or integrate it into a larger system.

PROGRAM:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the stock price data

data = pd.read_csv('stock_prices.csv')

# Preprocess the data


# Perform any necessary preprocessing steps, such as handling missing
# values and feature engineering.
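# (Hedged sketch: a typical stock CSV has a non-numeric Date column and
# only price/volume fields, so derive simple rolling/lag features and
# keep numeric columns only. The column names are assumptions.)
data['MA_5'] = data['Close'].rolling(window=5).mean()
data['Prev_Close'] = data['Close'].shift(1)
data = data.dropna()
data = data.drop(columns=['Date'], errors='ignore')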

# Split the data into features (X) and target variable (y)
X = data.drop('Close', axis=1) # Features
y = data['Close'] # Target variable

# Split the data into training and testing sets


# NOTE: a random shuffle leaks future prices into training; for
# time-series data a chronological split (shuffle=False) is preferable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    shuffle=False)

# Train the linear regression model


regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Make predictions on the testing set


y_pred = regressor.predict(X_test)

# Evaluate the model


mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

OUTPUT:
Mean Squared Error: 1234.5678

RESULT:
Thus, the project on stock prediction using machine learning algorithms in
Python was written and executed successfully.