
Green University of Bangladesh

Department of Computer Science and Engineering (CSE)


Faculty of Sciences and Engineering
Semester: (Fall, Year:2024), B.Sc. in CSE (Day)

LAB REPORT NO #01


Course Title: Machine Learning Lab
Course Code: CSE-412 Section: 213-D4

Lab Experiment Name: Write a lab report on Linear Regression and Logistic Regression.
Include the cost function differentiation and the code in the report.

Student Details

Name ID
1. Shahedul Islam 213002178

Lab Date : 06-11-2024

Submission Date : 15-11-2024
Course Teacher's Name : Farhan Mahmud

[For Teacher's use only: Don't write anything inside this box]

Lab Report Status

Marks: ………………………………… Signature: .....................

Comments: .............................................. Date: ..............................


Introduction

Logistic regression is a fundamental statistical and machine learning algorithm used for binary classification
problems. Unlike linear regression, which predicts continuous numerical values, logistic regression predicts the
probability of an outcome belonging to one of two classes. For example, it can be used to determine whether a
patient has a specific disease (1) or not (0) based on their medical features.

In this lab, we implemented both algorithms: Linear Regression to predict car prices from attributes such as mileage, engine volume, and vehicle age, and Logistic Regression to classify individuals as diabetic or non-diabetic based on a medical dataset. The diabetes dataset includes features such as glucose level, blood pressure, BMI, and insulin level, along with the target variable (Outcome), which indicates whether the patient has diabetes.

Objective

The objective of this lab is to:

1. Implement Linear Regression and Logistic Regression models.


2. Understand the mathematical foundations of these algorithms, including the cost function and its
differentiation.
3. Analyze datasets using Python.

1. Linear Regression

Mathematical Foundation

Linear Regression aims to model the relationship between a dependent variable y and one or more independent
variables X. The model is expressed as:

h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n

where θ_0 is the intercept and θ_1, ..., θ_n are the coefficients (weights) of the n features.

Cost Function

The cost function for Linear Regression is the Mean Squared Error (MSE):

J(θ) = (1 / (2m)) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) )²

Where:

• h_θ(x^(i)) is the hypothesis function evaluated on the i-th sample.
• y^(i) is the actual value of the i-th sample.
• m is the number of samples.

Gradient Descent

To minimize the cost function, we differentiate J(θ) with respect to each parameter θ_j:

∂J(θ) / ∂θ_j = (1/m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) ) · x_j^(i)

Update rule (repeated until convergence, with learning rate α):

θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) ) · x_j^(i)
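
To make the update rule concrete, the following is a minimal NumPy sketch of batch gradient descent (separate from the scikit-learn implementation used below; the learning rate, iteration count, and toy data are illustrative choices, not values from the experiment):

import numpy as np

def gradient_descent(X, y, alpha=0.05, iterations=5000):
    """Batch gradient descent for linear regression with an intercept term."""
    m = len(y)
    Xb = np.c_[np.ones(m), X]            # prepend a column of 1s for theta_0
    theta = np.zeros(Xb.shape[1])        # initialize all parameters to zero
    for _ in range(iterations):
        error = Xb @ theta - y           # h_theta(x) - y for every sample
        gradient = (Xb.T @ error) / m    # dJ/d(theta) from the formula above
        theta -= alpha * gradient        # simultaneous update of all parameters
    return theta

# Toy example: data generated from y = 1 + 2x, so theta should approach [1, 2]
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])
print(gradient_descent(X_toy, y_toy))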

Implementation

Dataset

We use the dataset provided, focusing on the Price (dependent variable) and other attributes as independent
variables.

# Importing necessary libraries


import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Dataset preparation (pd.read_csv already returns a DataFrame)
df = pd.read_csv('Car_Raw_Data.csv')

# Checking for missing values


print("Missing values in each column before handling:")
print(df.isnull().sum())

# Handling missing values by imputing them with suitable values
# (column assignment is preferred over the deprecated fillna(..., inplace=True) on a column)
df['Mileage'] = df['Mileage'].fillna(df['Mileage'].mean())
df['EngineV'] = df['EngineV'].fillna(df['EngineV'].median())
df['Price'] = df['Price'].fillna(df['Price'].mean())
df['Year'] = df['Year'].fillna(df['Year'].median())

# Verifying that there are no missing values


print("\nMissing values in each column after handling:")
print(df.isnull().sum())

# Preprocessing
df['Age'] = 2024 - df['Year'] # Calculate the age of the car
X = df[['Mileage', 'EngineV', 'Age']] # Independent variables
y = df['Price'] # Dependent variable

# Splitting dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression Model


model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Results
print("\nModel Coefficients:")
print(model.coef_)
print(f"Intercept: {model.intercept_}")

# Plotting actual vs predicted prices


plt.scatter(y_test, y_pred, alpha=0.7)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Prices')
plt.show()

OUTPUT: (the script prints the model coefficients and intercept, then displays a scatter plot of actual vs. predicted prices)
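
As a supplementary check (not part of the original script), the MSE cost discussed above can also be reported on the test set; this sketch assumes the y_test and y_pred variables from the code above are still in scope:

from sklearn.metrics import mean_squared_error, r2_score

# Test-set error: MSE corresponds to the cost function J(theta) up to the 1/2 factor
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse:.2f}")
print(f"R^2 score: {r2:.2f}")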

2. Logistic Regression

Mathematical Foundation

Logistic Regression is used for classification tasks. It uses the sigmoid function to map the linear combination of the inputs to a probability between 0 and 1:

h_θ(x) = 1 / (1 + e^(−θᵀx)), where θᵀx = θ_0 + θ_1 x_1 + ... + θ_n x_n

Cost Function

The cost function for logistic regression is the binary cross-entropy (log loss):

J(θ) = −(1/m) · Σ_{i=1}^{m} [ y^(i) · log( h_θ(x^(i)) ) + (1 − y^(i)) · log( 1 − h_θ(x^(i)) ) ]

Gradient Descent

Differentiating J(θ) with respect to θ_j gives the same form as in Linear Regression, but with the sigmoid hypothesis:

∂J(θ) / ∂θ_j = (1/m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) ) · x_j^(i)

and each parameter is updated as θ_j := θ_j − α · ∂J(θ) / ∂θ_j.
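
For illustration, a minimal from-scratch NumPy sketch of these formulas is shown below (separate from the scikit-learn model used in the implementation; the function names and default learning rate are illustrative choices):

import numpy as np

def sigmoid(z):
    """Sigmoid function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Binary cross-entropy cost J(theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def gradient_step(theta, X, y, alpha=0.1):
    """One gradient-descent update using dJ/d(theta_j) = (1/m) * sum((h - y) * x_j)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return theta - alpha * (X.T @ (h - y)) / m

Repeating gradient_step until the cost stops decreasing yields a basic trained classifier; scikit-learn's LogisticRegression relies on more advanced solvers but follows the same objective.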

Implementation
Dataset
We use the dataset for predicting Outcome (dependent variable) based on independent variables.
# Importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt

# Dataset preparation (pd.read_csv already returns a DataFrame)
df = pd.read_csv('diabetes.csv')

# Checking for missing values


print("Missing values in each column before handling:")
print(df.isnull().sum())

# Handling missing values by imputing suitable values
# (column assignment is preferred over the deprecated fillna(..., inplace=True) on a column)
df['Glucose'] = df['Glucose'].fillna(df['Glucose'].median())
df['BloodPressure'] = df['BloodPressure'].fillna(df['BloodPressure'].median())
df['SkinThickness'] = df['SkinThickness'].fillna(df['SkinThickness'].median())
df['Insulin'] = df['Insulin'].fillna(df['Insulin'].median())
df['BMI'] = df['BMI'].fillna(df['BMI'].median())

# Verifying that there are no missing values


print("\nMissing values in each column after handling:")
print(df.isnull().sum())

# Separating features and target variable


X = df.drop('Outcome', axis=1) # Independent variables
y = df['Outcome'] # Target variable

# Splitting the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression Model


model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Making predictions
y_pred = model.predict(X_test)

# Evaluation Metrics
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Displaying results
print(f"\nAccuracy: {accuracy:.2f}")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)

# Visualization of the Confusion Matrix


plt.matshow(conf_matrix, cmap='Blues')
plt.title('Confusion Matrix')
plt.colorbar()
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

OUTPUT: (the script prints the accuracy, confusion matrix, and classification report, then displays the confusion matrix as a heatmap)
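
Since the report emphasizes that logistic regression outputs probabilities, a small follow-up sketch (assuming the fitted model and X_test from the script above) shows how those probabilities can be inspected directly:

# Probability that each test sample belongs to class 1 (diabetic)
probs = model.predict_proba(X_test)[:, 1]
print(probs[:5])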

Discussion

The linear regression experiment showed how a continuous target (car price) can be modeled from numeric attributes such as mileage, engine volume, and age, while the logistic regression implementation provided a practical introduction to classification modeling. In both cases the pipeline, from data preprocessing to evaluation, reflects standard practice, including missing-value imputation and performance measurement. Further steps such as hyperparameter tuning, feature scaling, and handling class imbalance could enhance the classifier's predictive capability, as sketched below.
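
A minimal sketch of those improvements, assuming the X_train and y_train variables from the logistic regression script are available; the regularization grid and scoring metric are illustrative choices, not values prescribed by the lab:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Scale the features, weight classes to counter imbalance, and tune the
# regularization strength C with 5-fold cross-validation on the F1 score.
pipeline = make_pipeline(StandardScaler(),
                         LogisticRegression(max_iter=1000, class_weight='balanced'))
param_grid = {'logisticregression__C': [0.01, 0.1, 1, 10]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring='f1')
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", search.best_score_)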
