chp6 (10) Fam

The document provides an overview of linear regression, including types such as simple linear regression and multiple linear regression, which predict continuous variables based on independent variables. It also covers logistic regression for binary outcomes and metrics for evaluating regression models like MAE, MSE, and RMSE. Additionally, it discusses the confusion matrix and performance metrics like accuracy, precision, and recall in the context of classification problems.

linear regression

Linear regression is a fundamental supervised learning algorithm used for predicting continuous variables by modeling the relationship between dependent and independent variables.

Types of Linear Regression

1. Simple Linear Regression: Involves one independent variable predicting a dependent variable (e.g., temperature affecting pollution levels).
2. Multiple Linear Regression: Utilizes two or more independent variables to predict a dependent variable (e.g., blood pressure influenced by height, weight, and exercise).
3. Logistic Regression: Used for binary outcomes, predicting categorical dependent variables.
4. Ordinal Regression: Deals with ordered categories for the dependent variable.
5. Multinomial Logistic Regression: Handles categorical dependent variables with more than two unordered categories.
simple linear regression

What is Simple Linear Regression?
Simple linear regression is a way to understand the relationship between two things: one thing we want to predict (called the dependent variable) and one thing we use to make that prediction (called the independent variable).

How It Works

1. Equation: It uses a formula that looks like this:

   y = mx + b

   Here, y is what you want to predict, x is the independent variable, m is the slope (how much y changes when x changes), and b is where the line crosses the y-axis (the starting point).
2. Example: Imagine you want to predict someone's weight based on their height. If you have data showing how weight changes with height, you can create a line on a graph that best fits those points. This line helps you predict weight for any given height.

Key Points
• Straight Line: The relationship is represented by a straight line on a graph.
• One Predictor: It only uses one independent variable to make predictions.
• Easy to Understand: It's one of the simplest methods in statistics and machine learning.

In summary, simple linear regression helps us see how one thing affects another and allows us to make predictions based on that relationship!
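As an illustration of the line-fitting idea above, here is a minimal sketch in Python (the height/weight numbers are made up for illustration):

```python
import numpy as np

# Made-up data: heights (cm) and corresponding weights (kg)
heights = np.array([150, 160, 170, 180, 190])  # independent variable x
weights = np.array([50, 56, 64, 72, 80])       # dependent variable y

# Least-squares fit of y = m*x + b (a degree-1 polynomial)
m, b = np.polyfit(heights, weights, 1)

# Predict the weight for a new height of 175 cm
predicted = m * 175 + b
print(f"slope m = {m:.2f}, intercept b = {b:.1f}, prediction = {predicted:.1f} kg")
```

With this toy data the fitted slope is 0.76 kg per cm, and the line predicts about 68.2 kg for a height of 175 cm.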

multiple linear regression

Multiple linear regression (MLR) is a statistical method used to predict the outcome of one variable based on two or more other variables.

Key Points
• Purpose: MLR helps understand how multiple factors influence a single result, like how various factors affect a house's price (size, location, age).
• Equation: The relationship is expressed as:

  y = b0 + b1x1 + b2x2 + ... + bnxn + ε

  where y is the outcome, x1...xn are the independent variables, b0...bn are coefficients indicating the impact of each variable, and ε is the error term.
• Assumptions: It assumes a linear relationship, independence of observations, and constant variance of errors.

In summary, MLR allows for more complex predictions by considering multiple influencing factors at once.


Certainly! Below are some example problems (sums) related to Multiple Linear Regression, along with their solutions.

Example Problems

Problem 1: Predicting House Prices

You have a dataset containing information about houses, including:
• Size (in square feet)
• Number of Bedrooms
• Age of the House (in years)
• Price (in thousands)
Given the following data:

Size (sq ft)   Bedrooms   Age (years)   Price (thousands)
1500           3          10            300
1800           4          5             400
2400           4          15            500
3000           5          20            600
3500           5          8             650

Question: Build a multiple linear regression model to predict the price of a house based on size, number of bedrooms, and age. What would be the predicted price for a house that is 2000 sq ft, has 3 bedrooms, and is 12 years old?

Solution

1. Define Variables:
   • Independent Variables (X): Size, Bedrooms, Age
   • Dependent Variable (y): Price

2. Fit the Model:
   • Using Python's scikit-learn, you would fit a linear regression model to this data.

3. Make Predictions:
   • After fitting the model, you would input the values for the new house:
     • Size = 2000
     • Bedrooms = 3
     • Age = 12

4. Example Code:

python

import pandas as pd
from sklearn.linear_model import LinearRegression

# Create DataFrame (Bedrooms for the 2400 sq ft house is 4, as in the table)
data = {
    'Size': [1500, 1800, 2400, 3000, 3500],
    'Bedrooms': [3, 4, 4, 5, 5],
    'Age': [10, 5, 15, 20, 8],
    'Price': [300, 400, 500, 600, 650]
}
df = pd.DataFrame(data)

# Define X and y
X = df[['Size', 'Bedrooms', 'Age']]
y = df['Price']

# Fit the model
model = LinearRegression()
model.fit(X, y)

# Predict the price for a new house
new_house = pd.DataFrame([[2000, 3, 12]],
                         columns=['Size', 'Bedrooms', 'Age'])
predicted_price = model.predict(new_house)
print("Predicted Price:", predicted_price[0])

Problem 2: Marketing Spend Analysis

You are analyzing the effect of marketing spend on sales. You have data on:
• TV Advertising Spend
• Newspaper Advertising Spend

metrics for regression

1. Mean Absolute Error (MAE)
What It Is: MAE measures the average difference between the predicted values and the actual values. It tells you how far off your predictions are, on average.

Formula:

  MAE = (1/n) Σ |y_i - ŷ_i|   (summing over i = 1 to n)

where y_i is the actual value and ŷ_i is the predicted value.

Example:
• Actual Prices: [100, 150, 200]
• Predicted Prices: [110, 140, 210]
Calculating MAE:
• Differences: [10, 10, 10]
• MAE = (10 + 10 + 10) / 3 = 10

2. Mean Squared Error (MSE)

What It Is: MSE calculates the average of the squared differences between predicted and actual values. It gives more weight to larger errors.

Formula:

  MSE = (1/n) Σ (y_i - ŷ_i)²   (summing over i = 1 to n)

Example:
• Actual Prices: [100, 150, 200]
• Predicted Prices: [110, 140, 210]
Calculating MSE:
• Squared Differences: [10², 10², 10²] = [100, 100, 100]
• MSE = (100 + 100 + 100) / 3 = 100

3. Root Mean Squared Error (RMSE)

What It Is: RMSE is the square root of MSE. It gives you an idea of how much your predictions deviate from the actual values in the same units as the target variable.

Formula:

  RMSE = √MSE

Example:
Using the previous MSE of 100, we calculate RMSE:

• RMSE = √100 = 10

4. R-squared (R²)
What It Is: R² shows how well your model explains the variability of the target variable.
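The worked examples above (MAE, MSE, and RMSE on the same three prices) can be checked with a short NumPy snippet; the R² line uses the standard 1 − SS_res/SS_tot definition:

```python
import numpy as np

actual = np.array([100, 150, 200])
predicted = np.array([110, 140, 210])

mae = np.mean(np.abs(actual - predicted))   # (10 + 10 + 10) / 3 = 10
mse = np.mean((actual - predicted) ** 2)    # (100 + 100 + 100) / 3 = 100
rmse = np.sqrt(mse)                         # sqrt(100) = 10

# R² = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, round(r2, 2))  # 10.0 100.0 10.0 0.94
```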

logistic regression

What is Logistic Regression?
Logistic regression is a method used in machine learning to predict whether something belongs to one of two categories. For example, it can help determine if an email is "spam" or "not spam," or if a patient has a disease (yes or no).

How It Works

1. Predicting Probability: Instead of just saying "yes" or "no," logistic regression predicts the probability that something belongs to a category. This probability will be between 0 and 1. For instance, it might say there's a 70% chance that an email is spam.
2. Using a Curve: It uses a special curve called the sigmoid function to convert any number (like how many hours someone studied) into a probability between 0 and 1.

3. Example: Imagine you want to predict if a student will pass an exam based on how many hours they studied:
   • If they studied for 5 hours, the model might predict a 90% chance of passing.
   • If they studied for 1 hour, it might predict a 30% chance.

Why Use Logistic Regression?

• Easy to Understand: It's simple and straightforward.
• Fast: It runs quickly and doesn't require a lot of computer power.
• Useful for Decision-Making: Since it gives probabilities, you can make informed decisions based on the likelihood of different outcomes.

In summary, logistic regression helps us classify things into two groups by predicting probabilities, making it a valuable tool in many fields like healthcare, finance, and marketing!
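As a minimal sketch of the hours-studied example (the data points are made up for illustration, so the exact probabilities will differ from the 90%/30% figures mentioned above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: hours studied vs. exam outcome (1 = pass, 0 = fail)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [4.0], [5.0], [6.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# The sigmoid turns the model's linear score into a probability between 0 and 1
prob_pass = model.predict_proba([[5.0]])[0][1]
print(f"P(pass | 5 hours studied) = {prob_pass:.2f}")
```

A student at 5 hours lands well above the decision boundary, so the predicted probability of passing is high; at 1 hour it is low.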

all imp

1. Confusion Matrix
A confusion matrix is a table that shows how well your model predicted the classes. It contains four key components:

                   Predicted Positive      Predicted Negative
Actual Positive    True Positives (TP)     False Negatives (FN)
Actual Negative    False Positives (FP)    True Negatives (TN)

Example:
• True Positives (TP): Correctly predicted positives (e.g., spam emails correctly marked as spam).
• True Negatives (TN): Correctly predicted negatives (e.g., non-spam emails correctly marked as non-spam).
" False Positives (FP): Incorrectly


predictedpositives (e.g., non-spam
emails marked as spam).
" False Negatives (FN): Incorrectly
predicted negatives (e.g., spam emails
marked as non-spam).

2. Accuracy
Accuracy measures how often the model is correct.

Formula:

  Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example:
If you have 80 correct predictions out of 100 total predictions, the accuracy is:

  Accuracy = 80 / 100 = 0.8 (or 80%)

3. Precision

Precision tells you how many of the predicted positive cases were actually positive.

Formula:

  Precision = TP / (TP + FP)

Example:
If your model predicts 10 emails as spam and only 7 are actually spam, the precision is:

  Precision = 7 / (7 + 3) = 7/10 = 0.7 (or 70%)

4. Recall (Sensitivity)
Recall measures how many actual positive cases were correctly identified.

Formula:

  Recall = TP / (TP + FN)

Example:
If there are 10 actual spam emails and your model identifies 7 of them, the recall is:

  Recall = 7 / (7 + 3) = 7/10 = 0.7 (or 70%)

5. F1 Score

The F1 score combines precision and recall into a single metric.

Formula:

  F1 = 2 × (Precision × Recall) / (Precision + Recall)

Example:
If precision is 70% and recall is also 70%, the F1 score would be:

  F1 = 2 × (0.7 × 0.7) / (0.7 + 0.7) = 2 × (0.49 / 1.4) = 0.7
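scikit-learn computes all of the metrics above directly; the labels below are a made-up spam example constructed so that TP = 7, FP = 3, FN = 3, TN = 7, matching the 70% figures in the worked examples:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

# 1 = spam, 0 = not spam (made-up labels giving TP=7, FN=3, FP=3, TN=7)
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                             # 7 3 3 7
print(accuracy_score(y_true, y_pred))             # (7 + 7) / 20 = 0.7
print(precision_score(y_true, y_pred))            # 7 / (7 + 3) = 0.7
print(recall_score(y_true, y_pred))               # 7 / (7 + 3) = 0.7
print(round(f1_score(y_true, y_pred), 2))         # 0.7
```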

6. ROC Curve and AUC

The ROC curve shows the trade-off between the true positive rate (recall) and the false positive rate at different thresholds.
• True Positive Rate (TPR): Also known as recall.
• False Positive Rate (FPR):

  FPR = FP / (FP + TN)

AUC (Area Under the Curve)

AUC measures how well the model distinguishes between classes. A higher AUC indicates better performance.

Example:
An AUC of 0.8 means there's an 80% chance that the model will rank a randomly chosen positive instance higher than a randomly chosen negative one.

These metrics help you evaluate how well your classification model performs and guide improvements!
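A sketch of AUC with scikit-learn (the labels and scores are invented; by the pairwise-ranking interpretation above, 8 of the 9 positive/negative pairs are ranked correctly, so AUC = 8/9 ≈ 0.89):

```python
from sklearn.metrics import roc_auc_score

# Made-up true labels and predicted probabilities for the positive class
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.4, 0.2, 0.35, 0.8, 0.9]

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # AUC = 0.89
```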


Feature                  Overfitting                                Underfitting
Definition               Learns too much, including noise.          Learns too little; misses patterns.
Training Accuracy        Very high.                                 Low.
Test Accuracy            Poor on new data.                          Poor on both training and new data.
Model Complexity         Too complex (too many details).            Too simple (not enough details).
Example                  Memorizing answers.                        Missing obvious trends.
Solution                 Simplify the model.                        Make the model more complex.
Bias/Variance            Low bias, high variance.                   High bias, low variance.
Common Causes            Too many features, excessive training.     Not enough features, overly simple model.
Impact on Predictions    Specific to training data only.            General and inaccurate predictions.
Real-World Example       A student who memorizes facts but          A student who doesn't study enough and
                         can't apply them.                          fails to understand the subject.

Key Points:
1. Overfitting captures noise; underfitting misses key information.
2. Overfitting shows high training performance but poor test performance; underfitting shows low performance on both.
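The table can be made concrete with a small sketch: fitting the same noisy data with a straight line versus a much more flexible polynomial (the data is synthetic; the flexible model always achieves lower training error, which is the overfitting signature):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, 10)  # true linear trend + noise
x_test = np.linspace(0.03, 0.97, 50)
y_test = 2 * x_test                              # noise-free ground truth

def errors(degree):
    # Fit a polynomial of the given degree, return (train MSE, test MSE)
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

# A straight line (matches the true trend) vs. a very flexible polynomial
for degree in (1, 6):
    train, test = errors(degree)
    print(f"degree {degree}: train MSE = {train:.4f}, test MSE = {test:.4f}")
# The degree-6 fit bends to the noise, so its training error is lower;
# its test error is typically higher, matching the table's pattern.
```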