Regression Analysis: Linear, Multiple, and Logistic
Learning Linear, Multiple, and Logistic Regression with Examples
Introduction to Regression
• Regression analysis is a set of statistical
methods used to estimate relationships
between a dependent variable and one or
more independent variables.
• It's used for predictive modeling and
identifying trends.
Linear Regression
• Linear regression predicts the value of a
dependent variable based on the value of one
independent variable.
• Equation: Y = β0 + β1X + ε
• Example: Predicting house price based on size.
Example: Linear Regression
• In predicting house price:
• Y (Price) = β0 + β1 (Size) + ε
• For every unit increase in house size, the predicted price increases by β1.
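The slide above can be sketched directly with the closed-form least-squares estimates, β1 = cov(X, Y) / var(X) and β0 = Ȳ − β1·X̄. The sizes and prices below are made-up illustrative numbers, not real housing data.

```python
# Minimal sketch of simple linear regression via closed-form least squares.
def fit_simple_linear(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # beta1 = cov(x, y) / var(x); beta0 = mean_y - beta1 * mean_x
    beta1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

sizes = [50, 70, 90, 110, 130]      # square metres (hypothetical)
prices = [150, 200, 250, 300, 350]  # in thousands (hypothetical)
b0, b1 = fit_simple_linear(sizes, prices)
print(b0, b1)  # slope b1 = predicted price increase per extra square metre
```

With these numbers the data lie exactly on a line, so the fit recovers it perfectly; real data would leave nonzero residuals ε.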
Assumptions of Linear Regression
• 1. Linearity: Relationship between independent
and dependent variables is linear.
• 2. Independence: Observations are
independent.
• 3. Homoscedasticity: Constant variance of
residuals.
• 4. Normality: Residuals are normally
distributed.
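Two of these assumptions can be eyeballed from the residuals. A rough sketch, using a hypothetical fitted line price = 25 + 2.5 × size and made-up observed prices:

```python
# Crude residual diagnostics for a (hypothetical) fitted line.
sizes = [50, 70, 90, 110, 130]
prices = [148, 205, 248, 304, 349]
residuals = [p - (25 + 2.5 * s) for s, p in zip(sizes, prices)]

# Residuals should centre near zero if the linear form is adequate.
mean_resid = sum(residuals) / len(residuals)

# Homoscedasticity (crude check): residual spread should look similar
# at low and high values of the predictor.
spread_low = max(abs(r) for r in residuals[:2])
spread_high = max(abs(r) for r in residuals[-2:])
print(mean_resid, spread_low, spread_high)
```

In practice these checks are done graphically (residual-vs-fitted plots, Q-Q plots); the code is only meant to show what is being compared.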
Multiple Regression
• Multiple regression uses two or more
independent variables to predict the
dependent variable.
• Equation: Y = β0 + β1X1 + β2X2 + ... + βnXn + ε
• Example: Predicting house price based on size,
number of rooms, and location.
Example: Multiple Regression
• Predicting house price:
• Y (Price) = β0 + β1(Size) + β2(Bedrooms) + β3(Location) + ε
• For each unit increase in Size, price increases by β1, holding other factors constant.
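A minimal sketch of fitting such a model, assuming NumPy is available. The design matrix X gets a leading column of ones for the intercept β0; the sizes, bedroom counts, and prices are fabricated so the data fit the plane y = 20 + 2·Size + 10·Bedrooms exactly.

```python
import numpy as np

# Hypothetical data: price explained by size and number of bedrooms.
X = np.array([
    [1, 50, 1],   # column of ones carries the intercept beta0
    [1, 70, 2],
    [1, 90, 2],
    [1, 110, 3],
    [1, 130, 4],
], dtype=float)
y = np.array([130, 180, 220, 270, 320], dtype=float)

# Least-squares solution of y = X @ beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [beta0, beta_size, beta_bedrooms]
```

A categorical predictor like Location would first need numeric encoding (e.g. dummy variables) before it could enter X.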
Assumptions of Multiple Regression
• 1. Linearity: Relationship between dependent
and independent variables is linear.
• 2. No Multicollinearity: Independent variables
should not be highly correlated.
• 3. Independence of errors.
• 4. Homoscedasticity and normality of
residuals.
Logistic Regression
• Logistic regression is used to predict binary or
categorical outcomes.
• Equation: log(p/(1-p)) = β0 + β1X
• Where p is the probability of the event
occurring.
Example: Logistic Regression
• Predicting if a student will pass or fail based
on study hours:
• log(p/(1-p)) = β0 + β1 (Study Hours)
• p is the probability of passing.
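Solving the log-odds equation for p gives p = 1 / (1 + e^-(β0 + β1·hours)). A sketch with assumed coefficients β0 = −4.0 and β1 = 1.5 (hypothetical values, not fitted from real data):

```python
import math

def prob_pass(hours, beta0=-4.0, beta1=1.5):
    # log(p / (1 - p)) = beta0 + beta1 * hours, so p is the sigmoid of it
    z = beta0 + beta1 * hours
    return 1 / (1 + math.exp(-z))

for hours in [1, 2, 3, 4]:
    p = prob_pass(hours)
    print(hours, round(p, 3), "pass" if p > 0.5 else "fail")
```

With these coefficients, the predicted probability of passing crosses 0.5 somewhere between 2 and 3 study hours.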
Differences Between Linear, Multiple, and
Logistic Regression
• 1. Linear regression predicts continuous values; logistic regression predicts categorical outcomes.
• 2. Linear and multiple use least squares;
logistic uses maximum likelihood.
• 3. Logistic regression outputs probabilities,
while linear regression outputs a direct
prediction.
Introduction to Multiple Regression
• Definition: A statistical technique that uses
multiple independent variables to predict the
value of a dependent variable.
• Multiple regression is an extension of linear
regression. It models the relationship between
two or more independent variables and a
single dependent variable. The idea is to
understand how multiple factors influence the
target variable.
• Multicollinearity: When two or more independent
variables are highly correlated, it can distort the
coefficient estimates in the regression.
• R-squared (R²): A statistical measure of how close the data
are to the fitted regression line. It tells how well the
independent variables explain the variability in the
dependent variable.
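R² is computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the total sum of squares around the mean. A quick sketch on made-up actual and predicted values:

```python
# R-squared = 1 - SS_res / SS_tot, on hypothetical values.
y_true = [150, 205, 250, 310, 360]
y_pred = [152, 200, 255, 305, 363]

mean_y = sum(y_true) / len(y_true)
ss_tot = sum((y - mean_y) ** 2 for y in y_true)                      # total variability
ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))       # unexplained variability
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # close to 1: predictions track the data well
```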
• Equation: Y = β0 + β1X1 + β2X2 + ... + βnXn + ε
• Where:
• Y = dependent variable; X1...Xn = independent variables; β0...βn = coefficients
• ε = Error term (captures the difference between actual and predicted values)
• In a study to predict student exam scores,
multiple regression could be used to analyze
how factors like study hours, attendance, and
participation rates affect scores.
Logistic Regression in Detail
• Logistic regression is a statistical method used to predict
binary outcomes (1/0, True/False, Yes/No) by modeling the
probability of an event occurring.
• Unlike linear regression, logistic regression deals with
categorical (usually binary) outcomes rather than
continuous variables.
• Key Concepts:
• Binary Logistic Regression: Used for binary outcomes.
• Multinomial Logistic Regression: Used for multi-class
classification.
• Sigmoid Function: Logistic regression uses the sigmoid
function to transform linear predictions into probabilities.
• The key concept behind logistic regression is
transforming a linear equation's output (which
could be any real number) into a probability
between 0 and 1 using the logistic (sigmoid)
function.
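The sigmoid squashes any real number into the open interval (0, 1), which is what lets a linear score be read as a probability:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Any real input maps into (0, 1); z = 0 maps to exactly 0.5.
print(sigmoid(-10), sigmoid(0), sigmoid(10))
```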
Logistic Regression Formula
• The logistic regression model can be expressed as:
• p = 1 / (1 + e^-(β0 + β1X))
• Equivalently, log(p/(1-p)) = β0 + β1X.
Example Problem
• Scenario:
• You want to predict whether a student will
pass (1) or fail (0) an exam based on the
number of hours they study.
Steps to Apply Logistic Regression:
• Fit the Model: Estimate β0 and β1 from the training data using maximum likelihood.
• Make Predictions: To predict whether a student will pass or fail, input the hours studied into the fitted model to obtain the probability p.
• Decision Threshold:
• In most cases, you will use a threshold of 0.5
to classify the outcomes. If p>0.5, predict 1
(pass); otherwise, predict 0 (fail).
• The threshold can be adjusted depending on
the problem’s needs.
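Thresholding is the final step that turns a probability into a class label. A sketch, where p = 0.6 stands in for a model's predicted probability for one student:

```python
def classify(p, threshold=0.5):
    # Predict 1 (pass) when the probability clears the threshold.
    return 1 if p > threshold else 0

p = 0.6
print(classify(p))        # default threshold 0.5 -> 1 (pass)
print(classify(p, 0.7))   # stricter threshold   -> 0 (fail)
```

Raising the threshold trades recall for precision, which is why it is tuned to the problem rather than fixed at 0.5.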
Model Evaluation
• Common evaluation metrics for logistic regression include accuracy, precision, recall, and the ROC curve (AUC).
Key Properties of Logistic Regression:
• Interpretability: The coefficients βi represent the
log-odds of the outcome. They can give insight into
how each feature influences the probability of the
outcome.
• Assumptions: Logistic regression assumes linearity
between the independent variables and the log-
odds of the dependent variable.
• No Collinearity: Highly correlated features can distort the coefficient estimates, so features should ideally be independent.
• Logistic regression is a simple yet powerful
technique for binary classification problems.
• Its interpretability and efficiency make it a popular choice in fields such as finance, medicine, and marketing.
Conclusion
• Regression techniques are fundamental in
machine learning for predictive modeling and
classification tasks. Understanding the
differences and applications is essential for
building accurate models.