
Machine Learning

Module1_2
Regression

Dr. Abhishek Bhatt

[email protected]
Topics to be covered

Regression: Linear Regression, Multivariate Regression, Subset Selection, Shrinkage Methods, Principal Component Regression, Logistic Regression, Partial Least Squares
What is Regression?

Regression analysis is one of the most important fields in statistics and machine learning. There are many regression methods available; linear regression is one of them.

Regression searches for relationships among variables.

For example, you can observe several employees of some company and try to understand how their salaries depend on features such as experience, level of education, role, the city they work in, and so on.
Linear Regression
• Linear Regression is a supervised machine learning
algorithm.
• It tries to find out the best linear relationship that describes
the data you have.
• It assumes that there exists a linear relationship between a
dependent variable and independent variable(s).
• The value of the dependent variable of a linear regression
model is a continuous value i.e. real numbers.

When implementing linear regression of some dependent variable 𝑦 on the set of independent variables 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of predictors, you assume a linear relationship between 𝑦 and 𝐱.
Simple Linear Regression
Simple or single-variate linear regression is the simplest case of linear regression
with a single independent variable, 𝐱 = x.

The estimated regression function (red line) has the equation:

𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥

Our goal is to calculate the optimal values of the predicted weights 𝑏₀ and 𝑏₁ that minimize
the sum of squared residuals (SSR) and determine the estimated regression function.
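
As a minimal illustration (not from the slides), the sketch below fits a single-variate linear regression with scikit-learn on made-up data; the numbers and variable names are assumptions chosen only for demonstration.

```python
# Minimal sketch: single-variate linear regression with scikit-learn.
# The data points below are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)  # one predictor, shape (n, 1)
y = np.array([5, 20, 14, 32, 22, 38])                  # continuous response

model = LinearRegression().fit(x, y)                   # minimizes the SSR internally

print("b0 (intercept):", model.intercept_)
print("b1 (slope):", model.coef_[0])
print("fitted values:", model.predict(x))
```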
Linear Re(lationship)
You assume a linear relationship between 𝑦 and 𝐱:

𝑓(𝑥) = 𝛽₀ + ∑ⱼ₌₁ᵖ 𝛽ⱼ𝑥ⱼ

i.e., E(𝑦 | 𝑥) is linear in 𝑥.

Where, 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀.

This equation is the regression equation.

Where,

𝛽₀, 𝛽₁, …, 𝛽ᵣ are the regression coefficients, and

𝜀 is the random error.


Ordinary least squares minimizes the residual sum of squares (RSS). Here the cost function is ∑ᵢ (𝑦ᵢ − ŷᵢ)², which is minimized to find the values of 𝛽₀ and 𝛽₁ that give the best fit of the predicted line (see the sketch below).
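
As a hedged sketch of the closed-form least-squares solution for one predictor (standard textbook formulas, with made-up data), assuming NumPy is available:

```python
# Minimal sketch: ordinary least squares for one predictor via the
# closed-form formulas  b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,  b0 = ȳ - b1·x̄.
import numpy as np

x = np.array([5.0, 15, 25, 35, 45, 55])
y = np.array([5.0, 20, 14, 32, 22, 38])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_pred = b0 + b1 * x
rss = np.sum((y - y_pred) ** 2)   # the residual sum of squares being minimized
print("b0:", b0, "b1:", b1, "RSS:", rss)
```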
Multiple Linear Regression
Multiple or multivariate linear regression is a case of linear
regression with two or more independent variables.

If there are just two independent variables, the estimated regression function is:

𝑓(𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁𝑥₁ + 𝑏₂𝑥₂

It represents a regression plane in three-dimensional space.

The goal of regression is to determine the values of the weights 𝑏₀, 𝑏₁, and 𝑏₂ such that this plane is as close as possible to the actual responses and yields the minimal SSR.
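
A minimal sketch of fitting this two-predictor case with scikit-learn; the data values are illustrative assumptions:

```python
# Minimal sketch: multiple linear regression with two made-up predictors.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0, 1], [5, 1], [15, 2], [25, 5],
              [35, 11], [45, 15], [55, 34], [60, 35]])   # columns: x1, x2
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])

model = LinearRegression().fit(X, y)
print("b0:", model.intercept_)
print("b1, b2:", model.coef_)            # one weight per predictor
print("R^2 on training data:", model.score(X, y))
```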
Polynomial Regression
• You can regard polynomial regression as a generalized case of linear
regression. You assume the polynomial dependence between the
output and inputs and, consequently, the polynomial estimated
regression function.

• In other words, in addition to linear terms like 𝑏₁𝑥₁, your regression function 𝑓 can include non-linear terms such as 𝑏₂𝑥₁², 𝑏₃𝑥₁³, or even 𝑏₄𝑥₁𝑥₂, 𝑏₅𝑥₁²𝑥₂, and so on.

• The simplest example of polynomial regression has a single independent variable, and the estimated regression function is a polynomial of degree 2:

• 𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥 + 𝑏₂𝑥²
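
As a minimal sketch (assuming scikit-learn), a degree-2 polynomial regression can be fit by expanding the single feature into x and x² and then applying ordinary linear regression; the data here is made up.

```python
# Minimal sketch: degree-2 polynomial regression as linear regression on
# expanded features (x, x^2). Data is illustrative only.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([15, 11, 2, 8, 25, 32])

model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, y)

lin = model.named_steps["linearregression"]
print("b0:", lin.intercept_)
print("b1, b2:", lin.coef_)   # coefficients of x and x^2
```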
Regression Assumptions
• Linearity
Linearity can be checked with a scatter plot: the relationship should be well described by a straight line drawn through the points. A linear relationship satisfies two properties: 1. Superposition (additivity) 2. Homogeneity (scaling)
• No Heteroskedasticity
Heteroskedasticity is the presence of non-constant variance in the error terms; the error variance should be roughly constant across observations.
Regression Assumptions…..
• No Multi-collinearity
Multi-collinearity refers to a situation in which two or more
explanatory variables in a multiple regression model are highly linearly
related.

• How to identify it:
Compute pairwise Pearson correlations between the explanatory variables (a minimal sketch follows this list).

• How to fix it:
Drop one of the two correlated variables, or
combine the correlated features into a single new independent variable and drop the original correlated features.
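
A minimal sketch of the Pearson-correlation check mentioned above, using NumPy on synthetic data; the 0.9 threshold is an illustrative assumption, not a rule from the slides.

```python
# Minimal sketch: detecting multi-collinearity via pairwise Pearson correlation.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)            # correlation matrix of the columns
print(np.round(corr, 2))

threshold = 0.9                                # illustrative cut-off
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if abs(corr[i, j]) > threshold:
            print(f"columns {i} and {j} are highly correlated; consider dropping one.")
```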
Cost Function
• It is a function that measures the performance
of a Machine Learning model for given data.
Cost Function quantifies the error between
predicted values and expected values and
presents it in the form of a single real number.
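
For instance, a mean-squared-error cost can be written as a short function; this is a generic sketch, not code from the slides.

```python
# Minimal sketch: a cost function (mean squared error) that summarizes the
# gap between predicted and expected values as a single real number.
import numpy as np

def mse_cost(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

print(mse_cost([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # -> 0.375
```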
Let us talk about the weather.

It rains only if it’s a little humid and does not rain if it’s windy,
hot or freezing.

In this case, how would you train a predictive model and ensure that there are no errors in forecasting the weather?

You may say that there are many learning algorithms to choose from. They are distinct in many ways, but there is a major difference between what we expect and what the model predicts.

That is the concept of the Bias-Variance Tradeoff.


The primary aim of the Machine Learning model is to learn from the given data and
generate predictions based on the pattern observed during the learning process.

We also quantify the model’s performance using metrics like Accuracy, Mean Squared Error (MSE), F1-Score, etc., and try to improve these metrics.

A supervised Machine Learning model aims to train itself on the input variables (X) in such a way that the predicted values (Ŷ) are as close to the actual values (Y) as possible. This difference between the actual and predicted values is the error, and it is used to evaluate the model. The error for any supervised Machine Learning algorithm comprises three parts:
Bias error
Variance error
Noise (the irreducible error that we cannot eliminate)
What is Bias?
In the simplest terms, Bias is the difference between the Predicted Value and the
Expected Value. To explain further, the model makes certain assumptions when it
trains on the data provided. When it is introduced to the testing/validation data,
these assumptions may not always be correct.

Mathematically, let the input variables be X and a target variable Y. We map the
relationship between the two using a function f.

With high bias, the predictions follow an overly simple (for example, straight-line) pattern and therefore do not fit the data set accurately. Such fitting is known as Underfitting of Data. This happens when the hypothesis is too simple or linear in nature. Refer to the accompanying graph for an example of such a situation.
Variance
Variance is the variability of the model’s predictions for a given data point, which tells us the spread of our predictions. A model with high variance fits the training data with a very complex function and thus does not generalize accurately to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.
When a model has high variance, it is said to be Overfitting the Data. Overfitting means fitting the training set very accurately via a complex curve or high-order hypothesis, but this is not the solution, because the error on unseen data is high.
While training a model, variance should be kept low.
Trade Off
If the algorithm is too simple (a hypothesis with a linear equation), it will tend to have high bias and low variance, and is therefore error-prone. If the algorithm fits too complex a model (a hypothesis with a high-degree equation), it will tend to have high variance and low bias; in that case it will not perform well on new entries. There is a middle ground between these two conditions, known as the Bias-Variance Trade-off.
This trade-off in complexity is why there is a trade-off between bias and variance: an algorithm can’t be more complex and less complex at the same time. The best fit is given by a hypothesis at the trade-off point, where the total error (bias², variance, and irreducible noise combined) is smallest on the error-versus-complexity graph.
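
A minimal sketch of this trade-off (assuming scikit-learn and synthetic data): fitting polynomials of increasing degree and comparing training and test error. The exact numbers depend on the random seed; the pattern, not the values, is the point.

```python
# Minimal sketch: bias-variance trade-off seen through train vs. test error
# of polynomial models of increasing degree on synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=60)   # noisy target

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 3, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE {mean_squared_error(y_tr, model.predict(x_tr)):.3f}, "
          f"test MSE {mean_squared_error(y_te, model.predict(x_te)):.3f}")

# Degree 1: both errors high (high bias, underfitting).
# Degree 12: train error drops but test error typically rises (high variance, overfitting).
```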
1. Subset Selection

2. Shrinkage/Ridge Regression (a minimal sketch follows this list)

3. Derived Inputs
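
For item 2 (shrinkage), here is a minimal ridge-regression sketch with scikit-learn on synthetic data; the penalty strength alpha = 10 is an arbitrary illustrative choice.

```python
# Minimal sketch: ridge regression shrinks coefficients toward zero compared
# with ordinary least squares; alpha controls the strength of the penalty.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
true_coefs = np.array([2.0, 0.0, -1.0, 0.0, 3.0])
y = X @ true_coefs + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))   # pulled toward zero
```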
Evaluating an Estimator: Bias and
Variance
Tuning Model Complexity:
Bias/Variance Dilemma
The Expected Value
A Didactic Example
Bias Variance Trade-off
Underfitting/Overfitting

If there is bias, this indicates that our model class does not contain the solution; this is underfitting.

If there is variance, the model class is too general and also learns the noise; this is overfitting.

As for variance, it also depends on the size of the training set; the variability due to the sample decreases as the sample size increases.
Logistic Regression
Logistic regression is a fundamental classification
technique. It belongs to the group of linear
classifiers and is somewhat similar to polynomial
and linear regression.

Logistic regression is fast and relatively uncomplicated, and it’s convenient for you to interpret the results. Although it’s essentially a method for binary classification, it can also be applied to multiclass problems.
sigmoid function
This image shows the sigmoid function (or S-shaped curve) of some variable 𝑥.
The sigmoid function has values very close to either 0 or 1 across most of its domain.
This fact makes it suitable for application in classification methods.
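
A minimal NumPy sketch of the sigmoid function described above:

```python
# Minimal sketch: the sigmoid (logistic) function maps any real number into (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-6.0, -1.0, 0.0, 1.0, 6.0])))
# close to 0 for large negative z, exactly 0.5 at z = 0, close to 1 for large positive z
```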
Single-Variate Logistic Regression
Single-variate logistic regression is the most straightforward case of logistic
regression. There is only one independent variable (or feature), which is 𝐱 = 𝑥.
This figure illustrates single-variate logistic regression:

Here, you have a given set of input-output (or 𝑥-𝑦) pairs, represented by green circles.
These are your observations. Remember that 𝑦 can only be 0 or 1.

For example, the leftmost green circle has the input 𝑥 = 0 and the actual output 𝑦 = 0. The
rightmost observation has 𝑥 = 9 and 𝑦 = 1.
Single-Variate Logistic Regression
• Logistic regression finds the weights 𝑏₀ and 𝑏₁ that
correspond to the maximum log-likelihood function (LLF).
• These weights define the linear function
𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥,
which is the dashed black line.
• They also define the predicted probability
𝑝(𝑥) = 1 / (1 + exp(−𝑓(𝑥))),
shown here as the full black line.
In this case, the threshold 𝑝(𝑥) = 0.5 (equivalently 𝑓(𝑥) = 0) corresponds to a value of 𝑥 slightly higher than 3. This value is the boundary between inputs with predicted output 0 and inputs with predicted output 1.
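
A minimal sketch of the single-variate case with scikit-learn; the 0/1 labels over x = 0…9 are made-up data chosen to resemble the figure described above.

```python
# Minimal sketch: single-variate logistic regression on made-up 0/1 data.
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.arange(10).reshape(-1, 1)                 # inputs x = 0 .. 9
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])     # binary outputs

model = LogisticRegression().fit(x, y)

print("b0:", model.intercept_[0])
print("b1:", model.coef_[0, 0])
print("p(x):", np.round(model.predict_proba(x)[:, 1], 2))  # predicted probabilities
print("labels:", model.predict(x))                          # thresholded at p(x) = 0.5
```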
