3 - Linear Regression-Least Square Error Fit
Regression is a statistical method used to find the relationship between one dependent (output) feature and one or more independent (input) features. It attempts to model the output feature as a function of the input features by fitting an equation to the observed data. Regression methods are commonly categorized as:
• Simple Regression
  - Simple Linear Regression
  - Simple Non-Linear Regression
• Multiple Regression
  - Multiple Linear Regression
  - Multiple Non-Linear Regression
Linear Regression
Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, it assumes that y can be calculated from a linear combination of the input variables (x).
Example: A plot of a population over time shows that there seems to be a relationship between time and population growth, but that it is a nonlinear relationship, requiring the use of a nonlinear regression model. A logistic population growth model can provide estimates of the population for periods that were not measured, and predictions of future population growth.
Linear Regression
Nonlinear regression and linear regression models are similar in the sense that both seek to model a response variable as a function of one or more input variables. Nonlinear models are more complicated to develop than linear models because the function is created through a series of approximations (iterations) that may stem from trial and error.
Mathematicians use several established methods, such as the Gauss-Newton method and the
Levenberg-Marquardt method.
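As a sketch of how such an iterative fit looks in practice, the snippet below fits a logistic growth curve with scipy's curve_fit, which uses the Levenberg-Marquardt algorithm by default for unconstrained problems. The data here is synthetic, generated inside the script purely to illustrate the procedure.

```python
# A minimal sketch of nonlinear regression for the logistic growth example,
# assuming numpy and scipy are available.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Logistic growth: carrying capacity K, growth rate r, midpoint t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)
# Synthetic observations: true curve plus Gaussian noise (illustration only).
y = logistic(t, K=100.0, r=1.0, t0=5.0) + rng.normal(0, 2.0, t.size)

# Iterative methods need an initial guess; a poor one may fail to converge.
params, _ = curve_fit(logistic, t, y, p0=[90.0, 0.5, 4.0])
print("Estimated K, r, t0:", params)
```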
Linear Regression
You have to make sure that a linear relationship exists between the dependent variable and the independent variable(s).
• For example:
  - Predict height from age
  - Predict house price from number of bedrooms
X: No. of Bedrooms    Y: Price (in Lacs)
5                     101.5
3                     79.9
5                     99.87
3                     56.9
2                     66.6
5                     105.45
5                     126.3
4                     89.25
5                     99.97
3                     87.6
4                     112.6
3                     85.6
3                     78.5
2                     74.3
2                     74.8

Figure 2(a) (data table) and Figure 2(b) (plot of price vs. number of bedrooms)
Simple Linear Regression (SLR)
Simple linear regression is a linear regression model with a single explanatory
variable.
The adjective simple refers to the fact that the outcome variable is related to a
single predictor.
Simple Linear Regression (SLR) Contd….
Simple linear regression finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. For instance, in the house price prediction problem (with only one input variable, plot size), a linear regressor will fit a straight line with the x-axis representing plot size and the y-axis representing price.
Fitting the Straight Line for SLR
The linear function that binds the input variable $x$ with the corresponding predicted value $\hat{y}$ is given by the equation of a straight line (slope-intercept form):

$$\hat{y} = \beta_0 + \beta_1 x$$

where $\beta_1$ is the slope of the line (i.e. it measures the change in the output variable $y$ per unit change in the independent variable $x$), $\beta_0$ is the y-intercept, i.e. the point at which the line crosses the y-axis, and $\hat{y}$ is the predicted value of the output for a particular value of the input variable $x$.
Cost/Error function for SLR
The major goal of the SLR model is to fit the straight line that predicts the output variable values as close as possible to the actual values. But in real-world scenarios there is always some error (regression residual) in predicting the values, i.e.

$$\text{actual value}_i = \text{predicted value}_i + \text{error}_i$$

$$y_i = \hat{y}_i + \epsilon_i$$

$$\text{Residual Error} = \epsilon_i = y_i - \hat{y}_i$$

This error may be positive or negative, as the model may predict values greater or less than the actual values. So we consider the square of each error value.
Cost/Error function for SLR
The total square error for all the $n$ points in the dataset is given by:

$$\sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

The mean of the square error is called the cost or error function for simple linear regression, denoted $J(\beta_0, \beta_1)$ and given by:

$$J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
There exist many methods to optimize (minimize) this cost/error function to find the line of best fit.
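As an illustration, a minimal Python sketch of this cost function is given below. The sample data is the first five rows of the bedroom/price table in Figure 2, and the candidate coefficients passed in at the end are arbitrary.

```python
# A minimal sketch of the SLR cost function J(beta0, beta1) defined above:
# the mean of squared residuals over the n training points. Assumes numpy.
import numpy as np

def cost(beta0: float, beta1: float, x: np.ndarray, y: np.ndarray) -> float:
    """J(beta0, beta1) = (1/n) * sum over i of (y_i - beta0 - beta1*x_i)^2."""
    residuals = y - (beta0 + beta1 * x)
    return float(np.mean(residuals ** 2))

# First five rows of the bedroom/price data from Figure 2:
x = np.array([5, 3, 5, 3, 2], dtype=float)
y = np.array([101.5, 79.9, 99.87, 56.9, 66.6])
print(cost(beta0=30.0, beta1=15.0, x=x, y=y))  # error of one arbitrary candidate line
```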
Least Square Method for Line of Best Fit
The least square method aims to find the values $\hat{\beta}_0$ and $\hat{\beta}_1$ of $\beta_0$ and $\beta_1$ for which the square error between the actual and the predicted values is minimum, i.e. least (hence the name least square error fit).

The values $\hat{\beta}_0$ and $\hat{\beta}_1$ for which the square error function $J(\beta_0, \beta_1)$ is minimum are computed using the second derivative test as below:
1. Compute the partial derivatives of $J(\beta_0, \beta_1)$ w.r.t. $\beta_0$ and $\beta_1$, i.e. $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_0}$ and $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_1}$.
2. Find the values $\hat{\beta}_0$ and $\hat{\beta}_1$ for which $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_0} = 0$ and $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_1} = 0$.
3. Compute the second partial derivatives $\frac{\partial^2 J(\beta_0, \beta_1)}{\partial \beta_0^2}$ and $\frac{\partial^2 J(\beta_0, \beta_1)}{\partial \beta_1^2}$, and prove that $J$ attains a minimum at $\hat{\beta}_0$ and $\hat{\beta}_1$.
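The short sympy sketch below runs this second derivative test symbolically on an arbitrary three-point toy dataset (assuming sympy is installed): it solves the two first-order conditions and confirms that both second partials are positive.

```python
# Symbolic check of the second derivative test on toy data, assuming sympy.
import sympy as sp

b0, b1 = sp.symbols('beta0 beta1')
xs, ys = [1, 2, 3], [2, 4, 5]  # arbitrary toy data for illustration

# J(beta0, beta1) = total square error over the three points.
J = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Steps 1 and 2: set both first partials to zero and solve.
sol = sp.solve([sp.diff(J, b0), sp.diff(J, b1)], [b0, b1])
print(sol)  # {beta0: 2/3, beta1: 3/2}

# Step 3: second partials are 2n and 2*sum(x_i^2), both positive.
print(sp.diff(J, b0, 2), sp.diff(J, b1, 2))  # 6 and 28
```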
Least Square Error Fit- Contd…..
$$\text{Total Square Error} = J(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

Step 1: Compute the partial derivatives of $J(\beta_0, \beta_1)$ w.r.t. $\beta_0$ and $\beta_1$:

$$\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)$$

$$\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)\, x_i = -2 \sum_{i=1}^{n} (x_i y_i - \beta_0 x_i - \beta_1 x_i^2)$$
Setting these derivatives to zero (Step 2) yields the normal equations, whose solution is summarized on the next slide. For Step 3, the second partial derivatives are $\frac{\partial^2 J}{\partial \beta_0^2} = 2n$ and $\frac{\partial^2 J}{\partial \beta_1^2} = 2\sum_{i=1}^{n} x_i^2$. Therefore, both are positive, i.e. the cost function curves upward in $\beta_0$ and $\beta_1$ on either side of $\hat{\beta}_0$ and $\hat{\beta}_1$, so it attains its minimum value at $\hat{\beta}_0$ and $\hat{\beta}_1$.
Least Square Error Fit- Summary
The linear function that binds the input variable $x$ with the corresponding predicted value $\hat{y}$ is given by the equation of a straight line (slope-intercept form):

$$\hat{y} = \beta_0 + \beta_1 x$$

The square error in prediction is minimized when

$$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r_{xy} \frac{\sigma_y}{\sigma_x}$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
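A direct Python implementation of these closed-form estimates might look as follows (a minimal sketch assuming numpy; the function name slr_fit is ours, not a library API).

```python
# Closed-form least-squares estimates for simple linear regression.
import numpy as np

def slr_fit(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Return (beta0_hat, beta1_hat) minimizing the total square error."""
    n = x.size
    beta1 = (n * np.sum(x * y) - x.sum() * y.sum()) / (
        n * np.sum(x ** 2) - x.sum() ** 2
    )
    beta0 = y.mean() - beta1 * x.mean()
    return float(beta0), float(beta1)
```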
Least Square Error Fit- Example
The data set (shown in the table below) gives average masses for women as a function of their height in a sample of 15 American women of age 30–39.

Height (m), xi    Mass (kg), yi
1.47              52.21
1.50              53.12
1.52              54.48
1.55              55.84
1.57              57.20
1.60              58.57
1.63              59.93
1.65              61.29
1.68              63.11
1.70              64.47
1.73              66.28
1.75              68.10
1.78              69.92
1.80              72.19
1.83              74.46

(a) Fit a straight line for average mass as a function of height using the least square error fit.

Applying the formula for $\hat{\beta}_1$ from the summary slide to this data gives $\hat{\beta}_1 = 61.19$, and then

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{931.17}{15} - 61.19 \times \frac{24.76}{15} = -38.88$$
Therefore the line of best fit is given by: $\hat{y} = -38.88 + 61.19x$

The predicted value of $y$ when $x = 1.4$ is:

$$\hat{y} = -38.88 + 61.19 \times 1.4 = 46.78$$
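The snippet below reruns the fit numerically on the same data (a self-contained sketch assuming numpy). Exact arithmetic gives $\hat{\beta}_1 \approx 61.27$ and $\hat{\beta}_0 \approx -39.06$; the slide's 61.19 and -38.88 differ slightly, presumably because of intermediate rounding.

```python
# Numerical check of the height/mass example, assuming numpy.
import numpy as np

height = np.array([1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
                   1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83])
mass = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
                 63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])

n = height.size
b1 = (n * np.sum(height * mass) - height.sum() * mass.sum()) / (
    n * np.sum(height ** 2) - height.sum() ** 2
)
b0 = mass.mean() - b1 * height.mean()
print(b0, b1)          # approx -39.06 and 61.27
print(b0 + b1 * 1.4)   # approx 46.7, vs. the slide's rounded 46.78
```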
Multiple Linear Regression (MLR)
Multiple regression models describe how a single response variable Y depends
linearly on a number of predictor variables.
Examples:
The selling price of a house can depend on the desirability of the location, the
number of bedrooms, the number of bathrooms, the year the house was built,
the square footage of the lot and a number of other factors.
The height of a child can depend on the height of the mother, the height of the
father, nutrition, and environmental factors.
Regression Example
House Value Prediction: the price variable (the output dependent continuous variable) depends upon various input (independent) variables such as plot size, number of bedrooms, covered area, granite flooring, distance from the city, age, upgraded kitchen, etc.
Multiple Linear Regression Model
A multiple linear regression model with $k$ independent predictor variables $x_1, x_2, \ldots, x_k$ predicts the output variable as:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$$

There is always some error (regression residual) in predicting the values, i.e. $y_i = \hat{y}_i + \epsilon_i$. The total square error can be computed from all the values in the dataset, $i = 1, 2, \ldots, n$. In matrix form, with design matrix $X$ (one row per data point, first column all ones) and output vector $y$:

$$J(\beta) = (y - X\beta)^T (y - X\beta) = y^T y - \beta^T X^T y - y^T X \beta + \beta^T X^T X \beta$$

$$J(\beta) = y^T y - 2 y^T X \beta + \beta^T X^T X \beta$$

Step 2: Setting the derivative of $J(\beta)$ w.r.t. $\beta$ to zero:

$$-2 X^T y + 2 X^T X \hat{\beta} = 0$$

$$X^T X \hat{\beta} = X^T y$$

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

Step 3: Compute $\frac{\partial^2 J(\beta)}{\partial \beta^2} = 2 X^T X$, which is positive semi-definite, and prove $J$ to be minimum at $\hat{\beta}$.
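A minimal numpy sketch of this normal-equation solution is shown below; the function name mlr_fit is ours, not a library API. Solving the linear system $X^T X \beta = X^T y$ is numerically preferable to forming the inverse explicitly, though both implement the same formula.

```python
# Least-squares coefficients via the normal equations, assuming numpy.
import numpy as np

def mlr_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve (X^T X) beta = X^T y for the least-squares coefficients.

    X is the n x (k+1) design matrix whose first column is all ones.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```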
Least Square Error Fit for MLR- Example

Number of cases of     Distance walked by the    Delivery time
product stocked (x1)   driver (x2, in feet)      (in min), y
7                      560                       16.68
3                      220                       11.50
3                      340                       12.03

(a) Fit a multiple regression line using the least square error fit.
(b) Compute the delivery time when 4 cases are stocked and the distance traveled by the driver is 80 feet.
Least Square Error Fit for MLR- Example
Soln
The multiple linear regression equation is: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$

where $\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}$ is the vector of regression coefficients for the line of best fit.

We know, $\hat{\beta} = (X^T X)^{-1} X^T y$, where

$$X = \begin{pmatrix} 1 & 7 & 560 \\ 1 & 3 & 220 \\ 1 & 3 & 340 \end{pmatrix}, \quad X^T = \begin{pmatrix} 1 & 1 & 1 \\ 7 & 3 & 3 \\ 560 & 220 & 340 \end{pmatrix}, \quad y = \begin{pmatrix} 16.68 \\ 11.50 \\ 12.03 \end{pmatrix}$$

$$X^T X = \begin{pmatrix} 3 & 13 & 1120 \\ 13 & 67 & 5600 \\ 1120 & 5600 & 477600 \end{pmatrix}$$
Least Square Error Fit for MLR- Example
Soln
$$(X^T X)^{-1} = \begin{pmatrix} 799/288 & 79/288 & -7/720 \\ 79/288 & 223/288 & -7/720 \\ -7/720 & -7/720 & 1/7200 \end{pmatrix}$$

$$\hat{\beta} = (X^T X)^{-1} X^T y = \begin{pmatrix} 7.7696 \\ 0.9196 \\ 0.0044 \end{pmatrix}$$

The line of best fit is: $\hat{y} = 7.7696 + 0.9196 x_1 + 0.0044 x_2$

When $x_1 = 4$ and $x_2 = 80$:

$$\hat{y} = 7.7696 + 0.9196 \times 4 + 0.0044 \times 80 = 11.80 \text{ min}$$
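As a quick numerical check (a self-contained sketch assuming numpy), solving the normal equations for this data reproduces the coefficients and the predicted delivery time above.

```python
# Numerical check of the delivery-time example, assuming numpy.
import numpy as np

X = np.array([[1.0, 7.0, 560.0],
              [1.0, 3.0, 220.0],
              [1.0, 3.0, 340.0]])   # ones column, cases stocked, distance (ft)
y = np.array([16.68, 11.50, 12.03])

beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)                                # approx [7.7696, 0.9196, 0.0044]
print(beta @ np.array([1.0, 4.0, 80.0]))   # approx 11.80 min
```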