REGRESSION
AGENDA:
The formula for a straight line is usually written like this:
y = mx + c
The two variables are x and y, and we have two coefficients, m and c. The coefficient m represents the slope of the line, and the coefficient c represents the y-intercept of the line.
In regression notation, b0 always refers to the intercept term, and b1 refers to the slope. ŷ refers to the estimate, or the prediction, that our regression line is making.
The difference between the model prediction and the actual data point is called a residual, and we’ll refer to it as ϵi.
Thus our LR model is:
yi = b0 + b1·xi + ϵi
What is sum of squared error (SSE or RSS)?
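As a minimal sketch of an answer: the sum of squared errors (SSE, also called the residual sum of squares, RSS) adds up the squared residuals ϵi = yi − (b0 + b1·xi) over all data points. The function name `sse` and the trial coefficients below are illustrative assumptions; the three points are the ones used in the worked example in these slides.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3]
ys = [2, 5, 3]

# A flat trial line at y = 3 (b0 = 3, b1 = 0) leaves residuals -1, 2 and 0,
# so the squared residuals are 1, 4 and 0.
print(sse(3, 0, xs, ys))  # 5
```

A smaller SSE means the line sits closer to the data, which is why the slides treat it as the cost to minimize.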
Linear Regression: Analytical Approach
Locations on the x,y plane: (1, 2), (2, 5), (3, 3)
[Figure: the three points plotted on the x-y plane, with an initial line drawn through them.]
We start with an initial line that represents where the fit should pass on the x-y plane:
y = mx + b
Goal: Minimize sum of squares cost
On the line y = mx + b, the point at x = 1 is (1, m + b), the point at x = 2 is (2, 2m + b), and the point at x = 3 is (3, 3m + b). The squared error at each data point is:
(1, 2): (m + b − 2)² = m² + b² + 4 + 2mb − 4m − 4b
(2, 5): (2m + b − 5)² = 4m² + b² + 25 + 4mb − 20m − 10b
(3, 3): (3m + b − 3)² = 9m² + b² + 9 + 6mb − 18m − 6b
Adding the three expansions gives the cost:
E(m, b) = 14m² + 3b² + 38 + 12mb − 42m − 20b
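The expansion can be checked numerically. The sketch below compares the expanded polynomial E(m, b) with the sum of squared errors computed directly from the three points; the function names and the trial values of m and b are arbitrary choices made for illustration.

```python
points = [(1, 2), (2, 5), (3, 3)]


def cost_from_points(m, b):
    """Sum of squared errors computed directly from the data."""
    return sum((m * x + b - y) ** 2 for x, y in points)


def cost_expanded(m, b):
    """The expanded polynomial E(m, b) from the slide."""
    return 14 * m**2 + 3 * b**2 + 38 + 12 * m * b - 42 * m - 20 * b


# Integer trial values keep the comparison exact.
for m, b in [(0, 0), (1, 1), (-2, 3), (5, -4)]:
    assert cost_from_points(m, b) == cost_expanded(m, b)

print(cost_expanded(0, 0))  # 38
```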
Quiz: Find the partial derivative of E with respect to b.
∂E/∂b = 6b + 12m − 20
Likewise, the partial derivative with respect to m is:
∂E/∂m = 28m + 12b − 42
At the minimum, both partial derivatives are zero:
28m + 12b − 42 = 0
6b + 12m − 20 = 0
Multiplying the second equation by 2 gives 12b + 24m − 40 = 0. Subtracting this from the first equation eliminates b:
4m − 2 = 0
m = 2/4 = 0.5
Substituting m = 0.5 into the second equation:
6b + 12(0.5) − 20 = 0
6b + 6 − 20 = 0
6b − 14 = 0
b = 14/6 = 7/3
The cost at this point is:
E(m = 1/2, b = 7/3) ≈ 4.167
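The two equations obtained by setting the partial derivatives to zero form a 2×2 linear system, so they can also be solved mechanically. The sketch below uses Cramer's rule with exact fractions; the variable names are illustrative, and Cramer's rule is just one convenient way to solve the system, not necessarily the method intended by the slides.

```python
from fractions import Fraction

# Setting both partial derivatives to zero gives:
#   28m + 12b = 42
#   12m +  6b = 20
a11, a12, c1 = 28, 12, 42
a21, a22, c2 = 12, 6, 20

det = a11 * a22 - a12 * a21             # 28*6 - 12*12 = 24
m = Fraction(c1 * a22 - a12 * c2, det)  # (252 - 240) / 24
b = Fraction(a11 * c2 - c1 * a21, det)  # (560 - 504) / 24


def cost(m, b):
    """The expanded cost E(m, b) from the slides."""
    return 14 * m**2 + 3 * b**2 + 38 + 12 * m * b - 42 * m - 20 * b


print(m, b, cost(m, b))  # 1/2 7/3 25/6
```

The exact cost 25/6 is the 4.167 quoted on the slide, rounded to three decimals.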
Linear Regression: Optimal Solution
m = 1/2
b = 7/3
E(m = 1/2, b = 7/3) ≈ 4.167
[Figure: the best-fit line y = mx + b with m = 1/2 and b = 7/3, drawn with the points (1, 2), (2, 5) and (3, 3).]
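To see what the optimal line does at each data point, the sketch below computes the residuals at m = 1/2, b = 7/3 and confirms that nudging either coefficient raises the cost. The step size of 1/10 used for the nudges is an arbitrary illustrative choice.

```python
from fractions import Fraction

points = [(1, 2), (2, 5), (3, 3)]
m_opt, b_opt = Fraction(1, 2), Fraction(7, 3)


def cost(m, b):
    """Sum of squared errors of the line y = m*x + b on the three points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Residual = actual y minus predicted y at each point.
residuals = [y - (m_opt * x + b_opt) for x, y in points]
print([str(r) for r in residuals])      # ['-5/6', '5/3', '-5/6']
print(round(float(cost(m_opt, b_opt)), 3))  # 4.167

# Moving away from the optimum in any of these directions increases the cost.
step = Fraction(1, 10)
for dm, db in [(step, 0), (-step, 0), (0, step), (0, -step)]:
    assert cost(m_opt + dm, b_opt + db) > cost(m_opt, b_opt)
```

The residuals sum to zero, which is what one expects from a least-squares line that includes an intercept.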
Linear Regression: Gradient Descent
Goal: Minimize sum of squares cost
∂E/∂m = 28m + 12b − 42 = 0
∂E/∂b = 6b + 12m − 20 = 0
m = ?
b = ?
Gradient Descent to the rescue
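A minimal gradient descent sketch for this cost is shown below. It repeatedly moves m and b a small step against the two partial derivatives. The starting point (0, 0), the learning rate 0.05, and the 2000 iterations are assumptions chosen for this example and do not come from the slides; a much larger learning rate would make the updates diverge for this particular cost.

```python
def gradient(m, b):
    """Partial derivatives of E(m, b) = 14m^2 + 3b^2 + 38 + 12mb - 42m - 20b."""
    dE_dm = 28 * m + 12 * b - 42
    dE_db = 6 * b + 12 * m - 20
    return dE_dm, dE_db


def gradient_descent(m=0.0, b=0.0, learning_rate=0.05, steps=2000):
    """Plain (batch) gradient descent with a fixed learning rate."""
    for _ in range(steps):
        dE_dm, dE_db = gradient(m, b)
        m -= learning_rate * dE_dm  # step downhill in m
        b -= learning_rate * dE_db  # step downhill in b
    return m, b


m, b = gradient_descent()
print(round(m, 4), round(b, 4))  # 0.5 2.3333
```

The iterates approach the same m = 1/2 and b = 7/3 that the analytical approach gives, without solving the equations by hand.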
REGRESSION ANALYSIS WAY OF SOLVING THE PROBLEM
Remember we started with the diagram below:
[Figure: the line y = mx + b and the points (1, 2), (2, 5) and (3, 3) on the x-y plane.]
Let’s tabulate the points in the graph:
X   Y
1   2
2   5
3   3
Here we’ll establish the relationship between the predictor X and the target outcome Y. The quantities we need are calculated on the next slide.
REGRESSION ANALYSIS (II): SOLVING USING REGRESSION COEFFICIENT: b(xy) (for slope)
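        X    Y    X*Y   dx=X-X‾   dy=Y-Y‾   (dx)²   (dy)²   dx*dy
        1    2     2      -1        -1        1       1       1
        2    5    10       0         2        0       4       0
        3    3     9       1         0        1       0       0
Total   6   10    21       0         1        2       5       1
Here n = number of instances = 3
x̅ (mean of X) = total of X/n = 6/3 = 2
y̅ (mean of Y) = total of Y/n = 10/3 = 3⅓ (let’s round it to 3 for calculating dy)
Because the dx values sum to 0, this rounding does not change the total of dx * dy, so the slope is unaffected. It does change the dy and (dy)² totals, which would be 0 and 14/3 with the exact mean.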
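The column totals in the table can be reproduced with a few lines of Python. The sketch below computes them twice, once with the rounded mean of Y (3, as in the table) and once with the exact mean 10/3, to show which totals depend on the rounding. The helper name `deviation_totals` and the dictionary keys are hypothetical.

```python
from fractions import Fraction

xs, ys = [1, 2, 3], [2, 5, 3]
n = len(xs)
x_mean = Fraction(sum(xs), n)  # 2
y_mean = Fraction(sum(ys), n)  # 10/3


def deviation_totals(y_center):
    """Column totals of the deviation table, measuring dy from y_center."""
    dx = [x - x_mean for x in xs]
    dy = [y - y_center for y in ys]
    return {
        "dx": sum(dx),
        "dy": sum(dy),
        "dx2": sum(d * d for d in dx),
        "dy2": sum(d * d for d in dy),
        "dxdy": sum(a * c for a, c in zip(dx, dy)),
    }


rounded = deviation_totals(3)     # mean of Y rounded to 3, as in the table
exact = deviation_totals(y_mean)  # exact mean of Y

print({k: str(v) for k, v in rounded.items()})
print({k: str(v) for k, v in exact.items()})
```

Both versions give a dx * dy total of 1 and a (dx)² total of 2, so a slope computed as their ratio is 1/2 either way.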
N = 3; x̅ = 2; y̅ = 3.33
        X    Y    X²   X*Y   dx=X-X‾   dy=Y-Y‾   (dx)²   (dy)²   dx*dy
        1    2    1     2      -1        -1        1       1       1
        2    5    4    10       0         2        0       4       0
        3    3    9     9       1         0        1       0       0
Total   6   10   14    21       0         1        2       5       1
Regression equation of Y on X:
Y − y̅ = m (X − x̅)
where the regression coefficient m (the slope) is
m = (N × ΣXY − ΣX × ΣY) / (N × ΣX² − (ΣX)²) = ((3 × 21) − (6)(10)) / ((3 × 14) − (6)²) = 3/6 = ½ = 0.5
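The same calculation can be sketched in Python from the raw column totals. The intercept line uses the standard least-squares relation intercept = y̅ − slope × x̅, which this slide does not show and is added here as an assumption for completeness; the variable names are illustrative.

```python
from fractions import Fraction

xs, ys = [1, 2, 3], [2, 5, 3]
n = len(xs)

sum_x = sum(xs)                             # 6
sum_y = sum(ys)                             # 10
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 21
sum_x2 = sum(x * x for x in xs)             # 14

# Slope from the raw sums: (3*21 - 6*10) / (3*14 - 6^2) = 3/6
slope = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x**2)

# Intercept from the means (assumed relation, not shown on the slide).
intercept = Fraction(sum_y, n) - slope * Fraction(sum_x, n)

print(slope, intercept)  # 1/2 7/3
```

These match the m = 1/2 and b = 7/3 found with the analytical approach.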
REGRESSION ANALYSIS (III): SOLVED USING COVARIANCE (for slope)
N = 3; x̅ = 2; y̅ = 3.33
Regression equation: Y = m*X + b
where m = Cov(X, Y) / Var(X) = Σ(dx * dy) / Σ(dx)²
        X    Y    X²   X*Y   dx=X-X‾   dy=Y-Y‾   (dx)²   (dy)²   dx*dy
        1    2    1     2      -1        -1        1       1       1
        2    5    4    10       0         2        0       4       0