Lecture 5: Regression
Linear Regression
A regression fits a function to observed data in order to make predictions on new data. A linear regression fits a straight line to the observed data, modelling a linear relationship between the variables so that values not yet observed can be predicted.
Linear Regression
ŷ = ax + b
where a is the slope, b is the intercept, and ε is the residual error. The fitted regression line ŷ is the line that minimises the distance between the observed data and the line, i.e. the residuals ε.
The Least Squares (Regression) Line
Sum of squared differences = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Sum of squared differences = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99
The second line gives the smaller sum of squared differences, so it fits the data better.
SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
where ŷᵢ = predicted value, yᵢ = true (observed) value, and ε = residual error.
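As a quick illustration, the sums of squared differences quoted above can be reproduced in plain Python. The observed values (2, 4, 1.5, 3.2) and the two candidate sets of predictions come from the example; the helper name sse is just an illustrative choice.

# SSE = Σ (y_i − ŷ_i)²: sum the squared gap between each observation and its prediction
def sse(y_true, y_pred):
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))

y_observed = [2, 4, 1.5, 3.2]        # observed values from the example
line_1 = [1, 2, 3, 4]                # predictions from the first candidate line
line_2 = [2.5, 2.5, 2.5, 2.5]        # predictions from the flat line at 2.5

print(sse(y_observed, line_1))       # ≈ 7.89
print(sse(y_observed, line_2))       # ≈ 3.99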
Finding b
First we find the value of b that gives the minimum sum of squares
(Plot: the sum of squared residuals ε as a function of the intercept b; the best value of b is the one at the minimum of this curve.)
ȳ = a·x̄ + b, so b = ȳ − a·x̄   (x̄ and ȳ denote the means of x and y)
◼ We can substitute our equation for a (a = r·s_y / s_x) into this, giving:
b = ȳ − (r·s_y / s_x)·x̄
where r = correlation coefficient of x and y, s_y = standard deviation of y, s_x = standard deviation of x
◼ The smaller the correlation, the closer the intercept is to the mean of y
Back to the model
ŷ = ax + b = (r·s_y / s_x)·x + ȳ − (r·s_y / s_x)·x̄

Rearranges to:  ŷ = (r·s_y / s_x)·(x − x̄) + ȳ
• If the correlation is zero, we will simply predict the mean of y for every value of x, and our regression line is just a flat horizontal line at height ȳ (it crosses the y-axis at ȳ)
• We can calculate a regression line for any data, but the important question is how well the line fits the data, i.e. how good it is at predicting y from x
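A minimal sketch of this formula in Python, using NumPy for the correlation coefficient and standard deviations and cross-checking against NumPy's own least-squares fit; the data values are made up purely for illustration.

import numpy as np

# illustrative data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]                      # correlation coefficient of x and y
a = r * np.std(y, ddof=1) / np.std(x, ddof=1)    # slope: a = r·s_y / s_x
b = y.mean() - a * x.mean()                      # intercept: b = ȳ − a·x̄

# prediction using the rearranged form: ŷ = (r·s_y/s_x)·(x − x̄) + ȳ
y_hat = a * (x - x.mean()) + y.mean()

# cross-check against NumPy's least-squares fit
slope_np, intercept_np = np.polyfit(x, y, 1)
print(a, b)                    # slope and intercept from the r, s_y, s_x formulas
print(slope_np, intercept_np)  # should agree up to floating-point error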
Regression Sums of Squares
Sum of squares due to the regression: difference
between TSS and SSE, i.e. SSR = TSS – SSE.
SSR = Σᵢ₌₁ⁿ (yᵢ − ȳ)² − Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²
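A short numerical check of this decomposition; the data values are arbitrary and the fit comes from NumPy's least-squares routine.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

a, b = np.polyfit(x, y, 1)            # least-squares slope and intercept
y_hat = a * x + b

tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
sse = np.sum((y - y_hat) ** 2)        # sum of squared errors (residuals)
ssr = np.sum((y_hat - y.mean()) ** 2) # sum of squares due to the regression

print(np.isclose(ssr, tss - sse))     # True: SSR = TSS − SSE for the least-squares line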
Graphical View
(Figure: the fitted linear model ŷᵢ = β̂₀ + β̂₁·xᵢ compared with the mean model ŷ = ȳ.)
Basic Linear Regression with SciPy – Intercept Determination (Python Code)
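The slides name SciPy for this step. Below is a minimal sketch using scipy.stats.linregress; the data points are placeholders chosen only for illustration.

import numpy as np
from scipy import stats

# illustrative data (placeholder values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 4.1, 5.9, 8.2, 9.8, 12.1])

# scipy.stats.linregress fits y = slope·x + intercept by least squares
result = stats.linregress(x, y)

print("slope     =", result.slope)       # a
print("intercept =", result.intercept)   # b
print("r         =", result.rvalue)      # correlation coefficient

# predicted values from the fitted line
y_hat = result.slope * x + result.intercept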
Closed Form Equation
For a simple linear regression with only one input and one output variable, the closed form equations to calculate m and b are:

m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
b = (Σy − m·Σx) / n
Linear Regression – Calculating m and b using Python
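A minimal sketch of the closed form calculation in plain Python; the (x, y) points below are illustrative, not taken from the lecture.

# closed-form least-squares estimates for y = m·x + b
points = [(1, 5), (2, 7), (3, 9.2), (4, 11.1), (5, 12.9)]   # illustrative (x, y) pairs

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

# m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# b = (Σy − m·Σx) / n
b = (sum_y - m * sum_x) / n

print("m =", m)
print("b =", b)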
Basic Linear Regression – Intercept Determination
Basic Linear Regression with SciPy
Basic Linear Regression with m and b Calculation
What defines a “best fit”?
How do we get to that “best fit”?
Visualizing the sum of squares: the total area of all the squares, where each square has a side length equal to a residual.
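One way to picture this is to draw, for each data point, a literal square whose side equals that point's residual; the shaded area then totals the sum of squares. The sketch below uses Matplotlib and NumPy with made-up data.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# illustrative data and its least-squares line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 4.0, 7.5, 8.0])
a, b = np.polyfit(x, y, 1)
y_hat = a * x + b

fig, ax = plt.subplots()
ax.scatter(x, y, zorder=3)
ax.plot(x, y_hat, color="black")

# one square per point, with side length equal to the residual
for xi, yi, yhi in zip(x, y, y_hat):
    side = abs(yi - yhi)                       # |residual|
    ax.add_patch(Rectangle((xi, min(yi, yhi)), side, side, alpha=0.3, color="red"))

ax.set_aspect("equal")                         # so the squares look square
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Sum of squares = total area of the residual squares")
plt.show()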