Chapter 7 Presentation - 11.18.2024
CIVE 224
Chapter 7
Regression Analysis
Regression Analysis
- A statistical method to examine the relationship between two or more variables.
- The purpose is to understand how the ONE dependent variable (Y) changes when any one of the independent variables (X) changes, while the other independent variables are held fixed.
- Regression helps predict the dependent variable with the help of the independent variables.

Types of Regression:
- Linear (single/multiple independent variables)
- Nonlinear: exponential, power, logarithmic, etc.

Applications:
- Predicting "Y" based on "X"
- Determining the strength of predictors "Xi": which X is the most reliable for predicting "Y"
- Trend analysis: trends over time
Linear Regression
The Method of Least Squares:
A standard approach in regression analysis to approximate the solution where there are more equations than unknowns (intercept & slope). Such a system is called over-determined.

Least squares finds the best-fit line of a data set by minimizing the sum of the squares of the differences ("residuals") between observed and predicted values.

- Residual: the difference between an observed and a predicted value.

Squaring the residuals ensures that both positive & negative differences add to the overall error and that larger errors are penalized more heavily.

The goal of least-squares regression is to find the values of a and b that minimize:

Σ(y_observed − y_predicted)²

Mathematically, for a data set (Xi, Yi), i = 1, …, n, the linear regression line is described by:

Y(Xi) = a + b·Xi
Linear Regression
Y(Xi) = a + b·Xi

Least-squares regression, Total Error (E):

E = Σᵢ₌₁ⁿ [yi − (a + b·xi)]² = Σᵢ₌₁ⁿ [yi − Y(xi)]²

- yi: observed value of the dependent variable
- Y(xi): predicted value of "Y" for "Xi", calculated using the regression equation
- [yi − Y(xi)]²: squared residual for each data point

The goal: find the intercept (a) & slope (b) that minimize (E) to ensure the best fit to the data:

min over a, b:  Σᵢ₌₁ⁿ [yi − (a + b·xi)]²
Linear Regression
min over a, b:  Σᵢ₌₁ⁿ [yi − (a + b·xi)]²

We calculate the partial derivatives of E with respect to a and b, then set them to zero (this finds the minimum of E). The optimal values are derived by setting the partial derivative of the sum of squared residuals with respect to each parameter (intercept and slope) to zero:

∂E/∂a = 0  and  ∂E/∂b = 0

With respect to "a" (the derivative of (−a) is (−1)):

∂E/∂a = −2·Σᵢ₌₁ⁿ [yi − (a + b·xi)] = 0

Equivalently:

Σᵢ₌₁ⁿ [yi − (a + b·xi)] = 0        (Equation 1)
Linear Regression
Partial derivative with respect to "b":
"a" is treated as a constant with respect to "b", and the derivative of −(a + b·xi) with respect to b is (−xi):

∂E/∂b = −2·Σᵢ₌₁ⁿ [yi − (a + b·xi)]·xi = 0

Equivalently:

Σᵢ₌₁ⁿ [yi − (a + b·xi)]·xi = 0        (Equation 2)

Solving the two equations gives the least-squares formulas. They are derived from the least-squares criterion, ensuring that the line minimizes the sum of squared residuals.

1. Calculate the means of X and Y:

x̄ = (1/n)·Σᵢ₌₁ⁿ xi
ȳ = (1/n)·Σᵢ₌₁ⁿ yi

2. Calculate the slope (b):

b = Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²

3. Calculate the intercept (a):

a = ȳ − b·x̄ = (Σyi − b·Σxi) / n
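The normal equations above can be checked numerically. A minimal Python/NumPy sketch (the data set here is illustrative, not from the slides): at the least-squares a and b, the residuals and the x-weighted residuals both sum to zero, which is exactly Equations 1 and 2.

```python
import numpy as np

# Illustrative data set (not from the slides)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Closed-form least-squares slope and intercept
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Equation 1: the residuals sum to zero at the minimum of E
# Equation 2: the x-weighted residuals also sum to zero
residuals = y - (a + b * x)
print(abs(residuals.sum()) < 1e-9)        # True
print(abs((residuals * x).sum()) < 1e-9)  # True
```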
Example

For the following data set, calculate the slope and intercept for the best-fit line:

X: 1, 2, 3, 4, 5
Y: 2, 4, 5, 4, 5

1. Calculate the X average: x̄ = 3
2. Calculate the X deviations (xi − x̄): -2, -1, 0, 1, 2
3. Calculate the Y average: ȳ = 4
4. Calculate the Y deviations (yi − ȳ): -2, 0, 1, 0, 1
5. Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ) = 6
6. Σᵢ₌₁ⁿ (xi − x̄)² = 10
7. b = 6/10 = 0.6
Example

For the same data set (X: 1, 2, 3, 4, 5; Y: 2, 4, 5, 4, 5), calculate the intercept:

a = (Σyi − b·Σxi) / n

1. Calculate Σxi = 15
2. Calculate b·Σxi = 0.6 × 15 = 9
3. Calculate Σyi = 20
4. a = (20 − 9) / 5 = 2.2

The best-fit line is:

Y(X) = a + b·X
Y(X) = 2.2 + 0.6X
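The worked example above can be reproduced in a few lines. A minimal Python/NumPy sketch that follows the slide's steps and cross-checks the result against NumPy's `polyfit`:

```python
import numpy as np

# Data set from the example
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Step-by-step least-squares calculation, following the slides
x_bar, y_bar = x.mean(), y.mean()            # 3.0 and 4.0
s_xy = np.sum((x - x_bar) * (y - y_bar))     # 6.0
s_xx = np.sum((x - x_bar) ** 2)              # 10.0
b = s_xy / s_xx                              # 0.6
a = y_bar - b * x_bar                        # 2.2

print(f"Y(X) = {a:.1f} + {b:.1f}X")          # Y(X) = 2.2 + 0.6X

# Cross-check against NumPy's polynomial fit (degree 1: slope, intercept)
b_np, a_np = np.polyfit(x, y, 1)
assert abs(b - b_np) < 1e-9 and abs(a - a_np) < 1e-9
```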
Measuring the Variability of Results
Total Variability:

Sy = √( Syy / (n − 1) )

Syy = Σ(yi − ȳ)²: the total sum of squares for the dependent variable. A higher Syy indicates higher total variability in Y.
Measuring the Variability of Results
Variability About the Regression Line (S_{Y|X}), the Standard Error of Estimate, is measured about the fitted line:

ŷ(x) = a + b·x

How do we calculate S_{Y|X}, the variability about the regression line?
Measuring the Variability of Results
Calculate S_{Y|X} using the formula:

S_{Y|X} = √( (Syy − b·Sxy) / (n − 2) )

- Syy: the total sum of squares for the dependent variable Y (total variation):

  Syy = Σᵢ₌₁ⁿ (yi − ȳ)²

- Sxy: the sum of the products of the deviations of X and Y from their means:

  Sxy = Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ)

- b: the regression line slope:

  b = Sxy / Sxx = Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²

- n: the number of observations
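Applying the formula to the chapter's earlier 5-point example gives a quick sanity check. A minimal Python/NumPy sketch (assuming that same data set):

```python
import numpy as np

# Standard error of estimate S(Y|X) for the 5-point example data
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = len(x)

s_xy = np.sum((x - x.mean()) * (y - y.mean()))   # 6.0
s_xx = np.sum((x - x.mean()) ** 2)               # 10.0
s_yy = np.sum((y - y.mean()) ** 2)               # 6.0
b = s_xy / s_xx                                  # 0.6

# S(Y|X) = sqrt((Syy - b*Sxy) / (n - 2))
s_y_given_x = np.sqrt((s_yy - b * s_xy) / (n - 2))
print(round(s_y_given_x, 4))                     # 0.8944
```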
Correlation vs. Regression Analysis
Regression Analysis:
Concerned with predicting the LEVEL of the dependent variable Y for a given independent variable X.

Correlation Analysis:
Concerned with the STRENGTH of the relationship between Y and X.

Correlation measures the:
▪ Strength
▪ Direction
of the linear relationship between two variables.

It is often quantified by the Pearson correlation coefficient, which ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear relationship.
Correlation vs. Regression
Correlation
Sample Correlation Coefficient:

r = Sxy / √( Sxx · Syy )

r = Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ) / √( Σᵢ₌₁ⁿ (xi − x̄)² · Σᵢ₌₁ⁿ (yi − ȳ)² )

where:

Sxx = Σᵢ₌₁ⁿ (xi − x̄)²
Sxy = Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ)
Syy = Σᵢ₌₁ⁿ (yi − ȳ)²

To obtain the Population Correlation Coefficient "ρ", we use the Fisher Z transformation.
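As a sketch, r can be computed directly from the sums of squares and cross-checked against NumPy's built-in `corrcoef` (the data set assumed here is the chapter's earlier 5-point example):

```python
import numpy as np

# Sample Pearson correlation coefficient: r = Sxy / sqrt(Sxx * Syy)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

s_xy = np.sum((x - x.mean()) * (y - y.mean()))
s_xx = np.sum((x - x.mean()) ** 2)
s_yy = np.sum((y - y.mean()) ** 2)
r = s_xy / np.sqrt(s_xx * s_yy)

# Cross-check against NumPy's correlation matrix
assert abs(r - np.corrcoef(x, y)[0, 1]) < 1e-9
print(round(r, 4))   # 6 / sqrt(60) -> 0.7746
```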
Example
Data for pressure and flow rate:

Pressure (x): 5, 6, 7, 8, 9, and 10
Flowrate (y): 14, 25, 70, 85, 49, and 105

Calculate the line of best fit and the correlation coefficient (r), and explain what it means.

- Calculate the slope (b):

b = Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)² = 15.49

- Calculate the intercept (a):

a = (Σyi − b·Σxi) / n = −58.14

- Line of best fit:

Y = −58.14 + 15.49X

- Calculate the correlation coefficient (r):

r = Σᵢ₌₁ⁿ (xi − x̄)·(yi − ȳ) / √( Σᵢ₌₁ⁿ (xi − x̄)² · Σᵢ₌₁ⁿ (yi − ȳ)² ) = 271.0 / 329.1 = 0.824
Example
What r Represents:
• The correlation coefficient (r) measures the strength and direction of the linear relationship between pressure and flowrate.
• r = 0.824 indicates a strong positive correlation, meaning that as the pressure increases, the flowrate tends to increase as well.
• The value is close to 1, suggesting that the data points are relatively well
aligned with the line of best fit.
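The pressure/flowrate example can be verified end to end. A minimal Python/NumPy sketch computing the slope, intercept, and correlation coefficient from the sums of squares:

```python
import numpy as np

# Pressure/flowrate example: line of best fit and correlation coefficient
x = np.array([5, 6, 7, 8, 9, 10], dtype=float)        # pressure
y = np.array([14, 25, 70, 85, 49, 105], dtype=float)  # flowrate

s_xy = np.sum((x - x.mean()) * (y - y.mean()))
s_xx = np.sum((x - x.mean()) ** 2)
s_yy = np.sum((y - y.mean()) ** 2)

b = s_xy / s_xx                   # slope
a = y.mean() - b * x.mean()       # intercept
r = s_xy / np.sqrt(s_xx * s_yy)   # correlation coefficient

print(f"Y = {a:.2f} + {b:.2f}X, r = {r:.3f}")
# Y = -58.14 + 15.49X, r = 0.824
```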