Chapter 7 Presentation - 11.18.2024

This document covers regression analysis, a statistical method for examining relationships between variables, focusing on linear regression and its applications. It explains the method of least squares for estimating regression coefficients, calculating slope and intercept, and measuring variability in results. Additionally, it contrasts regression analysis with correlation analysis, highlighting their respective purposes and methods of quantification.


Statistics and Probability

CIVE 224
Chapter 7
Regression Analysis
Regression Analysis

- A statistical method for examining the relationship between two or more variables.
- The purpose is to understand how ONE dependent variable (Y) changes when any one of the independent variables (X) changes, while the other independent variables are held fixed.
- Regression helps predict the dependent variable with the help of the independent variable(s).

Types of Regression:
- Linear (single or multiple independent variables)
- Nonlinear: exponential, power, logarithmic, etc.

Applications:
- Predicting Y based on X
- Determining the strength of the predictors Xi, i.e., which X is the most reliable for predicting Y
- Trend analysis: trends over time

Linear Regression

The Method of Least Squares:
- A standard approach in regression analysis for approximating the solution when there are more equations than unknowns (intercept & slope). Such a system is called over-determined.
- Used to find the best-fit line of a data set by minimizing the sum of the squares of the differences ("residuals") between observed and predicted values.
- Residual: the difference between an observed and a predicted value.
- Squaring the residuals ensures that both positive and negative differences add to the overall error and that larger errors are penalized more heavily.

The goal of least-squares regression is to find the values of a and b that minimize

\sum (y_{observed} - y_{predicted})^2

Mathematically, for a data set (X_i, Y_i), i = 1, ..., n, the linear regression is described by:

Y(X_i) = a + b X_i

Linear Regression

Y(X_i) = a + b X_i

For least-squares regression, the total error (E) is

E = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2 = \sum_{i=1}^{n} [y_i - Y(x_i)]^2

- y_i: observed value of the dependent variable
- Y(x_i): predicted value of Y for X_i, calculated using the regression equation
- [y_i - Y(x_i)]^2: squared residual for each data point

The goal: find the intercept (a) and slope (b) that minimize E to ensure the best fit to the data:

\min_{a,b} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2

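To make the objective concrete, here is a minimal Python sketch that evaluates the total error E for two candidate (a, b) pairs; it uses the small data set from the worked example later in these slides, and the function name total_error is my own choice rather than course notation.

```python
# Minimal sketch: evaluate the least-squares objective E(a, b)
# for a candidate intercept a and slope b.
x = [1, 2, 3, 4, 5]   # data set from the worked example in these slides
y = [2, 4, 5, 4, 5]

def total_error(a, b, x, y):
    """Sum of squared residuals: E = sum of (y_i - (a + b*x_i))^2."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

print(total_error(2.2, 0.6, x, y))  # E for the least-squares line (about 2.4)
print(total_error(0.0, 1.0, x, y))  # a worse candidate line gives a larger E (9.0)
```
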
Linear Regression

\min_{a,b} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2

We calculate the partial derivatives of E with respect to a and b and set them to zero (this locates the minimum of E). In other words, the estimates are derived by setting the partial derivative of the sum of squared residuals with respect to each parameter (intercept and slope) equal to zero:

\frac{\partial E}{\partial a} = 0 \quad \text{and} \quad \frac{\partial E}{\partial b} = 0

With respect to a (the derivative of (-a) is (-1)):

\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{n} [y_i - (a + b x_i)] = 0

Equivalently:

\sum_{i=1}^{n} [y_i - (a + b x_i)] = 0          (Equation 1)

Linear Regression

Partial derivative with respect to b: "a" is treated as a constant with respect to b, so the derivative of [y_i - (a + b x_i)] with respect to b is -x_i. Hence

\frac{\partial E}{\partial b} = \sum_{i=1}^{n} 2(-x_i)[y_i - (a + b x_i)] = -2 \sum_{i=1}^{n} [y_i - (a + b x_i)] x_i = 0

Equivalently:

\sum_{i=1}^{n} [y_i - (a + b x_i)] x_i = 0          (Equation 2)

Solving Equations 1 and 2 gives the optimal intercept and slope values that minimize the sum of squared residuals, thereby giving us the line of best fit.

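Expanding the sums, Equation 1 becomes n a + b \sum x_i = \sum y_i and Equation 2 becomes a \sum x_i + b \sum x_i^2 = \sum x_i y_i, a 2×2 linear system in a and b. As a minimal sketch, the NumPy snippet below solves that system for the worked-example data used later in these slides:

```python
# Sketch: solve Equations 1 and 2 directly as a 2x2 linear system.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)   # worked-example data from these slides
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = len(x)

# Equation 1:  n*a         + (sum x_i)*b   = sum y_i
# Equation 2:  (sum x_i)*a + (sum x_i^2)*b = sum x_i*y_i
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

a, b = np.linalg.solve(A, rhs)
print(a, b)   # expected: a ≈ 2.2, b ≈ 0.6 (matches the worked example)
```
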
Estimating Linear Regression Coefficients

This involves finding the values of a and b in the regression model

Y(X) = a + b X

that minimize the sum of squared residuals (the differences between observed and predicted values).

1. Calculate the means of X and Y:

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i

2. Calculate the slope:

b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

This formula is derived from the least-squares criterion, ensuring that the line minimizes the sum of squared residuals.

3. Calculate the intercept:

a = \bar{y} - b \bar{x} = \frac{\sum y_i - b \sum x_i}{n}

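The three steps above translate directly into code. Below is a minimal Python sketch; the function and variable names are my own choices, not notation from the slides.

```python
# Sketch of steps 1-3: means, slope from deviations, then intercept.
def fit_line(x, y):
    n = len(x)
    x_bar = sum(x) / n                                   # step 1: mean of X
    y_bar = sum(y) / n                                   #         mean of Y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx                                        # step 2: slope
    a = y_bar - b * x_bar                                # step 3: intercept
    return a, b

# Worked-example data from the next slides; expected result: a ≈ 2.2, b = 0.6
print(fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```
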
Example

For the following data set, calculate the slope and intercept for the best-fit line.

X    Y
1    2
2    4
3    5
4    4
5    5

b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

1. Calculate the X average: 3
2. Calculate the X deviations: -2, -1, 0, 1, 2
3. Calculate the Y average: 4
4. Calculate the Y deviations: -2, 0, 1, 0, 1
5. Calculate \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 6
6. Calculate \sum_{i=1}^{n} (x_i - \bar{x})^2 = 10
7. b = 6 / 10 = 0.6

Example

For the same data set, calculate the intercept:

a = \frac{\sum y_i - b \sum x_i}{n}

1. Calculate \sum x_i = 15
2. Calculate b \sum x_i = 0.6 \times 15 = 9
3. Calculate \sum y_i = 20
4. a = (20 - 9) / 5 = 2.2

The best-fit line is therefore:

Y(X) = a + b X  =>  Y(X) = 2.2 + 0.6 X

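As a quick cross-check on the hand calculation, NumPy's built-in least-squares polynomial fit should reproduce the same coefficients:

```python
# Cross-check of the worked example with NumPy's degree-1 polynomial fit.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

b, a = np.polyfit(x, y, deg=1)        # returns [slope, intercept] for deg=1
print(f"Y(X) = {a:.2f} + {b:.2f} X")  # expected: Y(X) = 2.20 + 0.60 X
```
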
Measuring the Variability of Results

Total Variability:
- S_y measures the overall spread of the Y values around the mean \bar{y}.
- It provides a baseline for judging how much of the variability in Y can be attributed to its relationship with X versus how much is just random variation around the mean.
- S_y quantifies how much variation exists in the dependent variable Y, which is useful for assessing the fit of a regression model.

S_y = \sqrt{\frac{S_{yy}}{n - 1}}, \qquad S_{yy} = \sum (y_i - \bar{y})^2

where S_{yy} is the total sum of squares for the dependent variable.

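A minimal Python sketch of this calculation, applied to the Y values from the earlier worked example (the numbers in the comments are computed here for illustration, not quoted from the slides):

```python
# Sketch: total variability S_y of the observed Y values around their mean.
import math

y = [2, 4, 5, 4, 5]                       # Y values from the earlier worked example
n = len(y)
y_bar = sum(y) / n
syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares S_yy
s_y = math.sqrt(syy / (n - 1))            # S_y = sqrt(S_yy / (n - 1))
print(syy, s_y)                           # S_yy = 6.0, S_y ≈ 1.22
```
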
Measuring the Variability of Results

Variability About the Regression Line (S_{Y|X}) – Standard Error of Estimate:
- S_{Y|X} measures how well the regression line fits the data.
- When S_{Y|X} is small, the points are close to the regression line.
- S_{Y|X} describes the variability of the observed Y values around the value predicted by the regression line for each X value, \hat{y}(x) = a + b x.

How to calculate S_{Y|X}, the variability about the regression line:

S_{Y|X} = \sqrt{\frac{S_{yy} - b S_{xy}}{n - 2}}

The formula is derived from the residuals of the regression and measures the spread of the data points around the fitted regression line.

Measuring the Variability of Results

Calculate S_{Y|X} using the formula:

S_{Y|X} = \sqrt{\frac{S_{yy} - b S_{xy}}{n - 2}}

- S_{yy}: the total sum of squares for the dependent variable Y (total variation):
  S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2
- S_{xy}: the sum of the products of the deviations of X and Y from their means:
  S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
- b: the slope of the regression line:
  b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
- n: the number of observations

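A minimal Python sketch computing S_{Y|X} from S_yy, S_xy, and b, again using the earlier worked-example data (the resulting value is computed here for illustration, not quoted from the slides):

```python
# Sketch: standard error of estimate S_{Y|X} from S_yy, S_xy and the slope b.
import math

x = [1, 2, 3, 4, 5]                      # worked-example data from these slides
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx                            # slope of the regression line
s_y_given_x = math.sqrt((syy - b * sxy) / (n - 2))
print(s_y_given_x)                       # ≈ 0.894 for this data set
```
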
Correlation vs. Regression Analysis

Regression Analysis:
Concerned with predicting the LEVEL of the dependent variable Y for a given independent variable X.

Correlation Analysis:
Concerned with the STRENGTH of the relationship between Y and X.

Correlation measures the:
▪ Strength
▪ Direction
of the linear relationship between two variables.

It is often quantified by the Pearson correlation coefficient, which ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear relationship.

Correlation vs. Regression
Correlation

Sample Correlation Coefficient:

r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}

where

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2

To obtain the population correlation coefficient \rho, we use the Fisher Z transformation.

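A minimal Python sketch of the sample correlation coefficient, using the same small data set as the earlier worked example (the resulting r value is computed here for illustration):

```python
# Sketch: sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy).
import math

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Worked-example data from these slides: r = 6 / sqrt(10 * 6) ≈ 0.775
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```
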
Example

Data for pressure and flow rate:

Pressure (x): 5, 6, 7, 8, 9, 10
Flow rate (y): 14, 25, 70, 85, 49, 105

Calculate the line of best fit and the correlation coefficient (r), and explain what it means.

- Calculate the slope (b):

b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = 15.49

- Calculate the intercept (a):

a = \frac{\sum y_i - b \sum x_i}{n} = -58.14

- Line of best fit:

Y = -58.14 + 15.49 X

- Calculate the correlation coefficient (r):

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{271.0}{329.1} = 0.824

Example

What r represents:
• The correlation coefficient (r) measures the strength and direction of the linear relationship between pressure and flow rate.
• r = 0.824 indicates a strong positive correlation, meaning that as the pressure increases, the flow rate tends to increase as well.
• The value is close to 1, suggesting that the data points are relatively well aligned with the line of best fit.
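A short Python cross-check of this example; it should reproduce the slope, intercept, and correlation coefficient reported above.

```python
# Cross-check of the pressure / flow-rate example.
import math

x = [5, 6, 7, 8, 9, 10]            # pressure
y = [14, 25, 70, 85, 49, 105]      # flow rate
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx                      # ≈ 15.49
a = y_bar - b * x_bar              # ≈ -58.14
r = sxy / math.sqrt(sxx * syy)     # ≈ 0.824
print(f"Y = {a:.2f} + {b:.2f} X,  r = {r:.3f}")
```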
