
PART 5: Curve Fitting



CURVE FITTING
Techniques
1. Fit curves to given data to obtain intermediate estimates.
2. Derive a simpler version of a complicated function by fitting a number of discrete values computed along the range of interest.
General Approach
1. Where the data exhibit a significant degree of error or "noise", the strategy is to derive a single curve that represents the general trend of the data. Example: least-squares regression.
2. Where the data are known to be very precise, the basic approach is to fit a curve or a series of curves that pass directly through each of the points. The estimation of values between well-known discrete points is called interpolation.



CURVE FITTING AND ENGINEERING PRACTICE
1. Trend analysis – the process of using the pattern of the data to make predictions. This technique may be used to predict or forecast values of the dependent variable by either extrapolation or interpolation.
• For data measured with high precision, use interpolating polynomials.
• For imprecise data, use least-squares regression.
2. Hypothesis testing – an existing mathematical model is compared with the measured data.
• Compare predicted values of the model with observed values to test the adequacy of the model (if estimates of the model coefficients are already available).



SIMPLE STATISTICS
1. Arithmetic mean (estimated mean)
$\bar{y} = \frac{\sum y_i}{n}$, $i = 1$ to $n$
2. Standard deviation
$S_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$
3. Variance
$S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}$
$n - 1$ = degrees of freedom
4. Another form of the variance
$S_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2/n}{n-1}$
5. Coefficient of variation – ratio of the standard deviation to the mean
$c.v. = \frac{S_y}{\bar{y}} \times 100\%$
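
A minimal NumPy sketch of the four statistics above; the sample values here are illustrative, and `ddof=1` selects the $(n-1)$-denominator forms used in these slides.

```python
import numpy as np

y = np.array([8.8, 9.5, 9.8, 9.4, 10.0])  # any sample of measurements

y_bar = y.mean()              # arithmetic mean
S_y   = y.std(ddof=1)         # standard deviation with n - 1 in the denominator
S_y2  = y.var(ddof=1)         # variance
cv    = S_y / y_bar * 100     # coefficient of variation, in percent

print(y_bar, S_y, S_y2, cv)
```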



THE NORMAL DISTRIBUTION (HISTOGRAM)

For normally distributed data:
$\bar{y} - S_y$ to $\bar{y} + S_y$ contains about 68% of the total measurements
$\bar{y} - 2S_y$ to $\bar{y} + 2S_y$ contains about 95% of the total measurements

Figure 5.1: A histogram used to depict the distribution of data. As the number of data points increases, the histogram approaches the smooth, bell-shaped curve called the normal distribution.



ESTIMATION OF CONFIDENCE INTERVALS
1. Characterization of the properties of a population
Statistical inference – properties of the unknown population are inferred from a limited sample.
Estimation – so called because the results are always reported as estimates of the population parameters:
Central tendency (sample mean) estimates $\mu$, the true mean
Spread (standard deviation, variance) estimates $\sigma$, the true standard deviation
2. Interval estimator – gives the range of values within which the parameter is expected to lie with a given probability.
One-sided interval – expresses our confidence that the parameter estimate is less than or greater than the true value.
Two-sided interval – deals with the more general proposition that the estimate agrees with the truth, with no consideration of the sign of the discrepancy.
$P(L \le \mu \le U) = 1 - \alpha$, where $\alpha$ = significance level
"The probability that the true mean of y, $\mu$, falls within the bound from L to U is $1 - \alpha$."



ESTIMATION OF CONFIDENCE INTERVALS
3. Standard normal estimate
$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$ → normally distributed with mean = 0 and variance = 1
$z_{\alpha/2}$ = standard normal random variable
$L \le \mu \le U$ with probability $1 - \alpha$:
$L = \bar{y} - \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$   $U = \bar{y} + \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$
When $\sigma$ is unknown and estimated by $S_y$:
$t = \frac{\bar{y} - \mu}{S_y/\sqrt{n}}$
$L = \bar{y} - \frac{S_y}{\sqrt{n}} t_{\alpha/2,n-1}$   $U = \bar{y} + \frac{S_y}{\sqrt{n}} t_{\alpha/2,n-1}$
$t_{\alpha/2,n-1}$ = the random variable for the t distribution with a probability of $\alpha/2$ and $n - 1$ degrees of freedom



Example 1:
Given the data below, determine (a) the mean, (b) the standard deviation, (c) the variance, (d) the coefficient of variation, and (e) the 96% confidence interval for the mean. Construct a histogram from the data using a range from 7.5 to 11.5 with intervals of 0.5.

8.8    9.5    9.8    9.4    10.0
9.4    10.1   9.2    11.3   9.4
10.0   10.4   7.9    10.4   9.8
9.8    9.5    8.9    8.8    10.6
10.1   9.5    9.6    10.2   8.9



SOLUTION:
• Mean
$\bar{y} = \frac{241.3}{25} = 9.652$
• Standard deviation
$S_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}} = \sqrt{\frac{11.86}{25-1}} = 0.7029698903$
• Variance
$S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = \frac{11.86}{25-1} = \frac{593}{1200} = 0.4941666667$
• Coefficient of variation
$c.v. = \frac{S_y}{\bar{y}} \times 100\% = \frac{0.7029698903}{9.652} \times 100\% = 7.28\%$
• The 96% confidence interval for the mean
$\alpha = 100\% - 96\% = 4\%$
$t_{\alpha/2,n-1} = t_{0.02,24} = 2.20333$
$L = \bar{y} - \frac{S_y}{\sqrt{n}} t_{\alpha/2,n-1} = 9.652 - \frac{0.7029698903}{\sqrt{25}} (2.20333) = 9.34222507$
$U = \bar{y} + \frac{S_y}{\sqrt{n}} t_{\alpha/2,n-1} = 9.652 + \frac{0.7029698903}{\sqrt{25}} (2.20333) = 9.96177493$
$9.34222507 \le \mu \le 9.96177493$
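
A sketch reproducing Example 1 with SciPy. One caveat: SciPy's exact quantile $t_{0.02,24} \approx 2.17$ differs slightly from the table-interpolated 2.20333 used above, so the interval bounds it prints are slightly narrower than the slide values.

```python
import numpy as np
from scipy import stats

y = np.array([8.8, 9.5, 9.8, 9.4, 10.0,
              9.4, 10.1, 9.2, 11.3, 9.4,
              10.0, 10.4, 7.9, 10.4, 9.8,
              9.8, 9.5, 8.9, 8.8, 10.6,
              10.1, 9.5, 9.6, 10.2, 8.9])

n, y_bar, S_y = len(y), y.mean(), y.std(ddof=1)
print(y_bar, S_y, S_y**2, S_y / y_bar * 100)   # 9.652, 0.70297, 0.49417, 7.28%

alpha = 0.04                            # 96% confidence interval
t = stats.t.ppf(1 - alpha / 2, n - 1)   # exact t(0.02, 24), about 2.17
half = t * S_y / np.sqrt(n)
print(y_bar - half, y_bar + half)       # lower and upper bounds on the mean
```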

LEAST-SQUARES REGRESSION
• Linear Regression – fitting a straight line to a set of paired observations
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
$y = a_0 + a_1 x + e \;\rightarrow\; e = y - a_0 - a_1 x$
$a_0$ = intercept   $a_1$ = slope   $e$ = error
e = residual between the model and the observations
• Criteria for a "Best" Fit
$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$ → inadequate criterion
$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$ → inadequate criterion



LEAST-SQUARES REGRESSION

• Minimax criterion – a line is chosen that minimizes the maximum distance that an individual point falls from the line.
a. Ill-suited for regression
b. Well-suited for fitting a simple function to a complicated function
• Least-squares criterion – minimize the sum of the squares of the residuals:
$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_{i,\text{measured}} - y_{i,\text{model}} \right)^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
$S_r$ = sum of the squares of the residuals



LEAST-SQUARES FIT OF A STRAIGHT LINE
1. To determine values of $a_0$ and $a_1$, differentiate $S_r$:
$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i)$
$\frac{\partial S_r}{\partial a_1} = -2 \sum (y_i - a_0 - a_1 x_i)\, x_i$
2. Equating the partial derivatives to zero (minimum $S_r$):
$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$
$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$
3. Since $\sum a_0 = n a_0$, the normal equations are:
$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$
$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$
4. Solving for $a_1$ and $a_0$:
$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$
$a_0 = \bar{y} - a_1 \bar{x}$
$\bar{y}, \bar{x}$ = means of y and x, respectively

Quantification of Error of Linear Regression
$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
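
A short sketch implementing the $a_1$ and $a_0$ formulas above directly from the sums; the function name is illustrative.

```python
import numpy as np

def fit_straight_line(x, y):
    """Least-squares straight line y = a0 + a1*x via the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
         (n * np.sum(x**2) - np.sum(x)**2)        # slope
    a0 = y.mean() - a1 * x.mean()                 # intercept
    return a0, a1
```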



• Criteria for least-squares regression to provide the best estimates of $a_0$ and $a_1$ (Draper and Smith, 1981) – the maximum likelihood principle in statistics:
1. The spread of the points around the line is of similar magnitude along the entire range of the data.
2. The distribution of these points about the line is normal.
• Standard error of the estimate
$S_{y/x} = \sqrt{\frac{S_r}{n-2}}$
$S_{y/x}$ quantifies the spread of the data around the regression line; $n - 2$ = degrees of freedom.
• Coefficient of determination
$r^2 = \frac{S_t - S_r}{S_t}$
r = correlation coefficient
$S_t = \sum (y_i - \bar{y})^2$   $S_r = \sum (y_i - a_0 - a_1 x_i)^2$
$S_r = 0,\; r^2 = r = 1$ → perfect fit
$S_r = S_t,\; r = r^2 = 0$ → no improvement
• For computer applications:
$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$
If $S_{y/x} < S_y$, the linear regression model has merit.



Example 2:
Use least-squares regression to fit a straight line to the x and y values given below. Along with the slope and intercept, compute the standard error of the estimate and the correlation coefficient. Plot the data and the regression line.

x   0   2   4   6   9   11   12   15   17   19
y   5   6   7   6   9   8    7    10   12   12

$n = 10$   $\sum x_i = 95$   $\sum y_i = 82$   $\sum x_i y_i = 911$   $\sum x_i^2 = 1277$
$\bar{x} = \frac{95}{10} = 9.5$   $\bar{y} = \frac{82}{10} = 8.2$
$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{10(911) - 95(82)}{10(1277) - (95)^2} = 0.3524699599$
$a_0 = \bar{y} - a_1 \bar{x} = 8.2 - 0.3524699599(9.5) = 4.851535381$


Therefore, the least-squares fit is:
$y = 4.851535381 + 0.3524699599x$

Standard error of the estimate:
$S_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{\sum (y_i - a_0 - a_1 x_i)^2}{n-2}} = \sqrt{\frac{9.073965287}{10-2}} = 1.0650097$

Correlation coefficient:
$r^2 = \frac{S_t - S_r}{S_t} = \frac{\sum (y_i - \bar{y})^2 - \sum (y_i - a_0 - a_1 x_i)^2}{\sum (y_i - \bar{y})^2} = \frac{55.60 - 9.073965287}{55.60} = 0.8367991855$
$r = 0.9147672849$

[Figure: scatter plot of the data with the least-squares line y = 4.8515 + 0.3525x]
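
A sketch checking Example 2 numerically; the expected values from the slides are noted in the comments.

```python
import numpy as np

x = np.array([0, 2, 4, 6, 9, 11, 12, 15, 17, 19], float)
y = np.array([5, 6, 7, 6, 9, 8, 7, 10, 12, 12], float)

n = len(x)
a1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x**2).sum() - x.sum()**2)
a0 = y.mean() - a1 * x.mean()                 # 0.35247, 4.85154

Sr = ((y - a0 - a1 * x)**2).sum()             # 9.07397
St = ((y - y.mean())**2).sum()                # 55.6
print(np.sqrt(Sr / (n - 2)))                  # S_y/x = 1.06501
print(np.sqrt((St - Sr) / St))                # r = 0.91477
```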



LINEARIZATION OF NONLINEAR RELATIONSHIPS
1. Exponential model
$y = \alpha_1 e^{\beta_1 x}$
$\alpha_1, \beta_1$ = constants; $\beta_1 > 0$ → growth, $\beta_1 < 0$ → decay
Examples: population growth, radioactive decay
$\beta_1 \ne 0$ → nonlinear relationship between y and x
2. Power equation
$y = \alpha_2 x^{\beta_2}$ → widely applicable in all fields of engineering
$\alpha_2, \beta_2$ = constant coefficients
$\beta_2 \ne 0$ or $1$ → nonlinear


LINEARIZATION OF NONLINEAR RELATIONSHIPS

3. Saturation-growth-rate model
$y = \alpha_3 \frac{x}{\beta_3 + x}$   $\alpha_3, \beta_3$ = constant coefficients
Well-suited for characterizing population growth rate under limiting conditions
Represents a nonlinear relationship between y and x that levels off, or "saturates", as x increases


LINEARIZATION OF NONLINEAR RELATIONSHIPS
Determination of the Constant Coefficients
• $y = \alpha_1 e^{\beta_1 x}$ → $\ln y = \ln \alpha_1 + \beta_1 x$ (since $\ln e = 1$)
$\ln \alpha_1$ = intercept   $\beta_1$ = slope
• $y = \alpha_2 x^{\beta_2}$ → $\log y = \beta_2 \log x + \log \alpha_2$
$\beta_2$ = slope   $\log \alpha_2$ = intercept
• $y = \alpha_3 \frac{x}{\beta_3 + x}$ → $\frac{1}{y} = \frac{\beta_3}{\alpha_3} \cdot \frac{1}{x} + \frac{1}{\alpha_3}$
$\frac{\beta_3}{\alpha_3}$ = slope   $\frac{1}{\alpha_3}$ = intercept
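
A minimal sketch of all three linearizations: each model is fit with a straight-line regression on transformed data, then the intercept and slope are mapped back to the model coefficients. The helper names are illustrative.

```python
import numpy as np

def fit_line(u, v):
    """Straight-line fit; returns (intercept, slope)."""
    a1, a0 = np.polyfit(u, v, 1)      # polyfit returns slope first
    return a0, a1

def fit_exponential(x, y):            # y = alpha1 * exp(beta1 * x)
    a0, a1 = fit_line(x, np.log(y))
    return np.exp(a0), a1             # alpha1, beta1

def fit_power(x, y):                  # y = alpha2 * x**beta2
    a0, a1 = fit_line(np.log10(x), np.log10(y))
    return 10**a0, a1                 # alpha2, beta2

def fit_saturation(x, y):             # y = alpha3 * x / (beta3 + x)
    a0, a1 = fit_line(1 / x, 1 / y)   # 1/y = (beta3/alpha3)(1/x) + 1/alpha3
    alpha3 = 1 / a0
    return alpha3, a1 * alpha3        # alpha3, beta3
```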



LINEARIZATION OF NONLINEAR RELATIONSHIPS
General Comments on Linear Regression
Assumptions:
1. Each x has a fixed value; it is not random and is known without error.
2. The y values are independent random variables and all have the same variance.
3. The y values for a given x must be normally distributed.
• The x values must be error-free.
• The regression of y versus x is not the same as that of x versus y.



Example 3:
Fit the following data with (a) a saturation-growth-rate model, (b) a power equation, and (c) a parabola. In each case, plot the data and the equation.

x   0.75   2      3     4     6     8     8.5
y   1.2    1.95   2     2.4   2.4   2.7   2.6

a) Saturation-growth-rate model
$y = \alpha_3 \frac{x}{\beta_3 + x}$ → $\frac{1}{y} = \frac{\beta_3}{\alpha_3} \cdot \frac{1}{x} + \frac{1}{\alpha_3}$
From the table, the straight-line fit of 1/y versus 1/x yields
$\frac{1}{y} = \frac{5}{13} \cdot \frac{1}{x} + \frac{25}{78}$
$\alpha_3 = \frac{78}{25} = 3.12$   $\beta_3 = \frac{5}{13}\, \alpha_3 = \frac{6}{5}$
$y = \frac{78}{25} \cdot \frac{x}{\frac{6}{5} + x} = \frac{78x}{5(6 + 5x)}$

x      y      1/x            1/y            predicted y = 78x/(5(6+5x))
0.75   1.20   1.3333333333   0.8333333333   1.2000000000
2.00   1.95   0.5000000000   0.5128205128   1.9500000000
3.00   2.00   0.3333333333   0.5000000000   2.2285714286
4.00   2.40   0.2500000000   0.4166666667   2.4000000000
6.00   2.40   0.1666666667   0.4166666667   2.6000000000
8.00   2.70   0.1250000000   0.3703703704   2.7130434783
8.50   2.60   0.1176470588   0.3846153846   2.7340206186



GRAPH FOR THE SATURATION-GROWTH-RATE MODEL
[Figure: data before regression (left) and the fitted curve y = 78x/(5(6+5x)) after regression (right)]



b) Power equation
$y = \alpha_2 x^{\beta_2}$ → $\log y = \beta_2 \log x + \log \alpha_2$
From the table, the straight-line fit of log y versus log x yields
$\log y = 0.4949972836 \log x + 0.1410255812$
$\beta_2 = 0.4949972836$
$\log \alpha_2 = 0.1410255812$ → $\alpha_2 = 10^{0.1410255812} = 1.383647877$
$y = 1.383647877\, x^{0.4949972836} \approx 1.384 x^{0.495}$

x      y      log x           log y          predicted y
0.75   1.20   -0.1249387366   0.0791812460   1.1999999994
2.00   1.95   0.3010299957    0.2900346114   1.9499999993
3.00   2.00   0.4771212547    0.3010299957   2.3834130125
4.00   2.40   0.6020599913    0.3802112417   2.7481702971
6.00   2.40   0.7781512504    0.3802112417   3.3589871021
8.00   2.70   0.9030899870    0.4313637642   3.8730461459
8.50   2.60   0.9294189257    0.4149733480   3.9910339727



GRAPH FOR THE POWER EQUATION
[Figure: data before regression (left) and the fitted curve y = 1.384x^0.495 after regression (right)]



c) Parabola
$y = ax^2 + bx + c$
n = number of data points
To solve for the constant coefficients a, b, and c:
$\left(\sum x^2\right) a + \left(\sum x\right) b + nc = \sum y$
$\left(\sum x^3\right) a + \left(\sum x^2\right) b + \left(\sum x\right) c = \sum xy$
$\left(\sum x^4\right) a + \left(\sum x^3\right) b + \left(\sum x^2\right) c = \sum x^2 y$
By substitution into the above equations:
$201.8125a + 32.25b + 7c = 15.25$
$1441.546875a + 201.8125b + 32.25c = 78.5$
$10965.37890625a + 1441.546875b + 201.8125c = 511.925$

x      y      x^2       x^3          x^4              xy     x^2·y
0.75   1.20   0.5625    0.421875     0.31640625       0.9    0.675
2.00   1.95   4.0000    8.000000     16.00000000      3.9    7.800
3.00   2.00   9.0000    27.000000    81.00000000      6.0    18.000
4.00   2.40   16.0000   64.000000    256.00000000     9.6    38.400
6.00   2.40   36.0000   216.000000   1296.00000000    14.4   86.400
8.00   2.70   64.0000   512.000000   4096.00000000    21.6   172.800
8.50   2.60   72.2500   614.125000   5220.06250000    22.1   187.850
Σ:     32.25  15.25     201.8125     1441.546875      10965.37890625   78.5   511.925
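
A sketch solving the 3×3 system above for a, b, and c with NumPy; the sums are copied straight from the table.

```python
import numpy as np

A = np.array([[201.8125,        32.25,        7.0     ],
              [1441.546875,     201.8125,     32.25   ],
              [10965.37890625,  1441.546875,  201.8125]])
rhs = np.array([15.25, 78.5, 511.925])

a, b, c = np.linalg.solve(A, rhs)
print(a, b, c)   # coefficients of the parabola y = a*x**2 + b*x + c
```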



GRAPH FOR THE PARABOLA
[Figure: data before regression (left) and the fitted parabola after regression (right)]



POLYNOMIAL REGRESSION
a) For a second-order polynomial (quadratic):
$y = a_0 + a_1 x + a_2 x^2 + e$
$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$
b) To solve for $a_0$, $a_1$, and $a_2$, take partial derivatives:
$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)$
$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)$
$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2)$
c) Equating the above equations to zero gives the normal equations:
$n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$
$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$
$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$
$i = 1$ to $n$
d) For an mth-order polynomial:
$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$
$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$



POLYNOMIAL REGRESSION
Algorithm for Polynomial Regression (a sketch appears below)
1. Input the order of the polynomial to be fit, m.
2. Input the number of data points, n.
3. If n < m + 1, print an error message that regression is impossible and terminate the process. If n ≥ m + 1, continue.
4. Compute the elements of the normal equations in the form of an augmented matrix.
5. Solve the augmented matrix for the coefficients a0, a1, a2, ..., am using an elimination method.
6. Print out the coefficients.
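
A minimal sketch of the algorithm: the element in row i, column j of the normal-equation matrix is $\sum x^{i+j}$ and the right-hand side is $\sum y\,x^i$. NumPy's solver stands in for the elimination step; the function name is illustrative.

```python
import numpy as np

def polyfit_normal(x, y, m):
    """Fit an m-th order polynomial by solving the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    if n < m + 1:
        raise ValueError("regression impossible: need n >= m + 1 points")
    A = np.empty((m + 1, m + 1))
    b = np.empty(m + 1)
    for i in range(m + 1):
        for j in range(m + 1):
            A[i, j] = np.sum(x ** (i + j))   # sums of powers of x
        b[i] = np.sum(y * x ** i)            # right-hand side: sum of y*x^i
    return np.linalg.solve(A, b)             # coefficients a0, a1, ..., am
```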



Example 4:
Fit a second-order polynomial to the data given below.

x_i   0     1     2      3      4      5
y_i   2.1   7.7   13.6   27.2   40.9   61.1

Solution:

n   x_i   y_i     x_i·y_i   x_i²·y_i   x_i²   x_i³   x_i⁴   (y_i − ȳ)²    S_r terms   predicted y
1   0     2.1     0         0          0      0      0      544.44444    0.14332     2.47857
2   1     7.7     7.7       7.7        1      1      1      314.47111    1.00286     6.69857
3   2     13.6    27.2      54.4       4      8      16     140.02778    1.08160     14.64000
4   3     27.2    81.6      244.8      9      27     81     3.12111      0.80487     26.30286
5   4     40.9    163.6     654.4      16     64     256    239.21778    0.61959     41.68714
6   5     61.1    305.5     1527.5     25     125    625    1272.11111   0.09434     60.79286
Σ   15    152.6   585.6     2488.8     55     225    979    2513.39333   3.74657



From the table:
$n = 6$   $m = 2$
$\sum x_i = 15$   $\sum y_i = 152.6$
$\sum x_i y_i = 585.6$   $\sum x_i^2 y_i = 2488.8$
$\sum x_i^2 = 55$   $\sum x_i^3 = 225$   $\sum x_i^4 = 979$
The normal equations are:
$6a_0 + 15a_1 + 55a_2 = 152.6$
$15a_0 + 55a_1 + 225a_2 = 585.6$
$55a_0 + 225a_1 + 979a_2 = 2488.8$
The resulting equation is:
$y = 2.478571 + 2.359286x + 1.860714x^2$
$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2 = 3.74657$
$S_t = \sum (y_i - \bar{y})^2 = 2513.39333$
The standard error of the estimate is:
$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}} = \sqrt{\frac{3.74657}{6 - (2+1)}} = 1.117523$
The coefficient of determination is:
$r^2 = \frac{S_t - S_r}{S_t} = \frac{2513.39333 - 3.74657}{2513.39333} = 0.998509$
$r = 0.999254$
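
A sketch checking Example 4 with NumPy's built-in polynomial fit; the expected values from the slides are noted in the comments.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])

a2, a1, a0 = np.polyfit(x, y, 2)     # 1.860714, 2.359286, 2.478571
Sr = ((y - (a0 + a1 * x + a2 * x**2))**2).sum()   # 3.74657
St = ((y - y.mean())**2).sum()                    # 2513.39333
print(np.sqrt(Sr / (len(x) - 3)))                 # S_y/x = 1.117523
print((St - Sr) / St)                             # r^2 = 0.998509
```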



These results indicate that 99.8509 percent of the original uncertainty has been explained by the model, supporting the conclusion that the quadratic equation represents an excellent fit.

[Figure: data before regression (left) and the fitted quadratic after regression (right)]



MULTIPLE LINEAR REGRESSION
a) General equation
$y = a_0 + a_1 x_1 + a_2 x_2 + e$
b) Sum of the squares of the residuals
$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2$
c) Differentiating with respect to each unknown coefficient:
$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})$
$\frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})$
$\frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})$



MULTIPLE LINEAR REGRESSION
d) Setting the above equations to zero gives the normal equations:
$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}$
e) For m dimensions:
$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e$
$S_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$
f) General (power) form for multiple linear regression:
$y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}$
g) For the transformation:
$\log y = \log a_0 + a_1 \log x_1 + a_2 \log x_2 + \cdots + a_m \log x_m$
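
A sketch of the two-variable case, building and solving the normal-equation system above; the function name is illustrative.

```python
import numpy as np

def multlin_fit(x1, x2, y):
    """Two-variable multiple linear regression y = a0 + a1*x1 + a2*x2."""
    x1 = np.asarray(x1, float)
    x2 = np.asarray(x2, float)
    y  = np.asarray(y, float)
    n = len(y)
    A = np.array([[n,         x1.sum(),       x2.sum()      ],
                  [x1.sum(),  (x1**2).sum(),  (x1*x2).sum() ],
                  [x2.sum(),  (x1*x2).sum(),  (x2**2).sum() ]])
    b = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])
    return np.linalg.solve(A, b)    # a0, a1, a2
```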



GENERAL LINEAR LEAST SQUARES
a) General formulation for linear least squares:
$y = a_0 z_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e$
b) $z_0, z_1, \ldots, z_m$ = $m + 1$ basis functions
For multiple linear regression: $z_0 = 1,\; z_1 = x_1,\; z_2 = x_2,\; \ldots,\; z_m = x_m$
For polynomial regression (simple monomials): $z_0 = x^0 = 1,\; z_1 = x,\; z_2 = x^2,\; \ldots,\; z_m = x^m$
c) If the z's are sinusoids:
$y = a_0 + a_1 \cos \omega t + a_2 \sin \omega t$ → Fourier analysis



GENERAL LINEAR LEAST SQUARES

d) Even a simple-looking model may not fit this form:
$f(x) = a_0 (1 - e^{-a_1 x})$ → nonlinear
e) In matrix notation:
$\{Y\} = [Z]\{A\} + \{E\}$
$[Z] = \begin{bmatrix} z_{01} & z_{11} & \cdots & z_{m1} \\ z_{02} & z_{12} & \cdots & z_{m2} \\ \vdots & \vdots & & \vdots \\ z_{0n} & z_{1n} & \cdots & z_{mn} \end{bmatrix}$
m = number of variables   n = number of data points



GENERAL LINEAR LEAST SQUARES
f) Because $n \ge m + 1$, $[Z]$ is generally not a square matrix.
$\{Y\}^T = \lfloor y_1, y_2, \ldots, y_n \rfloor$
$\{A\}^T = \lfloor a_0, a_1, \ldots, a_m \rfloor$
$\{E\}^T = \lfloor e_1, e_2, \ldots, e_n \rfloor$
$S_r = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} a_j z_{ji} \right)^2$
❖ Take the partial derivative with respect to each of the coefficients and set the resulting equations to zero:
$[Z]^T [Z] \{A\} = [Z]^T \{Y\}$
g) Solution techniques:
1. LU decomposition, including Gauss elimination
2. Cholesky's method
3. Matrix inversion
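
A minimal sketch of the general formulation: stack the basis functions $z_j(x)$ as the columns of $[Z]$, then solve $[Z]^T[Z]\{A\} = [Z]^T\{Y\}$. The `basis`-list interface and the Fourier-type example (with $\omega$ assumed known) are illustrative choices, not part of the slides.

```python
import numpy as np

def general_lsq(x, y, basis):
    """basis: list of callables z_j(x); returns the coefficients a_j."""
    Z = np.column_stack([z(x) for z in basis])   # n x (m+1) design matrix
    return np.linalg.solve(Z.T @ Z, Z.T @ y)     # normal equations

# e.g. the sinusoid model y = a0 + a1*cos(w*t) + a2*sin(w*t), w assumed known
w = 2.0
basis = [lambda t: np.ones_like(t),
         lambda t: np.cos(w * t),
         lambda t: np.sin(w * t)]
```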



STATISTICAL ASPECTS OF LEAST-SQUARES THEORY
Aside from yielding a solution for the regression coefficients, the matrix formulation also provides estimates of their statistics. It can be shown (Draper and Smith, 1981) that the diagonal and off-diagonal terms of the matrix $\left([Z]^T[Z]\right)^{-1}$ give, respectively, the variances and the covariances of the a's. If the elements of $\left([Z]^T[Z]\right)^{-1}$ are designated as $z_{ij}^{-1}$, then
$\text{var}(a_{i-1}) = z_{ii}^{-1} S_{y/x}^2$ and $\text{cov}(a_{i-1}, a_{j-1}) = z_{ij}^{-1} S_{y/x}^2$
These statistics have a number of important applications; in particular, they can be used to develop confidence intervals for the intercept and slope:
$L = a_0 - t_{\alpha/2,n-2}\, s(a_0)$   $U = a_0 + t_{\alpha/2,n-2}\, s(a_0)$
where $s(a_j)$ = the standard error of coefficient $a_j = \sqrt{\text{var}(a_j)}$. In a similar manner, lower and upper bounds on the slope are
$L = a_1 - t_{\alpha/2,n-2}\, s(a_1)$   $U = a_1 + t_{\alpha/2,n-2}\, s(a_1)$



NONLINEAR REGRESSION
$f(x) = a_0 (1 - e^{-a_1 x}) + e$

The above equation cannot be manipulated so that it conforms to the general form of the matrix formulation for linear least squares. For the nonlinear case, it must instead be solved in an iterative fashion.

The Gauss-Newton method is one algorithm for minimizing the sum of the squares of the residuals between data and nonlinear equations. The key concept underlying the technique is that a Taylor series expansion is used to express the original nonlinear equation in an approximate, linear form. Then, least-squares theory can be used to obtain new estimates of the parameters that move in the direction of minimizing the residual.



NONLINEAR REGRESSION
❖ First, the relationship between the nonlinear equation and the data can be expressed generally as
$y_i = f(x_i;\, a_0, a_1, \ldots, a_m) + e_i$
where $y_i$ = a measured value of the dependent variable, $f(x_i;\, a_0, a_1, \ldots, a_m)$ = the equation that is a function of the independent variable $x_i$ and a nonlinear function of the parameters $a_0, a_1, \ldots, a_m$, and $e_i$ = a random error.
❖ For convenience, this model can be expressed in abbreviated form by omitting the parameters:
$y_i = f(x_i) + e_i$
NONLINEAR REGRESSION
The nonlinear model can be expanded in a Taylor series around the parameter values and curtailed after the first derivative. For example, for a two-parameter case:
$f(x_i)_{j+1} = f(x_i)_j + \frac{\partial f(x_i)_j}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1} \Delta a_1$
where j = the initial guess, j + 1 = the prediction, $\Delta a_0 = a_{0,j+1} - a_{0,j}$, and $\Delta a_1 = a_{1,j+1} - a_{1,j}$
$y_i - f(x_i)_j = \frac{\partial f(x_i)_j}{\partial a_0} \Delta a_0 + \frac{\partial f(x_i)_j}{\partial a_1} \Delta a_1 + e_i$
or, in matrix form,
$\{D\} = [Z_j]\{\Delta A\} + \{E\}$



NONLINEAR REGRESSION
where $[Z_j]$ is the matrix of partial derivatives of the function evaluated at the initial guess j:
$[Z_j] = \begin{bmatrix} \partial f_1/\partial a_0 & \partial f_1/\partial a_1 \\ \partial f_2/\partial a_0 & \partial f_2/\partial a_1 \\ \vdots & \vdots \\ \partial f_n/\partial a_0 & \partial f_n/\partial a_1 \end{bmatrix}$
where n = the number of data points and $\partial f_i/\partial a_k$ = the partial derivative of the function with respect to the kth parameter evaluated at the ith data point. The vector $\{D\}$ contains the differences between the measurements and the function values:
$\{D\} = \begin{Bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{Bmatrix}$
and the vector $\{\Delta A\}$ contains the changes in the parameter values:
$\{\Delta A\} = \begin{Bmatrix} \Delta a_0 \\ \Delta a_1 \\ \vdots \\ \Delta a_m \end{Bmatrix}$
NONLINEAR REGRESSION
❖ Applying linear least-squares theory results in the following normal equations:
$[Z_j]^T [Z_j] \{\Delta A\} = [Z_j]^T \{D\}$
Thus, the approach consists of solving for $\{\Delta A\}$, which can be employed to compute improved values for the parameters, as in
$a_{0,j+1} = a_{0,j} + \Delta a_0$ and $a_{1,j+1} = a_{1,j} + \Delta a_1$
❖ This procedure is repeated until the solution converges – that is, until
$|\varepsilon_a|_k = \left| \frac{a_{k,j+1} - a_{k,j}}{a_{k,j+1}} \right| \times 100\%$
falls below the stopping criterion.
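
A minimal Gauss-Newton sketch for the model $f(x) = a_0(1 - e^{-a_1 x})$ used throughout this section; the analytical partial derivatives are given on the next slide, and the tolerance and iteration cap are placeholder choices.

```python
import numpy as np

def gauss_newton(x, y, a0, a1, tol=1e-6, max_iter=50):
    """Gauss-Newton iteration for f(x) = a0*(1 - exp(-a1*x))."""
    for _ in range(max_iter):
        f  = a0 * (1 - np.exp(-a1 * x))
        Z  = np.column_stack([1 - np.exp(-a1 * x),        # df/da0
                              a0 * x * np.exp(-a1 * x)])  # df/da1
        D  = y - f                                        # residual vector {D}
        dA = np.linalg.solve(Z.T @ Z, Z.T @ D)            # normal equations
        a0, a1 = a0 + dA[0], a1 + dA[1]                   # improved parameters
        if np.all(np.abs(dA) / np.abs([a0, a1]) * 100 < tol):
            break                                         # converged
    return a0, a1
```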



NONLINEAR REGRESSION
A potential problem with the Gauss-Newton method as developed to this point is that the partial derivatives of the function may be difficult to evaluate. Consequently, many computer programs use difference equations to approximate the partial derivatives. One method is
$\frac{\partial f_i}{\partial a_k} \approx \frac{f(x_i;\, a_0, \ldots, a_k + \delta a_k, \ldots, a_m) - f(x_i;\, a_0, \ldots, a_k, \ldots, a_m)}{\delta a_k}$
where $\delta$ = a small fractional perturbation



NONLINEAR REGRESSION
Shortcomings of the Gauss-Newton method:
1. It may converge slowly.
2. It may oscillate widely, that is, continually change direction.
3. It may not converge at all.
Modifications of the method have been developed to remedy these shortcomings:
❖ A guess of the parameters is made.
❖ The sum of the squares of the residuals is computed, e.g.
$S_r = \sum_{i=1}^{n} \left[ y_i - a_0 (1 - e^{-a_1 x_i}) \right]^2$
❖ Then the parameters are adjusted systematically to minimize $S_r$ using search techniques of the type described in optimization.


EXAMPLE 6
Use nonlinear regression to fit the model $f(x) = a_0(1 - e^{-a_1 x})$ to the following data:

x   0.2   0.5   0.8    1.2    1.7    2      2.3
y   500   700   1000   1200   2200   2650   3750

With the initial guesses $a_0 = a_1 = 1.0$, the partial derivatives evaluated at each data point are:

n   x_i   y_i    $1 - e^{-a_1 x}$   $a_0 x e^{-a_1 x}$
1   0.2   500    0.1812692469       0.1637461506
2   0.5   700    0.3934693403       0.3032653299
3   0.8   1000   0.5506710359       0.3594631713
4   1.2   1200   0.6988057881       0.3614330543
5   1.7   2200   0.8173164759       0.3105619909
6   2.0   2650   0.8646647168       0.2706705665
7   2.3   3750   0.8997411563       0.2305953406



Equations:
❖ General form: $f(x) = a_0 (1 - e^{-a_1 x}) + e$
❖ In matrix form (Gauss-Newton method): $\{D\} = [Z_j]\{\Delta A\} + \{E\}$
$[Z_j] = \begin{bmatrix} \partial f_1/\partial a_0 & \partial f_1/\partial a_1 \\ \partial f_2/\partial a_0 & \partial f_2/\partial a_1 \\ \vdots & \vdots \\ \partial f_n/\partial a_0 & \partial f_n/\partial a_1 \end{bmatrix}$   $\{D\} = \begin{Bmatrix} y_1 - f(x_1) \\ y_2 - f(x_2) \\ \vdots \\ y_n - f(x_n) \end{Bmatrix}$   $\{\Delta A\} = \begin{Bmatrix} \Delta a_0 \\ \Delta a_1 \\ \vdots \\ \Delta a_m \end{Bmatrix}$
❖ Partial derivatives:
$\frac{\partial f}{\partial a_0} = 1 - e^{-a_1 x}$   $\frac{\partial f}{\partial a_1} = a_0 x e^{-a_1 x}$
❖ Applying linear least-squares theory:
$[Z_j]^T [Z_j] \{\Delta A\} = [Z_j]^T \{D\}$
$a_{0,j+1} = a_{0,j} + \Delta a_0$   $a_{1,j+1} = a_{1,j} + \Delta a_1$



Solution:
First iteration: let $a_0 = 1.0$ and $a_1 = 1.0$.
The $[Z]$ matrix:
$[Z] = \begin{bmatrix} 0.1813 & 0.1637 \\ 0.3935 & 0.3033 \\ 0.5507 & 0.3595 \\ 0.6988 & 0.3614 \\ 0.8173 & 0.3106 \\ 0.8647 & 0.2707 \\ 0.8997 & 0.2306 \end{bmatrix}$
Taking the transpose of $[Z]$:
$[Z]^T = \begin{bmatrix} 0.1813 & 0.3935 & 0.5507 & 0.6988 & 0.8173 & 0.8647 & 0.8997 \\ 0.1637 & 0.3033 & 0.3595 & 0.3614 & 0.3106 & 0.2707 & 0.2306 \end{bmatrix}$
Multiplying $[Z]^T$ by $[Z]$:
$[Z]^T[Z] = \begin{bmatrix} 3.2044 & 1.2949 \\ 1.2949 & 0.6015 \end{bmatrix}$
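
A sketch checking this first iteration numerically; the printed $[Z]^T[Z]$ should match the matrix above, and the last line carries the iteration one step further by solving the normal equations for the improved parameters.

```python
import numpy as np

x = np.array([0.2, 0.5, 0.8, 1.2, 1.7, 2.0, 2.3])
y = np.array([500, 700, 1000, 1200, 2200, 2650, 3750], float)
a0, a1 = 1.0, 1.0                                 # initial guesses

Z = np.column_stack([1 - np.exp(-a1 * x),         # df/da0
                     a0 * x * np.exp(-a1 * x)])   # df/da1
print(Z.T @ Z)                   # [[3.2044, 1.2949], [1.2949, 0.6015]]

D = y - a0 * (1 - np.exp(-a1 * x))                # residual vector {D}
dA = np.linalg.solve(Z.T @ Z, Z.T @ D)            # solve for {dA}
print(a0 + dA[0], a1 + dA[1])    # improved parameter estimates
```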


