
MAT 202 E

NUMERICAL METHODS

Curve Fitting
Least Squares Regression

“These notes are only to be used in class presentations”

Textbook: Numerical Methods for Engineers, S.C. Chapra, R.P. Canale, 7th edition, 2015
Curve Fitting

• Data is given at discrete points, but estimates may be required at points between the given values.
• Fit the best curve to the discrete data set and use it to obtain estimates between the given points.

Two general approaches:

• Approximation or regression:
Find a simple function that represents the trend of the data, given that the data may contain measurement error or "noise". The function does not have to pass through the points.

• Interpolation:
The data is exact (precise), so you need to find a function that passes through all of the given points.
Two general approaches:

• Data exhibit a significant degree of scatter (error or "noise"): find a single curve that represents the general trend of the data. The function does not have to intersect the points. → Least-squares regression

• Data is very precise: pass a curve (or curves) exactly through each of the points. → Interpolation
Basic Statistics

In the course of an engineering study, if several measurements are made of a particular quantity, additional insight can be gained by summarizing the data in one or more well-chosen statistics that convey as much information as possible about specific characteristics of the data set.

These descriptive statistics are most often selected to represent:
• The location of the center of the distribution of the data,
• The degree of spread of the data.
Given a set of data $y_1, y_2, \ldots, y_n$:

Arithmetic mean – the sum of the individual data points ($y_i$) divided by the number of points:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad i = 1, \ldots, n$$
Standard deviation – a common measure of spread for a sample:

$$s_y = \sqrt{\frac{S_t}{n-1}}, \qquad S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

where $n - 1$ is the number of degrees of freedom.

• If the individual measurements are spread out widely around the mean, St (and consequently sy) will be large.
• If they are grouped tightly, sy will be small.
Variance:

$$s_y^2 = \frac{S_t}{n-1}$$

Coefficient of variation – quantifies the spread of the data relative to its center (similar to relative error):

$$\text{c.v.} = \frac{s_y}{\bar{y}} \times 100\%$$
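As a minimal sketch (not part of the original notes), these statistics can be computed directly in Python; the function name is our own:

```python
import math

def descriptive_stats(y):
    """Mean, total sum of squares, standard deviation, variance, and c.v. of a sample."""
    n = len(y)
    ybar = sum(y) / n                          # arithmetic mean
    St = sum((yi - ybar) ** 2 for yi in y)     # sum of squares around the mean
    variance = St / (n - 1)                    # sample variance (n - 1 degrees of freedom)
    sy = math.sqrt(variance)                   # standard deviation
    cv = sy / ybar * 100                       # coefficient of variation, in percent
    return ybar, St, sy, variance, cv
```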
Linear Least-Squares Regression
• Set of data points: (x1, y1), (x2, y2), …, (xn, yn)
• The goal is to find a straight line that comes close to fitting the given data points.
• The closeness is determined by the error (e), or residual.

Line equation:

$$y = a_0 + a_1 x$$

Each measured point satisfies $y_i = a_0 + a_1 x_i + e$, so the residual is

$$e = y_i - a_0 - a_1 x_i$$

where $a_0$ is the intercept, $a_1$ is the slope, $y_i$ is the measured value, and $e$ is the error.
Choosing Criteria For a "Best Fit"

• Minimize the sum of the residual errors for all available data?

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$$

Inadequate! Positive and negative errors can cancel out.

• Minimize the sum of the absolute values?

$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$

Inadequate! May not give a unique best fit.

• Minimize the maximum distance that an individual point falls from the line?

Inadequate! May be overly influenced by outliers.
• The best strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

This criterion:
• Yields a unique line for a given set of data
• Is easy to differentiate
• Ensures positive errors do not cancel out negative errors
• Magnifies large errors
Least-Squares Fit of a Straight Line

• Need to compute a0 and a1 such that Sr is minimized:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

Setting the partial derivatives with respect to each coefficient to zero:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0 \quad\Rightarrow\quad \sum y_i - \sum a_0 - \sum a_1 x_i = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum_{i=1}^{n} \left[ (y_i - a_0 - a_1 x_i)\, x_i \right] = 0 \quad\Rightarrow\quad \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2 = 0$$

Since $\sum_{i=1}^{n} a_0 = n a_0$, these reduce to the normal equations, which can be solved simultaneously:

$$n a_0 + \left( \sum x_i \right) a_1 = \sum y_i$$

$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i$$

In matrix form:

$$\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

When these two equations are solved:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of x and y.
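A minimal sketch of this solution in Python (our own helper, using the closed-form expressions above):

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x from the normal equations."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # slope
    a0 = sum_y / n - a1 * sum_x / n                                # intercept: ybar - a1*xbar
    return a0, a1
```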
"Goodness" of our fit

Sy/x: the standard error of the estimate, which quantifies the error of the predicted value of y corresponding to a particular value of x. For a straight-line fit it is (with n − 2 degrees of freedom, since two coefficients were estimated):

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}}$$

Notice the improvement in the error due to linear regression: the reduction of the total error is measured by the correlation coefficient.

• Sr: the sum of the squares of the residuals around the regression line:

$$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

• St: the total sum of the squares around the mean:

$$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

• (St − Sr) quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value:

$$r^2 = \frac{S_t - S_r}{S_t}$$

where r is the correlation coefficient and r² is the coefficient of determination.

• For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
• For r = r² = 0, Sr = St and the fit represents no improvement over simply using the mean.
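A small sketch (our own helper, consistent with the formulas above) for computing these quality measures:

```python
import math

def goodness_of_fit(x, y, a0, a1):
    """Return St, Sr, the correlation coefficient r, and s_y/x for a line fit."""
    n = len(y)
    ybar = sum(y) / n
    St = sum((yi - ybar) ** 2 for yi in y)                       # spread around the mean
    Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # spread around the line
    r = math.sqrt((St - Sr) / St)                                # correlation coefficient
    syx = math.sqrt(Sr / (n - 2))                                # standard error of the estimate
    return St, Sr, r, syx
```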
Example:
Fit a straight line to the data set below. Calculate the standard deviation, the standard error of the estimate, and the correlation coefficient. Approximate y = f(x) for x = 2.5.

| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| y | 0.5 | 2.5 | 2.0 | 4.0 | 3.5 | 6.0 | 5.5 |

|         | xi | yi | xi² | xi·yi | (yi − ȳ)² | f(xi) = a0 + a1·xi | (yi − f(xi))² |
|---------|----|----|-----|-------|-----------|--------------------|---------------|
|         | 1  | 0.5 | 1  | 0.5   | 8.5765    | 0.9107             | 0.1687        |
|         | 2  | 2.5 | 4  | 5.0   | 0.8622    | 1.7500             | 0.5625        |
|         | 3  | 2.0 | 9  | 6.0   | 2.0408    | 2.5893             | 0.3473        |
|         | 4  | 4.0 | 16 | 16.0  | 0.3265    | 3.4286             | 0.3265        |
|         | 5  | 3.5 | 25 | 17.5  | 0.0051    | 4.2679             | 0.5897        |
|         | 6  | 6.0 | 36 | 36.0  | 6.6122    | 5.1072             | 0.7971        |
|         | 7  | 5.5 | 49 | 38.5  | 4.2908    | 5.9465             | 0.1994        |
| Sum     | 28 | 24  | 140 | 119.5 | 22.7143 (St) |                 | 2.9911 (Sr)   |
| Average | 4  | 3.4286 | |      |           |                    |               |

Solving the normal equations gives

$$f(x) = 0.0714 + 0.8393\,x$$

$$s_y = 1.9457, \qquad s_{y/x} = 0.7735, \qquad r = 0.932$$

$$f(2.5) = 2.1697$$
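As a quick cross-check of these numbers (assuming NumPy is available; this snippet is not part of the original notes):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

a1, a0 = np.polyfit(x, y, 1)   # np.polyfit returns the highest-order coefficient first
print(a0, a1)                  # approx. 0.0714 and 0.8393
print(a0 + a1 * 2.5)           # approx. 2.1697
```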
Linearization of Nonlinear Relationships

• Linear regression assumes that the relationship between the dependent and independent variables is linear.
• If the relationship is not linear, you will need polynomial regression or other nonlinear techniques.

(Figure: (a) data that is ill-suited for linear least-squares regression; (b) indication that a parabola may be more suitable.)

For certain classes of functions, however, you can linearize the data and still use linear regression.
Exponential Eq.:

$$y = \alpha e^{\beta x} \quad\Rightarrow\quad \ln y = \ln \alpha + \beta x$$

A plot of ln y versus x is a straight line with slope β and intercept ln α.

Power Eq.:

$$y = \alpha x^{\beta} \quad\Rightarrow\quad \log y = \log \alpha + \beta \log x$$

A plot of log y versus log x is a straight line with slope β and intercept log α.

Saturation growth-rate Eq.:

$$y = \alpha \frac{x}{\beta + x} \quad\Rightarrow\quad \frac{1}{y} = \frac{1}{\alpha} + \frac{\beta}{\alpha} \cdot \frac{1}{x}$$

A plot of 1/y versus 1/x is a straight line with slope β/α and intercept 1/α.

• A rational function such as $y = 1/(\alpha x + \beta)$ can likewise be linearized by regressing 1/y against x.
Example:
Fit a power equation to the given data set. Calculate the correlation coefficient.

| x | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| y | 0.5 | 1.7 | 3.4 | 5.7 | 8.4 |

|         | xi  | yi  | log xi | log yi | (log xi)² | (log xi)(log yi) | (yi − ȳ)² | f(xi) = α·xiᵝ | (yi − f(xi))² |
|---------|-----|-----|--------|--------|-----------|------------------|-----------|---------------|---------------|
|         | 1.0 | 0.5 | 0.0000 | −0.3010 | 0.0000   | 0.0000           | 11.8336   | 0.5000        | 0.00000       |
|         | 2.0 | 1.7 | 0.3010 | 0.2304  | 0.0906   | 0.0694           | 5.0176    | 1.6818        | 0.00033       |
|         | 3.0 | 3.4 | 0.4771 | 0.5315  | 0.2276   | 0.2536           | 0.2916    | 3.4193        | 0.00037       |
|         | 4.0 | 5.7 | 0.6021 | 0.7559  | 0.3625   | 0.4551           | 3.0976    | 5.6569        | 0.00186       |
|         | 5.0 | 8.4 | 0.6990 | 0.9243  | 0.4886   | 0.6460           | 19.8916   | 8.3593        | 0.00166       |
| Sum     |     |     | 2.0792 | 2.1411  | 1.1693   | 1.4241           | 40.1320 (St) |            | 0.0042 (Sr)   |
| Average | 3.0000 | 3.9400 | |     |          |                  |           |               |               |

Linear regression of log y against log x gives

$$f(x) = 0.5\, x^{1.75}, \qquad r = 0.9999$$
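A minimal sketch of this log-log fit (assuming NumPy; not from the notes):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.5, 1.7, 3.4, 5.7, 8.4])

# Linearize: log10(y) = log10(alpha) + beta * log10(x), then fit a straight line
beta, log_alpha = np.polyfit(np.log10(x), np.log10(y), 1)
alpha = 10.0 ** log_alpha
print(alpha, beta)   # approx. 0.5 and 1.75, i.e. f(x) = 0.5 * x**1.75
```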
Linearization of Nonlinear Relationships
• Translate the non-polynomial relation to a linear relation (and translate the data set accordingly)
• Perform linear regression on the translated data set
• Translate the linear relation back to the original relation
Polynomial Regression

• Some engineering data is poorly represented by a straight line. A curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials.

General equation for an mth-order polynomial:

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$$

A straight line is the m = 1 (first-order) case.

• To fit the data to an mth-order polynomial, we need to solve the following system of linear equations for a0, a1, …, am
• (m + 1) equations with (m + 1) unknowns (see the matrix form below)

$$\begin{bmatrix}
n & \sum x_i & \cdots & \sum x_i^m \\
\sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m+1} \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_i^m & \sum x_i^{m+1} & \cdots & \sum x_i^{2m}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}$$
Standard error of the estimate:

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$

where

$$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2 - \cdots - a_m x_i^m)^2$$
Example:
Fit a parabola to the given data set. Calculate the standard error of the estimate and the correlation coefficient.

| x | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| y | 2.1 | 7.7 | 13.6 | 27.2 | 40.9 | 61.1 |

With $f(x_i) = a_0 + a_1 x_i + a_2 x_i^2$:

|         | xi | yi    | xi² | xi³ | xi⁴ | xi·yi | xi²·yi | (yi − ȳ)² | f(xi)   | (yi − f(xi))² |
|---------|----|-------|-----|-----|-----|-------|--------|-----------|---------|---------------|
|         | 0  | 2.1   | 0   | 0   | 0   | 0     | 0      | 544.4444  | 2.4786  | 0.1433        |
|         | 1  | 7.7   | 1   | 1   | 1   | 7.7   | 7.7    | 314.4711  | 6.6986  | 1.0028        |
|         | 2  | 13.6  | 4   | 8   | 16  | 27.2  | 54.4   | 140.0278  | 14.6400 | 1.0816        |
|         | 3  | 27.2  | 9   | 27  | 81  | 81.6  | 244.8  | 3.1211    | 26.3028 | 0.8050        |
|         | 4  | 40.9  | 16  | 64  | 256 | 163.6 | 654.4  | 239.2178  | 41.6870 | 0.6194        |
|         | 5  | 61.1  | 25  | 125 | 625 | 305.5 | 1527.5 | 1272.1111 | 60.7926 | 0.0945        |
| Sum     | 15 | 152.6 | 55  | 225 | 979 | 585.6 | 2488.8 | 2513.3933 (St) |    | 3.7466 (Sr)   |
| Average | 2.5 | 25.4333 |   |     |     |       |        |           |         |               |

Solving the normal equations gives

$$f(x) = 2.4786 + 2.3593\,x + 1.8607\,x^2$$

$$s_{y/x} = 1.1175, \qquad r = 0.9993$$
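A quick cross-check of these coefficients, either with the fit_poly sketch above or directly with NumPy (assuming NumPy; not part of the notes):

```python
import numpy as np

x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]

a2, a1, a0 = np.polyfit(x, y, 2)   # highest-order coefficient first
print(a0, a1, a2)                  # approx. 2.4786, 2.3593, 1.8607
```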
