K7 Curve Fitting
Mongkol JIRAVACHARADET
SURANAREE UNIVERSITY OF TECHNOLOGY
INSTITUTE OF ENGINEERING, SCHOOL OF CIVIL ENGINEERING
LINEAR REGRESSION
We want to find the straight line that best fits the data.
Observations: (xi, yi)
Model: y = α x + β
Error: ei = yi − α xi − β
Criteria for a “Best” Fit
Find the line that minimizes the total error over all the data points:
ei = yi − ŷi,  where ŷi = α xi + β
However, positive and negative errors can cancel one another, so a line
with a small error sum can still be wrong.
ERROR Definition
ei = yi − ŷi
Minimizing the plain sum of errors runs into the cancellation problem,
so instead we minimize the sum of squares:
S = Σ ei² = Σ (yi − α xi − β)²
Setting ∂S/∂β = 0 and ∂S/∂α = 0 gives
0 = Σ yi − Σ β − Σ α xi
0 = Σ xi yi − Σ β xi − Σ α xi²
which rearrange into the normal equations:
n β + α Σ xi = Σ yi
β Σ xi + α Σ xi² = Σ xi yi
Solving for the slope and intercept:
α = [Σ xi yi − (1/n) Σ xi Σ yi] / [Σ xi² − (1/n)(Σ xi)²] = Sxy / Sxx
β = ȳ − α x̄
where ȳ and x̄ are the means of y and x.
Define:
Sxy = Σ xi yi − (1/n) Σ xi Σ yi
Sxx = Σ xi² − (1/n)(Σ xi)²
Syy = Σ yi² − (1/n)(Σ yi)²
The approximated y for any x is then ŷ = α x + β.
Example: Fit a straight line to the following x and y values (n = 7):

xi   yi     xi²   xi·yi    yi²
1    0.5     1     0.5     0.25
2    2.5     4     5.0     6.25
3    2.0     9     6.0     4.00
4    4.0    16    16.0    16.00
5    3.5    25    17.5    12.25
6    6.0    36    36.0    36.00
7    5.5    49    38.5    30.25
Σ   28   24.0   140   119.5   105.00

x̄ = 28/7 = 4,  ȳ = 24/7 = 3.4286
α = [119.5 − (28)(24)/7] / [140 − 28²/7] = 0.8393
β = 3.4286 − 0.8393(4) = 0.0714
Least-squares fit: y = 0.8393 x + 0.0714
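The same slope and intercept can be checked outside MATLAB; the sketch below redoes the worked example in Python with NumPy (an assumption of this note, not part of the lecture):

```python
# Least-squares slope and intercept via Sxy/Sxx, checked against the
# worked example above (NumPy sketch; the lecture itself uses MATLAB).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
n = len(x)

Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # 119.5 - 28*24/7
Sxx = np.sum(x**2) - np.sum(x)**2 / n             # 140 - 28**2/7

alpha = Sxy / Sxx                    # slope
beta = y.mean() - alpha * x.mean()   # intercept

print(round(alpha, 4), round(beta, 4))  # 0.8393 0.0714
```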
Sum of the squares of the residuals:
Sr = Σ ei² = Σ (yi − β − α xi)² = (Sxx Syy − Sxy²) / Sxx
Sum of the squares around the mean:
St = Σ (yi − ȳ)² = Syy
Standard error of the estimate: s_y/x = √[Sr / (n − 2)]
Standard deviation of the data: s_y = √[St / (n − 1)]
A useful linear regression reduces the spread of the data: s_y/x < s_y.
Coefficient of determination: r² = (St − Sr) / St = Sxy² / (Sxx Syy)
r² is the fraction of the variation in y that is explained by the
variation in x.
For the example data (Σxi = 28, Σyi = 24, Σxi² = 140, Σxi yi = 119.5, Σyi² = 105):
Sxx = 140 − 28²/7 = 28
Syy = 105 − 24²/7 = 22.7
Sxy = 119.5 − (28)(24)/7 = 23.5
Sr = (28 × 22.7 − 23.5²) / 28 = 2.977
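These goodness-of-fit quantities can be reproduced in Python with NumPy (a sketch, not part of the lecture's MATLAB session). Note that keeping Syy unrounded (22.7143 rather than 22.7) gives Sr ≈ 2.991, slightly different from the slide's rounded 2.977:

```python
# Goodness-of-fit quantities for the straight-line example (NumPy sketch).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
n = len(x)

Sxx = np.sum(x**2) - np.sum(x)**2 / n
Syy = np.sum(y**2) - np.sum(y)**2 / n
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

Sr = (Sxx * Syy - Sxy**2) / Sxx    # residual sum of squares (~2.991 unrounded)
St = Syy                           # total sum of squares
r2 = (St - Sr) / St                # coefficient of determination
s_yx = (Sr / (n - 2)) ** 0.5       # standard error of the estimate
```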
[Figure: the fitted line with curved confidence bands ŷi ± Δ]
For a 95% CI, you can be 95% confident that the two curved confidence
bands enclose the true best-fit linear regression line, leaving a 5%
chance that the true line lies outside those boundaries.
A 100(1 − α)% confidence interval for yi is given by
ŷi ± t_{α/2} · s_y/x · √[1/n + (xi − x̄)² / Sxx]
95% confidence interval → α = 0.05
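The band half-width can be evaluated directly; the NumPy sketch below does so for the straight-line example, hard-coding the known critical value t_{0.025, df=5} = 2.571 rather than pulling it from a statistics library:

```python
# Half-width Δ of the 95% confidence band for the straight-line example
# (NumPy sketch; t_{0.025, df=5} = 2.571 is hard-coded from a t-table).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
n = len(x)

Sxx = np.sum(x**2) - np.sum(x)**2 / n
Syy = np.sum(y**2) - np.sum(y)**2 / n
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
s_yx = ((Sxx * Syy - Sxy**2) / Sxx / (n - 2)) ** 0.5

t = 2.571  # two-sided 95% critical value for df = n - 2 = 5
delta = t * s_yx * np.sqrt(1.0 / n + (x - x.mean())**2 / Sxx)
# band is narrowest at x = x̄ = 4 and widens toward the ends
```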
T-Distribution
[Figure: t density with two-sided critical values t0.025 (95%) and t0.005 (99%)]
Probability density function of the t distribution:
f(x) = (1 + x²/ν)^(−(ν+1)/2) / [√ν · B(1/2, ν/2)]
where B is the beta function and ν is a positive integer shape parameter:
ν = df, the degrees of freedom.
Critical Values of t
[Table: critical values of t by df, for confidence intervals of 80%
(α/2 = 0.10), 90% (0.05), 95% (0.025), 98% (0.01), 99% (0.005),
and 99.8% (0.001)]
POLYNOMIAL REGRESSION
Model: y = a0 + a1 x + a2 x²
Minimize Sr = Σ (yi − a0 − a1 xi − a2 xi²)² by setting the partial
derivatives to zero:
∂Sr/∂a0 = −2 Σ (yi − a0 − a1 xi − a2 xi²)
∂Sr/∂a1 = −2 Σ xi (yi − a0 − a1 xi − a2 xi²)
∂Sr/∂a2 = −2 Σ xi² (yi − a0 − a1 xi − a2 xi²)
Normal equations:
n a0 + (Σ xi) a1 + (Σ xi²) a2 = Σ yi
(Σ xi) a0 + (Σ xi²) a1 + (Σ xi³) a2 = Σ xi yi
(Σ xi²) a0 + (Σ xi³) a1 + (Σ xi⁴) a2 = Σ xi² yi
Example: fit a second-order polynomial to the data below.

xi    yi     (yi − ȳ)²   (yi − a0 − a1 xi − a2 xi²)²
0     2.1     544.44      0.14332
1     7.7     314.47      1.00286
2    13.6     140.03      1.08158
3    27.2       3.12      0.80491
4    40.9     239.22      0.61951
5    61.1    1272.11      0.09439
Σ            2513.39      3.74657

r = √[(2513.39 − 3.74657) / 2513.39] = √0.99851 = 0.99925
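The 3×3 normal-equation system can be built and solved by hand outside MATLAB; this NumPy sketch reproduces the St, Sr, and r values of the table above:

```python
# Solving the quadratic normal equations for the example data
# (NumPy sketch of the hand derivation; MATLAB's polyfit does the same job).
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
n = len(x)

# Normal-equation matrix built from the power sums Σ xi^k
A = np.array([[n,            x.sum(),      (x**2).sum()],
              [x.sum(),      (x**2).sum(), (x**3).sum()],
              [(x**2).sum(), (x**3).sum(), (x**4).sum()]])
b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a0, a1, a2 = np.linalg.solve(A, b)
yhat = a0 + a1 * x + a2 * x**2

Sr = np.sum((y - yhat)**2)        # residual sum of squares, ≈ 3.74657
St = np.sum((y - y.mean())**2)    # total sum of squares, ≈ 2513.39
r = np.sqrt((St - Sr) / St)       # ≈ 0.99925
```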
>> x = [0 1 2 3 4 5];
>> y = [2.1 7.7 13.6 27.2 40.9 61.1];
>> c = polyfit(x, y, 2)              % quadratic least-squares fit
>> [c, s] = polyfit(x, y, 2)         % s holds data for error estimates
>> st = sum((y - mean(y)).^2)        % total sum of squares
>> sr = sum((y - polyval(c, x)).^2)  % residual sum of squares
>> r = sqrt((st - sr) / st)          % correlation coefficient
MATLAB polyval Function
Evaluates a polynomial at the points defined by the input vector:
>> y = polyval(c, x)
where x = input vector
      y = value of the polynomial evaluated at x
      c = vector of coefficients in descending order of powers
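For readers working outside MATLAB, NumPy offers the same pair of functions with the same descending-power coefficient convention (a sketch of the equivalent session):

```python
# NumPy equivalent of the MATLAB polyfit/polyval session above (sketch).
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])

c = np.polyfit(x, y, 2)     # [a2, a1, a0], descending powers like MATLAB
y2 = np.polyval(c, x)       # evaluate the fitted quadratic at x

st = np.sum((y - np.mean(y))**2)
sr = np.sum((y - y2)**2)
r = np.sqrt((st - sr) / st)  # ≈ 0.99925
```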
Polynomial Interpolation
[Plot: data points 'o' and the fitted quadratic, y from 0 to 70 over x = 0 to 5]
>> y2 = polyval(c, x)
>> plot(x, y, 'o', x, y2)
Error Bounds
The error estimate delta comes from polyval's second output when it is
given the structure s returned by polyfit:
>> [y2, delta] = polyval(c, x, s)
>> plot(x,y,'o',x,y2,'g-',x,y2+2*delta,'r:',x,y2-2*delta,'r:')

The same commands can be applied to the straight-line data:
xi   yi
1    0.5
2    2.5
3    2.0
4    4.0
5    3.5
6    6.0
7    5.5
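A rough Python analogue of the ±2·delta band can be built from the residual standard error. Note the assumption: MATLAB's delta is a per-point prediction error estimate, while this sketch uses a single flat band s = √[Sr / (n − 3)] (degrees of freedom n minus the three fitted coefficients):

```python
# Flat ±2s error band for the quadratic fit (NumPy sketch; an
# approximation of MATLAB's per-point polyval delta, not a reproduction).
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])
n = len(x)

c = np.polyfit(x, y, 2)
y2 = np.polyval(c, x)

s = np.sqrt(np.sum((y - y2)**2) / (n - 3))  # df = n - (order + 1)
lower, upper = y2 - 2 * s, y2 + 2 * s       # ~95% band

inside = np.logical_and(y >= lower, y <= upper)  # all data fall inside here
```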
Multiple Linear Regression
Model: y = c0 + c1 x1 + c2 x2

Example:
x1    x2    y
0     0     5
2     1    10
2.5   2     9
1     3     0
4     6     3
7     2    27

Normal equations:
[  6     16.5    14 ] [c0]   [  54  ]
[ 16.5   76.25   48 ] [c1] = [ 243.5]
[ 14     48      54 ] [c2]   [ 100  ]

Solution: c0 = 5, c1 = 4, c2 = −3
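Solving that 3×3 normal-equation system confirms the coefficients; the NumPy sketch below does the arithmetic (here the fit reproduces every data point exactly):

```python
# Solving the 3x3 normal-equation system of the multiple-regression
# example (NumPy sketch).
import numpy as np

A = np.array([[ 6.0, 16.5,  14.0],
              [16.5, 76.25, 48.0],
              [14.0, 48.0,  54.0]])
b = np.array([54.0, 243.5, 100.0])

c0, c1, c2 = np.linalg.solve(A, b)
# fitted model: y = c0 + c1*x1 + c2*x2
```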
Multivariate Fit in MATLAB
c0 + c1 x11 + c2 x12 + … + cp x1p = y1
c0 + c1 x21 + c2 x22 + … + cp x2p = y2
⋮
c0 + c1 xm1 + c2 xm2 + … + cp xmp = ym

Overdetermined system of equations: A c = y, where

     [ x11  x12  …  x1p  1 ]        [ c1 ]        [ y1 ]
A =  [ x21  x22  …  x2p  1 ] , c =  [ ⋮  ] ,  y = [ y2 ]
     [  ⋮    ⋮   ⋱   ⋮   ⋮ ]        [ cp ]        [ ⋮  ]
     [ xm1  xm2  …  xmp  1 ]        [ c0 ]        [ ym ]

Normal-equations fit: >> c = (A'*A)\(A'*y)
QR (backslash) fit:   >> c = A\y
Example:
x1 x2 y
0 0 5
2 1 10
2.5 2 9
1 3 0
4 6 3
7 2 27
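The backslash fit c = A\y has a direct NumPy counterpart in np.linalg.lstsq; this sketch applies it to the example data, building the design matrix with the column of ones last so the coefficient order is [c1, c2, c0]:

```python
# NumPy version of the backslash fit c = A\y for the example data (sketch;
# np.linalg.lstsq plays the role of MATLAB's A\y for overdetermined systems).
import numpy as np

x1 = np.array([0, 2, 2.5, 1, 4, 7], dtype=float)
x2 = np.array([0, 1, 2, 3, 6, 2], dtype=float)
y = np.array([5, 10, 9, 0, 3, 27], dtype=float)

# Design matrix columns [x1, x2, 1] map to coefficients [c1, c2, c0]
A = np.column_stack([x1, x2, np.ones_like(x1)])
c1, c2, c0 = np.linalg.lstsq(A, y, rcond=None)[0]
# fit: y = c0 + c1*x1 + c2*x2
```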