
MATHEMATICAL PHYSICS II LAB
Bachelor of Physics (Hons), Semester II

Experiment 1: Curve Fitting
Dr. Pragati Ashdhir, Professor, Department of Physics, Hindu College, 2024-25
Why & What is Curve Fitting

1. Data are often given for discrete values along a continuum: we may require estimates at points between the discrete values.
2. We may require a simplified version of a given complicated function.
§ One way to do this is to compute values of the function at a number of discrete values along the range of interest.
§ Then, a simpler function may be derived to fit these values.
■ Both of the above applications are known as curve fitting.
■ There are two general approaches to curve fitting, distinguished from each other by the amount of error associated with the data:
i. When the given data exhibit a significant degree of error or "noise," the strategy is to derive a single curve that represents the general trend of the data. Because any individual data point may be incorrect, we make no effort to intersect every point. Rather, the curve is designed to follow the pattern of the points taken as a group. One approach of this nature is called least-squares regression.
ii. When the data are known to be very precise, the basic approach is to fit a curve or a series of curves that pass directly through each of the points. Such data usually originate from tables. The estimation of values between well-known discrete points is called interpolation.



Interpolation vs. Regression

§ Interpolation: if the data are reliable, we can plot them and connect the data dots.
§ Since interpolation is really a group of small $f(x)$'s connecting one point to the next, it does not work well for data that have built-in random error (scatter).
§ Regression, or curve fitting: capturing the trend in the data by assigning a single function across the entire range.

[Figure: panels illustrating least-squares regression, linear interpolation, and curvilinear interpolation.]


Least Squares Fitting

■ If the data are obtained from experiments, they typically contain a significant amount of random noise caused by measurement errors.
■ The task of curve fitting is to find a smooth curve that fits the data points "on the average."
■ This curve should have a simple form (e.g., a low-order polynomial), so as not to reproduce the noise.
■ Let $f(x) = f(x; a_0, a_1, a_2, \dots, a_m)$ be the function that is to be fitted to the $(n+1)$ data points $(x_i, y_i)$, $i = 0, 1, 2, \dots, n$.
■ The notation implies that we have a function of $x$ that contains $(m+1)$ variable parameters $a_0, a_1, a_2, \dots, a_m$, where $m < n$.
■ The form of $f(x)$ is determined beforehand, usually from the theory associated with the experiment from which the data are obtained.
■ The only means of adjusting the fit are the parameters.
■ Thus curve fitting consists of two steps:
i. choosing the form of $f(x)$;
ii. computing the parameters that produce the best fit to the data.
■ Next we ask: what is meant by the "best fit"? If the noise is confined to the $y$-coordinate, the most commonly used measure is the least-squares fit, which minimizes the function
$$S(a_0, a_1, \dots, a_m) = \sum_{i=0}^{n} \left[ y_i - f(x_i) \right]^2 \qquad \dots(1)$$
with respect to each $a_k$.
■ Therefore, the optimal values of the parameters are given by the solution of the equations
$$\frac{\partial S}{\partial a_k} = 0, \qquad k = 0, 1, \dots, m \qquad \dots(2)$$



■ The terms $r_i = y_i - f(x_i)$ in Eq. (1) are called residuals; they represent the discrepancy between the data points and the fitting function at $x_i$.
■ The function $S$ to be minimized is thus the sum of the squares of the residuals.
■ Equations (2) are generally non-linear in the $a_k$ and may thus be difficult to solve.
■ Often the fitting function is chosen as a linear combination of specified functions $f_j(x)$:
$$f(x) = a_0 f_0(x) + a_1 f_1(x) + \dots + a_m f_m(x) \qquad \dots(3)$$
in which case Eqns. (2) are linear. If the fitting function is a polynomial, we have $f_0(x) = 1$, $f_1(x) = x$, $f_2(x) = x^2$, and so on.
■ The spread of the data about the fitting curve is quantified by the standard deviation, defined as
$$\sigma = \sqrt{\frac{S}{n - m}} \qquad \dots(4)$$
■ Note that if $n = m$, we have interpolation, not curve fitting. In that case both the numerator and the denominator in Eq. (4) are zero, so that $\sigma$ is indeterminate. (Explained on the next slide with 7 data points.)
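As a quick numerical illustration of Eqs. (1)-(4), here is a minimal sketch with made-up data, using NumPy's polyfit to stand in for the minimization; the data values and function choices are our own assumptions, not part of the slides:

```python
import numpy as np

# Made-up noisy data: (n + 1) points with the noise confined to the y-coordinate.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 10)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)

m = 1                              # polynomial degree: (m + 1) parameters
coeffs = np.polyfit(x, y, m)       # least-squares fit, minimizes S of Eq. (1)
r = y - np.polyval(coeffs, x)      # residuals r_i = y_i - f(x_i)

S = np.sum(r**2)                   # Eq. (1)
n = x.size - 1                     # points indexed i = 0, ..., n
sigma = np.sqrt(S / (n - m))       # Eq. (4): spread about the fitting curve
print(coeffs, S, sigma)
```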



Linear Regression

■ Where substantial error is associated with data, polynomial interpolation is inappropriate and may yield unsatisfactory results when used to predict intermediate values. Experimental data are often of this type.
■ The adjacent figure shows seven experimentally derived data points exhibiting significant variability. Visual inspection of these data suggests a positive relationship between y and x.
■ Now, if a sixth-order interpolating polynomial is fitted to these data (Fig. b), it will pass exactly through all of the points. (Note that if $n = m$, we have interpolation, not curve fitting.)
■ However, because of the variability in these data, the curve oscillates widely in the interval between the points.
■ In particular, the interpolated values at $x = 1.5$ and $x = 6.5$ appear to be well beyond the range suggested by these data.
■ A more appropriate strategy for such cases is to derive an approximating function that fits the shape or general trend of the data without necessarily matching the individual points.
■ Figure c illustrates how a straight line can be used to generally characterize the trend of these data without passing through any particular point.
§ One way to determine the line in Fig. c is to visually inspect the plotted data and then sketch a "best" line through the points.
§ However, this approach is arbitrary: different people would draw different lines.
§ To remove this subjectivity, some criterion must be devised to establish a basis for the fit.
§ One way to do this is to derive a curve that minimizes the discrepancy between the data points and the curve. So, we need to look for a suitable criterion to minimize the discrepancy.
Criterion for Linear Regression

We ask: what criterion should define the "best" straight line through a given set of data points?



Some Possible Criteria for "Best Fit" I

■ Problem of linear regression: we seek to fit a straight line to a given set of $(n+1)$ paired observations/data points $(x_0, y_0), (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. Fitting a straight line to data is called linear regression.
■ Let the mathematical expression for the straight line be
$$y = a_0 + a_1 x + e \qquad \dots(1)$$
where $a_0$ and $a_1$ are coefficients representing the intercept and the slope, respectively, and $e$ is the error, or residual, between the model and the observations, which can be represented by rearranging Eq. (1) as
$$e = y - (a_0 + a_1 x) \qquad \dots(2)$$
■ Thus, the error, or residual, is the discrepancy between the true value of $y$ and the approximate value, $a_0 + a_1 x$, predicted by the linear equation.
■ One strategy for fitting a "best" line through the data would be to minimize the sum of the residual errors for all the available data, as in
$$\sum_{i=0}^{n} e_i = \sum_{i=0}^{n} (y_i - a_0 - a_1 x_i) \qquad \dots(3)$$
However, this is an inadequate criterion, as illustrated in figure (a) (next slide), which depicts the fit of a straight line to two points. Large deviations that are equal but of opposite sign cancel one another and may make the sum of deviations $\sum e_i$ a minimum; a tiny numeric demo follows this list.
■ The other option could be to minimize the sum of the absolute values of the residuals, defined as
$$\sum_{i=0}^{n} |e_i| = \sum_{i=0}^{n} |y_i - a_0 - a_1 x_i| \qquad \dots(4)$$
This is a reasonable measure; however, it is difficult to derive analytical formulae from this condition for finding the unknowns.
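A minimal numeric sketch of why criterion (3) fails while absolute values and squares do not cancel; the residual values are made up for illustration:

```python
import numpy as np

# Four made-up residuals containing equal-and-opposite pairs.
e = np.array([2.0, -2.0, 1.5, -1.5])

print(e.sum())          # 0.0  -> the plain sum, Eq. (3), calls this a "perfect" fit
print(np.abs(e).sum())  # 7.0  -> absolute values, Eq. (4), do not cancel
print((e**2).sum())     # 12.5 -> squares penalize every deviation, large ones most
```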



Some Possible Criteria for "Best Fit" II

■ Also, as shown for the four points in Fig. (b), any straight line falling within the dashed lines will minimize the sum of the absolute values. Thus, this criterion also does not yield a unique best fit.
■ A strategy that overcomes the shortcomings of the aforementioned approaches is to minimize the sum of the squares of the residuals between the measured $y$ and the $y$ calculated with the linear model:
$$S_r = \sum_{i=0}^{n} e_i^2 = \sum_{i=0}^{n} (y_i - a_0 - a_1 x_i)^2 \qquad \dots(5)$$
■ The above condition is called the principle of least-squares fit because it makes the sum of the squares of the deviations least, or minimum, and hence gives the best fit to the given data.



Least-Squares Fit of a Straight Line

■ To determine values for $a_0$ and $a_1$, Eqn. (5) is differentiated with respect to each coefficient:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i), \qquad \frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i) \qquad \dots(6)$$
■ Setting the above derivatives equal to zero will result in a minimum $S_r$, and we get
$$\sum (y_i - a_0 - a_1 x_i) = 0, \qquad \sum x_i (y_i - a_0 - a_1 x_i) = 0 \qquad \dots(7)$$
■ Eqns. (7) can be expressed as a set of two simultaneous linear equations in the two unknowns $a_0$ and $a_1$:
$$(n+1)\, a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i \qquad \dots(8)$$
■ These are called the normal equations. They can be solved simultaneously to give
$$a_1 = \frac{(n+1) \sum x_i y_i - \sum x_i \sum y_i}{(n+1) \sum x_i^2 - \left(\sum x_i\right)^2} \qquad \dots(9)$$
§ Substituting Eqn. (9) in Eqn. 8(a) we get
$$a_0 = \bar{y} - a_1 \bar{x} \qquad \dots(10)$$
where $\bar{y}$ and $\bar{x}$ are the means of $y$ and $x$ respectively.
§ Eqns. (8) can be put in matrix form as
$$\begin{pmatrix} n+1 & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} \qquad \dots(11)$$
which can be easily solved.
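A minimal sketch of solving the normal equations (8)/(11) numerically; the function name and the use of numpy.linalg.solve are our own choices, not from the slides:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x via the normal equations, Eq. (8)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = x.size                          # N = n + 1 data points
    A = np.array([[N,        x.sum()],
                  [x.sum(), (x**2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    a0, a1 = np.linalg.solve(A, b)      # matrix form, Eq. (11)
    return a0, a1
```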



Quantification of Error of Linear Regression

■ The residual in linear regression represents the vertical distance between a data point and the straight line, as shown in Figure 1.
■ A standard deviation for the regression line can be determined as
$$s_{y/x} = \sqrt{\frac{S_r}{(n+1) - 2}} \qquad \dots(12)$$
where $s_{y/x}$ is called the standard error of the estimate.
■ The subscript notation "$y/x$" designates that the error is for a predicted value of $y$ corresponding to a particular value of $x$.
■ The division by the factor of $[(n+1) - 2]$ in Eqn. (12) is to be noted. It is because two data-derived estimates, $a_0$ and $a_1$, were used to compute $S_r$; in the process we lose two degrees of freedom.
■ An alternate justification of the division by $[(n+1) - 2]$ is that there is no such thing as the "spread of data" around a straight line connecting two points. Thus, for the case of only two data points, Eqn. (12) would otherwise yield a meaningless result of infinity.
■ The standard error of the estimate $s_{y/x}$ quantifies the spread of the data around the regression line, in contrast to the original standard deviation, which quantifies the spread around the mean, as shown in Figure 2 (next slide).
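A small sketch of Eq. (12); the helper name is ours:

```python
import numpy as np

def standard_error(x, y, a0, a1):
    """Standard error of the estimate, Eq. (12): spread of data about the fitted line."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    Sr = np.sum((y - (a0 + a1 * x))**2)   # sum of squared residuals, Eq. (5)
    return np.sqrt(Sr / (x.size - 2))     # divide by (n + 1) - 2 degrees of freedom
```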



Figure 2

a) The spread of the data around the mean of the dependent variable.
b) The spread of the data around the best-fit line.
§ The reduction in the spread in going from (a) to (b), as indicated by the bell-shaped curves at the right, represents the improvement due to linear regression.
§ These concepts can be used to quantify the "goodness" of our fit. This is useful for comparison of several regressions, as illustrated in Figure (3).
§ To do this, we determine the total sum of the squares $S_t$ around the mean of the $y$ values of the given data set. This is the magnitude of the residual error associated with the dependent variable prior to regression.
§ After performing the regression, we compute $S_r$, the sum of the squares of the residuals around the regression line.
§ The difference between the two quantities, $S_t - S_r$, quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value.
§ Because the magnitude of this quantity is scale-dependent, the difference is normalized to $S_t$ to yield
$$r^2 = \frac{S_t - S_r}{S_t} \qquad \dots(13)$$
where $r^2$ is called the coefficient of determination and $r$ is the correlation coefficient.
§ For a perfect fit, $S_r = 0$ and $r = r^2 = 1$, signifying that the line explains 100% of the variability of the data.
§ The other extreme is $r = r^2 = 0$ and $S_r = S_t$, when the fit represents no improvement.
§ $r$ can also be computed directly using the relation
$$r = \frac{(n+1) \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{(n+1) \sum x_i^2 - \left(\sum x_i\right)^2}\; \sqrt{(n+1) \sum y_i^2 - \left(\sum y_i\right)^2}} \qquad \dots(14)$$
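A small sketch of Eq. (13); again, the helper name is our own:

```python
import numpy as np

def r_squared(x, y, a0, a1):
    """Coefficient of determination, Eq. (13)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    St = np.sum((y - y.mean())**2)        # spread about the mean
    Sr = np.sum((y - (a0 + a1 * x))**2)   # spread about the regression line
    return (St - Sr) / St                 # r = sqrt(r_squared) for a line fit
```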
Practice Problem-1 (Fitting a Straight Line)

a) Fit a straight line to the x and y values given in the following table.
b) Display the equation of the regression line (fitted line).
c) Plot both the fitted line and the given data points.

[Data table shown on slide.]
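The slide's data table is an image. As a hedged stand-in, the sketch below uses the classic seven-point textbook set (Chapra & Canale), which is consistent with the 86.8 percent figure quoted on the "Continued" slide; treat the numbers as an assumption:

```python
import numpy as np

# Assumed data standing in for the slide's table (an image in the original).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

N = x.size                                  # N = n + 1 points
a1 = (N * (x * y).sum() - x.sum() * y.sum()) / (N * (x**2).sum() - x.sum()**2)  # Eq. (9)
a0 = y.mean() - a1 * x.mean()               # Eq. (10)
print(f"y = {a0:.4f} + {a1:.4f} x")         # ~ y = 0.0714 + 0.8393 x for this data
```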



Practice Problem (Continued)

d) Estimation of errors in the linear fit
■ The standard deviation of the data about the mean is $s_y = \sqrt{S_t / n}$ (for $n+1$ data points).
■ The standard error of the estimate, from Eq. (12), is $s_{y/x} = \sqrt{S_r / (n - 1)}$.
■ Thus, because $s_{y/x} < s_y$, the linear regression model has merit. The extent of the improvement is quantified by the coefficient of determination, Eq. (13).
■ These results indicate that 86.8 percent of the original uncertainty has been explained by the linear model ($r^2 = 0.868$, i.e., correlation coefficient $r = 0.932$).



More on Linear Regression

■ The following statistical assumptions are inherent in the linear least-squares procedure:
i. Each $x$ has a fixed value; it is not random and is known without error.
ii. The $y$ values are independent random variables and all have the same variance.
iii. The $y$ values for a given $x$ must be normally distributed.

■ The following are some characteristics of linear regression:
i. The regression of $y$ versus $x$ is not the same as that of $x$ versus $y$.
ii. Errors of opposite signs do not cancel.
iii. Since the squares of the residuals appear in Eqn. (5), it gives more weight to large errors than to small errors.

■ Limitation: using regression analysis, only those values of $y = f(x)$ can be estimated that lie within the range of the given data set. The regression curve cannot be used to extrapolate the $y$ values.



Weighting of Data: Weighted Linear Regression

■ There are occasions when our confidence in the accuracy of data varies from point to point.
■ For example:
i. the instrument taking the measurements may be more sensitive in a certain range of data;
ii. sometimes the data represent the results of several experiments, each carried out under different conditions.
■ Under these circumstances we may want to assign a confidence factor, or weight $w_i$, to each data point and minimize the sum of the squares of the weighted residuals. Hence, the function to be minimized is
$$S(a_0, a_1, \dots, a_m) = \sum_{i=0}^{n} w_i \left[ y_i - f(x_i) \right]^2$$
■ This procedure forces the fitting function $f(x)$ closer to the data points that have higher weights.
■ If all the data have the same importance, then the weights are all set to 1.
■ If the fitting function is a straight line, $f(x) = a + bx$, setting $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$ and simplifying gives a system of linear equations for $a$ and $b$:
$$a \sum w_i + b \sum w_i x_i = \sum w_i y_i$$
$$a \sum w_i x_i + b \sum w_i x_i^2 = \sum w_i x_i y_i$$
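A hedged sketch of the weighted normal equations above (the helper name is ours, and the weights enter S to the first power, as on this slide):

```python
import numpy as np

def weighted_line_fit(x, y, w):
    """Weighted least squares for f(x) = a + b*x, minimizing S = sum(w_i * r_i**2)."""
    x, y, w = (np.asarray(v, float) for v in (x, y, w))
    A = np.array([[w.sum(),       (w * x).sum()],
                  [(w * x).sum(), (w * x**2).sum()]])
    rhs = np.array([(w * y).sum(), (w * x * y).sum()])
    a, b = np.linalg.solve(A, rhs)
    return a, b
```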



Practice Problem-2 (Weighted Fitting)

Fit the given data using weighted linear regression.

[Data table shown on slide.]



Other Non-Linear Fits

■ Linear regression provides a powerful technique for fitting a best line to data.
■ However, it is predicated on the fact that the relationship between the dependent and independent variables is linear.
■ This is not always the case, and the first step in any regression analysis should be to plot and visually inspect the data to ascertain whether a linear model applies.
■ For example, in the adjacent figure the data are curvilinear: (a) data that are ill-suited for linear least-squares regression; (b) indication that a parabola fit is preferable. In such situations, we can either use polynomial regression or transform the data into a form that is compatible with linear regression.
■ The table below shows other functions that can easily be linearized.
■ After linearization, the coefficients are computed for the linear fit and then transformed back to obtain the expected non-linear fitted curve; a sketch follows below.

[Table: Linearization of Non-Linear Functions.]


Practice Problem-3 (Power Law Fitting)

a) Fit a power equation $y = a x^b$ to the x and y values given in the following table.
b) Display the computed values of the coefficients $a$ and $b$.
c) Reproduce both the curves shown here.

[Data table and plots shown on slide.]

§ A linear regression of the log-transformed data yields the result $\log y = 1.75 \log x - 0.300$.
§ The intercept is $\log a = -0.300$; taking the anti-logarithm yields $a = 10^{-0.300} \approx 0.5$.
§ The slope is $b = 1.75$.
§ So, the required power equation becomes $y = 0.5\, x^{1.75}$.
§ The adjacent figure shows the plots.
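The slide's data table is an image. The sketch below assumes the classic textbook data set whose fit matches the quoted results (log a = -0.300, b = 1.75); the values are an assumption, not taken from the slide:

```python
import numpy as np

# Assumed data consistent with the quoted coefficients.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.5, 1.7, 3.4, 5.7, 8.4])

b, log_a = np.polyfit(np.log10(x), np.log10(y), 1)  # straight line in log-log space
a = 10**log_a
print(f"a = {a:.3f}, b = {b:.3f}")   # ~ a = 0.500, b = 1.752  ->  y = 0.5 * x**1.75
```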



POLYNOMIAL REGRESSION

■ A linear fit is not always a 'good fit' for a given data set.
■ One alternative is to fit polynomials to the data using polynomial regression.
■ Another option is to transform the given data into a form that is compatible with linear regression.
■ The least-squares procedure can be readily extended to fit the data to a higher-order polynomial.
■ For example, suppose that we fit a second-order, or quadratic, polynomial. The normal equations for this can be derived as follows.
■ Let
$$y = a_0 + a_1 x + a_2 x^2 + e \qquad \dots(1)$$
The sum of the squares of the residuals is given by
$$S_r = \sum_{i=0}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2 \qquad \dots(2)$$
■ Differentiating Eqn. (2) w.r.t. each of the unknown coefficients of the polynomial, we get
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2), \quad \frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2), \quad \frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) \qquad \dots(3)$$
§ The above equations can be set equal to zero and rearranged to develop the following set of normal equations:
$$(n+1)\, a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$$
$$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i \qquad \dots(4)$$
§ It is to be noted that the above three equations are linear in the three unknowns $a_0$, $a_1$ and $a_2$.
§ The coefficients of the unknowns can be calculated directly from the observed data.
§ Thus, the problem of determining a least-squares second-order polynomial is equivalent to solving a system of three simultaneous linear equations; a sketch follows below.
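A minimal sketch of Eq. (4) as a 3x3 linear system (the helper is our own, not from the slides):

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares parabola y = a0 + a1*x + a2*x**2 via the normal equations, Eq. (4)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.array([[x.size,        x.sum(),      (x**2).sum()],
                  [x.sum(),      (x**2).sum(),  (x**3).sum()],
                  [(x**2).sum(), (x**3).sum(),  (x**4).sum()]])
    b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
    return np.linalg.solve(A, b)   # a0, a1, a2
```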
Practice Problem-4 (Quadratic Fitting)

a) Fit a quadratic polynomial to the x and y values given in the following table.
b) Display the equation of the regression curve (fitted curve).
c) Plot both the fitted curve and the given data points.

[Data table shown on slide.]

These results indicate that 99.851 percent of the original uncertainty has been explained by the model.
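The slide's table is an image; the sketch below assumes the classic textbook data set whose quadratic fit reproduces the quoted 99.851 percent figure. Treat the data values as an assumption:

```python
import numpy as np

# Assumed data consistent with the quoted r^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 7.7, 13.6, 27.2, 40.9, 61.1])

a2, a1, a0 = np.polyfit(x, y, 2)                 # quadratic least squares
Sr = np.sum((y - np.polyval([a2, a1, a0], x))**2)
St = np.sum((y - y.mean())**2)
print(f"y = {a0:.4f} + {a1:.4f} x + {a2:.4f} x^2,  r^2 = {(St - Sr) / St:.5f}")
```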
Generalizing Polynomial Curve Fitting
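The worked content of this slide is graphical in the original. As a minimal sketch of the generalization, Eq. (4) extends to an (m+1) x (m+1) system for an m-th order polynomial; the formulation below is our own and uses NumPy's least-squares solver, which is numerically safer than forming the normal-equation matrix explicitly:

```python
import numpy as np

def fit_poly(x, y, m):
    """Least-squares polynomial of order m: minimize the sum of squared residuals."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.vander(x, m + 1, increasing=True)        # columns 1, x, x**2, ..., x**m
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)  # same solution as the normal equations
    return coeffs                                   # a0, a1, ..., am
```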



Note on scipy.stats.linregress: in the case where y=None and x is a two-dimensional array with one dimension of length 2,
linregress(x) is equivalent to linregress(x[0], x[1]).
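A short usage sketch of scipy.stats.linregress; the data arrays are assumed, matching the hypothetical Practice Problem-1 set used earlier:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # assumed data
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

res = stats.linregress(x, y)
print(res.intercept, res.slope)   # a0, a1 of the fitted line
print(res.rvalue**2)              # coefficient of determination r^2
```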



Practice Problem-1
a) Fit a straight line to the 𝑥 and 𝑦 values given in the following table.
c) Plot both the fitted line and the given data points.
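A complete sketch of how such a script might look, leading to the console output on the next slide; the data are the assumed Practice Problem-1 values, and matplotlib is used for the plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed data for Practice Problem-1 (the slide's table is an image).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

a1, a0 = np.polyfit(x, y, 1)                 # slope, intercept
print(f"Fitted line: y = {a0:.4f} + {a1:.4f} x")

xf = np.linspace(x.min(), x.max(), 100)
plt.plot(x, y, "o", label="data")
plt.plot(xf, a0 + a1 * xf, "-", label="least-squares line")
plt.xlabel("x"); plt.ylabel("y"); plt.legend()
plt.show()
```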



Console Output



!Note!

