Chapter 8 Regression Analysis - 2009 - A Guide To Microsoft Excel 2007 For Scientists and Engineers
Regression Analysis
In this chapter we seek answers to the question: What equation
fits my experimental data? The general terminology for this type
of activity is regression analysis. The reader may wish to Google to
find how this term came to be used.
Least-Squares Fitting
Gauss is credited with developing the fundamentals of
least-squares analysis in 1795 at the age of 18. One speaks
about the line of best fit. In this instance, we will restrict
ourselves to linear fits. Let the experimental data consist of
pairs of x- and y-values. We write the equation of the line of best
fit as $\hat{y} = mx + b$, where $\hat{y}$ (read as "y hat") is the
predicted value. The vertical displacement between the actual
y-value and the predicted $\hat{y}$ for a given x is called the
residual. The least-squares criterion requires that we adjust the
constants m and b such that the sum of the squares of the
residuals, $\sum_i (y_i - \hat{y}_i)^2$, is as small as possible.
There are formulas for finding these parameters, but we shall let
Excel do the work.
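For readers curious about what Excel does behind the scenes, the following is a minimal Python sketch of the standard closed-form least-squares formulas. The function names and the sample data are hypothetical and are not taken from the text; the sketch only illustrates the criterion described above.

```python
def least_squares_line(xs, ys):
    """Return (m, b) that minimise the sum of squared residuals for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


def sum_squared_residuals(xs, ys, m, b):
    """Sum of (y_i - y_hat_i)^2 for the line y_hat = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying close to y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
m, b = least_squares_line(xs, ys)
print(m, b, sum_squared_residuals(xs, ys, m, b))
```

Any other choice of m and b gives a larger sum of squared residuals for the same data, which is what "best fit" means here.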
Exercise 1: Trendline, SLOPE, and INTERCEPT
Scenario: A physics student is tasked with finding the thermal
coefficient of resistance of a sample. Her experimental results
are shown in Figure 8.1.
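The textbook told her to work with Equation 1, where $R_0$ is the
resistance at 0°C, $R_t$ is the resistance at temperature t°C, and
$\alpha$ is the required coefficient.
$R_t = R_0(1 + \alpha t)$
Of course, this can also be written as:
$R_t = R_0 \alpha t + R_0$
This has the form y = mx + b, with slope $R_0 \alpha$ and intercept $R_0$.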
Figure 8.1
(a) Open a new workbook and on Sheet1 enter the text and data
shown in columns A and B of Figure 8.1.
Now we are ready to add the trendline. We could select the chart
and use Chart Tools / Layout / Trendline / Linear Trendline to
quickly add a trendline. This just adds the trendline; we want
more. The same steps, but ending with More Trendline Options,
will open the required dialog but we shall use the shortcut menu.
(c) Right click a marker on the chart and select Insert Trendline
from the shortcut menu to open the Trendline dialog (see
Figure 8.2).
(d) Clearly, we want a linear trendline, so make that selection.
For this demonstration also check the boxes to give us the
equation of the best fit and the R-squared value. Our data
starts at 5°C but it will be interesting to have the trendline
start at 0°C (then it will hit the y-axis), so in the Backwards
box of the Forecast group, enter a value of 5. Note that the
Trendline Equation box can be dragged around the chart.
R-squared gives a measure of the goodness of the fit. In a
sense, it is a measure of how much of the variability in the
y-values can be accounted for by changes in the x-values.
Here it is 99%; the rest may be attributed to experimental
errors.
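The R-squared value displayed with the trendline can be understood as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of y. The Python sketch below computes it for a straight-line fit; the function name and test data are hypothetical, and this is an illustration of the definition rather than a statement of how Excel implements it.

```python
def r_squared(xs, ys):
    """R-squared for the least-squares line through the points (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    # Variability left unexplained by the line.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total variability of y about its mean.
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data (illustration only): nearly linear, so R-squared is close to 1.
print(r_squared([0.0, 1.0, 2.0, 3.0, 4.0], [1.1, 2.9, 5.2, 6.8, 9.0]))
```

A value of 1 means every point lies on the line; a value near 0 means the line explains almost none of the variation in y.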
Figure 8.2
We see that the trendline values and the function results agree. In
the figure the function values are formatted to show only two
decimal places in keeping with the experimental data, but you can
look at both the trendline equation and the function results to 15
decimals to compare them.
Figure 8.3
(a) Our completed worksheet will look like Figure 8.3. We can
copy much of it from Sheet1. On Sheet1 select A1:F11, copy
it, and paste this to A1 of Sheet2. Use Home / Editing / Clear
(looks like an eraser) / Clear All to remove D5:F7.
If we know the parameters for the equation of the straight line, we
can find the value of y for any x with y = mx + b. We do this in
E7:E9. In F7:F9 we use the FORECAST function to show that if this
is our only task we do not need to find the slope and intercept but
can have Excel do that "behind the scenes."
Exercise 3: The LINEST Function
In this Exercise we use LINEST rather than SLOPE, INTERCEPT,
and RSQ to get the parameters for a linear fit. LINEST is more
flexible and can give more data, as we shall see in this and
subsequent Exercises.
Temporarily ignore E4:F8 and enter all text and values as shown
in Figure 8.4 onto Sheet3. Construct the chart.
Figure 8.4
Exercise 4: Fixed Intercept
Occasionally, you want to get a fit with a fixed intercept. You may,
for example, want an intercept of zero or of some other value. If
you look at Figure 8.2, there is a setting Set Intercept where you
can specify the required intercept value. Getting a zero value with
LINEST is simple; you just enter FALSE for argument three.
Specifying a value such as 5 needs a "workaround." The lines y =
1.5x + 5 and z = 1.5x are parallel. For a given x, the y-value equals
the z-value plus 5. So if we subtract 5 from each y-value, we get the
z line and its intercept is 0. Let's see how we implement that in
Excel.
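The workaround can also be sketched outside Excel. The Python function below is a minimal illustration, assuming the usual least-squares result that a line forced through the origin has slope $\sum x_i z_i / \sum x_i^2$. The function name and the sample data are hypothetical.

```python
def slope_with_fixed_intercept(xs, ys, intercept=0.0):
    """Least-squares slope of a line forced to have the given intercept."""
    # Subtract the required intercept so the shifted line passes through the origin.
    zs = [y - intercept for y in ys]
    # Least-squares slope for a line through the origin.
    return sum(x * z for x, z in zip(xs, zs)) / sum(x * x for x in xs)


# Hypothetical data scattered about y = 1.5x + 5 (illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [6.4, 8.1, 9.4, 11.1]
print(slope_with_fixed_intercept(xs, ys, intercept=5.0))
```

With intercept=0.0 the function plays the role of LINEST with FALSE as its third argument; with intercept=5.0 it mirrors the subtract-then-fit workaround.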
Figure 8.5
(b) Create the chart. Add the trendlines giving the y-line an
intercept of 0 and the z-line an intercept of 5. Delete the
trendline entries in the legend. Edit the second trendline
equation to show z = ... rather than y = ...
Exercise 5: A Polynomial Fit
Looking at Figure 8.2, one can see that Excel can do more than just
linear trendlines. How about LINEST? Can that cope with other
than linear functions? We will look at a polynomial fit.
(a) On Sheet5, enter the text and values shown in rows 1 through
6 of Figure 8.6. If you enter formatted text in two cells
(B6:C6), select the range and drag the fill handle; Excel will
automatically complete the rest of the text.
Figure 8.6
Exercise 6: A Logarithmic Fit (LOGEST)
A simple model for the growth of bacteria predicts that if the
initial population is $N_0$, the population $N_t$ at time t will be
given by the following equation, in which B is the reproduction rate.
$N_t = N_0 \exp(Bt)$
Figure 8.7
B5: =LN(B4)
A9:B9: =LINEST(B5:F5,B3:F3)
C9: =EXP(B9)
A13:B13: =LINEST(LN(B4:F4),B3:F3)
C13: =EXP(B13)
E9:F9: =LOGEST(B4:F4,B3:F3)
G9: =LN(E9)
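The idea behind these formulas is that taking logarithms turns $N_t = N_0 \exp(Bt)$ into the straight line $\ln N_t = Bt + \ln N_0$. The Python sketch below follows the same route as the LINEST formulas: fit a line to ln(N) against t, then exponentiate the intercept to recover $N_0$. The function name and data are hypothetical.

```python
import math


def fit_exponential(ts, ns):
    """Fit N = N0 * exp(B * t) by a least-squares line through (t, ln N).

    Returns (N0, B).
    """
    logs = [math.log(n) for n in ns]
    count = len(ts)
    t_mean = sum(ts) / count
    l_mean = sum(logs) / count
    stt = sum((t - t_mean) ** 2 for t in ts)
    stl = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    slope = stl / stt                    # this is B
    intercept = l_mean - slope * t_mean  # this is ln(N0)
    return math.exp(intercept), slope


# Hypothetical population data generated from N0 = 100, B = 0.5 (illustration only).
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ns = [100.0 * math.exp(0.5 * t) for t in ts]
n0, rate = fit_exponential(ts, ns)
print(n0, rate)
```

LOGEST reports the same fit in the form y = b·m^x, so its m corresponds to exp(B), which is why G9 takes LN(E9).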
The LOGEST function (which can also return fitting statistics) fits
The TREND and GROWTH Functions
The parameters in the LINEST and LOGEST arrays can, of course,
be used to find the values in the trendlines or to interpolate or
extrapolate. However, this is more readily done with TREND (for
LINEST fits) and GROWTH (for LOGEST fits). The syntaxes for
these functions are:
TREND(known_y's, known_x's, new_x's, const)
GROWTH(known_y's, known_x's, new_x's, const)
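To make the roles of the arguments concrete, here is a minimal Python sketch of what the two functions compute for a single x variable. The helper names trend and growth mirror the Excel names but are hypothetical, the const argument is omitted (the intercept is always fitted), and the data are invented for illustration.

```python
import math


def _fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


def trend(known_ys, known_xs, new_xs):
    """Values of the fitted straight line at new_xs."""
    m, b = _fit_line(known_xs, known_ys)
    return [m * x + b for x in new_xs]


def growth(known_ys, known_xs, new_xs):
    """Values of the fitted exponential curve at new_xs (fit done on ln y)."""
    m, b = _fit_line(known_xs, [math.log(y) for y in known_ys])
    return [math.exp(m * x + b) for x in new_xs]


# Hypothetical data (illustration only).
print(trend([1.0, 3.0, 5.0], [0.0, 1.0, 2.0], [3.0, 4.0]))
print(growth([2.0, 4.0, 8.0], [0.0, 1.0, 2.0], [3.0]))
```

Note that, as in Excel, the y-values come first in the argument list.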
Figure 8.8
Figure 8.8 shows the use of TREND to compute the fitted values in
the problem posed in Exercise 6 and the use of GROWTH for the
problem in Exercise 7. The formulas, which refer to different
worksheets, are:
Figure 8.9
Exercise 7: Slope and Tangent
In this Exercise we see how to compute the slope of a polynomial
and how to display a tangent line on a chart. Suppose we find the
slope m at a point $(x_0, y_0)$; then the tangent is the line that
obeys $y_0 = m x_0 + b$. Hence, $b = y_0 - m x_0$ and we can find a
second point on the tangent using $y = m(x - x_0) + y_0$.
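The arithmetic for the tangent can be checked with a short Python sketch. The quadratic $y = x^2$ and its slope 2x are used purely as a hypothetical example; the function names are not from the text.

```python
def tangent_intercept(x0, y0, m):
    """Intercept b of the tangent y = m*x + b through the point (x0, y0)."""
    return y0 - m * x0


def point_on_tangent(x0, y0, m, x):
    """y-value on the tangent at x, using y = m*(x - x0) + y0."""
    return m * (x - x0) + y0


# Hypothetical curve y = x**2, whose slope at x is 2*x (illustration only).
x0 = 3.0
y0 = x0 ** 2
m = 2.0 * x0
b = tangent_intercept(x0, y0, m)
y1 = point_on_tangent(x0, y0, m, 4.0)
print(b, y1)
```

Both forms describe the same line: m*x + b evaluated at the second x gives the same value as m*(x - x0) + y0.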
Figure 8.10
Figure 8.11
(d) Cells E10 and F10 hold our $x_0$, $y_0$ data pair; this is the point on
the curve where we want the tangent. Cells E11 and F11 hold
the second point on the tangent. In G10:H10 we compute the
slope and intercept values of the tangent line.
E10: =INDEX($A$4:$A$13,$G$3)
F10: =INDEX($B$4:$B$13,$G$3)
G10: =INDEX($C$4:$C$13,$G$3)
H10: =F10-E10*G10
E11: =INDEX($A$4:$A$13,$G$3+1)
F11: =G10*E11+H10
Exercise 8: The Analysis Toolpak
Excel has a feature called the Analysis Toolpak, which has a
variety of tools that enable the user to generate results without
using formulas and formatting. In this exercise we will see the use
of the Regression Tool by repeating the problem set out in
Exercise 3 for comparison purposes.
(b) Use the command Data / Analysis / Data Analysis and from the
resulting dialog select Regression, which opens the dialog
shown in Figure 8.12.
(c) The x range is B3:B20, and the y range is A3:A20. Ensure you
have checked the Labels box. A suitable output range for our
purposes is E5, but you will note that you could output to a
new worksheet or workbook. Check the box Line Fit Plots to
generate a chart. Click the OK button.
If you compare the results in F21 and F22, shown in Figure 8.13,
you will see that the slope and intercept are the same as were
generated with LINEST in Exercise 3. You will also see that the
statistics are in agreement. None of this is surprising as the Tool
uses the LINEST function.
Figure 8.12
Figure 8.13
There are two major drawbacks to using this Tool. The user has no
control over the positioning of the various resulting values and,
like all Data Analysis Tools, the results are static. This means that
if you make a change in the input data you must remember to
rerun the Tool.
At the time of writing, the Regression Tool has a small bug in that
it produces a Column chart when an XY chart is required. The
user should right click and change the chart type.
Problems
1. *What function best fits the data¹ in the following table?
t (sec) X (percent)
200 0.18
400 0.29
600 0.42
800 0.51
(iii) Clearly, 5 results from the fact we started with 5g. How
are A and B related to each other and to the data in the
experiment?
(iv) Do a mathematical analysis of the experiment to explain
the exponential fit.
5. Fit the data below² to the equation $N = aP^k$ by: (i) making a
plot and adding a power trendline; (ii) plotting Ln(N) against
Ln(P) and adding a linear trendline; and (iii) using LINEST.
Ensure you understand the relationship between the various
fitting parameters. Note that you can plot P vs. N and give
both axes a logarithmic scale to get a straight line, but this
does not help with regression analysis.
2 W. L. Friend and A. B. Metzner, American Institute
of Chemical Engineering Journal 4, 393, 1958.
P N P N P N
0.46 24.80 10.00 84.50 55.00 195.00
0.53 26.50 17.70 115.00 58.50 193.00
0.63 28.50 18.60 115.00 70.30 189.00
0.74 30.00 25.30 150.00 93.00 245.00
3.00 58.40 31.60 127.00 95.00 245.00
4.20 60.30 32.00 140.00 185.00 315.00
5.00 70.70 37.00 165.00 340.00 380.00
5.60 69.00 41.00 170.00 590.00 480.00
A plot of ln P against 1/T, where T is measured in Kelvin and
P in torr, will give a straight line with a slope $-\Delta H_v/R$, where R
has the value 8.3145 J mol⁻¹ K⁻¹. From the following data³
find $\Delta H_v$ for water.
3 H. F. Stimson, Journal of Research of the National
Bureau of Standards, 73A, 493, 1969.
8. Make a plot of the following data and add two trendlines, one
quadratic and the other cubic. Format the cubic trendline as
a dotted line. Hint: Remember the Selection group in Chart
Tools / Format.
15 seconds and add 40 (one said 37) you get a good estimate
of the temperature in °F. Does this data agree with these
comments?
T (°F) 46 49 51 52 54 56 57 58 59 60
chirps 40 50 55 63 72 70 77 73 90 93
T (°F) 61 62 63 64 66 67 68 71 71 72
chirps 96 88 99 110 113 120 127 137 137 132
11. The table that follows shows the results of an enzyme kinetics
experiment. The quantity V is the velocity of the reaction,
while [S] is the concentration of the substrate S. Ideally, this
data should be fitted to the Michaelis-Menten equation to find
K. Traditionally, biochemists linearize the M-M equation to
give the Lineweaver-Burk equation and then plot 1/V
against 1/[S]. What value of K is obtained using a trendline
and using LINEST? We revisit this problem in Chapter 12 and
use Solver to make a direct fit.