Chapter 8 B - Trendlines and Regression Analysis
Figure 8.11
The R² value never decreases as the order of the
polynomial increases; that is, a 4th-order polynomial
will fit the data at least as well as a 3rd-order
polynomial, and so on.
Higher order polynomials will generally not be very
smooth and will be difficult to interpret visually.
◦ Thus, we don't recommend going beyond a third-order
polynomial when fitting data.
Use your eye to make a good judgment!
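The point about R² and polynomial order can be checked numerically. The sketch below is only an illustration: the data are made up (they are not the data behind Figure 8.11), and NumPy's polyfit stands in for the Excel trendline tool.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up illustrative data: a gentle curve plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 1.5 * x - 0.1 * x**2 + rng.normal(0.0, 1.0, size=x.size)

# Fit polynomials of order 1 through 5 and record R^2 for each.
r2_by_order = {}
for order in range(1, 6):
    coeffs = np.polyfit(x, y, order)
    r2_by_order[order] = r_squared(y, np.polyval(coeffs, x))

for order, r2 in r2_by_order.items():
    print(f"order {order}: R^2 = {r2:.4f}")
```

Because each higher-order polynomial contains the lower-order ones as special cases, R² cannot go down as the order rises. The gain beyond the second or third order is usually small, which is why a higher R² alone is not a reason to prefer a higher order.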
Regression analysis is a tool for building
mathematical and statistical models that
characterize relationships between a dependent
(ratio) variable and one or more independent, or
explanatory, variables (ratio or categorical), all of
which are numerical.
Simple linear regression involves a single
independent variable.
Multiple regression involves two or more
independent variables.
Simple linear regression finds a linear relationship between:
- one independent variable X and
- one dependent variable Y
First prepare a scatter plot to verify the data has a
linear trend.
Use alternative approaches if the data is not linear.
Size of a house is
typically related to its
market value.
X = square footage
Y = market value ($)
The scatter plot of the full
data set (42 homes)
indicates a linear trend.
Market value = a + b × square feet
Two possible lines are shown below.
Excel functions:
◦ =INTERCEPT(known_y’s, known_x’s)
◦ =SLOPE(known_y’s, known_x’s)
Slope = b1 = 35.036
=SLOPE(C4:C45, B4:B45)
Intercept = b0 = 32,673
=INTERCEPT(C4:C45, B4:B45)
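For readers who want to see what SLOPE and INTERCEPT compute, here is a minimal sketch of the least-squares formulas. The function name slope_intercept and the four data points are hypothetical; they are not the 42-home data set, so the results differ from the values above.

```python
def slope_intercept(xs, ys):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical example data (square feet, market value in dollars).
sqft = [1500, 1700, 1900, 2100]
value = [85000, 92000, 99000, 106000]

b1, b0 = slope_intercept(sqft, value)
print(f"slope b1 = {b1:.3f}, intercept b0 = {b0:.0f}")
```

The example points lie exactly on a line, so the fit recovers that line; with real data the same formulas give the line that minimizes the sum of squared errors.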
Estimate Y when X = 1750 square feet
Ŷ = 32,673 + 35.036(1750) = $93,986
=TREND(C4:C45, B4:B45, 1750)
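The same estimate can be reproduced outside Excel. This is a small sketch that uses the slope and intercept reported above; the constant and function names are hypothetical.

```python
# Coefficients as reported above (rounded values from the Excel output).
B0 = 32673.0   # intercept
B1 = 35.036    # slope, dollars per square foot

def predict_market_value(square_feet):
    """Estimated market value from the fitted line Y-hat = b0 + b1 * X."""
    return B0 + B1 * square_feet

estimate = predict_market_value(1750)
print(f"Estimated market value: ${estimate:,.0f}")
```

Because the coefficients are rounded, the result may differ slightly from what TREND returns when it works from the full-precision fit.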
Data > Data Analysis >
Regression
Input Y Range (with
header)
Input X Range (with
header)
Check Labels
Click the Predict button
in the Data Mining group
and choose Multiple
Linear Regression.
Enter the range of the
data (including headers)
Move the appropriate
variables to the boxes on
the right.
Select the output options and
check the Summary report box.
Before clicking Finish, click on
the Best subsets button.
Select the best subsets option:
View results from the “Output Navigator” links.
Regression output (all variables)
If you click “Choose Subset,” XLMiner will create a new worksheet with the results for this model.
Typically choose the model with the highest adjusted R².
Models with a minimum value of Cp, or with Cp less
than or at least close to k + 1 (where k is the number
of independent variables in the subset), are good
models to consider.
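The two selection criteria can be written out as short functions. This sketch uses the standard textbook definitions of adjusted R² and Mallows' Cp; it is an assumption that XLMiner computes them the same way, and the function and argument names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def mallows_cp(sse_subset, mse_full, n, k):
    """Mallows' Cp for a subset model with k independent variables.

    sse_subset: sum of squared errors of the subset model
    mse_full:   mean squared error of the model with all candidate variables
    """
    return sse_subset / mse_full - n + 2 * (k + 1)

# Made-up numbers, for illustration only.
adj = adjusted_r2(r2=0.9, n=11, k=2)
cp = mallows_cp(sse_subset=160.0, mse_full=10.0, n=20, k=3)
print(f"adjusted R^2 = {adj:.3f}, Cp = {cp:.1f}, k + 1 = {3 + 1}")
```

Under this definition the model with all candidate variables always has Cp equal to its own k + 1, so the comparison with k + 1 is informative mainly for the smaller subsets.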
RSS is the residual sum of squares, or the sum of
squared deviations between the predicted and actual
values of the dependent variable.
Probability is a quasi-hypothesis test that a given subset
is acceptable; if this is less than 0.05, you can rule out
that subset.
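To make the best-subsets idea concrete, the sketch below enumerates every subset of a small set of candidate variables, fits each by least squares, and ranks the subsets by adjusted R². It is not XLMiner's implementation: the data are synthetic, and the names fit_rss and best_subsets are hypothetical.

```python
from itertools import combinations
import numpy as np

def fit_rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def best_subsets(X, y, names):
    """Fit every non-empty subset of columns; sort by adjusted R^2 (highest first)."""
    n = len(y)
    tss = float(np.sum((y - y.mean()) ** 2))
    results = []
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            rss = fit_rss(X[:, list(cols)], y)
            r2 = 1.0 - rss / tss
            adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            results.append(([names[c] for c in cols], rss, adj_r2))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Synthetic data: y depends on x1 and x2; x3 is unrelated noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.5, size=60)

results = best_subsets(X, y, ["x1", "x2", "x3"])
for variables, rss, adj_r2 in results:
    print(f"{', '.join(variables):12s} RSS = {rss:8.2f}  adj R^2 = {adj_r2:.4f}")
```

With three candidate variables there are only seven subsets, so full enumeration is cheap; the count doubles with each added variable, which is why tools offer options to limit the search.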