
Chapter 8 Multiple Regression

This document discusses multiple regression analysis. It begins by explaining the general form of a multiple regression equation with two independent variables. It then discusses estimating equations with k independent variables. It also covers evaluating a regression model, including examining the standard error of estimate, assumptions, ANOVA table, and testing individual variables and the overall model. An example illustrates developing and interpreting a regression model to estimate family food expenditures based on income, family size, and whether children are in college.

Uploaded by creation portal
Copyright © All Rights Reserved
8.1 Multiple Regression Analysis
8.2 Multiple Standard Error of Estimate
8.3 Multiple Regression and Correlation Assumptions
8.4 The ANOVA Table
8.5 Evaluating the Regression Equation
8.6 Analysis of Residuals

School of Economics and Management, Beijing University of Aero/Astronautics



8.1 Multiple Regression Analysis

• For two independent variables, the general form of the multiple regression equation is:
$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$
• X1 and X2 are the independent variables.
• a is the Y-intercept.
• b1 is the net change in Y for each unit change in X1, holding X2 constant. It is called a partial regression coefficient, or just a regression coefficient. b2 is interpreted in the same way, holding X1 constant.


Regression Plane for a Two-Independent-Variable Linear Regression Equation
[Figure: regression plane]


Multiple Regression Analysis

• The general multiple regression equation with k independent variables is:
$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
• The least squares criterion is used to develop this equation: a, b1, ..., bk are chosen to minimize the sum of squared residuals, Σ(Y - Ŷ)².
• The coefficients b1, b2, ..., bk can be determined by software such as Excel.
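The least squares computation can be sketched in a few lines. This minimal pure-Python version assumes no NumPy and small teaching data only (`solve` and `fit_ols` are hypothetical names). It builds a design matrix with a leading column of ones for the intercept and solves the normal equations (X'X)b = X'Y by Gaussian elimination. Excel and statistical packages use more numerically careful methods, so treat this as an illustration only.

```python
# Minimal least-squares sketch: solve the normal equations (X'X) b = X'Y.
# Assumption: plain Python lists (no NumPy); suitable for small teaching data only.


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    # Augmented copy [A | b] so the caller's lists are not modified.
    m = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(columns, y):
    """Return [a, b1, ..., bk] minimising the sum of squared residuals.

    columns: k lists (one per independent variable), each with n values.
    y: n values of the dependent variable.
    """
    # Each design-matrix row is [1, x1, ..., xk]; the leading 1 yields the intercept a.
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Family size and yearly food expenditure from Example 1.
    size = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
    food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
    a, b = fit_ols([size], food)
    print(f"Food = {a:.1f} + {b:.1f} Size")
```

When fitted to family size alone, it prints Food = 339.7 + 1031.0 Size, matching the one-variable output in Example 1.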


8.2 Multiple Standard Error of Estimate

• The multiple standard error of estimate is a measure of the effectiveness of the regression equation.
• It is measured in the same units as the dependent variable.
• It is difficult to determine what is a large value and what is a small value of the standard error.
• The formula is:
$$s_{y \cdot 12 \cdots k} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}}$$
where n is the number of observations and k is the number of independent variables.
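The formula can be sketched directly: square the residuals, sum them, divide by n - k - 1, and take the square root. The function name `std_error_of_estimate` is hypothetical. The usage below plugs in fitted values from the rounded one-variable equation reported in Example 1. It gives about 557.7, close to the Std Error in that output.

```python
import math


def std_error_of_estimate(y, y_hat, k):
    """Multiple standard error of estimate s_{y.12...k}.

    y: observed values; y_hat: fitted values; k: number of independent variables.
    The residual sum of squares is divided by n - k - 1 degrees of freedom.
    """
    n = len(y)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))


if __name__ == "__main__":
    size = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
    food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
    # Fitted values from the rounded one-variable equation Food = 339.7 + 1031.0 Size.
    fitted = [339.7 + 1031.0 * s for s in size]
    print(round(std_error_of_estimate(food, fitted, k=1), 1))
```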

8.3 Multiple Regression and Correlation Assumptions
• The independent variables and the dependent variable have a linear relationship.
• The dependent variable must be continuous and at least interval-scale.
• The variation in the residuals (Y - Ŷ) must be the same for all values of Ŷ. When this is the case, we say the residuals exhibit homoscedasticity.
• The residuals are normally distributed with a mean of 0.
• Successive values of the dependent variable must be uncorrelated.


8.4 The ANOVA Table


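The ANOVA table reports the variation in the dependent variable. The total variation is divided into two components:
• The explained variation is the part accounted for by the independent variables (the regression sum of squares, SSR).
• The unexplained, or random, variation is the part not accounted for by the independent variables (the error sum of squares, SSE).
For a least-squares fit, the total sum of squares is SS Total = SSR + SSE.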
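The decomposition can be sketched as follows. The helper `anova_table` is hypothetical and uses plain Python. It computes the total, explained, and unexplained sums of squares, the mean squares, and F. The identity SST = SSR + SSE holds only when the fitted values come from a least-squares fit that includes an intercept. Applied to the one-variable family-size model from Example 1, it should agree with that model's ANOVA table: SSR about 10,275,977, SSE about 3,110,690, and F about 33.03.

```python
def anova_table(y, y_hat, k):
    """Split the total variation in y into explained (SSR) and unexplained (SSE) parts.

    y_hat must come from a least-squares fit with an intercept; only then does
    SST = SSR + SSE hold exactly.
    """
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained by the X's
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (random)
    msr, mse = ssr / k, sse / (n - k - 1)                  # mean squares
    return {"SSR": ssr, "SSE": sse, "SST": sst, "MSR": msr, "MSE": mse, "F": msr / mse}


if __name__ == "__main__":
    size = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
    food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
    # Least-squares line for the one-variable model, from the closed-form formulas.
    n = len(size)
    x_bar, y_bar = sum(size) / n, sum(food) / n
    sxx = sum((x - x_bar) ** 2 for x in size)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, food))
    b = sxy / sxx
    a = y_bar - b * x_bar
    table = anova_table(food, [a + b * x for x in size], k=1)
    for name, value in table.items():
        print(f"{name:>3}: {value:,.2f}")
```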


8.5 Evaluating the Regression Equation

Correlation Matrix
• A correlation matrix shows all possible simple correlation coefficients among the variables.
• The matrix is useful for locating correlated independent variables.
• It shows how strongly each independent variable is correlated with the dependent variable.
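A correlation matrix is the simple correlation coefficient computed for every pair of variables. The sketch below uses the Example 1 data; the names `pearson` and `correlation_matrix` are hypothetical. For Food and Size it gives r of about 0.876, as in the matrix reported for the example.

```python
import math


def pearson(x, y):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict of name -> values. Returns {(name1, name2): r} for every pair."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}


if __name__ == "__main__":
    data = {
        "Food": [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800],
        "Income": [376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, 563],
        "Size": [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5],
        "Student": [0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0],
    }
    r = correlation_matrix(data)
    names = list(data)
    # Print the lower triangle, the layout used in Example 1.
    for i, row in enumerate(names[1:], start=1):
        cells = "  ".join(f"{r[(row, col)]:6.3f}" for col in names[:i])
        print(f"{row:<8}{cells}")
```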


Global Test
• The global test is used to investigate whether any of the k independent variables has a significant (nonzero) coefficient. The hypotheses are:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \qquad H_1: \text{not all } \beta_i \text{ equal } 0$$
• The test statistic follows the F distribution with k and (n - k - 1) degrees of freedom, where n is the sample size:
$$F = \frac{SSR/k}{SSE/(n-k-1)} \sim F(k,\ n-k-1) \quad \text{under } H_0$$
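The test statistic itself is simple arithmetic. A minimal sketch with hypothetical function names follows. The critical value still has to come from an F table for (k, n - k - 1) degrees of freedom, so `reject_h0` takes it as an input rather than computing it.

```python
def global_f(ssr, sse, n, k):
    """Global test statistic F = (SSR / k) / (SSE / (n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))


def reject_h0(f_stat, f_critical):
    """Reject H0 (all beta_i = 0) when F exceeds the critical value.

    f_critical must be looked up in an F table for (k, n - k - 1) degrees of
    freedom at the chosen significance level; it is not computed here.
    """
    return f_stat > f_critical


if __name__ == "__main__":
    # SSR and SSE of the one-variable model reported in Example 1 (n = 12, k = 1).
    f = global_f(10275977, 3110690, n=12, k=1)
    print(round(f, 2))
```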

Test for Individual Variables


• This test is used to determine which independent variables have nonzero regression coefficients.
• Variables whose coefficients are not significantly different from zero are usually dropped from the analysis.
• The test statistic follows the t distribution with (n - k - 1) degrees of freedom:
$$t = \frac{b_i}{s_{b_i}} \sim t(n-k-1) \quad \text{under } H_0: \beta_i = 0$$
where s_bi is the standard error of the regression coefficient bi.
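For a single independent variable, the standard error of the slope has a closed form, s_b = s / sqrt(Σ(X - X̄)²), which makes the t statistic easy to sketch. This sketch covers only k = 1, and the function name is hypothetical. With several independent variables the standard errors are the square roots of the diagonal of s²(X'X)⁻¹, which regression software reports. On the family-size data it gives b ≈ 1031.0, s_b ≈ 179.4 and t ≈ 5.75, matching the output in Example 1.

```python
import math


def slope_t_test(x, y):
    """Return (b, s_b, t) for the slope of a one-variable least-squares line.

    With k = 1 the t statistic has n - k - 1 = n - 2 degrees of freedom.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of estimate
    s_b = s / math.sqrt(sxx)      # standard error of the slope b
    return b, s_b, b / s_b


if __name__ == "__main__":
    size = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
    food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
    b, s_b, t = slope_t_test(size, food)
    print(f"b = {b:.1f}, s_b = {s_b:.1f}, t = {t:.2f}")
```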


Stepwise Regression

Stepwise regression builds the equation one independent variable at a time, adding (and, in some versions, removing) variables according to whether their regression coefficients are significant.
The advantages of the stepwise method are:
1. Only independent variables with significant regression coefficients are entered into the equation.
2. The steps involved in building the regression equation are clear.
3. It is efficient in finding the regression equation with only significant regression coefficients.
4. The changes in the multiple standard error of estimate and the coefficient of determination are shown.
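The slide lists the advantages of stepwise regression but not its mechanics. The sketch below is a simplified version of one common variant, forward selection, in which variables are only added, never removed. At each step it enters the candidate whose coefficient has the largest |t| given the variables already in the equation. It stops when no candidate's |t| exceeds `t_crit`. Using a single fixed threshold is an assumption made for brevity. Statistical packages typically use p-values or F-to-enter/F-to-remove rules and recompute degrees of freedom at each step. All function names and the threshold of 2.0 are hypothetical.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_t_values(columns, y):
    """Least-squares fit with intercept; returns t = b_i / s_bi for every coefficient."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coefs = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2 for r, yi in zip(rows, y))
    mse = sse / (n - p)  # p = k + 1, so this is SSE / (n - k - 1)
    return [coefs[i] / math.sqrt(mse * xtx_inv[i][i]) for i in range(p)]


def forward_select(candidates, y, t_crit):
    """Forward-selection variant of stepwise regression (variables are never removed).

    candidates: dict of name -> values. Enter the variable with the largest |t|
    given those already chosen; stop when no remaining |t| exceeds t_crit.
    """
    chosen = []
    while True:
        best_name, best_t = None, t_crit
        for name, values in candidates.items():
            if name in chosen:
                continue
            t_new = abs(ols_t_values([candidates[c] for c in chosen] + [values], y)[-1])
            if t_new > best_t:
                best_name, best_t = name, t_new
        if best_name is None:
            return chosen
        chosen.append(best_name)


if __name__ == "__main__":
    food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
    data = {
        "Income": [376, 515, 516, 468, 538, 626, 543, 437, 608, 513, 493, 563],
        "Size": [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5],
        "Student": [0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0],
    }
    print(forward_select(data, food, t_crit=2.0))  # hypothetical threshold
```

With the Example 1 data, family size enters first because it has the largest |t| (equivalently, the strongest correlation with food expenditure) when each variable is tried on its own.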
Example 1
A market researcher for Dollar Supermarket is studying the yearly amount that families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Y): total family income (X1, in hundreds of dollars), size of family (X2), and whether the family has children in college (X3).

Note the following regarding the regression equation.
• The variable student (X3) is called a dummy variable. It can take only one of two possible outcomes: either the family has a child who is a college student, or it does not.
• We usually code one value of the dummy variable as "1" and the other as "0."
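Coding a two-category variable as 1/0 can be sketched as below. The raw answers "yes"/"no" and the helper name `dummy` are hypothetical. In Example 1 the Student column is already coded, with 1 meaning the family has a child in college.

```python
def dummy(values, positive):
    """Code a two-category variable: 1 where the value equals `positive`, else 0."""
    categories = set(values)
    if len(categories) > 2:
        raise ValueError(f"expected two categories, got {len(categories)}")
    return [1 if v == positive else 0 for v in values]


if __name__ == "__main__":
    # Hypothetical raw answers to "Is a child in college?" for four families.
    answers = ["no", "yes", "no", "yes"]
    print(dummy(answers, positive="yes"))
```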

Example 1 continued

Family  Food ($/year)  Income ($100s)  Size  Student
1       3900           376             4     0
2       5300           515             5     1
3       4300           516             4     0
4       4900           468             5     0
5       6400           538             6     1
6       7300           626             7     1
7       4900           543             5     0
8       5300           437             4     0
9       6100           608             5     1
10      6400           513             6     1
11      7400           493             6     1
12      5800           563             5     0


Example 1 continued

Using Excel to develop the regression analysis, the regression equation is:
$$\hat{Y} = 954 + 1.09 X_1 + 748 X_2 + 565 X_3$$
What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as X1 = 500)?


Example 1 continued
From the regression output we note:
• The coefficient of determination is 80.4%. This means that more than 80% of the variation in the amount spent on food is accounted for by the variables income, family size, and student.
• Each additional $100 of income per year (one unit of X1) increases the estimated amount spent on food by $1.09 per year, or about $109 for each additional $10,000 of income.
• An additional family member increases the estimated amount spent per year on food by $748.
• A family with a college student is estimated to spend $565 more per year on food than one without a college student.
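Because each coefficient is a partial effect, changing one variable while holding the others fixed changes the estimate by that coefficient times the size of the change. The sketch below checks this with the rounded coefficients from the Excel output; the helper name `predict` is hypothetical. It also shows the change in the estimate for a one-unit ($100) and a 100-unit ($10,000) increase in income.

```python
# Coefficients of the fitted equation from the Excel output (rounded as reported).
A, B_INCOME, B_SIZE, B_STUDENT = 954, 1.09, 748, 565


def predict(income_100s, size, student):
    """Estimated yearly food expenditure; income is in hundreds of dollars."""
    return A + B_INCOME * income_100s + B_SIZE * size + B_STUDENT * student


if __name__ == "__main__":
    base = predict(500, 4, 0)
    # Change one variable at a time, holding the others constant.
    print("+$100 income:    ", round(predict(501, 4, 0) - base, 2))
    print("+$10,000 income: ", round(predict(600, 4, 0) - base, 2))
    print("+1 family member:", round(predict(500, 5, 0) - base, 2))
    print("college student: ", round(predict(500, 4, 1) - base, 2))
```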

Example 1 continued
The correlation matrix is as follows:

          Food    Income  Size
Income    0.587
Size      0.876   0.609
Student   0.773   0.491   0.743

The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food (r = 0.876).
None of the correlations among the independent variables should cause problems: all are between -0.80 and 0.80.
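The -0.80 to 0.80 rule of thumb can be applied mechanically to the correlations among the independent variables. The sketch below uses a hypothetical helper name, and its values are copied from the matrix above.

```python
# Correlations among the independent variables, copied from the matrix above.
PAIRS = {
    ("Income", "Size"): 0.609,
    ("Income", "Student"): 0.491,
    ("Size", "Student"): 0.743,
}


def flag_collinear(pairs, limit=0.80):
    """Return the pairs whose correlation falls outside the range -limit to +limit."""
    return [pair for pair, r in pairs.items() if abs(r) > limit]


if __name__ == "__main__":
    print(flag_collinear(PAIRS))
```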

Example 1 continued

The estimated food expenditure for a family of 4 with an income of 500 (that is, $50,000) and no college student is $4,491:
$$\hat{Y} = 954 + 1.09(500) + 748(4) + 565(0) = 4491$$
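Term by term, the income term contributes 1.09 × 500 = 545 and the family-size term contributes 748 × 4 = 2,992:

$$
\begin{aligned}
\hat{Y} &= 954 + 1.09(500) + 748(4) + 565(0) \\
        &= 954 + 545 + 2992 + 0 \\
        &= 4491
\end{aligned}
$$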


Example 1 continued
Conduct a global test of hypothesis to determine whether any of the regression coefficients is not zero.
$$H_0: \beta_1 = \beta_2 = \beta_3 = 0 \qquad H_1: \text{at least one } \beta_i \neq 0$$
• At the 5% level of significance, with k = 3 and n - k - 1 = 12 - 3 - 1 = 8 degrees of freedom, H0 is rejected if F > 4.07.
• From the output, the computed value of F is 10.94.
• Decision: H0 is rejected. Not all the regression coefficients are zero.
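Since R² = SSR/SST, the ratio SSR/SSE equals R²/(1 - R²), so the global F statistic can also be written in terms of R². In the sketch below, the function name is hypothetical. Plugging in R² = 0.804, n = 12 and k = 3 gives F ≈ 10.94, consistent with the computed value above.

```python
def f_from_r2(r2, n, k):
    """Global F statistic in terms of R^2: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


F_CRITICAL = 4.07  # F(3, 8) at the 5% level, as used on this slide

if __name__ == "__main__":
    f = f_from_r2(0.804, n=12, k=3)
    print(round(f, 2), "reject H0" if f > F_CRITICAL else "do not reject H0")
```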


Example 1 continued

• Conduct an individual test to determine which coefficients are not zero. These are the hypotheses for the independent variable family size:
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$
• Using the 5% level of significance, reject H0 if the p-value < 0.05.
• From the output, family size is the only variable with a p-value below 0.05, so it is the only significant variable. The other variables can be omitted from the model.



Example 1 continued

• We rerun the analysis using only the significant independent variable, family size.
• The new regression equation is:
$$\hat{Y} = 340 + 1031 X_2$$
• The coefficient of determination is 76.8%. We dropped two independent variables, and R-square fell by only 3.6 percentage points (from 80.4% to 76.8%).



Example 1 continued

Regression Analysis: Food versus Size

The regression equation is
Food = 340 + 1031 Size

Predictor   Coef     SE Coef   T-value   P-value
Constant     339.7    940.7     0.36      0.726
Size        1031.0    179.4     5.75      0.000

Std Error = 557.7   R-Sq = 0.768   R-Sq(adjusted) = 0.744

Analysis of Variance

Source           DF   SS         MS         F       P
Regression        1   10275977   10275977   33.03   0.000
Residual Error   10    3110690     311069
Total            11   13386667
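The summary lines of this output follow directly from the ANOVA table. The sketch below uses a hypothetical helper name and recovers R-Sq ≈ 0.768, R-Sq(adjusted) ≈ 0.744, Std Error ≈ 557.7 and F ≈ 33.03 from the sums of squares. It uses adjusted R² = 1 - MSE / (SS Total / (n - 1)).

```python
import math


def summary_from_anova(ss_regression, ss_residual, n, k):
    """Recover R-Sq, adjusted R-Sq, the standard error and F from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    mse = ss_residual / (n - k - 1)
    return {
        "R-Sq": ss_regression / ss_total,
        "R-Sq(adj)": 1 - mse / (ss_total / (n - 1)),
        "Std Error": math.sqrt(mse),
        "F": (ss_regression / k) / mse,
    }


if __name__ == "__main__":
    for name, value in summary_from_anova(10275977, 3110690, n=12, k=1).items():
        print(f"{name}: {value:.3f}")
```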

8.6 Analysis of Residuals


A residual is the difference between the actual value of Y and the predicted value Ŷ.

Residuals should be approximately normally distributed. Histograms are useful in checking this requirement.

A plot of the residuals against their corresponding Ŷ values is used to check that there are no trends or patterns in the residuals.
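Both residual checks can be sketched without plotting libraries. The code below fits the family-size model from Example 1, prints the (fitted value, residual) pairs that a residual plot would show, and draws a text histogram. The helper names are hypothetical. The bin start of -800 and width of 400 are arbitrary choices, not the bins used in the slide's histogram.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def histogram(values, start, width):
    """Return (bin lower edge, count) for bins [start + i*width, start + (i+1)*width)."""
    counts = {}
    for v in values:
        i = math.floor((v - start) / width)
        counts[i] = counts.get(i, 0) + 1
    return [(start + i * width, counts.get(i, 0)) for i in range(min(counts), max(counts) + 1)]


if __name__ == "__main__":
    size = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]
    food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
    a, b = fit_line(size, food)
    fitted = [a + b * x for x in size]
    residuals = [y - f for y, f in zip(food, fitted)]
    # Points for a residual plot: (fitted value, residual).
    for f, e in zip(fitted, residuals):
        print(f"{f:8.1f} {e:8.1f}")
    # Text histogram with arbitrary bins of width 400 starting at -800.
    for lo, count in histogram(residuals, start=-800, width=400):
        print(f"{lo:6.0f} to {lo + 400:6.0f} | {'*' * count}")
```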


Residual Plot
[Figure: residuals plotted against the fitted values Ŷ]


Histogram of Residuals
[Figure: histogram of the residuals (vertical axis: Frequency; horizontal axis: Residuals)]