Statistics for
Business and Economics (13e)
Anderson, Sweeney, Williams, Camm, Cochran
© 2017 Cengage Learning

Slides by John Loucks
St. Edward's University

© 2017 Cengage Learning. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or
otherwise on a password-protected website or school-approved learning management system for classroom use.

Chapter 15
Multiple Regression
• Multiple Regression Model
• Least Squares Method
• Multiple Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Categorical Independent Variables


Multiple Regression

• In this chapter we continue our study of regression analysis by considering situations involving two or more independent variables.
• This subject area, called multiple regression analysis, enables us to consider more factors and thus obtain better estimates than are possible with simple linear regression.

Multiple Regression Model


• Multiple Regression Model
The equation that describes how the dependent variable y is related to
the independent variables x1, x2, . . . , xp and an error term is:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$

where:
$\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters, and
$\varepsilon$ is a random variable called the error term


Multiple Regression Equation


• Multiple Regression Equation
The equation that describes how the mean value of y is related to x1, x2, . . . , xp is:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$


Estimated Multiple Regression Equation


• Estimated Multiple Regression Equation

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_p$
that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.


Estimation Process

• Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$
  Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
  Unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$
• Sample Data: observed values of x1, x2, . . . , xp and y
• Estimated Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$
  Sample statistics are $b_0, b_1, b_2, \ldots, b_p$
• $b_0, b_1, b_2, \ldots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$


Least Squares Method


• Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

• Computation of Coefficient Values

The formulas for the regression coefficients b0, b1, b2, . . . bp involve the
use of matrix algebra. We will rely on computer software packages to
perform the calculations.
The emphasis will be on how to interpret the computer output rather
than on how to make the multiple regression computations.


Multiple Regression Model


• Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A
suggestion was made that regression analysis could be used to determine if
salary was related to the years of experience and the score on the firm’s
programmer aptitude test.
The years of experience, score on the aptitude test, and corresponding annual
salary ($1000s) for the 20 programmers are shown on the next slide.


Multiple Regression Model


Exper.  Test   Salary    |  Exper.  Test   Salary
(Yrs.)  Score  ($1000s)  |  (Yrs.)  Score  ($1000s)
   4      78    24.0     |     9      88    38.0
   7     100    43.0     |     2      73    26.6
   1      86    23.7     |    10      75    36.2
   5      82    34.3     |     5      81    31.6
   8      86    35.8     |     6      74    29.0
  10      84    38.0     |     8      87    34.0
   0      75    22.2     |     4      79    30.1
   1      80    23.1     |     6      94    33.9
   6      83    30.0     |     3      70    28.2
   6      91    33.0     |     3      89    30.0


Multiple Regression Model


Suppose we believe that salary (y) is related to the years of experience
(x1) and the score on the programmer aptitude test (x2) by the following
regression model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where
y = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test


Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$

• Input Data (x1, x2, y): 4 78 24; 7 100 43; . . . ; 3 89 30
• Least Squares: computer package for solving multiple regression problems
• Output: $b_0$, $b_1$, $b_2$, $R^2$, etc.


Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$


• Regression Equation Output

Predictor Coef SE Coef T p


Constant 3.17394 6.15607 0.5156 0.61279
Experience 1.4039 0.19857 7.0702 1.9E-06
Test Score 0.25089 0.07735 3.2433 0.00478


Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

(Note: Predicted salary will be in thousands of dollars.)
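As a rough illustration of what the software package does behind the scenes, the sketch below fits the estimated regression equation to the 20 programmers from the salary survey by solving the normal equations (X'X)b = X'y with plain Gaussian elimination. The names DATA, solve, and least_squares are illustrative choices and do not come from the slides; a real statistics package would likely use a numerically more robust method.

```python
# (years of experience, aptitude test score, salary in $1000s) for the 20 programmers
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, rhs):
    """Solve the square system a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows):
    """Return [b0, b1, ..., bp] for rows of the form (x1, ..., xp, y)."""
    xs = [[1.0, *row[:-1]] for row in rows]  # leading 1.0 is the intercept column
    ys = [row[-1] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = least_squares(DATA)
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE)")
```

With the data entered as on the slides, the printed equation should agree with the coefficients in the regression output to rounding.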


Interpreting the Coefficients


• In multiple regression analysis, we interpret each regression coefficient as follows:
bi represents an estimate of the change in y corresponding to a one-unit increase in xi when all other independent variables are held constant.


Interpreting the Coefficients


b1 = 1.404

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).


Interpreting the Coefficients


b2 = 0.251

Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).
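The "held constant" reading of b1 and b2 can be checked numerically. The sketch below uses the rounded coefficients from the estimated regression equation; the baseline programmer (5 years of experience, test score 80) and the function name predicted_salary are hypothetical choices made only for illustration.

```python
def predicted_salary(exper, score):
    """Predicted annual salary in $1000s from the estimated regression equation."""
    return 3.174 + 1.404 * exper + 0.251 * score


base = predicted_salary(exper=5, score=80)
one_more_year = predicted_salary(exper=6, score=80)    # test score held constant
one_more_point = predicted_salary(exper=5, score=81)   # experience held constant

year_effect = round((one_more_year - base) * 1000)     # change in dollars
point_effect = round((one_more_point - base) * 1000)   # change in dollars
print(year_effect, point_effect)
```

Because the equation is linear, the two differences do not depend on the baseline chosen: any other starting values of experience and score give the same changes.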


Multiple Coefficient of Determination


• Relationship Among SST, SSR, SSE

SST = SSR + SSE

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error


Multiple Coefficient of Determination


• ANOVA Output

Analysis of Variance
SOURCE DF SS MS F P
Regression 2 500.3285 250.164 42.76 0.000
Residual Error 17 99.45697 5.850
Total 19 599.7855


Multiple Coefficient of Determination

$R^2 = SSR/SST$

$R^2 = 500.3285/599.7855 = .83418$


Adjusted Multiple Coefficient of Determination


• Adding independent variables, even ones that are not statistically significant,
causes the prediction errors to become smaller, thus reducing the sum of
squares due to error, SSE.
• Because SSR = SST – SSE, when SSE becomes smaller, SSR becomes larger,
causing R2 = SSR/SST to increase.
• The adjusted multiple coefficient of determination compensates for the number
of independent variables in the model.


Adjusted Multiple Coefficient of Determination

$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$

$R_a^2 = 1 - (1 - .834179)\,\frac{20 - 1}{20 - 2 - 1} = .814671$
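The arithmetic above is easy to reproduce. The following sketch computes the multiple coefficient of determination from the ANOVA sums of squares and then applies the adjustment for n = 20 observations and p = 2 independent variables; the function name adjusted_r2 is an illustrative choice.

```python
def adjusted_r2(r2, n, p):
    """Adjusted multiple coefficient of determination for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


ssr, sst = 500.3285, 599.7855          # from the ANOVA output
r2 = ssr / sst
r2_adj = adjusted_r2(r2, n=20, p=2)
print(f"R2 = {r2:.5f}, adjusted R2 = {r2_adj:.6f}")
```

The adjusted value is always a little below the unadjusted one when p is at least 1 and the fit is not perfect, since (n - 1)/(n - p - 1) exceeds 1.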


Assumptions About the Error Term $\varepsilon$

• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.


Testing for Significance


• In simple linear regression, the F and t tests provide the same conclusion.
• In multiple regression, the F and t tests have different purposes.


Testing for Significance: F Test


• The F test is used to determine whether a significant relationship exists
between the dependent variable and the set of all the independent variables.
• The F test is referred to as the test for overall significance.


Testing for Significance: t Test


• If the F test shows an overall significance, the t test is used to determine
whether each of the individual independent variables is significant.
• A separate t test is conducted for each of the independent variables in the
model.
• We refer to each of these t tests as a test for individual significance.


Testing for Significance: F Test


Hypotheses: $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_a$: One or more of the parameters is not equal to zero

Test Statistic: $F = MSR/MSE$

Rejection Rule: Reject $H_0$ if p-value < $\alpha$ or if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.


F Test for Overall Significance


Hypotheses: $H_0: \beta_1 = \beta_2 = 0$
$H_a$: One or both of the parameters is not equal to zero.

Rejection Rule: For $\alpha = .05$ and d.f. = 2, 17; $F_{.05} = 3.59$
Reject $H_0$ if p-value < .05 or F > 3.59


F Test for Overall Significance


• ANOVA Output

Analysis of Variance
SOURCE DF SS MS F P
Regression 2 500.3285 250.164 42.76 0.000
Residual Error 17 99.45697 5.850
Total 19 599.7855

The p-value in the P column (0.000) is used to test for overall significance.


F Test for Overall Significance


Test Statistic: F = MSR/MSE = 250.16/5.85 = 42.76

Conclusion: p-value < .05, so we can reject H0. (Also, F = 42.76 > 3.59)
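The F test above can be retraced from the ANOVA sums of squares. The sketch below recomputes MSR, MSE, and F, and also evaluates the p-value with a closed-form upper-tail probability that holds only when the numerator has exactly 2 degrees of freedom, as it does here. That shortcut and the name f_upper_tail_2df are assumptions of this sketch, not something the slides use; in general a statistics package or an F table is needed.

```python
ssr, sse = 500.3285, 99.45697   # sums of squares from the ANOVA output
n, p = 20, 2                    # 20 programmers, 2 independent variables

msr = ssr / p                   # mean square due to regression
mse = sse / (n - p - 1)         # mean square due to error
f_stat = msr / mse


def f_upper_tail_2df(f, denom_df):
    """P(F > f) for an F distribution with 2 numerator d.f. (closed form for this case only)."""
    return (1 + 2 * f / denom_df) ** (-denom_df / 2)


p_value = f_upper_tail_2df(f_stat, n - p - 1)
reject_h0 = p_value < 0.05
print(f"F = {f_stat:.2f}, p-value = {p_value:.1e}, reject H0: {reject_h0}")
```

As a sanity check on the shortcut, evaluating it at the critical value 3.59 with 17 denominator d.f. gives an upper-tail probability very close to .05.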


Testing for Significance: t Test


Hypotheses: $H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

Test Statistic: $t = \frac{b_i}{s_{b_i}}$

Rejection Rule: Reject $H_0$ if p-value < $\alpha$ or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with n - p - 1 degrees of freedom.


t Test for Significance of Individual Parameters


Hypotheses: $H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

Rejection Rule: For $\alpha = .05$ and d.f. = 17, $t_{.025} = 2.11$
Reject $H_0$ if p-value < .05, or if t < -2.11 or t > 2.11


t Test for Significance of Individual Parameters


• Regression Equation Output

Predictor Coef SE Coef T p


Constant 3.17394 6.15607 0.5156 0.61279
Experience 1.4039 0.19857 7.0702 1.9E-06
Test Score 0.25089 0.07735 3.2433 0.00478

The t statistic (3.2433) and p-value (0.00478) in the “Test Score” row are used to test for the individual significance of “Test Score”.


t Test for Significance of Individual Parameters


Test Statistics: $t = \frac{b_1}{s_{b_1}} = \frac{1.4039}{.1986} = 7.07$

$t = \frac{b_2}{s_{b_2}} = \frac{.25089}{.07735} = 3.24$

Conclusions: Reject both $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$.
Both independent variables are significant.
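The two t tests can be retraced from the regression output in a few lines. The sketch below divides each coefficient by its standard error and compares the result with the critical value 2.11 from the rejection rule; the dictionary layout and variable names are illustrative only.

```python
T_CRITICAL = 2.11  # t.025 with 17 d.f., as given in the rejection rule

# (coefficient, standard error) pairs copied from the regression output
output = {
    "Experience": (1.4039, 0.19857),
    "Test Score": (0.25089, 0.07735),
}

t_stats = {name: coef / se for name, (coef, se) in output.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, reject H0: {significant[name]}")
```

Comparing |t| with the critical value is equivalent to the two-sided rule "t < -2.11 or t > 2.11" stated on the earlier slide.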


Testing for Significance: Multicollinearity


• The term multicollinearity refers to the correlation among the independent variables.
• When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
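To see how the |r| > .7 rule of thumb might be applied, the sketch below computes the sample correlation between the two independent variables of the programmer salary example (years of experience and test score). The helper name correlation is an illustrative choice; the data are the 20 observations from the earlier slide.

```python
from math import sqrt

exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91, 88, 73, 75, 81, 74, 87, 79, 94, 70, 89]


def correlation(u, v):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)


r = correlation(exper, score)
highly_correlated = abs(r) > 0.7   # rule of thumb from the slide
print(f"r = {r:.3f}, highly correlated: {highly_correlated}")
```

For these data r works out to roughly 0.34, well below the .7 guideline, so multicollinearity does not appear to be a concern in this example.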


Testing for Significance: Multicollinearity


• If the estimated regression equation is to be used only for predictive purposes,
multicollinearity is usually not a serious problem.
• Every attempt should be made to avoid including independent variables that
are highly correlated.


Using the Estimated Regression Equation for Estimation and Prediction

• The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression.
• We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.
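A minimal sketch of the substitution step, written for any number of independent variables. The function name point_estimate and the example input (4 years of experience, test score 78) are assumptions made for illustration; the coefficients are the rounded values from the programmer salary equation.

```python
def point_estimate(coefs, xs):
    """Return y-hat = b0 + b1*x1 + ... + bp*xp for coefficients [b0, b1, ..., bp]."""
    if len(coefs) != len(xs) + 1:
        raise ValueError("need one coefficient per x value plus the intercept")
    b0, *slopes = coefs
    return b0 + sum(b * x for b, x in zip(slopes, xs))


coefs = [3.174, 1.404, 0.251]            # rounded values from the estimated regression equation
y_hat = point_estimate(coefs, [4, 78])   # hypothetical input: 4 years of experience, score 78
print(f"{y_hat:.3f}")                    # predicted salary in $1000s
```

The same number serves as the point estimate of the mean salary for all programmers with these values and as the point prediction for one individual programmer; only the interval estimates differ.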


Using the Estimated Regression Equation for Estimation and Prediction

• The formulas required to develop interval estimates for the mean value of y and for an individual value of y are beyond the scope of the textbook.
• Software packages for multiple regression will often provide these interval estimates.


Categorical Independent Variables


• In many situations we must work with categorical independent variables such
as gender (male, female), method of payment (cash, check, credit card), etc.
• For example, x2 might represent gender where x2 = 0 indicates male and x2 = 1
indicates female.
• In this case, x2 is called a dummy or indicator variable.


Categorical Independent Variables


• Example: Programmer Salary Survey
As an extension of the problem involving the computer programmer salary
survey, suppose that management also believes that the annual salary is related
to whether the individual has a graduate degree in computer science or
information systems.
The years of experience, the score on the programmer aptitude test, whether
the individual has a relevant graduate degree, and the annual salary ($1000s) for
each of the sampled 20 programmers are shown on the next slide.


Categorical Independent Variables


Exper.  Test   Grad.  Salary    |  Exper.  Test   Grad.  Salary
(Yrs.)  Score  Degr.  ($1000s)  |  (Yrs.)  Score  Degr.  ($1000s)
   4      78    No     24.0     |     9      88    Yes    38.0
   7     100    Yes    43.0     |     2      73    No     26.6
   1      86    No     23.7     |    10      75    Yes    36.2
   5      82    Yes    34.3     |     5      81    No     31.6
   8      86    Yes    35.8     |     6      74    No     29.0
  10      84    Yes    38.0     |     8      87    Yes    34.0
   0      75    No     22.2     |     4      79    No     30.1
   1      80    No     23.1     |     6      94    Yes    33.9
   6      83    No     30.0     |     3      70    No     28.2
   6      91    Yes    33.0     |     3      89    No     30.0


Categorical Independent Variables


• Regression Equation Output

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

where:
$\hat{y}$ = annual salary (in thousands of dollars)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree
     1 if individual does have a graduate degree
(x3 is a dummy variable)


Categorical Independent Variables


• ANOVA Output

Analysis of Variance
SOURCE DF SS MS F P
Regression 3 507.8960 169.299 29.48 0.000
Residual Error 16 91.8895 5.743
Total 19 599.7855

$R^2 = 507.896/599.7855 = .8468$ (Previously, $R^2 = .8342$)

$R_a^2 = 1 - (1 - .8468)\,\frac{20 - 1}{20 - 3 - 1} = .8181$ (Previously, adjusted $R^2 = .815$)
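The remaining entries of the ANOVA table follow from the two sums of squares, the sample size, and the number of independent variables. The sketch below derives them for the three-variable model (SSR = 507.8960, SSE = 91.8895, n = 20, p = 3); the function name anova_summary and the dictionary keys are illustrative.

```python
def anova_summary(ssr, sse, n, p):
    """Derive the mean squares, F statistic, R2 and adjusted R2 from SSR and SSE."""
    sst = ssr + sse
    msr = ssr / p                 # regression mean square, p d.f.
    mse = sse / (n - p - 1)       # error mean square, n - p - 1 d.f.
    r2 = ssr / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return {"SST": sst, "MSR": msr, "MSE": mse, "F": msr / mse, "R2": r2, "R2_adj": r2_adj}


three_var = anova_summary(ssr=507.8960, sse=91.8895, n=20, p=3)
for key, value in three_var.items():
    print(f"{key:7s}{value:10.4f}")
```

The regression mean square comes out as 507.8960/3, about 169.30, which is the value consistent with the reported F of 29.48 and MSE of 5.743.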


Categorical Independent Variables


• Regression Equation Output

Predictor Coef SE Coef T p


Constant 7.945 7.382 1.076 0.298
Experience 1.148 0.298 3.856 0.001
Test Score 0.197 0.090 2.191 0.044
Grad. Degr. 2.280 1.987 1.148 0.268

Grad. Degr. is not significant (p-value = 0.268 > .05)
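For readers who want to see where such a table comes from, the sketch below codes the graduate degree as a 0/1 dummy variable, fits the three-variable model by solving the normal equations, and derives each standard error from the matching diagonal entry of (X'X)^-1 scaled by MSE. The helper solve and all variable names are illustrative; this is a plain-Python approximation of what a regression package reports, not the package's actual algorithm.

```python
from math import sqrt

# (experience, test score, graduate degree, salary in $1000s)
DATA = [
    (4, 78, "No", 24.0), (7, 100, "Yes", 43.0), (1, 86, "No", 23.7),
    (5, 82, "Yes", 34.3), (8, 86, "Yes", 35.8), (10, 84, "Yes", 38.0),
    (0, 75, "No", 22.2), (1, 80, "No", 23.1), (6, 83, "No", 30.0),
    (6, 91, "Yes", 33.0), (9, 88, "Yes", 38.0), (2, 73, "No", 26.6),
    (10, 75, "Yes", 36.2), (5, 81, "No", 31.6), (6, 74, "No", 29.0),
    (8, 87, "Yes", 34.0), (4, 79, "No", 30.1), (6, 94, "Yes", 33.9),
    (3, 70, "No", 28.2), (3, 89, "No", 30.0),
]


def solve(a, rhs):
    """Solve the square system a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix: intercept, experience, test score, and the 0/1 dummy for the degree
X = [[1.0, exper, score, 1.0 if degree == "Yes" else 0.0] for exper, score, degree, _ in DATA]
y = [row[-1] for row in DATA]
n, k = len(X), len(X[0])            # k = p + 1 coefficients

xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
b = solve(xtx, xty)

sse = sum((yi - sum(bj * xj for bj, xj in zip(b, x))) ** 2 for x, yi in zip(X, y))
mse = sse / (n - k)                 # n - p - 1 degrees of freedom

# j-th diagonal entry of (X'X)^-1, found by solving against the j-th unit vector
inv_diag = [solve(xtx, [float(i == j) for i in range(k)])[j] for j in range(k)]
se = [sqrt(mse * d) for d in inv_diag]
t_stats = [bj / sj for bj, sj in zip(b, se)]

for name, bj, sj, tj in zip(["Constant", "Experience", "Test Score", "Grad. Degr."], b, se, t_stats):
    print(f"{name:12s}{bj:8.3f}{sj:8.3f}{tj:8.3f}")
```

The t statistic for the dummy variable should land near 1.15, inside the range where the null hypothesis of a zero coefficient would not be rejected at the .05 level.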


More Complex Categorical Variables


• If a categorical variable has k levels, k - 1 dummy variables are required, with
each dummy variable being coded as 0 or 1.
• For example, a variable with levels A, B, and C could be represented by x1 and
x2 values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.
• Care must be taken in defining and interpreting the dummy variables.


More Complex Categorical Variables


• For example, a variable indicating level of education could be represented by x1
and x2 values as follows:

Highest Degree   x1   x2
Bachelor’s        0    0
Master’s          1    0
Ph.D.             0    1
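The k - 1 coding scheme can be written as a small helper. In the sketch below the first level in the list is treated as the baseline and receives all zeros; the function name dummy_code and the plain-apostrophe spellings of the degree names are choices made for this illustration.

```python
def dummy_code(level, levels):
    """Return k - 1 indicator values; the first entry of `levels` is the all-zero baseline."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return tuple(1 if level == other else 0 for other in levels[1:])


DEGREES = ["Bachelor's", "Master's", "Ph.D."]   # k = 3 levels, so 2 dummy variables

for degree in DEGREES:
    x1, x2 = dummy_code(degree, DEGREES)
    print(f"{degree:12s} x1 = {x1}  x2 = {x2}")
```

With this coding, the coefficient of each dummy variable is read as the expected difference from the baseline level (here Bachelor's) with the other independent variables held constant, which is why the choice of baseline matters when interpreting the output.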
