
Multiple Linear Regression-I


Multiple Linear Regression: Analysis & Diagnosis

Multiple Regression

• Multiple Regression Model
• Least Squares Method
• Multiple Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Categorical Independent Variables
• Residual Analysis
• Logistic Regression

Multiple Regression

• In this chapter we continue our study of regression analysis by considering situations involving two or more independent variables.
• This subject area, called multiple regression analysis, enables us to consider more factors and thus obtain better estimates than are possible with simple linear regression.

Multiple Regression Model

• Multiple Regression Model

The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, \dots, x_p$ and an error term is:

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$

where:
    $\beta_0, \beta_1, \beta_2, \dots, \beta_p$ are the parameters, and
    $\varepsilon$ is a random variable called the error term

Multiple Regression Equation

• Multiple Regression Equation

The equation that describes how the mean value of $y$ is related to $x_1, x_2, \dots, x_p$ is:

    $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

Estimated Multiple Regression Equation

• Estimated Multiple Regression Equation

    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$

A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \dots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_p$.

Estimation Process

• Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$
  Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
  Unknown parameters are $\beta_0, \beta_1, \beta_2, \dots, \beta_p$.
• Sample Data: observed values of $x_1, x_2, \dots, x_p$ and $y$.
• Estimated Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$
  Sample statistics are $b_0, b_1, b_2, \dots, b_p$.
• $b_0, b_1, b_2, \dots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \dots, \beta_p$.

Least Squares Method

• Least Squares Criterion

    $\min \sum (y_i - \hat{y}_i)^2$

• Computation of Coefficient Values

The formulas for the regression coefficients $b_0, b_1, b_2, \dots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
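For reference, the matrix algebra behind the least squares criterion can be sketched as follows. The notation is an assumption, not taken from the slides: $X$ is the $n \times (p+1)$ matrix whose first column is all ones and whose remaining columns hold the observed $x_1, \dots, x_p$; $y$ is the vector of observed responses; $b$ is the vector $(b_0, b_1, \dots, b_p)$; and $X^\top X$ is assumed to be invertible.

```latex
\min_{b} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  = \min_{b}\,(y - Xb)^\top (y - Xb)
\quad\Longrightarrow\quad
X^\top X\, b = X^\top y
\quad\Longrightarrow\quad
b = (X^\top X)^{-1} X^\top y
```

Setting the gradient of the sum of squares with respect to $b$ to zero gives the middle system (the normal equations); software packages typically solve it with a numerically stable factorization rather than an explicit inverse.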

Least Squares Method

• Computation of Coefficient Values

The emphasis will be on how to interpret the computer output rather than on how to make the multiple regression computations.

Multiple Regression Model

• Example: Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's Programmer Aptitude Test.

The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

Multiple Regression Model

Exper. (Yrs.)   Test Score   Salary ($000s)
      4             78           24.0
      7            100           43.0
      1             86           23.7
      5             82           34.3
      8             86           35.8
     10             84           38.0
      0             75           22.2
      1             80           23.1
      6             83           30.0
      6             91           33.0
      9             88           38.0
      2             73           26.6
     10             75           36.2
      5             81           31.6
      6             74           29.0
      8             87           34.0
      4             79           30.1
      6             94           33.9
      3             70           28.2
      3             89           30.0

Multiple Regression Model

Suppose we believe that salary ($y$) is related to the years of experience ($x_1$) and the score on the programmer aptitude test ($x_2$) by the following regression model:

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where
    $y$ = annual salary ($000)
    $x_1$ = years of experience
    $x_2$ = score on programmer aptitude test

Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$

• Regression Equation Output

Predictor     Coef      SE Coef   T        p
Constant      3.17394   6.15607   0.5156   0.61279
Experience    1.4039    0.19857   7.0702   1.9E-06
Test Score    0.25089   0.07735   3.2433   0.00478

Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

Note: Predicted salary will be in thousands of dollars.
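As a check on the output above, the coefficients can be recomputed from the 20 observations. The following is a minimal sketch that uses only the Python standard library and solves the normal equations by Gaussian elimination; the function names `solve_linear_system` and `fit_least_squares` are illustrative and are not from the slides. In practice a statistics package would normally do this step.

```python
# Programmer salary data from the slides:
# (experience in years, aptitude test score, salary in $1000s).
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(predictor_rows, y):
    """Return [b0, b1, ..., bp] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in predictor_rows]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


predictors = [(exper, score) for exper, score, _ in DATA]
salaries = [salary for _, _, salary in DATA]
b0, b1, b2 = fit_least_squares(predictors, salaries)
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE)")
```

The printed coefficients should agree with the Coef column of the output table to the precision shown there.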

Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows:

$b_i$ represents an estimate of the change in $y$ corresponding to a 1-unit increase in $x_i$ when all other independent variables are held constant.

Interpreting the Coefficients

    $b_1 = 1.404$

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).
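The "held constant" reading can be illustrated with the estimated equation SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE). The sketch below uses the rounded coefficients from the slides; the function name `predict_salary` and the example programmer (4 years of experience, test score 78) are chosen only for illustration.

```python
# Rounded coefficients from the estimated regression equation in the slides.
B0, B1, B2 = 3.174, 1.404, 0.251


def predict_salary(experience_years, test_score):
    """Predicted annual salary in $1000s."""
    return B0 + B1 * experience_years + B2 * test_score


base = predict_salary(4, 78)

# One more year of experience, test score held constant.
effect_of_year = predict_salary(5, 78) - base

# One more test point, experience held constant.
effect_of_point = predict_salary(4, 79) - base

print(f"baseline prediction: {base:.3f} ($1000s)")
print(f"one more year:       +{effect_of_year:.3f} ($1000s)")
print(f"one more test point: +{effect_of_point:.3f} ($1000s)")
```

Because the model is linear, these differences equal the coefficients themselves regardless of the baseline chosen.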

Interpreting the Coefficients

    $b_2 = 0.251$

Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).

Multiple Coefficient of Determination

• Relationship Among SST, SSR, SSE

    SST = SSR + SSE

    $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

where:
    SST = total sum of squares
    SSR = sum of squares due to regression
    SSE = sum of squares due to error

Multiple Coefficient of Determination

• ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        2   500.3285   250.164   42.76   0.000
Residual Error   17   99.45697   5.850
Total            19   599.7855

SSR is the sum of squares in the Regression row; SST is the sum of squares in the Total row.

Multiple Coefficient of Determination

    $R^2 = \mathrm{SSR}/\mathrm{SST}$

    $R^2 = 500.3285/599.7855 = .83418$
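The arithmetic can be reproduced directly from the ANOVA table. This short sketch only restates the slide's numbers; the variable names are illustrative.

```python
# Sums of squares copied from the ANOVA output in the slides.
ssr = 500.3285   # Regression row
sse = 99.45697   # Residual Error row
sst = 599.7855   # Total row

# SST = SSR + SSE, up to rounding in the printed table.
decomposition_gap = abs((ssr + sse) - sst)

r_squared = ssr / sst
print(f"R^2 = {r_squared:.5f}")
print(f"|SSR + SSE - SST| = {decomposition_gap:.5f}")
```

Read this way, about 83.4% of the variability in salary is accounted for by the estimated regression equation with experience and test score as predictors.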

Adjusted Multiple Coefficient of Determination

• Adding independent variables, even ones that are not statistically significant, causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.
• Because SSR = SST – SSE, when SSE becomes smaller, SSR becomes larger, causing $R^2 = \mathrm{SSR}/\mathrm{SST}$ to increase.
• The adjusted multiple coefficient of determination compensates for the number of independent variables in the model.

Adjusted Multiple Coefficient of Determination

    $R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$

    $R_a^2 = 1 - (1 - .834179)\,\dfrac{20 - 1}{20 - 2 - 1} = .814671$
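The adjustment formula is easy to wrap in a small helper. This is a sketch with an illustrative function name, `adjusted_r_squared`, applied to the slide's values (n = 20 observations, p = 2 independent variables).

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the adjustment to be defined")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


r2_adj = adjusted_r_squared(0.834179, n=20, p=2)
print(f"adjusted R^2 = {r2_adj:.6f}")

# With R^2 held fixed, more predictors lower the adjusted value.
r2_adj_more_predictors = adjusted_r_squared(0.834179, n=20, p=5)
```

The second call shows the penalty at work: the same $R^2$ obtained with five predictors instead of two yields a smaller adjusted value.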

Assumptions About the Error Term $\varepsilon$

• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the $y$ value and the expected value of $y$ given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$.

Testing for Significance

• In simple linear regression, the F and t tests provide the same conclusion.
• In multiple regression, the F and t tests have different purposes.

Testing for Significance: F Test

• The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.
• The F test is referred to as the test for overall significance (or overall fit).

Testing for Significance: t Test

• If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.
• A separate t test is conducted for each of the independent variables in the model.
• We refer to each of these t tests as a test for individual significance.

Testing for Significance: F Test

Hypotheses
    $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
    $H_a$: One or more of the parameters is not equal to zero.

Test Statistic
    $F = \mathrm{MSR}/\mathrm{MSE}$

Rejection Rule
    Reject $H_0$ if p-value < $\alpha$ or if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator.

F Test for Overall Significance

Hypotheses
    $H_0: \beta_1 = \beta_2 = 0$
    $H_a$: One or both of the parameters is not equal to zero.

Rejection Rule
    For $\alpha = .05$ and d.f. = 2, 17: $F_{.05} = 3.59$
    Reject $H_0$ if p-value < .05 or $F > 3.59$

F Test for Overall Significance

• ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        2   500.3285   250.164   42.76   0.000
Residual Error   17   99.45697   5.850
Total            19   599.7855

The value in the P column is the p-value used to test for overall significance.

F Test for Overall Significance

Test Statistic
    $F = \mathrm{MSR}/\mathrm{MSE} = 250.16/5.85 = 42.76$

Conclusion
    p-value < .05, so we can reject $H_0$.
    (Also, $F = 42.76 > 3.59$)
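The test statistic can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table. The sketch below uses the critical value 3.59 quoted in the slides instead of computing it; variable names are illustrative.

```python
# Values copied from the ANOVA output in the slides.
ssr = 500.3285    # sum of squares due to regression
sse = 99.45697    # sum of squares due to error
n, p = 20, 2      # observations, independent variables

msr = ssr / p                # mean square regression, p d.f.
mse = sse / (n - p - 1)      # mean square error, n - p - 1 d.f.
f_stat = msr / mse

F_CRITICAL = 3.59            # F.05 with (2, 17) d.f., as given in the slides
reject_h0 = f_stat > F_CRITICAL

print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

If SciPy is available, `scipy.stats.f.sf(f_stat, p, n - p - 1)` would give the corresponding p-value; it is left out here so that the sketch depends only on the standard library.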

Testing for Significance: t Test

Hypotheses
    $H_0: \beta_i = 0$
    $H_a: \beta_i \neq 0$

Test Statistic
    $t = \dfrac{b_i}{s_{b_i}}$

Rejection Rule
    Reject $H_0$ if p-value < $\alpha$ or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with $n - p - 1$ degrees of freedom.

t Test for Significance of Individual Parameters

Hypotheses
    $H_0: \beta_i = 0$
    $H_a: \beta_i \neq 0$

Rejection Rule
    For $\alpha = .05$ and d.f. = 17, $t_{.025} = 2.11$
    Reject $H_0$ if p-value < .05, or if $t < -2.11$ or $t > 2.11$

t Test for Significance of Individual Parameters

• Regression Equation Output

Predictor     Coef      SE Coef   T        p
Constant      3.17394   6.15607   0.5156   0.61279
Experience    1.4039    0.19857   7.0702   1.9E-06
Test Score    0.25089   0.07735   3.2433   0.00478

The T and p values in the Experience row are the t statistic and p-value used to test for the individual significance of "Experience".

t Test for Significance of Individual Parameters

Test Statistics
    $\dfrac{b_1}{s_{b_1}} = \dfrac{1.4039}{.1986} = 7.07$

    $\dfrac{b_2}{s_{b_2}} = \dfrac{.25089}{.07735} = 3.24$

Conclusions
    Reject both $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$.
    Both independent variables are significant.
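Each t statistic is just the coefficient divided by its standard error, so the T column can be checked from the Coef and SE Coef columns. The sketch below applies the slide's two-sided rule with the quoted critical value 2.11; the dictionary layout is only for illustration.

```python
T_CRITICAL = 2.11  # t.025 with 17 d.f., as given in the slides

# (coefficient, standard error) copied from the regression output.
estimates = {
    "Experience": (1.4039, 0.19857),
    "Test Score": (0.25089, 0.07735),
}

t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# Two-sided rule: reject H0 when t < -2.11 or t > 2.11.
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "reject H0" if significant[name] else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

Small differences from the table's T column in the third decimal place come from the rounding of the printed coefficients and standard errors.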

Lotteries have become important sources of revenue in some states in India. Many people have criticized lotteries, however, referring to them as a tax on the poor and uneducated. In an examination of the issue, a random sample of 100 adults was asked how much they spend on lottery tickets and was interviewed about various socioeconomic variables. The purpose of this study is to test the following beliefs:
• Relatively uneducated people spend more on lotteries than do relatively educated people.
• Older people buy more lottery tickets than younger people.
• People with more children spend more on lotteries than people with fewer children.
• Relatively poor people spend a greater proportion of their income on lotteries than relatively rich people.

Data: Amount spent on lottery tickets as a percentage of total household income, Number of years of education, Age, Number of children, Personal income (in INR).

Fit a Model and interpret the results.

When one company buys another company, it is not unusual that some workers are terminated. The severance benefits offered to the laid-off workers are often the subject of dispute. Suppose that TESLA recently bought TWEETER (now "X") and subsequently terminated 20 of TWEETER's employees. As part of the buyout agreement, it was promised that the severance packages offered to the Tweeter employees would be equivalent to those offered to Tesla employees who had been terminated in the past year. Thirty-six-year-old Bill Smith, a Tweeter employee for the past 10 years, earning $32,000 per year, was one of those let go. His severance package included an offer of 5 weeks' severance pay. Bill complained that this offer was less than that offered to Tesla's employees when they were laid off, in contravention of the buyout agreement. A data scientist was called in to settle the dispute. The data scientist was told that severance is determined by three factors: age, length of service with the company, and pay. To determine how generous the severance package had been, a random sample of 50 Tesla ex-employees was taken. For each, the following variables were recorded:
• Number of weeks of severance pay
• Age of employee
• Number of years with the company
• Annual pay (in thousands of dollars)

Perform an analysis to determine whether Bill is correct in his assessment of the severance package.
