
Chapter 6

Regression Analysis

Overview
6.1 Introduction
6.2 The Coefficient of Determination
6.3 Simple Linear Regression
6.3.1 Least Squares Method
6.3.2 Hypothesis Testing for Slope using ANOVA
6.4 Multiple Linear Regression
6.4.1 Multiple Linear Regression With Two Predictor Variables

6.4.1.1 Linear Regression Model Using Matrices


6.4.1.2 Fitted values and residuals
6.4.1.3 Analysis of Variance (ANOVA)
6.4.2 Test for significance of regression
6.4.3 Computing the multiple linear regression analysis using Excel
6.4.4 Interpretation of regression statistical output
6.4.5 Model selection

LEARNING OBJECTIVES:

At the end of this chapter, the student should be able to:

 Draw a regression line
 Fit the simple linear regression equation using the least squares method
 Interpret the values of the intercept and slope
 Fit the multiple linear regression equation for two predictor variables
 Draw an ANOVA table
 Test the significance of regression
 Estimate the dependent variable using the regression line
 Use and interpret multiple linear regression analysis using Microsoft Excel
 Summarize the multiple regression analysis from the Excel output
 Choose the most significant variables in order to achieve a parsimonious model

6.1 Introduction
– Regression analysis is a statistical method used to describe the nature of the relationship between variables: positive or negative, linear or nonlinear.
– When the data have a linear relationship (based on a scatter diagram), we can fit a regression line.
– A regression line is a useful way to see the trend of the data and then make predictions on the basis of the data.
– If only two variables are involved, one dependent and one independent, it is called simple linear regression.
– The simple linear regression model has a single independent variable 𝑥 that is related to a response variable 𝑦, and it can be represented by the equation of a straight line.
– If there is more than one independent variable, it becomes multiple linear regression.
– The multiple linear regression model is used to describe linear relationships involving more than two variables.

6.2 The Coefficient of Determination
The coefficient of determination, 𝑅² (also written 𝑟²), measures the proportion of the variation in the dependent variable that is explained by the regression line and the independent variable. In other words, it gives the percentage of the variation in the dependent variable that can be explained by the independent variable. The value of 𝑅² is between 0 and 1 inclusive (0 ≤ 𝑅² ≤ 1).

Formula

The coefficient of determination:

R² = SSR/Syy = 1 − SSE/Syy

where

SSR = Syy − SSE = regression sum of squares
SSE = Syy − β̂₁Sxy = residual sum of squares
Syy = corrected sum of squares of the observations

Since 0 ≤ SSE ≤ Syy, it follows that 0 ≤ R² ≤ 1.

The sums of squares are computed as

Sxx = Σxᵢ² − (Σxᵢ)²/n
Syy = Σyᵢ² − (Σyᵢ)²/n
Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n

with all sums running from i = 1 to n. Note that R² = r², the square of the correlation coefficient between y and x.

Table 6.1 : Interpretation of coefficient of determination value, R²

R² = 0       The dependent variable cannot be predicted from the independent variable
R² = 1       The dependent variable can be predicted without error from the independent variable
0 < R² < 1   R² indicates the extent to which the dependent variable can be predicted from the independent variable

Table 6.2 shows how to interpret the value of the coefficient of determination.
Table 6.2 : Interpretation of coefficient of determination value, R²

R² = 0.1   10% of the variation in y can be explained by x
R² = 0.8   80% of the variation in y can be explained by x
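The formulas above can be illustrated with a short computation. The following Python sketch evaluates the shortcut sums of squares and R²; the data set and the helper names (sums_of_squares, r_squared) are hypothetical, chosen only to show the arithmetic.

```python
# Compute Sxx, Syy, Sxy and the coefficient of determination R^2
# using the shortcut formulas above. The data set is a small
# hypothetical example, not taken from the text.

def sums_of_squares(x, y):
    n = len(x)
    sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
    syy = sum(yi**2 for yi in y) - sum(y)**2 / n
    sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n
    return sxx, syy, sxy

def r_squared(x, y):
    sxx, syy, sxy = sums_of_squares(x, y)
    b1 = sxy / sxx              # estimated slope
    sse = syy - b1 * sxy        # residual sum of squares
    return 1 - sse / syy        # R^2 = 1 - SSE/Syy

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # nearly linear in x
print(round(r_squared(x, y), 4))
```

Because the y values are close to a straight line in x, R² comes out close to 1, matching the interpretation in Table 6.2.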

6.3 Simple Linear Regression Model


The simple linear regression model is defined as

y = β₀ + β₁x + ε

where
x = independent variable or predictor
y = dependent variable or response variable
β₀ = the y-intercept of the line
β₁ = the slope of the line
ε = a random error component, assumed to have mean zero and unknown variance
The values of 𝛽0 and 𝛽1 are parameters for the simple linear regression model which are also
known as regression coefficients. Since the parameters 𝛽0 and 𝛽1 are unknown, they must be
estimated from the sample data. The method of least squares is used to estimate the parameters.
6.3.1 The Least Squares Method
By using the least squares method, we estimate the unknown parameters β₀ and β₁ in order to obtain the best-fitting line for a set of data. The least squares method minimizes the sum of squared differences between the observed data and the regression line. The estimated, or fitted, regression line is given by

ŷ = β̂₀ + β̂₁x

where
x = independent variable
𝑦̂ = estimated dependent variable
𝛽̂0 = estimate of y – intercept, the point at which the line intersects the 𝑦-axis
𝛽̂1 = estimate of slope, the amount of increase/decrease in 𝑦 for each unit increase/decrease in 𝑥.

Given the data (x₁, y₁), (x₂, y₂), ⋯, (xₙ, yₙ), the estimates β̂₁ and β̂₀ of the regression line are computed as

β̂₁ = Sxy / Sxx   and   β̂₀ = ȳ − β̂₁x̄

where Sxy, Sxx, x̄ and ȳ are defined by

Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n

Sxx = Σxᵢ² − (Σxᵢ)²/n

x̄ = (Σxᵢ)/n,   ȳ = (Σyᵢ)/n

with all sums running from i = 1 to n.
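The least squares formulas translate directly into a few lines of code. The sketch below uses hypothetical data and an illustrative function name (fit_line); with data lying exactly on a line, the estimates recover the true intercept and slope.

```python
# Fit a simple linear regression line y-hat = b0 + b1*x using the
# least squares formulas above. Data are hypothetical.

def fit_line(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
    sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n
    b1 = sxy / sxx          # slope estimate, b1 = Sxy / Sxx
    b0 = ybar - b1 * xbar   # intercept estimate, b0 = ybar - b1*xbar
    return b0, b1

x = [2, 4, 6, 8]
y = [5, 9, 13, 17]          # exactly y = 1 + 2x
b0, b1 = fit_line(x, y)
print(b0, b1)               # a perfect fit recovers the true line
```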

6.3.2 Hypothesis testing for slope using ANOVA

Two variables have a linear relationship if the slope of the regression line is not zero, β₁ ≠ 0. The hypothesis test on the regression slope is known as the linearity hypothesis test. The analysis of variance (ANOVA) approach may be used to test the slope of the regression line (or, equivalently, the significance of the regression). The ANOVA table is given in Table 6.3.

Table 6.3 : ANOVA for hypothesis testing of slope of regression line.


Source of variation   Sum of squares        Degrees of freedom   Mean of squares     f
Regression            SSR = β̂₁Sxy           1                    MSR = SSR/1         MSR/MSE
Residual              SSE = Syy − β̂₁Sxy     n − 2                MSE = SSE/(n − 2)
Total                 SST = Syy             n − 1

The procedure for hypothesis testing of the slope is as follows:


Step 1 : State the hypothesis
𝐻0 : 𝛽1 = 0
𝐻1 : 𝛽1 ≠ 0

Step 2 : Complete the ANOVA table


𝑀𝑆𝑅
𝑓𝑡𝑒𝑠𝑡 =
𝑀𝑆𝐸

Step 3 : Critical value

𝑓𝛼,1,𝑛−2 (see the F-distribution table)

Step 4 : Decision Rule
If 𝑓𝑡𝑒𝑠𝑡 > 𝑓𝛼,1,𝑛−2 , reject 𝐻0

Step 5 : Conclusion
If 𝐻0 is rejected, conclude at the 𝛼 significance level that there is a linear relationship between 𝑦 and 𝑥
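Steps 1 to 5 above can be sketched in code. The data below are hypothetical, and the critical value is assumed to be read from an F table (for α = 0.05 with 1 and 4 degrees of freedom it is 7.71), so no statistics library is needed.

```python
# Carry out the ANOVA test for the slope following Steps 1-5 above.
# Data are hypothetical; the critical value f_(alpha,1,n-2) is
# assumed to be looked up from an F-distribution table.

def slope_anova(x, y, f_critical):
    n = len(x)
    sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
    syy = sum(yi**2 for yi in y) - sum(y)**2 / n
    sxy = sum(xi*yi for xi, yi in zip(x, y)) - sum(x)*sum(y) / n
    b1 = sxy / sxx
    ssr = b1 * sxy              # SSR = b1 * Sxy
    sse = syy - ssr             # SSE = Syy - b1 * Sxy
    msr = ssr / 1               # 1 degree of freedom for regression
    mse = sse / (n - 2)         # n - 2 degrees of freedom for residual
    f_test = msr / mse
    reject = f_test > f_critical   # decision rule from Step 4
    return f_test, reject

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
f_test, reject = slope_anova(x, y, f_critical=7.71)  # f_(0.05,1,4) = 7.71
print(f_test, reject)
```

Here the data are strongly linear, so the test statistic far exceeds the critical value and H₀ : β₁ = 0 is rejected.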

Example 6.1
The data obtained in a study of age and blood pressure are as follows:
Age, x Pressure, y
43 128
48 120
56 135
58 137
61 143
67 141
70 152

a) Find Sxx, Syy and Sxy.
b) Find and interpret the sample correlation coefficient.
c) Find β̂₁ and β̂₀.
d) Find the estimated regression line, ŷ.
e) Estimate ŷ when x = 65.
f) Test the hypothesis for the slope at the 1% level of significance.

Solution:

Example 6.2
A study was made by a businesswoman to determine the relationship between daily advertising cost and sales. The data are as follows:
Advertising Costs (RM) Sales (RM)
40 385
25 395
30 475
40 490
50 560
25 480

a) Find Sxx, Syy and Sxy.
b) Find and interpret the sample correlation coefficient.
c) Find β̂₁ and β̂₀.
d) Find the estimated regression line, ŷ, and sketch this line.
e) Estimate the sales when the advertising cost is RM35.

Solution:

EXERCISE
1) A study is done to investigate whether Statistics scores have an effect on students' CPA scores. The data below are the Statistics final examination scores of 10 randomly selected students and their corresponding CPA scores.
Scores, x 87 69 75 56 63 90 71 74 80 78
CPA, y 3.41 3.15 3.28 2.46 2.89 3.73 3.11 3.23 3.50 3.34

a) Find Sxx, Syy and Sxy.
b) Find and interpret the sample correlation coefficient, r.
c) Find β̂₀ and β̂₁.
d) Find the estimated regression line, ŷ, and sketch the graph.
e) Predict the CPA score of a student who gets 65 in Statistics.
f) Test the hypothesis that β₁ = 0 at the 5% significance level.

2) A supervisor wants to determine the relationship between the ages of her employees and the number of sick days they take each year. The data are as follows:

Age, x 18 21 25 36 48 53
Days, y 16 12 9 5 6 2

a) Find Sxx, Syy and Sxy.
b) Find and interpret the sample correlation coefficient.
c) Find β̂₁ and β̂₀.
d) Find the estimated regression line, ŷ.
e) Estimate ŷ when x = 32.

3) A researcher wishes to study the relationship between monthly e-commerce sales and online advertising cost. The survey results for 7 online stores for the last year were recorded as follows:

Online cost, x   Sales, y
1.7              368
1.5              340
2.8              665
5                954
1.3              331
2.2              556
1.3              376

a) Find Sxx, Syy and Sxy.
b) Find and interpret the sample correlation coefficient.
c) Find β̂₁ and β̂₀.
d) Find the estimated regression line, ŷ.
e) Estimate ŷ when x = 2.7.

4) A study was made on the amount of converted sugar in a certain process at various
temperatures. The data were coded and recorded as follows.
Temperature, x Converted sugar, y
1.0 8.1
1.1 7.8
1.2 8.5
1.3 9.8
1.4 9.5
1.5 8.9
1.6 8.6
1.7 10.2
1.8 9.3
1.9 9.2
2.0 10.2

a) Find Sxx, Syy and Sxy.
b) Find and interpret the sample correlation coefficient, r.
c) Find β̂₀ and β̂₁.
d) Find the estimated regression line, ŷ, and sketch the graph.
e) Estimate the amount of converted sugar produced when the coded temperature is 1.75.
f) Test the hypothesis for the slope at the 5% significance level.

6.4 Multiple Linear Regression
A multiple linear regression model is used to describe linear relationships involving more than
two variables. For example :
1. The height of a child can depend on the height of the mother, the height of the father,
nutrition, and environmental factors.
2. The CGPA of students can depend on the time spent on their studies and their study
techniques.
The general form of multiple linear regression model is given by
𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑘 𝑥𝑘 + 𝜀
where 𝛽0 , 𝛽1 , 𝛽2 , ⋯ , 𝛽𝑘 are the unknown parameters (regression coefficients) and 𝜀 is the error
term. In general, the response variable (𝑦) may be related to 𝑘 regressor (predictor) variables
(𝑥1 , 𝑥2 , ⋯ , 𝑥𝑘 ).
This chapter focuses only on multiple linear regression with two predictor variables.
6.4.1 Multiple Linear Regression With Two Predictor Variables
When there are two predictor variables, 𝑥1 and 𝑥2 , the regression model is
𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝜀

It is also called a first-order model with two predictor variables. The parameter β₀ gives the intercept of the regression plane; that is, β₀ is the mean of y when x₁ = x₂ = 0. The parameter β₁ gives the mean change in y per unit change in x₁ when x₂ is held constant. Similarly, β₂ gives the mean change in y per unit change in x₂ when x₁ is held constant.
The multiple linear regression equation identifies the plane that gives the best fit to the data. The estimated form of the multiple linear regression equation is

𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑥1 + 𝛽̂2 𝑥2


where
𝑦̂ : predicted value of y
𝑥1 , 𝑥2 : independent variables

𝛽̂0 : estimated value of 𝑦 intercept

𝛽̂1 , 𝛽̂2 : estimated value of the regression coefficients

6.4.1.1 Linear Regression Model Using Matrices
In matrix terms, the general linear regression model is:
𝑌 = 𝑋𝛽 + 𝜀

where

Y = [y₁ y₂ ⋯ yₙ]′ is the n×1 response vector,
X is the n×p design matrix whose i-th row is (1, xᵢ₁, xᵢ₂),
β = [β₀ β₁ β₂]′ is the p×1 vector of parameters, and
ε = [ε₁ ε₂ ⋯ εₙ]′ is the n×1 vector of errors.

Let b = [b₀ b₁ b₂]′ denote the p×1 vector of least squares estimates of the regression coefficients.

To estimate the regression coefficients, the least squares normal equations for the general
linear regression model are:
𝑋 ′ 𝑋𝑏 = 𝑋 ′ 𝑌
and the least square estimator is
𝑏 = (𝑋 ′ 𝑋)−1 𝑋′𝑌

The parameters can also be estimated algebraically. For the case of two predictor variables,

        ⎡ n      ΣX₁      ΣX₂   ⎤              ⎡ ΣY   ⎤
X′X =   ⎢ ΣX₁    ΣX₁²     ΣX₁X₂ ⎥      X′Y =   ⎢ ΣX₁Y ⎥
        ⎣ ΣX₂    ΣX₂X₁    ΣX₂²  ⎦              ⎣ ΣX₂Y ⎦

Then, the normal equations in algebraic form for the case of two predictor variables can be obtained from:
𝑋′𝑋𝑏 = 𝑋′𝑌

⎡ n      ΣX₁      ΣX₂   ⎤ ⎡b₀⎤   ⎡ ΣY   ⎤
⎢ ΣX₁    ΣX₁²     ΣX₁X₂ ⎥ ⎢b₁⎥ = ⎢ ΣX₁Y ⎥
⎣ ΣX₂    ΣX₂X₁    ΣX₂²  ⎦ ⎣b₂⎦   ⎣ ΣX₂Y ⎦

that is,

Σy = nb₀ + b₁Σx₁ + b₂Σx₂
Σx₁y = b₀Σx₁ + b₁Σx₁² + b₂Σx₁x₂
Σx₂y = b₀Σx₂ + b₁Σx₁x₂ + b₂Σx₂²
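The normal equations can be solved numerically. The sketch below builds the 3×3 system from hypothetical data and solves it with a small Gaussian elimination; the helper names (solve3, fit_two_predictors) are illustrative, not from the text.

```python
# Build and solve the normal equations X'Xb = X'Y for two predictor
# variables, using only plain Python. Data are hypothetical.

def solve3(a, rhs):
    """Solve a 3x3 linear system a*b = rhs by Gauss-Jordan elimination."""
    m = [row[:] + [r] for row, r in zip(a, rhs)]   # augmented matrix
    for col in range(3):
        # partial pivoting: bring the largest entry into the pivot row
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_two_predictors(x1, x2, y):
    """Return [b0, b1, b2] from the algebraic normal equations."""
    n = len(y)
    sx1, sx2, sy = sum(x1), sum(x2), sum(y)
    sx1x1 = sum(a * a for a in x1)
    sx2x2 = sum(b * b for b in x2)
    sx1x2 = sum(a * b for a, b in zip(x1, x2))
    sx1y = sum(a * c for a, c in zip(x1, y))
    sx2y = sum(b * c for b, c in zip(x2, y))
    xtx = [[n,   sx1,   sx2],
           [sx1, sx1x1, sx1x2],
           [sx2, sx1x2, sx2x2]]
    xty = [sy, sx1y, sx2y]
    return solve3(xtx, xty)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3 + 2 * a + b for a, b in zip(x1, x2)]   # exact plane y = 3 + 2x1 + x2
print(fit_two_predictors(x1, x2, y))
```

Because the y values lie exactly on the plane y = 3 + 2x₁ + x₂, the solution recovers b₀ = 3, b₁ = 2, b₂ = 1.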

6.4.1.2 Fitted values and residuals

Let 𝑌̂ denote the vector of fitted values and 𝜀 the vector of residuals. Once the parameters have been estimated, the fitted linear regression model is

𝑌̂ = 𝑋𝑏

and the vector of residuals is 𝜀 = 𝑌 − 𝑌̂, that is, εᵢ = yᵢ − ŷᵢ for i = 1, ⋯, n.
6.4.1.3 Analysis of Variance (ANOVA)
Sums of squares
The sums of squares for the analysis of variance, in matrix terms, are:

SSTO = Y′Y − (1/n)Y′JY
SSE = Y′Y − b′X′Y
SSR = b′X′Y − (1/n)Y′JY = SSTO − SSE

where 𝐽 is an 𝑛 × 𝑛 matrix of 1's.
SSTO has n − 1 degrees of freedom. SSE has n − p degrees of freedom, where p is the number of parameters estimated in the regression function. SSR has p − 1 degrees of freedom.

Mean Squares

MSR = SSR/(p − 1)
MSE = SSE/(n − p)

Table 6.4 : Analysis of Variance (ANOVA) result

Source of variation   SS                          df      MS
Regression            SSR = b′X′Y − (1/n)Y′JY     p − 1   MSR = SSR/(p − 1)
Error                 SSE = Y′Y − b′X′Y           n − p   MSE = SSE/(n − p)
Total                 SSTO = Y′Y − (1/n)Y′JY      n − 1
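The matrix identities for the sums of squares reduce to simple scalar sums: Y′Y = Σyᵢ², Y′JY = (Σyᵢ)², and b′X′Y is a dot product. The sketch below uses a hypothetical data set with a known exact fit (so SSE should be essentially zero) and checks SSTO = SSR + SSE; the helper name anova_sums is illustrative.

```python
# Compute SSTO, SSE and SSR using the matrix identities above:
# Y'Y = sum(y_i^2), Y'JY = (sum y_i)^2, b'X'Y = dot(b, X'Y).
# The coefficients b are assumed already estimated; here they are the
# true coefficients of a hypothetical exact fit.

def anova_sums(y, b, xty):
    n = len(y)
    yty = sum(v * v for v in y)                  # Y'Y
    yjy = sum(y) ** 2                            # Y'JY (J = all-ones matrix)
    bxty = sum(bi * v for bi, v in zip(b, xty))  # b'X'Y
    ssto = yty - yjy / n
    sse = yty - bxty
    ssr = bxty - yjy / n
    return ssto, sse, ssr

x1 = [1, 2, 3, 4]
x2 = [0, 1, 1, 2]
y = [5 + a + 2 * c for a, c in zip(x1, x2)]      # exact: y = 5 + x1 + 2*x2
b = [5, 1, 2]                                    # true coefficients
xty = [sum(y),
       sum(a * v for a, v in zip(x1, y)),
       sum(c * v for c, v in zip(x2, y))]
ssto, sse, ssr = anova_sums(y, b, xty)
print(ssto, sse, ssr)   # SSTO = SSR + SSE
```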

6.4.2 Test for significance of regression

This test determines whether a regression model can be applied to the observed data; in other words, it tests whether any of the independent variables is related to the dependent variable. The ANOVA approach is used to test the significance of the regression.
The ANOVA table is used with the following hypotheses:
𝐻0 : none of the independent variables is related to the dependent variable
𝐻1 : at least one of the independent variables is related to the dependent variable

Or the hypothesis can be written using symbols as follows.


𝐻0 : 𝛽1 = 𝛽2 = ⋯ = 𝛽𝑝−1 = 0

𝐻1 : 𝛽𝑘 ≠ 0 𝑓𝑜𝑟 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝑘 (𝑘 = 1,2, ⋯ , 𝑝 − 1)

The test statistic :


𝑀𝑆𝑅
𝐹𝑡𝑒𝑠𝑡 =
𝑀𝑆𝐸

Decision rule :
If 𝐹𝑡𝑒𝑠𝑡 > 𝐹𝛼,𝑝−1,𝑛−𝑝 then REJECT 𝐻0

Conclusion :
Rejection of 𝐻0 implies that at least one of the independent variables is related to the dependent variable.
Example 6.3
Dwaine Studios operates portrait studios in 21 medium-sized cities. These studios specialize in portraits of children. The company is considering an expansion into other medium-sized cities and wishes to investigate whether sales (in thousands of dollars) (Y) in a community can be predicted from the number of persons aged 16 or younger (in thousands of persons) (𝑋1 ) in the community and the per capita disposable personal income (in thousands of dollars) (𝑋2 ) in the community. Data on these variables for the most recent year for the 21 cities in which Dwaine Studios now operates are shown below.

CASE X1 X2 Y
1 68.5 16.7 174.4
2 45.2 16.8 164.4
3 91.3 18.2 244.2
4 47.8 16.3 154.6
5 46.9 17.3 181.6
6 66.1 18.2 207.5
7 49.5 15.9 152.8
8 52 17.2 163.2
9 48.9 16.6 145.4
10 38.4 16 137.2
11 87.9 18.3 241.9
12 72.8 17.1 191.1
13 88.4 17.4 232.0
14 42.9 15.8 145.3
15 52.5 17.8 161.1
16 85.7 18.4 209.7
17 41.3 16.5 146.4
18 51.7 16.3 144.0
19 89.6 18.1 232.6
20 82.7 19.1 224.1
21 52.3 16.0 166.5

Based on the above data,

a) Find the estimated regression function.
b) Find the fitted values and residuals.
c) Complete the ANOVA table.
d) Test the relationship between sales and the two predictors (target population and per capita disposable income) at 𝛼 = 0.05.

Solution :

6.4.3 Computing the multiple linear regression analysis using Excel
Step 1 : Key in the data; make sure the data are in columns adjacent to each other.
Step 2 : Data → Data Analysis → Regression
Step 3 : Analyse the Excel output

Example 6.4
A sales manager of a computer company needs to predict sales of monopods in a selected market area. He believes that advertising expenditure and the population in each market area can be used to predict monopod sales. He gathered a sample of monopod sales, advertising expenditures and populations as shown below. Find the estimated multiple linear regression model which gives the best fit to the data and interpret each coefficient.
Market Area Advertising Population (𝑥2 ) Monopod sales (y)
Expenditures (𝑥1 ) (Thousands) RM (in thousands)
RM (in thousands)
A 1 200 100
B 5 700 300
C 8 800 400
D 6 400 200
E 3 100 100
F 10 600 400

Solution :

6.4.4 Interpretation of regression statistical output
i. Multiple 𝑅
For multiple regression analysis using Microsoft Excel, Multiple R is the overall correlation and its value ranges from 0 to +1. It is defined as the positive square root of 𝑅2 , 𝑅 = √𝑅2 . The closer Multiple R is to +1, the stronger the relationship among the variables; the closer to 0, the weaker the relationship.
ii. R square and Adjusted R square
R square (𝑅2 ) and Adjusted R square (Adjusted 𝑅2 ) both give the value of the coefficient of determination. They measure the percentage of variation in the 𝑦 variable associated with the set of 𝑥 variables, i.e. the variation in the dependent variable 𝑦 that can be explained by the set of independent variables 𝑥1 , 𝑥2 , ⋯ , 𝑥𝑘 . In practice, 𝑅2 is used for the simple linear regression model, whereas Adjusted 𝑅2 is used for the multiple linear regression model:

𝑅2ₐ = 1 − [SSE/(n − p)] / [SST/(n − 1)]
iii. Standard error
 It provides the average distance that the data points fall from the regression line. It is in the units of the dependent variable.
 A smaller standard error shows that the data are less dispersed and lie closer to the regression line.
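The Adjusted R² formula in (ii) is a one-line computation. The sketch below uses hypothetical values of SSE, SST, n and p purely for illustration.

```python
# Adjusted R^2 from the formula above:
# R2_adj = 1 - [SSE/(n - p)] / [SST/(n - 1)]. Values are hypothetical.

def adjusted_r2(sse, sst, n, p):
    return 1 - (sse / (n - p)) / (sst / (n - 1))

# e.g. n = 12 observations, p = 3 estimated parameters (b0, b1, b2):
print(adjusted_r2(sse=20, sst=100, n=12, p=3))
```

Unlike plain R², Adjusted R² penalizes extra parameters through the n − p divisor, which is why it is preferred for comparing multiple regression models.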
6.4.5 Model Selection
If the regression analysis in the previous section is significant, the next task is to determine which variables give the best multiple regression model for a given set of data. A model that includes only the most significant variables is considered parsimonious.
A simple approach to model selection:
 Use common sense and practical considerations to include or exclude variables.
 Consider the P-value from the ANOVA table (the measure of the overall significance of the multiple regression equation, the significance F-value) displayed by the computer output. The smaller the P-value, the better the model.
 Consider equations with a high value of 𝑅2 for simple linear regression, or a high value of adjusted 𝑅2 for multiple linear regression, and try to include only significant variables.

Example 6.5
The following table summarizes the multiple regression analysis for the response variable (𝑦), weight (in pounds), and the predictor variables (𝑥): H (height in inches), W (waist circumference in cm) and C (cholesterol in mg).

𝑥 P-value 𝑅2 Adjusted 𝑅2 Regression equation


H,W,C 0.0000 0.880 0.870 -199+2.55H+2.18W-0.005C
H,W 0.0000 0.877 0.870 -206+2.66H+2.15W
H,C 0.0002 0.277 0.238 -148+4.65H+0.006C
W,C 0.0000 0.804 0.793 -42.8+2.41W+0.01C
H 0.0001 0.273 0.254 -139+4.55H
W 0.0000 0.790 0.785 -44.1+2.37W
C 0.874 0.001 0.000 173-0.003C

(a) If only one predictor variable is used to predict weight, which single variable is the best? Why?
(b) If exactly two predictor variables are used to predict weight, which two variables should be chosen? Why?
(c) Based on the answers to (a) and (b), which regression equation is the best for predicting weight? Why?

Solution:

EXERCISE
1. Before hiring new employees, the personnel director of a company decides to do a regression analysis of the company's current salary structure. She believes that an employee's salary in Ringgit Malaysia is related to the number of years of work experience (YEARS) and to the number of years of post-high school education (POSTHSED). The following Excel output is produced from the sample data she has gathered:

(a) State the dependent variable and independent variable.


(b) Write the regression equation
(c) Predict a salary for one with no experience and with no post-high school education.
(d) Predict a salary for one with 6 years of work experience and with 4 years of post-high
school education.
(e) Interpret each regression coefficient in the given equation
(f) Find the value of coefficient of determination. Interpret this value.
(g) Give the conclusion based on ANOVA table at 𝛼 = 0.05

2. A manufacturer found that a significant relationship exists among the number of hours an assembly line employee works per shift, 𝑥1 , the total number of items produced, 𝑥2 , and the number of defective items produced, 𝑦. The multiple regression equation is
𝑦̂ = 9.6 + 2.2𝑥1 − 1.08𝑥2
(a) Predict the number of defective items produced by an employee who has worked 9 hours
and produced 24 items.
(b) Interpret each coefficient in the given equation.

3. The following table summarizes the multiple regression analysis where the response variable (𝑦) is travel time (in hours), and the predictor variables (𝑥) are 𝐸 (engine cc), 𝐷 (distance in km) and 𝑆 (speed in km/hour).
𝑥 P-value 𝑅2 Adjusted 𝑅2 Regression equation
𝐸 0.1653 0.0676 0.0343 2.7355 + 0.0014𝐸
𝐷 3.4884 × 10−15 0.8942 0.8904 −0.3146 + 0.0205𝐷
𝑆 0.8107 0.0021 −0.0336 5.3445 − 0.0069𝑆
𝐸, 𝐷 1.4133 × 10−14 0.9058 0.8988 0.3149 − 0.0006𝐸 + 0.0215𝐷
𝐸, 𝑆 0.3021 0.0848 0.0171 4.4979 + 0.0017𝐸 − 0.0209𝑆
𝐷, 𝑆 6.4453 × 10−15 0.9111 0.9045 1.5894 + 0.0207𝐷 − 0.0198𝑆
𝐸, 𝐷, 𝑆 3.9179 × 10−14 0.9164 0.9067 1.7049 − 0.0004𝐸 + 0.0214𝐷 − 0.0164𝑆

(a) If only one predictor variable is used to predict travel time, which single variable is the
best and give the reason.
(b) If exactly two predictor variables are used to predict travel time, which two variables
should be chosen and give the reason.
(c) State the best regression equation for predicting travel time.

4. A study is conducted to assess the quality of staff in a company based on the quality of their letters of recommendation. Three predictor variables are used to determine the quality of the letters of recommendation, namely college GPA, high school GPA and TOEFL total. The following Excel output shows a multiple regression analysis when all three predictor variables are used to predict the quality of the letters of recommendation.

(a) State the coefficient of correlation, 𝑟 for this study and interpret this value.
(b) Write the multiple regression equation.

(c) Interpret the intercept value and the coefficient of TOEFL total when the other variables are held constant.
(d) At 𝛼 = 0.02, can we conclude that none of the independent variables is related to the dependent variable?
(e) State the value of the coefficient of determination and interpret this value.

