Multiple Linear Regression-I

Uploaded by

037MECH MOHANARAM R


Multiple Linear Regression
Analysis & Diagnosis

Multiple Regression

• Multiple Regression Model
• Least Squares Method
• Multiple Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Categorical Independent Variables
• Residual Analysis
• Logistic Regression

Multiple Regression

• In this chapter we continue our study of regression analysis by considering situations involving two or more independent variables.
• This subject area, called multiple regression analysis, enables us to consider more factors and thus obtain better estimates than are possible with simple linear regression.

Multiple Regression Model

The equation that describes how the dependent variable \(y\) is related to the independent variables \(x_1, x_2, \dots, x_p\) and an error term is:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \]

where:
\(\beta_0, \beta_1, \beta_2, \dots, \beta_p\) are the parameters, and
\(\varepsilon\) is a random variable called the error term

Multiple Regression Equation

The equation that describes how the mean value of \(y\) is related to \(x_1, x_2, \dots, x_p\) is:

\[ E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \]

Estimated Multiple Regression Equation

A simple random sample is used to compute sample statistics \(b_0, b_1, b_2, \dots, b_p\) that are used as the point estimators of the parameters \(\beta_0, \beta_1, \beta_2, \dots, \beta_p\). The estimated multiple regression equation is:

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p \]

Estimation Process

• Multiple regression model: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \)
• Multiple regression equation: \( E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \)
  The unknown parameters are \(\beta_0, \beta_1, \beta_2, \dots, \beta_p\).
• Sample data: observed values of \(x_1, x_2, \dots, x_p\) and \(y\).
• Estimated multiple regression equation: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p \)
  The sample statistics are \(b_0, b_1, b_2, \dots, b_p\).
• \(b_0, b_1, b_2, \dots, b_p\) provide estimates of \(\beta_0, \beta_1, \beta_2, \dots, \beta_p\).

Least Squares Method

• Least Squares Criterion

\[ \min \sum (y_i - \hat{y}_i)^2 \]

• Computation of Coefficient Values
The formulas for the regression coefficients \(b_0, b_1, b_2, \dots, b_p\) involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
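For reference, the matrix-algebra formula that the least squares slide alludes to can be written compactly. The notation below is an assumption introduced here and does not appear in the slides: \(X\) is the \(n \times (p+1)\) matrix whose first column is all ones and whose remaining columns hold the observed \(x_1, \dots, x_p\); \(\mathbf{y}\) is the vector of observed responses; \(\mathbf{b}\) is the vector \((b_0, b_1, \dots, b_p)\).

```latex
% Least squares criterion in matrix form
\min_{\mathbf{b}} \; (\mathbf{y} - X\mathbf{b})^{\top} (\mathbf{y} - X\mathbf{b})

% Setting the gradient to zero gives the normal equations
X^{\top} X \, \mathbf{b} = X^{\top} \mathbf{y}

% When X^T X is invertible, the solution is
\mathbf{b} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```

Software packages typically solve the normal equations with a numerically stable factorization rather than forming the inverse explicitly.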

Least Squares Method

• Computation of Coefficient Values
The formulas for the regression coefficients \(b_0, b_1, b_2, \dots, b_p\) involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
The emphasis will be on how to interpret the computer output rather than on how to make the multiple regression computations.

Multiple Regression Model

• Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s Programmer Aptitude Test.
The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

Multiple Regression Model

Exper.   Test    Salary      Exper.   Test    Salary
(Yrs.)   Score   ($000s)     (Yrs.)   Score   ($000s)
   4       78     24.0          9       88     38.0
   7      100     43.0          2       73     26.6
   1       86     23.7         10       75     36.2
   5       82     34.3          5       81     31.6
   8       86     35.8          6       74     29.0
  10       84     38.0          8       87     34.0
   0       75     22.2          4       79     30.1
   1       80     23.1          6       94     33.9
   6       83     30.0          3       70     28.2
   6       91     33.0          3       89     30.0

Suppose we believe that salary (\(y\)) is related to the years of experience (\(x_1\)) and the score on the programmer aptitude test (\(x_2\)) by the following regression model:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \]

where
\(y\) = annual salary ($000)
\(x_1\) = years of experience
\(x_2\) = score on programmer aptitude test
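As a minimal sketch of what a software package does with these data, the following fits the two-variable model by least squares using NumPy. The array name `DATA` and the helper `fit_least_squares` are illustrative names introduced here, not part of the slides; the 20 rows are copied from the table above.

```python
import numpy as np

# Programmer salary survey: (years of experience, test score, salary in $000s)
DATA = np.array([
    [4, 78, 24.0], [7, 100, 43.0], [1, 86, 23.7], [5, 82, 34.3],
    [8, 86, 35.8], [10, 84, 38.0], [0, 75, 22.2], [1, 80, 23.1],
    [6, 83, 30.0], [6, 91, 33.0], [9, 88, 38.0], [2, 73, 26.6],
    [10, 75, 36.2], [5, 81, 31.6], [6, 74, 29.0], [8, 87, 34.0],
    [4, 79, 30.1], [6, 94, 33.9], [3, 70, 28.2], [3, 89, 30.0],
])


def fit_least_squares(x, y):
    """Return (b0, b1, ..., bp) minimizing sum((y - y_hat) ** 2)."""
    # Prepend a column of ones so that b0 is estimated as the intercept.
    X = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


b = fit_least_squares(DATA[:, :2], DATA[:, 2])
print("b0, b1, b2 =", np.round(b, 3))
```

The printed estimates can be compared against the Coef column of the regression output reported for this example.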

Solving for the Estimates of \(\beta_0, \beta_1, \beta_2\)

• Regression Equation Output

Predictor     Coef      SE Coef    T        p
Constant      3.17394   6.15607    0.5156   0.61279
Experience    1.4039    0.19857    7.0702   1.9E-06
Test Score    0.25089   0.07735    3.2433   0.00478

Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

Note: Predicted salary will be in thousands of dollars.
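To illustrate how the estimated equation is used for prediction, here is a worked substitution. The input values (4 years of experience, test score of 78) are chosen here purely for illustration; they happen to match the first row of the sample data, whose observed salary is 24.0.

```latex
\hat{y} = 3.174 + 1.404(4) + 0.251(78)
        = 3.174 + 5.616 + 19.578
        = 28.368
```

Because salary is measured in thousands of dollars, this corresponds to a predicted salary of about $28,368.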

Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows:

\(b_i\) represents an estimate of the change in \(y\) corresponding to a 1-unit increase in \(x_i\) when all other independent variables are held constant.

Interpreting the Coefficients

\(b_1 = 1.404\)

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).

Interpreting the Coefficients

\(b_2 = 0.251\)

Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).

Multiple Coefficient of Determination

• Relationship Among SST, SSR, SSE

\[ \text{SST} = \text{SSR} + \text{SSE} \]

\[ \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 \]

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Multiple Coefficient of Determination

• ANOVA Output

Analysis of Variance

SOURCE           DF   SS          MS        F       P
Regression        2   500.3285    250.164   42.76   0.000
Residual Error   17    99.45697     5.850
Total            19   599.7855

SSR is the SS entry in the Regression row; SST is the SS entry in the Total row.

Multiple Coefficient of Determination

\[ R^2 = \text{SSR}/\text{SST} \]

\[ R^2 = 500.3285/599.7855 = .83418 \]
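The decomposition SST = SSR + SSE and the resulting R² can be checked numerically. The sketch below recomputes the three sums of squares from the programmer salary data with NumPy; the variable names (`DATA`, `sst`, `ssr`, `sse`, `r2`) are illustrative choices made here, and the fit uses `np.linalg.lstsq` rather than any particular package named in the slides.

```python
import numpy as np

# Programmer salary survey: (years of experience, test score, salary in $000s)
DATA = np.array([
    [4, 78, 24.0], [7, 100, 43.0], [1, 86, 23.7], [5, 82, 34.3],
    [8, 86, 35.8], [10, 84, 38.0], [0, 75, 22.2], [1, 80, 23.1],
    [6, 83, 30.0], [6, 91, 33.0], [9, 88, 38.0], [2, 73, 26.6],
    [10, 75, 36.2], [5, 81, 31.6], [6, 74, 29.0], [8, 87, 34.0],
    [4, 79, 30.1], [6, 94, 33.9], [3, 70, 28.2], [3, 89, 30.0],
])

y = DATA[:, 2]
X = np.column_stack([np.ones(len(y)), DATA[:, :2]])  # intercept + x1 + x2

# Least squares fit and fitted values y_hat
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression
sse = np.sum((y - y_hat) ** 2)         # sum of squares due to error
r2 = ssr / sst

print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}  R^2={r2:.5f}")
```

The values should agree with the SS column of the ANOVA output up to rounding.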

Adjusted Multiple Coefficient of Determination

• Adding independent variables, even ones that are not statistically significant, causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.
• Because SSR = SST - SSE, when SSE becomes smaller, SSR becomes larger, causing \(R^2 = \text{SSR}/\text{SST}\) to increase.
• The adjusted multiple coefficient of determination compensates for the number of independent variables in the model.

Adjusted Multiple Coefficient of Determination

\[ R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

\[ R_a^2 = 1 - (1 - .834179)\,\frac{20 - 1}{20 - 2 - 1} = .814671 \]
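The adjustment is a one-line computation. The sketch below wraps the formula in a small function (the name `adjusted_r2` is introduced here for illustration) and applies it to the values from the slide: R² = .834179, n = 20, p = 2.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p independent variables and n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


ra2 = adjusted_r2(0.834179, n=20, p=2)
print(f"{ra2:.6f}")
```

Note that for a fixed R², increasing `p` lowers the result, which is how the adjustment penalizes extra independent variables.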

Assumptions About the Error Term \(\varepsilon\)

The error \(\varepsilon\) is a random variable with mean of zero.

The variance of \(\varepsilon\), denoted by \(\sigma^2\), is the same for all values of the independent variables.

The values of \(\varepsilon\) are independent.

The error \(\varepsilon\) is a normally distributed random variable reflecting the deviation between the \(y\) value and the expected value of \(y\) given by \(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\).

Testing for Significance

In simple linear regression, the F and t tests provide the same conclusion.

In multiple regression, the F and t tests have different purposes.

Testing for Significance: F Test

The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

The F test is referred to as the test for overall significance, or overall fit.

Testing for Significance: t Test

If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.

A separate t test is conducted for each of the independent variables in the model.

We refer to each of these t tests as a test for individual significance.

Testing for Significance: F Test

Hypotheses:      \(H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0\)
                 \(H_a\): One or more of the parameters is not equal to zero.

Test Statistic:  \(F = \text{MSR}/\text{MSE}\)

Rejection Rule:  Reject \(H_0\) if p-value < \(\alpha\) or if \(F > F_\alpha\), where \(F_\alpha\) is based on an F distribution with \(p\) d.f. in the numerator and \(n - p - 1\) d.f. in the denominator.

F Test for Overall Significance

Hypotheses:      \(H_0: \beta_1 = \beta_2 = 0\)
                 \(H_a\): One or both of the parameters is not equal to zero.

Rejection Rule:  For \(\alpha = .05\) and d.f. = 2, 17; \(F_{.05} = 3.59\)
                 Reject \(H_0\) if p-value < .05 or F > 3.59

F Test for Overall Significance

• ANOVA Output

Analysis of Variance

SOURCE           DF   SS          MS        F       P
Regression        2   500.3285    250.164   42.76   0.000
Residual Error   17    99.45697     5.850
Total            19   599.7855

The P entry in the Regression row is the p-value used to test for overall significance.

F Test for Overall Significance

Test Statistic:  F = MSR/MSE = 250.16/5.85 = 42.76

Conclusion:      p-value < .05, so we can reject \(H_0\). (Also, F = 42.76 > 3.59)
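The F statistic follows directly from the ANOVA sums of squares. The sketch below recomputes it in plain Python; the function name `f_statistic` is illustrative, and the critical value 3.59 is taken from the slides rather than computed from an F distribution.

```python
def f_statistic(ssr, sse, n, p):
    """Overall-significance F statistic: MSR / MSE."""
    msr = ssr / p            # mean square due to regression, p d.f.
    mse = sse / (n - p - 1)  # mean square error, n - p - 1 d.f.
    return msr / mse


# Sums of squares from the ANOVA output: n = 20 observations, p = 2 predictors
F = f_statistic(ssr=500.3285, sse=99.45697, n=20, p=2)

F_CRIT = 3.59  # F.05 with (2, 17) d.f., as quoted in the slides
reject_h0 = F > F_CRIT

print(f"F = {F:.2f}, reject H0: {reject_h0}")
```

With a statistics library available, the critical value and p-value could instead be obtained from the F distribution with (p, n - p - 1) degrees of freedom.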

Testing for Significance: t Test

Hypotheses:      \(H_0: \beta_i = 0\)
                 \(H_a: \beta_i \neq 0\)

Test Statistic:  \[ t = \frac{b_i}{s_{b_i}} \]

Rejection Rule:  Reject \(H_0\) if p-value < \(\alpha\) or if \(t < -t_{\alpha/2}\) or \(t > t_{\alpha/2}\), where \(t_{\alpha/2}\) is based on a t distribution with \(n - p - 1\) degrees of freedom.

t Test for Significance of Individual Parameters

Hypotheses:      \(H_0: \beta_i = 0\)
                 \(H_a: \beta_i \neq 0\)

Rejection Rule:  For \(\alpha = .05\) and d.f. = 17, \(t_{.025} = 2.11\)
                 Reject \(H_0\) if p-value < .05, or if t < -2.11 or t > 2.11

t Test for Significance of Individual Parameters

• Regression Equation Output

Predictor     Coef      SE Coef    T        p
Constant      3.17394   6.15607    0.5156   0.61279
Experience    1.4039    0.19857    7.0702   1.9E-06
Test Score    0.25089   0.07735    3.2433   0.00478

The T and p entries in the Experience row are the t statistic and p-value used to test for the individual significance of “Experience”.

t Test for Significance of Individual Parameters

Test Statistics:

\[ \frac{b_1}{s_{b_1}} = \frac{1.4039}{.1986} = 7.07 \]

\[ \frac{b_2}{s_{b_2}} = \frac{.25089}{.07735} = 3.24 \]

Conclusions: Reject both \(H_0: \beta_1 = 0\) and \(H_0: \beta_2 = 0\). Both independent variables are significant.
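The SE Coef and T columns of the regression output can be reproduced from the data. The sketch below assumes the usual least squares result that the estimated variance of the coefficients is MSE times the diagonal of (XᵀX)⁻¹, a formula not spelled out in the slides; names such as `DATA`, `se`, and `t` are illustrative, and the critical value 2.11 is the one quoted in the slides.

```python
import numpy as np

# Programmer salary survey: (years of experience, test score, salary in $000s)
DATA = np.array([
    [4, 78, 24.0], [7, 100, 43.0], [1, 86, 23.7], [5, 82, 34.3],
    [8, 86, 35.8], [10, 84, 38.0], [0, 75, 22.2], [1, 80, 23.1],
    [6, 83, 30.0], [6, 91, 33.0], [9, 88, 38.0], [2, 73, 26.6],
    [10, 75, 36.2], [5, 81, 31.6], [6, 74, 29.0], [8, 87, 34.0],
    [4, 79, 30.1], [6, 94, 33.9], [3, 70, 28.2], [3, 89, 30.0],
])

y = DATA[:, 2]
X = np.column_stack([np.ones(len(y)), DATA[:, :2]])
n, k = X.shape  # k = p + 1 columns, including the intercept

b, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b
mse = residuals @ residuals / (n - k)  # SSE / (n - p - 1)

# Standard errors: sqrt of the diagonal of MSE * (X'X)^-1
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t = b / se

T_CRIT = 2.11  # t.025 with 17 d.f., as quoted in the slides
for name, coef, s, stat in zip(["Constant", "Experience", "Test Score"], b, se, t):
    print(f"{name:<11} coef={coef:.5f} se={s:.5f} t={stat:.4f} "
          f"significant={abs(stat) > T_CRIT}")
```

The constant term is included in the loop for completeness; the slides test only the two slope coefficients.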

Lotteries have become important sources of revenue in some states in India. Many people have criticized lotteries, however, referring to them as a tax on the poor and uneducated. In an examination of the issue, a random sample of 100 adults was asked how much they spend on lottery tickets and was interviewed about various socioeconomic variables. The purpose of this study is to test the following beliefs:
• Relatively uneducated people spend more on lotteries than do relatively educated people.
• Older people buy more lottery tickets than younger people.
• People with more children spend more on lotteries than people with fewer children.
• Relatively poor people spend a greater proportion of their income on lotteries than relatively rich people.
Data: Amount spent on lottery tickets as a percentage of total household income, Number of years of education, Age, Number of children, Personal income (in INR).
Fit a model and interpret the results.

When one company buys another company, it is not unusual that some workers are terminated. The severance benefits offered to the laid-off workers are often the subject of dispute. Suppose that Tesla recently bought Tweeter (now “X”) and subsequently terminated 20 of Tweeter’s employees. As part of the buyout agreement, it was promised that the severance packages offered to the Tweeter employees would be equivalent to those offered to Tesla employees who had been terminated in the past year. Thirty-six-year-old Bill Smith, a Tweeter employee for the past 10 years, earning $32,000 per year, was one of those let go. His severance package included an offer of 5 weeks’ severance pay. Bill complained that this offer was less than that offered to Tesla’s employees when they were laid off, in contravention of the buyout agreement. A data scientist was called in to settle the dispute. The data scientist was told that severance is determined by three factors: age, length of service with the company, and pay. To determine how generous the severance package had been, a random sample of 50 Tesla ex-employees was taken. For each, the following variables were recorded:
• Number of weeks of severance pay
• Age of employee
• Number of years with the company
• Annual pay (in thousands of dollars)
Perform an analysis to determine whether Bill is correct in his assessment of the severance package.
