2018, Study Session # 3, Reading # 10

“MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS”


MSR = Mean Regression Sum of Squares
MSE = Mean Squared Error
RSS = Regression Sum of Squares
SSE = Sum of Squared Errors/Residuals
α = Level of Significance
Fc = Critical F taken from F-Distribution Table
H0 = Null Hypothesis
Ha = Alternative Hypothesis
X = Independent Variable
Y = Dependent Variable
F = F-Statistic (calculated)

1. INTRODUCTION

• Multiple linear regression models are more sophisticated than simple linear regression.
• They incorporate more than one independent variable.

2. MULTIPLE LINEAR REGRESSION

• Allows determining the effects of more than one independent variable on a particular dependent variable (see the sketch after this list).
• Yi = b0 + b1X1i + b2X2i + … + bkXki + εi
• b1 tells the impact on Y of changing X1 by 1 unit, keeping the other independent variables the same.
• Individual slope coefficients (e.g. b1) in multiple regression are known as partial regression/slope coefficients.
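To make the mechanics concrete, here is a minimal sketch (not from the FinQuiz notes; the toy data and coefficient values are assumptions) of estimating a two-variable multiple regression by ordinary least squares with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + e          # true b0 = 1.0, b1 = 2.0, b2 = -0.5

X = np.column_stack([np.ones(n), X1, X2])  # design matrix with intercept column
b_hat, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
print(b_hat)  # estimated [b0, b1, b2], close to the true values
```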

2.1 Assumption of the Multiple Linear Regression Model

• Relationship b/w Y and X1, X2, …, Xk is linear.
• Independent variables are not random, and no exact linear relationship exists b/w 2 or more independent variables.
• Expected value of the error term is 0.
• Variance of the error term is the same for all observations.
• Error term is uncorrelated across observations.
• Error term is normally distributed.

2.2 Predicting the Dependent Variable in a Multiple Regression Model

• Obtain estimates b̂0, b̂1, b̂2, …, b̂k of the regression parameters b0, b1, b2, …, bk.
• Determine assumed values of the independent variables X̂1, X̂2, …, X̂k.
• Compute the predicted value Ŷi using Ŷi = b̂0 + b̂1X̂1i + b̂2X̂2i + … + b̂kX̂ki (worked example below).
• To predict the dependent variable:
  • Be confident that the assumptions of the regression are met.
  • Predictions regarding X must be within the reliable range of the data used to estimate the model.
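A brief worked example (all numbers are assumed for illustration, not from the notes): with estimates b̂0 = 0.5, b̂1 = 1.2, b̂2 = −0.3 and assumed values X̂1 = 2, X̂2 = 4, the prediction is Ŷ = 0.5 + 1.2(2) − 0.3(4) = 0.5 + 2.4 − 1.2 = 1.7.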

2.3 Testing Whether All Population Regression Coefficients Equal Zero

• H0 ⇒ all slope coefficients are simultaneously = 0 (b1 = b2 = … = bk = 0), i.e. none of the X variables helps explain Y.
• To test H0, an F-test is used (sketch below).
• A t-test cannot be used (it tests only one coefficient at a time).
• F = MSR / MSE = (RSS / k) / (SSE / (n − (k + 1)))


• Where:
  • MSR = RSS / k
  • MSE = SSE / (n − (k + 1))
  • n = no. of observations
  • k = no. of slope coefficients
• Decision rule ⇒ reject H0 if F > Fc (for given α).
  • It is a one-tailed test.
  • df numerator = k
  • df denominator = n − (k + 1)
• For given k and n, the test statistic for H0 (all slope coefficients equal 0) is F(k, n − (k + 1)).
• In the F-distribution table, find F(k, n − (k + 1)), where k gives the column and n − (k + 1) gives the row.
• "Significance of F" in the ANOVA table represents the p-value.
• ↑ F-statistic ⇒ ↓ chance of Type I error.
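A minimal sketch of the joint F-test above (the sums of squares and sample sizes are assumed toy numbers; scipy is used only for the critical value):

```python
from scipy.stats import f

RSS, SSE = 120.0, 80.0   # regression & residual sums of squares (assumed values)
n, k = 50, 3             # no. of observations & slope coefficients (assumed)

MSR = RSS / k
MSE = SSE / (n - (k + 1))
F_stat = MSR / MSE

F_crit = f.ppf(0.95, dfn=k, dfd=n - (k + 1))  # one-tailed, alpha = 0.05
print(F_stat, F_crit, F_stat > F_crit)        # True => reject H0
```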

2.4 Adjusted R²

• R² ↑ with the addition of independent variables (X) to the regression.
• Adjusted R² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²)  (worked example below)
• When k ≥ 1 ⇒ R² > adjusted R².
• Adjusted R² can be −ve, but R² is always +ve.
• If adjusted R² is used for comparing regression models:
  • sample size must be the same.
  • dependent variable must be defined in the same way.
• ↑ Adjusted R² does not necessarily indicate the regression is well specified.
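A brief worked example (numbers assumed for illustration): with R² = 0.60, n = 30, and k = 3, adjusted R² = 1 − (29 / 26) × (1 − 0.60) ≈ 1 − 0.446 = 0.554, slightly below R², as expected.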


3. USING DUMMY VARIABLES IN REGRESSION

• Dummy variable ⇒ takes the value 1 if a particular condition is true & 0 when it is false.
• Diligence is required in choosing the no. of dummy variables.
• Usually n − 1 dummy variables are used, where n = no. of categories; using all n would create an exact linear relation with the intercept (see the sketch below).
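A minimal sketch (the quarterly categories are an assumed example) of building n − 1 dummies with pandas:

```python
import pandas as pd

df = pd.DataFrame({"quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"]})
# n = 4 categories => keep n - 1 = 3 dummies; dropping Q1 avoids the
# exact linear relation with the intercept noted above.
dummies = pd.get_dummies(df["quarter"], drop_first=True)  # columns: Q2, Q3, Q4
print(dummies)
```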


4. VIOLATIONS OF REGRESSION ASSUMPTIONS

4.1 Heteroskedasticity

• Variance of errors differs across observations ⇒ heteroskedastic.
• Variance of errors is similar across observations ⇒ homoskedastic.
• Usually no systematic relationship exists b/w X & the regression residuals.
• If a systematic relationship is present ⇒ heteroskedasticity may exist.

4.1.1 The Consequences of Heteroskedasticity

• It can lead to mistakes in inference.
• Does not affect consistency of the estimators.
• F-test becomes unreliable.
• Due to biased estimators of the standard errors, the t-test also becomes unreliable.
• Most likely result of heteroskedasticity is that the:
  • estimated standard errors will be underestimated.
  • t-statistics will be inflated.
• Ignoring heteroskedasticity leads to finding significant relationships that do not actually exist.
• It becomes more serious when developing an investment strategy using regression analysis.
• Unconditional heteroskedasticity ⇒ heteroskedasticity of the error variance is not correlated with the independent variables in the multiple regression.
  • Creates no major problems for statistical inference.
• Conditional heteroskedasticity ⇒ heteroskedasticity of the error variance is correlated with the independent variables.
  • It causes the most problems.
  • Can be tested & corrected easily through many statistical software packages.

4.1.2 Testing for Heteroskedasticity

• Breusch-Pagan test is widely used (sketch below).
  • Regress the squared residuals from the estimated regression on the independent variables.
  • If the independent variables explain much of the variation in the squared residuals ⇒ conditional heteroskedasticity exists.
  • H0 = no conditional heteroskedasticity exists.
  • Ha = conditional heteroskedasticity exists.
• Breusch-Pagan test statistic = nR², a one-tailed χ² test,
  • where R² is from the regression of the squared residuals on X.
• Critical value ⇒ taken from the χ² distribution.
  • df = no. of independent variables.
• Reject H0 if test statistic > critical value.
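A minimal sketch of the Breusch-Pagan idea (toy data and the helper name r_squared are my assumptions): regress squared OLS residuals on X and compare nR² with a χ² critical value.

```python
import numpy as np
from scipy.stats import chi2

def r_squared(X, y):
    """R^2 from an OLS regression of y on X (X includes an intercept column)."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n) * (1 + np.abs(x))  # error variance depends on x

X = np.column_stack([np.ones(n), x])
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid_sq = (y - X @ beta) ** 2

bp_stat = n * r_squared(X, resid_sq)  # n * R^2 of squared residuals on X
crit = chi2.ppf(0.95, df=1)           # df = no. of independent variables
print(bp_stat, crit, bp_stat > crit)  # True => reject H0 (no cond. heterosk.)
```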


4.1.3 Correcting for Heteroskedasticity

Robust Standard Errors
• Corrects the standard errors of the estimated coefficients.
• Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors (sketch below).

Generalized Least Squares
• Modifies the original equation.
• Requires economic expertise to implement correctly on financial data.
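A minimal sketch (assuming the statsmodels package is available; the data are toy values) of requesting White-corrected standard errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=200) * (1 + np.abs(x))  # variance grows with |x|

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-consistent errors
print(fit.bse)                          # robust standard errors of b0, b1
```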

4.2 Serial Correlation

• Regression errors are correlated across observations.
• Usually arises in time-series regressions.

4.2.1 The Consequences of Serial Correlation

• Incorrect estimates of the regression coefficient standard errors.
• Parameter estimates become inconsistent & invalid when a lagged value of Y is included among the X's under serial correlation.
• Positive serial correlation ⇒ positive (negative) errors ↑ the chance of subsequent positive (negative) errors.
• Negative serial correlation ⇒ positive (negative) errors ↑ the chance of subsequent negative (positive) errors.
• It leads to wrong inferences.
• If positive serial correlation:
  • standard errors underestimated.
  • t-statistics & F-statistics inflated.
  • Type I error ↑.
• If negative serial correlation:
  • standard errors overestimated.
  • t-statistics & F-statistics understated.
  • Type II error ↑.

4.2.2 Testing for Serial Correlation

• Variety of tests; most common → Durbin-Watson (DW) test (sketch after this list).
• DW = [Σ from t=2 to T of (εt − εt−1)²] / [Σ from t=1 to T of εt²]
  • where εt = regression residual for period t.
• For large sample sizes, the Durbin-Watson statistic is approximately
  • DW ≈ 2(1 − r),
  • where r = sample correlation b/w regression residuals from t and t−1.
• Values of DW can range from 0 to 4.
  • DW = 2 ⇒ r = 0 ⇒ no serial correlation.
  • DW = 0 ⇒ r = 1 ⇒ perfectly positively serially correlated.
  • DW = 4 ⇒ r = −1 ⇒ perfectly negatively serially correlated.
• For positive serial correlation:
  • H0 ⇒ no positive serial correlation.
  • Ha ⇒ positive serial correlation.
  • DW < dl ⇒ reject H0.
  • DW > du ⇒ do not reject H0.
  • dl ≤ DW ≤ du ⇒ inconclusive.


• For negative serial correlation:
  • H0 ⇒ no negative serial correlation.
  • Ha ⇒ negative serial correlation.
  • DW > 4 − dl ⇒ reject H0.
  • DW < 4 − du ⇒ do not reject H0.
  • 4 − du ≤ DW ≤ 4 − dl ⇒ inconclusive.
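A minimal sketch (AR(1) toy residuals assumed) computing DW from its definition and checking the DW ≈ 2(1 − r) approximation:

```python
import numpy as np

rng = np.random.default_rng(3)
e = np.empty(300)
e[0] = rng.normal()
for t in range(1, 300):               # residuals with positive serial correlation
    e[t] = 0.6 * e[t - 1] + rng.normal()

dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # sum of (e_t - e_{t-1})^2 over sum of e_t^2
r = np.corrcoef(e[1:], e[:-1])[0, 1]           # lag-1 sample correlation
print(dw, 2 * (1 - r))  # both well below 2 => positive serial correlation
```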

4.2.3 Correcting for Serial Correlation

Adjust the Coefficient Standard Errors
• The recommended method.
• Hansen's method ⇒ the most prevalent one.

Modify the Regression Equation
• Extreme care is required.
• May lead to inconsistent parameter estimates.

4.3 Multicollinearity

• Occurs when two or more independent variables (X) are highly correlated with each other.
• Regression can be estimated, but results become problematic.
• A serious practical concern due to the approximate linear relations commonly found among financial variables.

4.3.1 The Consequences of Multicollinearity

• Difficulty in detecting significant relationships.
• Estimates become extremely imprecise & unreliable, though consistency is unaffected.
• F-statistic is unaffected.
• Standard errors of the regression coefficients can ↑, causing:
  • insignificant t-tests,
  • wide confidence intervals,
  • Type II error ↑.

4.3.2 Detecting Multicollinearity

• Multicollinearity is a matter of degree rather than presence/absence.
• ↑ Pairwise correlation does not necessarily indicate the presence of multicollinearity.
• ↓ Pairwise correlation does not necessarily indicate the absence of multicollinearity.
• With 2 independent variables ⇒ pairwise correlation is a useful indicator (sketch below).
• ↑ R², significant F-statistic, but insignificant t-statistics on the slope coefficients ⇒ the classic symptom of multicollinearity.
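A minimal sketch (toy data assumed) of inspecting pairwise correlations among the X's with numpy; per the caveat above, low pairwise correlations alone do not rule multicollinearity out:

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)

# Rows are variables; off-diagonal entries are pairwise correlations.
print(np.corrcoef(np.vstack([x1, x2, x3])))
```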

4.3.3 Correcting Multicollinearity

• Exclude one or more of the regression variables.
• In many cases, experimentation is needed to determine the variable(s) causing multicollinearity.


5. MODEL SPECIFICATION AND ERRORS IN SPECIFICATION

• Model specification ⇒ the set of variables included in the regression.
• Incorrect specification leads to biased & inconsistent parameter estimates.

5.1 Principles of Model Specification

• Model should be grounded in economic reasoning.
• Functional form of the variables should be compatible with the nature of the variables.
• Parsimonious ⇒ each included variable should play an essential role.
• Model should be examined for violations of the regression assumptions.
• Model should be tested for validity & usefulness on out-of-sample data.

5.2 Misspecified Functional Form

• One or more variables are omitted. If an omitted variable is correlated with the remaining variables, the error term will also be correlated with them, and:
  • regression coefficient estimates can be biased & inconsistent.
  • estimated standard errors of the coefficients will be inconsistent.
• One or more variables may require transformation.
• Pooling of data from different samples that should not be pooled:
  • can lead to spurious results.

5.3 Time-Series Misspecification (Independent Variables Correlated with Errors)

• Including lagged dependent variables as independent variables when the errors are serially correlated.
• Including a function of the dependent variable as an independent variable.
• Independent variables measured with error.

5.4 Other Types of Time-Series Misspecification

• Nonstationarity: a variable's properties, e.g. its mean, are not constant through time.
• In practice, nonstationarity is a serious problem.


6. MODELS WITH QUALITATIVE DEPENDENT VARIABLES

• Qualitative dependent variables ⇒ dummy variables used as the dependent variable instead of as independent variables.
• Probit model ⇒ based on the normal distribution; estimates the probability:
  • of a discrete outcome, given the values of the independent variables used to explain that outcome.
  • that Y = 1, implying a condition is met.
• Logit model:
  • Identical to the probit model, except that it is based on the logistic distribution.
• Both logit and probit models must be estimated using maximum likelihood methods (sketch below).
• Discriminant analysis ⇒ can be used to create an overall score that is used for classification.
• Qualitative dependent variable models can be used for portfolio management and business management.
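A minimal sketch (toy data; scipy's general-purpose optimizer rather than a dedicated logit routine) of estimating a one-variable logit model by maximum likelihood, where P(Y = 1) = 1 / (1 + e^−(b0 + b1·X)):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))  # true b0 = 0.5, b1 = 1.5
y = rng.uniform(size=500) < p_true           # simulated binary outcomes

def neg_log_likelihood(b):
    # Negative log-likelihood of the logit model at parameters b = [b0, b1].
    p = 1 / (1 + np.exp(-(b[0] + b[1] * x)))
    return -np.sum(y * np.log(p) + (~y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)  # maximum likelihood estimates, close to [0.5, 1.5]
```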

Copyright © FinQuiz.com. All rights reserved.
