
A Comprehensive Approach to Misspecification Testing in Linear Regression Models
Siddamsetty Upendra1, Dr. R. Abbaiah2, Dr. P. Balasiddamuni3, Dr. K. Murali4

1. Research scholar, Department of Statistics, S.V. University, Tirupati, India
[email protected]
2, 3. Research supervisors, Department of Statistics, S.V. University, Tirupati, India
4. Academic Consultant, Department of Statistics, S.V. University, Tirupati, India
Abstract

In statistical research, diagnostic testing relies heavily on sample specification tests, which play a crucial role in ensuring the accuracy of statistical models. This study examines misspecification tests in the context of diagnostic testing and highlights their importance in statistical research and model building. Model misspecification is a common issue that can arise from various sources, including omitting truly specified variables, including irrelevant variables, or adding unspecified variables. The effects of excluding or including variables in a model, as outlined by Potluri Rao, highlight the trade-offs between bias, variance, and mean squared error in classical linear regression models.

This study presents a linear regression model and proposes an augmented regression model to test
the null hypothesis of unconditional zero error in the predictor variable, X. The null hypothesis is tested
against an augmented regression model, where the OLS estimator is derived and compared under the
null and alternative hypotheses. Two test statistics are presented for this purpose, providing insights into
the efficiency of the estimator.

Keywords: Diagnostic testing, sample specification tests, misspecification tests, over-specification, under-specification, Ordinary Least Squares estimator, hypothesis, accuracy, reliability.

In the field of diagnostic testing, sample specification tests hold significant importance. Statistical research relies heavily on diagnostic testing and uses specification tests at various levels. Over the past four decades, a substantial literature has emerged on misspecification tests for statistical models. Specifying a good model is an art: over-specification yields unbiased estimates of the regression coefficients, but with large variance, while under-specification yields biased estimates of the regression coefficients and incorrectly estimated variances. Whether an irrelevant variable is added or a relevant variable is omitted, the result is specification bias: the fitted model no longer represents the true data-generating process. Given the importance of these aspects of misspecification in empirical research, the present study considers some key results of misspecification error tests.
The issue of model misspecification is common in statistical model development. The problem has four main sources: omitting a specified variable, using the wrong variable, adding an unspecified variable, and including an irrelevant variable.
Potluri Rao (1970) states the following effects of excluding or including a variable in a model (a small simulation illustrating these trade-offs follows the two lists):

Effects of Excluding a Specified Variable:

(i) Introduces bias into all least squares estimates;
(ii) Reduces the variance of all least squares estimates; and
(iii) May reduce the mean squared error of all least squares estimates.

Effects of Adding an Irrelevant Variable:

(i) Does not introduce bias into the least squares estimates;
(ii) Increases the variance of all least squares estimates; and
(iii) Increases the mean squared error of all least squares estimates.
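The following minimal Python sketch makes these trade-offs concrete. It is not part of Rao's exposition: the data-generating process, coefficient values, and variable names are illustrative assumptions. It compares the OLS estimate of one coefficient under the correct model, an under-specified model omitting a relevant regressor, and an over-specified model adding an irrelevant one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
beta1, beta2 = 1.0, 0.5              # true coefficients (illustrative assumption)

def ols_first_coef(X, y):
    """OLS coefficient on the first column of X via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)[0]

est = {"correct": [], "omit x2": [], "add w": []}
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)   # relevant and correlated with x1
    w = 0.6 * x1 + rng.normal(size=n)    # irrelevant (zero coefficient) but correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    est["correct"].append(ols_first_coef(np.column_stack([x1, x2]), y))
    est["omit x2"].append(ols_first_coef(x1[:, None], y))                    # under-specified
    est["add w"].append(ols_first_coef(np.column_stack([x1, x2, w]), y))     # over-specified

for name, draws in est.items():
    b = np.asarray(draws)
    print(f"{name:8s}  bias = {b.mean() - beta1:+.3f}   variance = {b.var():.4f}")
```

Across repetitions, the under-specified model shows a clear bias in the coefficient on x1 but a smaller sampling variance, while the over-specified model remains unbiased at the cost of a larger variance, matching points (i) and (ii) of each list.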
Assumptions of linear regression models:
Linear regression models make several key assumptions to ensure the validity of the
statistical inferences drawn from the model. Understanding and checking these
assumptions are crucial for accurate interpretation and reliable predictions. Here are the
fundamental assumptions of linear regression models:

1. Linearity: The relationship between the independent variables and the dependent variable is assumed to be linear. This implies that a unit change in an independent variable results in a constant change in the dependent variable.

2. Independence of Residuals: The residuals (the differences between the observed and predicted values) should be independent of each other. Independence is essential for unbiased estimates and valid hypothesis testing.

3. Homoscedasticity (Constant Variance): The variance of the residuals should be constant across all levels of the independent variables. Homoscedasticity ensures that the spread of the residuals remains consistent throughout the range of predictor variables.

4. Normality of Residuals: The residuals should ideally follow a normal distribution. However, the Central Limit Theorem often mitigates the impact of deviations from normality, especially for large sample sizes. Normality is particularly important for hypothesis testing and constructing confidence intervals.

5. No Perfect Multicollinearity: In multiple regression models, there should not be perfect linear relationships among the independent variables. High multicollinearity can make it challenging to estimate individual predictors' effects accurately.

6. No Autocorrelation: The residuals should not exhibit patterns or trends over time or across observations. Autocorrelation, i.e., correlation between residuals at different points, can lead to inefficient parameter estimates and unreliable standard errors.

7. Additivity: The effects of changes in the independent variables on the dependent variable should be additive, and the model's errors should not systematically increase or decrease across the range of predictor values.

8. Scale of Measurement: The variables should be measured at the interval or ratio level. This ensures meaningful interpretations of the coefficients and other model parameters.

Checking these assumptions involves diagnostic tests, graphical methods (such as residual plots), and statistical tests. It is important to note that while violations of these assumptions can affect the precision and reliability of the estimates, linear regression models can still provide valuable insights even in the presence of some departures from the assumptions, especially with larger sample sizes. However, researchers should exercise caution and consider alternative models or transformations if the assumptions are severely violated.
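As a minimal sketch of how such checks look in practice (the data here are simulated, and this particular battery of tests is one reasonable choice among many), the following Python code fits an OLS model with statsmodels and runs standard diagnostics for heteroscedasticity, autocorrelation, normality of residuals, and multicollinearity:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson, jarque_bera
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated data standing in for a real dataset
rng = np.random.default_rng(1)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))          # intercept + two regressors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

model = sm.OLS(y, X).fit()

# Homoscedasticity: Breusch-Pagan test (small p-value suggests heteroscedasticity)
bp_stat, bp_pvalue, _, _ = het_breuschpagan(model.resid, model.model.exog)
print(f"Breusch-Pagan p-value: {bp_pvalue:.3f}")

# Autocorrelation: Durbin-Watson statistic (values near 2 indicate no first-order autocorrelation)
print(f"Durbin-Watson: {durbin_watson(model.resid):.3f}")

# Normality of residuals: Jarque-Bera test (small p-value suggests non-normality)
jb_stat, jb_pvalue, _, _ = jarque_bera(model.resid)
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}")

# Multicollinearity: variance inflation factors for the non-constant columns
for i in range(1, X.shape[1]):
    print(f"VIF for regressor {i}: {variance_inflation_factor(X, i):.2f}")
```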

We consider a linear regression model

$$Y_{n \times 1} = X_{n \times k}\,\beta_{k \times 1} + \varepsilon_{n \times 1}$$

such that

$$E[\varepsilon] = 0, \qquad E[\varepsilon \varepsilon'] = \sigma^2 I_n, \qquad \rho(X) = k, \qquad \varepsilon \sim N(0, \sigma^2 I_n).$$

The OLS estimator of $\beta$ is given by

$$\hat{\beta} = (X'X)^{-1} X'Y$$

and

$$V(\hat{\beta}) = \sigma^2 (X'X)^{-1}.$$
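A direct numerical translation of these formulas is given below: a minimal numpy sketch with simulated data, in which the true $\beta$ and the error distribution are assumptions of the simulation, and $\sigma^2$ is estimated from the residuals as it would be in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])      # assumed for the simulation
Y = X @ beta_true + rng.normal(size=n)      # epsilon ~ N(0, I_n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y                # beta_hat = (X'X)^{-1} X'Y

resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)        # unbiased estimate of sigma^2
V_beta_hat = sigma2_hat * XtX_inv           # V(beta_hat) = sigma^2 (X'X)^{-1}

print("beta_hat:", beta_hat.round(3))
print("std errors:", np.sqrt(np.diag(V_beta_hat)).round(3))
```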

State the null and alternative hypotheses as

Null hypothesis: $H_0: E[\varepsilon \mid X] = 0$

and

Alternative hypothesis: $H_1: E[\varepsilon \mid X] = Z \neq 0,$

where $Z$ is an unknown $(n \times 1)$ vector that is not orthogonal to $X$.
To test the null hypothesis of an unconditional zero error on $X$ in a regression model, one can consider an augmented regression model. Let it be

$$Y = X\beta + W\tau + \varepsilon,$$

where $W$ is an $(n \times q)$ matrix of rank $q$ and a function of the $X$ variables, and the rank of $(X, W)$ is $k + q$.

The OLS estimator of $\beta$, composed of the first $k$ elements of the augmented-model estimator, is given by

$$\tilde{\beta} = \text{first } k \text{ elements of } \left[ (X, W)'(X, W) \right]^{-1} (X, W)' Y,$$

with

$$V(\tilde{\beta}) = \sigma^2 (X' M_W X)^{-1}$$

and also

$$V(\tilde{\beta}) = \sigma^2 H (W' M_X W)^{-1} H' + \sigma^2 (X'X)^{-1}.$$

Here

$$M_W = I - W(W'W)^{-1} W', \qquad M_X = I - X(X'X)^{-1} X', \qquad H = (X'X)^{-1} X'W.$$

Clearly, $\hat{\beta}$ is the more efficient estimator under the null hypothesis.
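The quantities above translate directly into matrix code. The following minimal numpy sketch (simulated data; taking $W$ to be the element-wise squares of the $X$ columns is purely an illustrative choice of a function of $X$) computes $M_W$, $M_X$, and $H$, and checks numerically that the two expressions for $V(\tilde{\beta})$ agree:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, q = 100, 3, 3
X = rng.normal(size=(n, k))
W = X**2                      # an (n x q) function of the X variables (illustrative choice)
sigma2 = 1.0                  # treat sigma^2 as known for this numerical check

M_W = np.eye(n) - W @ np.linalg.inv(W.T @ W) @ W.T     # M_W = I - W(W'W)^{-1}W'
M_X = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T     # M_X = I - X(X'X)^{-1}X'
H = np.linalg.inv(X.T @ X) @ X.T @ W                   # H = (X'X)^{-1} X'W

# V(beta_tilde) written two equivalent ways
V1 = sigma2 * np.linalg.inv(X.T @ M_W @ X)
V2 = sigma2 * (H @ np.linalg.inv(W.T @ M_X @ W) @ H.T
               + np.linalg.inv(X.T @ X))

print("identity holds:", np.allclose(V1, V2))          # expected: True
```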

Under the alternative hypothesis, we have

$$E[\hat{\beta}] = \beta + (X'X)^{-1} X'Z,$$
$$E[\tilde{\beta}] = \beta + (X'X)^{-1} X'Z - H (W' M_X W)^{-1} W' M_X Z,$$

where $Z = E[\varepsilon \mid X]$.
Under the null hypothesis, we have

$$(\hat{\beta} - \tilde{\beta}) \sim N(0, \nu),$$

where

$$\nu = \sigma^2 \left[ (X' M_W X)^{-1} - (X'X)^{-1} \right] = \sigma^2 H (W' M_X W)^{-1} H'.$$

Under $H_1$ we may obtain

$$E[\hat{\beta} - \tilde{\beta}] = H (W' M_X W)^{-1} W' M_X Z.$$
The null hypothesis is tested using one of the following two test statistics:

(1)
$$Q_1 = \frac{(\hat{\beta} - \tilde{\beta})' \left[ H (W' M_X W)^{-1} H' \right]^{-1} (\hat{\beta} - \tilde{\beta})}{\hat{\sigma}_1^2},$$

where $\hat{\sigma}_1^2 = e_1' e_1 / (n - k)$, with $e_1$ the OLS residuals from the original model; $Q_1$ asymptotically follows $\chi^2_k$.

(2)
$$Q_2 = \frac{(\hat{\beta} - \tilde{\beta})' \left[ H (W' M_X W)^{-1} H' \right]^{-1} (\hat{\beta} - \tilde{\beta})}{\hat{\sigma}_2^2},$$

where $\hat{\sigma}_2^2 = e_2' e_2 / (n - (k + q))$, with $e_2$ the OLS residuals from the augmented model; $Q_2$ asymptotically follows $\chi^2_{\min(k, q)}$.
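As a rough illustration of how $Q_1$ and $Q_2$ could be computed (simulated data with the same illustrative choice $W = X^2$ as above; a pseudo-inverse is used for the bracketed matrix in case it is singular, and the degrees of freedom follow the statements above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 200, 3
X = rng.normal(size=(n, k))
W = X**2                                                  # illustrative augmenting variables
q = W.shape[1]
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)   # H0 is true in this simulation

inv = np.linalg.inv
beta_hat = inv(X.T @ X) @ X.T @ y                         # original model
XW = np.hstack([X, W])
beta_tilde = (inv(XW.T @ XW) @ XW.T @ y)[:k]              # first k elements, augmented model

M_X = np.eye(n) - X @ inv(X.T @ X) @ X.T
H = inv(X.T @ X) @ X.T @ W
A = H @ inv(W.T @ M_X @ W) @ H.T                          # H (W'M_X W)^{-1} H'

e1 = y - X @ beta_hat                                     # residuals, original model
e2 = y - XW @ (inv(XW.T @ XW) @ XW.T @ y)                 # residuals, augmented model
s1 = e1 @ e1 / (n - k)
s2 = e2 @ e2 / (n - (k + q))

d = beta_hat - beta_tilde
Q1 = d @ np.linalg.pinv(A) @ d / s1
Q2 = d @ np.linalg.pinv(A) @ d / s2

print(f"Q1 = {Q1:.3f}, p = {stats.chi2.sf(Q1, df=k):.3f}")
print(f"Q2 = {Q2:.3f}, p = {stats.chi2.sf(Q2, df=min(k, q)):.3f}")
```

With $H_0$ true in the simulation, both p-values should typically be non-significant; rerunning with an error that depends on $X$ (so that $Z \neq 0$) should drive them toward zero.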
In practice, the statistical model may not be well specified. Over-specification yields unbiased estimates of the regression coefficients, but with larger variance; under-specification yields biased estimates of the regression coefficients and of the variance of these estimates. In empirical research, the problem of estimating a misspecified statistical model is frequently encountered. In this article, a test for misspecification of a linear regression model has been developed using different types of residuals.

ACKNOWLEDGMENTS
I express my sincere thanks and gratitude to my Research Supervisor, Dr. R. Abbaiah, of the Department of Statistics at S.V. University, Tirupati, for his invaluable guidance during my internship and the writing of my research paper.

I am deeply indebted to Dr. P. Balasiddamuni, Department of Statistics, S.V. University, Tirupati, for his unwavering interest and support at every stage of my research.

Special thanks to Dr. B. Sarojamma, Head of the Department of Statistics, S.V. University, Tirupati, for providing crucial technical suggestions that significantly contributed to the development of my research.

I extend my gratitude to Dr. K. Murali, Academic Consultant, Department of Statistics, S.V. University, Tirupati, for his motivation and cooperation throughout my research journey.

I am privileged to acknowledge the encouragement and support of my sister, Mrs. S. Maheswari, M.Sc., a research scholar in the Department of Mathematics at Y.V. University, and Mr. A. Sivakumar, M.Com., Lecturer in Commerce, Sri Sai Degree College, Rly. Kodur. I would also like to express my appreciation to all the authors whose papers are cited in this article, which served as valuable references and resources.

REFERENCES:
[1]. Banerjee, A.N. and Magnus, J.R. (2000), "On the Sensitivity of the Usual t- and F-tests to Covariance Misspecification", Journal of Econometrics, vol. 95, pp. 157-176.
[2]. Aitkin, M. (1974), "Simultaneous Inference and Choice of Variable Subsets in Multiple Regression", Technometrics, vol. 16, pp. 221-22.
[3]. Akaike, H. (1969), "Fitting Autoregressive Models for Prediction", Annals of the Institute of Statistical Mathematics, vol. 21, pp. 243-247.
[4]. Akaike, H. (1970), "Statistical Predictor Identification", Annals of the Institute of Statistical Mathematics, vol. 22, pp. 203-217.
[5]. Amemiya, T. (1980), "Selection of Regressors", International Economic Review, vol. 21, pp. 331-354.
[6]. Bozdogan, H. (1987), "Model Selection and Akaike's Information Criterion (AIC): The General Theory and Its Analytical Extensions", Psychometrika, vol. 52, pp. 345-370.
[7]. Prabhakar Naik, J., Pagadala, B. and Mummineni, R., "Statistical Modeling and Diagnostic Tests".
[8]. Attfield, C.L.F. (1983), "Consistent Estimation of Certain Parameters in the Unobservable Variable Model when there is Specification Error", The Review of Economics and Statistics, vol. 65, pp. 164-167.
[9]. Bierens, H.J. (1982), "Consistent Model Specification Tests", Journal of Econometrics, vol. 20, pp. 105-134.
[10]. Chesher, A.D. and Smith, R.J. (1997), "Likelihood Ratio Specification Tests", Econometrica, vol. 65, pp. 627-646.
[11]. Gujarati, D., Porter, D. and Gunasekar, S. (2011), "Basic Econometrics", 5th Edition, Tata McGraw Hill Education, India.
[12]. Hunter, D.R. and Li, R. (2005), "Variable Selection Using MM Algorithms", The Annals of Statistics, vol. 33, no. 4, pp. 1617-1642.
[13]. Davidson, R. and MacKinnon, J.G. (1981), "Several Tests for Model Specification in the Presence of Alternative Hypotheses", Econometrica, vol. 49, pp. 781-793.
[14]. Davidson, R. and MacKinnon, J.G. (1990), "Specification Tests Based on Artificial Regressions", Journal of the American Statistical Association, vol. 85, pp. 220-227.
[15]. Jarque, C.M. and Bera, A.K. (1982), "Model Specification Tests", Journal of Econometrics, vol. 20, pp. 59-82.
[16]. Johnston, J. (1984), "Econometric Methods", Third Edition, McGraw Hill, Singapore.
[17]. Judge, G.G. et al. (1980), "The Theory and Practice of Econometrics", John Wiley & Sons, New York.
[18]. Judge, G.G. et al. (1985), "The Theory and Practice of Econometrics", Second Edition, John Wiley & Sons, New York.
[19]. Pepe, M.S. and Janes, H. (2007), "Insights into Latent Class Analysis of Diagnostic Test Performance", Biostatistics, vol. 8, pp. 474-484.
[20]. Shi, P. and Tsai, C.L. (2002), "Regression Model Selection: A Residual Likelihood Approach", Journal of the Royal Statistical Society, Series B, vol. 64, pp. 237-252.
[21]. Davidson, R. and MacKinnon, J.G. (1993), "Estimation and Inference in Econometrics", Oxford University Press.
[22]. Greene, W.H. (2003), "Econometric Analysis", 5th Edition, Prentice Hall.
[23]. Hall, A. and Inoue, A. (2003), "The Large Sample Behavior of the Generalized Method of Moments Estimator in Misspecified Models", Journal of Econometrics, vol. 114, no. 2, pp. 361-394.
[24]. Harvey, A.C. (1989), "Forecasting, Structural Time Series Models and the Kalman Filter", Cambridge University Press.
[25]. Hausman, J.A. (1978), "Specification Tests in Econometrics", Econometrica, vol. 46, no. 6, pp. 1251-1271.
[26]. Hendry, D.F. (1995), "Dynamic Econometrics", Oxford University Press.
[27]. Judge, G.G. and Bock, M.E. (1978), "The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics", North-Holland.
[28]. Kelejian, H.H. and Prucha, I.R. (2007), "HAC Estimation in a Spatial Framework", Journal of Econometrics, vol. 140, no. 1, pp. 131-154.
[29]. Koenker, R. and Bassett, G. (1978), "Regression Quantiles", Econometrica, vol. 46, no. 1, pp. 33-50.
[30]. Phillips, P.C.B. and Hansen, B.E. (1990), "Statistical Inference in Instrumental Variables Regression with I(1) Processes", The Review of Economic Studies, vol. 57, no. 1, pp. 99-125.
[31]. Reimers, H.E. (1983), "Nonparametric Smoothing in Spline Models", Journal of the American Statistical Association, vol. 78, no. 383, pp. 24-37.
[32]. White, H. (1982), "Maximum Likelihood Estimation of Misspecified Models", Econometrica, vol. 50, no. 1, pp. 1-25.
[33]. Zellner, A. and Theil, H. (1962), "Three-Stage Least Squares: Simultaneous Estimation of Simultaneous Equations", Econometrica, vol. 30, no. 1, pp. 54-78.
