A Comprehensive Approach to Misspecification Testing in Linear Regression Models
Siddamsetty Upendra, Dr. R. Abbaiah, Dr. P. Balasiddamuni, Dr. K. Murali
In statistical research, diagnostic testing relies heavily on sample specification tests, which play a
crucial role in ensuring the accuracy of statistical models. This study examines misspecification tests in the
context of diagnostic testing and highlights their importance in statistical research and model building.
Model misspecification is a common issue that can arise from various sources, including omitting
relevant variables, including irrelevant variables, or adding unspecified variables. The effects of
excluding or including variables in a model, as outlined by Potluri Rao, highlight the trade-offs
between bias, variance, and mean squared error in classical linear regression models.
This study presents a linear regression model and proposes an augmented regression model to test
the null hypothesis of unconditional zero error in the predictor variable, X. The null hypothesis is tested
against an augmented regression model, where the OLS estimator is derived and compared under the
null and alternative hypotheses. Two test statistics are presented for this purpose, providing insights into
the efficiency of the estimator.
In the field of diagnostic testing, sample specification tests hold significant importance.
Statistical research relies heavily on diagnostic testing and uses specification tests at various
levels. In the past four decades, a substantial literature has emerged on misspecification
tests for statistical models. Building a good model is an art. Over-specification yields
unbiased estimates of the regression coefficients, but with larger variance. Under-specification,
on the other hand, yields biased estimates of the regression coefficients, although their
variance is typically smaller. When an irrelevant variable is added or a relevant variable is
omitted, the analyst faces specification error: the fitted model either contains a variable
that is not part of the true model or lacks one that is. Given the importance of these aspects of
misspecification in empirical research, the present study considers some key results on
misspecification error tests.
The issue of model misspecification is common in statistical model development. This
problem has four main sources: omitting a specified variable, using the wrong
variable, adding an unspecified variable, and using an irrelevant variable.
Potluri Rao (1970) states the following effects of excluding or including a variable in a
model:
Effects of Including an Irrelevant Variable:
(i) Does not introduce bias into the least squares estimates;
(ii) Increases the variance of all least squares estimates;
(iii) Increases the mean squared error of all least squares estimates.
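The trade-off between bias and variance when a regressor that does not belong to the true model is added can be illustrated with a small Monte Carlo experiment. The following is a minimal sketch under stated assumptions, not part of the original study: the true model, the sample size, the correlation between the regressors, and all variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 2000                      # illustrative sample size and replications

# Fixed design: x2 is irrelevant (true coefficient 0) but correlated with x1.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X_true = np.column_stack([np.ones(n), x1])        # correctly specified model
X_over = np.column_stack([np.ones(n), x1, x2])    # over-specified model

# Assumed true model: y = 1 + 2*x1 + error, one column of Y per replication.
Y = (1.0 + 2.0 * x1)[:, None] + rng.normal(size=(n, reps))

# OLS slope on x1 in each replication, for both specifications.
b_true = np.linalg.lstsq(X_true, Y, rcond=None)[0][1]
b_over = np.linalg.lstsq(X_over, Y, rcond=None)[0][1]

print("mean slope, correct model:       ", b_true.mean())
print("mean slope, over-specified model:", b_over.mean())
print("variance,   correct model:       ", b_true.var())
print("variance,   over-specified model:", b_over.var())
```

In this setup both specifications should give slope estimates centred near the true value of 2, while the over-specified model shows the larger sampling variance, and hence the larger mean squared error.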
Assumptions of linear regression models:
Linear regression models make several key assumptions to ensure the validity of the
statistical inferences drawn from the model. Understanding and checking these
assumptions are crucial for accurate interpretation and reliable predictions. Here are the
fundamental assumptions of linear regression models:
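$$Y = X\beta + W\tau + \varepsilon$$
where $W$ is an $(n \times q)$ matrix of rank $q$ whose columns are functions of the $X$ variables, and $\operatorname{rank}(X, W) = k + q$.
Let $\tilde{\beta}$ denote the OLS estimator of $\beta$ in this augmented model. Then
$$V(\tilde{\beta}) = \sigma^2 (X' M_W X)^{-1}$$
and also
$$V(\tilde{\beta}) = \sigma^2 H (W' M_X W)^{-1} H' + \sigma^2 (X'X)^{-1}$$
Here
$$M_W = I - W(W'W)^{-1}W'$$
$$M_X = I - X(X'X)^{-1}X'$$
and
$$H = (X'X)^{-1} X'W$$
Clearly $\hat{\beta}$, the OLS estimator from the regression of $Y$ on $X$ alone, with variance $\sigma^2 (X'X)^{-1}$, is the efficient estimator.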
Where
Z=E [ε X ]
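The two expressions for the variance of the augmented-model estimator are linked by the partitioned-inverse identity $(X' M_W X)^{-1} = (X'X)^{-1} + H (W' M_X W)^{-1} H'$ with $H = (X'X)^{-1} X'W$. The following sketch checks this identity numerically. The matrices are random draws chosen only for illustration (the identity is purely algebraic, so $W$ need not be a function of $X$ here), and the helper name `annihilator` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, q = 50, 3, 2                      # illustrative dimensions
X = rng.normal(size=(n, k))
W = rng.normal(size=(n, q))

def annihilator(A):
    """Residual-maker matrix M_A = I - A (A'A)^{-1} A'."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)

M_W = annihilator(W)
M_X = annihilator(X)
XtX_inv = np.linalg.inv(X.T @ X)
H = XtX_inv @ X.T @ W                   # H = (X'X)^{-1} X'W

lhs = np.linalg.inv(X.T @ M_W @ X)                          # (X' M_W X)^{-1}
rhs = XtX_inv + H @ np.linalg.inv(W.T @ M_X @ W) @ H.T      # (X'X)^{-1} + H (W' M_X W)^{-1} H'
identity_holds = np.allclose(lhs, rhs)

# The difference between the two variance matrices (up to sigma^2) should be
# positive semi-definite, which is why the restricted estimator is efficient.
gap = lhs - XtX_inv
gap_eigenvalues = np.linalg.eigvalsh((gap + gap.T) / 2)

print("identity holds:", identity_holds)
print("smallest eigenvalue of the variance gap:", gap_eigenvalues.min())
```

The non-negative eigenvalues of the gap reflect that the estimator from the model without $W$ has the smaller variance when the null hypothesis holds.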
Under the null hypothesis, we have
$$(\hat{\beta} - \tilde{\beta}) \sim N(0, \nu)$$
where $\nu = V(\tilde{\beta}) - V(\hat{\beta}) = \sigma^2 H (W' M_X W)^{-1} H'$.
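The null hypothesis is tested using one of the following two test statistics:
(1)
$$Q_1 = \frac{(\hat{\beta} - \tilde{\beta})' \left[ H (W' M_X W)^{-1} H' \right]^{-1} (\hat{\beta} - \tilde{\beta})}{\hat{\sigma}_1^2}$$
Here
$$\hat{\sigma}_1^2 = \frac{e_1' e_1}{n - k}$$
and $Q_1$ follows asymptotically $\chi^2_k$.
(2)
$$Q_2 = \frac{(\hat{\beta} - \tilde{\beta})' \left[ H (W' M_X W)^{-1} H' \right]^{-1} (\hat{\beta} - \tilde{\beta})}{\hat{\sigma}_2^2}$$
Here
$$\hat{\sigma}_2^2 = \frac{e_2' e_2}{n - (k + q)}$$
and $Q_2$ follows asymptotically $\chi^2_{\min(k, q)}$.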
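The two statistics can be computed directly from the two OLS fits. The sketch below is one possible implementation under assumptions that are not stated explicitly in the text: $e_1$ is taken to be the residual vector from the regression of $Y$ on $X$ and $e_2$ the residual vector from the augmented regression on $(X, W)$ (as the divisors $n - k$ and $n - (k + q)$ suggest), a pseudo-inverse is used so that the quadratic form remains defined when $q < k$, and the function names, the simulated data, and the choice $W = (x^2, x^3)$ are purely illustrative.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals for the regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def misspecification_stats(y, X, W):
    """Compute (Q1, Q2) from the null fit (y on X) and the augmented fit (y on [X, W])."""
    n, k = X.shape
    q = W.shape[1]
    beta_hat, e1 = ols(y, X)                          # null model
    coef_aug, e2 = ols(y, np.hstack([X, W]))          # augmented model
    beta_tilde = coef_aug[:k]
    H = np.linalg.solve(X.T @ X, X.T @ W)             # H = (X'X)^{-1} X'W
    A = W.T @ (W - X @ H)                             # W' M_X W
    V = H @ np.linalg.solve(A, H.T)                   # H (W' M_X W)^{-1} H'
    d = beta_hat - beta_tilde
    quad = d @ np.linalg.pinv(V, rcond=1e-10) @ d     # pseudo-inverse covers q < k
    sigma2_1 = e1 @ e1 / (n - k)
    sigma2_2 = e2 @ e2 / (n - (k + q))
    return quad / sigma2_1, quad / sigma2_2

# Illustrative data: the true relation is quadratic, but the null model is linear.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
W = np.column_stack([x**2, x**3])
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=n)

q1, q2 = misspecification_stats(y, X, W)
print("Q1 =", q1, " Q2 =", q2)
```

With $k = q = 2$ in this example, both statistics would be compared with a $\chi^2_2$ critical value (about 5.99 at the 5% level); large values point to misspecification of the linear null model.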
In practice, the statistical model may not be well specified. Over-specification yields
unbiased estimates of the regression coefficients, but with larger variance; under-specification
yields biased estimates of the regression coefficients, typically with smaller variance.
In empirical research, the problem of estimating a misspecified statistical model is
frequently encountered. In this article, a test for misspecification of a linear regression model
is developed using different types of residuals.
ACKNOWLEDGMENTS
I express my sincere thanks and gratitude to my Research Supervisor, Dr. R.
Abbaiah, from the Department of Statistics at S.V. University, Tirupati, for his
invaluable guidance during my internship and the writing of my research paper.