Journal of Econometrics 35 (1987) 253-266. North-Holland

MEAN AND VARIANCE OF R² IN SMALL AND MODERATE SAMPLES*

J.S. CRAMER

University of Amsterdam, 1011 NH Amsterdam, The Netherlands

Received March 1986, final version received September 1986

We derive and use easily computable expressions for the mean and variance of R² in the standard linear regression model with fixed regressors. In respect of its probability limit, R² is seriously biased upward in small samples; the 'adjusted' R̄² does much better. But at sample sizes where these distinctions matter, both measures are thoroughly unreliable because of their large dispersion. R² should not be quoted for samples of less than fifty observations.

*I have benefited from the corrections and improvements suggested by several referees, Norman Draper, Joyce Mejering, and V. Srivastava. I owe particular debts to Roald Ramer and San de Locus for their help with the algebra and bibliography of section 4.

0304-4076/87/$3.50 © 1987, Elsevier Science Publishers B.V. (North-Holland)

1. Introduction

Ordinary least squares (OLS) estimation of linear regression equations is still an accepted tool of analysis among economists. The reported results invariably include R² or the 'adjusted' R̄². These measures of goodness of fit have a fatal attraction. Although it is generally conceded among insiders that they do not mean a thing, high values are still a source of pride and satisfaction to their authors, however hard they may try to conceal these feelings.

To put these sample statistics in proper perspective we shall derive their means and variances for various sample sizes under the standard assumptions of econometric theory. This means that the regressor variables are regarded as given, non-random constants. In this respect the model differs from the classical treatment of correlation in the setting of a multivariate Normal distribution, and the results differ too. The mean of R² converges to its probability limit from above, and in this sense it has an upward bias which can be substantial in small samples. In this respect R̄² is superior. The standard errors show, however, that for sample sizes of up to 40 or 50 either measure is a very unreliable statistic.

2. The moments of R² in the standard model

We consider the standard linear regression model with Normal disturbances as given in any textbook of econometrics, but we employ a slightly non-standard presentation and notation. We write

    y = cγ + Xβ + ε,  (1)

with y an (m×1) vector of observed values of the dependent variable and ε a vector of m independent N(0, σ²) disturbances. On the right, c is a unit vector, γ the intercept parameter, and X a matrix of (k−1) regressor variables, which have all been measured as deviations from their sample means. This last property of X simplifies the notation in the sequel, as does the use of m instead of n for the actual sample size. Note that we have merely reparametrized the systematic, non-random part of the right-hand side, without touching the definition of y and ε; we have not taken deviations from the mean for the dependent variable, and the elements of ε are still stochastically independent.

With ordinary least squares, the estimate of γ is

    ȳ = m⁻¹c′y,  (2)

which is the sample mean of y, while the estimate of β is

    b = (X′X)⁻¹X′y.  (3)

Upon defining the residual vector e as

    e = y − cȳ − Xb,  (4)

we have the identity

    y′y = mȳ² + b′X′Xb + e′e,  or  (y′y − mȳ²) = b′X′Xb + e′e.  (5)

This is the familiar decomposition of the sum of squares of y, on the left, into a systematic and a residual component.
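The decomposition (5) is easy to verify numerically. The following sketch is ours, not part of the original article; it assumes NumPy, and the data and all names are illustrative. Because X is centred and e is orthogonal to both c and X under OLS, the identity holds exactly up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 25, 3                       # sample size; k = intercept plus (k-1) slopes
X = rng.normal(size=(m, k - 1))
X = X - X.mean(axis=0)             # regressors as deviations from sample means, as in the text
beta = np.array([1.0, -0.5])
y = 2.0 + X @ beta + rng.normal(size=m)   # eq. (1) with gamma = 2, sigma = 1

ybar = y.mean()                            # eq. (2): OLS estimate of the intercept gamma
b = np.linalg.solve(X.T @ X, X.T @ y)      # eq. (3): OLS slope estimates
e = y - ybar - X @ b                       # eq. (4): residual vector

# eq. (5): y'y = m*ybar^2 + b'X'Xb + e'e
lhs = y @ y
rhs = m * ybar**2 + b @ (X.T @ X) @ b + e @ e
print(lhs, rhs)                            # the two sides agree to rounding error
```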
Their relative size determines the familiar measure of goodness of fit

    R² = 1 − e′e/(y′y − mȳ²) = b′X′Xb/(y′y − mȳ²).  (6)

We derive the density function of R² and then its moments for the general case with β ≠ 0. First consider the transformation

    G = R²/(1 − R²) = (b′X′Xb/σ²)/(e′e/σ²).  (7)

G (which is halfway towards F) is the ratio of two independent chi-square variates. The numerator has a non-central chi-square distribution,

    b′X′Xb/σ² ∼ χ²(k−1; λ),  (8)

with (k−1) degrees of freedom and non-centrality parameter

    λ = β′X′Xβ/σ².  (9)

For the denominator we have of course

    e′e/σ² ∼ χ²(m−k).  (10)

The density function of G can be found, for example, in Johnson and Kotz (1970, II, p. 191).¹ Upon introducing the transformation (7) and relabelling the parameters we obtain the density of R², with argument t, from the transformation theorem. This yields

    f(t) = Σ_{j=0}^∞ w(j) · t^(a+j−1) (1−t)^(v−a−1) / B(a+j, v−a),  (11a)

with

    w(j) = e^(−½λ) (½λ)^j / j!,  (11b)

    a = ½(k−1),  (11c)

    v = ½(m−1).  (11d)

Making use of the properties of the Beta function [see Abramowitz and Stegun (1964, pp. 256-258)] we obtain the moments of R² as

    E(R²) = Σ_{j=0}^∞ w(j) (a+j)/(v+j),  (12a)

    E(R⁴) = Σ_{j=0}^∞ w(j) (a+j)(a+j+1)/[(v+j)(v+j+1)],  (12b)

and so forth. Their dependence on the three parameters λ, k and m is clear. We shall shortly see that they are quite easy to compute.

¹A misprint in the first line of (5) gives ν where y is intended.

3. λ and the probability limit of R²

Eq. (9) defines the parameter λ as the ratio of the systematic variation β′X′Xβ to the disturbance variance σ². These magnitudes differ by a factor m, as can be seen by taking expectations on both sides of eq. (5),

    y′y − mȳ² = b′X′Xb + e′e.

Neglecting the loss of degrees of freedom among the residuals, we find the expected sum of squares of y as

    SSY = β′X′Xβ + mσ².  (13)

By analogy to the passage from (5) to (6) this naturally suggests

    φ = β′X′Xβ/(β′X′Xβ + mσ²)  (14)

for a measure of the quality of the fit of the observations, itself free from sampling variation. This magnitude is commensurate with R², and it is related to λ of (9) by

    λ = mφ/(1 − φ).  (15)

This parametrization is standard in earlier analyses of R² with fixed, non-random X, as opposed to the case of a joint multivariate Normal distribution of the elements of y and the regressor variables of X. Barten (1962) defined φ without further ado as the 'parent multiple correlation coefficient' that is estimated by R², and Schönfeld (1969, p. 71) equally relies entirely on intuitive appeal when he labels φ the 'theoretical measure of fit'. Press and Zellner (1978) take the parameter from Barten as they lay the basis for a Bayesian analysis. We shall follow the example of Koerts and Abrahamse (1970) and derive φ as the probability limit of R².²

To do so we introduce sample size n as a variable which has the value m for the sample actually observed. We also rewrite (6) in self-evident notation as

    R²_n = 1 − e′_n e_n/(y′_n y_n − n ȳ_n²).  (16)

In passing to the limit for n → ∞, the main difficulty is the behaviour of X, since this consists of non-random constants. We resort to the device of Hotelling (1940) to treat the regressors as 'constant in repeated samples'. This means that the virtual sample size n is given by

    n = pm,  (17)

with integer p, and that the matrix X_n consists of p replications of X = X_m, stacked on top of one another. We vary n by varying p, and thus obtain, just like Theil (1971, p. 363),

    plim_{n→∞} n⁻¹X′_n X_n = m⁻¹X′X.  (18)

With X_n behaving in this fashion and i.i.d. disturbances (as assumed) the OLS estimate is consistent,

    plim b_n = β,  (19)

and

    plim n⁻¹e′_n e_n = σ².  (20)

Upon substitution of these three probability limits into (16) we obtain, by (14),

    plim R²_n = β′X′Xβ/(β′X′Xβ + mσ²) = φ,  (21)

which is the desired result.

²Some of these authors pursue the same questions as we do. Barten derives an approximate expression for the bias of R² relative to φ, and suggests corrections, but he does not examine dispersion. Koerts and Abrahamse establish the distribution of R² for given σ, β and X, and show that this is very sensitive to changes in X. The distribution is determined numerically, and it must be computed anew for each new matrix X.

This provides a direct link of the parameter φ with R² in the model under consideration. The identification with a probability limit is particularly appropriate as we shall examine the behaviour of R² at different sample sizes.
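The limit (21) lends itself to a small simulation. The sketch below is ours, not the paper's; it assumes NumPy, and all names and parameter values are illustrative. It stacks p replications of a fixed X as in (17), draws fresh disturbances, and shows R²_n settling on the φ of (14).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20
X = rng.normal(size=(m, 2))
X = X - X.mean(axis=0)             # fixed, centred regressor block
beta = np.array([1.0, 0.5])
sigma = 1.0

ssb = beta @ (X.T @ X) @ beta
phi = ssb / (ssb + m * sigma**2)   # eq. (14): the probability limit of R^2

def r_squared(Xn, yn):
    """Plain OLS R^2 as defined in eq. (6)."""
    Xc = Xn - Xn.mean(axis=0)
    b = np.linalg.lstsq(Xc, yn - yn.mean(), rcond=None)[0]
    e = yn - yn.mean() - Xc @ b
    return 1 - (e @ e) / (yn @ yn - len(yn) * yn.mean()**2)

for p in (1, 10, 100, 1000):       # virtual sample size n = p*m, eq. (17)
    Xn = np.tile(X, (p, 1))        # p replications of X stacked on top of one another
    # intercept omitted for brevity; R^2 is unaffected since means are removed
    yn = Xn @ beta + rng.normal(scale=sigma, size=p * m)
    print(p, r_squared(Xn, yn), phi)   # R^2_n drifts toward phi as p grows
```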
4. The mean and standard deviation of R²

We return to eq. (12) for the first two moments of R². While (12a) may be developed a little further (as we shall presently see), we can evaluate both expressions to any desired degree of accuracy by summing the first J terms of the infinite series concerned. When we write the moments as given in (12) as

    E = Σ_{j=0}^∞ w(j) z(j),

the discrepancy involved in taking the first J terms only is

    δ = Σ_{j=J}^∞ w(j) z(j).  (22)

For all moments the z(j) are positive and tend from below to 1 as j → ∞, and w(j) is a Poisson density which sums to 1. Clearly, then,

    δ ≤ 1 − Σ_{j=0}^{J−1} w(j).  (23)

It requires no great programming skill to continue summing the w(j), z(j) for given parameter values until the right-hand side of (23) reduces δ to the desired level of accuracy. In this sense (12) provides easily computable expressions for the moments of R².

In the event we have set δ at 10⁻⁶ in computing the first two moments for various values of φ, m and k. The mean is overly accurate, and the standard deviation derived from the two moments is correct to three decimal places. The results are given in table 1, and illustrated in figs. 1 and 2.

[Table 1. Mean and standard deviation of R² for selected values of φ, m and k.]

Fig. 1 shows that E(R²) converges rather quickly to φ from above. Some simulations, not further reported here, suggest that this also holds for the median and the mode. R² thus has a definite upward bias which is, however, rapidly reduced as the sample size increases. Very roughly, the bias is about 0.03 or less with twenty observations when we have one regressor (k = 2), or with thirty to forty observations with two regressors. But this is a rough indication only, as the exact values vary with φ.

[Fig. 1a. Expected value of R² as a function of m for k = 2 and selected values of φ.]

[Fig. 1b. Expected value of R² as a function of m for k = 3 and selected values of φ.]

The reason for this brief dismissal of the bias is that it is completely swamped by the dispersion of R²: whenever the bias is at all noticeable, the standard error of R² is several times as large. Fig. 2 shows how the standard error varies with φ and m and, to a much lesser extent, with k; the effect of φ is particularly strong. For a standard error of R² of 0.03 or less we must have at least twenty observations if φ = 0.9. Such high values of the true correlation coefficient are probably exceptional (as opposed to sample values of 0.9); but with φ at 0.667, which is still quite respectable, nearly two hundred observations are needed to reduce the standard error to 0.03.
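The truncation rule just described is straightforward to program. The following sketch is ours (the paper reports the computed results but prints no code); the function name and the default δ are illustrative. It sums the Poisson-weighted series (12) until the bound (23) falls below δ.

```python
import math

def moments_R2(phi, m, k, delta=1e-6):
    """Mean and s.d. of R^2 by the truncated series of eqs. (12) and (22)-(23)."""
    lam = m * phi / (1 - phi)             # eq. (15): lambda = m*phi/(1 - phi)
    a, v = 0.5 * (k - 1), 0.5 * (m - 1)   # eqs. (11c)-(11d)
    mu = 0.5 * lam                        # Poisson parameter of the weights w(j)
    w = math.exp(-mu)                     # w(0); for very large mu this underflows,
    acc_w, m1, m2, j = 0.0, 0.0, 0.0, 0   # in which case one should work in logs
    while 1.0 - acc_w > delta:            # right-hand side of the bound (23)
        m1 += w * (a + j) / (v + j)                                # eq. (12a)
        m2 += w * (a + j) * (a + j + 1) / ((v + j) * (v + j + 1))  # eq. (12b)
        acc_w += w
        j += 1
        w *= mu / j                       # next Poisson weight w(j)
    return m1, math.sqrt(m2 - m1 * m1)

for m in (20, 50, 200):
    mean, sd = moments_R2(phi=0.667, m=m, k=2)
    print(m, round(mean, 3), round(sd, 3))  # at phi = 0.667 the s.d. only nears 0.03 by m = 200
```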
It is this dependence of the dispersion of the sample R² on the unknown φ which renders any judgment of accuracy so hazardous. The relationship is further illustrated in table 2, which shows how many observations are needed at various φ to reduce the standard error of R² to certain given levels.

[Fig. 2a. Standard deviation of R² as a function of m for k = 2 and selected values of φ.]

[Fig. 2b. Standard deviation of R² as a function of m for k = 3 and selected values of φ.]

[Table 2. Minimum sample size which reduces the standard error of R² below a certain level α, tabulated for k = 2 and k = 3 and for φ from 0.30 to 0.95. For the highest values of φ, all sample sizes ≥ k+1 reduce the standard error below α.]

To sum up, R² has an upward bias which can be substantial in small samples, but it is anyhow very unreliable, even at moderate sample sizes, because of its dispersion. With less than fifty observations or so there is little point in quoting R² at all, and once we are beyond such numbers the bias has virtually disappeared. The bias issue is a red herring.

5. The adjusted multiple correlation coefficient R̄²

The intuitive explanation of the upward bias of R² is that OLS treats it as the sample maximand, and the reason why this bias occurs in small samples is that R² does not allow for the loss of degrees of freedom through estimation. This argument justifies the prevalent custom of 'adjusting' R² as in

    R̄² = 1 − [e′e/(m−k)] / [(y′y − mȳ²)/(m−1)],  (24)

or

    R̄² = (1+h)R² − h,  (25a)

with

    h = (k−1)/(m−k).  (25b)

As the first expression in (24) shows, the sums of squares in the first definition of R² in (6) are 'corrected for degrees of freedom'. This adjustment is common usage among economists, probably because of their unique habit of submitting quite small samples to regression analysis. The adjustment is recommended in most econometric textbooks, all the way back to Ezekiel (1930a), though generally without much theoretical justification and without a source reference. The issue of correcting R² in some way dates from the 1920's, for Ezekiel (1930b) can quote three slightly different definitions of h of (25a) from that decade. The surviving definition (25b) is due to Fisher (1924), who justifies it with the standard argument about sums of squares around the mean that is implicit in the first expression of (24).
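In code the adjustment (25) is a one-line affair. A minimal sketch (ours; the function name and the sample values are illustrative):

```python
def adjusted_R2(r2, m, k):
    """Eq. (25a): R_bar^2 = (1 + h)*R^2 - h, with h = (k - 1)/(m - k) of eq. (25b)."""
    h = (k - 1) / (m - k)
    return (1 + h) * r2 - h

# Algebraically equivalent to the degrees-of-freedom form of eq. (24):
#   R_bar^2 = 1 - (e'e/(m - k)) / ((y'y - m*ybar^2)/(m - 1))
print(adjusted_R2(0.40, m=20, k=3))   # a sample R^2 of 0.40 shrinks to about 0.33
```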
More precise arguments in support of the adjustment (25) can be advanced. The adjusted R̄² does not suffer from the defect of R² that it automatically increases with the addition of new regressors; we shall show that E(R̄²) is indeed independent of the number of regressors. To begin with we must develop E(R²) of (12a) a little further, rewriting it as

    E(R²) = 1 − (v−a) e^(−½λ) Σ_{j=0}^∞ (½λ)^j / [j!(v+j)].  (26)

Upon consulting Slater (1964, p. 504, 13.1.2) we find that the summation is a special form of the Kummer function M(a, b, u), namely a⁻¹M(a, a+1, u). We introduce the notation

    g(u, a) = e^(−u) a⁻¹ M(a, a+1, u) = e^(−u) Σ_{j=0}^∞ u^j / [j!(a+j)],  (27)

and observe from Slater (1964, p. 505, 13.2.1) that

    g(u, a) = ∫₀¹ e^(−u(1−t)) t^(a−1) dt.  (28)

Integrating the right-hand side by parts we find

    u·g(u, a+1) = 1 − a·g(u, a).  (29)

We now make the appropriate substitutions of these results in (26) and obtain

    E(R²) = 1 − (v−a) g(½λ, v),

or, by (11c) and (11d),

    E(R²) = 1 − ½(m−k) g(½λ, ½(m−1)).  (30)

By (24), then,

    E(R̄²) = 1 − ½(m−1) g(½λ, ½(m−1)),  (31)

and this expression depends only on m and on φ (via λ) but not on k: E(R̄²) is independent of the number of regressors.

We finally note, with more relevance to our original purpose, that the adjustment very largely removes the upward bias of R². By (25) we have

    E(R̄²) = (1+h)E(R²) − h,  (32)

and if this operation is applied to the entries of table 1 it will be seen that the bias virtually disappears; for low values of φ, a slight downward bias occurs instead. But the dispersion remains, and is even increased, since

    s.d.(R̄²) = (1+h) s.d.(R²),  (33)

while h is positive and sizeable in small samples. In smallish samples R̄², though unbiased, is even more unreliable than R².
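Eqs. (30)-(32) admit a quick numerical check. The sketch below is ours, not the paper's, and the function names are illustrative: it evaluates g(u, a) of (27) by its series and confirms that (30) varies with k while (31) does not, with (32) mapping the one onto the other.

```python
import math

def g(u, a, terms=2000):
    """g(u, a) = exp(-u) * sum_j u^j / (j! * (a + j)), eq. (27); converges fast for moderate u."""
    w, s = math.exp(-u), 0.0
    for j in range(terms):
        s += w / (a + j)
        w *= u / (j + 1)        # next term of the exponential series
    return s

def mean_R2(lam, m, k):
    """Eq. (30): E(R^2) = 1 - (m - k)/2 * g(lam/2, (m - 1)/2)."""
    return 1 - 0.5 * (m - k) * g(0.5 * lam, 0.5 * (m - 1))

def mean_R2_adj(lam, m):
    """Eq. (31): E(R_bar^2) = 1 - (m - 1)/2 * g(lam/2, (m - 1)/2); k has dropped out."""
    return 1 - 0.5 * (m - 1) * g(0.5 * lam, 0.5 * (m - 1))

phi, m = 0.5, 30
lam = m * phi / (1 - phi)       # eq. (15)
for k in (2, 3, 5):
    h = (k - 1) / (m - k)       # eq. (25b)
    # eq. (32): the adjustment applied to E(R^2) reproduces eq. (31) for every k
    print(k, mean_R2(lam, m, k), (1 + h) * mean_R2(lam, m, k) - h, mean_R2_adj(lam, m))
```

The last two columns coincide for all k, and lie close to φ itself, illustrating how largely the adjustment removes the bias.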
References

Barten, A.P., 1962, Note on the unbiased estimation of the squared multiple correlation coefficient, Statistica Neerlandica 16, 151-163.
Davis, P.J., 1964, Gamma functions and related functions, in: M. Abramowitz and I.A. Stegun, eds., Handbook of mathematical functions (Dover, New York) 253-293.
Ezekiel, M., 1930a, Methods of correlation analysis (Wiley, New York).
Ezekiel, M., 1930b, The sampling variability of linear and curvilinear regressions, Annals of Mathematical Statistics 1, 275-300.
Fisher, R.A., 1924, The influence of rainfall on the yield of wheat at Rothamsted, Philosophical Transactions of the Royal Society of London B 213, 89-142.
Hotelling, H., 1940, The selection of variates for use in prediction, Annals of Mathematical Statistics 11, 271-283.
Johnson, N.L. and S. Kotz, 1970, Distributions in statistics: Continuous univariate distributions, Vol. 2 (Wiley, New York).
Koerts, J. and A.P.J. Abrahamse, 1970, The correlation coefficient in the general linear model, European Economic Review 1, 401-2.
Press, S.J. and A. Zellner, 1978, Posterior distribution for the multiple correlation coefficient with fixed regressors, Journal of Econometrics 8, 307-321.
Schönfeld, P., 1969, Methoden der Ökonometrie, Band I (Vahlen, Berlin).
Slater, L.J., 1964, Confluent hypergeometric functions, in: M. Abramowitz and I.A. Stegun, eds., Handbook of mathematical functions (Dover, New York) 503-535.
Theil, H., 1971, Principles of econometrics (Wiley, New York).