Journal of Econometrics 35 (1987) 253-266. North-Holland

MEAN AND VARIANCE OF R² IN SMALL AND MODERATE SAMPLES*

J.S. CRAMER

University of Amsterdam, 1011 NH Amsterdam, The Netherlands

Received March 1986, final version received September 1986

We derive and use easily computable expressions for the mean and variance of R² in the standard linear regression model with fixed regressors. In respect of its probability limit, R² is seriously biased upward in small samples; the 'adjusted' R̄² does much better. But at sample sizes where these distinctions matter, both measures are thoroughly unreliable because of their large dispersion. R² should not be quoted for samples of less than fifty observations.

*I have benefited from the corrections and improvements suggested by several referees, Norman Draper, Joyce Mejering, and V. Srivastava. I owe particular debts to Roald Ramer and San de Locus for their help with the algebra and bibliography of section 4.

0304-4076/87/$3.50 © 1987, Elsevier Science Publishers B.V. (North-Holland)

1. Introduction

Ordinary least squares (OLS) estimation of linear regression equations is still an accepted tool of analysis among economists. The reported results invariably include R² or the 'adjusted' R̄². These measures of goodness of fit have a fatal attraction. Although it is generally conceded among insiders that they do not mean a thing, high values are still a source of pride and satisfaction to their authors, however hard they may try to conceal these feelings.

To put these sample statistics in proper perspective we shall derive their means and variances for various sample sizes under the standard assumptions of econometric theory. This means that the regressor variables are regarded as given, non-random constants. In this respect the model differs from the classical treatment of correlation in the setting of a multivariate Normal distribution, and the results differ too. The mean of R² converges to its probability limit from above, and in this sense it has an upward bias which can be substantial in small samples. In this respect R̄² is superior. The standard errors show, however, that for sample sizes of up to 40 or 50 either measure is a very unreliable statistic.

2. The moments of R² in the standard model

We consider the standard linear regression model with Normal disturbances as given in any textbook of econometrics, but we employ a slightly non-standard presentation and notation. We write

    y = cγ + Xβ + ε,  (1)

with y an (m×1) vector of observed values of the dependent variable and ε a vector of m independent N(0, σ²) disturbances. On the right, c is a unit vector, γ the intercept parameter, and X a matrix of (k−1) regressor variables, which have all been measured as deviations from their sample means. This last property of X simplifies the notation in the sequel, as does the use of m instead of n for the actual sample size. Note that we have merely reparametrized the systematic, non-random part of the right-hand side, without touching the definition of y and ε; we have not taken deviations from the mean for the dependent variable, and the elements of ε are still stochastically independent.

With ordinary least squares, the estimate of γ is

    ȳ = m⁻¹c′y,  (2)

which is the sample mean of y, while the estimate of β is

    b = (X′X)⁻¹X′y.  (3)

Upon defining the residual vector e as

    e = y − cȳ − Xb,  (4)

we have the identity

    y′y = mȳ² + b′X′Xb + e′e,  or  (y′y − mȳ²) = b′X′Xb + e′e.  (5)

This is the familiar decomposition of the sum of squares of y, on the left, into a systematic and a residual component.
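The decomposition (5) is easy to verify numerically. The following sketch is ours, not part of the original article; it assumes NumPy, and the data and all names are illustrative. Because X is centred and e is orthogonal to both c and X under OLS, the identity holds exactly up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 25, 3                       # sample size; k = intercept plus (k-1) slopes
X = rng.normal(size=(m, k - 1))
X = X - X.mean(axis=0)             # regressors as deviations from sample means, as in the text
beta = np.array([1.0, -0.5])
y = 2.0 + X @ beta + rng.normal(size=m)   # eq. (1) with gamma = 2, sigma = 1

ybar = y.mean()                            # eq. (2): OLS estimate of the intercept gamma
b = np.linalg.solve(X.T @ X, X.T @ y)      # eq. (3): OLS slope estimates
e = y - ybar - X @ b                       # eq. (4): residual vector

# eq. (5): y'y = m*ybar^2 + b'X'Xb + e'e
lhs = y @ y
rhs = m * ybar**2 + b @ (X.T @ X) @ b + e @ e
print(lhs, rhs)                            # the two sides agree to rounding error
```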
Their relative size determines the familiar measure of goodness of fit

    R² = 1 − e′e/(y′y − mȳ²) = b′X′Xb/(y′y − mȳ²).  (6)

We derive the density function of R² and then its moments for the general case with β ≠ 0. First consider the transformation

    G = R²/(1 − R²) = (b′X′Xb/σ²)/(e′e/σ²).  (7)

G (which is halfway towards F) is the ratio of two independent chi-square variates. The numerator has a non-central chi-square distribution,

    b′X′Xb/σ² ∼ χ²(k−1; λ),  (8)

with (k−1) degrees of freedom and non-centrality parameter

    λ = β′X′Xβ/σ².  (9)

For the denominator we have of course

    e′e/σ² ∼ χ²(m−k).  (10)

The density function of G can be found, for example, in Johnson and Kotz (1970, II, p. 191).¹ Upon introducing the transformation (7) and relabelling the parameters we obtain the density of R², with argument t, from the transformation theorem. This yields

    f(t) = Σ_{j=0}^∞ w(j) · t^(a+j−1) (1−t)^(v−a−1) / B(a+j, v−a),  (11a)

with

    w(j) = e^(−½λ) (½λ)^j / j!,  (11b)

    a = ½(k−1),  (11c)

    v = ½(m−1).  (11d)

Making use of the properties of the Beta function [see Abramowitz and Stegun (1964, pp. 256-258)] we obtain the moments of R² as

    E(R²) = Σ_{j=0}^∞ w(j) (a+j)/(v+j),  (12a)

    E(R⁴) = Σ_{j=0}^∞ w(j) (a+j)(a+j+1)/[(v+j)(v+j+1)],  (12b)

and so forth. Their dependence on the three parameters λ, k and m is clear. We shall shortly see that they are quite easy to compute.

¹A misprint in the first line of (5) gives ν where y is intended.

3. λ and the probability limit of R²

Eq. (9) defines the parameter λ as the ratio of the systematic variation β′X′Xβ to the disturbance variance σ². These magnitudes differ by a factor m, as can be seen by taking expectations on both sides of eq. (5),

    y′y − mȳ² = b′X′Xb + e′e.

Neglecting the loss of degrees of freedom among the residuals, we find the expected sum of squares of y as

    SSY = β′X′Xβ + mσ².  (13)

By analogy to the passage from (5) to (6) this naturally suggests

    φ = β′X′Xβ/(β′X′Xβ + mσ²)  (14)

for a measure of the quality of the fit of the observations, itself free from sampling variation. This magnitude is commensurate with R², and it is related to λ of (9) by

    λ = mφ/(1 − φ).  (15)

This parametrization is standard in earlier analyses of R² with fixed, non-random X, as opposed to the case of a joint multivariate Normal distribution of the elements of y and the regressor variables of X. Barten (1962) defined φ without further ado as the 'parent multiple correlation coefficient' that is estimated by R², and Schönfeld (1969, p. 71) equally relies entirely on intuitive appeal when he labels φ the 'theoretical measure of fit'. Press and Zellner (1978) take the parameter from Barten as they lay the basis for a Bayesian analysis. We shall follow the example of Koerts and Abrahamse (1970) and derive φ as the probability limit of R².²

To do so we introduce sample size n as a variable which has the value m for the sample actually observed. We also rewrite (6) in self-evident notation as

    R²_n = 1 − e′_n e_n/(y′_n y_n − n ȳ_n²).  (16)

In passing to the limit for n → ∞, the main difficulty is the behaviour of X, since this consists of non-random constants. We resort to the device of Hotelling (1940) to treat the regressors as 'constant in repeated samples'. This means that the virtual sample size n is given by

    n = pm,  (17)

with integer p, and that the matrix X_n consists of p replications of X = X_m, stacked on top of one another. We vary n by varying p, and thus obtain, just like Theil (1971, p. 363),

    plim_{n→∞} n⁻¹X′_n X_n = m⁻¹X′X.  (18)

With X_n behaving in this fashion and i.i.d. disturbances (as assumed) the OLS estimate is consistent,

    plim b_n = β,  (19)

and

    plim n⁻¹e′_n e_n = σ².  (20)

Upon substitution of these three probability limits into (16) we obtain, by (14),

    plim R²_n = β′X′Xβ/(β′X′Xβ + mσ²) = φ,  (21)

which is the desired result.

²Some of these authors pursue the same questions as we do. Barten derives an approximate expression for the bias of R² relative to φ, and suggests corrections, but he does not examine dispersion. Koerts and Abrahamse establish the distribution of R² for given σ, β and X, and show that this is very sensitive to changes in X. The distribution is determined numerically, and it must be computed anew for each new matrix X.

This provides a direct link of the parameter φ with R² in the model under consideration. The identification with a probability limit is particularly appropriate as we shall examine the behaviour of R² at different sample sizes.
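The limit (21) lends itself to a small simulation. The sketch below is ours, not the paper's; it assumes NumPy, and all names and parameter values are illustrative. It stacks p replications of a fixed X as in (17), draws fresh disturbances, and shows R²_n settling on the φ of (14).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20
X = rng.normal(size=(m, 2))
X = X - X.mean(axis=0)             # fixed, centred regressor block
beta = np.array([1.0, 0.5])
sigma = 1.0

ssb = beta @ (X.T @ X) @ beta
phi = ssb / (ssb + m * sigma**2)   # eq. (14): the probability limit of R^2

def r_squared(Xn, yn):
    """Plain OLS R^2 as defined in eq. (6)."""
    Xc = Xn - Xn.mean(axis=0)
    b = np.linalg.lstsq(Xc, yn - yn.mean(), rcond=None)[0]
    e = yn - yn.mean() - Xc @ b
    return 1 - (e @ e) / (yn @ yn - len(yn) * yn.mean()**2)

for p in (1, 10, 100, 1000):       # virtual sample size n = p*m, eq. (17)
    Xn = np.tile(X, (p, 1))        # p replications of X stacked on top of one another
    # intercept omitted for brevity; R^2 is unaffected since means are removed
    yn = Xn @ beta + rng.normal(scale=sigma, size=p * m)
    print(p, r_squared(Xn, yn), phi)   # R^2_n drifts toward phi as p grows
```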
4. The mean and standard deviation of R²

We return to eq. (12) for the first two moments of R². While (12a) may be developed a little further (as we shall presently see), we can evaluate both expressions to any desired degree of accuracy by summing the first J terms of the infinite series concerned. When we write the moments as given in (12) as

    E = Σ_{j=0}^∞ w(j) z(j),

the discrepancy involved in taking the first J terms only is

    δ = Σ_{j=J}^∞ w(j) z(j).  (22)

For all moments the z(j) are positive and tend from below to 1 as j → ∞, and w(j) is a Poisson density which sums to 1. Clearly, then,

    δ ≤ 1 − Σ_{j=0}^{J−1} w(j).  (23)

It requires no great programming skill to continue summing the w(j), z(j) for given parameter values until the right-hand side of (23) reduces δ to the desired level of accuracy. In this sense (12) provides easily computable expressions for the moments of R².

In the event we have set δ at 10⁻⁶ in computing the first two moments for various values of φ, m and k. The mean is overly accurate, and the standard deviation derived from the two moments is correct to three decimal places. The results are given in table 1, and illustrated in figs. 1 and 2.

[Table 1. Mean and standard deviation of R² for selected values of φ, m and k.]

Fig. 1 shows that E(R²) converges rather quickly to φ from above. Some simulations, not further reported here, suggest that this also holds for the median and the mode. R² thus has a definite upward bias which is, however, rapidly reduced as the sample size increases. Very roughly, the bias is about 0.03 or less with twenty observations when we have one regressor (k = 2), or with thirty to forty observations with two regressors. But this is a rough indication only, as the exact values vary with φ.

[Fig. 1a. Expected value of R² as a function of m for k = 2 and selected values of φ.]

[Fig. 1b. Expected value of R² as a function of m for k = 3 and selected values of φ.]

The reason for this brief dismissal of the bias is that it is completely swamped by the dispersion of R²: whenever the bias is at all noticeable, the standard error of R² is several times as large. Fig. 2 shows how the standard error varies with φ and m and, to a much lesser extent, with k; the effect of φ is particularly strong. For a standard error of R² of 0.03 or less we must have at least twenty observations if φ = 0.9. Such high values of the true correlation coefficient are probably exceptional (as opposed to sample values of 0.9); but with φ at 0.667, which is still quite respectable, nearly two hundred observations are needed to reduce the standard error to 0.03.
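The truncation rule just described is straightforward to program. The following sketch is ours (the paper reports the computed results but prints no code); the function name and the default δ are illustrative. It sums the Poisson-weighted series (12) until the bound (23) falls below δ.

```python
import math

def moments_R2(phi, m, k, delta=1e-6):
    """Mean and s.d. of R^2 by the truncated series of eqs. (12) and (22)-(23)."""
    lam = m * phi / (1 - phi)             # eq. (15): lambda = m*phi/(1 - phi)
    a, v = 0.5 * (k - 1), 0.5 * (m - 1)   # eqs. (11c)-(11d)
    mu = 0.5 * lam                        # Poisson parameter of the weights w(j)
    w = math.exp(-mu)                     # w(0); for very large mu this underflows,
    acc_w, m1, m2, j = 0.0, 0.0, 0.0, 0   # in which case one should work in logs
    while 1.0 - acc_w > delta:            # right-hand side of the bound (23)
        m1 += w * (a + j) / (v + j)                                # eq. (12a)
        m2 += w * (a + j) * (a + j + 1) / ((v + j) * (v + j + 1))  # eq. (12b)
        acc_w += w
        j += 1
        w *= mu / j                       # next Poisson weight w(j)
    return m1, math.sqrt(m2 - m1 * m1)

for m in (20, 50, 200):
    mean, sd = moments_R2(phi=0.667, m=m, k=2)
    print(m, round(mean, 3), round(sd, 3))  # at phi = 0.667 the s.d. only nears 0.03 by m = 200
```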
It is this dependence of the dispersion of the sample R² on the unknown φ which renders any judgment of accuracy so hazardous. The relationship is further illustrated in table 2, which shows how many observations are needed at various φ to reduce the standard error of R² to certain given levels.

[Fig. 2a. Standard deviation of R² as a function of m for k = 2 and selected values of φ.]

[Fig. 2b. Standard deviation of R² as a function of m for k = 3 and selected values of φ.]

[Table 2. Minimum sample size which reduces the standard error of R² below a certain level α, tabulated for k = 2 and k = 3 and for φ from 0.30 to 0.95. For the highest values of φ, all sample sizes ≥ k+1 reduce the standard error below α.]

To sum up, R² has an upward bias which can be substantial in small samples, but it is anyhow very unreliable, even at moderate sample sizes, because of its dispersion. With less than fifty observations or so there is little point in quoting R² at all, and once we are beyond such numbers the bias has virtually disappeared. The bias issue is a red herring.

5. The adjusted multiple correlation coefficient R̄²

The intuitive explanation of the upward bias of R² is that OLS treats it as the sample maximand, and the reason why this bias occurs in small samples is that R² does not allow for the loss of degrees of freedom through estimation. This argument justifies the prevalent custom of 'adjusting' R² as in

    R̄² = 1 − [e′e/(m−k)] / [(y′y − mȳ²)/(m−1)],  (24)

or

    R̄² = (1+h)R² − h,  (25a)

with

    h = (k−1)/(m−k).  (25b)

As the first expression in (24) shows, the sums of squares in the first definition of R² in (6) are 'corrected for degrees of freedom'. This adjustment is common usage among economists, probably because of their unique habit of submitting quite small samples to regression analysis. The adjustment is recommended in most econometric textbooks, all the way back to Ezekiel (1930a), though generally without much theoretical justification and without a source reference. The issue of correcting R² in some way dates from the 1920's, for Ezekiel (1930b) can quote three slightly different definitions of h of (25a) from that decade. The surviving definition (25b) is due to Fisher (1924), who justifies it with the standard argument about sums of squares around the mean that is implicit in the first expression of (24).
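In code the adjustment (25) is a one-line affair. A minimal sketch (ours; the function name and the sample values are illustrative):

```python
def adjusted_R2(r2, m, k):
    """Eq. (25a): R_bar^2 = (1 + h)*R^2 - h, with h = (k - 1)/(m - k) of eq. (25b)."""
    h = (k - 1) / (m - k)
    return (1 + h) * r2 - h

# Algebraically equivalent to the degrees-of-freedom form of eq. (24):
#   R_bar^2 = 1 - (e'e/(m - k)) / ((y'y - m*ybar^2)/(m - 1))
print(adjusted_R2(0.40, m=20, k=3))   # a sample R^2 of 0.40 shrinks to about 0.33
```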
More precise arguments in support of the adjustment (25) can be advanced. The adjusted R̄² does not suffer from the defect of R² that it automatically increases with the addition of new regressors; we shall show that E(R̄²) is indeed independent of the number of regressors. To begin with we must develop E(R²) of (12a) a little further, rewriting it as

    E(R²) = 1 − (v−a) e^(−½λ) Σ_{j=0}^∞ (½λ)^j / [j!(v+j)].  (26)

Upon consulting Slater (1964, p. 504, 13.1.2) we find that the summation is a special form of the Kummer function M(a, b, u), namely a⁻¹M(a, a+1, u). We introduce the notation

    g(u, a) = e^(−u) a⁻¹ M(a, a+1, u) = e^(−u) Σ_{j=0}^∞ u^j / [j!(a+j)],  (27)

and observe from Slater (1964, p. 505, 13.2.1) that

    g(u, a) = ∫₀¹ e^(−u(1−t)) t^(a−1) dt.  (28)

Integrating the right-hand side by parts we find

    u·g(u, a+1) = 1 − a·g(u, a).  (29)

We now make the appropriate substitutions of these results in (26) and obtain

    E(R²) = 1 − (v−a) g(½λ, v),

or, by (11c) and (11d),

    E(R²) = 1 − ½(m−k) g(½λ, ½(m−1)).  (30)

By (24), then,

    E(R̄²) = 1 − ½(m−1) g(½λ, ½(m−1)),  (31)

and this expression depends only on m and on φ (via λ) but not on k: E(R̄²) is independent of the number of regressors.

We finally note, with more relevance to our original purpose, that the adjustment very largely removes the upward bias of R². By (25) we have

    E(R̄²) = (1+h)E(R²) − h,  (32)

and if this operation is applied to the entries of table 1 it will be seen that the bias virtually disappears; for low values of φ, a slight downward bias occurs instead. But the dispersion remains, and is even increased, since

    s.d.(R̄²) = (1+h) s.d.(R²),  (33)

while h is positive and sizeable in small samples. In smallish samples R̄², though unbiased, is even more unreliable than R².
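Eqs. (30)-(32) admit a quick numerical check. The sketch below is ours, not the paper's, and the function names are illustrative: it evaluates g(u, a) of (27) by its series and confirms that (30) varies with k while (31) does not, with (32) mapping the one onto the other.

```python
import math

def g(u, a, terms=2000):
    """g(u, a) = exp(-u) * sum_j u^j / (j! * (a + j)), eq. (27); converges fast for moderate u."""
    w, s = math.exp(-u), 0.0
    for j in range(terms):
        s += w / (a + j)
        w *= u / (j + 1)        # next term of the exponential series
    return s

def mean_R2(lam, m, k):
    """Eq. (30): E(R^2) = 1 - (m - k)/2 * g(lam/2, (m - 1)/2)."""
    return 1 - 0.5 * (m - k) * g(0.5 * lam, 0.5 * (m - 1))

def mean_R2_adj(lam, m):
    """Eq. (31): E(R_bar^2) = 1 - (m - 1)/2 * g(lam/2, (m - 1)/2); k has dropped out."""
    return 1 - 0.5 * (m - 1) * g(0.5 * lam, 0.5 * (m - 1))

phi, m = 0.5, 30
lam = m * phi / (1 - phi)       # eq. (15)
for k in (2, 3, 5):
    h = (k - 1) / (m - k)       # eq. (25b)
    # eq. (32): the adjustment applied to E(R^2) reproduces eq. (31) for every k
    print(k, mean_R2(lam, m, k), (1 + h) * mean_R2(lam, m, k) - h, mean_R2_adj(lam, m))
```

The last two columns coincide for all k, and lie close to φ itself, illustrating how largely the adjustment removes the bias.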
References

Barten, A.P., 1962, Note on the unbiased estimation of the squared multiple correlation coefficient, Statistica Neerlandica 16, 151-163.
Davis, P.J., 1964, Gamma functions and related functions, in: M. Abramowitz and I.A. Stegun, eds., Handbook of mathematical functions (Dover, New York) 253-293.
Ezekiel, M., 1930a, Methods of correlation analysis (Wiley, New York).
Ezekiel, M., 1930b, The sampling variability of linear and curvilinear regressions, Annals of Mathematical Statistics 1, 275-300.
Fisher, R.A., 1924, The influence of rainfall on the yield of wheat at Rothamsted, Philosophical Transactions of the Royal Society of London B 213, 89-142.
Hotelling, H., 1940, The selection of variates for use in prediction, Annals of Mathematical Statistics 11, 271-283.
Johnson, N.L. and S. Kotz, 1970, Distributions in statistics: Continuous univariate distributions, Vol. 2 (Wiley, New York).
Koerts, J. and A.P.J. Abrahamse, 1970, The correlation coefficient in the general linear model, European Economic Review 1, 401-2.
Press, S.J. and A. Zellner, 1978, Posterior distribution for the multiple correlation coefficient with fixed regressors, Journal of Econometrics 8, 307-321.
Schönfeld, P., 1969, Methoden der Ökonometrie, Band I (Vahlen, Berlin).
Slater, L.J., 1964, Confluent hypergeometric functions, in: M. Abramowitz and I.A. Stegun, eds., Handbook of mathematical functions (Dover, New York) 503-535.
Theil, H., 1971, Principles of econometrics (Wiley, New York).