British Journal of Mathematical and Statistical Psychology (2016), 69, 344–351


© 2016 The British Psychological Society
www.wileyonlinelibrary.com

A short note on the maximal point-biserial correlation under non-normality
Ying Cheng* and Haiyan Liu
Department of Psychology, University of Notre Dame, Indiana, USA

The aim of this paper is to derive the maximal point-biserial correlation under non-normality. Several widely used non-normal distributions are considered, namely the uniform distribution, t-distribution, exponential distribution, and a mixture of two normal distributions. Results show that the maximal point-biserial correlation, depending on the non-normal continuous variable underlying the binary manifest variable, may not be a function of p (the probability that the dichotomous variable takes the value 1), can be symmetric or non-symmetric around p = .5, and may still lie in the range from −1.0 to 1.0. Therefore researchers should exercise caution when they interpret their sample point-biserial correlation coefficients based on popular beliefs that the maximal point-biserial correlation is always smaller than 1, and that the size of the correlation is always further restricted as p deviates from .5.

1. Introduction
The population product-moment correlation between a continuous (X) and a dichotomous (Y) variable, denoted by $\rho_{pb}(X,Y)$, is also known as the point-biserial correlation:

$$\rho_{pb}(X,Y) = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X\sigma_Y}.$$

Given that $\mu_Y = P(Y=1) = p$ and $\sigma_Y = \sqrt{p(1-p)}$, we get

$$\rho_{pb}(X,Y) = \frac{E[(X-\mu_X)(Y-p)]}{\sigma_X\sqrt{p(1-p)}}.$$

Here $\rho_{pb}(X,Y)$ is the point-biserial correlation at the population level between X and Y, whose sample counterpart is denoted by $r_{pb}(X,Y)$.
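To make the sample counterpart $r_{pb}(X,Y)$ concrete, here is a minimal sketch: $r_{pb}$ is simply the Pearson correlation with Y coded 0/1, and it can equivalently be written in the familiar mean-difference form $(M_1 - M_0)\sqrt{\hat p(1-\hat p)}/s_X$, where $s_X$ uses divisor n. The function names and the toy data are hypothetical illustrations, not taken from the paper.

```python
import math


def point_biserial(x, y):
    """Sample point-biserial correlation r_pb: the Pearson correlation of a
    continuous variable x with a binary variable y coded 0/1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial_from_means(x, y):
    """Equivalent mean-difference form: (M1 - M0) / s_x * sqrt(p_hat * (1 - p_hat)),
    where s_x uses divisor n and p_hat is the sample proportion with y == 1."""
    n = len(x)
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    p_hat = len(x1) / n
    mx = sum(x) / n
    s_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    m1, m0 = sum(x1) / len(x1), sum(x0) / len(x0)
    return (m1 - m0) / s_x * math.sqrt(p_hat * (1 - p_hat))


# Hypothetical toy data: continuous scores and a 0/1 item response.
x = [2.0, 3.5, 1.0, 4.2, 5.1, 0.7]
y = [0, 1, 0, 1, 1, 0]
print(point_biserial(x, y), point_biserial_from_means(x, y))
```

Both functions return the same value up to floating-point rounding; the second form makes explicit that $r_{pb}$ grows with the separation of the group means relative to the spread of X.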
When the binary variable Y can be assumed to come from dichotomizing a normally distributed continuous variable $Y^*$, the product-moment correlation between X and $Y^*$ is called the biserial correlation. Suppose that $P(Y=1) = P(Y^* > \tau_p) = p$ and $P(Y=0) = P(Y^* \le \tau_p) = 1-p$. Then $\tau_p$ is the dichotomization threshold on $Y^*$. If $Y^*$ follows the standard normal distribution, $\tau_p$ is equal to $\Phi^{-1}(1-p)$, where $\Phi(\cdot)$ represents the standard normal cumulative distribution function (c.d.f.). Point-biserial and biserial correlations play an important role in psychometric theory, for example in item analysis (Crocker & Algina, 2006, p. 317).

*Correspondence should be addressed to Ying Cheng, Department of Psychology, University of Notre Dame, 118
Haggar Hall, Notre Dame, IN 46556, USA (email: [email protected]).

DOI:10.1111/bmsp.12075

It is well known that the range of the point-biserial correlation may be constrained. Lord and Novick (1968, p. 340) noted that the 'point-biserial is never as much as four-fifths of the biserial' correlation, because

$$\rho_{pb}(X,Y) = \rho_b(X,Y)\,\frac{\phi(\tau_p)}{\sqrt{p(1-p)}},$$

where $\rho_b(X,Y)$ is the corresponding biserial correlation, and $\phi(\tau_p)/\sqrt{p(1-p)} < .8$. Here $\phi(\cdot)$ is the standard normal probability density function. Lord and Novick (1968) further noted that since $|\rho_b| \le 1$, $\rho_{pb}$ has to be smaller than .8.
Gradstein (1986) analytically derived the maximal $\rho_{pb}(X,Y)$ when X is normally distributed. The maximum is exactly $\phi(\tau_p)/\sqrt{p(1-p)}$, a function that depends only on p. As p departs further from .5, $\phi(\tau_p)/\sqrt{p(1-p)}$ decreases. Therefore, researchers have cautioned against interpreting the size of the point-biserial correlation based on the usual range of the product-moment correlation, that is, $[-1, 1]$ (Allen & Yen, 1979, p. 39; Gradstein, 1986).
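The normal-theory bound $\phi(\tau_p)/\sqrt{p(1-p)}$ with $\tau_p = \Phi^{-1}(1-p)$ is easy to tabulate. A minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `max_rpb_normal` is a hypothetical label, not from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def max_rpb_normal(p):
    """Upper bound phi(tau_p) / sqrt(p(1-p)) on rho_pb when X is normal (Gradstein, 1986)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    tau_p = STD_NORMAL.inv_cdf(1 - p)  # dichotomization threshold tau_p = Phi^{-1}(1 - p)
    return STD_NORMAL.pdf(tau_p) / (p * (1 - p)) ** 0.5


for p in (0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95):
    print(f"p = {p:.2f}: max rho_pb = {max_rpb_normal(p):.3f}")
```

The bound peaks at p = .5, where $\phi(0)/0.5 \approx .798$, and shrinks symmetrically as p moves toward 0 or 1.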
On the other hand, a simple Google search pulled up many web pages claiming that point-biserial correlation values range from −1.0 to +1.0 (e.g., http://www.statisticssolutions.com/point-biserial-correlation/). Interested users of the point-biserial correlation may be confused by this glaring discrepancy. One possible reason is violation of the normality assumption, which was used in the derivation by
Gradstein (1986). While the assumption of normality is traditionally very common in
psychological research, there has been increasing awareness of and attention to non-
normal data in recent years. For example, researchers have investigated the effect of non-
normality on factor analysis and structural equation modelling (Curran, West, & Finch,
1996; Jennrich & Satorra, 2014; Yanagihara, Tonda, & Matsumoto, 2005; Yuan, Bentler, &
Zhang, 2005; Yuan, Marshall, & Bentler, 2002), reliability coefficient estimation (Sheng &
Sheng, 2012; Yuan & Bentler, 2002), test equating (Zu & Yuan, 2012), mixture item
response theory modelling (Sen, Cohen, & Kim, 2016), as well as missing-data analysis
(Tong, Zhang, & Yuan, 2014; Yuan, 2009; Yuan & Bentler, 2010).
The aim of this paper is to derive the range of $\rho_{pb}$ under several widely used non-normal distributions. Interestingly, the issue of non-normality has been well recognized in the context of statistical testing, for example in power analysis and sample size planning (Bonett & Wright, 2000; Fowler, 1987) and corrections for statistical tests of correlation under non-normality (Yuan & Bentler, 2000), but we are not aware of any analytical treatment of the range of the point-biserial correlation under non-normality. In addition, the non-normality discussed in the context of statistical testing of $\rho_{pb}$ usually concerns the underlying distribution of the binary variable Y instead of the distribution of the continuous variable X. In the derivation and discussion below, the distribution underlying Y plays no essential role.

2. Maximal $\rho_{pb}(X,Y)$ when X is normally distributed


Gradstein (1986) derived the maximal $\rho_{pb}(X,Y)$ when X is normally distributed. Here we provide details of the derivation, which will facilitate our derivation of the range of $\rho_{pb}$ under non-normal distributions. Let us again consider the continuous variable $X \sim N(\mu, \sigma^2)$, and the dichotomous variable Y, which is coded as 0 or 1, with $P(Y=1) = p$. So $E[Y] = p$ and $\mathrm{Var}[Y] = p(1-p)$. Without loss of generality, we can consider the correlation between Z and Y, where Z is a linear transformation of X by $Z = (X-\mu)/\sigma$. Clearly $Z \sim N(0,1)$, so that $\mu_Z = 0$ and $\sigma_Z = 1$. Then

$$\rho_{pb}(X,Y) = \rho_{pb}(Z,Y) = \frac{E[(Z-\mu_Z)(Y-p)]}{\sigma_Z\sqrt{p(1-p)}} = \frac{E[ZY]}{\sqrt{p(1-p)}}.$$

For any given p, the maximum of $\rho_{pb}(X,Y)$ is reached when $E[ZY]$ is maximized. Since Y is either 1 or 0, ZY is either Z or 0. Therefore $E[ZY]$ is maximized when the top 100p% of the Z distribution happens to have Y = 1, and

$$\max\{E[ZY]\} = p \cdot E[Z \mid Z > \tau_p].$$

Note that the conditional expectation on the right-hand side is the mean of a left-truncated standard normal distribution.
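The maximization argument (code Y = 1 for the top 100p% of Z) can be checked by simulation. A minimal sketch, assuming a standard normal sample of 100,000 draws with an arbitrary seed and p = .3; `pearson` and `top_fraction_assignment` are hypothetical helper names, not from the paper. For a fixed number of ones, giving Y = 1 to the largest values maximizes the sample covariance, so the resulting correlation should sit close to $\phi(\tau_p)/\sqrt{p(1-p)}$, whereas a random 0/1 coding with the same p should not.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_fraction_assignment(sample, p):
    """Code Y = 1 for the largest round(n * p) values of the sample, Y = 0 otherwise."""
    n = len(sample)
    k = round(n * p)
    if not 0 < k < n:
        raise ValueError("need 0 < round(n * p) < n")
    order = sorted(range(n), key=sample.__getitem__)  # indices from smallest to largest
    y = [0] * n
    for i in order[n - k:]:
        y[i] = 1
    return y


rng = random.Random(2016)  # arbitrary fixed seed for reproducibility
z = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
p = 0.3

best = pearson(z, top_fraction_assignment(z, p))
y_random = top_fraction_assignment([rng.random() for _ in z], p)  # same p, unrelated to z
arbitrary = pearson(z, y_random)
tau_p = NormalDist().inv_cdf(1 - p)
bound = NormalDist().pdf(tau_p) / math.sqrt(p * (1 - p))
print(f"top-100p% coding: {best:.4f}, random coding: {arbitrary:.4f}, bound: {bound:.4f}")
```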
The first moment of a truncated normal distribution with mean $\mu$ and variance $\sigma^2$ is

$$\mu + \frac{\phi\left(\frac{a-\mu}{\sigma}\right) - \phi\left(\frac{b-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}\,\sigma,$$

where a and b are the left and right truncation points, respectively, and a < b. Therefore, the mean of a left-truncated standard normal distribution as above can be obtained by plugging in $a = \tau_p$, $b = +\infty$, $\mu = 0$, and $\sigma = 1$ (so that the denominator is $1 - \Phi(\tau_p) = p$):

$$E[Z \mid Z > \tau_p] = \frac{\phi(\tau_p)}{p},$$

and consequently

$$\max\{E[ZY]\} = \phi(\tau_p).$$

The range of $\rho_{pb}(X,Y)$ is thus $\left[-\phi(\tau_p)/\sqrt{p(1-p)},\ \phi(\tau_p)/\sqrt{p(1-p)}\right]$ when X is normally distributed. The maximum of $\phi(\tau_p)/\sqrt{p(1-p)}$ is .798, which is reached when p = .5. This is the familiar result discussed in classic texts such as Lord and Novick (1968) and Allen and Yen (1979).
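The truncated-normal step can be verified numerically. A minimal sketch: `truncated_normal_mean` implements the first-moment formula for a truncated normal, and a composite Simpson integral of $z\,\phi(z)$ over $[\tau_p, \tau_p + 12]$ serves as an independent check (the upper cut-off of 12 standard deviations past the threshold is an assumption, chosen because the remaining tail is negligible). Helper names and p = .25 are illustrative choices.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def truncated_normal_mean(a, b, mu=0.0, sigma=1.0):
    """First moment of N(mu, sigma^2) truncated to (a, b), a < b; infinite bounds allowed."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    mass = STD.cdf(beta) - STD.cdf(alpha)  # probability of the retained interval
    return mu + (STD.pdf(alpha) - STD.pdf(beta)) / mass * sigma


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule (n even), used only as an independent numerical check."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


p = 0.25
tau_p = STD.inv_cdf(1 - p)
closed_form = truncated_normal_mean(tau_p, math.inf)  # should equal phi(tau_p) / p
numeric = simpson(lambda z: z * STD.pdf(z), tau_p, tau_p + 12.0) / p
print(closed_form, STD.pdf(tau_p) / p, numeric)
```

Multiplying `closed_form` by p recovers $\max\{E[ZY]\} = \phi(\tau_p)$.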

3. Maximal $\rho_{pb}(X,Y)$ when X is non-normally distributed


The derivation above clearly shows where the normality assumption comes into play in obtaining the range of $\rho_{pb}(X,Y)$. If X is non-normally distributed, the range will generally change. Here we consider four non-normal distributions: uniform, Student's t, exponential, and a mixture of two normal distributions. These distributions are also considered in other studies of point-biserial correlation in the presence of non-normality (e.g., Fowler, 1987).

3.1. X follows the uniform distribution, a symmetric distribution within an interval


Suppose $X \sim U(\alpha, \beta)$, which is a uniform distribution with a lower bound of $\alpha$ and an upper bound of $\beta$. The correlation between X and Y is the same as the correlation between $Z_U$ and Y, where $Z_U = (X-\alpha)/(\beta-\alpha) \sim U(0,1)$. $Z_U$ has mean .5 and variance 1/12.

Following the same logic as above, we get

$$\rho_{pb}(X,Y) = \rho_{pb}(Z_U,Y) = \frac{E[(Z_U - .5)(Y-p)]}{\sqrt{p(1-p)/12}} = \frac{\sqrt{12}\left(E[Z_U Y] - \frac{p}{2}\right)}{\sqrt{p(1-p)}}.$$

The maximum of $\rho_{pb}(X,Y)$ is reached when $E[Z_U Y]$ is maximized, or when the top 100p% of the $Z_U$ distribution happens to have Y = 1:

$$\max\{E[Z_U Y]\} = p \cdot E[Z_U \mid Z_U > 1-p] = p\left(1 - \frac{p}{2}\right).$$

Therefore

$$\max\{\rho_{pb}(X,Y)\} = \sqrt{3p(1-p)},$$

or the range of $\rho_{pb}(X,Y)$ is $\left[-\sqrt{3p(1-p)},\ \sqrt{3p(1-p)}\right]$. Clearly this range depends only on p, is symmetric around p = .5, and $\max \sqrt{3p(1-p)} = \sqrt{3/4} \approx .87$. The maximum is reached when p = .5.
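A deterministic check of $\sqrt{3p(1-p)}$: place n equally spaced midpoints on (0, 1) as a stand-in for $U(0,1)$, code the top round(np) of them as Y = 1, and compute the Pearson correlation. This grid construction and the helper names are illustrative assumptions, not part of the paper.

```python
import math


def max_rpb_uniform(p):
    """Closed form from Section 3.1: sqrt(3 p (1 - p))."""
    return math.sqrt(3 * p * (1 - p))


def max_rpb_uniform_grid(p, n=10_000):
    """Grid check: n equally spaced U(0,1) midpoints, Y = 1 for the top round(n p) of them."""
    x = [(i + 0.5) / n for i in range(n)]  # already sorted ascending
    k = round(n * p)
    y = [0] * (n - k) + [1] * k            # top-k coding maximizes the correlation
    mx, my = sum(x) / n, k / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(max_rpb_uniform(p), 4), round(max_rpb_uniform_grid(p), 4))
```

The two columns agree to about four decimals, and both peak at p = .5 with $\sqrt{3/4} \approx .866$, above the normal-theory ceiling of .798.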

3.2. X follows Student’s t-distribution, a symmetric, fat-tailed distribution


Suppose $Z \sim t(\nu)$, which is a Student's t-distribution with $\nu$ degrees of freedom. Here we only consider $\nu > 2$, where a finite variance, $\nu/(\nu-2)$, exists. Kim (2008) showed that the first moment of such a t-distribution truncated with a lower bound of a and an upper bound of b is

$$G_\nu\left[A(\nu)^{-(\nu-1)/2} - B(\nu)^{-(\nu-1)/2}\right]$$

for $\nu > 1$, where

$$G_\nu = \frac{\Gamma\left(\frac{\nu-1}{2}\right)\nu^{\nu/2}}{2\left[F_\nu(b) - F_\nu(a)\right]\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)},$$

$$A(\nu) = \nu + a^2,$$

$$B(\nu) = \nu + b^2,$$

and $F_\nu(\cdot)$ is the c.d.f. of the standard Student's t-distribution with $\nu$ degrees of freedom, and $\Gamma(\cdot)$ represents the gamma function.
It is also straightforward to get

$$\rho_{pb}(Z,Y) = \frac{E[ZY]}{\sqrt{\frac{\nu}{\nu-2}}\sqrt{p(1-p)}}$$

and

$$\max\{E[ZY]\} = p \cdot E[Z \mid Z > \tau_{p\nu}].$$

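For our purposes, we need the mean of a t-distribution left-truncated at $\tau_{p\nu}$, where $\tau_{p\nu} = F_\nu^{-1}(1-p)$. So $a = \tau_{p\nu}$, $b = +\infty$, $A(\nu) = \nu + \tau_{p\nu}^2$, $B(\nu) = +\infty$, $F_\nu(b) = 1$, and $F_\nu(a) = 1-p$. The mean of the left-truncated t-distribution of interest is therefore

$$E[Z \mid Z > \tau_{p\nu}] = \frac{\Gamma\left(\frac{\nu-1}{2}\right)\nu^{\nu/2}}{2p\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu-1)/2}.$$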

Therefore for a given p and a given $\nu$, we have

$$\max\{\rho_{pb}(Z,Y)\} = \frac{\Gamma\left(\frac{\nu-1}{2}\right)\nu^{\nu/2}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu-1)/2}}{2\sqrt{\frac{\nu}{\nu-2}}\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)\sqrt{p(1-p)}}.$$

The same result holds for $X \sim t(\mu, \sigma^2; \nu)$, the generalized t-distribution. For a given $\nu$, the range of $\rho_{pb}(X,Y)$ again depends only on p, and is

$$\left[-\frac{\Gamma\left(\frac{\nu-1}{2}\right)\nu^{\nu/2}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu-1)/2}}{2\sqrt{\frac{\nu}{\nu-2}}\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)\sqrt{p(1-p)}},\ \frac{\Gamma\left(\frac{\nu-1}{2}\right)\nu^{\nu/2}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu-1)/2}}{2\sqrt{\frac{\nu}{\nu-2}}\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)\sqrt{p(1-p)}}\right],$$

which is symmetric around p = .5.
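The t-distribution bound needs the quantile $\tau_{p\nu} = F_\nu^{-1}(1-p)$. A minimal, dependency-free sketch: the c.d.f. is obtained by Simpson integration of the t density plus symmetry, and the quantile by bisection. These numerical stand-ins are assumptions for illustration; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would replace them. The bound itself is evaluated on the log scale so that $\nu^{\nu/2}$ does not overflow for large $\nu$.

```python
import math


def t_cdf(s, nu, n_steps=2000):
    """F_nu(s) for the standard t(nu): Simpson's rule on [0, |s|] plus symmetry."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi))

    def pdf(z):
        return c * (1 + z * z / nu) ** (-(nu + 1) / 2)

    h = abs(s) / n_steps
    total = pdf(0.0) + pdf(abs(s))
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + math.copysign(total * h / 3, s)


def t_quantile(q, nu):
    """tau = F_nu^{-1}(q) by bracketing and bisection on t_cdf."""
    lo, hi = -1.0, 1.0
    while t_cdf(lo, nu) > q:
        lo *= 2
    while t_cdf(hi, nu) < q:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def max_rpb_t(p, nu):
    """max rho_pb for X ~ t(nu), nu > 2, via the truncated-t mean (Kim, 2008), on the log scale."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 for a finite variance")
    tau = t_quantile(1 - p, nu)
    log_num = (math.lgamma((nu - 1) / 2) + (nu / 2) * math.log(nu)
               - ((nu - 1) / 2) * math.log(nu + tau * tau))
    log_den = (math.log(2) + 0.5 * math.log(nu / (nu - 2)) + math.lgamma(nu / 2)
               + math.lgamma(0.5) + 0.5 * math.log(p * (1 - p)))
    return math.exp(log_num - log_den)


for nu in (3, 10):
    print(nu, [round(max_rpb_t(p, nu), 3) for p in (0.1, 0.3, 0.5)])
```

At p = .5 the threshold is 0, and for $\nu = 3$ the expression simplifies to $2/\pi \approx .637$, noticeably below the normal value of .798; larger $\nu$ moves the curve toward the normal one.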

3.3. X follows the exponential distribution, a skewed distribution


Suppose $X \in \mathbb{R}^+$ (where $\mathbb{R}^+$ is the set of positive real numbers) follows the exponential distribution with rate parameter $\lambda$, or $X \sim \mathrm{Exp}(\lambda)$, $\lambda > 0$. X has mean $1/\lambda$ and variance $1/\lambda^2$. We get

$$\rho_{pb}(X,Y) = \frac{E[(X-\mu_X)(Y-p)]}{\sigma_X\sqrt{p(1-p)}} = \frac{\lambda E[XY] - p}{\sqrt{p(1-p)}}.$$

For a given p, the maximum of $\rho_{pb}(X,Y)$ is reached when $E[XY]$ is maximized, or when the top 100p% of the exponential distribution happens to have Y = 1. Then

$$\max\{E[XY]\} = p \cdot E[X \mid X > \tau_{p\lambda}],$$

where $\tau_{p\lambda} = F_\lambda^{-1}(1-p)$, and $F_\lambda(\cdot)$ is the c.d.f. of the exponential distribution:

$$\tau_{p\lambda} = -\frac{\ln(p)}{\lambda}.$$

[Figure 1: maximum $\rho_{pb}$ (vertical axis, 0 to 1.0) plotted against p (horizontal axis, 0 to 1) for $N(\mu, \sigma^2)$, $U(\alpha, \beta)$, $\mathrm{Exp}(\lambda)$, t(10), and t(3).]
Figure 1. The maximum point-biserial correlation as a function of p when a continuous variable follows a normal, uniform, Student's t (degrees of freedom 3 or 10), or exponential distribution.

For an exponential distribution left-truncated at a, the mean is

$$\frac{1}{1 - F_\lambda(a)}\left(\frac{1}{\lambda} + a\right)e^{-a\lambda} = a + \frac{1}{\lambda},$$

since $1 - F_\lambda(a) = e^{-a\lambda}$.

When $a = \tau_{p\lambda} = -\ln(p)/\lambda$, the mean is $(1 - \ln(p))/\lambda$. Therefore

$$\max\{E[XY]\} = \frac{p(1 - \ln(p))}{\lambda}$$

and

$$\max\{\rho_{pb}(X,Y)\} = -\ln(p)\sqrt{\frac{p}{1-p}}.$$

This function is not symmetric around p = .5. On 0 < p < 1 it is maximized where $\ln(p) = 2(p-1)$. The maximum is achieved at p ≈ .2, and the maximum is around .8; both values can be obtained numerically using the Newton–Raphson procedure.
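The Newton–Raphson step can be sketched as follows, applied to the first-order condition $g(p) = \ln(p) - 2(p-1) = 0$. The starting value 0.1 and the helper names are illustrative assumptions. Because g is concave with roots at the interior maximizer and at p = 1, starting below the interior root (where g < 0 and g' > 0) makes the iterates increase monotonically toward it rather than toward the trivial root.

```python
import math


def max_rpb_exponential(p):
    """Section 3.3: max rho_pb = -ln(p) * sqrt(p / (1 - p)) for X ~ Exp(lambda); free of lambda."""
    return -math.log(p) * math.sqrt(p / (1 - p))


def exponential_argmax(p0=0.1, tol=1e-12, max_iter=50):
    """Newton-Raphson on g(p) = ln(p) - 2(p - 1), starting below the interior root."""
    p = p0
    for _ in range(max_iter):
        g = math.log(p) - 2 * (p - 1)
        dg = 1 / p - 2
        step = g / dg
        p -= step
        if abs(step) < tol:
            break
    return p


p_star = exponential_argmax()
print(f"p* = {p_star:.4f}, max rho_pb = {max_rpb_exponential(p_star):.4f}")
print(f"asymmetry: p=.2 -> {max_rpb_exponential(0.2):.3f}, p=.8 -> {max_rpb_exponential(0.8):.3f}")
```

The root lands near p ≈ .203 with a maximum of about .805, consistent with the values quoted above, and the printout at p = .2 versus p = .8 shows the asymmetry directly.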

3.4. X follows a mixture of two normal distributions


Suppose X follows a mixture of two normal distributions with equal variance,

$$X \mid (Y = j) \sim N(\mu_j, \sigma^2),$$

where j = 0, 1. Denote the standardized difference of means by $\vartheta = (\mu_1 - \mu_0)/\sigma$. In this case, following Tate (1954),

$$\rho_{pb}(X,Y) = \vartheta\sqrt{\frac{p(1-p)}{1 + p(1-p)\vartheta^2}}.$$

It is straightforward to see that $|\rho_{pb}(X,Y)|$ increases with $\vartheta^2$. When $\vartheta^2 \to \infty$, $|\rho_{pb}(X,Y)| \to 1$. The range of $\rho_{pb}(X,Y)$ is therefore $[-1, 1]$ for any p. It is worth noting that the result can be generalized to any mixture distribution for X. When the overlap between the two mixture components goes to 0 (e.g., when $\vartheta^2 \to \infty$), $|\rho_{pb}(X,Y)|$ approaches 1. As a reviewer pointed out, this becomes obvious when you think of the scatterplot of (X, Y) when X | Y = 0 and X | Y = 1 have nearly no overlap and both have small variances.
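A minimal sketch of Tate's formula together with a Monte Carlo check, assuming $\sigma = 1$ and $\mu_0 = 0$ so that $\vartheta$ is simply the mean shift; the seed, sample size, and helper names are arbitrary illustrative choices. Even at p = .05, where the normal-theory ceiling is roughly .47, the mixture correlation climbs toward 1 as $\vartheta$ grows.

```python
import math
import random


def rpb_normal_mixture(p, theta):
    """Tate (1954): rho_pb when X | Y=j ~ N(mu_j, sigma^2) and theta = (mu_1 - mu_0) / sigma."""
    return theta * math.sqrt(p * (1 - p) / (1 + p * (1 - p) * theta ** 2))


def simulate_rpb(p, theta, n=100_000, seed=1954):
    """Monte Carlo check: draw Y ~ Bernoulli(p), then X | Y ~ N(theta * Y, 1)."""
    rng = random.Random(seed)
    y = [1 if rng.random() < p else 0 for _ in range(n)]
    x = [rng.gauss(theta * yi, 1.0) for yi in y]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


for theta in (1, 2, 5, 20, 200):
    print(theta, round(rpb_normal_mixture(0.05, theta), 4))
```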

4. Conclusion and discussion


This paper derives the range of $\rho_{pb}(X,Y)$ under several non-normal distributions: uniform, Student's t, exponential, and a mixture of two normal distributions. The maximal $\rho_{pb}(X,Y)$ under these distributions is plotted in Figure 1 (except for the mixture of normals, for which the maximal $\rho_{pb}(X,Y)$ is always 1) against that under the normal distribution. For the t-distribution, we plotted two curves, one for $\nu = 3$ and the other for $\nu = 10$. As the degrees of freedom increase, the maximal $\rho_{pb}(X,Y)$ of a t-distribution approaches that of the normal distribution.
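The closed-form curves compared in Figure 1 can be tabulated with a few lines. A minimal sketch covering the normal, uniform, and exponential cases only (the t curves need a t quantile and are left out here); `max_rpb` and its string keys are hypothetical names.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def max_rpb(dist, p):
    """Maximal rho_pb as a function of p for three of the distributions in Figure 1."""
    if dist == "normal":
        tau = N01.inv_cdf(1 - p)
        return N01.pdf(tau) / math.sqrt(p * (1 - p))
    if dist == "uniform":
        return math.sqrt(3 * p * (1 - p))
    if dist == "exponential":
        return -math.log(p) * math.sqrt(p / (1 - p))
    raise ValueError(f"unknown distribution: {dist}")


print("p     normal  uniform  exponential")
for p in (0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9):
    print(f"{p:<5} {max_rpb('normal', p):.3f}   {max_rpb('uniform', p):.3f}    {max_rpb('exponential', p):.3f}")
```

The table shows the same qualitative picture as the figure: the uniform curve lies above the normal one, and the exponential curve is skewed, exceeding the normal bound for small p while falling below it near p = .5.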
Altogether the results in this paper show that the maximum of the population point-biserial correlation, in spite of popular belief, may not be a function of p (as in the case of the mixture of two normal distributions), can be symmetric or non-symmetric around p = .5, and may still range from −1.0 to 1.0 (as in the case of the mixture of two normal distributions). Therefore researchers should exercise caution when they interpret their sample point-biserial correlation coefficients based on the popular belief that the point-biserial correlation always has a restricted range, and that the restriction is always more severe when p deviates from .5.
It is also clear from this paper that the analytical derivation of the maximal point-biserial correlation relies on the moments of truncated continuous distributions. It may be straightforward to derive the maximal point-biserial correlation under other non-normal distributions, given the moments of those truncated distributions. While the maximum of $\rho_{pb}(X,Y)$ is 1 when X follows a mixture distribution, an interesting question to pursue in a future study is what the maximal $\rho_{pb}(X,Y)$ is when X is unimodal, and whether it can be higher than $\sqrt{3/4}$.

References
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/
Cole.
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and
Spearman correlations. Psychometrika, 65, 23–28. doi:10.1007/BF02294183
Crocker, L., & Algina, J. (2006). Introduction to classical and modern test theory. Mason, OH:
Cengage Learning.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality
and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
doi:10.1037/1082-989X.1.1.16
Fowler, R. L. (1987). Power and robustness in product-moment correlation. Applied Psychological Measurement, 11, 419–428. doi:10.1177/014662168701100407
Gradstein, M. (1986). Maximal correlation between normal and dichotomous variables. Journal of Educational Statistics, 11, 259–261. doi:10.3102/10769986011004259
Jennrich, B., & Satorra, A. (2014). The nonsingularity of Gamma in covariance structure analysis of nonnormal data. Psychometrika, 79, 51–59. doi:10.1007/s11336-013-9353-1
Kim, H.-J. (2008). Moments of truncated Student-t distribution. Journal of the Korean Statistical Society, 37, 81–87. doi:10.1016/j.jkss.2007.06.001
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Sen, S., Cohen, A. S., & Kim, S.-H. (2016). The impact of non-normality on extraction of spurious
latent classes in mixture IRT models. Applied Psychological Measurement, Advance online
publication. doi:10.1177/0146621615605080
Sheng, Y., & Sheng, Z. (2012). Is coefficient alpha robust to non-normal data? Frontiers in
Psychology, 3, 34. doi:10.3389/fpsyg.2012.00034
Tate, R. F. (1954). Correlation between a discrete and a continuous variable: Point-biserial
correlation. Annals of Mathematical Statistics, 25, 603–607. doi:10.1214/aoms/1177728730
Tong, X., Zhang, Z., & Yuan, K.-H. (2014). Evaluation of test statistics for robust structural
equation modeling with nonnormal missing data. Structural Equation Modeling, 21, 553–565.
doi:10.1080/10705511.2014.919820
Yanagihara, H., Tonda, T., & Matsumoto, C. (2005). The effects of nonnormality on asymptotic
distributions of some likelihood ratio criteria for testing covariance structures under normal
assumption. Journal of Multivariate Analysis, 96, 237–264. doi:10.1016/j.jmva.2004.10.014
Yuan, K.-H. (2009). Normal distribution based pseudo ML for missing data: With applications to
mean and covariance structure analysis. Journal of Multivariate Analysis, 100, 1900–1918.
doi:10.1016/j.jmva.2009.05.001
Yuan, K.-H., & Bentler, P. M. (2000). Inferences on correlation coefficients in some classes of
nonnormal distributions. Journal of Multivariate Analysis, 72, 230–248. doi:10.1016/
j.jmva.1999.1858
Yuan, K.-H., & Bentler, P. M. (2002). On robustness of the normal-theory based asymptotic
distributions of three reliability coefficient estimates. Psychometrika, 67, 251–259.
doi:10.1007/BF02294845
Yuan, K.-H., & Bentler, P. M. (2010). Consistency of normal distribution based pseudo maximum
likelihood estimates when data are missing at random. American Statistician, 64, 263–267.
doi:10.1198/tast.2010.09203
Yuan, K.-H., Bentler, P. M., & Zhang, W. (2005). The effect of skewness and kurtosis on mean and
covariance structure analysis: The univariate case and its multivariate implication. Sociological
Methods & Research, 34, 240–258. doi:10.1177/0049124105280200
Yuan, K.-H., Marshall, L. L., & Bentler, P. M. (2002). A unified approach to exploratory factor analysis
with missing data, nonnormal data, and in the presence of outliers. Psychometrika, 67, 95–122.
doi:10.1007/BF02294711
Zu, J., & Yuan, K.-H. (2012). Standard error of linear observed-score equating for the NEAT design
with nonnormally distributed data. Journal of Educational Measurement, 49, 190–213.
doi:10.1111/j.1745-3984.2012.00168.x

Received 23 January 2016; revised version received 17 May 2016
