
British Journal of Mathematical and Statistical Psychology (2016), 69, 344–351

A short note on the maximal point-biserial correlation under non-normality
Ying Cheng* and Haiyan Liu
Department of Psychology, University of Notre Dame, Indiana, USA

The aim of this paper is to derive the maximal point-biserial correlation under non-normality. Several widely used non-normal distributions are considered, namely the uniform distribution, t-distribution, exponential distribution, and a mixture of two normal distributions. Results show that the maximal point-biserial correlation, depending on the non-normal continuous variable underlying the binary manifest variable, may not be a function of p (the probability that the dichotomous variable takes the value 1), can be symmetric or non-symmetric around p = .5, and may still lie in the range from −1.0 to 1.0. Therefore researchers should exercise caution when they interpret their sample point-biserial correlation coefficients based on popular beliefs that the maximal point-biserial correlation is always smaller than 1, and that the size of the correlation is always further restricted as p deviates from .5.

1. Introduction
The population product-moment correlation between a continuous variable ($X$) and a dichotomous variable ($Y$), denoted by $\rho_{pb}(X, Y)$, is also known as the point-biserial correlation:

$$\rho_{pb}(X, Y) = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}.$$

Given that $\mu_Y = P(Y = 1) = p$ and $\sigma_Y = \sqrt{p(1 - p)}$, we get

$$\rho_{pb}(X, Y) = \frac{E[(X - \mu_X)(Y - p)]}{\sigma_X \sqrt{p(1 - p)}}.$$

Here $\rho_{pb}(X, Y)$ is the point-biserial correlation at the population level between $X$ and $Y$, whose sample counterpart is denoted by $r_{pb}(X, Y)$.
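As a small illustration of the definition, the following Python sketch computes a sample point-biserial correlation as an ordinary Pearson correlation between a continuous list and a 0/1 list, and compares it with the algebraically equivalent mean-difference form $(\bar{x}_1 - \bar{x}_0)\sqrt{\hat{p}(1 - \hat{p})}/s_x$, where $s_x$ uses the divisor $n$. The function names and the toy data are assumptions made for illustration; they do not come from the original note.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(x, y):
    """Mean-difference form of r_pb; y must be coded 0/1 (hypothetical helper)."""
    n = len(x)
    x1 = [a for a, b in zip(x, y) if b == 1]
    x0 = [a for a, b in zip(x, y) if b == 0]
    p_hat = len(x1) / n
    mx = sum(x) / n
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)  # divisor n, not n - 1
    mean_diff = sum(x1) / len(x1) - sum(x0) / len(x0)
    return mean_diff * math.sqrt(p_hat * (1 - p_hat)) / s_x

# Toy data, made up for illustration only.
x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
y = [0, 1, 0, 1, 1, 0, 0, 1]
r_pearson = pearson(x, y)
r_pb = point_biserial(x, y)
print(round(r_pearson, 6), round(r_pb, 6))
```

The two numbers agree up to floating-point error, which is why the point-biserial correlation needs no special software beyond a Pearson correlation routine.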
When the binary variable $Y$ can be assumed to come from dichotomizing a normally distributed continuous variable $Y^*$, the product-moment correlation between $X$ and $Y^*$ is called the biserial correlation. Suppose that $P(Y = 1) = P(Y^* > \tau_p) = p$ and $P(Y = 0) = P(Y^* \le \tau_p) = 1 - p$. Then $\tau_p$ is the dichotomization threshold on $Y^*$. If $Y^*$ follows the standard normal distribution, $\tau_p$ is equal to $\Phi^{-1}(1 - p)$, where $\Phi(\cdot)$ represents the standard normal cumulative distribution function (c.d.f.). Point-biserial and biserial correlations play an important role in psychometric theory, for example in item analysis (Crocker & Algina, 2006, p. 317).

*Correspondence should be addressed to Ying Cheng, Department of Psychology, University of Notre Dame, 118
Haggar Hall, Notre Dame, IN 46556, USA (email: [email protected]).

DOI:10.1111/bmsp.12075

It is well known that the range of the point-biserial correlation may be constrained. Lord and Novick (1968, p. 340) noted that the ‘point-biserial is never as much as four-fifths of the biserial’ correlation, because

$$\rho_{pb}(X, Y) = \rho_b(X, Y)\,\frac{\phi(\tau_p)}{\sqrt{p(1 - p)}},$$

where $\rho_b(X, Y)$ is the corresponding biserial correlation, and $\phi(\tau_p)/\sqrt{p(1 - p)} < .8$. Here $\phi(\cdot)$ is the standard normal probability density function. Lord and Novick (1968) further noted that since $|\rho_b| \le 1$, $\rho_{pb}$ has to be smaller than .8.
Gradstein (1986) analytically derived the maximal $\rho_{pb}(X, Y)$ when $X$ is normally distributed. The maximum is exactly $\phi(\tau_p)/\sqrt{p(1 - p)}$, a function that depends only on $p$. As $p$ departs further from .5, $\phi(\tau_p)/\sqrt{p(1 - p)}$ decreases. Therefore, researchers cautioned against interpreting the size of the point-biserial correlation based on the usual range of the product-moment correlation, that is, [−1, 1] (Allen & Yen, 1979, p. 39; Gradstein, 1986).
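To make the dependence on $p$ concrete, the sketch below evaluates $\phi(\tau_p)/\sqrt{p(1 - p)}$ with $\tau_p = \Phi^{-1}(1 - p)$ at a few values of $p$, using `statistics.NormalDist` from the Python standard library. The function name `normal_max_pb` and the chosen grid of $p$ values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_max_pb(p):
    """phi(tau_p) / sqrt(p (1 - p)) with tau_p = Phi^{-1}(1 - p)."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    tau_p = std_normal.inv_cdf(1 - p)  # dichotomization threshold
    return std_normal.pdf(tau_p) / math.sqrt(p * (1 - p))

# Evaluate the bound on an arbitrary, symmetric grid of p values.
bounds = {p: normal_max_pb(p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
for p, bound in bounds.items():
    print(f"p = {p:.1f}  bound = {bound:.3f}")
```

The printed values peak at $p = .5$ and fall off symmetrically on either side, which is the pattern the classic texts describe for normally distributed $X$.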
On the other hand, a simple Google search pulled up many web pages that claimed point-biserial correlation values range from −1.0 to +1.0 (e.g., http://www.statisticssolutions.com/point-biserial-correlation/). Interested users of the point-biserial correlation may be confused by this apparent discrepancy. One possible reason is violation of the normality assumption which was used in the derivation by Gradstein (1986). While the assumption of normality is traditionally very common in psychological research, there has been increasing awareness of and attention to non-normal data in recent years. For example, researchers have investigated the effect of non-normality on factor analysis and structural equation modelling (Curran, West, & Finch, 1996; Jennrich & Satorra, 2014; Yanagihara, Tonda, & Matsumoto, 2005; Yuan, Bentler, & Zhang, 2005; Yuan, Marshall, & Bentler, 2002), reliability coefficient estimation (Sheng & Sheng, 2012; Yuan & Bentler, 2002), test equating (Zu & Yuan, 2012), mixture item response theory modelling (Sen, Cohen, & Kim, 2016), as well as missing-data analysis (Tong, Zhang, & Yuan, 2014; Yuan, 2009; Yuan & Bentler, 2010).
The aim of this paper is to derive the range of $\rho_{pb}$ under several widely used non-normal distributions. Interestingly, the issue of non-normality has been well recognized in the context of statistical testing, for example in power analysis and sample size planning (Bonett & Wright, 2000; Fowler, 1987) and corrections for statistical tests of correlation under non-normality (Yuan & Bentler, 2000), but we are not aware of any analytical treatment of the range of the point-biserial correlation under non-normality. In addition, the non-normality discussed in the context of statistical testing of $\rho_{pb}$ usually focuses on the underlying distribution of the binary variable $Y$ instead of the distribution of the continuous variable $X$. In the derivation and discussion below it is not essential what distribution underlies $Y$.

2. Maximal $\rho_{pb}(X, Y)$ when X is normally distributed

Gradstein (1986) derived the maximal $\rho_{pb}(X, Y)$ when $X$ is normally distributed. Here we provide details of the derivation, which will facilitate our derivation of the range of $\rho_{pb}$ under non-normal distributions. Let us again consider the continuous variable $X \sim N(\mu, \sigma^2)$, and the dichotomous variable $Y$, which is coded as 0 or 1, with $P(Y = 1) = p$. So $E[Y] = p$ and $\mathrm{Var}[Y] = p(1 - p)$. Without loss of generality, we can consider the correlation between $Z$ and $Y$, where $Z$ is a linear transformation of $X$ by $Z = (X - \mu)/\sigma$. Clearly $Z \sim N(0, 1)$, so that $\mu_Z = 0$ and $\sigma_Z = 1$. Then

$$\rho_{pb}(X, Y) = \rho_{pb}(Z, Y) = \frac{E[(Z - \mu_Z)(Y - p)]}{\sigma_Z \sqrt{p(1 - p)}} = \frac{E[ZY]}{\sqrt{p(1 - p)}}.$$

For any given $p$, the maximum of $\rho_{pb}(X, Y)$ is reached when $E[ZY]$ is maximized. Since $Y$ is either 1 or 0, $ZY$ is either $Z$ or 0. Therefore $E[ZY]$ is maximized when the top 100p% of the $Z$ distribution happens to have $Y = 1$, and

$$\max\{E[ZY]\} = p\,E[Z \mid Z > \tau_p].$$

Note that the conditional expectation on the right-hand side of the equation is the mean of a left-truncated standard normal distribution.
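The maximization argument can be checked by simulation. The sketch below draws a standard normal sample, assigns $Y = 1$ either to the largest 100p% of the $Z$ values or to the same number of cases chosen at random, and compares the two sample correlations. The sample size, the seed, the value $p = .3$, and the helper `pearson` are all assumptions chosen for illustration.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2016)  # fixed seed so the sketch is reproducible
n, p = 20000, 0.3
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
k = round(n * p)  # number of cases with Y = 1

# Assignment 1: Y = 1 for the k largest Z values (the top 100p%).
order = sorted(range(n), key=lambda i: z[i])
y_top = [0] * n
for i in order[n - k:]:
    y_top[i] = 1

# Assignment 2: the same number of ones, placed at random.
y_random = y_top[:]
rng.shuffle(y_random)

r_top = pearson(z, y_top)
r_random = pearson(z, y_random)
print(round(r_top, 3), round(r_random, 3))
```

For a fixed number of ones, the sample correlation increases with the sum of the $Z$ values that receive $Y = 1$, so no other assignment can exceed the top-100p% assignment.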
The first moment of a truncated normal distribution with mean $\mu$ and variance $\sigma^2$ is

$$\mu + \frac{\phi\left(\frac{a - \mu}{\sigma}\right) - \phi\left(\frac{b - \mu}{\sigma}\right)}{\Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)}\,\sigma,$$

where $a$ and $b$ are the left and right truncation points, respectively, and $a < b$. Therefore, the mean of a left-truncated standard normal distribution as above can be obtained by plugging in $a = \tau_p$, $b = +\infty$, and $\mu = 0$, $\sigma = 1$:

$$E[Z \mid Z > \tau_p] = \frac{\phi(\tau_p)}{p},$$

and consequently

$$\max\{E[ZY]\} = \phi(\tau_p).$$

The range of $\rho_{pb}(X, Y)$ is thus $\left[-\phi(\tau_p)/\sqrt{p(1 - p)},\ \phi(\tau_p)/\sqrt{p(1 - p)}\right]$ when $X$ is normally distributed. The maximum of $\phi(\tau_p)/\sqrt{p(1 - p)}$ is .798, which is reached when $p = .5$. This is the familiar result discussed in classic texts such as Lord and Novick (1968) and Allen and Yen (1979).
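As a quick check of the value .798 (a worked evaluation added here for illustration), set $p = .5$, so that $\tau_p = \Phi^{-1}(.5) = 0$ and $\phi(0) = 1/\sqrt{2\pi}$:

$$\left.\frac{\phi(\tau_p)}{\sqrt{p(1 - p)}}\right|_{p = .5} = \frac{\phi(0)}{\sqrt{.25}} = \frac{1/\sqrt{2\pi}}{1/2} = \sqrt{\frac{2}{\pi}} \approx .798.$$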

3. Maximal $\rho_{pb}(X, Y)$ when X is non-normally distributed

The derivation above clearly shows where the normality assumption comes into play in obtaining the range of $\rho_{pb}(X, Y)$. If $X$ is non-normally distributed, the range will change. Here we consider four non-normal distributions: uniform, Student’s t, exponential, and a mixture of two normal distributions. These distributions are also considered in other studies of point-biserial correlation in the presence of non-normality (e.g., Fowler, 1987).

3.1. X follows the uniform distribution, a symmetric distribution within an interval

Suppose $X \sim U(\alpha, \beta)$, which is a uniform distribution with a lower bound of $\alpha$ and an upper bound of $\beta$. The correlation between $X$ and $Y$ is the same as the correlation between $Z_U$ and $Y$, where $Z_U = (X - \alpha)/(\beta - \alpha) \sim U(0, 1)$. $Z_U$ has mean .5 and variance 1/12. Following the same logic as above, we get

$$\rho_{pb}(X, Y) = \rho_{pb}(Z_U, Y) = \frac{E[(Z_U - .5)(Y - p)]}{\sqrt{p(1 - p)/12}} = \frac{\sqrt{12}\,\left(E[Z_U Y] - p/2\right)}{\sqrt{p(1 - p)}}.$$

The maximum of $\rho_{pb}(X, Y)$ is reached when $E[Z_U Y]$ is maximized, or when the top 100p% of the $Z_U$ distribution happens to have $Y = 1$:

$$\max\{E[Z_U Y]\} = p\,E[Z_U \mid Z_U > (1 - p)] = p\left(1 - \frac{p}{2}\right).$$

Therefore

$$\max\{\rho_{pb}(X, Y)\} = \sqrt{3p(1 - p)},$$

or the range of $\rho_{pb}(X, Y)$ is $\left[-\sqrt{3p(1 - p)},\ \sqrt{3p(1 - p)}\right]$. Clearly this range is only dependent on $p$, is symmetric around $p = .5$, and $\max\sqrt{3p(1 - p)} = \sqrt{3/4} \approx .87$. The maximum is reached when $p = .5$.
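The closed form for the uniform case can be verified without random numbers. The sketch below compares $\sqrt{3p(1 - p)}$ with the correlation computed on an evenly spaced grid that stands in for $U(0, 1)$, with $Y = 1$ assigned to the top 100p% of grid points. The grid size and the function names are illustrative assumptions.

```python
import math

def uniform_max_pb(p):
    """Closed-form maximal rho_pb when X is uniform: sqrt(3 p (1 - p))."""
    return math.sqrt(3 * p * (1 - p))

def grid_check(p, n=10000):
    """Correlation on an evenly spaced grid standing in for U(0, 1)."""
    x = [(i + 0.5) / n for i in range(n)]  # midpoints of n equal cells
    k = round(n * p)
    y = [0] * (n - k) + [1] * k            # top 100p% of the grid gets Y = 1
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

closed_form = uniform_max_pb(0.5)
empirical = grid_check(0.5)
print(round(closed_form, 4), round(empirical, 4))
```

Because the grid is already sorted, placing the ones at the end of `y` is exactly the top-100p% assignment used in the derivation.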

3.2. X follows Student’s t-distribution, a symmetric, fat-tailed distribution

Suppose $Z \sim t(\nu)$, which is a Student’s t-distribution with $\nu$ degrees of freedom. Here we only consider $\nu > 2$, where a finite variance, $\nu/(\nu - 2)$, exists. Kim (2008) showed that the first moment of such a t-distribution truncated with a lower bound of $a$ and an upper bound of $b$ is

$$G_\nu\left[A(\nu)^{-(\nu - 1)/2} - B(\nu)^{-(\nu - 1)/2}\right],$$

for $\nu > 1$, where

$$G_\nu = \frac{\Gamma\left(\frac{\nu - 1}{2}\right)\nu^{\nu/2}}{2\,[F_\nu(b) - F_\nu(a)]\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)},$$

$$A(\nu) = \nu + a^2, \qquad B(\nu) = \nu + b^2,$$

and $F_\nu(\cdot)$ is the c.d.f. of the standard Student’s t-distribution with $\nu$ degrees of freedom, and $\Gamma(\cdot)$ represents the gamma function.

It is also straightforward to get

$$\rho_{pb}(Z, Y) = \frac{E[ZY]}{\sqrt{\frac{\nu}{\nu - 2}\,p(1 - p)}},$$

and

$$\max\{E[ZY]\} = p\,E[Z \mid Z > \tau_{p\nu}].$$

For our purposes, we need the mean of a t-distribution left truncated at $\tau_{p\nu}$, where $\tau_{p\nu} = F_\nu^{-1}(1 - p)$. So $a = \tau_{p\nu}$, $b = +\infty$, $A(\nu) = \nu + \tau_{p\nu}^2$, and $B(\nu) = +\infty$; $F_\nu(b) = 1$; $F_\nu(a) = 1 - p$. The mean of the left-truncated t-distribution of interest is therefore

$$E[Z \mid Z > \tau_{p\nu}] = \frac{\Gamma\left(\frac{\nu - 1}{2}\right)\nu^{\nu/2}}{2p\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu - 1)/2}.$$

Therefore for a given $p$ and a given $\nu$, we have

$$\max\{\rho_{pb}(Z, Y)\} = \frac{\Gamma\left(\frac{\nu - 1}{2}\right)\nu^{\nu/2}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu - 1)/2}}{2\sqrt{\frac{\nu}{\nu - 2}}\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)\sqrt{p(1 - p)}}.$$

The same result holds for $X \sim t(\mu, \sigma^2; \nu)$, the generalized t-distribution. For a given $\nu$, the range of $\rho_{pb}(X, Y)$ again depends only on $p$, and is

$$\left[-\frac{\Gamma\left(\frac{\nu - 1}{2}\right)\nu^{\nu/2}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu - 1)/2}}{2\sqrt{\frac{\nu}{\nu - 2}}\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)\sqrt{p(1 - p)}},\ \frac{\Gamma\left(\frac{\nu - 1}{2}\right)\nu^{\nu/2}\left(\nu + \tau_{p\nu}^2\right)^{-(\nu - 1)/2}}{2\sqrt{\frac{\nu}{\nu - 2}}\,\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{1}{2}\right)\sqrt{p(1 - p)}}\right],$$

which is symmetric around $p = .5$.
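The closed form for the t-distribution can be evaluated with the standard library alone. The sketch below is one possible implementation under stated assumptions: the t c.d.f. is approximated by composite Simpson integration, the quantile $\tau_{p\nu}$ is found by bisection, and the gamma-function ratio is computed on the log scale to avoid overflow for large $\nu$. None of these numerical choices, nor the function names, come from the original note.

```python
import math

def t_pdf(x, nu):
    """Density of the standard Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=1000):
    """F_nu(x) via composite Simpson integration from 0 to x (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps  # negative when x < 0, which makes the integral negative
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_ppf(q, nu, lo=-100.0, hi=100.0):
    """Quantile F_nu^{-1}(q) by bisection on the numerical c.d.f."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_max_pb(p, nu):
    """Maximal rho_pb for Z ~ t(nu), nu > 2, evaluated on the log scale."""
    tau = t_ppf(1 - p, nu)
    log_num = (math.lgamma((nu - 1) / 2) + nu / 2 * math.log(nu)
               - (nu - 1) / 2 * math.log(nu + tau * tau))
    log_den = (math.log(2) + 0.5 * math.log(nu / (nu - 2))
               + math.lgamma(nu / 2) + math.lgamma(0.5)
               + 0.5 * math.log(p * (1 - p)))
    return math.exp(log_num - log_den)

r3 = t_max_pb(0.5, 3)
r10 = t_max_pb(0.5, 10)
r3_low, r3_high = t_max_pb(0.3, 3), t_max_pb(0.7, 3)
print(round(r3, 4), round(r10, 4), round(r3_low, 4), round(r3_high, 4))
```

At $p = .5$ the threshold is 0, and for $\nu = 3$ the expression reduces to $2/\pi$; the value for $\nu = 10$ lies between that and the normal-theory value of .798.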

3.3. X follows the exponential distribution, a skewed distribution

Suppose $X \in \mathbb{R}^+$ (where $\mathbb{R}^+$ is the set of positive real numbers) follows the exponential distribution with rate parameter $\lambda$, or $X \sim \mathrm{Exp}(\lambda)$, $\lambda > 0$. $X$ has mean $1/\lambda$ and variance $1/\lambda^2$. We get

$$\rho_{pb}(X, Y) = \frac{E[(X - \mu_X)(Y - p)]}{\sigma_X \sqrt{p(1 - p)}} = \frac{\lambda E[XY] - p}{\sqrt{p(1 - p)}}.$$

For a given $p$, the maximum of $\rho_{pb}(X, Y)$ is reached when $E[XY]$ is maximized, or when the top 100p% of the exponential distribution happens to have $Y = 1$. Then

$$\max\{E[XY]\} = p\,E[X \mid X > \tau_{p,\mathrm{Exp}}],$$

where $\tau_{p,\mathrm{Exp}} = F_{\mathrm{Exp}}^{-1}(1 - p)$, and $F_{\mathrm{Exp}}(\cdot)$ is the c.d.f. of the exponential distribution:

$$\tau_{p,\mathrm{Exp}} = -\frac{\ln(p)}{\lambda}.$$
[Figure 1. The maximum point-biserial correlation as a function of $p$ when a continuous variable follows a normal, uniform, Student’s t (degrees of freedom 3 or 10), or exponential distribution. Horizontal axis: $p$ (0 to 1); vertical axis: Max $\rho_{pb}$ (0.0 to 1.0); curves: $N(\mu, \sigma^2)$, $U(\alpha, \beta)$, $\mathrm{Exp}(\lambda)$, t(10), t(3).]

For an exponential distribution left truncated at $a$, the mean is

$$\frac{1}{1 - F_{\mathrm{Exp}}(a)}\left(\frac{1}{\lambda} + a\right)e^{-a\lambda}.$$

When $a = \tau_{p,\mathrm{Exp}} = -\ln(p)/\lambda$, the mean is $(1 - \ln(p))/\lambda$. Therefore

$$\max\{E[XY]\} = \frac{p\,(1 - \ln(p))}{\lambda}$$

and

$$\max\{\rho_{pb}(X, Y)\} = -\ln(p)\sqrt{\frac{p}{1 - p}}.$$

This function is not symmetric around $p = .5$. It is maximized when $\ln(p) = 2(p - 1)$. The maximum is achieved when $p \approx .2$, and the maximum is around .8, which can be obtained numerically using the Newton–Raphson procedure.
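The Newton–Raphson step for the exponential case can be sketched as follows. The code solves $f(p) = \ln(p) - 2(p - 1) = 0$ with $f'(p) = 1/p - 2$ and then evaluates $-\ln(p)\sqrt{p/(1 - p)}$ at the solution. The starting value, tolerance, and function names are assumptions made for illustration. Note that $p = 1$ also solves the equation, although the correlation is undefined there, so the starting value should be chosen near .2.

```python
import math

def exp_max_pb(p):
    """Maximal rho_pb for exponential X: -ln(p) * sqrt(p / (1 - p))."""
    return -math.log(p) * math.sqrt(p / (1 - p))

def solve_stationary_p(p0=0.2, tol=1e-12, max_iter=50):
    """Newton-Raphson on f(p) = ln(p) - 2(p - 1); p0 is an assumed starting value."""
    p = p0
    for _ in range(max_iter):
        f = math.log(p) - 2 * (p - 1)
        f_prime = 1 / p - 2
        step = f / f_prime
        p -= step
        if abs(step) < tol:
            break
    return p

p_star = solve_stationary_p()
rho_star = exp_max_pb(p_star)
print(round(p_star, 4), round(rho_star, 4))
```

Evaluating `exp_max_pb` at, say, .2 and .8 shows the asymmetry directly: the two values differ, unlike in the normal, uniform, and t cases.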

3.4. X follows a mixture of two normal distributions

Suppose $X$ follows a mixture distribution of two normal distributions with equal variance,

$$X \mid (Y = j) \sim N(\mu_j, \sigma^2),$$

where $j = 0, 1$. Denote the standardized difference of means by $\vartheta = (\mu_1 - \mu_0)/\sigma$. In this case, following Tate (1954),

$$\rho_{pb}(X, Y) = \vartheta\sqrt{\frac{p(1 - p)}{1 + p(1 - p)\vartheta^2}}.$$

It is straightforward to see that $|\rho_{pb}(X, Y)|$ increases with $\vartheta^2$. When $\vartheta^2 \to \infty$, $|\rho_{pb}(X, Y)| \to 1$. The range of $\rho_{pb}(X, Y)$ is therefore [−1, 1] for any $p$. It is worth noting that the result can be generalized to any mixture distribution for $X$. When the overlap between the two mixture distributions goes to 0 (e.g., when $\vartheta^2 \to \infty$), $|\rho_{pb}(X, Y)|$ approaches 1. As a reviewer pointed out, this becomes obvious when you think of the scatterplot of $(X, Y)$ when $X \mid Y = 0$ and $X \mid Y = 1$ have nearly no overlap and both have small variances.
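A few numerical values make the limiting behaviour visible. The sketch below evaluates the expression attributed to Tate (1954) for increasing standardized mean differences at two values of $p$. The function name `mixture_pb` and the chosen grid of values are illustrative assumptions.

```python
import math

def mixture_pb(theta, p):
    """rho_pb for the equal-variance normal mixture; theta is the standardized mean difference."""
    pq = p * (1 - p)
    return theta * math.sqrt(pq / (1 + pq * theta ** 2))

# Arbitrary grid: growing separation between the two components, two values of p.
values = {(theta, p): mixture_pb(theta, p)
          for theta in (1, 5, 25, 125) for p in (0.1, 0.5)}
for (theta, p), rho in values.items():
    print(f"theta = {theta:>3}  p = {p:.1f}  rho_pb = {rho:.4f}")
```

Even at $p = .1$ the correlation gets arbitrarily close to 1 as the separation grows, and a negative `theta` gives the mirror-image value near −1.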

4. Conclusion and discussion

This paper derives the range of $\rho_{pb}(X, Y)$ under several non-normal distributions: uniform, Student’s t, exponential, and a mixture of two normal distributions. The maximal $\rho_{pb}(X, Y)$ under these distributions is plotted in Figure 1 (except for the mixture normal, for which the maximal $\rho_{pb}(X, Y)$ is always 1) against that under the normal distribution. For the t-distribution, we plotted two curves, one for $\nu = 3$, and the other for $\nu = 10$. As the degrees of freedom increase, the maximal $\rho_{pb}(X, Y)$ of a t-distribution approaches that of the normal distribution.
Altogether the results in this paper show that the maximum of the population point-biserial correlation, in spite of popular belief, may not be a function of $p$ (as in the case of the mixture of two normal distributions), can be symmetric or non-symmetric around $p = .5$, and may still range from −1.0 to 1.0 (as in the case of the mixture of two normal distributions). Therefore researchers should exercise caution when they interpret their sample point-biserial correlation coefficients based on the popular belief that point-biserial correlation always has a restricted range, and that the restriction is always more severe when $p$ deviates from .5.
It is also transparent from this paper that analytical derivation of maximal point-biserial correlation relies on the moments of truncated continuous distributions. It may be straightforward to derive the maximal point-biserial correlations under other non-normal distributions, given the moments of those truncated distributions. While the maximum of $\rho_{pb}(X, Y)$ is 1 when $X$ follows a mixture distribution, an interesting question to pursue in a future study is what the maximal $\rho_{pb}(X, Y)$ is when $X$ is unimodal, and whether it can be higher than $\sqrt{3/4}$.

References
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/
Cole.
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and
Spearman correlations. Psychometrika, 65, 23–28. doi:10.1007/BF02294183
Crocker, L., & Algina, J. (2006). Introduction to classical and modern test theory. Mason, OH:
Cengage Learning.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality
and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
doi:10.1037/1082-989X.1.1.16
Fowler, R. L. (1987). Power and robustness in product-moment correlation. Applied Psychological Measurement, 11, 419–428. doi:10.1177/014662168701100407
Gradstein, M. (1986). Maximal correlation between normal and dichotomous variables. Journal of
Educational Statistics, 11, 259–261. doi:10.3102/10769986011004259
Jennrich, B., & Satorra, A. (2014). The nonsingularity of Gamma in covariance structure analysis of nonnormal data. Psychometrika, 79, 51–59. doi:10.1007/s11336-013-9353-1
Kim, H.-J. (2008). Moments of truncated Student-t distribution. Journal of the Korean Statistical Society, 37, 81–87. doi:10.1016/j.jkss.2007.06.001
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Sen, S., Cohen, A. S., & Kim, S.-H. (2016). The impact of non-normality on extraction of spurious
latent classes in mixture IRT models. Applied Psychological Measurement, Advance online
publication. doi:10.1177/0146621615605080
Sheng, Y., & Sheng, Z. (2012). Is coefficient alpha robust to non-normal data? Frontiers in
Psychology, 3, 34. doi:10.3389/fpsyg.2012.00034
Tate, R. F. (1954). Correlation between a discrete and a continuous variable: Point-biserial
correlation. Annals of Mathematical Statistics, 25, 603–607. doi:10.1214/aoms/1177728730
Tong, X., Zhang, Z., & Yuan, K.-H. (2014). Evaluation of test statistics for robust structural
equation modeling with nonnormal missing data. Structural Equation Modeling, 21, 553–565.
doi:10.1080/10705511.2014.919820
Yanagihara, H., Tonda, T., & Matsumoto, C. (2005). The effects of nonnormality on asymptotic
distributions of some likelihood ratio criteria for testing covariance structures under normal
assumption. Journal of Multivariate Analysis, 96, 237–264. doi:10.1016/j.jmva.2004.10.014
Yuan, K.-H. (2009). Normal distribution based pseudo ML for missing data: With applications to
mean and covariance structure analysis. Journal of Multivariate Analysis, 100, 1900–1918.
doi:10.1016/j.jmva.2009.05.001
Yuan, K.-H., & Bentler, P. M. (2000). Inferences on correlation coefficients in some classes of
nonnormal distributions. Journal of Multivariate Analysis, 72, 230–248. doi:10.1016/
j.jmva.1999.1858
Yuan, K.-H., & Bentler, P. M. (2002). On robustness of the normal-theory based asymptotic
distributions of three reliability coefficient estimates. Psychometrika, 67, 251–259.
doi:10.1007/BF02294845
Yuan, K.-H., & Bentler, P. M. (2010). Consistency of normal distribution based pseudo maximum
likelihood estimates when data are missing at random. American Statistician, 64, 263–267.
doi:10.1198/tast.2010.09203
Yuan, K.-H., Bentler, P. M., & Zhang, W. (2005). The effect of skewness and kurtosis on mean and
covariance structure analysis: The univariate case and its multivariate implication. Sociological
Methods & Research, 34, 240–258. doi:10.1177/0049124105280200
Yuan, K.-H., Marshall, L. L., & Bentler, P. M. (2002). A unified approach to exploratory factor analysis
with missing data, nonnormal data, and in the presence of outliers. Psychometrika, 67, 95–122.
doi:10.1007/BF02294711
Zu, J., & Yuan, K.-H. (2012). Standard error of linear observed-score equating for the NEAT design
with nonnormally distributed data. Journal of Educational Measurement, 49, 190–213.
doi:10.1111/j.1745-3984.2012.00168.x

Received 23 January 2016; revised version received 17 May 2016
