Lecture 7 Correlation
The relationship between two categorical variables (with two or more categories each)
can be assessed using a Contingency Table/Cross Tabulation and a Chi-square test.
The relationship between a categorical independent variable (with two categories) and a
categorical dependent variable (with two categories) can be assessed using a Difference
of Proportions test. Similarly, the relationship between a categorical independent
variable (with two categories) and an interval/ratio dependent variable can be assessed
using a Difference of Means test. The relationship between a categorical independent
variable (with two or more categories) and an interval/ratio dependent variable or a
categorical dependent variable (with two categories) can be assessed using One-way
Analysis of Variance. The relationship between two interval/ratio variables can be
assessed using a Scatterplot and the Correlation Coefficient.
$$CP(xy) = \sum (X - \bar{X})(Y - \bar{Y})$$
Covariation (Sum of Cross-Product Deviations):
a) the sum of the products of the joint deviations of the individual observations of X and Y from their respective means;
b) the aggregate measure of the association between X and Y;
c) if there is no association between X and Y, the Covariation will be zero (in a sample, close to zero); if the association is positive, the Covariation will be positive; and if the association is negative, the Covariation will be negative. The strength of the association, however, cannot be evaluated by the Covariation.
For a particular observation, if both X and Y are greater than their respective means, that observation's cross-product of deviations (its contribution to the SCP) is greater than zero.
The same holds true if both X and Y are less than their respective means.
But if X is greater than its mean while Y is less than its mean, or vice versa, the cross-product is less than zero.
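A minimal Python sketch of these sign rules, using a small hypothetical data set (not from the lecture). `cross_products` and `covariation` are illustrative helper names, not functions from any statistics package.

```python
def cross_products(x, y):
    """Each observation's product of deviations: (X - mean of X)(Y - mean of Y)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]


def covariation(x, y):
    """CP(xy): the sum of the cross-product deviations (SCP)."""
    return sum(cross_products(x, y))


# Hypothetical data chosen for illustration (means: X = 5, Y = 6).
x = [2, 4, 6, 8]
y = [3, 7, 5, 9]

print(cross_products(x, y))  # [9.0, -1.0, -1.0, 9.0]
print(covariation(x, y))     # 16.0
```

The first and last observations lie on the same side of both means and contribute +9 each; the middle two lie on opposite sides and contribute -1 each, so the Covariation of 16 is positive overall. The value 16 by itself still says nothing about how strong the association is.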
Total Sum of Squares (TSS) in the Standard Deviation equation can be viewed as the
Covariation of a variable with itself.
$$TSS(x) = \sum (X - \bar{X})^2 = \sum (X - \bar{X})(X - \bar{X})$$
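The same idea can be checked numerically: the Covariation of a variable with itself reproduces its TSS. The helper names are hypothetical and the data are made up for illustration.

```python
def covariation(x, y):
    """CP(xy): sum of (X - mean of X)(Y - mean of Y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))


def total_sum_of_squares(x):
    """TSS(x): sum of squared deviations from the mean."""
    mean_x = sum(x) / len(x)
    return sum((xi - mean_x) ** 2 for xi in x)


x = [2, 4, 6, 8]  # hypothetical data
print(total_sum_of_squares(x))  # 20.0
print(covariation(x, x))        # 20.0
```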
Covariance: the sum of the cross-product deviations divided by the number of cases minus one; that is, roughly the average amount by which the paired observations of X and Y covary.
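$$\operatorname{cov}(xy) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$$
$$\operatorname{var}(x) = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum (X - \bar{X})(X - \bar{X})}{n - 1}$$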
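The difficulties of interpreting the Variance, which arise because its units are the squared units of the variable, are compounded with the Covariance, because its units are a combination (the product) of the units of X and Y; that is, the size of the Covariance is a function of the standard deviations of X and Y. Dividing the Covariance by the product of the two standard deviations removes these units and gives the Correlation Coefficient, r: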
$$r = \frac{\operatorname{cov}(xy)}{\operatorname{sd}(x)\,\operatorname{sd}(y)}$$
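The sketch below (hypothetical helpers and data) computes r from the Covariance and the two standard deviations, then multiplies X by 100, as if re-expressing it in units 100 times smaller (for example metres to centimetres). The Covariance grows by the same factor while r is unchanged, which is the unit problem that dividing by sd(x) sd(y) removes.

```python
import math


def covariance(x, y):
    """cov(xy): sum of cross-product deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """r = cov(xy) / (sd(x) * sd(y)), where sd(x) = sqrt(var(x)) = sqrt(cov(xx))."""
    sd_x = math.sqrt(covariance(x, x))
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)


x = [2, 4, 6, 8]  # hypothetical data
y = [3, 7, 5, 9]
x_rescaled = [100 * v for v in x]  # same variable, different units

print(covariance(x, y), covariance(x_rescaled, y))  # Covariance scales by 100
print(round(correlation(x, y), 6), round(correlation(x_rescaled, y), 6))  # 0.8 0.8
```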
Although the equation for the Correlation Coefficient does not look like the Standard Deviation equation, the two are in fact parallel once the Standard Deviation equation is transformed:
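$$\operatorname{sd}(x) = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\operatorname{var}(x)} = \frac{\operatorname{var}(x)}{\operatorname{sd}(x)}$$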
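As a short worked step (added here for illustration), since the variance is the covariance of X with itself, the Standard Deviation divides cov(xx) by one standard deviation, while r divides cov(xy) by one standard deviation of each variable. It follows that a variable correlated with itself has r = 1.

```latex
\operatorname{sd}(x) = \frac{\operatorname{cov}(xx)}{\operatorname{sd}(x)},
\qquad
r(xy) = \frac{\operatorname{cov}(xy)}{\operatorname{sd}(x)\,\operatorname{sd}(y)},
\qquad
r(xx) = \frac{\operatorname{cov}(xx)}{\operatorname{sd}(x)\,\operatorname{sd}(x)}
      = \frac{\operatorname{var}(x)}{\operatorname{var}(x)} = 1
```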
The Correlation Coefficient can also be expressed as the mean of the products of the standardized scores (z-scores), with the sum divided by n - 1. Later, in regression, this becomes a useful perspective, because regression with standardized scores/variables is simpler to interpret.
$$z = \frac{X - \bar{X}}{\operatorname{sd}(x)}$$
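$$r(xy) = \frac{1}{n - 1} \sum \left(\frac{X - \bar{X}}{\operatorname{sd}(x)}\right)\left(\frac{Y - \bar{Y}}{\operatorname{sd}(y)}\right) = \frac{\sum z_x z_y}{n - 1}$$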
The Population Correlation Coefficient is symbolized by the small Greek letter ρ (rho):
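$$\rho = \frac{\operatorname{cov}(xy)}{\sigma(x)\,\sigma(y)}$$
To test whether a sample r differs significantly from zero (H0: ρ = 0), r is converted to a t statistic with n - 2 degrees of freedom:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$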
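A minimal sketch of the t statistic, assuming a hypothetical r = 0.8 from n = 4 pairs. Only the statistic is computed; the critical value or p-value still has to come from a t table (or a statistics package) with n - 2 degrees of freedom. `t_for_r` is an illustrative name.

```python
import math


def t_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), evaluated with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical values: r = 0.8 from n = 4 pairs.
r, n = 0.8, 4
t = t_for_r(r, n)
print(round(t, 4), "with", n - 2, "degrees of freedom")  # 1.8856 with 2 degrees of freedom
```

With 2 degrees of freedom the two-tailed critical value at the 0.05 level is about 4.303, so even a sizeable r = 0.8 is not statistically significant with only four pairs of observations.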