Covariance - Correlation and Regression (Lecture)
LECTURE 13
COVARIANCE, CORRELATION, AND REGRESSION
Recall that the expected value of u(X) is given by Eu(X) = \sum_{x \in S_X} u(x) p_X(x) or Eu(X) = \int u(x) f_X(x) dx
in the discrete and continuous cases respectively. In the bivariate case, the expectation is
defined similarly, except that u is now a function of two arguments, and one must
average over two variables using the joint PMF or PDF.
Definition 1. (Expected value) Let X and Y be two random variables with a joint PMF
pX,Y (x, y) and supports SX and SY , or a joint PDF fX,Y (x, y). The expected value of U (X, Y )
is defined as
EU(X, Y) = \sum_{x \in S_X} \sum_{y \in S_Y} U(x, y) p_{X,Y}(x, y)
in the discrete case, and EU(X, Y) = \int \int U(x, y) f_{X,Y}(x, y) dy dx in the continuous case.
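As an illustration, the following minimal Python sketch evaluates this double sum for a small, made-up joint PMF (the PMF, the function name expected_value, and the choice U(x, y) = xy are hypothetical and not part of the lecture):

# A made-up joint PMF on {0, 1} x {0, 1}: keys are (x, y), values are p_{X,Y}(x, y).
joint_pmf = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.25, (1, 1): 0.25,
}

def expected_value(u, pmf):
    # Double sum of U(x, y) * p_{X,Y}(x, y) over the joint support.
    return sum(u(x, y) * p for (x, y), p in pmf.items())

# E(XY) for this PMF: only the point (1, 1) contributes, so the result is 0.25.
print(expected_value(lambda x, y: x * y, joint_pmf))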
2 Covariance
One of the important examples is the covariance, which is a measure of association between
two random variables.
Definition 2. (Covariance) Let X and Y be two random variables. The covariance between
X and Y is defined as
Cov(X, Y ) = E ((X − EX)(Y − EY )) .
Remark. 1. If X and Y are discrete with a joint PMF pX,Y and supports SX , SY , then:
Cov(X, Y) = \sum_{x \in S_X} \sum_{y \in S_Y} (x − EX)(y − EY) p_{X,Y}(x, y).
Recall that EX and EY are two numbers. They must be computed before computing the
covariance. In the continuous case, let fX,Y denote the joint PDF of X and Y :
Cov(X, Y) = \int \int (x − EX)(y − EY) f_{X,Y}(x, y) dy dx.
Example. Suppose that the earnings per share (E) and the price per share (S) of a stock have the joint PMF
pE,S(10, 100) = 2/6, pE,S(10, 250) = 1/6, pE,S(20, 250) = 2/6, and pE,S(20, 400) = 1/6.
The marginal PMF of the earnings per share (E) is pE(10) = pE(20) = 0.5, and therefore
the mean of the earnings per share is 10 × 0.5 + 20 × 0.5 = 15. The marginal PMF of the price
(S) is pS(100) = 1/3, pS(250) = 1/2, pS(400) = 1/6. Hence, the expected value of the price is
100 × 1/3 + 250 × 1/2 + 400 × 1/6 = 225.
The covariance between the earnings and the price is therefore
(10 − 15)(100 − 225) × 2/6 + (10 − 15)(250 − 225) × 1/6 + (20 − 15)(250 − 225) × 2/6 + (20 − 15)(400 − 225) × 1/6 = 375.
The covariance is positive and, therefore, the earnings per share and the price per share are
positively associated.
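This computation can be verified numerically. The following Python sketch reuses the joint probabilities that appear in the sum above:

# Joint PMF of the earnings per share (e) and the price (s) from the example.
pmf = {(10, 100): 2/6, (10, 250): 1/6, (20, 250): 2/6, (20, 400): 1/6}

mean_e = sum(e * p for (e, s), p in pmf.items())   # 15
mean_s = sum(s * p for (e, s), p in pmf.items())   # 225
cov = sum((e - mean_e) * (s - mean_s) * p for (e, s), p in pmf.items())
print(mean_e, mean_s, cov)  # 15, 225, 375 (up to floating-point rounding)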
Some properties of the covariance are given in the following theorem. Note that the theorem
applies to both discrete and continuous cases, and the proof relies only on the linearity of the
expectation.
(f) From the definition of the covariance,
where the equality in the last line holds by the definition of the covariance.
(g) From the definition of the variance,
where the equality in the last line holds by the definition of the covariance.
3 Correlation coefficient
Covariance can be difficult to interpret since it can take any value from −∞ to +∞. For
that reason, and to obtain a unit-free measure of association, it is common to standardize the
covariance by the product of the standard deviations of the two variables. The resulting measure
is called the coefficient of correlation.
ρX,Y = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}.
One of the attractive properties of the correlation coefficient is that it does not depend on the
units of measurement. Consider the correlation between X and cY, where c > 0:
ρX,cY = \frac{Cov(X, cY)}{\sqrt{Var(X) Var(cY)}} = \frac{c Cov(X, Y)}{\sqrt{Var(X) c^2 Var(Y)}} = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}} = ρX,Y.
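A quick numerical check of this invariance (a sketch: it reuses the earnings/price PMF from the covariance example together with a hypothetical rescaling factor c = 100, e.g. the price quoted in cents instead of dollars):

pmf = {(10, 100): 2/6, (10, 250): 1/6, (20, 250): 2/6, (20, 400): 1/6}

def corr(pairs):
    # Correlation coefficient of a discrete joint PMF {(x, y): probability}.
    ex = sum(x * p for (x, y), p in pairs.items())
    ey = sum(y * p for (x, y), p in pairs.items())
    cov = sum((x - ex) * (y - ey) * p for (x, y), p in pairs.items())
    vx = sum((x - ex) ** 2 * p for (x, y), p in pairs.items())
    vy = sum((y - ey) ** 2 * p for (x, y), p in pairs.items())
    return cov / (vx * vy) ** 0.5

c = 100
scaled = {(x, c * y): p for (x, y), p in pmf.items()}
print(corr(pmf), corr(scaled))  # the two printed values coincide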
More importantly, the correlation coefficient is restricted to the interval [−1, 1], thus providing
a very intuitive measure of the strength of association. A value of the correlation coefficient
closer to 1 indicates a stronger positive association between the two variables; similarly, a value
closer to −1 indicates a stronger negative association.
Remark. When ρX,Y = 1 or −1, there is a perfect linear relationship between X and Y .
Note that if there is a perfect nonlinear relationship between X and Y (such as Y = X²),
the correlation coefficient will be strictly between −1 and 1, i.e., −1 < ρX,Y < 1. Thus,
the correlation coefficient measures the degree of linearity in the relationship between two
variables.
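For instance (a sketch: the symmetric three-point distribution for X below is made up and not part of the lecture), Y = X² can be an exact function of X and still be uncorrelated with X:

# X takes the values -1, 0, 1 with probability 1/3 each, and Y = X^2 exactly.
xs = [-1, 0, 1]
p = 1/3
ex = sum(x * p for x in xs)                       # EX = 0
ey = sum(x ** 2 * p for x in xs)                  # EY = 2/3
cov = sum((x - ex) * (x ** 2 - ey) * p for x in xs)
print(cov)  # ≈ 0: Y is an exact (nonlinear) function of X, yet Cov(X, Y) = 0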
The proof of Theorem 5 requires another result: the Cauchy-Schwarz inequality, which states that
(Cov(X, Y))² ≤ Var(X)Var(Y), with equality if and only if Y = a + bX with probability one for
some constants a and b. To prove it, define
U = Y − \frac{Cov(X, Y)}{Var(X)} X = Y − βX,    (1)
where
β = Cov(X, Y)/Var(X).
By the results of Theorem 3(e) and (h),
Var(U) = Var(Y − βX) = Var(Y) − 2β Cov(X, Y) + β² Var(X) = Var(Y) − (Cov(X, Y))²/Var(X).    (2)
Since a variance is always nonnegative, (2) implies that
Var(Y) − (Cov(X, Y))²/Var(X) ≥ 0,
or
(Cov(X, Y))² ≤ Var(X)Var(Y).    (3)
Taking the square root of both sides of the inequality in (3), we obtain
|Cov(X, Y)| ≤ \sqrt{Var(X)Var(Y)},
with equality if and only if Var(U) = 0. However, Var(U) = 0 implies that U is not random, i.e. U = a
for some constant a. From the definition of U in (1) we obtain
a = Y − βX, or Y = a + βX.
Hence, the second part of the lemma holds with b = β = Cov(X, Y)/Var(X).
The result of Theorem 5 follows immediately from the Cauchy-Schwarz inequality. The
latter implies that
−\sqrt{Var(X)Var(Y)} ≤ Cov(X, Y) ≤ \sqrt{Var(X)Var(Y)}.
Dividing all sides of the inequality by \sqrt{Var(X)Var(Y)}, we obtain
−1 ≤ ρX,Y ≤ 1.
Returning to the earnings/price example, the variance of the earnings per share is
(10 − 15)² × 1/2 + (20 − 15)² × 1/2 = 25,
and the variance of the price is
(100 − 225)² × 1/3 + (250 − 225)² × 1/2 + (400 − 225)² × 1/6 = 10625.
We previously have computed that the covariance between the earnings and the price is 375.
Hence, the correlation is equal to
375 / \sqrt{25 × 10625} ≈ 0.73.
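As a numerical check (a sketch reusing the joint probabilities from the covariance example):

pmf = {(10, 100): 2/6, (10, 250): 1/6, (20, 250): 2/6, (20, 400): 1/6}
mean_e = sum(e * p for (e, s), p in pmf.items())
mean_s = sum(s * p for (e, s), p in pmf.items())
var_e = sum((e - mean_e) ** 2 * p for (e, s), p in pmf.items())          # 25
var_s = sum((s - mean_s) ** 2 * p for (e, s), p in pmf.items())          # 10625
cov = sum((e - mean_e) * (s - mean_s) * p for (e, s), p in pmf.items())  # 375
print(cov / (var_e * var_s) ** 0.5)  # ≈ 0.7276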
The meaning and implications of uncorrelatedness are discussed in the following section.
4 Regression
Suppose that we want to approximate Y by a linear function of X, a + bX, where a and b are
constants. Define the approximation error as
ε = Y − a − bX.
Unless the correlation coefficient is exactly 1 or −1, it is impossible to find constants a and
b so that ε is zero with probability one. Nevertheless, let’s try to find a and b so that the
variance of the approximation error is as small as possible. Note that small variability of the
approximation error corresponds to a more accurate approximation. Hence, we are interested
in solving the following problem:
\min_{a,b} E(Y − a − bX)².
The solution, which we denote (α, β), is given by
β = Cov(X, Y)/Var(X),
α = (EY) − β(EX).    (4)
Proof. We can find the least squares coefficients by solving the first-order conditions for the
least squares problem. Suppose that we can change the order of differentiation and integration.
∂/∂b E(Y − a − bX)² = E[∂/∂b (Y − a − bX)²] = −2 E[(Y − a − bX)X].
Setting this derivative to zero at the solution (a, b) = (α, β) gives the first-order condition
E[(Y − α − βX)X] = 0.    (5)
Similarly,
∂/∂a E(Y − a − bX)² = E[∂/∂a (Y − a − bX)²] = −2 E(Y − a − bX).
Setting this derivative to zero at (α, β) gives
EY − α − βEX = 0,
which proves (4).
Next, substitute the expression for α into (5). We have
0 = E [(Y − α − βX) X]
= E [(Y − (EY − βEX) − βX) X]
= E [((Y − EY ) − β(X − EX)) X]
= E [(Y − EY )X − β(X − EX)X]
= E [(Y − EY )X] − βE [(X − EX)X] .
Hence,
β = \frac{E[(Y − EY)X]}{E[(X − EX)X]}.
However, E[(Y − EY)X] = E[(Y − EY)(X − EX)] = Cov(X, Y), since E(Y − EY) = 0. Also,
E[(X − EX)X] = E[(X − EX)(X − EX)] = Var(X). Hence, β = Cov(X, Y)/Var(X), which completes the proof.
Remark. 1. The regression line goes through the point given by the expected values of X and
Y , since by (4),
EY = α + βEX.
2. The first-order condition with respect to a implies that
Eε = E(Y − α − βX) = 0.
Hence, the random approximation error has zero mean. The first-order condition (5) implies
that the approximation error is uncorrelated with X:
0 = EεX
= Cov(ε, X),
where the second line follows from Cov(ε, X) = EεX − (Eε)(EX) = EεX, since the mean of
ε is zero.
3. When X and Y are uncorrelated, ρX,Y = 0 and Cov(X, Y ) = 0, and therefore,
β = Cov(X, Y)/Var(X) = 0.
In that case, the best linear approximation to the relationship between X and Y is a flat
line (with zero slope). Note that uncorrelatedness does not necessarily mean that X and Y
are unrelated. It is possible that the relationship between X and Y can be described by a
non-trivial nonlinear function (and therefore the two random variables are related). However,
the best linear approximation to this relationship is a flat line.
4. The relationship between positively correlated variables is best approximated by a
regression line with a positive slope. The relationship between negatively correlated variables
is best approximated by a regression line with a negative slope.
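The least squares formulas in (4) can be illustrated on the earnings/price example (a sketch; regressing the price on the earnings per share is a choice made here for illustration). The code also checks remark 2 numerically:

# Regress the price (y) on the earnings per share (x) using formula (4).
pmf = {(10, 100): 2/6, (10, 250): 1/6, (20, 250): 2/6, (20, 400): 1/6}
ex = sum(x * p for (x, y), p in pmf.items())
ey = sum(y * p for (x, y), p in pmf.items())
var_x = sum((x - ex) ** 2 * p for (x, y), p in pmf.items())
cov = sum((x - ex) * (y - ey) * p for (x, y), p in pmf.items())
beta = cov / var_x              # 375 / 25 = 15
alpha = ey - beta * ex          # 225 - 15 * 15 = 0
# The approximation error eps = y - alpha - beta * x has zero mean and is
# uncorrelated with x, as stated in remark 2.
mean_eps = sum((y - alpha - beta * x) * p for (x, y), p in pmf.items())
cov_eps_x = sum((y - alpha - beta * x) * (x - ex) * p for (x, y), p in pmf.items())
print(beta, alpha, mean_eps, cov_eps_x)  # 15, 0, ≈0, ≈0 (up to floating-point rounding)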
Finally, consider the relationship between independence and uncorrelatedness. Suppose that X
and Y are independent continuous random variables, so that the joint PDF factors as
fX,Y(x, y) = fX(x)fY(y). In that case,
EXY = \int \int x y fX(x) fY(y) dy dx
= \int x fX(x) ( \int y fY(y) dy ) dx
= \int x fX(x) (EY) dx
= (EY) \int x fX(x) dx
= (EY)(EX).
Thus, under independence the expected value of a product of random variables is equal to the
product of expectations. In particular, this implies that under independence,
Cov(X, Y) = EXY − (EX)(EY) = 0,
i.e. independent random variables are uncorrelated.
The same results also hold for discrete random variables. The proof in the discrete case is
identical after replacing integrals with sums and PDFs with PMFs.
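A short numerical illustration of this fact (a sketch with two made-up independent discrete distributions; the values are not from the lecture):

# Independent X and Y: the joint PMF is the product of the marginal PMFs.
px = {1: 0.3, 2: 0.7}
py = {0: 0.5, 4: 0.5}
pmf = {(x, y): px[x] * py[y] for x in px for y in py}

ex = sum(x * p for x, p in px.items())
ey = sum(y * p for y, p in py.items())
exy = sum(x * y * q for (x, y), q in pmf.items())
# E(XY), (EX)(EY), and their difference, which is the covariance (≈ 0).
print(exy, ex * ey, exy - ex * ey)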