Week 2 - Solutions
1. We show the claim only in the case that X and Y are continuous random variables.
   Let f_{X,Y} be the joint density of X and Y. Then

   E(aX + bY) = \int\int (ax + by) f_{X,Y}(x, y) \, dx \, dy
              = a \int\int x f_{X,Y}(x, y) \, dx \, dy + b \int\int y f_{X,Y}(x, y) \, dx \, dy
              = a \int x f_X(x) \, dx + b \int y f_Y(y) \, dy
              = a E(X) + b E(Y).
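Linearity of expectation can also be seen numerically; the following sketch (distributions and constants are arbitrary illustrative choices, not part of the problem) checks that the sample mean is exactly linear and that the estimate matches the theoretical value a E(X) + b E(Y).

```python
import numpy as np

# Monte Carlo sketch of E(aX + bY) = a E(X) + b E(Y).
# The distributions below are arbitrary: X ~ N(1, 4), Y ~ Exp with mean 0.5.
rng = np.random.default_rng(0)
a, b = 2.0, -3.0
X = rng.normal(loc=1.0, scale=2.0, size=1_000_000)   # E(X) = 1
Y = rng.exponential(scale=0.5, size=1_000_000)       # E(Y) = 0.5

lhs = np.mean(a * X + b * Y)            # sample estimate of E(aX + bY)
rhs = a * np.mean(X) + b * np.mean(Y)   # a E(X) + b E(Y) from the same sample
# On the same sample the two sides agree exactly (the mean is linear),
# and both are close to the theoretical value a*1 + b*0.5 = 0.5.
```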
2. (a) From the definition of the variance,

       Var(aX) = E((aX - E(aX))^2)
               = a^2 E((X - E(X))^2)   by linearity of the expectation
               = a^2 Var(X).
   (b) The first claim is obvious; we show only the second. If X and Y are independent,
       then

       E[(X - E(X))(Y - E(Y))] = E[X - E(X)] \, E[Y - E(Y)] = 0.
   (c) From the definition of the covariance function, we have

       Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
                 = E[(Y - E(Y))(X - E(X))]
                 = Cov(Y, X).

       To prove linearity in the first entry,

       Cov(aX + bY, Z) = E[((aX + bY) - E(aX + bY))(Z - E(Z))]
                       = a E[(X - E(X))(Z - E(Z))] + b E[(Y - E(Y))(Z - E(Z))]
                       = a Cov(X, Z) + b Cov(Y, Z).

       Linearity in the second entry follows from the fact that the covariance
       function is symmetric.
       Finally, for a, b \in R and random variables X and Y,

       Var(aX + bY) = Cov(aX + bY, aX + bY)
                    = Cov(aX, aX + bY) + Cov(bY, aX + bY)                      linearity in the first entry
                    = Cov(aX, aX) + Cov(aX, bY) + Cov(bY, aX) + Cov(bY, bY)   linearity in the second entry
                    = Var(aX) + 2 Cov(aX, bY) + Var(bY)
                    = a^2 Var(X) + 2ab Cov(X, Y) + b^2 Var(Y).
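The variance identity holds exactly for sample moments as well (sample variance and covariance are built from the same bilinear form), so it can be checked deterministically on arbitrary data. A minimal NumPy sketch, with illustrative data:

```python
import numpy as np

# Check Var(aX + bY) = a^2 Var(X) + 2ab Cov(X, Y) + b^2 Var(Y)
# on sample moments; the identity is exact, not approximate.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(size=500)   # deliberately correlated with x
a, b = 1.5, -2.0

lhs = np.var(a * x + b * y, ddof=1)
cov = np.cov(x, y, ddof=1)           # 2x2 sample covariance matrix
rhs = a**2 * cov[0, 0] + 2 * a * b * cov[0, 1] + b**2 * cov[1, 1]
```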
3. In the simple linear regression model,

   SS_{reg} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2
            = \sum_{i=1}^n (b_0 + b_1 x_i - (b_0 + b_1 \bar{x}))^2
            = \sum_{i=1}^n b_1^2 (x_i - \bar{x})^2
            = b_1^2 S_{xx},

   where in the second equality we have used the facts that \hat{y}_i = b_0 + b_1 x_i
   and \bar{y} = b_0 + b_1 \bar{x}. One can conclude by using the fact that
   b_1 = S_{xy} / S_{xx}, so that SS_{reg} = S_{xy}^2 / S_{xx}.
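The three expressions for SS_reg agree exactly, which is easy to confirm on simulated data. A small sketch (the data-generating choices are arbitrary):

```python
import numpy as np

# Check SS_reg = sum (yhat_i - ybar)^2 = b1^2 * Sxx = Sxy^2 / Sxx
# for a least-squares fit on arbitrary simulated data.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x
ss_reg = np.sum((yhat - ybar) ** 2)
```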
4. We compute directly,

   S_{xx} = \sum_{i=1}^n (x_i - \bar{x}) x_i
          = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x} + \bar{x})
          = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x}) + \bar{x} \sum_{i=1}^n (x_i - \bar{x})
          = \sum_{i=1}^n (x_i - \bar{x})^2,

   where the second term is zero since

   \bar{x} \sum_{i=1}^n (x_i - \bar{x}) = \bar{x} \left( \sum_{i=1}^n x_i - n \bar{x} \right) = 0.

   By a similar argument,

   S_{xy} = \sum_{i=1}^n (x_i - \bar{x}) y_i
          = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y} + \bar{y})
          = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) + \bar{y} \sum_{i=1}^n (x_i - \bar{x})
          = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).

   On the other hand,

   S_{xy} = \sum_{i=1}^n (x_i - \bar{x}) y_i
          = \sum_{i=1}^n x_i y_i - \bar{x} \sum_{i=1}^n y_i
          = \sum_{i=1}^n x_i y_i - \frac{1}{n} \sum_{i=1}^n x_i \sum_{i=1}^n y_i.
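All of these forms of S_xx and S_xy are exact algebraic identities, so they agree to floating-point precision on any data set. A quick sketch with arbitrary data:

```python
import numpy as np

# Equivalent forms of Sxx and Sxy derived above, checked on arbitrary data.
rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = rng.normal(size=40)
xbar, ybar = x.mean(), y.mean()

Sxx_1 = np.sum((x - xbar) * x)
Sxx_2 = np.sum((x - xbar) ** 2)

Sxy_1 = np.sum((x - xbar) * y)
Sxy_2 = np.sum((x - xbar) * (y - ybar))
Sxy_3 = np.sum(x * y) - np.sum(x) * np.sum(y) / len(x)
```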
5. Compute Var(b_0):

   Var(b_0) = Var(\bar{y} - b_1 \bar{x})
            = Var(\bar{y}) + Var(b_1 \bar{x}) - 2 Cov(\bar{y}, b_1 \bar{x})
            = \frac{\sigma^2}{n} + \bar{x}^2 \frac{\sigma^2}{S_{xx}},

   where we have made use of the following:

   Var(\bar{y}) = Var\left( \frac{1}{n} \sum_{i=1}^n y_i \right)
                = \frac{1}{n^2} \sum_{i=1}^n Var(y_i)   by independence of the y_i
                = \frac{\sigma^2}{n}   using the fact that Var(y_i) = \sigma^2,

   and

   Var(b_1 \bar{x}) = \bar{x}^2 Var(b_1)
                    = \bar{x}^2 \frac{1}{S_{xx}^2} \sum_{i=1}^n (x_i - \bar{x})^2 Var(y_i)   using the fact that b_1 = \frac{S_{xy}}{S_{xx}} = \frac{1}{S_{xx}} \sum_{i=1}^n (x_i - \bar{x}) y_i
                    = \bar{x}^2 \frac{\sigma^2}{S_{xx}}.

   To compute the covariance,

   Cov(\bar{y}, b_1 \bar{x}) = \frac{\bar{x}}{S_{xx}} Cov(\bar{y}, S_{xy})
                             = \frac{\bar{x}}{n S_{xx}} \sum_{i=1}^n Cov(y_i, S_{xy})   by linearity in the first component
                             = \frac{\bar{x}}{n S_{xx}} \sum_{i=1}^n Cov\left( y_i, \sum_{j=1}^n (x_j - \bar{x}) y_j \right)
                             = \frac{\bar{x}}{n S_{xx}} \sum_{i=1}^n \sum_{j=1}^n (x_j - \bar{x}) Cov(y_i, y_j)   by linearity in the second component
                             = \frac{\bar{x} \sigma^2}{n S_{xx}} \sum_{i=1}^n (x_i - \bar{x})   since Cov(y_i, y_j) = \sigma^2 when j = i and 0 otherwise
                             = 0.
6. Compute Cov(b_0, b_1):

   Cov(b_0, b_1) = Cov(\bar{y} - b_1 \bar{x}, b_1)
                 = Cov(\bar{y}, b_1) - Cov(b_1 \bar{x}, b_1)
                 = -\bar{x} Cov(b_1, b_1)   using the result of the previous question, Cov(\bar{y}, b_1) = 0
                 = -\bar{x} \frac{\sigma^2}{S_{xx}}.
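Both formulas, Var(b_0) = sigma^2 (1/n + xbar^2/S_xx) and Cov(b_0, b_1) = -xbar sigma^2 / S_xx, can be checked by simulation. In this sketch the design points, parameters, and tolerances are arbitrary choices, and the comparison is only approximate (Monte Carlo):

```python
import numpy as np

# Monte Carlo check of Var(b0) and Cov(b0, b1) in simple linear regression.
rng = np.random.default_rng(4)
n, sigma, beta0, beta1 = 20, 1.0, 2.0, -1.0
x = np.linspace(0, 5, n)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

b0s, b1s = [], []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(scale=sigma, size=n)
    b1 = np.sum((x - xbar) * y) / Sxx
    b0 = y.mean() - b1 * xbar
    b0s.append(b0)
    b1s.append(b1)

var_b0 = np.var(b0s)
cov_b0_b1 = np.cov(b0s, b1s)[0, 1]
theory_var_b0 = sigma**2 * (1 / n + xbar**2 / Sxx)
theory_cov = -xbar * sigma**2 / Sxx
# var_b0 and cov_b0_b1 should be within a few percent of the theory values.
```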
7. Let X ~ N(0, 1) and Y ~ \chi^2(n) be independent; compute the density of
   T := X / \sqrt{Y/n}.

   P(T \le t) = P(X \le t \sqrt{Y/n})
              = \int\int_{x \le t\sqrt{y/n}} f_{X,Y}(x, y) \, dx \, dy
              = \int\int_{x \le t\sqrt{y/n}} f_X(x) f_Y(y) \, dx \, dy
              = \int_0^\infty f_Y(y) \int_{-\infty}^{t\sqrt{y/n}} f_X(x) \, dx \, dy
              = \int_0^\infty f_Y(y) F_X(t\sqrt{y/n}) \, dy,

   assuming that we can differentiate under the integral sign to obtain

   f_T(t) = \int_0^\infty f_Y(y) \sqrt{y/n} \, f_X(t\sqrt{y/n}) \, dy
          = \int_0^\infty \frac{1}{2^{n/2} \Gamma(n/2)} y^{n/2 - 1} e^{-y/2} \sqrt{\frac{y}{n}} \frac{1}{\sqrt{2\pi}} e^{-t^2 y / (2n)} \, dy
          = \frac{1}{\sqrt{2\pi n} \, 2^{n/2} \Gamma(n/2)} \int_0^\infty y^{\frac{n+1}{2} - 1} e^{-\frac{1}{2}(\frac{t^2}{n} + 1) y} \, dy.

   Change variables by setting \frac{1}{2}(\frac{t^2}{n} + 1) y = z. Using the fact that
   \Gamma(1/2) = \sqrt{\pi} and the definition of the Gamma function, we obtain

   f_T(t) = \frac{1}{\sqrt{2\pi n} \, 2^{n/2} \Gamma(n/2)} \left( \frac{2}{\frac{t^2}{n} + 1} \right)^{\frac{n+1}{2}} \int_0^\infty z^{\frac{n+1}{2} - 1} e^{-z} \, dz
          = \frac{2^{\frac{n+1}{2}} \Gamma(\frac{n+1}{2})}{\sqrt{2\pi n} \, 2^{n/2} \Gamma(n/2)} \left( \frac{t^2}{n} + 1 \right)^{-\frac{n+1}{2}}
          = \frac{\Gamma(\frac{n+1}{2})}{\sqrt{\pi n} \, \Gamma(n/2)} \left( \frac{t^2}{n} + 1 \right)^{-\frac{n+1}{2}}
          = \frac{1}{\sqrt{n} \, B(\frac{1}{2}, \frac{n}{2})} \left( \frac{t^2}{n} + 1 \right)^{-\frac{n+1}{2}},

   where the last equality follows from the fact that
   B(\frac{1}{2}, \frac{n}{2}) = \frac{\Gamma(\frac{1}{2}) \Gamma(\frac{n}{2})}{\Gamma(\frac{n+1}{2})}.
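A density derived this way should integrate to 1; coding the final formula directly and integrating it numerically is a useful sanity check. The sketch below uses only the standard library (the integration routine and step counts are arbitrary choices):

```python
import math

def t_density(t, n):
    """Density of Student's t with n degrees of freedom, from the final formula."""
    beta = math.gamma(0.5) * math.gamma(n / 2) / math.gamma((n + 1) / 2)
    return (1 + t * t / n) ** (-(n + 1) / 2) / (math.sqrt(n) * beta)

def integrate(f, lo, hi, steps=200_000):
    # Plain midpoint rule; adequate for this smooth, rapidly decaying density.
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# The truncation to [-50, 50] loses only a negligible tail for n = 5.
total = integrate(lambda t: t_density(t, 5), -50.0, 50.0)
```

For n = 1 the formula reduces to the Cauchy density, f_T(0) = 1/pi, which gives a second point check.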
8. It is enough to notice that x^T A^T A x = \|Ax\|^2 \ge 0 and x^T A A^T x = \|A^T x\|^2 \ge 0.
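The observation that x^T A^T A x = ||Ax||^2 is easy to confirm numerically; a small sketch with random matrices and vectors (all choices arbitrary):

```python
import numpy as np

# Check that x^T A^T A x = ||Ax||^2 >= 0 for many random x.
rng = np.random.default_rng(5)
A = rng.normal(size=(4, 6))

ok = True
for _ in range(100):
    x = rng.normal(size=6)
    q = x @ A.T @ A @ x                 # quadratic form x^T A^T A x
    ok = ok and q >= -1e-12             # nonnegative up to rounding
    ok = ok and np.isclose(q, np.dot(A @ x, A @ x))  # equals ||Ax||^2
```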
Week 4 - Solutions
1. For a random vector X and a constant matrix C, from the definition of the covariance
   matrix for random vectors,

   Var(CX) = E((CX - E(CX))(CX - E(CX))^T)
           = E(C(X - E(X)) [C(X - E(X))]^T)
           = C E((X - E(X))(X - E(X))^T) C^T
           = C Var(X) C^T.

   Recall that b = (X^T X)^{-1} X^T y. Therefore, by using the above formula and the fact
   that X^T X is a symmetric matrix,

   Var(b) = Var((X^T X)^{-1} X^T y)
          = (X^T X)^{-1} X^T Var(y) X (X^T X)^{-1}
          = \sigma^2 (X^T X)^{-1} X^T X (X^T X)^{-1}
          = \sigma^2 (X^T X)^{-1}.
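The collapse of the sandwich (X^T X)^{-1} X^T (sigma^2 I) X (X^T X)^{-1} to sigma^2 (X^T X)^{-1} is a pure matrix identity, so it can be verified exactly on any full-rank design matrix. A sketch with an arbitrary X:

```python
import numpy as np

# Verify (X^T X)^{-1} X^T (sigma^2 I) X (X^T X)^{-1} = sigma^2 (X^T X)^{-1}.
rng = np.random.default_rng(6)
X = rng.normal(size=(30, 3))           # arbitrary full-rank design matrix
sigma2 = 2.5

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = XtX_inv @ X.T @ (sigma2 * np.eye(30)) @ X @ XtX_inv
```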
2. Given an n \times n matrix X, tr(X) = \sum_{i=1}^n X_{ii}.
   (a) Show tr(cX) = c \, tr(X).

       tr(cX) = \sum_{i=1}^n c X_{ii} = c \sum_{i=1}^n X_{ii} = c \, tr(X).
   (b) Show tr(X + Y) = tr(X) + tr(Y).

       tr(X + Y) = \sum_{i=1}^n [X_{ii} + Y_{ii}] = \sum_{i=1}^n X_{ii} + \sum_{i=1}^n Y_{ii} = tr(X) + tr(Y).
   (c) Show tr(XY) = tr(YX).

       tr(XY) = \sum_{i=1}^n \sum_{j=1}^n X_{ij} Y_{ji} = \sum_{j=1}^n \sum_{i=1}^n Y_{ji} X_{ij} = tr(YX).
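All three trace identities are exact and quick to confirm on random matrices; a minimal sketch:

```python
import numpy as np

# The three trace identities above, checked on arbitrary random matrices.
rng = np.random.default_rng(7)
X = rng.normal(size=(5, 5))
Y = rng.normal(size=(5, 5))
c = 3.7

t1 = np.isclose(np.trace(c * X), c * np.trace(X))
t2 = np.isclose(np.trace(X + Y), np.trace(X) + np.trace(Y))
t3 = np.isclose(np.trace(X @ Y), np.trace(Y @ X))   # XY != YX in general, traces agree
```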
3. Prove the formula E(y^T A y) = tr(AV) + \mu^T A \mu, where V := Var(y) and \mu := E(y).

   E(y^T A y) = \sum_{i=1}^n \sum_{j=1}^n E[y_i A_{ij} y_j]
              = \sum_{i=1}^n \sum_{j=1}^n A_{ij} [Cov(y_i, y_j) + E(y_i) E(y_j)]
              = \sum_{i=1}^n \sum_{j=1}^n A_{ij} V_{ji} + \sum_{i=1}^n \sum_{j=1}^n E(y_i) A_{ij} E(y_j)
              = tr(AV) + \mu^T A \mu,

   where in the second equality we have used the fact that
   Cov(y_i, y_j) = E(y_i y_j) - E(y_i) E(y_j).
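The quadratic-form formula can be checked by simulation. In the sketch below, the mean vector, covariance factor, and matrix A are arbitrary illustrative choices, and the Monte Carlo comparison uses a loose tolerance:

```python
import numpy as np

# Monte Carlo check of E(y^T A y) = tr(A V) + mu^T A mu.
rng = np.random.default_rng(8)
mu = np.array([1.0, -2.0, 0.5])
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [-0.3, 0.2, 1.0]])
V = L @ L.T                              # a valid covariance matrix
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0, 3.0]])          # A need not be symmetric

ys = mu + rng.normal(size=(200_000, 3)) @ L.T   # rows ~ N(mu, V)
quad = np.einsum("ni,ij,nj->n", ys, A, ys)      # y^T A y for each sample
mc = quad.mean()
theory = np.trace(A @ V) + mu @ A @ mu
```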
4. To find \hat{y}, we need to solve the normal equations X^T X b = X^T y for the given
   design matrix X and response y, and then set \hat{y} = Xb. Please enjoy yourself by
   doing row reduction.
5. Suppose v_1, \ldots, v_k is an orthogonal basis of V. Since \hat{y} is assumed to be
   in V, then

   \hat{y} = \sum_{i=1}^k c_i v_i

   for some c_i \in R. It is then sufficient to find the c_i. To do that, we multiply
   both sides by v_j^T and obtain

   v_j^T \hat{y} = \sum_{i=1}^k c_i v_j^T v_i = c_j \|v_j\|^2,

   since v_j^T v_i = 0 if i \ne j.
6. To show that c = (c_1, \ldots, c_k), where c_i = y^T x_i / \|x_i\|^2, solves the normal
   equations, it is sufficient to notice that since X = (x_1, \ldots, x_k) is an orthogonal
   basis for V,

   X^T X = diag(\|x_1\|^2, \ldots, \|x_k\|^2),

   and by substituting c = (c_1, \ldots, c_k) into the normal equations X^T X c = X^T y,
   we see that c satisfies them.
7. The projection of y onto V is given by

   \hat{y} = \sum_{i=1}^k \frac{y^T x_i}{\|x_i\|^2} x_i.
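The orthogonal-basis projection formula can be compared against the generic least-squares projection. A small sketch with arbitrary orthogonal vectors (the data here is illustrative, not from the solutions):

```python
import numpy as np

# Projection of y onto span(x1, x2) via yhat = sum (y.x_i / ||x_i||^2) x_i,
# compared against the least-squares projection X c with c from lstsq.
y = np.array([3.0, 1.0, 4.0, 1.0])
x1 = np.array([1.0, 1.0, 1.0, 1.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0])   # orthogonal to x1 by construction

yhat = sum((y @ xi) / (xi @ xi) * xi for xi in (x1, x2))

Xmat = np.column_stack([x1, x2])
c_ls, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
yhat_ls = Xmat @ c_ls
```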
Week 6 - Solutions
1. For vectors x and y, by using the fact that x^T y = y^T x, we have

   \|x - y\|^2 + \|x + y\|^2 = (x - y)^T (x - y) + (x + y)^T (x + y)
                             = x^T x - 2 x^T y + y^T y + x^T x + 2 x^T y + y^T y
                             = 2\|x\|^2 + 2\|y\|^2.
2. Given the vectors y and x, add and subtract \hat{y} in y - cx:

   y - cx = y - cx - \hat{y} + \hat{y}
          = (y - \hat{y}) + (\hat{y} - cx).

   The vector y - \hat{y} is perpendicular to \hat{y} - cx = (\hat{b} - c)x. Therefore,
   from Pythagoras' theorem,

   \|y - cx\|^2 = \|y - \hat{y}\|^2 + \|\hat{y} - cx\|^2 \ge \|y - \hat{y}\|^2.

   This shows that the distance from y to \hat{y} is the shortest amongst the distances
   from y to all vectors of the form cx for c \in R.
3. Expand \|y - cx\|^2 = (y - cx)^T (y - cx) to obtain

   \|y - cx\|^2 = \|y\|^2 - 2c \, y^T x + c^2 \|x\|^2,

   which is a quadratic in c, and from the previous question we know it is nonnegative
   for every c. Therefore its discriminant is less than or equal to zero. That is,

   4|y^T x|^2 - 4\|x\|^2 \|y\|^2 \le 0 \implies |y^T x|^2 \le \|x\|^2 \|y\|^2.
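This is the Cauchy-Schwarz inequality; it is cheap to spot-check on random vectors. A minimal sketch:

```python
import numpy as np

# Spot-check |y^T x|^2 <= ||x||^2 ||y||^2 on random vector pairs.
rng = np.random.default_rng(9)
holds = all(
    (x @ y) ** 2 <= (x @ x) * (y @ y) + 1e-12   # small slack for rounding
    for x, y in (rng.normal(size=(2, 10)) for _ in range(100))
)
```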
4. (a) Compute the moment generating function of a \chi^2_k random variable. By making
       the substitution (\frac{1}{2} - t)x = y (valid for t < \frac{1}{2}),

       M(t) = \int_0^\infty e^{tx} \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \, dx
            = \frac{1}{2^{k/2} \Gamma(k/2)} \int_0^\infty x^{k/2 - 1} e^{-(\frac{1}{2} - t)x} \, dx
            = \frac{1}{2^{k/2} \Gamma(k/2)} \int_0^\infty ((\tfrac{1}{2} - t)^{-1} y)^{k/2 - 1} e^{-y} (\tfrac{1}{2} - t)^{-1} \, dy
            = \frac{(\frac{1}{2} - t)^{-k/2}}{2^{k/2} \Gamma(k/2)} \int_0^\infty y^{k/2 - 1} e^{-y} \, dy
            = (\tfrac{1}{2} - t)^{-k/2} \, 2^{-k/2}
            = (1 - 2t)^{-k/2}.

       To compute the expectation,

       M'(t)\big|_{t=0} = \frac{d}{dt} (1 - 2t)^{-k/2} \Big|_{t=0} = k (1 - 2t)^{-k/2 - 1} \Big|_{t=0} = k.
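Since a chi-square with k degrees of freedom is a sum of k squared standard normals, the MGF and its derivative at 0 can be checked by simulation. The sketch below uses arbitrary k and t (with t < 1/2) and loose Monte Carlo tolerances:

```python
import numpy as np

# Monte Carlo check of the chi-square MGF: E(e^{tY}) ~ (1 - 2t)^{-k/2} for t < 1/2,
# where Y is a sum of k squared standard normals.
rng = np.random.default_rng(10)
k, t = 4, 0.2
Y = np.sum(rng.normal(size=(400_000, k)) ** 2, axis=1)   # Y ~ chi^2_k

mgf_mc = np.mean(np.exp(t * Y))
mgf_theory = (1 - 2 * t) ** (-k / 2)
# M'(0) = E(Y) = k is checked via the sample mean of Y.
```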
   (b) Recall that SS_{reg} = b_1^2 S_{xx} and b_1 ~ N(\beta_1, \sigma^2 S_{xx}^{-1}).
       Therefore, under H_0, we have \frac{1}{\sigma^2} b_1^2 S_{xx} ~ \chi^2_1, and so

       E\left( \frac{1}{\sigma^2} SS_{reg} \right) = E\left( \frac{1}{\sigma^2} b_1^2 S_{xx} \right) = 1.
   (c) Computing E(SS_{reg}) in general:

       E(SS_{reg}) = E(b_1^2 S_{xx})
                   = S_{xx} E(b_1^2)
                   = [Var(b_1) + E(b_1)^2] S_{xx}
                   = \sigma^2 S_{xx}^{-1} S_{xx} + \beta_1^2 S_{xx}
                   = \sigma^2 + \beta_1^2 S_{xx}.

       We can see that E(SS_{reg}) under the null is smaller, since \beta_1^2 S_{xx} \ge 0.
   (d) Recall that \frac{1}{\sigma^2} SS_{res} = \frac{(n - p)\hat{\sigma}^2}{\sigma^2} ~ \chi^2_{n-p}.
       In the simple linear regression model p = 2. Therefore

       E(SS_{res}) = \sigma^2 (n - 2).

       Using the fundamental identity, we have

       E(SS_{total}) = E(SS_{reg}) + E(SS_{res})
                     = \sigma^2 (n - 2) + \sigma^2 + \beta_1^2 S_{xx}
                     = \sigma^2 (n - 1) + \beta_1^2 S_{xx}.

       Under the null \beta_1 = 0, E(SS_{total}) = \sigma^2 (n - 1).
5. Properties of e.

   (a) In the simple linear regression model,

       E(e_i) = E(y_i - \hat{y}_i) = E(\beta_0 + \beta_1 x_i - b_0 - b_1 x_i),

       which is equal to zero, since E(b_0) = \beta_0 and E(b_1) = \beta_1.

       Alternatively, to show that E(e) = E(y - Xb) = 0, it is enough to use the fact
       that b is an unbiased estimator of \beta and write

       E(y - Xb) = E(y) - X E(b) = X\beta - X\beta = 0.
   (b) Xb = X(X^T X)^{-1} X^T y = Hy, where H := X(X^T X)^{-1} X^T.
   (c) The matrix H is an n \times n matrix. The matrix H is symmetric, since

       H^T = (X(X^T X)^{-1} X^T)^T = (X^T)^T ((X^T X)^{-1})^T X^T = X (X^T X)^{-1} X^T = H,

       where we have used that (X^T X)^{-1} is symmetric because X^T X is.
   (d) Computing the variance of e = (I - H)y:

       Var(e) = Var((I - H)y)
              = (I - H) Var(y) (I - H)^T
              = \sigma^2 (I - H)(I - H)^T
              = \sigma^2 (I - H),

       where the last equality holds since I - H is symmetric and idempotent.
   (e) From the above, we can write that

       Var(e_i) = \sigma^2 (1 - H_{ii}),
       Cov(e_i, e_j) = -\sigma^2 H_{ij}   for i \ne j.
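The properties of H used in parts (b)-(e), symmetry, idempotence, and the collapse of (I - H)(I - H)^T, are exact matrix identities and can be verified on any full-rank design matrix. A closing sketch with an arbitrary X:

```python
import numpy as np

# Check the hat-matrix properties: H symmetric, H idempotent, and
# (I - H)(I - H)^T = I - H, so Var(e) = sigma^2 (I - H).
rng = np.random.default_rng(11)
X = rng.normal(size=(12, 3))              # arbitrary full-rank design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(12)

sym = np.allclose(H, H.T)
idem = np.allclose(H @ H, H)
var_e_over_sigma2 = (I - H) @ (I - H).T   # Var(e)/sigma^2 from the sandwich formula
collapses = np.allclose(var_e_over_sigma2, I - H)
```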