Final 2015 Solutions
Instructions:
• Please start answering each question on a new page of the answer booklet.
• You are allowed to carry the textbook, your own notes, and other course-related material with you. Electronic reading devices [including Kindles, laptops, iPads, etc.] are allowed, provided they are used solely for reading PDF files already stored on them and not for any other form of communication or information retrieval.
• You are required to provide detailed explanations of how you arrived at your answers.
• You can use previous parts of a problem even if you did not solve them.
• As throughout the course, entropy (H) and Mutual Information (I) are specified in
bits.
• Throughout the exam ‘prefix code’ refers to a variable length code satisfying the prefix
condition.
• Good Luck!
1. Three Shannon Codes (25 points)
Let {Ui }i≥1 be a stationary finite-alphabet source whose alphabet size is r. Note that
the stationarity property implies that P (ui ), P (ui |ui−1 ) do not depend on i. Throughout
this problem, assume that − log P (ui ) and − log P (ui |ui−1 ) are integers for all (ui , ui−1 ).
Recall the definition of a Shannon code given in the lecture. Your TAs decided to compress this source in a lossless fashion using Shannon coding. However, each of them had a different idea:
• Idoia: use codeword length − log P(u) for each symbol u.
• Kartik: use codeword length − log P(u_i, u_{i+1}) for each successive non-overlapping pair of symbols (u_i, u_{i+1}).
• Jiantao: use codeword length − log P(u_1) for the first symbol, and codeword length − log P(u_i | u_{i−1}) for each subsequent symbol.
In this problem, you will investigate which of the three schemes is best for a general stationary source.
(a) (10 points) If the source is memoryless (i.e., i.i.d.), compare the expected codeword length per symbol, i.e., \frac{1}{n} E[l(U^n)], of each scheme, assuming n > 2 is even.
(b) (15 points) Compare the schemes again, for the case where the source is no longer
memoryless and, in particular, is such that Ui−1 and Ui are not independent.
Solution:
We will first analyze each of the coding schemes for general stationary sources.
Idoia’s scheme: Use codeword length − log P (u) for a symbol u.
\bar{l}_1 = \frac{1}{n} E[l_1(U^n)]
          = \frac{1}{n} E\left[ \sum_{i=1}^{n} -\log P(U_i) \right]
          = \frac{1}{n} \sum_{i=1}^{n} E[-\log P(U_1)]   (stationarity, and linearity of expectation)
          = H(U_1)   (definition of entropy)
Kartik’s scheme: Use codeword length − log P (ui , ui+1 ) for each successive pair of sym-
bols (ui , ui+1 ).
\bar{l}_2 = \frac{1}{n} E[l_2(U^n)]
          = \frac{1}{n} E\left[ \sum_{i=1}^{n/2} -\log P(U_{2i-1}, U_{2i}) \right]
          = \frac{1}{n} \sum_{i=1}^{n/2} E[-\log P(U_1, U_2)]   (stationarity, and linearity of expectation)
          = \frac{1}{2} H(U_1, U_2)   (definition of entropy)
Jiantao’s scheme: Use codeword length − log P (u1 ) for first symbol u1 , and then code-
word length − log P (ui |ui−1 ) for successive symbols.
\bar{l}_3 = \frac{1}{n} E[l_3(U^n)]
          = \frac{1}{n} E\left[ -\log P(U_1) + \sum_{i=2}^{n} -\log P(U_i \mid U_{i-1}) \right]
          = \frac{1}{n} E\left[ -\log P(U_1) + \sum_{i=2}^{n} -\log P(U_2 \mid U_1) \right]   (stationarity)
          = \frac{1}{n} \left[ H(U_1) + (n-1) H(U_2 \mid U_1) \right]   (linearity of expectation, definition of entropy)
(a) Because the source is i.i.d., H(U_2 | U_1) = H(U_2) = H(U_1) and H(U_1, U_2) = 2H(U_1), so all three coding schemes have the same average codeword length per symbol:
\bar{l}_1 = \bar{l}_2 = \bar{l}_3 = H(U_1).
(b) Idoia’s codeword length is longest because H(U2 |U1 ) ≤ H(U2 ) = H(U1 ). For n = 2,
the performance of Kartik’s and Jiantao’s coding schemes are identical. However for
larger n, Jiantao’s coding scheme has smallest codeword length, since H(U1 , U2 ) =
H(U1 )+H(U2 |U1 ) ≥ 2H(U2 |U1 ), with equality iff the source is memoryless. Therefore,
in general, l¯1 ≥ l¯2 ≥ l¯3 .
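To make the ordering concrete, here is a small numerical check (not part of the original exam). It assumes a hypothetical binary stationary Markov source with symmetric flip probability q, and it ignores the exam's assumption that the ideal codeword lengths are integers; it simply evaluates the three per-symbol rates derived above.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical binary symmetric Markov source: uniform stationary
# distribution, and P(U_i != U_{i-1}) = q.
q = 0.1
n = 10                      # block length (n > 2 and even, as in part (a))

H1 = h2(0.5)                # H(U_1) = 1 bit
Hcond = h2(q)               # H(U_2 | U_1)
Hpair = H1 + Hcond          # H(U_1, U_2), by the chain rule

l1 = H1                                 # Idoia:   H(U_1)
l2 = 0.5 * Hpair                        # Kartik:  (1/2) H(U_1, U_2)
l3 = (H1 + (n - 1) * Hcond) / n         # Jiantao: [H(U_1) + (n-1) H(U_2|U_1)] / n

print(f"l1 = {l1:.4f}, l2 = {l2:.4f}, l3 = {l3:.4f} bits/symbol")
assert l1 >= l2 >= l3       # the ordering established in part (b)
```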
2. Channel coding with side information (35 points)
Consider a binary channel whose output at time i is
Yi = Xi ⊕ Zi,   (1)
where Xi, Yi, Zi all take values in {0, 1}, and ⊕ denotes addition modulo 2. There are channel states Si which determine the noise level of Zi as follows: Si ∈ {G, B} (a good state and a bad state), with P(Si = G) = 2/3 and P(Si = B) = 1/3; conditioned on Si = G the noise is Zi ∼ Bern(1/4), and conditioned on Si = B it is Zi ∼ Bern(1/3).
{(Si, Zi)} are i.i.d. (in pairs), independent of the channel input sequence {Xi}.
(a) (10 points) What is the capacity of this channel when both the encoder and the
decoder have access to the state sequence {Si }i≥1 ?
(b) (10 points) What is the capacity of this channel when neither the encoder nor the decoder has access to the state sequence {Si}i≥1?
(c) (10 points) What is the capacity of this channel when only the decoder knows the
state sequence {Si }i≥1 ?
(d) (5 points) Which is largest and which is smallest among your answers to parts (a),
(b) and (c)? Explain.
Solution:
(a) When both the encoder and the decoder know the state sequence, the capacity is
C = \max_{p(x|s)} I(X; Y | S)
  = \max_{p(x|s)} \left[ H(Y | S) - H(Y | X, S) \right]
  = \max_{p(x|s)} \left[ H(X ⊕ Z | S) - H(Z | S) \right]   (Y = X ⊕ Z, and Z is independent of X given S)
  = 1 - H(Z | S),
achieved by a uniform input. Equivalently, since both terminals know when the channel is in the good or the bad state, one can code separately for the two states, so
C = P(S = G) C_G + P(S = B) C_B
  = \tfrac{2}{3}\left(1 - h_2(1/4)\right) + \tfrac{1}{3}\left(1 - h_2(1/3)\right)
  = 1 - \tfrac{2}{3} h_2(1/4) - \tfrac{1}{3} h_2(1/3).
(b) When neither the encoder nor the decoder has any state information, the state cannot be exploited. Since {(Si, Zi)} are i.i.d., the noise sequence {Zi} is itself i.i.d., so the channel is an average BSC with equivalent crossover probability
p = \tfrac{2}{3} \cdot \tfrac{1}{4} + \tfrac{1}{3} \cdot \tfrac{1}{3} = \tfrac{5}{18}.
Thus, the capacity of this channel is simply that of a BSC(p), i.e.,
C = 1 - h_2(5/18).
(c) The decoder has access to the state, which means that equivalently the channel output
can be viewed as the pair (Y, S). The capacity of the channel in this case is
C = \max_{p(x)} I(X; Y, S)
  = \max_{p(x)} \left[ I(X; S) + I(X; Y | S) \right]
  = \max_{p(x)} \left[ H(Y | S) - H(Z | S) \right]   (X is independent of S; Y = X ⊕ Z, and Z is independent of X given S)
  = 1 - H(Z | S),
where the last step holds because a uniform input makes Y uniform given either state. This is the same value as in part (a):
C = 1 - \tfrac{2}{3} h_2(1/4) - \tfrac{1}{3} h_2(1/3).
(d) The capacities of parts (a) and (c) coincide and are the largest; the capacity of part (b) is the smallest. Indeed, since h_2 is concave, Jensen's inequality gives
h_2(5/18) = h_2\left(\tfrac{2}{3}\cdot\tfrac{1}{4} + \tfrac{1}{3}\cdot\tfrac{1}{3}\right) ≥ \tfrac{2}{3} h_2(1/4) + \tfrac{1}{3} h_2(1/3),
so 1 - h_2(5/18) ≤ 1 - \tfrac{2}{3} h_2(1/4) - \tfrac{1}{3} h_2(1/3). Not knowing the state at either terminal reduces capacity here, while knowing it at the encoder (in addition to the decoder) does not help for this channel.
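As a numerical sanity check (not part of the original exam), the following short script evaluates the capacities under the state and noise parameters used above and verifies the ordering claimed in part (d).

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# State/noise parameters used in the solution above.
pG, pB = 2/3, 1/3          # P(S = G), P(S = B)
epsG, epsB = 1/4, 1/3      # crossover probability in each state

# (a) and (c): state known at the decoder (with or without encoder knowledge).
C_state = 1 - (pG * h2(epsG) + pB * h2(epsB))

# (b): no state information -> BSC with the averaged crossover probability.
p_avg = pG * epsG + pB * epsB            # = 5/18
C_blind = 1 - h2(p_avg)

print(f"C_a = C_c = {C_state:.4f} bits/use")
print(f"C_b       = {C_blind:.4f} bits/use")
assert C_blind <= C_state                # Jensen: h2 is concave
```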
3. Modulo-3 additive noise channel (25 points)
(a) (5 points) Consider the modulo-3 additive white noise channel given by
Yi = Xi ⊕ Zi,   (2)
where Xi , Zi , Yi all take values in the alphabet {0, 1, 2}, ⊕ denotes addition modulo-3,
and {Zi } are i.i.d. ∼ Z and independent of the channel input sequence {Xi }.
Figure 1: Ternary additive channel (input Xi, additive noise Zi, output Yi).
What is the capacity of this channel?
(b) Define
φ(ε) = \max_{Z:\, \Pr(Z ≠ 0) ≤ ε} H(Z),   (4)
where the maximization is over ternary random variables Z that take values in {0, 1, 2} (and that satisfy the indicated constraint). Obtain φ(ε) explicitly, as well as the distribution of the random variable Z_ε that achieves the associated maximum.
(c) Consider the Hamming distortion measure on {0, 1, 2}:
d(u, v) = \begin{cases} 0, & \text{if } u = v \\ 1, & \text{otherwise.} \end{cases}
For U, V that are jointly distributed such that E[d(U, V)] ≤ D, justify the following chain of equalities and inequalities:
I(U; V) \overset{(i)}{=} H(U) - H(U | V)
        \overset{(ii)}{=} H(U) - H(U ⊖ V | V)
        \overset{(iii)}{\ge} H(U) - H(U ⊖ V)
        \overset{(iv)}{\ge} H(U) - φ(D),
where ⊖ denotes subtraction modulo 3 and φ(D) was defined in Equation (4). Argue why this implies that the rate-distortion function of the source U is lower bounded as
R(D) ≥ H(U) − φ(D).   (5)
The above inequality is known as the ‘Shannon lower bound’ (specialized to our
setting of ternary alphabets and Hamming loss).
(d) (8 points) Show that when U is uniform (on {0, 1, 2}), the Shannon lower bound holds with equality, i.e.,
R(D) = \log 3 − φ(D).
Solution:
(a) The capacity of the channel is
C = \max_{P_X} I(X; Y)
  \overset{(i)}{=} \max_{P_X} \left[ H(Y) - H(Y | X) \right]
  \overset{(ii)}{=} \max_{P_X} \left[ H(Y) - H(Y ⊖ X | X) \right]
  \overset{(iii)}{=} \max_{P_X} \left[ H(Y) - H(Z | X) \right]
  \overset{(iv)}{=} \max_{P_X} H(Y) - H(Z)
  \overset{(v)}{\le} \log 3 - H(Z),
where (i) follows from the definition of mutual information, (ii) is due to the invariance of entropy under translation of the random variable (or under any one-to-one transformation), (iii) is due to the channel model (Y ⊖ X = Z), (iv) is due to the independence of the additive channel noise from the channel input, and (v) is because Y is ternary. On the other hand, when P_X is the uniform distribution, the distribution of Y is uniform as well, in which case (v) holds with equality. Hence C = \log 3 − H(Z).
(b) We have
φ(ε) = \max_{Z:\, \Pr(Z ≠ 0) ≤ ε} H(Z) = \max_{Z:\, E[ρ(Z)] ≤ ε} H(Z),   (7)
where
ρ(z) = \begin{cases} 0, & \text{if } z = 0 \\ 1, & \text{otherwise.} \end{cases}
This is a maximum-entropy problem under an expectation constraint, so the maximizing distribution has the exponential form P_Z(z) = c(λ) 2^{−λ ρ(z)}, where c(λ) is the normalization constant and λ ≥ 0 is tuned so that the constraint is met with equality (when possible). In our case this boils down to the distribution
P_{Z_ε}(z) = \begin{cases} 1 − ε, & \text{if } z = 0 \\ ε/2, & \text{otherwise,} \end{cases}
for 0 ≤ ε ≤ 2/3. For 2/3 < ε ≤ 1, the uniform distribution is in the constraint set, and is therefore the maximizing distribution. Thus
φ(ε) = \begin{cases} H(Z_ε) = h_2(ε) + ε, & \text{if } 0 ≤ ε ≤ 2/3 \\ \log 3, & \text{if } 2/3 < ε ≤ 1, \end{cases}
where Z_ε is distributed as above.
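A small numerical check (not part of the original exam) of the closed form above: it compares h_2(ε) + ε (equal to log 3 for ε ≥ 2/3) against a brute-force grid maximization of H(Z) over ternary distributions satisfying Pr(Z ≠ 0) ≤ ε.

```python
import math

def H(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def h2(p):
    return H([p, 1 - p])

def phi_closed_form(eps):
    return h2(eps) + eps if eps <= 2/3 else math.log2(3)

def phi_brute_force(eps, grid=300):
    """Grid-search max of H(Z) over ternary Z with Pr(Z != 0) <= eps."""
    best = 0.0
    for i in range(grid + 1):
        a = eps * i / grid                 # total mass on {1, 2}
        for j in range(i + 1):
            p1 = a * j / max(i, 1)         # split the mass a between symbols 1 and 2
            best = max(best, H([1 - a, p1, a - p1]))
    return best

for eps in (0.1, 0.3, 0.5, 2/3, 0.9):
    print(f"eps = {eps:.3f}: closed form {phi_closed_form(eps):.3f}, "
          f"grid search {phi_brute_force(eps):.3f}")
```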
4. Gaussian source and channel (35 points)
• Gaussian Channel
Consider the parallel Gaussian channel which has two inputs X = (X1 , X2 ) and two
outputs Y = (Y1 , Y2 ), where
Y1 = X1 + Z1,
Y2 = X2 + Z2,
with Z1 ∼ N(0, σ_1^2) and Z2 ∼ N(0, σ_2^2) independent of each other and of the input, and with the input subject to the total power constraint E[X1^2 + X2^2] ≤ P.
(a) (10 points) Give an explicit formula for the capacity of this channel in terms of P, σ_1^2, σ_2^2.
(b) (7 points) Suppose you had access to capacity-achieving schemes for the scalar
AWGN channel whose capacity we derived in class. How would you use them
to construct capacity-achieving schemes for this parallel Gaussian channel?
• Gaussian Source
Consider a two-dimensional real-valued source U = (U1, U2) such that U1 ∼ N(0, σ_1^2), U2 ∼ N(0, σ_2^2), and U1 is independent of U2. Let d : R^2 × R^2 → R be the distortion measure
d(u, v) = (u_1 − v_1)^2 + (u_2 − v_2)^2.
We wish to compress i.i.d. copies of the source U, with average per-symbol distortion no greater than D, i.e., the usual lossy compression setup discussed in class.
(a) (10 points) Evaluate the rate-distortion function R(D) explicitly in terms of the problem parameters D, σ_1^2, σ_2^2.
(b) (8 points) Suppose you had access to good lossy compressors for the scalar Gaussian source whose rate-distortion function we derived in class. How would you use them to construct good schemes for this two-dimensional Gaussian source?
Solution:
Gaussian Channel
By Shannon’s channel coding theorem, the capacity of this parallel Gaussian channel is
given by
0 1/σ12 −1/σ22
f (P1 ) = + . (16)
2(1 + P1 /σ12 ) 2(1 + (P − P1 )/σ22 )
Setting it to zero, we have P_1^* = (P + σ_2^2 − σ_1^2)/2. If |σ_2^2 − σ_1^2| ≤ P, then P_1^* ∈ [0, P], and the optimal power allocation is given by
P_1^* = \frac{P + σ_2^2 − σ_1^2}{2},   (17)
P_2^* = \frac{P + σ_1^2 − σ_2^2}{2}.   (18)
If |σ_1^2 − σ_2^2| > P, assume without loss of generality that σ_1^2 ≤ σ_2^2. Then we should set P_1^* = P and P_2^* = 0. In other words, if the qualities of the two Gaussian channels are very different (in the sense that |σ_1^2 − σ_2^2| > P), then we should allocate all the power to the stronger (less noisy) channel.
To sum up, the capacity of this parallel Gaussian channel is
C_{parallel}(P) = \frac{1}{2}\log\left(1 + \frac{P_1^*}{σ_1^2}\right) + \frac{1}{2}\log\left(1 + \frac{P − P_1^*}{σ_2^2}\right),   (19)
where
P_1^* = \begin{cases} \frac{P + σ_2^2 − σ_1^2}{2}, & |σ_1^2 − σ_2^2| ≤ P \\ P, & |σ_1^2 − σ_2^2| > P,\ σ_1^2 < σ_2^2 \\ 0, & |σ_1^2 − σ_2^2| > P,\ σ_1^2 > σ_2^2. \end{cases}   (20)
Suppose we now have capacity-achieving schemes for the single (scalar) Gaussian channel. In order to achieve the capacity of this parallel Gaussian channel, we first allocate power P_1^* to channel 1 and power P − P_1^* to channel 2. Then we take a codebook for the scalar Gaussian channel with rate \frac{1}{2}\log(1 + P_1^*/σ_1^2) and power P_1^* for channel 1, and a codebook for the scalar Gaussian channel with rate \frac{1}{2}\log(1 + (P − P_1^*)/σ_2^2) and power P − P_1^* for channel 2. The joint codebook has rate C_{parallel}(P).
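The closed-form allocation (20) is water-filling specialized to two sub-channels. Here is a minimal sketch (not part of the original exam) that computes P_1^* and C_parallel(P) from P, σ_1^2, σ_2^2, following Eqs. (19)-(20).

```python
import math

def parallel_capacity(P, s1, s2):
    """Power allocation and capacity (bits/use) of two parallel AWGN
    channels with noise variances s1, s2 and total power P (Eqs. (19)-(20))."""
    if abs(s1 - s2) <= P:
        P1 = (P + s2 - s1) / 2       # both sub-channels receive power
    elif s1 < s2:
        P1 = P                       # all power to the less noisy channel 1
    else:
        P1 = 0.0                     # all power to the less noisy channel 2
    P2 = P - P1
    C = 0.5 * math.log2(1 + P1 / s1) + 0.5 * math.log2(1 + P2 / s2)
    return P1, P2, C

# Comparable noise variances: power is split according to Eq. (17).
print(parallel_capacity(P=3.0, s1=0.5, s2=2.0))
# Strongly asymmetric noise: all power goes to the stronger channel.
print(parallel_capacity(P=1.0, s1=0.5, s2=2.0))
```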
Gaussian Source
By Shannon’s rate distortion theorem, the rate distortion function of this source is given
by
We note that in the above chain of inequalities, if we take the joint test channel PV1 ,V2 |U1 ,U2
to be of form PV1 |U1 PV2 |U2 , all inequalities hold equality. Hence, it suffices to solves the
Final Page 12 of 13
last optimization problem and the resulting answer is not only a lower bound but also an
upper bound on the rate distortion function of this two dimensional source.
Since the function \tfrac{1}{2}\log(1/x) is non-increasing for x > 0, the minimum is attained with D_1 + D_2 = D. We define the function
g(D_1) = \max\left\{0, \tfrac{1}{2}\log\tfrac{σ_1^2}{D_1}\right\} + \max\left\{0, \tfrac{1}{2}\log\tfrac{σ_2^2}{D − D_1}\right\}.   (28)
In the regime where both terms are positive, setting the derivative −\tfrac{1}{2D_1} + \tfrac{1}{2(D − D_1)} to zero gives D_1 = D/2. Hence, if D ≤ 2\min\{σ_1^2, σ_2^2\}, the optimal distortion allocation is
D_1^* = D_2^* = D/2.
When σ_1^2 + σ_2^2 ≥ D > 2\min\{σ_1^2, σ_2^2\}, without loss of generality we assume σ_1^2 ≤ σ_2^2. Then we should use zero rate to describe U_1, and allocate distortion D − σ_1^2 to U_2. If D > σ_1^2 + σ_2^2, we simply use zero rate to describe both U_1 and U_2.
In other words, the joint rate-distortion function is given by
R_{joint}(D) = \begin{cases} \tfrac{1}{2}\log\tfrac{2σ_1^2}{D} + \tfrac{1}{2}\log\tfrac{2σ_2^2}{D}, & D ≤ 2\min\{σ_1^2, σ_2^2\} \\ \tfrac{1}{2}\log\tfrac{\max\{σ_1^2, σ_2^2\}}{D − \min\{σ_1^2, σ_2^2\}}, & σ_1^2 + σ_2^2 ≥ D > 2\min\{σ_1^2, σ_2^2\} \\ 0, & D > σ_1^2 + σ_2^2. \end{cases}   (31)
Suppose we now have good lossy compressors for the scalar Gaussian source. To achieve the rate-distortion function of this two-dimensional source: if D ≤ 2\min\{σ_1^2, σ_2^2\}, we compress U_1 and U_2 independently, each with a scalar rate-distortion code operating at distortion D/2. If σ_1^2 + σ_2^2 ≥ D > 2\min\{σ_1^2, σ_2^2\}, we describe the component with the smaller variance (say U_1) by the constant 0, and use a scalar rate-distortion code for the other component at distortion D − \min\{σ_1^2, σ_2^2\}. If D > σ_1^2 + σ_2^2, we describe both U_1 and U_2 by the constant 0.
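A matching sketch (not part of the original exam) of the distortion allocation and the joint rate-distortion function in Eq. (31), assuming squared-error distortion summed over the two components as above.

```python
import math

def R_joint(D, s1, s2):
    """Rate-distortion function (bits per source pair) for two independent
    Gaussian components with variances s1, s2 and total squared-error
    distortion at most D, per Eq. (31)."""
    lo, hi = min(s1, s2), max(s1, s2)
    if D <= 2 * lo:
        # Split the distortion evenly: D1 = D2 = D/2.
        return 0.5 * math.log2(2 * s1 / D) + 0.5 * math.log2(2 * s2 / D)
    elif D <= s1 + s2:
        # Zero rate (distortion lo) for the low-variance component,
        # remaining budget D - lo for the other component.
        return 0.5 * math.log2(hi / (D - lo))
    else:
        return 0.0                   # D exceeds the total source variance

for D in (0.2, 1.0, 2.0, 3.5):
    print(f"D = {D}: R_joint = {R_joint(D, s1=1.0, s2=2.0):.4f} bits")
```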