2018 Sol.1
Instructions:
• The exam has 3 questions, totaling 100 points. (There are an additional 20 bonus points.)
• Please start answering each question on a new page of the answer booklet.
• You are allowed to carry the textbook, your own notes, and other course-related material with you. Electronic reading devices (including Kindles, laptops, iPads, etc.) are allowed, provided they are used solely for reading PDF files already stored on them and not for any other form of communication or information retrieval.
• You are required to provide a sufficiently detailed explanation of how you arrived at
your answers.
• You can use previous parts of a problem even if you did not solve them.
• As throughout the course, entropy (H) and Mutual Information (I) are specified in
bits.
• Throughout the exam ‘prefix code’ refers to a variable length code satisfying the prefix
condition.
• Good Luck!
Midterm Page 1 of 6
1. Universal Prefix Codes (35 points)
In this problem we consider binary prefix codes over the set of natural numbers N = {1, 2, 3, . . . }. We do not know the probability distribution P ≡ (p_j, j ∈ N), but we do know that it is a monotone distribution, i.e., p_j ≥ p_{j+1} ∀j ∈ N. We wish to
construct prefix codes which perform well irrespective of the source probabilities. For a
given code c_j, j ∈ N (where each c_j is a binary codeword), we denote the code lengths by l_j^c, j ∈ N, and the expected code length by L̄_c := Σ_{j=1}^∞ p_j l_j^c. Also, 0^j denotes a sequence of j zeros.
(a) (6 points) Consider the code u_j = 0^j 1. Is this code prefix-free? Justify.
(b) (Bonus, 5 points) Find a monotone distribution P such that H(P) < ∞ but L̄_u = ∞. (It is fine to specify p_j up to a normalizing factor.)
(c) (8 points) Consider now the code b_j, which is the binary representation of j (e.g., b_5 = 101). Note that the codelength of b_j is given by l_j^b = ⌊log₂ j⌋ + 1. Is this code prefix-free?
(d) (8 points) For any monotone distribution P, show that the binary code b_j in (c) has expected code length L̄_b ≤ H(P) + 1.
(e) (8 points) Now consider the code c_j = 0^{⌊log₂ j⌋+1} 1 b_j, with l_j^c = 2⌊log₂ j⌋ + 3. Argue that this code is prefix-free.
(f) (5 points) For the code in (e), show that L̄_c ≤ 2H(P) + 3 for all monotone distributions P.
(g) (Bonus, 5 points) Can you suggest prefix codes which improve on the performance of the code from part (e), i.e., achieve L̄_c ≤ c₁H(P) + c₂, where c₁ < 2 (c₁ is a constant and c₂ is a lower-order term in H(P))?
Solution to Problem 1
(a) This is a prefix code: different codewords u_j have a different number of zeros before the terminating 1, so no codeword can be a prefix of another.
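This can also be checked mechanically; the following Python sketch (helper names are ours, not part of the exam) verifies the prefix-free property of u_j over a finite range:

```python
def unary(j):
    """Unary codeword u_j = j zeros followed by a 1."""
    return "0" * j + "1"

def is_prefix_free(codewords):
    """Check that no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

codewords = [unary(j) for j in range(1, 200)]
print(is_prefix_free(codewords))  # True
```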
(b) Choose p_j ∝ (j + 1)^{−2}. This is well-defined since Σ_{n=1}^∞ n^{−2} < ∞. Also, H(P) < ∞, since Σ_{j=1}^∞ (j + 1)^{−2} log(j + 1) < ∞ (the integral ∫_1^∞ (log x)/x² dx is finite). However,

L̄_u = Σ_{j=1}^∞ p_j (j + 1) ∝ Σ_{j=1}^∞ 1/(j + 1) = ∞,

for the integral ∫_1^∞ dx/x diverges.
(c) This code is not prefix-free. For example, b_1 = 1 is a prefix of b_3 = 11.
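Concretely (a small Python sketch, with a helper name of our own choosing), b_1 = 1 is a prefix of b_3 = 11, and the codelength formula l_j^b = ⌊log₂ j⌋ + 1 can be checked at the same time:

```python
import math

def binrep(j):
    """Binary representation b_j of j, without leading zeros."""
    return bin(j)[2:]

print(binrep(1), binrep(3))                 # 1 11
print(binrep(3).startswith(binrep(1)))      # True: b_1 is a prefix of b_3
# codelength formula: l_j^b = floor(log2 j) + 1
print(all(len(binrep(j)) == math.floor(math.log2(j)) + 1
          for j in range(1, 1000)))         # True
```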
(d) For monotone distributions, we have p_j ≤ (1/j) Σ_{k=1}^j p_k ≤ 1/j for any j. Hence,

L̄_b ≤ Σ_{j=1}^∞ p_j (log₂ j + 1) ≤ Σ_{j=1}^∞ p_j (log₂(1/p_j) + 1) = H(P) + 1.
(e) Assume by contradiction that c_j is a prefix of c_{j′} for j ≠ j′. Comparing the number of zeros at the front, we must have ⌊log₂ j⌋ = ⌊log₂ j′⌋. Hence, b_j and b_{j′} must have the same length, and the prefix assumption implies b_j = b_{j′}. Since b_j is the binary representation of j, we then have j = j′, a contradiction!
(f) Similarly to (d), we have jp_j ≤ 1. Hence,

L̄_c ≤ Σ_{j=1}^∞ p_j (2 log₂ j + 3) ≤ Σ_{j=1}^∞ p_j (2 log₂(1/p_j) + 3) = 2H(P) + 3.
(g) For l_j^c = ⌊log₂ j + A log₂ log₂ j + B⌋, since ∫_1^∞ dx/(x (log x)^α) < ∞ for any α > 1, suitable choices of A, B give Σ_{j=1}^∞ 2^{−l_j^c} < 1. By Kraft's inequality, there exists a prefix code c_j with codelengths l_j^c. Using jp_j ≤ 1 again, the average codelength for this code is

L̄_c ≤ Σ_{j=1}^∞ p_j (log₂ j + A log₂ log₂ j + B)
    ≤ Σ_{j=1}^∞ p_j (log₂(1/p_j) + A log₂ log₂(1/p_j) + B)
    ≤ H(P) + A log₂ (Σ_{j=1}^∞ p_j log₂(1/p_j)) + B
    = H(P) + A log₂ H(P) + B,

where the last inequality follows from Jensen's inequality (concavity of log₂). This achieves c₁ = 1, with the lower-order term c₂ = A log₂ H(P) + B.
2. Perfect Secrecy
A message M is encrypted with a key K into a ciphertext C = E(K, M) and decrypted via M = D(K, C); the system is perfectly secure if M and C are independent.
(a) (6 points) Explain intuitively why a perfectly secure system is safe from an eavesdropping adversary.
(b) (9 points) Show that H(M |C) ≤ H(K|C) (under any system, secure or not).
(c) (9 points) Using part (b), show that I(M ; C) ≥ H(M ) − H(K).
(d) (6 points) Part (c) suggests that a perfectly secure system must have H(K) ≥ H(M ).
Do you think this is practical? Explain.
(e) (Bonus, 5 points) Now, assume that M = K = C = {0, 1}n with M and K uni-
formly and independently distributed in {0, 1}n . Can you suggest perfectly secure
encryption and decryption functions E(K, M ) and D(K, C)?
Solution to Problem 2
(a) For a perfectly secure system, M and C are independent. Hence, an eavesdropper who observes the ciphertext C cannot infer any information about M.
(b) We have

H(M|C) ≤ H(M, K|C) = H(K|C) + H(M|K, C) = H(K|C),

where the last step follows from H(M|K, C) = 0, for M = D(K, C) is a deterministic function of (K, C).
(c) We have

I(M; C) = H(M) − H(M|C) ≥ H(M) − H(K|C) ≥ H(M) − H(K).

The first inequality follows from (b); the second inequality is due to the fact that conditioning reduces entropy.
(d) Under a perfectly secure system, 0 = I(M ; C) ≥ H(M ) − H(K), thus H(K) ≥
H(M ). This is not practical: usually the message is very long (i.e., H(M ) is large),
but we need to transmit/store the key which is as long as the message in a perfectly
secure system.
(e) Consider E(K, M) = K ⊕ M, D(K, C) = K ⊕ C, where ⊕ denotes the coordinate-wise modulo-2 sum. Clearly D(K, E(K, M)) = K ⊕ K ⊕ M = M. Moreover, for any m, c ∈ {0, 1}^n,

P(C = c | M = m) = P(K = c ⊕ m) = 2^{−n} = P(C = c),

where the last step follows from the fact that both K and C are uniformly distributed on {0, 1}^n. Hence M and C are independent, and this is a perfectly secure system (the one-time pad).
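The one-time pad above is easy to demonstrate (a sketch; bit-vectors are modeled as Python ints, whose XOR matches the coordinate-wise modulo-2 sum):

```python
import random

n = 16

def E(K, M):  # encryption: C = K xor M
    return K ^ M

def D(K, C):  # decryption: M = K xor C
    return K ^ C

rng = random.Random(0)
# correctness: decryption inverts encryption
ok = all(D(k, E(k, m)) == m
         for _ in range(100)
         for k in (rng.randrange(2**n),)
         for m in (rng.randrange(2**n),))
print(ok)  # True

# perfect secrecy: for a fixed message, a uniform key makes
# every ciphertext equally likely
n_small, m = 3, 5
counts = {c: 0 for c in range(2**n_small)}
for k in range(2**n_small):
    counts[E(k, m)] += 1
print(all(v == 1 for v in counts.values()))  # True: uniform over ciphertexts
```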
(b) Individual Sequences (12 points)
Let xn be a given arbitrary binary sequence, with n0 0’s and n1 1’s (n1 = n − n0 ).
You are also provided a compressor C which takes in any arbitrary i.i.d. distribution q(x) as a parameter and encodes x^n at rate

L̄_q = (1/n) log₂ (1/q(x^n))

bits per symbol.
ii. (6 points) Let X^n be distributed i.i.d. according to distribution p(x). Show that for any ε > 0:

P( x^n ∈ X^n : | (1/n) log₂(1/q(x^n)) − (H(p) + D(p‖q)) | < ε ) → 1, as n → ∞.
Solution to Problem 3
(a) i. A. We have H(X₁, X₂) = H(X₁) + H(X₂) − I(X₁; X₂) = 2.
We see that L̄_q is convex in q, and taking the derivative w.r.t. q gives q* = n₁/n.
ii. By the previous part, it suffices to store n₁ ∈ {0, 1, · · · , n} for full knowledge of q(x). Hence, log₂(n + 1) bits are enough. The effective compression ratio is

L̄_q + log₂(n + 1)/n = H(n₁/n) + log₂(n + 1)/n.
(c) i. By AEP, H(q) = H(p) suffices. This is also necessary, for a sequence of random
variables cannot converge in probability to two different limits.
ii. By the LLN, we have

(1/n) log₂(1/q(x^n)) = (1/n) Σ_{i=1}^n log₂(1/q(x_i)) → E_P[log₂(1/q(X))] = Σ_x p(x) log₂(1/q(x)) = H(p) + D(p‖q),

where the convergence is in probability.
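This convergence can be illustrated with a quick Monte Carlo sketch (the binary alphabet and the particular p, q are our own choices, not from the exam):

```python
import math
import random

p, q = 0.8, 0.6  # true source p(x) and mismatched model q(x), binary alphabet
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
D = p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

rng = random.Random(1)
n = 200_000
x = [1 if rng.random() < p else 0 for _ in range(n)]
# empirical (1/n) log2(1/q(x^n)) as an average over symbols
rate = sum(math.log2(1 / (q if xi else 1 - q)) for xi in x) / n

print(abs(rate - (H + D)) < 0.02)  # True: close to H(p) + D(p||q)
```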