EE376A - Information Theory

Midterm, Monday February 12th

Instructions:

• You have two hours, 6:00PM - 8:00PM

• The exam has 3 questions, totaling 100 points. (There are an additional 20 bonus points.)

• Please start answering each question on a new page of the answer booklet.

• You are allowed to carry the textbook, your own notes and other course-related material with you. Electronic reading devices [including Kindles, laptops, iPads, etc.] are allowed, provided they are used solely for reading PDF files already stored on them and not for any other form of communication or information retrieval.

• Calculators are allowed for numerical computations.

• You are required to provide a sufficiently detailed explanation of how you arrived at
your answers.

• You can use previous parts of a problem even if you did not solve them.

• As throughout the course, entropy (H) and mutual information (I) are specified in bits.

• log is taken in base 2.

• Throughout the exam ‘prefix code’ refers to a variable length code satisfying the prefix
condition.

• Good Luck!

1. Universal Prefix Codes (35 points)
In this problem we consider binary prefix codes over the set of natural numbers N = {1, 2, 3, . . .}. We do not know the probability distribution P ≡ (p_j, j ∈ N), but we do know that it is a monotone distribution, i.e. p_j ≥ p_{j+1} ∀j ∈ N. We wish to construct prefix codes which perform well irrespective of the source probabilities. For a given code c_j, j ∈ N (where each c_j is a binary codeword), we denote the code lengths by l^c_j, j ∈ N and the expected code length by L̄_c := Σ_{j=1}^∞ p_j l^c_j. Also, 0^j denotes a sequence of j zeros.

(a) (6 points) Consider the code u_j = 0^j 1. Is this code prefix free? Justify.
(b) (Bonus, 5 points) Find a monotone distribution P such that H(P) < ∞, but L̄_u = ∞. (It is fine to specify p_j up to a normalizing factor.)
(c) (8 points) Consider now the code b_j, which is the binary representation of j (e.g. b_5 = 101). Note that the codelength of b_j is given by l^b_j = ⌊log_2 j⌋ + 1. Is this code prefix free?
(d) (8 points) For any monotone distribution P, show that the binary code b_j in (c) has expected code length L̄_b ≤ H(P) + 1.
(e) (8 points) Now consider the code c_j = 0^{⌊log_2 j⌋+1} 1 b_j with l^c_j = 2⌊log_2 j⌋ + 3. Argue that this code is prefix free.
(f) (5 points) For the code in (e), show that L̄_c ≤ 2H(P) + 3 for all monotone distributions P.
(g) (Bonus, 5 points) Can you suggest prefix codes which improve on the performance of the code from part (e), i.e., achieve performance L̄_c ≤ c_1 H(P) + c_2, where c_1 < 2 (c_1 is a constant, c_2 is a lower-order term in H(P))?

Solution to Problem 1

(a) This is a prefix code: distinct codewords u_j have a different number of zeros before the terminating 1, so no codeword can be a prefix of another.
(b) Choose p_j ∝ (j + 1)^{-2}. This is well-defined since Σ_{n=1}^∞ n^{-2} < ∞. Also, H(P) < ∞ since Σ_{j=1}^∞ (j + 1)^{-2} log(j + 1) < ∞ (the integral ∫_1^∞ (log x)/x^2 dx is finite). However,

    L̄_u = Σ_{j=1}^∞ p_j (j + 1) ∝ Σ_{j=1}^∞ 1/(j + 1) = ∞,

since the integral ∫_1^∞ dx/x diverges.
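For illustration only (not part of the required solution), the following Python sketch compares truncated partial sums of H(P) and L̄_u for the choice p_j ∝ (j + 1)^{-2}: the entropy sums settle while the expected length of u_j keeps growing. The truncation points are arbitrary choices for the demonstration.

```python
import math

# Numerical illustration (not a proof): for p_j proportional to (j+1)^-2,
# the entropy partial sums converge while E[length of u_j] = E[j+1] diverges.
Z = sum((j + 1) ** -2 for j in range(1, 10 ** 6))  # truncated normalizing constant

for N in (10 ** 2, 10 ** 4, 10 ** 6):
    p = [(j + 1) ** -2 / Z for j in range(1, N + 1)]
    H = sum(-pj * math.log2(pj) for pj in p)                   # partial entropy sum
    L = sum(pj * (j + 1) for j, pj in enumerate(p, start=1))   # u_j has length j+1
    print(f"N={N:>8}  H~{H:.3f}  L_u~{L:.3f}")
```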
(c) This code is not prefix-free. For example, b1 = 1 is a prefix of b3 = 11.
(d) For monotone distributions, we have p_j ≤ (1/j) Σ_{k=1}^j p_k ≤ 1/j for any j. Hence,

    L̄_b ≤ Σ_{j=1}^∞ p_j (log_2 j + 1) ≤ Σ_{j=1}^∞ p_j (log_2(1/p_j) + 1) = H(P) + 1.

(e) Assume for contradiction that c_j is a prefix of c_{j'} for some j ≠ j'. Comparing the numbers of leading zeros, we must have ⌊log_2 j⌋ = ⌊log_2 j'⌋. Hence b_j and b_{j'} have the same length, and the prefix assumption implies b_j = b_{j'}. Since b_j is the binary representation of j, we then have j = j', a contradiction!
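For illustration only, here is a small Python sketch of the code from part (e); it builds c_j = 0^{⌊log_2 j⌋+1} 1 b_j and brute-force checks the prefix condition on a finite range (a sanity check, not a substitute for the argument above).

```python
def c(j: int) -> str:
    """Codeword c_j = 0^(floor(log2 j)+1) 1 b_j, of length 2*floor(log2 j) + 3."""
    b = bin(j)[2:]                  # b_j, the binary representation of j
    return "0" * len(b) + "1" + b   # len(b) = floor(log2 j) + 1 leading zeros

# Brute-force prefix check over a finite range of codewords.
words = [c(j) for j in range(1, 200)]
assert all(not w.startswith(v)
           for i, w in enumerate(words)
           for k, v in enumerate(words) if i != k)
print(c(1), c(2), c(5))  # 011 00110 0001101 (lengths 3, 5, 7)
```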
(f) Similar to (d), we have j p_j ≤ 1. Hence,

    L̄_c ≤ Σ_{j=1}^∞ p_j (2 log_2 j + 3) ≤ Σ_{j=1}^∞ p_j (2 log_2(1/p_j) + 3) = 2H(P) + 3.

(g) Consider the codelengths l^c_j = ⌊log_2 j + A log_2 log_2 j + B⌋. Since ∫_1^∞ dx/(x (log x)^α) < ∞ for any α > 1, suitable choices of A, B give Σ_{j=1}^∞ 2^{-l^c_j} < 1. By Kraft's inequality, there exists a prefix code c_j with codelengths l^c_j. Using j p_j ≤ 1 again, the average codelength of this code is

    L̄_c ≤ Σ_{j=1}^∞ p_j (log_2 j + A log_2 log_2 j + B)
        ≤ Σ_{j=1}^∞ p_j (log_2(1/p_j) + A log_2 log_2(1/p_j) + B)
        ≤ H(P) + A log_2 (Σ_{j=1}^∞ p_j log_2(1/p_j)) + B
        = H(P) + A log_2 H(P) + B,

where the last inequality uses the concavity of log_2 (Jensen's inequality).

2. Perfect Secrecy (30 points)


Alice wishes to communicate a message M to Bob, where M is chosen randomly from
some alphabet M. To prevent an eavesdropping adversary from reading the message,
Alice encrypts the message using a deterministic function C = E(K, M ) to obtain the
ciphertext C ∈ C, where K ∈ K is a secret random key known to both Alice and Bob, and
is independent of the message. Bob receives the ciphertext and decrypts it back to M
using another deterministic function M = D(K, C). We say that this system is perfectly
secure if I(M ; C) = 0.

(a) (6 points) Explain intuitively why a perfectly secure system is safe from an eaves-
dropping adversary.
(b) (9 points) Show that H(M |C) ≤ H(K|C) (under any system, secure or not).
(c) (9 points) Using part (b), show that I(M ; C) ≥ H(M ) − H(K).
(d) (6 points) Part (c) suggests that a perfectly secure system must have H(K) ≥ H(M ).
Do you think this is practical? Explain.
(e) (Bonus, 5 points) Now, assume that M = K = C = {0, 1}^n with M and K uniformly and independently distributed in {0, 1}^n. Can you suggest perfectly secure encryption and decryption functions E(K, M) and D(K, C)?

Solution to Problem 2

(a) For a perfectly secure system, M and C are independent. Hence, an eavesdropper who observes the ciphertext C cannot infer any information about M from C.
(b) We have

    H(M|C) = H(M, K|C) − H(K|M, C)
           ≤ H(M, K|C)
           = H(K|C) + H(M|K, C)
           = H(K|C),

where the last step follows from H(M|K, C) = 0, since M = D(K, C) is a deterministic function of (K, C).
(c) We have

I(M ; C) = H(M ) − H(M |C) ≥ H(M ) − H(K|C) ≥ H(M ) − H(K).

The first inequality follows from (b); the second inequality is due to the fact that
conditioning reduces entropy.
(d) Under a perfectly secure system, 0 = I(M; C) ≥ H(M) − H(K), thus H(K) ≥ H(M). This is not practical: the message is usually very long (i.e., H(M) is large), yet a perfectly secure system requires generating, sharing, and storing a key whose entropy is at least as large as that of the message.
(e) Consider E(K, M) = K ⊕ M and D(K, C) = K ⊕ C, where ⊕ denotes the coordinate-wise modulo-2 sum. Clearly D(K, E(K, M)) = M. Moreover,

    I(M; C) = H(C) − H(C|M) = H(C) − H(K ⊕ M|M)
            = H(C) − H(K|M) = H(C) − H(K) = 0,

where the last two steps follow from the independence of K and M and from the fact that both K and C are uniformly distributed on {0, 1}^n (C = K ⊕ M is uniform since K is uniform and independent of M). Hence, this is a perfectly secure system.
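For illustration only, the construction in (e) is the one-time pad; a minimal Python sketch (with an arbitrary block length n = 16) is below.

```python
import secrets

n = 16                            # block length in bits (arbitrary for this sketch)
M = secrets.randbits(n)           # message
K = secrets.randbits(n)           # key: uniform on {0,1}^n, independent of M

C = M ^ K        # encryption  E(K, M) = K XOR M  (coordinate-wise mod-2 sum)
M_hat = C ^ K    # decryption  D(K, C) = K XOR C

assert M_hat == M   # Bob recovers the message exactly
# With K uniform and independent of M, the ciphertext C is uniform whatever M is,
# which is the intuition behind I(M; C) = 0.
```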

3. Mix of Problems (35 points)

(a) Pairwise Independence (12 points)


We say random variables X_1, X_2, . . . , X_n are pairwise independent if every pair of random variables (X_i, X_j), i ≠ j, is independent.
i. Let X_1, X_2, X_3 be pairwise independent random variables, distributed identically as Bern(0.5). Then:
   A. (6 points) Show that H(X_1, X_2, X_3) ≤ 3. When is equality achieved?
   B. (6 points) Show that H(X_1, X_2, X_3) ≥ 2. When is equality achieved?
ii. (Bonus, 5 points) Let Z_1, Z_2, . . . , Z_k be i.i.d. Bern(0.5) random variables. Show that, using the Z_i's, you can generate 2^k − 1 pairwise independent random variables, identically distributed as Bern(0.5).

(b) Individual Sequences (12 points)
Let x^n be a given arbitrary binary sequence, with n_0 0's and n_1 1's (n_1 = n − n_0). You are also provided a compressor C which takes in any arbitrary i.i.d. distribution q(x) as a parameter, and encodes x^n using

    L̄_q = (1/n) log(1/q(x^n))

bits per symbol (ignoring integer constraints).

i. (6 points) Given the sequence x^n, what distribution q(x) will you choose as a parameter (in terms of n_0, n_1) to the compressor C, so that L̄_q is minimized? Justify.
ii. (6 points) When compressing any given individual sequence x^n, we also need to store the parameter distribution q(x) (as it is required for decoding). Show that you can represent the parameter distribution q(x) using log(n + 1) bits. Find the effective compression ratio.
(c) AEP (11 points)
Let p(x) and q(x) be two distinct distributions supported on the same alphabet X.
i. (5 points) Let X^n be distributed i.i.d. according to distribution p(x). Then, for what distributions p(x), q(x) is the following relationship satisfied for all ε > 0?

    P( x^n ∈ X^n : |(1/n) log(1/p(x^n)) − H(q)| < ε ) → 1, as n → ∞

ii. (6 points) Let X^n be distributed i.i.d. according to distribution p(x). Show that for any ε > 0:

    P( x^n ∈ X^n : |(1/n) log(1/q(x^n)) − (H(p) + D(p‖q))| < ε ) → 1, as n → ∞

Solution to Problem 3

(a) i. A. We have

    H(X_1, X_2, X_3) = H(X_1, X_2) + H(X_3|X_1, X_2)
                     ≤ H(X_1, X_2) + H(X_3)
                     = H(X_1) + H(X_2) − I(X_1; X_2) + H(X_3) = 3.

Equality holds if and only if X_3 is independent of (X_1, X_2), which together with the pairwise independence implies that X_1, X_2, X_3 are mutually independent.
B. We have

    H(X_1, X_2, X_3) = H(X_1, X_2) + H(X_3|X_1, X_2)
                     ≥ H(X_1, X_2)
                     = H(X_1) + H(X_2) − I(X_1; X_2) = 2.

Equality holds if X_3 is a deterministic function of (X_1, X_2). We also require X_3 to have the correct marginal distribution Bern(0.5) and to satisfy the pairwise independence properties. The only possible functions are X_3 = X_1 ⊕ X_2 and X_3 = 1 ⊕ X_1 ⊕ X_2.
ii. For any non-empty subset S ⊆ {1, · · · , k}, define the random variable X_S = ⊕_{i∈S} Z_i (the mod-2 sum of the Z_i with i ∈ S). There are 2^k − 1 such random variables in total. To show X_S ∼ Bern(0.5), pick any i_0 ∈ S and note that X_S | (Z_i)_{i≠i_0} ∼ Bern(0.5). For pairwise independence, suppose S ≠ S' are two different non-empty subsets. By symmetry we may pick i_0 ∈ S \ S'; then X_S | (Z_i)_{i≠i_0} ∼ Bern(0.5) while X_{S'} is a deterministic function of (Z_i)_{i≠i_0}. This shows that X_S and X_{S'} are independent.
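For illustration only, this Python sketch enumerates all 2^k equally likely realizations of (Z_1, . . . , Z_k) for k = 3 and verifies exactly that every X_S is Bern(0.5) and that every pair X_S, X_{S'} is independent; k = 3 is an arbitrary choice for the check.

```python
from itertools import combinations, product

k = 3
# All 2^k - 1 non-empty subsets S of {0, ..., k-1} (0-based indices)
subsets = [S for r in range(1, k + 1) for S in combinations(range(k), r)]

def X(S, z):
    """X_S = mod-2 sum (XOR) of Z_i for i in S."""
    return sum(z[i] for i in S) % 2

outcomes = list(product([0, 1], repeat=k))   # all 2^k equally likely realizations
for S in subsets:                            # each X_S is Bern(0.5)
    assert sum(X(S, z) for z in outcomes) == len(outcomes) // 2
for S, T in combinations(subsets, 2):        # each pair (X_S, X_T) is independent
    for a, b in product([0, 1], repeat=2):
        count = sum(1 for z in outcomes if X(S, z) == a and X(T, z) == b)
        assert count == len(outcomes) // 4   # joint probability 1/4 = 1/2 * 1/2
print(f"{len(subsets)} pairwise independent Bern(0.5) variables from {k} fair coins")
```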
(b) i. For q(0) = 1 − q, q(1) = q, we have

    L̄_q = (1/n) log(1/((1 − q)^{n_0} q^{n_1})) = −(n_0/n) log(1 − q) − (n_1/n) log(q).

We see that L̄_q is convex in q, and setting its derivative with respect to q to zero gives q* = n_1/n.
ii. By the previous part, it suffices to store n_1 ∈ {0, 1, · · · , n} for full knowledge of q(x). Hence, log(n + 1) bits are enough. The effective compression ratio is

    L̄_{q*} + log(n + 1)/n = H(n_1/n) + log(n + 1)/n.

(c) i. By AEP, H(q) = H(p) suffices. This is also necessary, for a sequence of random
variables cannot converge in probability to two different limits.
ii. By the LLN, we have

    (1/n) log(1/q(X^n)) = (1/n) Σ_{i=1}^n log(1/q(X_i)) → E_P[log(1/q(X))] = Σ_x p(x) log(1/q(x)) = H(p) + D(p‖q)

in probability under P, which is exactly the desired statement.
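For illustration only, a small Monte Carlo sketch of part (ii) for a binary alphabet: it draws X^n i.i.d. from a Bern(p) source, measures (1/n) log_2(1/q(X^n)), and compares it with H(p) + D(p‖q). The values p = 0.3, q = 0.6 and n = 10000 are arbitrary.

```python
import math
import random

p, q, n = 0.3, 0.6, 10_000   # source parameter, mismatched code parameter, block length
random.seed(0)
xn = [1 if random.random() < p else 0 for _ in range(n)]

# Empirical codelength per symbol under the mismatched distribution q
rate = sum(-math.log2(q if x == 1 else 1 - q) for x in xn) / n

H_p = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
D_pq = p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))
print(f"empirical {rate:.4f}  vs  H(p) + D(p||q) = {H_p + D_pq:.4f}")
```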
