
Lecture 1: Entropy and Source Coding

September 25, 2006

1 Entropy
Let X be a discrete random variable with alphabet \mathcal{X} = \{1, 2, \ldots, m\}. Assume there
is a probability mass function p(x) over \mathcal{X}. How many binary questions, on average,
does it take to determine the outcome?

Definition 1.1: The entropy of a discrete random variable X is defined as:


H(X) = \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)}

which can be interpreted as the expected value


H(X) = \mathbb{E}_p\!\left[\log \frac{1}{p(x)}\right].
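As a quick sanity check (an added example, not in the original notes; the logarithm is taken base 2, so entropy is measured in bits): a fair coin has H(X) = 1 bit, a uniform variable over m = 8 outcomes has H(X) = \log 8 = 3 bits, matching the three binary questions needed to identify the outcome, and a biased coin with p = (3/4, 1/4) has

H(X) = \tfrac{3}{4}\log\tfrac{4}{3} + \tfrac{1}{4}\log 4 \approx 0.811 \text{ bits.}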

2 Source Coding
Definition 2.1: A (binary) source code C for a random variable X is a mapping from
\mathcal{X} to finite binary strings. Let C(x) be the codeword corresponding to x and let l(x)
denote the length of C(x).

We focus on codes that are “instantaneous”.

Definition 2.2: A code is called a prefix code or an instantaneous code if no codeword
is a prefix of any other codeword.

The nice property of a prefix code is that one can transmit multiple outcomes x_1, x_2, \ldots, x_n
by simply concatenating the codewords into C(x_1)C(x_2)\cdots C(x_n); the receiver can decode
each x_i as soon as the last bit of C(x_i) arrives. In this sense, prefix codes are "self-punctuating".
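A minimal sketch of this instantaneous decoding (an added illustration; the four-symbol code table below is an example, not one from the notes):

# Instantaneous decoding of a concatenated prefix-code stream: a symbol is
# emitted the moment the bits read so far match a codeword.
code = {"1": "0", "2": "10", "3": "110", "4": "111"}     # example prefix code
decode_table = {cw: sym for sym, cw in code.items()}

def encode(symbols):
    return "".join(code[s] for s in symbols)

def decode(bits):
    symbols, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in decode_table:       # a full codeword has arrived: decode it now
            symbols.append(decode_table[buffer])
            buffer = ""
    return symbols

print(encode(["2", "1", "4"]))   # 100111
print(decode("100111"))          # ['2', '1', '4']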
Let the expected length of C be:
L(C) = \sum_{x \in \mathcal{X}} p(x) l(x)

Theorem 2.3: The expected length of any (prefix) code is at least the entropy, i.e.

L(C) ≥ H(X)

Furthermore, there exists a code such that

L(C) ≤ H(X) + 1

This theorem is actually more general and applies to uniquely decodable codes.
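A small numerical illustration (added; logarithms base 2): for X uniform over three outcomes, H(X) = \log 3 \approx 1.585 bits, and the prefix code C(1) = 0, C(2) = 10, C(3) = 11 has L(C) = (1 + 2 + 2)/3 = 5/3 \approx 1.667, so indeed H(X) \leq L(C) \leq H(X) + 1.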

2.1 The Kraft Inequality


Theorem 2.4: (Kraft Inequality) Any prefix code satisfies:
\sum_{x \in \mathcal{X}} 2^{-l(x)} \leq 1

Conversely, given a set of codeword lengths which satisfies this inequality, there
exists a prefix code with these lengths.
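As a quick computational check of the inequality (an added sketch in Python; the length sets are illustrative):

# Kraft sum for a set of binary codeword lengths: a prefix code with these
# lengths exists if and only if the sum is at most 1.
def kraft_sum(lengths):
    return sum(2.0 ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0   -> a prefix code exists, e.g. 0, 10, 110, 111
print(kraft_sum([1, 1, 2]))      # 1.25  -> no prefix code can have these lengths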

Proof: Consider flipping (unbiased) coins until either we have a codeword or no
codeword is possible. This process will terminate as the codewords are of finite length.
Furthermore, as the codewords are a prefix code, the process will terminate instantly
upon obtaining a codeword.
Hence
\Pr[\text{obtaining a codeword}] = \sum_{x \in \mathcal{X}} \Pr[\text{codeword } x]
                                 = \sum_{x \in \mathcal{X}} 2^{-l(x)}
                                 \leq 1

where the last step follows since probabilities are bounded by 1. This proves the first
statement.
We proved the forward direction with a technique known as the "probabilistic method".
For the converse, order the lengths in ascending order l_1 to l_m. Pick codewords in this
order, subject to the constraint that no previously chosen codeword is a prefix of the selected
codeword. To prove that this works, consider a full binary tree of depth l_m. Associate each
codeword with a path on the tree, from the root to some internal node (the end node
of the codeword). The prefix condition states that the path of each codeword must
not contain the endpoint of another codeword's path. With each leaf node, associate a
probability mass of 2^{-l_m}, and assign to codeword i the mass 2^{-l_i} (the total mass of the
leaves below its end node). Note that each codeword removes 2^{-l_i} from the remaining mass
to be allocated. Furthermore, by the prefix condition, allocation is always possible as long as
there is enough remaining mass. As the lengths satisfy the Kraft inequality, there is enough
initial mass (of 1) to assign valid codewords to all the lengths.
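This construction can be carried out greedily; a short sketch (added, and exponential in the lengths, so purely illustrative):

from itertools import product

# Mirror the converse proof: process lengths in ascending order and take the
# first binary string of each length that has no already-chosen codeword as a
# prefix. The Kraft inequality guarantees a candidate is always available.
def build_prefix_code(lengths):
    assert sum(2.0 ** (-l) for l in lengths) <= 1, "lengths violate the Kraft inequality"
    chosen = []
    for l in sorted(lengths):
        for bits in product("01", repeat=l):
            candidate = "".join(bits)
            if not any(candidate.startswith(c) for c in chosen):
                chosen.append(candidate)
                break
    return chosen

print(build_prefix_code([2, 1, 3, 3]))   # ['0', '10', '110', '111']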

2.2 The proof of the source coding theorem
We first show that there exists a code within one bit of the entropy. Choose the lengths
as:
l(x) = \left\lceil \log \frac{1}{p(x)} \right\rceil
This choice is integer and satisfies the Kraft inequality (since 2^{-l(x)} \leq p(x), the Kraft sum is at most 1), hence there exists a prefix code with these lengths. Also,
we can upper bound the average code length as follows:
\sum_{x \in \mathcal{X}} p(x) l(x) = \sum_{x \in \mathcal{X}} p(x) \left\lceil \log \frac{1}{p(x)} \right\rceil
                                   \leq \sum_{x \in \mathcal{X}} p(x) \left( \log \frac{1}{p(x)} + 1 \right)
                                   = H(X) + 1
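Numerically, for an illustrative (non-dyadic) distribution these lengths land between H(X) and H(X) + 1, as the bound promises (added sketch):

import math

# Shannon code lengths l(x) = ceil(log2(1/p(x))) for an example pmf, together
# with the entropy, to check that H(X) <= sum_x p(x) l(x) <= H(X) + 1 and that
# the lengths satisfy the Kraft inequality.
p = [0.4, 0.3, 0.2, 0.1]
lengths = [math.ceil(math.log2(1.0 / px)) for px in p]
H = sum(px * math.log2(1.0 / px) for px in p)
L = sum(px * l for px, l in zip(p, lengths))

print(lengths)                                   # [2, 2, 3, 4]
print(round(H, 3), round(L, 3))                  # 1.846 2.4
print(sum(2.0 ** (-l) for l in lengths) <= 1)    # True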

Now, let us prove the lower bound on L(C). Consider the optimization problem
\min_{l(x)} \sum_{x \in \mathcal{X}} p(x) l(x) \quad \text{such that} \quad \sum_{x \in \mathcal{X}} 2^{-l(x)} \leq 1

The above finds the shortest possible expected code length subject to satisfying the Kraft
inequality. If we relax the code lengths to be non-integer, then we can obtain a lower
bound.
To do this, the Lagrangian is:
L = \sum_{x \in \mathcal{X}} p(x) l(x) + \lambda \left( \sum_{x \in \mathcal{X}} 2^{-l(x)} - 1 \right)

Taking derivatives with respect to l(x) and λ and setting to 0, leads to:

p(x) - \lambda \ln 2 \cdot 2^{-l(x)} = 0

\sum_{x \in \mathcal{X}} 2^{-l(x)} - 1 = 0

Solving this for l(x) leads to l(x) = \log \frac{1}{p(x)}, which can be verified by direct substitution. This proves the lower bound.
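Spelling out the direct substitution (an added step): with l(x) = \log \frac{1}{p(x)} we get 2^{-l(x)} = p(x), so the constraint \sum_{x} 2^{-l(x)} = 1 holds with equality, and the objective becomes

\sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)} = H(X).

Hence the relaxed (non-integer) optimum equals H(X), and any prefix code, whose integer lengths are feasible for this problem by the Kraft inequality, satisfies L(C) \geq H(X).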
