Lecture 5
Huffman Coding
• Two Variants
– when the probability model for the source is known
– when the source statistics are unknown
• Huffman codes are prefix codes and are optimum for a given model (set
of probabilities).
• Procedure is based on two observations regarding
optimum prefix codes:
1. In an optimum code, symbols that occur more
frequently (have a higher probability of occurrence) will
have shorter codewords than symbols that occur less
frequently.
2. In an optimum code, the two symbols that occur least
frequently will have codewords of the same length.
Huffman Coding: Procedure
• Codewords corresponding to the two lowest
probability symbols differ only in the last bit.
• That is, if δ and β are the two least probable symbols
in an alphabet, and the codeword for δ is m∗0, then
the codeword for β will be m∗1. Here m is a string of
1s and 0s, and ∗ denotes concatenation.
• Let us design a Huffman code for a source that
puts out letters from an alphabet A = {a1, a2, a3, a4, a5}
with probabilities as given in the table.
• Entropy for this source is 2.122 bits/symbol.
• To design the Huffman code, we first sort the
letters in a descending probability order
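• The probability table itself is not reproduced in these notes. As a minimal Python sketch (assuming the probabilities 0.4, 0.2, 0.2, 0.1, 0.1, which are consistent with the stated entropy of 2.122 bits/symbol), the fragment below builds a Huffman code by repeatedly merging the two least probable groups:

import heapq
from math import log2

# Assumed probabilities (not from the original table), chosen to match
# the stated entropy of 2.122 bits/symbol.
probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}

def huffman_code(probs):
    """Merge the two least probable groups repeatedly, prepending a
    0 or 1 to every codeword inside the merged groups."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)                      # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable group
        p1, _, group1 = heapq.heappop(heap)  # second least probable group
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
entropy = -sum(p * log2(p) for p in probs.values())
print(code)                                           # one optimal prefix code
print(f"average length = {avg_len:.3f} bits/symbol")  # 2.200
print(f"entropy = {entropy:.3f} bits/symbol")         # 2.122
print(f"redundancy = {avg_len - entropy:.3f}")        # 0.078

The exact codeword lengths depend on how ties are broken, but every Huffman code for these probabilities has the same average length of 2.2 bits/symbol.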
Redundancy
• A measure of the efficiency of this code is its
redundancy, the difference between the average
code length and the entropy. In this case,
the redundancy is 0.078 bits/symbol.
• The redundancy is zero when the probabilities
are negative powers of two.
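• For example (probabilities chosen here only for illustration), a source with probabilities 1/2, 1/4, 1/4 gets Huffman codeword lengths 1, 2, 2, so the average length is (1/2)·1 + (1/4)·2 + (1/4)·2 = 1.5 bits/symbol; since -log2(1/2) = 1 and -log2(1/4) = 2, the entropy is the same 1.5 bits/symbol and the redundancy is zero.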
Building the binary Huffman tree
Cost of a Huffman Tree
• Let p1, p2, ... , pm be the probabilities for the
symbols a1, a2, ... ,am, respectively.
• Define the cost of the Huffman tree T to be
C(T) = Σ_{i=1}^{m} pi·ri
where ri is the length of the path from the root
to ai.
• C(T) is the expected codeword length for a symbol
coded using the tree T; in other words, C(T) is the
bit rate of the code.
Example of Cost
• Example: P(a) = 1/2, P(b) = 1/8, P(c) = 1/8, P(d) = 1/4
[Figure: Huffman tree T with a at depth 1 (codeword 0), d at depth 2 (codeword 10), and b, c at depth 3 (codewords 110 and 111)]
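• Reading the depths off this tree (a at depth 1, d at depth 2, b and c at depth 3):
C(T) = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75 bits/symbol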
[Figure: tree T with the smallest probability symbol p at depth k and a symbol q at depth h, where p < q and k < h; T' is obtained by exchanging p and q, so the smallest probability symbol moves to maximum depth.]
[Figure: tree T with the second smallest probability symbol q at depth k and a symbol r at depth h, where q < r and k < h; T' is obtained by exchanging q and r.]
For the second exchange: C(T') = C(T) + hq - hr + kr - kq = C(T) - (h-k)(r-q) < C(T)
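• Numeric check (values chosen only for illustration): if q = 0.1 sits at depth k = 2 and r = 0.3 at depth h = 3, the exchange changes the cost by -(h-k)(r-q) = -(3-2)(0.3-0.1) = -0.2 bits, so T' is strictly cheaper than T.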
• Condition 3: In the tree corresponding to the optimum code,
there must be two branches stemming from each intermediate
node.
– If there were any intermediate node with only one branch coming from
that node, we could remove it without affecting the decipherability of
the code while reducing its average length.
• Condition 4: Suppose we change an intermediate node into a
leaf node by combining all the leaves descending from it into a
composite word of a reduced alphabet. Then, if the original tree
was optimal for the original alphabet, the reduced tree is optimal
for the reduced alphabet.
– If this condition were not satisfied, we could find a code with smaller
average code length for the reduced alphabet and then simply expand
the composite word again to get a new code tree that would have a
shorter average length than our original “optimum” tree. This would
contradict our statement about the optimality of the original tree.
Optimality
• Principle 3
Assuming we have a Huffman tree T whose two
lowest probability symbols are siblings at maximum
depth, they can be replaced by a new symbol whose
probability is the sum of their probabilities.
– The resulting tree is optimal for the new symbol set.
[Figure: in T, the two lowest probability symbols p (smallest) and q (2nd smallest) are sibling leaves at depth h; in T' they are replaced by a single leaf of probability q+p at depth h-1.]
C(T') = C(T) + (h-1)(p+q) - hp - hq = C(T) - (p+q)
Optimality Principle 3 (cont.)
• If T' were not optimal, then we could find a
lower cost tree T''. This would lead to a
lower cost tree T''' for the original alphabet,
contradicting the optimality of T.
[Figure: T' and T'' both contain a leaf of probability q+p; expanding that leaf of T'' back into the two leaves q and p yields a tree T''' for the original alphabet.]
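• Concretely, expanding the q+p leaf of T'' adds exactly p+q to its cost, so C(T''') = C(T'') + (p+q) < C(T') + (p+q) = C(T), contradicting the optimality of the original tree T.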
• In order to satisfy conditions 1, 2, and 3, the two least probable
letters would have to be assigned codewords of maximum length
lm.
• Furthermore, the leaves corresponding to these letters arise from
the same intermediate node. This is the same as saying that the
codewords for these letters are identical except for the last bit.
• Consider the common prefix as the codeword for the composite
letter of a reduced alphabet.
• Since the code for the reduced alphabet needs to be optimum for
the code of the original alphabet to be optimum, we follow the
same procedure again.
• To satisfy the necessary conditions, the procedure needs to be
iterated until we have a reduced alphabet of size one.
• But this is exactly the Huffman procedure.
• Therefore, the necessary conditions above, which are all satisfied
by the Huffman procedure, are also sufficient conditions.
Length of Huffman Code
• The optimal code for a source S, and hence the
Huffman code for the source S, has an
average code length l bounded below by the
entropy and bounded above by the entropy
plus 1 bit. In other words,
H(S) ≤ l < H(S) + 1
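• For the five-letter example above: H(S) = 2.122 bits/symbol and l = 2.122 + 0.078 = 2.2 bits/symbol (the entropy plus the quoted redundancy), so indeed 2.122 ≤ 2.2 < 3.122.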
Recursive Huffman Tree Algorithm
1. If there is just one symbol, a tree with one node is
optimal. Otherwise:
2. Find the two lowest probability symbols, with probabilities
p and q respectively.
3. Replace these two symbols with a new symbol of
probability p + q.
4. Solve the problem recursively for the new set of symbols.
5. In the resulting tree, replace the leaf for the new symbol with
an internal node whose two children are the two old symbols.
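A minimal Python sketch of this recursion (function and symbol names are illustrative, not from the slides):

def huffman_recursive(probs):
    """Recursive Huffman construction following steps 1-5 above.
    probs maps symbol -> probability; returns symbol -> codeword."""
    symbols = sorted(probs, key=probs.get)
    # Step 1: a single symbol needs no bits (a one-node tree).
    if len(symbols) == 1:
        return {symbols[0]: ""}
    # Step 2: the two lowest probability symbols.
    a, b = symbols[0], symbols[1]
    # Step 3: replace them with a new composite symbol of probability p + q.
    reduced = {s: probs[s] for s in symbols[2:]}
    composite = (a, b)                      # the pair itself names the new symbol
    reduced[composite] = probs[a] + probs[b]
    # Step 4: solve the problem recursively for the new set of symbols.
    code = huffman_recursive(reduced)
    # Step 5: replace the leaf for the new symbol with an internal node whose
    # two children are the old symbols: each extends its codeword by one bit.
    prefix = code.pop(composite)
    code[a] = prefix + "0"
    code[b] = prefix + "1"
    return code

# The dyadic source from the cost example gives codeword lengths 1, 2, 3, 3:
print(huffman_recursive({"a": 0.5, "d": 0.25, "b": 0.125, "c": 0.125}))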
Quality of the Huffman Code (Length of Huffman Codes)
• The Huffman code is within one bit of the entropy lower bound.
• The Huffman code does not work well with a two-symbol alphabet.
– Example: P(0) = 1/100, P(1) = 99/100
– HC = 1 bit/symbol
– [Figure: two-leaf code tree; each symbol is assigned a single one-bit codeword]
– H = -((1/100)*log2(1/100) + (99/100)*log2(99/100)) ≈ 0.08 bits/symbol
Extended Huffman Codes
Comparing this to (3.5), we can see that by encoding the output of the source in
longer blocks of symbols we are guaranteed a rate closer to the entropy. Note that
all we are talking about here is a bound or guarantee about the rate.
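A rough Python sketch of this effect (helper names and block lengths chosen for illustration; the source is the P(0) = 1/100, P(1) = 99/100 example from the earlier slide). Huffman-coding blocks of n symbols drives the per-symbol rate from 1 bit down toward the entropy:

import heapq
from itertools import product
from math import log2, prod

def huffman_avg_len(probs):
    """Average Huffman codeword length, using the identity that the
    expected depth equals the sum of the merged probabilities produced
    at each combination step of the Huffman procedure."""
    heap = list(probs.values())
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b          # this merge adds one bit to every leaf below it
        heapq.heappush(heap, a + b)
    return total

p = {"0": 0.01, "1": 0.99}                       # skewed binary source
entropy = -sum(q * log2(q) for q in p.values())  # about 0.08 bits/symbol

for n in (1, 2, 3):
    # Probabilities of all length-n blocks of the i.i.d. source.
    blocks = {"".join(b): prod(p[s] for s in b) for b in product(p, repeat=n)}
    rate = huffman_avg_len(blocks) / n           # bits per original source symbol
    print(f"n = {n}: rate = {rate:.3f} bits/symbol, entropy = {entropy:.3f}")

# The rate falls from 1 bit/symbol at n = 1 toward the entropy as n grows,
# matching the extended-Huffman bound H <= rate < H + 1/n.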