Huffman Code
Encoding messages
Wasted space
A 16-bit Unicode encoding (e.g. UTF-16) uses two bytes per character where ASCII uses one
• inefficient for plain-text messages containing
only ASCII characters
Same number of bits used to represent all characters,
although ‘a’ and ‘e’ occur more frequently than ‘q’ and ‘z’
• giving frequent characters shorter codes would save space
Fixed-length code (2 bits per symbol)
Symbol Code
A 00
B 01
C 10
D 11
Encoded: 0010110111001111111111 (22 bits)
Decoded: ACDBDADDDDD
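A minimal sketch of encoding and decoding with the fixed-length code above. The names `CODE`, `encode`, and `decode` are illustrative, not part of the slides. Because every code word has the same width, decoding only needs to cut the bit string into 2-bit chunks.

```python
# Fixed-length code from the table above: every symbol uses exactly 2 bits.
CODE = {"A": "00", "B": "01", "C": "10", "D": "11"}

def encode(message, code=CODE):
    """Concatenate the code word of each symbol."""
    return "".join(code[ch] for ch in message)

def decode(bits, code=CODE, width=2):
    """Split the bit string into fixed-width chunks and look each one up."""
    reverse = {word: sym for sym, word in code.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))

BITS = "0010110111001111111111"
decoded = decode(BITS)
print(decoded, "-", len(BITS) // 2, "symbols")
assert encode(decoded) == BITS  # decoding and re-encoding round-trips
```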
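Prefix property
No code word is a prefix of any other code word, so a bit string can be decoded unambiguously from left to right
Symbol Code
P 000
Q 11
R 01
S 001
T 10
Encoded: 01001101100010
Decoded: RSTQPT
000 is not a prefix of 11, 01, 001, or 10
11 is not a prefix of 000, 01, 001, or 10 …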
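The prefix property can be checked mechanically, and it is what makes a simple left-to-right decoder work. The following is a sketch under the assumption that the code is stored as a dictionary; `has_prefix_property` and `decode` are hypothetical helper names.

```python
def has_prefix_property(code):
    """Return True if no code word is a prefix of a different code word."""
    words = list(code.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True

def decode(bits, code):
    """Read bits left to right, emitting a symbol as soon as a code word matches."""
    reverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)

PREFIX_CODE = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}
print(has_prefix_property(PREFIX_CODE))        # True
print(decode("01001101100010", PREFIX_CODE))   # RSTQPT
```

The decoder never has to look ahead: once the bits read so far match a code word, no longer code word can start with them.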
Code without prefix property
Symbol Code
P 0
Q 1
R 01
S 10
T 11
0 is a prefix of 01, and 1 is a prefix of 10 and 11, so 01 could be decoded as PQ or as R
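To make the ambiguity concrete, the sketch below enumerates every way a bit string can be split into code words of this code. `all_decodings` is an illustrative helper, written recursively for clarity rather than speed.

```python
def all_decodings(bits, code):
    """Return every symbol sequence whose encoding equals `bits`."""
    if not bits:
        return [""]
    results = []
    for sym, word in code.items():
        if bits.startswith(word):
            for rest in all_decodings(bits[len(word):], code):
                results.append(sym + rest)
    return results

AMBIGUOUS_CODE = {"P": "0", "Q": "1", "R": "01", "S": "10", "T": "11"}

print(all_decodings("01", AMBIGUOUS_CODE))    # ['PQ', 'R']
print(all_decodings("0110", AMBIGUOUS_CODE))  # more than one decoding again
```

For a code with the prefix property, the same function returns at most one decoding for any bit string.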
Message: DEAACAAAAABA
Symbol Code
A 0
B 10
C 110
D 1110
E 11110
Encoded: 1110111100011000000100 (22 bits)
Another possible code
Message: DEAACAAAAABA
Symbol Code
A 0
B 100
C 101
D 1101
E 1111
Encoded: 1101111100101000001000 (22 bits)
Better code
Message: DEAACAAAAABA
Symbol Code
A 0
B 100
C 101
D 110
E 111
Encoded: 11011100101000001000 (20 bits)
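The bit counts of the three codes can be reproduced without building the encoded strings: the length is the sum, over all symbols, of frequency times code-word length. A small sketch, with the dictionary keys `first`, `another`, and `better` chosen here only as labels:

```python
from collections import Counter

MESSAGE = "DEAACAAAAABA"

CODES = {
    "first":   {"A": "0", "B": "10",  "C": "110", "D": "1110", "E": "11110"},
    "another": {"A": "0", "B": "100", "C": "101", "D": "1101", "E": "1111"},
    "better":  {"A": "0", "B": "100", "C": "101", "D": "110",  "E": "111"},
}

def encoded_length(message, code):
    """Total bits = sum of (symbol frequency * code-word length)."""
    counts = Counter(message)
    return sum(n * len(code[sym]) for sym, n in counts.items())

for name, code in CODES.items():
    print(name, encoded_length(MESSAGE, code))  # 22, 22 and 20 bits
```

A fixed-length code for five symbols needs 3 bits per symbol, so the same 12-symbol message would take 36 bits.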
What code to use?
Is there a systematic way to find the code that encodes a message in the fewest bits?
Answer: Yes!
Huffman coding tree
Binary tree
each leaf contains symbol (character)
label edge from node to left child with 0
label edge from node to right child with 1
Code for any symbol obtained by following path from
root to the leaf containing symbol
Code has prefix property
leaf node cannot appear on path to another leaf
note: fixed-length codes are represented by a
complete Huffman tree and clearly have the prefix
property
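As an illustration of reading codes off a tree, the sketch below represents a tree as nested pairs `(left, right)` with symbols at the leaves. This representation and the name `codes_from_tree` are assumptions made for the example. The tree shown corresponds to the five-symbol code A 0, B 100, C 101, D 110, E 111.

```python
def codes_from_tree(node, prefix=""):
    """Follow every root-to-leaf path: 0 for a left edge, 1 for a right edge."""
    if isinstance(node, str):              # leaf: the path so far is the code
        return {node: prefix or "0"}       # a lone leaf still needs one bit
    left, right = node
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes

# Internal nodes are pairs; leaves are single-character strings.
TREE = ("A", (("B", "C"), ("D", "E")))

print(codes_from_tree(TREE))
# {'A': '0', 'B': '100', 'C': '101', 'D': '110', 'E': '111'}
```

Since symbols sit only at leaves, no code word can be the start of another one.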
Building a Huffman tree
Message: This is his message (_ denotes a space)
Symbol    A G M T E H _ I S
Frequency 1 1 1 1 2 2 3 3 5
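Step 1
One single-node tree per symbol, ordered by frequency
A:1 G:1 M:1 T:1 E:2 H:2 _:3 I:3 S:5
Step 2
A and G joined under a new node of weight 2; M and T joined under another node of weight 2
(A G):2 (M T):2 E:2 H:2 _:3 I:3 S:5
Step 3
E and H joined under a new node of weight 4
(A G):2 (M T):2 (E H):4 _:3 I:3 S:5
Step 4
Same forest as in Step 3
(A G):2 (M T):2 (E H):4 _:3 I:3 S:5
Step 5
_ and I joined under a new node of weight 6
(A G):2 (M T):2 (E H):4 (_ I):6 S:5
Step 6
(A G) and (M T) joined under a new node of weight 4
((A G) (M T)):4 (E H):4 (_ I):6 S:5
Step 7
The two weight-4 trees joined under a node of weight 8; (_ I) and S joined under a node of weight 11
(((A G) (M T)) (E H)):8 ((_ I) S):11
Step 8
The two remaining trees joined under the root, of weight 19
((((A G) (M T)) (E H)) ((_ I) S)):19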
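The construction above can be sketched with a priority queue: repeatedly remove the two lowest-weight trees and join them under a new node. This is one common way to implement Huffman's algorithm, not necessarily the one behind the slides. The names are illustrative, the message is assumed to be non-empty, and ties may be broken differently than in the steps shown, so the tree shape can differ while the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(message):
    """Return a tree of nested (left, right) pairs with symbols at the leaves."""
    freq = Counter(message)
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest weight
        f2, _, right = heapq.heappop(heap)   # second lowest weight
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    return heap[0][2]

def code_lengths(node, depth=0):
    """Depth of each leaf equals the length of that symbol's code word."""
    if isinstance(node, str):
        return {node: max(depth, 1)}
    left, right = node
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths

MESSAGE = "THIS_IS_HIS_MESSAGE"   # '_' stands for a space, as in the slides
tree = build_huffman_tree(MESSAGE)
lengths = code_lengths(tree)
freq = Counter(MESSAGE)
print(sum(freq[s] * lengths[s] for s in freq), "bits in total")
```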
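Label edges
Each line shows the edge label (0 = left, 1 = right), then the node weight or the leaf symbol with its frequency
19
  0: 8
    0: 4
      0: 2
        0: A (1)
        1: G (1)
      1: 2
        0: M (1)
        1: T (1)
    1: 4
      0: E (2)
      1: H (2)
  1: 11
    0: 6
      0: _ (3)
      1: I (3)
    1: S (5)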
Huffman code & encoded message
Message: This is his message
Symbol Code
S 11
E 010
H 011
_ 100
I 101
A 0000
G 0001
M 0010
T 0011
Encoded: 00110111011110010111100011101111000010010111100000001010 (56 bits)
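As a check on the table and the encoded message, the sketch below encodes the message with the code table and decodes it again by walking the tree from the root. The nested-pair tree layout and the function names are assumptions made for this example. Letters are upper-case and `_` stands for a space, as in the code table.

```python
HUFFMAN_CODE = {
    "S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
    "A": "0000", "G": "0001", "M": "0010", "T": "0011",
}

# The same code as a tree: index 0 is the left (0) child, index 1 the right (1) child.
TREE = (
    ((("A", "G"), ("M", "T")), ("E", "H")),
    (("_", "I"), "S"),
)

def encode(text, code):
    """Concatenate the code word of each symbol."""
    return "".join(code[ch] for ch in text)

def decode(bits, tree):
    """Walk down from the root one bit at a time; restart at each leaf."""
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = tree
    return "".join(out)

MESSAGE = "THIS_IS_HIS_MESSAGE"
bits = encode(MESSAGE, HUFFMAN_CODE)
print(bits)
print(len(bits), "bits")            # 56 bits
print(decode(bits, TREE))           # THIS_IS_HIS_MESSAGE
```

For comparison, a fixed-length code for these nine symbols needs 4 bits per symbol, or 76 bits for the 19-character message.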