Huffman
Encoding messages
Encode a message composed of a string
of characters
Codes used by computer systems
ASCII
• defines 128 characters (7 bits), usually stored as 8 bits per character
• 8-bit extensions can encode 256 characters
ASCII and fixed-width Unicode encodings (e.g. UTF-32) are fixed-length codes:
all characters are represented by the same
number of bits
Fixed-length code - Problems
Suppose that we want to encode a message
constructed from the symbols A, B, C, D, and E
using a fixed-length code
How many bits are required to encode each
symbol?
at least 3 bits are required
(2 bits give only 4 distinct codes, too few for 5 symbols)
How many bits are required to encode the
message DEAACAAAAABA?
there are twelve symbols and each requires 3 bits, so 12 × 3 = 36 bits are required
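The arithmetic above can be checked with a short sketch. The helper name `fixed_length_bits` is hypothetical and introduced only for illustration; it assumes every symbol gets a code of the same width.

```python
def fixed_length_bits(num_symbols: int) -> int:
    """Smallest code width (in bits) that gives each symbol a distinct code."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2; use at least 1 bit.
    return max(1, (num_symbols - 1).bit_length())


alphabet = "ABCDE"
message = "DEAACAAAAABA"

bits_per_symbol = fixed_length_bits(len(alphabet))
total_bits = bits_per_symbol * len(message)
print(bits_per_symbol, total_bits)  # 3 36
```

Three bits allow 8 distinct codes, so 3 of them stay unused for a 5-symbol alphabet.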
Symbol  Code
A       00
B       01
C       10
D       11
Bit string: 0010110111001111111111
Message:    ACDBDADDDDD
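Decoding a fixed-length code only requires cutting the bit string into equal-sized pieces. The following is a minimal sketch using the four-symbol table from this slide; the function names `encode` and `decode_fixed` are assumptions made for the example.

```python
CODE = {"A": "00", "B": "01", "C": "10", "D": "11"}  # table from the slide


def encode(message: str, code: dict[str, str]) -> str:
    """Concatenate the code of every symbol in the message."""
    return "".join(code[symbol] for symbol in message)


def decode_fixed(bits: str, code: dict[str, str]) -> str:
    """Split the bit string into equal-width chunks and look each one up."""
    width = len(next(iter(code.values())))
    if any(len(word) != width for word in code.values()):
        raise ValueError("not a fixed-length code")
    if len(bits) % width:
        raise ValueError("bit string length is not a multiple of the code width")
    lookup = {word: symbol for symbol, word in code.items()}
    return "".join(lookup[bits[i:i + width]] for i in range(0, len(bits), width))


print(encode("ACDB", CODE))                 # 00101101
print(decode_fixed("00101101", CODE))       # ACDB
print(decode_fixed("0010110111001111111111", CODE))
```

Because every code has the same width, the decoder never has to guess where one symbol ends and the next begins.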
Prefix property
A code has the prefix property if no character's code
is a prefix (the start of the code) of another character's code
Example:
Symbol  Code
P       000
Q       11
R       01
S       001
T       10
Bit string: 01001101100010
Message:    RSTQPT
000 is not a prefix of 11, 01, 001, or 10
11 is not a prefix of 000, 01, 001, or 10 …
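The pairwise check above, and the decoding it makes possible, can be sketched as follows. The function names `has_prefix_property` and `decode_prefix` are hypothetical; the code table is the one from the example.

```python
CODE = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}


def has_prefix_property(code: dict[str, str]) -> bool:
    """True if no code word is the start of a different symbol's code word."""
    words = list(code.values())
    return not any(
        i != j and other.startswith(word)
        for i, word in enumerate(words)
        for j, other in enumerate(words)
    )


def decode_prefix(bits: str, code: dict[str, str]) -> str:
    """Read bits left to right; emit a symbol as soon as a code word matches."""
    lookup = {word: symbol for symbol, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            decoded.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(decoded)


print(has_prefix_property(CODE))              # True
print(decode_prefix("01001101100010", CODE))  # RSTQPT
```

The greedy decoder is only safe because of the prefix property: the first code word that matches is guaranteed to be the only one that can match.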
Code without prefix property
The following code does not have the prefix property
Symbol  Code
P       0
Q       1
R       01
S       10
T       11
0 (P) is a prefix of 01 (R), so the bit string 01 could be decoded as PQ or as R
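To see how quickly the ambiguity grows, the sketch below enumerates every way a bit string can be split into code words of this table. The function name `all_decodings` is an assumption made for the example, and the brute-force recursion is only meant for short inputs.

```python
CODE = {"P": "0", "Q": "1", "R": "01", "S": "10", "T": "11"}


def all_decodings(bits: str, code: dict[str, str]) -> list[str]:
    """Return every message whose encoding is exactly `bits`."""
    if not bits:
        return [""]  # one way to decode nothing: the empty message
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Try this symbol first, then decode the remainder recursively.
            for rest in all_decodings(bits[len(word):], code):
                results.append(symbol + rest)
    return results


print(all_decodings("01", CODE))    # ['PQ', 'R']
print(all_decodings("0110", CODE))  # ['PQQP', 'PQS', 'PTP', 'RQP', 'RS']
```

With a code that has the prefix property, the same function would return at most one message for any bit string.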
Building the Huffman tree
Symbol frequencies (_ marks a space):
Symbol     A  G  M  T  E  H  _  I  S
Frequency  1  1  1  1  2  2  3  3  5
Steps 1-8
Start with one leaf per symbol, then repeatedly merge the two lowest-weight
nodes into a parent whose weight is their sum. The merges are:
• A (1) + G (1) = 2
• M (1) + T (1) = 2
• 2 (A, G) + 2 (M, T) = 4
• E (2) + H (2) = 4
• _ (3) + I (3) = 6
• 4 (A, G, M, T) + 4 (E, H) = 8
• 6 (_, I) + S (5) = 11
• 8 + 11 = 19 (root)
Label edges
Label each left edge 0 and each right edge 1:
19
  0: 8
    0: 4
      0: 2
        0: A (1)
        1: G (1)
      1: 2
        0: M (1)
        1: T (1)
    1: 4
      0: E (2)
      1: H (2)
  1: 11
    0: 6
      0: _ (3)
      1: I (3)
    1: S (5)
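Reading a code off the labelled tree means concatenating the edge labels on the path from the root to each leaf. The sketch below assumes the finished tree is written as nested `(left, right)` tuples with symbols at the leaves; `TREE` and `assign_codes` are hypothetical names used only here.

```python
# Finished tree from the slides: each internal node is a (left, right) pair.
TREE = (
    ((("A", "G"), ("M", "T")), ("E", "H")),  # weight 8, reached by edge 0
    (("_", "I"), "S"),                       # weight 11, reached by edge 1
)


def assign_codes(node, prefix: str = "") -> dict[str, str]:
    """Walk the tree, appending 0 for a left edge and 1 for a right edge."""
    if isinstance(node, str):
        # A leaf: the accumulated edge labels are its code.
        return {node: prefix or "0"}
    left, right = node
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


for symbol, code in sorted(assign_codes(TREE).items(), key=lambda kv: len(kv[1])):
    print(symbol, code)
```

Frequent symbols sit close to the root and therefore receive the shortest codes.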
Huffman code & encoded message
Message: This is his message
Symbol  Code
S       11
E       010
H       011
_       100
I       101
A       0000
G       0001
M       0010
T       0011
Encoded message (56 bits; upper and lower case share a code, _ is a space):
00110111011110010111100011101111000010010111100000001010
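The whole construction can be sketched with a priority queue. This is a minimal illustration assuming Python's `heapq` as the queue; the names `huffman_code`, `encode` and `SLIDE_CODE` are hypothetical. Ties between equal weights may be broken differently than on the slides, so individual codes can differ from the table, but the total encoded length is the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code for the symbols of `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    order = count()  # unique tie-breaker so tree nodes are never compared
    heap = [(weight, next(order), symbol) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two lowest-weight nodes into one parent node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a single symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(message: str, code: dict[str, str]) -> str:
    return "".join(code[symbol] for symbol in message)


# Code table from the slide, for comparison.
SLIDE_CODE = {"S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
              "A": "0000", "G": "0001", "M": "0010", "T": "0011"}

message = "This is his message".upper().replace(" ", "_")
codes = huffman_code(message)
encoded = encode(message, codes)
print(len(encoded), len(encode(message, SLIDE_CODE)))  # 56 56
```

For comparison, a fixed-length code for these 9 symbols needs 4 bits per symbol, or 76 bits for the 19-character message.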