Huffman

Uploaded by

arugunta.cs22
Huffman Codes

Encoding messages
 Encode a message composed of a string
of characters
 Codes used by computer systems
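 ASCII
• defines 128 characters using 7 bits each
• usually stored as 8 bits per character (8-bit extensions can encode 256 characters)
 ASCII and UTF-32 are fixed-length codes (UTF-8 and UTF-16 are variable-length)
 all characters are represented by the same number of bits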
Fixed Length code - Problems
 Suppose that we want to encode a message
constructed from the symbols A, B, C, D, and E
using a fixed-length code
 How many bits are required to encode each symbol?
 at least 3 bits are required
 2 bits are not enough (can only encode four symbols)
 How many bits are required to encode the message DEAACAAAAABA?
 there are twelve symbols, each requires 3 bits
 12*3 = 36 bits are required


Drawbacks of fixed-length codes
 Same number of bits used to represent all characters
 ‘a’ and ‘e’ occur more frequently than ‘q’ and ‘z’
 Potential solution: use variable-length codes
 variable number of bits to represent characters when frequency of occurrence is known
 short codes for characters that occur frequently
Advantages of variable-length codes
 Short codes can be given to characters that occur frequently
 on average, the encoded message is shorter than with fixed-length encoding
 Potential problem: how do we know where one character ends and another begins?
• not a problem if number of bits is fixed!

Symbol Code
A 00
B 01
C 10
D 11

 0010110111001111111111 decodes to ACDBDADDDDD
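
With a fixed-length code, decoding only needs to cut the bit string into equal-width chunks. The following is a minimal sketch using the 2-bit table from this slide; the function name `decode_fixed` is hypothetical.

```python
# Code table from the slide, keyed by codeword for decoding.
CODE = {"00": "A", "01": "B", "10": "C", "11": "D"}

def decode_fixed(bits, width=2):
    """Split the bit string into equal-width chunks and look each one up."""
    if len(bits) % width != 0:
        raise ValueError("bit string length is not a multiple of the code width")
    return "".join(CODE[bits[i:i + width]] for i in range(0, len(bits), width))

decoded = decode_fixed("0010110111001111111111")
print(decoded)   # ACDBDADDDDD
```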
Prefix property
 A code has the prefix property if no character's code is the prefix (start of the code) of another character's code
 Example:

Symbol Code
P 000
Q 11
R 01
S 001
T 10

 01001101100010 decodes to RSTQPT
 000 is not a prefix of 11, 01, 001, or 10
 11 is not a prefix of 000, 01, 001, or 10 …
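
The prefix property can be tested mechanically, and it is what lets a decoder emit a symbol as soon as a codeword matches. This is a minimal sketch using the table above; the function names `has_prefix_property` and `decode_prefix` are assumptions for illustration.

```python
from itertools import permutations

# Code table from the slide.
CODE = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}

def has_prefix_property(code):
    """True if no codeword is the start of a different symbol's codeword."""
    return not any(a.startswith(b) for a, b in permutations(code.values(), 2))

def decode_prefix(bits, code):
    """Read bits left to right, emitting a symbol as soon as a codeword matches."""
    by_codeword = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_codeword:
            out.append(by_codeword[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)

print(has_prefix_property(CODE))               # True
print(decode_prefix("01001101100010", CODE))   # RSTQPT
```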
Code without prefix property
 The following code does not have the prefix property

Symbol Code
P 0
Q 1
R 01
S 10
T 11

 The pattern 1110 can be decoded as QQQP, QTP, TQP, QQS, or TS
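
The ambiguity can be made concrete by enumerating every way to split a bit pattern into codewords. The sketch below assumes the table from this slide; `all_decodings` is a hypothetical helper name.

```python
# Code table from the slide (it lacks the prefix property).
CODE = {"P": "0", "Q": "1", "R": "01", "S": "10", "T": "11"}

def all_decodings(bits, code):
    """Return every decoding obtained by splitting `bits` into codewords."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Consume this codeword, then decode the remainder in every possible way.
            results.extend(symbol + rest for rest in all_decodings(bits[len(word):], code))
    return results

print(sorted(all_decodings("1110", CODE)))
# ['QQQP', 'QQS', 'QTP', 'TQP', 'TS']
```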
Huffman coding tree
 Binary tree
 each leaf contains symbol (character)
 label edge from node to left child with 0
 label edge from node to right child with 1
 Code for any symbol obtained by following path from
root to the leaf containing symbol
 Code has prefix property
 leaf node cannot appear on path to another leaf
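
Reading codes off a tree amounts to a traversal that records the edge labels along each path. The sketch below uses a made-up three-leaf tree stored as nested tuples; this representation and the name `codes_from_tree` are assumptions, not something the slides prescribe.

```python
# A leaf is a one-character string; an internal node is a (left, right) tuple.
# This three-leaf tree is a made-up illustration, not one from the slides.
tree = ("a", ("b", "c"))

def codes_from_tree(node, path=""):
    """Collect {symbol: code}, appending 0 for a left edge and 1 for a right edge."""
    if isinstance(node, str):          # reached a leaf
        return {node: path or "0"}     # a tree with a single leaf still needs one bit
    left, right = node
    return {**codes_from_tree(left, path + "0"), **codes_from_tree(right, path + "1")}

print(codes_from_tree(tree))   # {'a': '0', 'b': '10', 'c': '11'}
```

Because symbols sit only at leaves, no code produced this way can be the start of another.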
Building a Huffman tree
 Find frequencies of each symbol occurring in
message
 Begin with a forest of single node trees
 each contain symbol and its frequency
 Repeat
 select the two trees with the smallest frequencies at their roots
 produce a new binary tree with the selected trees as children and store the sum of their frequencies in the root
 Stop when there is only one tree
 this is the Huffman coding tree
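
The construction above can be sketched in Python with a min-heap standing in for the forest. The nested-tuple tree, the function names and the tie-breaking counter are all assumptions made for this illustration; the slides do not specify a data structure. The sample message is the one from the fixed-length example earlier.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(message):
    """Return a nested (left, right) tuple tree; leaves are single characters."""
    if not message:
        raise ValueError("message must not be empty")
    tie = count()  # unique tie-breaker so the heap never compares two trees
    forest = [(freq, next(tie), symbol) for symbol, freq in Counter(message).items()]
    heapq.heapify(forest)
    while len(forest) > 1:
        f1, _, t1 = heapq.heappop(forest)   # smallest root frequency
        f2, _, t2 = heapq.heappop(forest)   # second smallest
        heapq.heappush(forest, (f1 + f2, next(tie), (t1, t2)))
    return forest[0][2]

def code_table(node, path=""):
    """Walk the tree: a left edge adds 0, a right edge adds 1."""
    if isinstance(node, str):
        return {node: path or "0"}
    left, right = node
    return {**code_table(left, path + "0"), **code_table(right, path + "1")}

message = "DEAACAAAAABA"
codes = code_table(build_huffman_tree(message))
encoded_length = sum(len(codes[ch]) for ch in message)
print(encoded_length)   # 20, compared with 36 bits for the 3-bit fixed-length code
```

A different tie-breaking rule can produce different codewords, but the total encoded length stays the same.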
Example
 Build the Huffman coding tree for the message
This is his message
 Character frequencies (_ denotes the space character)

Symbol    A G M T E H _ I S
Frequency 1 1 1 1 2 2 3 3 5

 Begin with forest of single trees

1 1 1 1 2 2 3 3 5
A G M T E H _ I S
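
The frequency table can be reproduced with `collections.Counter`. Folding the message to upper case and writing the space as `_` are assumptions made here to match the slide's notation.

```python
from collections import Counter

message = "This is his message"

# Treat letters case-insensitively and show the space as "_", as the slide does.
freq = Counter(message.upper().replace(" ", "_"))

# Print symbols from least to most frequent.
for symbol, n in sorted(freq.items(), key=lambda item: (item[1], item[0])):
    print(symbol, n)
```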
Steps 1 to 8
 Each step merges the two trees with the smallest root frequencies (ties can be broken either way)
 A (1) + G (1) → 2
 M (1) + T (1) → 2
 E (2) + H (2) → 4
 2 (A, G) + 2 (M, T) → 4
 _ (3) + I (3) → 6
 4 (A, G, M, T) + 4 (E, H) → 8
 6 (_, I) + S (5) → 11
 8 + 11 → 19, the root of the Huffman coding tree
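Label edges
 Label each edge to a left child with 0 and each edge to a right child with 1

19
  0: 8
    0: 4
      0: 2
        0: A (1)
        1: G (1)
      1: 2
        0: M (1)
        1: T (1)
    1: 4
      0: E (2)
      1: H (2)
  1: 11
    0: 6
      0: _ (3)
      1: I (3)
    1: S (5)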
Huffman code & encoded message
This is his message

S 11
E 010
H 011
_ 100
I 101
A 0000
G 0001
M 0010
T 0011

00110111011110010111100011101111000010010111100000001010
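
The encoded message can be reproduced by concatenating codewords from the table above. This is a minimal sketch; the `encode` helper and the case folding are assumptions, and the 4-bit fixed-length figure is included only for comparison.

```python
# Code table copied from the slide; "_" stands for the space character.
CODE = {"S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
        "A": "0000", "G": "0001", "M": "0010", "T": "0011"}

def encode(message, code):
    """Concatenate the codeword of each character (case-insensitive, space as "_")."""
    return "".join(code[ch] for ch in message.upper().replace(" ", "_"))

message = "This is his message"
bits = encode(message, CODE)

huffman_bits = len(bits)                      # 56
fixed_width = (len(CODE) - 1).bit_length()    # 9 symbols need 4 bits each
fixed_bits = fixed_width * len(message)       # 4 * 19 = 76
print(bits)
print(huffman_bits, fixed_bits)               # 56 76
```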
