Graph Theory - Important Application of Trees: Huffman Coding
Prepared By:
Dr. Eng. Moustafa Reda A. Eltantawi
Spring 2021
Content
1. Encoding and decoding messages
• Fixed-length coding
• Variable-length coding
2. Huffman coding
Encoding and Decoding
• Encoding
Given a code and a message, encoding is easy: simply replace each
character by its codeword.
• Decoding
Given an encoded message, decoding is the process of turning it back
into the original message.
A message is uniquely decodable if it can be decoded in only one way.
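As a minimal sketch of the two operations (the code table here is a hypothetical prefix-free code, chosen only for illustration):

```python
# A hypothetical prefix-free code table for illustration.
code = {"a": "0", "b": "10", "c": "11"}

def encode(message):
    """Replace each character by its codeword."""
    return "".join(code[ch] for ch in message)

def decode(bits):
    """Turn an encoded string back into the original message."""
    inverse = {cw: ch for ch, cw in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:        # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)
```

Because the table is prefix-free, the greedy decode above can never read past the end of one codeword into the next.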
Fixed-length coding
In a fixed-length code, each character (or symbol) is encoded with the
same number of bits, i.e. every codeword has the same length.
Problem
• Consider a message containing only the characters in the alphabet {’a’,
’b’, ’c’, ’d’}.
Question
How many bits do we need to uniquely encode each character in a
message made up of characters from an n-letter alphabet?
Answer
At least ⌈log₂ n⌉ bits. For the 4-letter alphabet above, ⌈log₂ 4⌉ = 2
bits per character.
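The ⌈log₂ n⌉ bound can be checked directly; a one-line sketch:

```python
import math

def fixed_length_bits(alphabet_size):
    """Minimum codeword length for a fixed-length code over n symbols."""
    return math.ceil(math.log2(alphabet_size))
```

For the 4-letter alphabet this gives 2; for a 26-letter alphabet it gives 5.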
Variable-length coding
Each character is assigned a different length. In a variable-length
code codewords may have different lengths.
Decoding
When decoding, there is only one possible message:
Message = aaabcabc…, which is the correct message.
Prefix-free codes
• What is a prefix?
• 00 is prefix of 001
• 110 is prefix of 110111
• 111101 is prefix of 111101001
• Prefix-free codes
• A prefix-free code is an encoding scheme in which no codeword is a
prefix of any other codeword:
• Encoding scheme 1 in the previous example is not a prefix-free
code.
• Encoding scheme 2 in the previous example is a prefix-free
code.
• Prefix codes
Fixed-length codes are always uniquely decipherable.
Variable-length codes need not be: for instance, if e were encoded
with 0, a with 1, and t with 01, then the string 01 could be decoded
either as "ea" or as "t", so decoding is ambiguous.
• Prefix-free codes: advantage
• A prefix-free code allows the message to be decoded uniquely:
the code represents only one possible message.
• This is the case in encoding scheme 2.
• Example
Consider Message = addd in the alphabet
{’a’, ’b’, ’c’, ’d’}
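The prefix condition above can be checked mechanically; a minimal sketch:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    for cw in codewords:
        for other in codewords:
            if other != cw and other.startswith(cw):
                return False
    return True
```

With the prefix examples above, `is_prefix_free(["00", "001"])` is False because 00 is a prefix of 001.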
Huffman coding...
Objective
• Huffman coding is an algorithm used for lossless
data compression.
Applications
• Several data-compression programs (WinZip, zip, gzip, …) use
lossless encoding.
Data compression
Minimizing the number of bits.
Huffman Coding
• Huffman codes can be used to compress
information
– Like WinZip – although WinZip doesn’t use the Huffman
algorithm
– JPEGs do use Huffman as part of their compression
process
We now write a 0 on each branch extending to the left and a 1 on each branch extending
to the right.
To show how the encoding works, let us write the word MAD using the code. Follow the
unique path from the root of the tree down to the appropriate leaf, noting in order the
labels on the edges.
Binary Encoding Tree
We can easily translate this code using
our tree. Let us see how we could
decode 000011100. Referring to the
tree, we can see that there is only one
path from the root to a leaf that can
give rise to those first three 0s, and
that is the path leading to M. So we
can begin to separate the code word
into letters: 000-011100. Again
following down from the root, the path
011 leads us unambiguously to A. So we
have 000-011-100. The path 100 leads
unambiguously to D.
The reason we can translate the string
of 0s and 1s back to letters without
ambiguity is that no letter has a code
that is the start of the code for a
different letter.
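The walk down the tree can be sketched with the three codewords read off above (M = 000, A = 011, D = 100; the tree's remaining codewords are not shown in the text):

```python
# Codewords read off the encoding tree above.
codes = {"M": "000", "A": "011", "D": "100"}

def decode(bits):
    """Follow codewords greedily; prefix-freeness makes this unambiguous."""
    inverse = {cw: ch for ch, cw in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:        # reached a leaf of the tree
            out.append(inverse[buf])
            buf = ""              # restart from the root
    return "".join(out)
```

Running it on 000011100 splits the string into 000-011-100 exactly as in the text.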
Check Point
1. Use the binary encoding tree to write the binary code for each of the
following words: FEET, ANT
2. Use the encoding tree to find the word represented by each of the
following codes: 1000101001101 , 0001111001
3. Decode the following message (commas are inserted to show
separation of words). 00001010101101, 0000101, 011101,
1010101010111.
Huffman’s Algorithm for Forming the Coding Tree
– Place each symbol in a leaf
• Weight of leaf = symbol frequency
– Select two trees L and R (initially single leaves)
• such that L and R have the lowest weights in the forest
– Create a new (internal) node
• with left child L and right child R
• New weight = weight(L) + weight(R)
– Repeat until all nodes are merged into one tree.
The frequency table gives the number of occurrences of each character:
’a’ appears only once in the message, ’b’ appears 3 times in the
message, and so on.
Example 1.
From the frequency table, build a forest of binary trees. Initially,
each tree in the forest contains only a root corresponding to a
character of the alphabet and its frequency (that we will call
weight):
• Merge two trees with the smallest two frequencies, label left
edge from the root of the merged tree 0, and the right edge 1,
• the weight of the root of the merged tree is the sum of the
frequencies (weight) of its left and right children.
Steps Two through Eight repeat the merge: at each step the two
lowest-weight trees in the forest are combined, until a single tree
remains.
Example 1.
The code of each character is obtained by concatenating the labels of the
edges on the path from the root to the node representing the character:
Let fᵢ be the frequency of a character and dᵢ the number of bits in the
code of that character.
• The total number of bits required to encode the message is
M = Σᵢ₌₁⁷ dᵢ·fᵢ = 5·1 + 5·3 + 4·4 + 3·10 + 2·13 + 2·12 + 2·15 = 146 bits.
• We need 146 bits to encode the message with the given frequency table.
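The algorithm described above can be sketched with a priority queue. The frequencies below are those of Example 1 (1, 3, 4, 10, 12, 13, 15); the symbol names other than 'a' and 'b' are placeholders, since the full frequency table is not reproduced in the text:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: weight} frequency table."""
    tie = count()  # tie-breaker so heapq never has to compare tree nodes
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lowest-weight tree
        w2, _, right = heapq.heappop(heap)   # second-lowest-weight tree
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")      # label left edge 0
            walk(node[1], prefix + "1")      # label right edge 1
        else:                                # leaf: a symbol
            codes[node] = prefix or "0"
    walk(root, "")
    return codes
```

With the Example 1 frequencies, the code lengths weighted by frequency sum to the 146 bits computed above, whichever way ties between equal weights are broken.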
Example 2.
• As another example, let's take the string:
“duke blue devils”.
Frequency table: e,3  d,2  u,2  l,2  sp,2  k,1  b,1  v,1  i,1  s,1
Initialization
Arrange the leaves in ascending order of frequency:
i,1  s,1  b,1  v,1  k,1  l,2  sp,2  d,2  u,2  e,3
Huffman Coding
• Pick the two leaves of least frequency, i,1 and s,1, and form a
subtree of weight 2 from them; reinsert it into the ordered forest.
• Likewise merge b,1 and v,1 into a subtree of weight 2, then merge
k,1 with that subtree into a subtree of weight 3.
• Merge d,2 and u,2 into a subtree of weight 4, and l,2 and sp,2 into
another subtree of weight 4.
• Merge the (i,s) subtree of weight 2 with the weight-3 subtree into a
subtree of weight 5.
• Merge e,3 with the (d,u) subtree of weight 4 into a subtree of
weight 7.
• Merge the (l,sp) subtree of weight 4 with the weight-5 subtree into
a subtree of weight 9.
• Finally merge the weight-7 and weight-9 subtrees into the root, of
weight 16. This is the required coding tree.
Huffman Coding Tree
Writing a 0 on each left edge and a 1 on each right edge gives the codes:
e   00
d   010
u   011
l   100
sp  101
i   1100
s   1101
k   1110
b   11110
v   11111
Thus, “duke blue devils” turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101 = 52 bits (7 bytes)
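The 52-bit total can be checked directly from the code table (sp is the space character):

```python
# Code table read off the Huffman coding tree for "duke blue devils".
codes = {"e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
         "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111"}

encoded = "".join(codes[ch] for ch in "duke blue devils")
print(len(encoded))  # 52 bits, which rounds up to 7 bytes
```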
Graph Theory Ch. 2. Trees and Distance
Huffman coding
Example 3.
Build the Huffman coding tree for the characters a, b, c, d, e, f, g, h
with frequencies 5, 1, 1, 7, 8, 2, 3, 6 (total 33).
Merging the two lowest-weight trees at each step forms internal nodes
of weights 2 (from b and c), 4, 7, 11, 14, 19, and finally the root of
weight 33. Writing a 0 on each left edge and a 1 on each right edge
gives:
Character b c f g a h d e
Frequency 1 1 2 3 5 6 7 8
Code 00101 00100 0011 000 110 111 01 10
Length 5 5 4 3 3 3 2 2
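As a quick sanity check, the codes in the table above are prefix-free and match the listed lengths:

```python
# Code table from the example above (characters b, c, f, g, a, h, d, e).
table = {"b": "00101", "c": "00100", "f": "0011", "g": "000",
         "a": "110", "h": "111", "d": "01", "e": "10"}

def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another."""
    return not any(x != y and y.startswith(x)
                   for x in codewords for y in codewords)
```

Here `is_prefix_free(list(table.values()))` returns True, so the code is uniquely decodable.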
Example 4.
Find the Huffman Coding Tree for:
A C E H I
3 5 8 2 7
Sorted in ascending order of frequency:
H A C I E
2 3 5 7 8
Huffman Tree Construction
• Merge H,2 and A,3 into a subtree of weight 5.
• Merge that subtree with C,5 into a subtree of weight 10.
• Merge I,7 and E,8 into a subtree of weight 15.
• Merge the weight-10 and weight-15 subtrees into the root, of weight 25.
Huffman Decoding
Writing a 0 on each left edge and a 1 on each right edge gives the codes:
H 000
A 001
C 01
I 10
E 11
Decode 1111001: starting at the root, 11 leads to the leaf E; starting
again at the root, the next 11 leads to E; finally 001 leads to A.
So 1111001 decodes to EEA.
Home Work
A B C D E
20 15 5 15 45