Graph Theory - Important Application of Trees Huffman Coding

The document discusses Huffman coding, which is a variable-length coding technique used for lossless data compression. It works by assigning variable-length codewords to input characters based on their frequencies, where more common characters are represented using fewer bits than less common characters. The document provides examples of how Huffman coding assigns codes via a binary tree structure and shows the step-by-step process of constructing the Huffman tree from character frequencies.


Graph Theory

Huffman coding

Prepared By:
Dr. Eng. Moustafa Reda A. Eltantawi
Spring 2021
Content
1. Encoding and decoding messages
• Fixed-length coding
• Variable-length coding

2. Huffman coding
Encoding and Decoding
• Encoding
Given a code and a message, it is easy to encode the message: just replace each character by its codeword.

• Decoding
Given an encoded message, decoding is the process of turning it back into the original message.
A message is uniquely decodable if it can be decoded in only one way.
Fixed-length coding
A code in which each character (or symbol) has the same number of bits, i.e. every codeword has the same length.

Problem
• Consider a message containing only the characters of the alphabet {'a', 'b', 'c', 'd'}.

• The 8-bit ASCII representation of the characters in the message is not appropriate: we have only 4 characters, yet each is coded using 8 bits.

• A code that uses only 2 bits to represent each character (00, 01, 10, 11) would be enough to store any message that is a combination of only these 4 characters.
Fixed-length coding

Question
How many bits do we need to encode uniquely each character in a message made up of characters from an n-letter alphabet?

Answer
At least ⌈log₂ n⌉ bits.
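As a sketch, this bound can be checked with a one-line helper (the function name is ours, not from the slides):

```python
import math

def fixed_length_bits(n):
    """Minimum number of bits for a fixed-length code over an n-letter alphabet."""
    return math.ceil(math.log2(n))

print(fixed_length_bits(4))   # 2 bits for {a, b, c, d}
print(fixed_length_bits(26))  # 5 bits for a 26-letter alphabet
```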
Variable-length coding
In a variable-length code, codewords may have different lengths: each character may be assigned a code of a different length.

Problem when decoding
With some variable-length codes (encoding scheme 1), there is more than one possible decoded message: Message = aaabcabc..., which is correct, or Message = bcbcbcbc..., which is incorrect.

Consider now the following encoding scheme (encoding scheme 2):

Decoding
With this scheme, there is only one possible decoded message: Message = aaabcabc..., which is the correct message.
Prefix-free codes
• What is a prefix?
• 00 is a prefix of 001
• 110 is a prefix of 110111
• 111101 is a prefix of 111101001

• Prefix-free codes
• A code is called a prefix-free code (or prefix code) if no codeword is a prefix of any other codeword:
• Encoding scheme 1 in the previous example is not a prefix-free code.
• Encoding scheme 2 in the previous example is a prefix-free code.
• Note that fixed-length codes are always uniquely decipherable.
Prefix Property
• Consider using bit strings of different lengths to encode letters. Letters that occur more frequently should be encoded using short bit strings, and longer bit strings should be used to encode rarely occurring letters. When letters are encoded using varying numbers of bits, some method must be used to determine where the bits for each character start and end.

• For instance, if e were encoded with 0, a with 1, and t with 01, then the bit string 0101 could correspond to: eat, tea, eaea, or tt.
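The ambiguity can be demonstrated by exhaustively splitting the bit string against this code (a small sketch; the helper name is ours):

```python
def decodings(bits, code):
    """Return every way to split `bits` into codewords from `code` (symbol -> codeword)."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            results += [symbol + rest for rest in decodings(bits[len(word):], code)]
    return results

code = {"e": "0", "a": "1", "t": "01"}
print(sorted(decodings("0101", code)))  # ['eaea', 'eat', 'tea', 'tt']
```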
• Prefix-free codes: advantage
• A prefix-free code allows the message to be decoded uniquely: the encoded string represents only one possible message.
• This is the case with encoding scheme 2.

• Variable-length prefix-free code vs. fixed-length code
• Compared to fixed-length codes, a variable-length prefix-free code can produce shorter encodings of messages.

• Example
Consider Message = addd over the alphabet {'a', 'b', 'c', 'd'}.
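For instance, a 2-bit fixed-length code spends 8 bits on addd, while a prefix-free code that gives the frequent 'd' a 1-bit codeword does better. The codebook below is a hypothetical one chosen for illustration, not the slides' scheme 2:

```python
# Hypothetical prefix-free code favoring the frequent 'd' (assumed for illustration)
variable_code = {"d": "0", "a": "10", "b": "110", "c": "111"}
message = "addd"

fixed_bits = 2 * len(message)                                  # 4 chars x 2 bits each
variable_bits = sum(len(variable_code[ch]) for ch in message)  # 2 + 1 + 1 + 1
print(fixed_bits, variable_bits)  # 8 5
```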
Huffman coding...
Objective
• Huffman coding is an algorithm used for lossless data compression.

• By lossless, it is meant that the exact original data can be recovered by decoding the compressed data.

Applications
• Several data compression programs (WinZip, zip, gzip, ...) use lossless data encoding.

Data compression
Minimizing the number of bits.
Huffman Coding
• Huffman codes can be used to compress
information
– Like WinZip – although WinZip doesn’t use the Huffman
algorithm
– JPEGs do use Huffman as part of their compression
process

• The basic idea is that instead of storing each character in a file as an 8-bit ASCII value, we store the more frequently occurring characters using fewer bits and the less frequently occurring characters using more bits
– On average this should decrease the file size (often by roughly half)
Basic idea
• Each symbol in the original data to be compressed (for example, a character in a file) is assigned a code.

• The length of the code varies from symbol to symbol (variable-length coding).

• The length of the code assigned to a symbol depends on the symbol's frequency (the number of times the symbol appears in the original data).
• Symbols whose frequency is high (appearing more often in the message than others) are assigned shorter codes.
• Symbols whose frequency is low (appearing less often in the message than others) are assigned longer codes.
Binary Encoding Tree
The unique path property of a tree can be used to set
up a code. Here we set up a binary code, that is, a code
with strings of 0s and 1s representing letters. (Recall
that when we represent numbers in binary form we use
only the symbols 0 and 1.)

To set up the binary code we use a special kind of


tree (called a binary tree) like that shown in the
figure. This tree is a directed graph (there are arrows
on the edges). The vertex at the top (with no arrows
pointing toward it) is called the root of the directed
tree; the vertices with no arrows pointing away from
them are called leaves of the directed tree.
We label each leaf with a letter we want to encode. The diagram shown provides an
encoding for only 8 letters, but we could easily draw a bigger binary tree with more leaves
to represent more letters.

We now write a 0 on each branch extending to the left and a 1 on each branch extending
to the right.

To show how the encoding works, let us write the word MAD using the code. Follow the
unique path from the root of the tree down to the appropriate leaf, noting in order the
labels on the edges.
Binary Encoding Tree
We can easily translate this code using
our tree. Let us see how we could
decode 000011100. Referring to the
tree, we can see that there is only one
path from the root to a leaf that can
give rise to those first three 0s, and
that is the path leading to M. So we
can begin to separate the code word
into letters: 000-011100. Again
following down from the root, the path
011 leads us unambiguously to A. So we
have 000-011-100. The path 100 leads
unambiguously to D.
The reason we can translate the string
of 0s and 1s back to letters without
ambiguity is that no letter has a code
that is the start of the code for a
different letter.
Check Point
1. Use the binary encoding tree to write the binary code for each of the
following words: FEET, ANT
2. Use the encoding tree to find the word represented by each of the
following codes: 1000101001101 , 0001111001
3. Decode the following message (commas are inserted to show
separation of words). 00001010101101, 0000101, 011101,
1010101010111.

Huffman’s Algorithm for
Forming the Coding Tree
– Place each symbol in a leaf
• Weight of leaf = symbol frequency
– Select two trees L and R (initially leaves)
• such that L and R have the lowest frequencies among all trees
– Create a new (internal) node
• Left child ← L, right child ← R
• New frequency ← frequency(L) + frequency(R)
– Repeat until all nodes are merged into one tree.

• Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch.
• A traversal of the tree from root to leaf gives the Huffman code for that leaf's character.
• Note that no code is the prefix of another code.
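The steps above can be sketched with a min-heap (a sketch, not the lecturer's code; the tie-breaking counter makes the algorithm's non-determinism on equal frequencies explicit). The frequencies used here are those of Example 1 below:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes from {symbol: frequency} by repeatedly merging the two
    lowest-weight trees, then reading 0/1 labels off the root-to-leaf paths."""
    tick = count()  # tie-breaker: equal weights would otherwise merge in unspecified order
    heap = [(w, next(tick), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        wl, _, left = heapq.heappop(heap)   # the two trees of lowest frequency
        wr, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (wl + wr, next(tick), (left, right)))
    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):  # internal node: 0 on the left branch, 1 on the right
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:
            codes[node] = path or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"a": 1, "b": 3, "c": 4, "d": 10, "e": 13, "f": 12, "g": 15}
codes = huffman_codes(freqs)
print(sum(len(codes[s]) * f for s, f in freqs.items()))  # 146 total bits
```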
Example 1.
• Consider the alphabet {’a’, ’b’, ’c’, ’d’, ’e’, ’f’, ’g’}
• Count the frequency of each character in the message (the number of times the character appears). For example:

meaning that 'a' appears only once in the message, 'b' appears 3 times, and so on.

From the frequency table, build a forest of binary trees. Initially, each tree in the forest contains only a root corresponding to a character of the alphabet and its frequency (which we will call its weight).

Then, apply the following rules:

• Merge the two trees with the smallest frequencies; label the left edge from the root of the merged tree 0 and the right edge 1.

• The weight of the root of the merged tree is the sum of the frequencies (weights) of its left and right children.

Remark: this is a non-deterministic algorithm, as no rule is specified for breaking ties between identical frequencies.
Steps One through Eight (tree diagrams omitted): the forest is merged pairwise, always combining the two trees of smallest weight, until a single tree remains.
The code of each character is obtained by concatenating the labels of the edges on the path from the root to the node representing the character (the codes themselves are read off the tree diagram):

Character   Length (bits)   Weight
a           5               1
b           5               3
c           4               4
d           3               10
e           2               13
f           2               12
g           2               15

Let fᵢ be the frequency of a character and dᵢ the number of bits in the code of that character. Then:

• The total number of bits required to encode the message is M = Σᵢ₌₁⁷ dᵢfᵢ = 5·1 + 5·3 + 4·4 + 3·10 + 2·13 + 2·12 + 2·15 = 146 bits.
• We need 146 bits to encode the message with the given frequency table.
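The total can be checked directly from the (length, frequency) pairs in the sum:

```python
# (d_i, f_i) pairs for characters a..g, read from the sum above
pairs = [(5, 1), (5, 3), (4, 4), (3, 10), (2, 13), (2, 12), (2, 15)]
total_bits = sum(d * f for d, f in pairs)
print(total_bits)  # 146
```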
Example 2.
• As another example, let's take the string:
"duke blue devils".

• We first do a frequency count of the characters:
• e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1

• Next we use a greedy algorithm to build up a Huffman tree
– We start with a node for each character
• We then pick the nodes with the smallest frequencies and combine them together to form a new node.
– The selection of these nodes is the greedy part.

• The two selected nodes are removed from the set, but replaced by the combined node.

• This continues until we have only 1 node left in the set.
Huffman Coding
Building the tree for "duke blue devils" (tree diagrams omitted):

Initialization: arrange the leaves in ascending order of frequency:

i,1  s,1  b,1  v,1  k,1  l,2  sp,2  d,2  u,2  e,3

Then repeatedly pick the two trees of least weight, form a subtree whose root weight is their sum, and re-sort:

1. Merge i,1 and s,1 → a subtree of weight 2.
2. Merge b,1 and v,1 → a subtree of weight 2.
3. Merge k,1 and the (b,v) subtree of weight 2 → a subtree of weight 3.
4. Merge l,2 and sp,2 → a subtree of weight 4.
5. Merge d,2 and u,2 → a subtree of weight 4.
6. Merge the (i,s) subtree of weight 2 and the weight-3 subtree → a subtree of weight 5.
7. Merge e,3 and the (d,u) subtree of weight 4 → a subtree of weight 7.
8. Merge the (l,sp) subtree of weight 4 and the weight-5 subtree → a subtree of weight 9.
9. Merge the weight-7 and weight-9 subtrees → the final tree of weight 16.

This is the required coding tree.
Huffman Coding Tree
Reading the codes off the tree (0 on left branches, 1 on right branches):

e   00
d   010
u   011
l   100
sp  101
i   1100
s   1101
k   1110
b   11110
v   11111

Thus, "duke blue devils" turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101 → 52 bits ≈ 7 bytes

When grouped into 8-bit bytes:

01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx

Thus it takes 7 bytes of space, compared to 16 characters × 1 byte/char = 16 bytes uncompressed, occupying 16 × 8 = 128 bits.
Saved space = 72/128 = 0.5625 = 56.25%
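Encoding the phrase with the code table above reproduces the 52-bit count (a sketch; the space character is written as " "):

```python
codes = {"e": "00", "d": "010", "u": "011", "l": "100", " ": "101",
         "i": "1100", "s": "1101", "k": "1110", "b": "11110", "v": "11111"}

encoded = "".join(codes[ch] for ch in "duke blue devils")
print(len(encoded), (len(encoded) + 7) // 8)  # 52 bits, 7 padded bytes
```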
Example 3.
• Consider eight items a–h with frequencies 5, 1, 1, 7, 8, 2, 3, 6:

a  b  c  d  e  f  g  h
5  1  1  7  8  2  3  6

• Combine items according to the tree (diagrams omitted), working from the bottom up.
– First the two items of weight 1 combine to form one of weight 2.
– Now this and the original item of weight 2 are the least likely, and they combine to form an item of weight 4.
Huffman coding
The merges continue (tree diagrams omitted):

– Merge the weight-3 item (g) with the weight-4 subtree → weight 7.
– Merge the weight-5 item (a) with the weight-6 item (h) → weight 11.
– Merge the two weight-7 trees (the subtree and d) → weight 14.
– Merge the weight-8 item (e) with the weight-11 subtree → weight 19.
– Merge the weight-14 and weight-19 subtrees → the final tree of weight 33.

Labeling each left branch 0 and each right branch 1 gives the codes below.
Character   b      c      f     g    a    h    d   e
Frequency   1      1      2     3    5    6    7   8
Code        00101  00100  0011  000  110  111  01  10
Length      5      5      4     3    3    3    2   2
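The table can be checked in two ways: no codeword is a prefix of another, and the weighted code length (computed here, not stated on the slide) comes to 90 bits:

```python
table = {"b": "00101", "c": "00100", "f": "0011", "g": "000",
         "a": "110", "h": "111", "d": "01", "e": "10"}
freq = {"a": 5, "b": 1, "c": 1, "d": 7, "e": 8, "f": 2, "g": 3, "h": 6}

# Prefix-free check: no codeword is a prefix of any other
words = list(table.values())
assert not any(x != y and y.startswith(x) for x in words for y in words)

total = sum(len(table[s]) * freq[s] for s in freq)
print(total)  # 90 bits to encode a message with these frequencies
```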
Example 4.
Find the Huffman coding tree for (tree diagrams omitted):

A  C  E  H  I
3  5  8  2  7

Huffman Tree Construction 1 (re-arrangement): sort the leaves in ascending order of frequency:

H  A  C  I  E
2  3  5  7  8

Huffman Tree Construction 2: merge H,2 and A,3 → a subtree of weight 5.

Huffman Tree Construction 3: merge C,5 and the (H,A) subtree of weight 5 → a subtree of weight 10.

Huffman Tree Construction 4: merge I,7 and E,8 → a subtree of weight 15.

Huffman Tree Construction 5: re-arrange; the weight-10 and weight-15 subtrees remain.

Huffman Tree Construction 6: merge the weight-10 and weight-15 subtrees → the final tree of weight 25. Labeling each left branch 0 and each right branch 1 gives:

E = 11
I = 10
C = 01
A = 001
H = 000

• Input: AHE
• Output: (001)(000)(11) = 00100011
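Encoding AHE with this code table is just a lookup and concatenation:

```python
codes = {"E": "11", "I": "10", "C": "01", "A": "001", "H": "000"}
print("".join(codes[ch] for ch in "AHE"))  # 00100011
```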
Huffman Decoding Algorithm
• Decoding
– Read the compressed file and the binary tree.
– Use the binary tree to decode the file:
• Follow the path from root to leaf.
Huffman Decoding 1–7
Decoding the bit string 1111001 with the tree from Example 4 (tree diagrams omitted; E = 11, I = 10, C = 01, A = 001, H = 000):

– Starting at the root, the bits 11 lead to the leaf E → output so far: E.
– The next bits 11 again lead to E → output so far: EE.
– The final bits 001 lead to A → output: EEA.
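The walkthrough above can be sketched as a greedy codeword matcher; because the code is prefix-free, the first codeword that matches is the only one that can (the function name is ours):

```python
def decode(bits, codes):
    """Decode a bit string produced by a prefix-free code (symbol -> codeword)."""
    inverse = {word: sym for sym, word in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix-free: a full match never extends to another codeword
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

codes = {"E": "11", "I": "10", "C": "01", "A": "001", "H": "000"}
print(decode("1111001", codes))  # EEA
```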
Home Work

Character   A   B   C   D   E
Frequency  20  15   5  15  45
