Algorithmics: Information Coding Techniques
CT065-3.5-3
Keywords
Huffman's algorithm
Prefix code
Optimal prefix code
Data Compression
Reducing the number of bits required for data representation is known as compression
Data Compression
Compression consists of 2 phases:
Encoding phase (compressing): data is converted using an encoding scheme
Decoding phase (decompressing): encoded data is converted back into its original form
Encoding/Decoding
We will use "message" in a generic sense to mean the data to be compressed
Data Compression
Purpose:
To reduce the size of files stored on disk
(i.e. in effect increasing the capacity of
the disk)
To increase the effective rate of data
transmission (by transmitting less data)
Fixed Length Encoding
Concepts:
A fixed number of bits is used to represent each character in the encoding scheme
For example, a 3-bit code length is required to uniquely represent 8 different characters
Every character gets a code of the same length, regardless of how frequently it occurs
Fixed Length Encoding - Example
Table 1 - A fixed-length encoding scheme

Character   Code
a           000
e           001
i           010
s           011
t           100
sp          101
nl          110
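A minimal sketch of fixed-length encoding using the 3-bit codes from Table 1; the function name fixed_encode and the sample message are illustrative, not taken from the slides.

```python
# Fixed-length encoding: every character is represented by the same number of bits (3 here).
FIXED_CODES = {
    'a': '000', 'e': '001', 'i': '010', 's': '011',
    't': '100', ' ': '101', '\n': '110',   # 'sp' = space, 'nl' = newline
}

def fixed_encode(message: str) -> str:
    """Concatenate the 3-bit code of every character in the message."""
    return ''.join(FIXED_CODES[ch] for ch in message)

print(fixed_encode("a tie"))   # 5 characters x 3 bits = 15 bits: 000101100010001
```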
Fixed Length Encoding
Prefix Code Tree
The codes in Table 1 can be represented by the binary tree
below:
[Tree: a binary tree with the leaves a, e, i, s, t, sp and nl all at depth 3; left edges are labelled 0 and right edges 1, so each character receives a 3-bit code.]
Fixed Length Encoding
Prefix Code Tree
If we further impose the condition that the tree is to be a strictly binary tree (i.e. all nodes are either leaves or have 2 children), then the codes could be represented with somewhat fewer bits, as shown below:
[Tree: a, e, i, s, t and sp remain at depth 3, while nl moves up to depth 2 and now needs only a 2-bit code; left edges are labelled 0 and right edges 1.]
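Because no code is a prefix of another, an encoded bit string can be decoded without any separators. A minimal decoding sketch, assuming the codes implied by the strictly binary tree above (with nl shortened to the 2-bit code 11); the greedy matching loop is an illustrative implementation choice.

```python
# Prefix codes from the strictly binary tree: nl now needs only two bits.
PREFIX_CODES = {
    'a': '000', 'e': '001', 'i': '010', 's': '011',
    't': '100', ' ': '101', '\n': '11',
}

def decode(bits: str) -> str:
    """Read bits left to right; the prefix property guarantees that the first
    complete codeword we match is the correct one."""
    reverse = {code: ch for ch, code in PREFIX_CODES.items()}
    out, current = [], ''
    for bit in bits:
        current += bit
        if current in reverse:        # a complete codeword has been read
            out.append(reverse[current])
            current = ''
    return ''.join(out)

print(decode('000101100010001'))      # -> 'a tie'
```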
Variable Length Encoding
A more efficient encoding scheme
compared to fixed length encoding
would be one that:
Allows the code length (3 bits for every character in the previous fixed-length example) to vary from character to character, with the most frequently occurring characters having short codes
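To make the saving concrete, here is a small sketch comparing the total number of bits under a fixed-length code and under a variable-length code, using the ACDABA example that appears later in these slides; the 2-bit fixed-length assumption is mine.

```python
# ACDABA: 4 distinct characters, so a fixed-length code needs 2 bits per character.
freqs        = {'A': 3, 'B': 1, 'C': 1, 'D': 1}   # character frequencies in ACDABA
code_lengths = {'A': 1, 'B': 2, 'C': 3, 'D': 3}   # lengths of the variable-length codes A=0, B=10, C=110, D=111

fixed_bits    = 2 * sum(freqs.values())
variable_bits = sum(freqs[c] * code_lengths[c] for c in freqs)

print(fixed_bits, variable_bits)   # 12 vs 11: the frequent character A gets the shortest code
```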
Huffman Encoding
The Huffman encoding algorithm was
created in 1952 and is named after its
inventor, David Huffman
It is a lossless encoding algorithm that is
ideal for compressing text or program files
The algorithm uses variable length codes
Huffman Encoding - Example
Data:
ACDABA
Huffman Algorithm
How was the prefix code tree
constructed in the previous example?
Huffman's encoding algorithm constructs
an optimal prefix code tree by repeatedly
merging the two trees with the least
weight
Huffman Algorithm
Given a set of characters, each with an associated frequency f, the Huffman tree is constructed using a priority queue, Q, of nodes, with frequencies as keys.
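A minimal sketch of this set-up, using Python's heapq module as the priority queue Q with frequencies as keys; the (frequency, character) tuple layout and the sample frequencies are illustrative choices, not part of the slides.

```python
import heapq

# One node per character; the frequency is the key the priority queue orders by.
freqs = {'A': 3, 'B': 1, 'C': 1, 'D': 1}
Q = [(f, c) for c, f in freqs.items()]   # (frequency, character) pairs
heapq.heapify(Q)

print(heapq.heappop(Q))   # (1, 'B') -- a lowest-frequency node comes out first
```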
Huffman Algorithm
Assume the number of characters to be encoded is C
The Huffman algorithm can be described as follows:
Maintain a forest of trees (initially, each tree represents one character)
The weight of a tree = the sum of the frequencies of its leaves
Repeat C - 1 times: select the two trees with the smallest weights and merge them into a single tree; the tree that remains is the optimal prefix code tree
Huffman Algorithm
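A runnable sketch of the whole procedure described above; the representation (leaves as single characters, internal nodes as nested tuples) and the tie-breaking counter are my own choices, not the slides' pseudocode.

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a Huffman tree by repeatedly merging the two lowest-weight trees,
    then read each code off the tree (left edge = '0', right edge = '1')."""
    # Forest of single-leaf trees: (weight, tie_breaker, tree); a leaf is just a character.
    heap = [(f, i, c) for i, (c, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    # Merge C - 1 times: the two smallest trees become the children of a new tree.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    # Walk the finished tree to assign a code to every character.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, str):            # leaf
            codes[tree] = prefix or '0'
        else:                                # internal node: (left subtree, right subtree)
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
    walk(heap[0][2], '')
    return codes

print(huffman_codes(Counter("ACDABA")))
# {'A': '0', 'D': '10', 'B': '110', 'C': '111'} with this tie-breaking; the slides'
# assignment A=0, B=10, C=110, D=111 has the same code lengths and is equally optimal.
```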
Huffman Algorithm - Example
Huffman Algorithm - Example
Maintain a forest in which each character is a tree and its frequency within the text is the weight of that tree:

A: 3   B: 1   C: 1   D: 1
Huffman Algorithm - Example
Merge the two trees with the smallest weights: C and D (weight 1 each) are combined into a new tree with weight 2. The forest is now A (3), B (1) and the tree {C, D} (2).
Huffman Algorithm - Example
Next, merge the two smallest trees again: B (1) and the tree {C, D} (2) are combined into a tree with weight 3. The forest is now A (3) and the tree {B, {C, D}} (3).
Huffman Algorithm - Example
Merge the last two trees to produce the
optimal prefix code tree:
[Final tree, weight 6: A (3) is the left child of the root (edge 0) and the weight-3 tree is the right child (edge 1); within it, B (1) takes edge 0 and the tree {C, D} takes edge 1. Reading the edge labels from root to leaf gives the codes: A = 0, B = 10, C = 110, D = 111.]
Huffman Encoding - Example
Data: ACDABA
Codes: A=0, B=10, C=110, D=111
If these codes are used to compress the file,
the compressed data would look like this:
01101110100
This means that 11 bits are used instead of 48 (6 characters at 8 bits each), a compression ratio of roughly 4 to 1 for this particular file
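A quick check of this arithmetic, assuming 8 bits per character for the uncompressed file; the variable names below are illustrative.

```python
# Encode ACDABA with the Huffman codes derived above and compare sizes.
codes = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
data = "ACDABA"

compressed    = ''.join(codes[ch] for ch in data)
original_bits = 8 * len(data)               # 6 characters at 8 bits each

print(compressed)                           # 01101110100
print(len(compressed), original_bits)       # 11 vs 48 bits -- roughly 4 to 1
```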
Huffman Coding Example 2
As an example, let's take the string:
duke blue devils
We first do a frequency count of the characters:
e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
Next we use a greedy algorithm to build up a Huffman tree
We start with one node for each character:
e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1
[Construction steps: the lowest-weight trees are merged first (i with s, and b with v), then larger subtrees of weights 2, 3, 4, 4 and 5 are formed, then subtrees of weights 7 and 9, and finally the complete Huffman tree of weight 16 with the weight-7 subtree (containing e) and the weight-9 subtree as the children of the root.]
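A self-contained sketch of the frequency count and of the greedy merging for this example; it computes only the total encoded length, which is the same for every optimal tree even though the individual codewords depend on how ties are broken.

```python
import heapq
from collections import Counter

# Frequency count for the message on this slide.
freqs = Counter("duke blue devils")
print(dict(freqs))
# {'d': 2, 'u': 2, 'k': 1, 'e': 3, ' ': 2, 'b': 1, 'l': 2, 'v': 1, 'i': 1, 's': 1}

# The total encoded length equals the sum of the weights created by the merges.
heap = list(freqs.values())
heapq.heapify(heap)
total_bits = 0
while len(heap) > 1:
    merged = heapq.heappop(heap) + heapq.heappop(heap)   # merge the two lightest trees
    total_bits += merged
    heapq.heappush(heap, merged)

print(total_bits)   # 52 bits, versus 16 characters * 8 bits = 128 bits uncompressed
```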