
Algorithmics

CT065-3.5-3

Information Coding Techniques


Level 3 Computing (Software Engineering)
Learning Outcomes
By the end of this lesson you should be able to:
Briefly explain the Information Coding Techniques algorithms in the following area:
Huffman Coding

Module Code and Module Title Title of Slides Slide 3 (of 50)
Keywords
Huffman's algorithm
Prefix code
Optimal prefix code

Data Compression
Reducing the number of bits required to represent data is known as compression

In text compression, each character within a file is encoded using a certain number of bits (usually fewer than 8 bits per character)

Data Compression
Compression consists of 2 phases:
Encoding phase (compressing)
Data is converted using an encoding scheme
Decoding phase (decompressing)
Data is decoded to its original form using the same scheme that was used to encode it

Encoding/Decoding
We will use "message" in a generic sense to mean the data to be compressed

Input Message -> Encoder -> Compressed Message -> Decoder -> Output Message

The encoder and decoder need to understand a common compressed format.

Data Compression
Purpose:
To reduce the size of files stored on disk
(i.e. in effect increasing the capacity of
the disk)
To increase the effective rate of data
transmission (by transmitting less data)

Fixed Length Encoding
Concepts:
A fixed number of bits is used to represent each character in the encoding scheme
For example, a 3-bit code length would be required to uniquely represent 8 different characters
Every character gets the same code length, regardless of how frequently it occurs

Fixed Length Encoding - Example
Table 1 - A fixed length encoding scheme

Character Code Frequency Total Bits


a 000 10 3x10 = 30
e 001 15 3x15 = 45
i 010 12 3x12 = 36
s 011 3 3x3 = 9
t 100 4 3x4 = 12
sp (blank space) 101 13 3x13 = 39
nl (new line) 110 1 3x1 = 3

Total bits required for encoding 174
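
As a sanity check, the totals in Table 1 can be recomputed with a few lines of Python (frequencies copied from the table):

```python
# Frequencies from Table 1; every character takes 3 bits.
freqs = {"a": 10, "e": 15, "i": 12, "s": 3, "t": 4, "sp": 13, "nl": 1}
total_bits = sum(3 * f for f in freqs.values())
print(total_bits)  # 174
```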

Fixed Length Encoding
Prefix Code Tree
The codes in Table 1 can be represented by the binary tree
below:

[Figure: a binary code tree; the leaves at depth 3 are a, e, i, s, t, sp, nl. Total: 174 bits]

All characters are stored only on leaf nodes.

In the tree, a left branch represents 0 and a right branch represents 1. The path from the root to a leaf gives that character's code.

Fixed Length Encoding
Prefix Code Tree
If we further impose the condition that the tree is to be a strictly binary tree (i.e. all nodes are either leaves or have 2 children), then the codes can be represented with somewhat fewer bits, as shown below:

[Figure: a strictly binary tree in which nl sits at depth 2 (code length 2) while a, e, i, s, t, sp remain at depth 3. Moving nl up one level saves 1 bit, for a total of 173 bits]

Variable Length Encoding
A more efficient encoding scheme compared to fixed length encoding would be one that:
Allows the code lengths (fixed at 3 in the previous example) to vary from character to character, with the most frequently occurring characters having short codes

Huffman Encoding
The Huffman encoding algorithm was
created in 1952 and is named after its
inventor, David Huffman
It is a lossless encoding algorithm that is
ideal for compressing text or program files
The algorithm uses variable length codes

Huffman Coding

Huffman codes can be used to compress information
Like WinZip, although WinZip doesn't use the Huffman algorithm
JPEGs do use Huffman as part of their compression process

The basic idea is that instead of storing each character in a file as an 8-bit ASCII value, we will instead store the more frequently occurring characters using fewer bits and less frequently occurring characters using more bits
On average this should decrease the file size (and usually does)
Huffman Encoding - Example
Assume that we want to compress the
following piece of data using Huffman
encoding:
ACDABA
Since there are 6 characters, this
uncompressed text is 6 bytes or 48
bits long

Huffman Encoding - Example
Data:
ACDABA

With Huffman encoding, the file is searched for the most frequently appearing symbols (in this case the character A, which occurs 3 times) and then a prefix code tree is built that replaces the symbols with shorter bit sequences.

Huffman Encoding - Example
Data:
ACDABA

In this particular case, the algorithm would use the following encoding table: A=0, B=10, C=110, D=111.

Huffman Algorithm
How was the prefix code tree
constructed in the previous example?
Huffman's encoding algorithm constructs an optimal prefix code tree by repeatedly merging the two trees with the least weight

Huffman Algorithm
Given a character A with frequency f, the Huffman tree is constructed using a priority queue, Q, of nodes, with frequencies as keys.

Huffman Algorithm
Assume the number of characters to be
encoded is C
The Huffman algorithm can be described as follows:
Maintain a forest of trees (each tree
represents one character)
The weight of a tree = sum of frequencies
of its leaves

Huffman Algorithm

At the beginning of the algorithm, there are C single-node trees

At the end of the algorithm there is one tree, and this is an optimal Huffman tree
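
The merging procedure above can be sketched in Python using a min-heap as the priority queue. This is a minimal sketch, not the authoritative implementation; tie-breaking between equal weights is arbitrary, so other equally optimal trees are possible:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table by repeatedly merging the two
    lowest-weight trees held in a priority queue (min-heap)."""
    freq = Counter(text)
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # single character or a (left, right) pair of subtrees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # smallest weight
        f2, _, right = heapq.heappop(heap)   # next smallest
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, str):            # leaf: record its code
            codes[tree] = prefix or "0"      # lone-character edge case
        else:                                # internal: 0 left, 1 right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_codes("ACDABA"))  # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```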

Huffman Algorithm - Example

Let us construct an optimal prefix code tree for the following text:
ACDABA

Huffman Algorithm - Example
Maintain a forest with each character representing a tree within the forest and its frequency within the text representing the weight of the tree:

A B C D
3 1 1 1

Huffman Algorithm - Example
Merge the two trees with the smallest weights:

A,3   B,1   [C D],2  (a tree of weight 2 whose children are C,1 and D,1)

Note: We chose to merge trees C and D; in fact we could have chosen any two out of B, C and D. The trees are arbitrarily merged as either the right or left subtree.

Huffman Algorithm - Example

Merge the next two trees with the smallest weights, B,1 and [C D],2:

A,3   [B [C D]],3
Huffman Algorithm - Example
Merge the last two trees to produce the optimal prefix code tree of weight 6, with A,3 as the left subtree and [B [C D]],3 as the right subtree. Reading 0 for each left branch and 1 for each right branch gives:

A = 0, B = 10, C = 110, D = 111
Huffman Encoding - Example
Data: ACDABA
Codes: A=0, B=10, C=110, D=111
If these codes are used to compress the file,
the compressed data would look like this:
01101110100
This means that 11 bits are used instead of 48, a compression ratio of roughly 4 to 1 for this particular file
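
Applying the code table above to ACDABA is a simple lookup-and-concatenate; a minimal sketch:

```python
# Code table from the example: A=0, B=10, C=110, D=111.
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
encoded = "".join(codes[ch] for ch in "ACDABA")
print(encoded, len(encoded))  # 01101110100 11
```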

Huffman Coding Example 2
As an example, let's take the string:
duke blue devils
We first do a frequency count of the characters:
e:3, d:2, u:2, l:2, space:2, k:1, b:1, v:1, i:1, s:1
Next we use a greedy algorithm to build up a Huffman tree
We start with nodes for each character

e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1



Huffman Coding

We then pick the nodes with the smallest frequency and combine them together to form a new node
The selection of these nodes is the greedy part
The two selected nodes are removed from the set, but replaced by the combined node
This continues until we have only 1 node left in the set



Huffman Coding

e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 i,1 s,1



Huffman Coding

e,3 d,2 u,2 l,2 sp,2 k,1 b,1 v,1 [i s],2

(here [i s],2 denotes the merged tree of weight 2 whose children are i,1 and s,1)



Huffman Coding

e,3 d,2 u,2 l,2 sp,2 k,1 [b v],2 [i s],2



Huffman Coding

e,3 d,2 u,2 l,2 sp,2 [k [b v]],3 [i s],2



Huffman Coding

e,3 d,2 u,2 [l sp],4 [k [b v]],3 [i s],2



Huffman Coding

e,3 [d u],4 [l sp],4 [k [b v]],3 [i s],2



Huffman Coding

e,3 [d u],4 [l sp],4 [[i s] [k [b v]]],5



Huffman Coding

[e [d u]],7 [l sp],4 [[i s] [k [b v]]],5



Huffman Coding

[e [d u]],7 [[l sp] [[i s] [k [b v]]]],9



Huffman Coding

[[e [d u]] [[l sp] [[i s] [k [b v]]]]],16



Huffman Coding

Now we assign codes to the tree by placing a 0 on every left branch and a 1 on every right branch
A traversal of the tree from root to leaf gives the Huffman code for that particular leaf character
Note that no code is the prefix of another code
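
The prefix property can be checked mechanically. A small sketch, using the ten codes this tree produces (listed in the table that follows):

```python
# The codes for e, d, u, l, sp, i, s, k, b, v in this example.
codes = ["00", "010", "011", "100", "101",
         "1100", "1101", "1110", "11110", "11111"]
# A code set is prefix-free if no code starts with another, different code.
prefix_free = not any(a != b and b.startswith(a) for a in codes for b in codes)
print(prefix_free)  # True
```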



Huffman Coding
The final tree of weight 16, as built above, yields the code table:

Character  Code
e          00
d          010
u          011
l          100
sp         101
i          1100
s          1101
k          1110
b          11110
v          11111



Huffman Coding

These codes are then used to encode the string
Thus, duke blue devils turns into:
010 011 1110 00 101 11110 100 011 00 101 010 00 11111 1100 100 1101

When grouped into 8-bit bytes:
01001111 10001011 11101000 11001010 10001111 11100100 1101xxxx

Thus it takes only 7 bytes of space compared to 16 characters * 1 byte/char = 16 bytes uncompressed
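
The byte grouping above can be reproduced by padding the bit string to a multiple of 8 and packing; a sketch (a real file format would also have to record how many pad bits were added):

```python
# The encoded pieces for "duke blue devils", as listed above.
pieces = ["010", "011", "1110", "00", "101", "11110", "100", "011",
          "00", "101", "010", "00", "11111", "1100", "100", "1101"]
bits = "".join(pieces)                  # 52 bits
padded = bits + "0" * (-len(bits) % 8)  # pad the final byte with zeros
packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
print(len(packed))  # 7 (bytes, versus 16 uncompressed)
```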



Huffman Coding
Uncompressing works by reading in the file bit by bit
Start at the root of the tree
If a 0 is read, head left
If a 1 is read, head right
When a leaf is reached, decode that character and start over again at the root of the tree
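
The loop above can be sketched directly, using the ACDABA tree from the earlier example (A=0, B=10, C=110, D=111):

```python
# The ACDABA code tree: internal nodes are (left, right) pairs,
# leaves are characters. Left branch = 0, right branch = 1.
tree = ("A", ("B", ("C", "D")))

def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]  # follow the branch
        if isinstance(node, str):                  # reached a leaf
            out.append(node)
            node = root                            # restart at the root
    return "".join(out)

print(decode("01101110100", tree))  # ACDABA
```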



Huffman Encoding - Applications
Huffman encoding is mainly used in compression programs like pkZIP, lha, gz, zoo, and arj

It is also used within JPEG and MPEG compression

