Huffman Coding

Huffman coding is a lossless data compression algorithm that uses variable-length codes to encode characters based on their frequency of occurrence. It assigns smaller codes to more frequent characters and larger codes to less frequent characters. It also follows the prefix rule to avoid ambiguities when decoding, ensuring that no code is a prefix of another. To construct the Huffman tree, nodes are created for each unique character and arranged by frequency, then merged together from lowest frequencies up until a single tree remains. Codes are then assigned by traversing the tree and assigning 0 for left branches and 1 for right.


Huffman Coding-

● Huffman Coding, also called Huffman Encoding, is a famous greedy algorithm that is
used for the lossless compression of data.
● It uses variable-length encoding, where variable-length codes are assigned to the
characters depending on how frequently they occur in the given text.
● The character that occurs most frequently gets the shortest code, and the character
that occurs least frequently gets the longest code.

Prefix Rule-

● To prevent ambiguities while decoding, Huffman coding follows a rule known as the
prefix rule, which ensures that the code assigned to any character is not a prefix of the
code assigned to any other character.
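As a quick illustration, the prefix rule can be checked mechanically: sort the codewords and verify that no word is a prefix of its lexicographic successor. A minimal sketch, using a hypothetical code table chosen only for this example:

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # After sorting, any prefix violation appears between adjacent words.
    return all(not words[i + 1].startswith(words[i])
               for i in range(len(words) - 1))

# Hypothetical prefix-free table: no code starts with another code.
print(is_prefix_free({"a": "0", "b": "10", "c": "110", "d": "111"}))  # True

# Violates the prefix rule: "0" is a prefix of "01".
print(is_prefix_free({"a": "0", "b": "01"}))  # False
```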

Major Steps in Huffman Coding-

There are two major steps in Huffman coding-

1. Building a Huffman tree from the input characters
2. Assigning codes to the characters by traversing the Huffman tree

Steps to construct Huffman Tree-

Step-01:

Create a leaf node for each of the given characters, with each node containing that
character's frequency of occurrence.

Step-02:

Arrange all the nodes in increasing order of the frequency values they contain.

Step-03:

Taking the two nodes with the minimum frequencies, create a new internal node whose
frequency equals the sum of those two nodes' frequencies, and make the first node the
left child and the other node the right child of the newly created node.

Step-04:

Keep repeating Step-02 and Step-03 until all the nodes form a single tree.

After following all the above steps, our desired Huffman tree will be constructed.
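The steps above can be sketched in Python using a min-heap as the frequency-ordered node list. This is only one possible representation (a leaf is modeled as a `(freq, char)` tuple and an internal node as `(freq, left, right)`):

```python
import heapq
from itertools import count

def build_huffman_tree(frequencies):
    """frequencies: dict mapping character -> frequency.
    Returns the root: a leaf is (freq, char); an internal
    node is (freq, left, right)."""
    tie = count()  # tie-breaker so the heap never compares node tuples
    # Step-01 / Step-02: one leaf per character, kept ordered by frequency.
    heap = [(f, next(tie), (f, ch)) for ch, f in frequencies.items()]
    heapq.heapify(heap)
    # Step-04: repeat Step-03 until a single tree remains.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # Step-03: two minimum nodes...
        f2, _, right = heapq.heappop(heap)
        merged = (f1 + f2, left, right)     # ...become children of a new node
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

root = build_huffman_tree({"a": 10, "e": 15, "i": 12,
                           "o": 3, "u": 4, "s": 13, "t": 1})
print(root[0])  # root frequency = total character count = 58
```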

Important Formulas-

Formula-01:

Average code length per character
= ∑ ( frequencyi x code lengthi ) / ∑ ( frequencyi )

Formula-02:

Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= ∑ ( frequencyi x code lengthi )
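Both formulas can be evaluated directly in a few lines of Python. The frequencies and code lengths below are sample data matching the practice problem that follows:

```python
# Per-character frequencies and Huffman code lengths (sample data).
freqs   = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
lengths = {"a": 3,  "e": 2,  "i": 2,  "o": 5, "u": 4, "s": 2,  "t": 5}

# Formula-02: total bits = sum over characters of frequency x code length.
total_bits = sum(freqs[c] * lengths[c] for c in freqs)

# Formula-01: average code length = total bits / total characters.
avg_length = total_bits / sum(freqs.values())

print(total_bits)            # 146
print(round(avg_length, 2))  # 2.52
```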
PRACTICE PROBLEMS BASED ON HUFFMAN
CODING-

Problem-01:

A file contains the following characters with the frequencies as shown. If Huffman coding is
used for data compression, determine-

1. Huffman code for each character


2. Average code length
3. Length of Huffman encoded message (in bits)

Character    Frequency

a            10
e            15
i            12
o            3
u            4
s            13
t            1
Solution-

First let us construct the Huffman tree using the steps we have learnt above-

Step-01 to Step-07:

(The diagrams for these steps are not reproduced here. At each step, the two nodes with
the lowest frequencies are merged into a new internal node, until a single Huffman tree
remains.)
After we have constructed the Huffman tree, we will assign weights to all the edges. Let us
assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.

Note
● We can also assign weight ‘1’ to the left edges and weight ‘0’ to the right edges.
● The only thing to keep in mind is that we must follow the same convention during
decoding that we adopted during encoding.
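The edge-labeling convention can be sketched as a tree traversal that appends ‘0’ on every left edge and ‘1’ on every right edge. This assumes, purely for illustration, a tree represented as nested tuples with a leaf as `(freq, char)` and an internal node as `(freq, left, right)`:

```python
def assign_codes(node, prefix="", codes=None):
    """Walk the tree, labeling left edges '0' and right edges '1'.
    Leaves are (freq, char); internal nodes are (freq, left, right)."""
    if codes is None:
        codes = {}
    if len(node) == 2:                     # leaf: record accumulated code
        codes[node[1]] = prefix or "0"     # single-leaf tree still gets a bit
    else:                                  # internal: left gets 0, right gets 1
        assign_codes(node[1], prefix + "0", codes)
        assign_codes(node[2], prefix + "1", codes)
    return codes

# Hypothetical tiny tree: 'a' on the left, 'b'/'c' under the right child.
tree = (6, (3, "a"), (3, (1, "b"), (2, "c")))
print(assign_codes(tree))  # {'a': '0', 'b': '10', 'c': '11'}
```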

After assigning weight ‘0’ to the left edges and weight ‘1’ to the right edges, we get the
labeled tree (diagram not reproduced here)-
1. Huffman code for the characters-

We will traverse the Huffman tree from the root node to each leaf node, one by one, and
write down the Huffman code for every character-

● a = 111
● e = 10
● i = 00
● o = 11001
● u = 1101
● s = 01
● t = 11000

From here, we can observe-

● Characters occurring less frequently in the text are assigned longer codes.
● Characters occurring more frequently in the text are assigned shorter codes.

2. Average code length-

We know,
Average code length
= ∑ ( frequencyi x code lengthi ) / ∑ ( frequencyi )
= { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 +
4 + 13 + 1)
= 146 / 58
≈ 2.52

3. Length of Huffman encoded message-

We know-
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= 58 x 2.52
= 146.16 ≈ 146 bits

(The exact count, computed directly as ∑ ( frequencyi x code lengthi ), is 146 bits; the
fractional 0.16 appears only because the average code length was rounded to 2.52.)
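As a sanity check, the answers above can be verified by encoding a message with exactly these character frequencies and then greedily decoding it back, a sketch using the code table from part 1:

```python
codes = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}
freqs = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}

# Hypothetical message containing exactly the given character frequencies.
message = "".join(ch * n for ch, n in freqs.items())
encoded = "".join(codes[ch] for ch in message)
print(len(encoded))  # 146 bits, matching the length computed above

# Greedy bit-by-bit decoding works because no code is a prefix of another.
decode = {v: k for k, v in codes.items()}
decoded, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in decode:          # a complete codeword can be emitted immediately
        decoded.append(decode[buf])
        buf = ""
print("".join(decoded) == message)  # True
```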
