Huffman Coding
● Huffman coding (also called Huffman encoding) is a well-known greedy algorithm used
for lossless data compression.
● It uses variable-length encoding: each character is assigned a code whose length
depends on how frequently the character occurs in the given text.
● The most frequent characters get the shortest codes and the least frequent characters
get the longest codes.
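Prefix Rule-
Huffman codes follow the prefix rule: the code assigned to any character is never a prefix
of the code assigned to any other character, so an encoded message can be decoded
without ambiguity.
Steps to construct the Huffman tree-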
Step-01:
Create a leaf node for each of the given characters, storing the character's frequency
in the node.
Step-02:
Arrange all the nodes in increasing order of the frequency values they contain.
Step-03:
Take the first two nodes (those with the minimum frequencies) and create a new internal
node whose frequency is the sum of the two nodes' frequencies. Make the first node the
left child and the second node the right child of the newly created node.
Step-04:
Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
After following all the above steps, our desired Huffman tree will be constructed.
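The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it uses a min-heap (`heapq`) in place of re-sorting the node list on every pass, and the names `build_huffman_tree` and `leaf_depths` are hypothetical. Ties between equal frequencies may be broken differently from a hand-drawn tree, so left and right children can be swapped, but the depth of each leaf (and therefore each code length) is the same for the frequencies used in Problem-01 below.

```python
import heapq
from itertools import count


def build_huffman_tree(frequencies):
    """Return a Huffman tree as nested tuples.

    A leaf is a single character; an internal node is a (left, right) pair.
    Assumes `frequencies` is a non-empty dict of character -> frequency.
    """
    order = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(freq, next(order), char) for char, freq in frequencies.items()]
    heapq.heapify(heap)  # plays the role of Step-02 (ordering by frequency)

    while len(heap) > 1:
        # Step-03: take the two nodes with the smallest frequencies...
        freq_left, _, left = heapq.heappop(heap)
        freq_right, _, right = heapq.heappop(heap)
        # ...and join them under a parent holding the sum of their frequencies.
        heapq.heappush(heap, (freq_left + freq_right, next(order), (left, right)))

    return heap[0][2]  # Step-04: a single tree remains


def leaf_depths(tree, depth=0):
    """Map each character to its depth in the tree (its code length)."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**leaf_depths(left, depth + 1), **leaf_depths(right, depth + 1)}


frequencies = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
tree = build_huffman_tree(frequencies)
depths = leaf_depths(tree)
print(depths)
```

The heap keeps the smallest node at the front at all times, which avoids the repeated sorting described in Step-02 while selecting the same pair of minimum-frequency nodes.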
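Important Formulas-
Formula-01:
Average code length per character
= ∑ ( frequency_i x code length_i ) / ∑ ( frequency_i )
Formula-02:
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= ∑ ( frequency_i x code length_i )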
Problem-01:
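A file contains the following characters with the frequencies shown. If Huffman coding is
used for data compression, determine-
1. Huffman code for each character
2. Average code length
3. Length of the Huffman encoded message (in bits)

Characters    Frequencies
a             10
e             15
i             12
o             3
u             4
s             13
t             1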
Solution-
First let us construct the Huffman tree using the steps we have learnt above-
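Step-01:
Create a leaf node for each character and arrange the nodes in increasing order of
frequency: t (1), o (3), u (4), a (10), i (12), s (13), e (15).
Step-02:
Combine t (1) and o (3) under a new internal node of frequency 4, with t as the left child
and o as the right child. Remaining nodes: 4, u (4), a (10), i (12), s (13), e (15).
Step-03:
Combine the internal node 4 (left) and u (4) (right) under a new internal node of
frequency 8. Remaining nodes: 8, a (10), i (12), s (13), e (15).
Step-04:
Combine the internal node 8 (left) and a (10) (right) under a new internal node of
frequency 18. Remaining nodes: i (12), s (13), e (15), 18.
Step-05:
Combine i (12) (left) and s (13) (right) under a new internal node of frequency 25.
Remaining nodes: e (15), 18, 25.
Step-06:
Combine e (15) (left) and the internal node 18 (right) under a new internal node of
frequency 33. Remaining nodes: 25, 33.
Step-07:
Combine the internal node 25 (left) and the internal node 33 (right) under the root node
of frequency 58. All the nodes now form a single tree.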
After we have constructed the Huffman tree, we will assign weights to all the edges. Let us
assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.
Note
● We can also assign weight ‘1’ to the left edges and weight ‘0’ to the right edges.
● The only thing to keep in mind is that we must follow the same convention at the time of
decoding which we adopted at the time of encoding.
With weight ‘0’ on the left edges and weight ‘1’ on the right edges, the codes can be read
off the tree as follows.
1. Huffman code for the characters-
We will traverse the Huffman tree from the root node to each leaf node in turn and
write down the Huffman code for every character-
● a = 111
● e = 10
● i = 00
● o = 11001
● u = 1101
● s = 01
● t = 11000
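The code table above can be reproduced with a short traversal. The sketch below is illustrative only: the tree is written out by hand as nested tuples (an assumed representation, with each internal node as a `(left, right)` pair) so that it matches the codes listed above, and the function names and the sample message "seat" are hypothetical. It also checks the prefix property and shows that decoding works as long as it uses the same edge convention as encoding.

```python
def assign_codes(tree, left_bit="0", right_bit="1", prefix=""):
    """Walk from the root to every leaf, collecting the edge labels."""
    if isinstance(tree, str):  # leaf: the path so far is the character's code
        return {tree: prefix}
    left, right = tree
    codes = assign_codes(left, left_bit, right_bit, prefix + left_bit)
    codes.update(assign_codes(right, left_bit, right_bit, prefix + right_bit))
    return codes


def is_prefix_free(codes):
    """True if no code is a prefix of another code."""
    words = sorted(codes.values())
    # After sorting, a code that is a prefix of another code is
    # immediately followed by a word that starts with it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, codes):
    """Decode a bit string by matching codes from left to right."""
    inverse = {code: char for char, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # unambiguous because the codes are prefix-free
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Hand-written tree consistent with the code table above (an assumption).
tree = (("i", "s"), ("e", ((("t", "o"), "u"), "a")))

codes = assign_codes(tree)                     # left = 0, right = 1
swapped = assign_codes(tree, "1", "0")         # the opposite convention

message = "seat"                               # hypothetical sample message
encoded = "".join(codes[ch] for ch in message)

print(codes)
print(is_prefix_free(codes))                   # True
print(encoded)                                 # 011011111000
print(decode(encoded, codes))                  # seat
```

Decoding `encoded` with `swapped` instead of `codes` would not recover the message, which is why the same convention must be used on both sides.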
2. Average code length-
We know-
Average code length
= ∑ ( frequency_i x code length_i ) / ∑ ( frequency_i )
= { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 +
4 + 13 + 1)
= 146 / 58
≅ 2.52
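3. Length of Huffman encoded message (in bits)-
We know-
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= 58 x (146 / 58)
= 146 bits
Multiplying by the rounded average instead (58 x 2.52 = 146.16) overstates the result, so
the exact sum ∑ ( frequency_i x code length_i ) = 146 should be used.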
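The two formulas can be checked numerically. The snippet below is a small sketch that uses the frequencies from Problem-01 and the code lengths read off the code table; the variable names are illustrative. It keeps the average as an exact fraction, which shows that the total is a whole number of bits and that rounding the average before multiplying introduces a small error.

```python
from fractions import Fraction

frequencies = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
code_lengths = {"a": 3, "e": 2, "i": 2, "o": 5, "u": 4, "s": 2, "t": 5}

total_chars = sum(frequencies.values())
# Exact size of the encoded message: sum of frequency x code length.
total_bits = sum(frequencies[ch] * code_lengths[ch] for ch in frequencies)
# Average code length per character, kept exact.
average = Fraction(total_bits, total_chars)

print(total_chars)                 # 58
print(total_bits)                  # 146
print(average)                     # 73/29
print(round(float(average), 2))    # 2.52

# Using the rounded average drifts away from the exact total.
estimate = total_chars * 2.52
print(round(estimate, 2))          # 146.16
```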