Source Coding
Let Lmin denote the minimum possible value of the average codeword length L. We then define the coding efficiency of the source encoder as
η = Lmin / L.
With Lmin = H(S) by the source coding theorem, this becomes η = H(S) / L, and η ≤ 1 since L ≥ Lmin.
Source coding methods increase the average information carried per bit, driving the coding efficiency toward its maximum value of 1. In doing so they also achieve data compression.
Prefix Coding
A prefix code is defined as a code in which no codeword is a prefix of any other codeword.
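As an illustration (not part of the original notes), here is a minimal Python sketch that tests the prefix property of a candidate code; is_prefix_code is a hypothetical helper name:

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of another."""
    # After lexicographic sorting, any codeword that is a prefix of
    # another appears immediately before a codeword extending it.
    codewords = sorted(codewords)
    for shorter, longer in zip(codewords, codewords[1:]):
        if longer.startswith(shorter):
            return False
    return True

# The Huffman code constructed below is a prefix code:
print(is_prefix_code(["00", "10", "11", "010", "011"]))  # True
# A counterexample: "0" is a prefix of "01".
print(is_prefix_code(["0", "01", "11"]))                 # False
```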
For the five-symbol source considered below, the entropy is H(S) ≈ 2.122 bits/symbol. Since H(S) < 3, the average codeword length L can be less than 3 bits, but it cannot fall below H(S) ≈ 2.122 bits.
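This entropy value is easy to verify numerically; the following short Python check is an added illustration:

```python
from math import log2

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
H = -sum(p * log2(p) for p in probs)
print(f"H(S) = {H:.3f} bits/symbol")  # H(S) = 2.122 bits/symbol
```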
ii) Huffman Code/Tree construction
[Huffman tree: the root branches 0 to internal node A and 1 to internal node B; node A branches 0 to s0 and 1 to internal node C; node C branches 0 to s3 and 1 to s4; node B branches 0 to s1 and 1 to s2.]

Symbol   Probability   Codeword
s0       0.4           00
s1       0.2           10
s2       0.2           11
s3       0.1           010
s4       0.1           011
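The greedy construction behind the tree (repeatedly merging the two least probable nodes) can be sketched in Python as below. This is an added illustration, not part of the original notes; depending on how ties are broken it may output a different, but equally optimal, set of codewords with the same lengths:

```python
import heapq

def huffman_code(probabilities):
    """Build a Huffman code; probabilities maps symbol -> probability."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # least probable node
        p1, _, code1 = heapq.heappop(heap)  # second least probable node
        # Prepend 0 to one subtree's codewords and 1 to the other's.
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code({"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1})
# One optimal result: codeword lengths 2, 2, 2, 3, 3, average 2.2 bits/symbol.
print(code)
```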
iii) Comparison with fixed-length coding

Symbol   Probability   Huffman codeword   Codeword without source coding
s0       0.4           00                 000
s1       0.2           10                 001
s2       0.2           11                 010
s3       0.1           010                011
s4       0.1           011                100

Without source coding, five symbols require fixed-length codewords of 3 bits each, whereas the Huffman code averages only 2.2 bits/symbol.
iv) Variance of the codeword lengths over the ensemble of source symbols

The Huffman merging steps can also be carried out by placing each combined symbol as low as possible in the sorted list (instead of as high as possible), which yields a second optimal code:

Symbol   Probability   Codeword
s0       0.4           1
s1       0.2           01
s2       0.2           000
s3       0.1           0010
s4       0.1           0011

The average codeword length is
L = 0.4 (1) + 0.2 (2) + 0.2 (3) + 0.1 (4) + 0.1 (4) = 2.2 bits/symbol,
the same as before. However, the variance of the codeword lengths, σ² = Σ pk (lk − L)², is 1.36 for this code but only 0.16 for the earlier one (lengths 2, 2, 2, 3, 3), so the earlier code is preferred.
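As an added numerical check, the following Python snippet computes the average length and variance of both codes:

```python
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
code_a = ["00", "10", "11", "010", "011"]    # combined symbols placed high
code_b = ["1", "01", "000", "0010", "0011"]  # combined symbols placed low

for name, code in (("A", code_a), ("B", code_b)):
    lengths = [len(c) for c in code]
    L = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))
    print(f"code {name}: L = {L:.1f} bits/symbol, variance = {var:.2f}")
# Both average 2.2 bits/symbol; code A has variance 0.16, code B 1.36.
```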
Huffman Code Decoding
Decode the encoded sequence 100111100 using the Huffman code below.

Symbol   Probability   Codeword
s0       0.4           00
s1       0.2           10
s2       0.2           11
s3       0.1           010
s4       0.1           011

Because this is a prefix code, the sequence parses unambiguously from left to right: 10 | 011 | 11 | 00, i.e. the original symbol sequence is s1 s4 s2 s0.
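The left-to-right parsing can be automated; the sketch below (an added illustration, with huffman_decode as a hypothetical helper) consumes bits until the accumulated string matches a codeword:

```python
def huffman_decode(bits, codebook):
    """Decode a bit string using a prefix code given as codeword -> symbol."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        # The prefix property guarantees at most one codeword can match.
        if current in codebook:
            symbols.append(codebook[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols

codebook = {"00": "s0", "10": "s1", "11": "s2", "010": "s3", "011": "s4"}
print(huffman_decode("100111100", codebook))  # ['s1', 's4', 's2', 's0']
```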
Shannon–Fano coding
Shannon–Fano coding is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).
It is suboptimal in the sense that, unlike Huffman coding, it does not always achieve the lowest possible expected codeword length.
The Shannon–Fano encoding algorithm proceeds as follows (a code sketch is given after the steps):
1. The symbols are arranged in order from most probable to least probable.
2. They then are divided into two sets whose total probabilities are as close as possible to
being equal.
3. All symbols then have the first digits of their codes assigned; symbols in the first set
receive "0" and symbols in the second set receive "1".
4. As long as any sets with more than one member remain, the same process is repeated
on those sets, to determine successive digits of their codes.
5. When a set has been reduced to one symbol this means the symbol's code is complete
and will not form the prefix of any other symbol's code.
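The steps above can be sketched in Python as follows. This is an added illustration; the split in step 2 is found by minimizing the difference between the two sets' total probabilities:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability), sorted by decreasing probability.
    Returns {symbol: codeword}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Step 2: find the split making the two sets' probabilities closest.
    best_split, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, best_split = diff, i
    # Steps 3-4: first set gets "0", second set gets "1", then recurse.
    code = {s: "0" + c for s, c in shannon_fano(symbols[:best_split]).items()}
    code.update({s: "1" + c for s, c in shannon_fano(symbols[best_split:]).items()})
    return code

source = [("A", 15/39), ("B", 7/39), ("C", 6/39), ("D", 6/39), ("E", 5/39)]
print(shannon_fano(source))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```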
Example:
There are 5 different source symbols. Suppose 39 symbols in total have been observed with the following frequencies.
Symbol A B C D E
Count 15 7 6 6 5
Solution:
i) First compute the probability of each symbol and arrange in decreasing order of probabilities.
Symbol        A              B             C             D             E
Count         15             7             6             6             5
Probability   15/39 = 0.385  7/39 = 0.179  6/39 = 0.154  6/39 = 0.154  5/39 = 0.128
ii) Divide the symbols into sets and assign code digits:

Symbol   Probability   First division   Second division   Third division   Codeword
A        0.385         0                0                 –                00
B        0.179         0                1                 –                01
C        0.154         1                0                 –                10
D        0.154         1                1                 0                110
E        0.128         1                1                 1                111

[Shannon–Fano code tree: the root branches 0 to the subtree containing A and B, and 1 to the subtree containing C, D and E, giving A = 00, B = 01, C = 10, D = 110, E = 111.]
iii) Entropy: H(S) = 2.1852 bits/symbol.
iv) Average codeword length:
L = 0.385 (2) + 0.179 (2) + 0.154 (2) + 0.154 (3) + 0.128 (3) = 2.282 bits/symbol.
The coding efficiency is therefore η = H(S) / L = 2.1852 / 2.282 ≈ 0.958.
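These figures can be checked numerically; the following snippet is an added verification using the rounded probabilities:

```python
from math import log2

probs = [0.385, 0.179, 0.154, 0.154, 0.128]
lengths = [2, 2, 2, 3, 3]  # Shannon-Fano codeword lengths for A..E

H = -sum(p * log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, lengths))
print(f"H(S) = {H:.3f} bits/symbol")  # 2.185
print(f"L = {L:.3f} bits/symbol")     # 2.282
print(f"efficiency = {H / L:.3f}")    # 0.958
```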