Source Coding

Source Encoder

Properties of a Source Encoder:


1. It should represent the discrete source efficiently.
2. The codewords produced by the encoder are in binary
form.
3. The source code is uniquely decodable, so that the original
source sequence can be reconstructed perfectly from the
encoded binary sequence.
The source has an alphabet with K different symbols, and the kth symbol sk occurs with
probability pk, k = 0, 1, …, K–1. Let the binary codeword assigned to symbol sk by the
encoder have length lk, measured in bits. We define the average codeword length of the
source encoder as

L̄ = Σ pk lk   (sum over k = 0, 1, …, K–1)

Let Lmin denote the minimum possible value of L̄. We then define the coding efficiency of
the source encoder as

η = Lmin / L̄

Since L̄ ≥ Lmin, the coding efficiency satisfies η ≤ 1, and the encoder is said to be
efficient when η approaches unity. Lmin is determined by Shannon’s First Theorem.


Source-coding Theorem

The minimum number of bits required to represent a DMS, i.e. Lmin, is determined by
Shannon’s first theorem, the source-coding theorem, which may be stated as follows:
Given a discrete memoryless source whose output is denoted by the random variable S,
the entropy H(S) imposes the following bound on the average codeword length for any
source encoding scheme:

L̄ ≥ H(S)

Setting Lmin = H(S), we get the coding efficiency as

η = H(S) / L̄

Some Source Coding Techniques (variable length):

1. Shannon Fano coding algorithm


2. Huffman coding algorithm
3. Lempel–Ziv coding algorithm

 These methods are used to increase the average information per bit, thereby realizing
maximum coding efficiency.
 These source coding methods also achieve data compression.
Prefix Coding
A prefix code is defined as a code in which no codeword is the prefix of any other codeword.

1. The prefix code is a variable-length source coding scheme in which no codeword is the
prefix of any other codeword.

2. A prefix code is a uniquely decodable code, as the end of each codeword is always
recognizable.

3. But the converse is not true, i.e., not all uniquely decodable codes are prefix codes.
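For example (not from the original slides): {0, 10, 110, 111} is a prefix code, whereas
{0, 01, 011, 0111} is uniquely decodable (a new codeword begins at every 0 in the bit
stream) but is not a prefix code, since 0 is a prefix of 01.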
Huffman Coding
 Huffman codes are an important class of prefix codes.
 A Huffman code is an optimal prefix code for a given distribution, i.e., it has the
shortest expected codeword length.

 The Huffman encoding algorithm proceeds as follows:


1. The source symbols are listed in order of decreasing probability. The two source
symbols of lowest probability are assigned 0 and 1. This part of the step is referred
to as the splitting stage.
2. These two source symbols are then combined into a new source symbol with probability
equal to the sum of the two original probabilities. The probability of the new symbol
is placed in the list in accordance with its value.
3. The procedure is repeated until we are left with a final list of source statistics
(symbols) of only two, to which the symbols 0 and 1 are assigned.
4. The codeword for each (original) source symbol is found by working backward and
tracing the sequence of 0s and 1s assigned to that symbol as well as its successors.
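As a programming illustration of this procedure (not part of the original slides; the
function name and tie-breaking are my own choices), the Python sketch below builds a
Huffman code with a priority queue. When probabilities are tied, the exact codewords can
differ from the hand-constructed tree in the example that follows, but the average
codeword length is the same:

import heapq
from itertools import count

def huffman_code(prob_map):
    # Build a Huffman code for a dict mapping symbol -> probability.
    tie = count()  # tie-breaker so equal probabilities never compare the payload lists
    # Each heap entry: (probability, tie-breaker, list of (symbol, code built so far))
    heap = [(p, next(tie), [(sym, "")]) for sym, p in prob_map.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # lowest probability  -> bit 0
        p1, _, group1 = heapq.heappop(heap)   # next lowest         -> bit 1
        merged = ([(sym, "0" + c) for sym, c in group0] +
                  [(sym, "1" + c) for sym, c in group1])
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return dict(heap[0][2])

# Example 1 from these notes
probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman_code(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
print(code)
print(avg)   # 2.2 bits/symbol, matching the hand construction below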
Example 1:
Consider the five symbols (s0, s1, s2, s3 and s4) of the alphabet of a discrete memoryless source
and their probabilities 0.4, 0.2, 0.2, 0.1 and 0.1 respectively.
i) Can this source be encoded so that average code word length is less than
three bits? Give reasons for your answer.
ii) Devise Huffman code for this source.
iii) Determine coding efficiency of the code designed.
iv) Find Redundancy and Variance of the code.
Ans:
i) According to Shannon’s first theorem, the average codeword length is bounded below by
the entropy:
H(S) = 0.4 log2(1/0.4) + 0.2 log2(1/0.2) + 0.2 log2(1/0.2) + 0.1 log2(1/0.1) + 0.1 log2(1/0.1)
= 2.121 bits/symbol

Since H(S) < 3, the average codeword length can be less than 3 bits, but it cannot be less
than 2.121 bits.
ii) Huffman Code/Tree construction

[Huffman tree from the slide: the root branches 0 → node A and 1 → node B; A branches
0 → s0 and 1 → node C; B branches 0 → s1 and 1 → s2; C branches 0 → s3 and 1 → s4.]

Symbol   Probability   Codeword
s0       0.4           00
s1       0.2           10
s2       0.2           11
s3       0.1           010
s4       0.1           011
Symbol   Probability   Huffman codeword   Codeword without source coding (3-bit fixed length)
s0       0.4           00                 000
s1       0.2           10                 001
s2       0.2           11                 010
s3       0.1           010                011
s4       0.1           011                100

A message “s0 s4 s1 s0 s2 s3” is transmitted as

00 011 10 00 11 010   (14 bits)

If source coding is not used, we require a minimum of 3 bits per symbol to code the 5
symbols, and the transmitted message is

000 100 001 000 010 011   (18 bits)
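The Huffman code therefore saves 4 bits on this message, a reduction of about 22% relative
to fixed-length coding.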
iii) The average codeword length L̄ = 0.4 (2) + 0.2 (2) + 0.2 (2) + 0.1 (3) + 0.1 (3)
= 2.2 bits/symbol

Coding efficiency η = H(S) / L̄ = 2.121 / 2.2 = 96.41%

iv) The variance of the codeword lengths over the ensemble of source symbols is
σ² = Σ pk (lk – L̄)²
= 0.4 (2 – 2.2)² + 0.2 (2 – 2.2)² + 0.2 (2 – 2.2)² + 0.1 (3 – 2.2)² + 0.1 (3 – 2.2)²
= 0.16

Redundancy is a measure of how far a code is from optimal and is defined as the difference
between the average codeword length and the entropy:
γ = L̄ – H(S) = 2.2 – 2.121 = 0.079 bits/symbol
Important to Note:
 Huffman encoding process (i.e., the Huffman tree) is not unique.
 There are two variations:
1. First, at each splitting stage in the construction of a Huffman code, there is
arbitrariness in the way the symbols 0 and 1 are assigned to the last two source symbols.
Whichever way the assignments are made, however, the resulting differences are trivial.
2. Second, ambiguity arises when the probability of a combined symbol (obtained by adding
the last two probabilities pertinent to a particular step) is found to equal another
probability in the list. The probability of the new symbol can be placed as high as
possible or as low as possible in the list; whichever choice is made should be adhered to
consistently throughout the encoding process.
 When the probability of the new symbol is placed as high as possible, the resulting
Huffman code has a significantly smaller variance than when it is placed as low as
possible. Hence it is called the minimum-variance Huffman code.
The probability of the new symbol is placed as low as possible:

Symbol   Probability   Codeword
s0       0.4           1
s1       0.2           01
s2       0.2           000
s3       0.1           0010
s4       0.1           0011

The average codeword length
= 0.4 (1) + 0.2 (2) + 0.2 (3) + 0.1 (4) + 0.1 (4)
= 2.2 bits/symbol
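To see the minimum-variance property numerically (this calculation is not in the original
slides but follows directly from the two codes above): the first code, with lengths
{2, 2, 2, 3, 3}, has variance σ² = 0.16 as computed earlier, whereas this second code,
with lengths {1, 2, 3, 4, 4}, has
σ² = 0.4 (1 – 2.2)² + 0.2 (2 – 2.2)² + 0.2 (3 – 2.2)² + 0.1 (4 – 2.2)² + 0.1 (4 – 2.2)² = 1.36.
Both codes have the same average length of 2.2 bits/symbol, but placing the combined
probability as high as possible gives the much smaller variance.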
Huffman Code Decoding
If the encoded sequence is 100111100, find the original message (i.e., decode the encoded
sequence 100111100) using the code:

Symbol   Probability   Codeword
s0       0.4           00
s1       0.2           10
s2       0.2           11
s3       0.1           010
s4       0.1           011
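Because this is a prefix code, the sequence can be decoded by reading bits left to right
and emitting a symbol as soon as the bits read so far match a codeword: 10 | 011 | 11 | 00,
i.e. the message is s1 s4 s2 s0. A minimal Python decoder illustrating this (a sketch of my
own, not from the original slides):

def decode_prefix_code(bits, code):
    # Decode a bit string using a prefix code given as a dict symbol -> codeword.
    inverse = {cw: sym for sym, cw in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:        # prefix property: the first match is the symbol
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string does not end on a codeword boundary")
    return symbols

code = {"s0": "00", "s1": "10", "s2": "11", "s3": "010", "s4": "011"}
print(decode_prefix_code("100111100", code))   # ['s1', 's4', 's2', 's0']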
Shannon–Fano coding
Shannon–Fano coding is a technique for constructing a prefix code based on a set of
symbols and their probabilities (estimated or measured).
It is suboptimal in the sense that, unlike Huffman coding, it does not always achieve the
lowest possible expected codeword length.
 The Shannon–Fano encoding algorithm proceeds as follows:
1. The symbols are arranged in order from most probable to least probable.
2. They then are divided into two sets whose total probabilities are as close as possible to
being equal.
3. All symbols then have the first digits of their codes assigned; symbols in the first set
receive "0" and symbols in the second set receive "1".
4. As long as any sets with more than one member remain, the same process is repeated
on those sets, to determine successive digits of their codes.
5. When a set has been reduced to one symbol this means the symbol's code is complete
and will not form the prefix of any other symbol's code.
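A recursive Python sketch of this procedure (illustrative only, not part of the original
slides; the function name and the tie-breaking at the split point are my own choices, so
codewords may differ slightly from a hand construction when several splits are equally
balanced):

def shannon_fano(prob_map):
    # Build a Shannon-Fano code for a dict mapping symbol -> probability.
    items = sorted(prob_map.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Choose the split point that makes the two sets' total probabilities closest
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_i, best_diff = i, diff
        first, second = group[:best_i], group[best_i:]
        for sym, _ in first:
            codes[sym] += "0"    # first set receives the next digit 0
        for sym, _ in second:
            codes[sym] += "1"    # second set receives the next digit 1
        split(first)
        split(second)

    split(items)
    return codes

# Data from the example below: counts 15, 7, 6, 6, 5 out of 39 observed symbols
counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
print(shannon_fano({s: c / 39 for s, c in counts.items()}))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}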
Example:
There are 5 different source symbols. Suppose a total of 39 symbols have been observed
with the following frequencies.
Symbol A B C D E
Count 15 7 6 6 5

i. Construct a Shannon–Fano code for this small alphabet.


ii. Determine coding efficiency of the code designed.
iii. Find Redundancy of the code.

Solution:
i) First compute the probability of each symbol and arrange in decreasing order of probabilities.

Symbol        A              B             C             D             E
Count         15             7             6             6             5
Probability   15/39 = 0.385  7/39 = 0.179  6/39 = 0.154  6/39 = 0.154  5/39 = 0.128
Symbol   Probability   First division   Second division   Third division   Codeword
A        0.385         0                0                 –                00
B        0.179         0                1                 –                01
C        0.154         1                0                 –                10
D        0.154         1                1                 0                110
E        0.128         1                1                 1                111

[Shannon–Fano code tree from the slide: the root branches 0/1; the 0-branch splits into
A (00) and B (01); the 1-branch splits into C (10) and a node that splits into D (110)
and E (111).]
ii) Entropy
H(S) = 0.385 log2(1/0.385) + 0.179 log2(1/0.179) + 0.154 log2(1/0.154) + 0.154 log2(1/0.154) + 0.128 log2(1/0.128)
= 2.1852 bits/symbol

The average codeword length
L̄ = 0.385 (2) + 0.179 (2) + 0.154 (2) + 0.154 (3) + 0.128 (3)
= 2.282 bits/symbol

Coding efficiency η = H(S) / L̄ = 2.1852 / 2.282 = 95.76%

iii) Redundancy is a measure of how far a code is from optimal and is defined as the
difference between the average codeword length and the entropy:
γ = L̄ – H(S) = 2.282 – 2.1852 = 0.0968 bits/symbol
