Data Compression Unit-2

Data Compression

Huffman Coding:
• This technique was developed by David Huffman as
part of a class assignment.
• The class was the first ever in the area of information
theory and was taught by Robert Fano at MIT.
• The codes generated using this procedure are called
Huffman codes.
• These codes are prefix codes and are optimum for a
given model (set of probabilities).
• It is a widely used algorithm for lossless data
compression.
Data Compression
Huffman Coding:
The Huffman procedure is based on two
observations regarding optimum prefix codes.
1. In an optimum code, symbols that occur more
frequently (have a higher probability of occurrence) will
have shorter codewords than symbols that occur less
frequently.
2. In an optimum code, the two symbols that occur least
frequently will have codewords of the same length.
Data Compression
Huffman Coding:
Example: Design a Huffman code for the letters given
below.

LETTER   PROBABILITY
a1       0.2
a2       0.4
a3       0.2
a4       0.1
a5       0.1
Data Compression
Huffman Coding:
Example, step 1: Arrange the letters in descending order
of probability.

BEFORE SORTING         AFTER SORTING
LETTER   PROB.         LETTER   PROB.
a1       0.2           a2       0.4
a2       0.4           a1       0.2
a3       0.2           a3       0.2
a4       0.1           a4       0.1
a5       0.1           a5       0.1
Data Compression
Huffman Coding:
Example, step 2: Repeatedly merge the two least probable
entries, labelling one branch 0 and the other 1, and
re-sort after each merge:
– a4 (0.1) + a5 (0.1) → a'4 (0.2)
– a3 (0.2) + a'4 (0.2) → a'3 (0.4)
– a1 (0.2) + a'3 (0.4) → a''3 (0.6)
– a''3 (0.6) + a2 (0.4) → a'''3 (1.0)

Reading the branch labels from the root gives the codewords:

LETTER   PROBABILITY   CODEWORD
a1       0.2           01
a2       0.4           1
a3       0.2           000
a4       0.1           0010
a5       0.1           0011
Data Compression
Huffman Coding:
The codes of the letters are given below; now we find the
average length.

LETTER   PROBABILITY   CODEWORD
a1       0.2           01
a2       0.4           1
a3       0.2           000
a4       0.1           0010
a5       0.1           0011

Average length = 0.2·2 + 0.4·1 + 0.2·3 + 0.1·4 + 0.1·4
               = 2.2 bits
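A short sketch of the construction in Python (the slides give no code, so names and structure here are illustrative): build the code with a min-heap of probability groups, prepending a bit to every symbol in the two least probable groups at each merge, then compute the average length.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code from {symbol: probability}.
    Ties are broken arbitrarily, so the exact codewords may
    differ from the worked example; the average length is
    the same for any optimal code."""
    tiebreak = count()
    heap = [(p, next(tiebreak), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    code = {s: "" for s in probs}
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)   # least probable group
        p2, _, g2 = heapq.heappop(heap)   # second least probable
        for s in g1:
            code[s] = "0" + code[s]
        for s in g2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, next(tiebreak), g1 + g2))
    return code

probs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(f"average length = {avg:.1f} bits")   # 2.2 bits
```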
Data Compression
Huffman Coding: (Minimum Variance Huffman
Coding)
By performing the sorting procedure in a slightly
different manner, we could have found a different
Huffman code: in the first re-sort, we place the combined
symbol a'4 as high as possible in the list.
Data Compression
Huffman Coding: (Minimum Variance Huffman
Coding)
Example, step 1: Arrange the letters in descending order
of probability.

BEFORE SORTING         AFTER SORTING
LETTER   PROB.         LETTER   PROB.
a1       0.2           a2       0.4
a2       0.4           a1       0.2
a3       0.2           a3       0.2
a4       0.1           a4       0.1
a5       0.1           a5       0.1
Data Compression
Huffman Coding:
Example, step 2: Merge the two least probable entries as
before, but after each merge place the combined symbol as
high as possible in the sorted list:
– a4 (0.1) + a5 (0.1) → a'4 (0.2), placed above a1 and a3
– a1 (0.2) + a3 (0.2) → a'1 (0.4), placed at the top
– a2 (0.4) + a'4 (0.2) → a'2 (0.6)
– a'2 (0.6) + a'1 (0.4) → a''2 (1.0)

LETTER   PROBABILITY   CODEWORD
a1       0.2           10
a2       0.4           00
a3       0.2           11
a4       0.1           010
a5       0.1           011
Data Compression
Minimum Variance Huffman Coding:
The codes of the letters are given below; now we find the
average length.

LETTER   PROBABILITY   CODEWORD
a1       0.2           10
a2       0.4           00
a3       0.2           11
a4       0.1           010
a5       0.1           011

Average length = 0.2·2 + 0.4·2 + 0.2·2 + 0.1·3 + 0.1·3
               = 2.2 bits
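The two codes have the same average length but different length variances, which is why the second is called minimum variance. A small check (Python, illustrative; the lengths are read off the two codeword tables above):

```python
probs = [0.2, 0.4, 0.2, 0.1, 0.1]
lengths_first = [2, 1, 3, 4, 4]    # codewords 01, 1, 000, 0010, 0011
lengths_minvar = [2, 2, 2, 3, 3]   # codewords 10, 00, 11, 010, 011

def stats(lengths):
    avg = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - avg) ** 2 for p, l in zip(probs, lengths))
    return avg, var

print(stats(lengths_first))    # (2.2, 1.36)
print(stats(lengths_minvar))   # (2.2, 0.16)
```

Both codes average 2.2 bits, but the minimum variance code keeps the instantaneous rate much closer to the average, which matters when the channel rate is fixed.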
Data Compression
Adaptive Huffman Coding:
• Huffman coding requires knowledge of the probabilities of the source
sequence.
• If this knowledge is not available, Huffman coding becomes a two-pass
procedure:
– the statistics are collected in the first pass,
– the source is encoded in the second pass.
• To convert this into a one-pass procedure, Faller and Gallager
independently developed adaptive algorithms that construct the
Huffman code based on the statistics of the symbols already
encountered.
• These were later improved by Knuth and Vitter.
Data Compression
Adaptive Huffman Coding:
• Adaptive Huffman coding: statistics are gathered and updated dynamically as the
data stream arrives.
• Initial_code assigns symbols some initially agreed-upon codes, without any
prior knowledge of the frequency counts.
• Update_tree constructs an adaptive Huffman tree.
• It basically does two things:
• increments the frequency counts for the symbols (including any new ones);
• updates the configuration of the tree.
• The encoder and decoder must use exactly the same initial_code and update_tree
routines.
• Adaptive Huffman coding has three procedures:
• Encoding procedure
• Updating procedure
• Decoding procedure
Data Compression
Adaptive Huffman Tree Updating :
– Nodes are numbered in order from left to right, bottom to
top. The numbers in parentheses indicate the counts.

– The tree must always maintain its sibling property, i.e., all
nodes (internal and leaf) are arranged in order of
increasing counts.

– If the sibling property is about to be violated, a swap
procedure is invoked to update the tree by rearranging
the nodes.

– When a swap is necessary, the farthest node with count N
is swapped with the node whose count has just been
increased to N + 1.
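The sibling property amounts to a simple check: listing all nodes in order of their node numbers must give non-decreasing weights. A Python sketch (illustrative, not part of the slides; the node lists below are read off the "aardv" trees in the example that follows):

```python
def satisfies_sibling_property(nodes):
    """nodes: list of (node_number, weight) pairs for every
    node in the tree. The property holds iff the weights are
    non-decreasing when nodes are ordered by number."""
    weights = [w for _, w in sorted(nodes)]
    return all(a <= b for a, b in zip(weights, weights[1:]))

# Tree for "aardv" before swapping: node 48 has weight 1 but the
# lower-numbered node 47 has weight 2, so the property fails.
before = [(43, 0), (44, 1), (45, 1), (46, 1), (47, 2),
          (48, 1), (49, 3), (50, 2), (51, 5)]
# The same tree after the swaps at levels 1 and 2.
after = [(43, 0), (44, 1), (45, 1), (46, 1), (47, 1),
         (48, 2), (49, 2), (50, 3), (51, 5)]
print(satisfies_sibling_property(before))   # False
print(satisfies_sibling_property(after))    # True
```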
Data Compression
Adaptive Huffman Tree Updating :
It works with two parameters:
– the weight of each node
– the number of nodes
Weight of each node:
External (leaf) node: usually drawn as a square. The weight of an
external node is the number of times its symbol has been
encountered.
Internal node: usually drawn as a circle. The weight of an
internal node is the sum of the weights of its children.
Data Compression
Adaptive Huffman Tree Updating (cont..) :

The Number of Nodes:

• The total number of nodes in the tree is 2m − 1, where
m is the alphabet size (m = 26 for the English alphabet).
So the number of nodes is 2·26 − 1 = 52 − 1 = 51.

• Initially the tree consists of a single root node, the NYT
(Not Yet Transmitted) node, of weight 0.
Data Compression
Ex: Update the tree for the message “aardvark” by adaptive Huffman coding.
Sol:
The tree starts as the single NYT node of weight 0: [ NYT 51(0) ].
1. First symbol “a” encountered: the NYT node splits.
   Tree for “a”: 51(1) → [ NYT 49(0) | a 50(1) ]
2. Next symbol “a” again: increment the count of “a”.
   Tree for “aa”: 51(2) → [ NYT 49(0) | a 50(2) ]

(Notation: node#(weight); the left entry of [ · | · ] is the
0-branch and the right entry is the 1-branch.)
Data Compression
Ex (cont.): “aardvark”
3. Next symbol “r” encountered.
   Tree for “aar”: 51(3) → [ 49(1): [ NYT 47(0) | r 48(1) ] | a 50(2) ]
4. Next symbol “d” encountered.
   Tree for “aard”: 51(4) → [ 49(2): [ 47(1): [ NYT 45(0) | d 46(1) ] | r 48(1) ] | a 50(2) ]
Data Compression
Ex (cont.): “aardvark”
5. Next symbol “v” encountered.
   Tree for “aardv”: 51(5) → [ 49(3): [ 47(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] | r 48(1) ] | a 50(2) ]
At level 1 the weight of the left child (3) is greater than that of
the right child (2), and likewise at level 2, so by the tree
(sibling) property we need to swap.
Data Compression
5 (cont.). After swapping at level 1 and level 2, the tree for
“aardv” is:
51(5) → [ a 49(2) | 50(3): [ r 47(1) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
6. Next symbol “a” encountered.
   Tree for “aardva”: 51(6) → [ a 49(3) | 50(3): [ r 47(1) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
7. Next symbol “r” encountered.
   Tree for “aardvar”: 51(7) → [ a 49(3) | 50(4): [ r 47(2) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
8. Next symbol “k” encountered: the NYT node splits.
   Tree for “aardvark”: 51(8) → [ a 49(3) | 50(5): [ r 47(2) | 48(3): [ 45(2): [ 43(1): [ NYT 41(0) | k 42(1) ] | v 44(1) ] | d 46(1) ] ] ]
At level 3 the weight of the left child (2) is greater than that of
the right child (1), so by the tree property we need to swap.
Data Compression
9. After swapping at level 3, the tree for “aardvark” is:
51(8) → [ a 49(3) | 50(5): [ r 47(2) | 48(3): [ d 46(1) | 45(2): [ 43(1): [ NYT 41(0) | k 42(1) ] | v 44(1) ] ] ] ]
Data Compression
Adaptive Huffman Tree Encoding:
Flow chart of the encoding process (figure not reproduced).
Data Compression
Adaptive Huffman Tree Encoding:
Rules:
• If a symbol is encountered for the first time, send the NYT code
followed by the symbol's fixed code.
• If the symbol has been encountered before, its code is the path
from the root node to the corresponding node.

How to find the NYT code:
Traverse the tree from the root node to the NYT node.
How to find the fixed code:
We use an alphabet consisting of 26 letters. To obtain the
prearranged code, we find e and r such that
2^e + r = 26, where 0 ≤ r < 2^e.
It is easy to see that e = 4 and r = 10 satisfy this
requirement.
Data Compression
Adaptive Huffman Tree Encoding:
Rules:
Case 1: if the symbol's index k satisfies 1 ≤ k ≤ 2r, then
a_k is encoded as the (e+1)-bit binary representation
of k − 1. (E.g., k = 1 for letter “a”, k = 2 for “b”,
etc.)
Case 2: if k > 2r, then a_k is encoded as the e-bit binary
representation of k − r − 1.
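These two cases are easy to mechanize. A small Python sketch (illustrative; `fixed_code` is not a name from the slides) with e = 4 and r = 10 for the 26-letter alphabet:

```python
E, R = 4, 10   # 2**E + R == 26

def fixed_code(letter):
    """Fixed (prearranged) code of a lowercase letter."""
    k = ord(letter) - ord("a") + 1        # k = 1 for 'a', 2 for 'b', ...
    if k <= 2 * R:                        # Case 1: (e+1)-bit code of k-1
        return format(k - 1, f"0{E + 1}b")
    return format(k - R - 1, f"0{E}b")    # Case 2: e-bit code of k-r-1

for ch in "ardvk":
    print(ch, fixed_code(ch))
# a 00000, r 10001, d 00011, v 1011, k 01010 -- as in the example below.
```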
Data Compression
Ex: Encode the message “aardvark” by adaptive Huffman coding.
Sol:
The tree starts as the single NYT node of weight 0: [ NYT 51(0) ].
First symbol “a”: k = 1, with e = 4 and r = 10.
Since k ≤ 2r, Case 1 applies: encode k − 1 = 1 − 1 = 0 in
e + 1 = 5 bits, giving the fixed code 00000.
The code for a first occurrence is NYT code + fixed code. Initially
the NYT code is empty, so the code for “a” is 00000.
Updated tree: 51(1) → [ NYT 49(0) | a 50(1) ]
Data Compression
Ex (cont.): The next symbol is “a”, which has been encountered
before, so we send only the path code from the root to the node
for “a”. The code for “a” is 1.
Updated tree: 51(2) → [ NYT 49(0) | a 50(2) ]
Data Compression
Ex (cont.): The next symbol is “r”, encountered for the first time:
k = 18, e = 4, r = 10. Since k ≤ 2r, Case 1 applies:
encode k − 1 = 17 in 5 bits, giving the fixed code 10001.
The NYT code (root to NYT node) is 0, so the code for “r” is
0 + 10001 = 010001.
Updated tree: 51(3) → [ 49(1): [ NYT 47(0) | r 48(1) ] | a 50(2) ]
Data Compression
Ex (cont.): The next symbol is “d”, encountered for the first time:
k = 4, so Case 1 applies: encode k − 1 = 3 in 5 bits, giving 00011.
The NYT code is 00, so the code for “d” is 00 + 00011 = 0000011.
Updated tree: 51(4) → [ 49(2): [ 47(1): [ NYT 45(0) | d 46(1) ] | r 48(1) ] | a 50(2) ]
Data Compression
Ex (cont.): The next symbol is “v”, encountered for the first time:
k = 22 > 2r, so Case 2 applies: encode k − r − 1 = 22 − 10 − 1 = 11
in e = 4 bits, giving 1011.
The NYT code is 000, so the code for “v” is 000 + 1011 = 0001011.
Updated tree: 51(5) → [ 49(3): [ 47(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] | r 48(1) ] | a 50(2) ]
Data Compression
Ex (cont.): At levels 1 and 2 the weight of the left child is now
greater than that of the right child, so we swap at both levels.
Updated tree: 51(5) → [ a 49(2) | 50(3): [ r 47(1) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
Ex (cont.): The next symbol is “a”, encountered before, so we send
the path code from the root to “a”. The code for “a” is 0.
Updated tree: 51(6) → [ a 49(3) | 50(3): [ r 47(1) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
Ex (cont.): The next symbol is “r”, encountered before, so we send
the path code from the root to “r”. The code for “r” is 10.
Updated tree: 51(7) → [ a 49(3) | 50(4): [ r 47(2) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
Ex (cont.): The next symbol is “k”, encountered for the first time:
k = 11, so Case 1 applies: encode k − 1 = 10 in 5 bits, giving
01010.
The NYT code is 1100, so the code for “k” is 1100 + 01010 =
110001010.
Updated tree: 51(8) → [ a 49(3) | 50(5): [ r 47(2) | 48(3): [ 45(2): [ 43(1): [ NYT 41(0) | k 42(1) ] | v 44(1) ] | d 46(1) ] ] ]
At level 3 the weight of the left child (2) exceeds that of the
right child (1), so a swap is needed.
Data Compression
After swapping at level 3:
51(8) → [ a 49(3) | 50(5): [ r 47(2) | 48(3): [ d 46(1) | 45(2): [ 43(1): [ NYT 41(0) | k 42(1) ] | v 44(1) ] ] ] ]
Data Compression
After encoding, the codeword for each symbol is:

a   00000
a   1
r   010001
d   0000011
v   0001011
a   0
r   10
k   110001010

The coded string is: 00000101000100000110001011010110001010
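As a quick check, joining the per-symbol codes reproduces the coded string (Python, illustrative):

```python
codes = ["00000", "1", "010001", "0000011",
         "0001011", "0", "10", "110001010"]
stream = "".join(codes)
assert stream == "00000101000100000110001011010110001010"
print(len(stream), "bits")   # 38 bits
```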


Data Compression
Adaptive Huffman Tree Decoding: Flow chart of the decoding process (figure not reproduced).
Data Compression
Adaptive Huffman Tree Decoding:
Rules:
• Read the binary string, traversing the tree from the root.
• If a symbol leaf is reached, decode that symbol.
• If the NYT leaf is reached, read the next e bits as a number p:
– if p < r, read one more bit to form the (e+1)-bit value p';
the decoded symbol is at position p' + 1 in the alphabet;
– if p ≥ r, the decoded symbol is at position p + r + 1.
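A Python sketch of the NYT branch of these rules (illustrative names; it mirrors `fixed_code` above and returns the decoded letter plus the new bit position):

```python
E, R = 4, 10

def decode_fixed(bits, pos):
    """Decode a fixed code starting at bits[pos], reached via NYT."""
    p = int(bits[pos:pos + E], 2)              # first e bits
    if p < R:                                  # read one more bit
        k = int(bits[pos:pos + E + 1], 2) + 1  # (e+1)-bit value, plus 1
        pos += E + 1
    else:
        k = p + R + 1                          # e bits suffice
        pos += E
    return chr(ord("a") + k - 1), pos

print(decode_fixed("00000", 0)[0])   # a
print(decode_fixed("10001", 0)[0])   # r
print(decode_fixed("1011", 0)[0])    # v
```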
Data Compression
Ex: Decode the message “00000101000100000110001011010110001010”
by adaptive Huffman coding.
Sol:
At the receiver the tree starts as the single NYT node, so the first
bits form a fixed code. Read e = 4 bits: 0000 = 0, which is less
than r = 10, so read one more bit: 00000 = 0. The symbol is at
position 0 + 1 = 1 of the alphabet, which is “a”.
Updated tree: 51(1) → [ NYT 49(0) | a 50(1) ]
Data Compression
Ex (cont.): The next bit in the string is 1. Traversing from the
root via branch 1 reaches the leaf for “a”, so the next symbol is
“a”.
Updated tree: 51(2) → [ NYT 49(0) | a 50(2) ]
Data Compression
Ex (cont.): The next bit is 0; traversing from the root via branch
0 reaches the NYT node. Read e = 4 bits: 1000 = 8 < 10, so read one
more bit: 10001 = 17. The symbol is at position 17 + 1 = 18, which
is “r”.
Updated tree: 51(3) → [ 49(1): [ NYT 47(0) | r 48(1) ] | a 50(2) ]
Data Compression
Ex (cont.): The next two bits 00 lead from the root to the NYT
node. Read e = 4 bits: 0001 = 1 < 10, so read one more bit:
00011 = 3. The symbol is at position 3 + 1 = 4, which is “d”.
Updated tree: 51(4) → [ 49(2): [ 47(1): [ NYT 45(0) | d 46(1) ] | r 48(1) ] | a 50(2) ]
Data Compression
Ex (cont.): The next three bits 000 lead from the root to the NYT
node. Read e = 4 bits: 1011 = 11 ≥ 10, so the symbol is at position
11 + 10 + 1 = 22, which is “v”.
Updated tree (after the swaps at levels 1 and 2):
51(5) → [ a 49(2) | 50(3): [ r 47(1) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
Ex (cont.): The next bit is 0; traversing from the root via branch
0 reaches the leaf for “a”, so the next symbol is “a”.
Updated tree: 51(6) → [ a 49(3) | 50(3): [ r 47(1) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
Ex (cont.): The next bits 10 lead from the root (branch 1, then
branch 0) to the leaf for “r”, so the next symbol is “r”.
Updated tree: 51(7) → [ a 49(3) | 50(4): [ r 47(2) | 48(2): [ 45(1): [ NYT 43(0) | v 44(1) ] | d 46(1) ] ] ]
Data Compression
Ex (cont.): The next bits 1100 lead from the root to the NYT node.
Read e = 4 bits: 0101 = 5 < 10, so read one more bit: 01010 = 10.
The symbol is at position 10 + 1 = 11, which is “k”.
The decoded message is “aardvark”. The updated tree (after the
level-3 swap) is shown on the next slide.
Data Compression
Final tree after the level-3 swap:
51(8) → [ a 49(3) | 50(5): [ r 47(2) | 48(3): [ d 46(1) | 45(2): [ 43(1): [ NYT 41(0) | k 42(1) ] | v 44(1) ] ] ] ]
Data Compression
Golomb Codes:
• The Golomb code is described in a succinct paper by
Solomon W. Golomb (1960).
• Golomb codes belong to a family of codes designed to
encode integers under the assumption that the larger
an integer, the lower its probability of occurrence.
• The simplest code for this situation is the unary
code.
• The unary code for a positive integer n is simply n
1s followed by a 0.
• Thus, the code for 4 is 11110, and the code for 7 is
11111110.
Data Compression
Golomb Codes:
• The coding scheme works in two parts:
– a unary part (for the quotient q)
– a binary remainder part (for r)
• The Golomb code is actually a family of codes
parameterized by an integer m > 0.
• In the Golomb code with parameter m, we represent an
integer n ≥ 0 using two numbers q and r, where
– q = ⌊n/m⌋ and r = n − qm.
• We send q in unary and r in a truncated binary code;
the codeword is the concatenation of the two.
Data Compression
Golomb Codes:
Different code of r
• ⌊ log2m ⌋ bit representation of r for first 2 ⌈ log2m ⌉ -m
values
• ⌈ log2m ⌉ bit representation of r+ 2 ⌈ log2m ⌉ -m for rest of values.
Data Compression
Golomb Codes:
Eg: Design a Golomb code for m = 5 and n = 0, 1, 2, ..., 15.

Sol: Always first find these values:
⌊log2 m⌋ = ⌊log2 5⌋ = 2
⌈log2 m⌉ = ⌈log2 5⌉ = 3

Next:
– ⌊log2 m⌋-bit representation of r for the first 2^⌈log2 m⌉ − m values:
2-bit representation of r for the first 2^3 − 5 = 8 − 5 = 3 values,
i.e., r = 0, 1, 2 are represented in 2 bits.
– ⌈log2 m⌉-bit representation of r + 2^⌈log2 m⌉ − m for the rest:
3-bit representation of r + 2^3 − 5 = r + 3 for r = 3, 4.
Data Compression
n m q=⌊n/m⌋ r=n−qm Bits for q Bits for r Codeword (q then r)
0 5 0 0 0 00 000
1 5 0 1 0 01 001
2 5 0 2 0 10 010
3 5 0 3 0 110 0110
4 5 0 4 0 111 0111
5 5 1 0 10 00 1000
6 5 1 1 10 01 1001
7 5 1 2 10 10 1010
8 5 1 3 10 110 10110
9 5 1 4 10 111 10111
10 5 2 0 110 00 11000
11 5 2 1 110 01 11001
12 5 2 2 110 10 11010
13 5 2 3 110 110 110110
14 5 2 4 110 111 110111
15 5 3 0 1110 00 111000
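A Golomb encoder in Python reproducing this table (a sketch under the rules above; `golomb` is an illustrative name):

```python
from math import ceil, log2

def golomb(n, m):
    """Unary code of q = n // m followed by the truncated
    binary code of r = n - q*m."""
    q, r = divmod(n, m)
    unary = "1" * q + "0"
    c = ceil(log2(m))
    if r < 2 ** c - m:                   # first 2^c - m remainders
        tail = format(r, f"0{c - 1}b")   # floor(log2 m) bits
    else:
        tail = format(r + 2 ** c - m, f"0{c}b")
    return unary + tail

for n in range(16):
    print(n, golomb(n, 5))
print(golomb(24, 5))   # 11110111, as in Problem 2 at the end
```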
Data Compression
Rice Codes:
• The Rice code was originally developed by Robert F. Rice (he called it the
Rice machine) and later extended by Pen-Shu Yeh and Warner Miller.
• The Rice code can be viewed as an adaptive Golomb code.
• In the Rice code, a sequence of nonnegative integers (which might have
been obtained from the preprocessing of other data) is divided into
blocks of J integers apiece.
• Each block is then coded using one of several options, most of which are
a form of Golomb code.
• Each block is encoded with each of these options, and the option
resulting in the fewest coded bits is selected.
• The particular option used is indicated by an identifier attached to the
code for each block.
• The easiest way to understand the Rice code is to examine one of its
implementations.
• We will study the implementation of the Rice code in the
recommendation for lossless compression from the Consultative
Committee for Space Data Systems (CCSDS).
Data Compression
Rice Codes:

• (CCSDS) Recommendation for Lossless Compression.
• The algorithm consists of a preprocessor (the modeling step)
and a binary coder (the coding step).
• The preprocessor removes correlation from the input and
generates a sequence of nonnegative integers.
• This sequence has the property that smaller values are more
probable than larger values.
• The binary coder generates a bit stream to represent the
integer sequence.
Data Compression
Rice Codes:
1. Steps for the preprocessor:
• Given a sequence {yi}, for each yi we generate a prediction ỹi.
• A simple way to generate a prediction is to take the previous value of
the sequence as the prediction of the current value:
ỹi = yi−1
• We then generate a sequence whose elements are the differences between
yi and its predicted value ỹi:
di = yi − ỹi
• The value di will have a small magnitude when our prediction is good and
a large one when it is not.
• Let ymax and ymin be the largest and smallest values that the sequence
{yi} takes on, and let
Ti = min { ymax − ỹi , ỹi − ymin }.
The sequence {di} can be converted into a sequence of nonnegative
integers {xi} using the following mapping:
xi = 2di          if 0 ≤ di ≤ Ti
xi = 2|di| − 1    if −Ti ≤ di < 0
xi = Ti + |di|    otherwise
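A Python sketch of the preprocessor (the zero initial prediction matches the worked example below; function and variable names are illustrative):

```python
def preprocess(y, y_min, y_max):
    """Predict each value by the previous one (first prediction
    assumed to be 0), then map differences d to nonnegative x."""
    xs, prev = [], 0
    for yi in y:
        d = yi - prev
        T = min(y_max - prev, prev - y_min)
        if 0 <= d <= T:
            x = 2 * d                 # first case
        elif -T <= d < 0:
            x = 2 * abs(d) - 1        # second case
        else:
            x = T + abs(d)            # third case
        xs.append(x)
        prev = yi
    return xs

y = [32, 33, 35, 39, 37, 38, 39, 40,
     40, 40, 40, 39, 40, 40, 41, 40]
print(preprocess(y, 0, 41))
# [32, 2, 4, 8, 3, 2, 2, 2, 0, 0, 0, 1, 2, 0, 2, 1]
```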
Data Compression
Rice Codes:
2 Steps for Coding (Binary Coder):
• The sequence {xi} is divided into segments with each
segment being further divided into blocks of size J.
• It is recommended by CCSDS that J have a value of 16.
• Each block is then coded using one of the following options.
• The coded block is transmitted along with an identifier that
indicates which particular option was used.
– Fundamental Sequences
• Unary code of n: n times 1 followed by 0
Data Compression
Rice Codes:
2 Steps for Coding (Binary Coder): cont..
– Split Sample Sequences
• The code for a k-bit number n using the m-th split-sample
option consists of the m least significant bits of n followed
by a unary code representing the k − m most significant
bits.
• Ex: Suppose n = 23, k = 8 bits, m = 3 (split sample):
The 8-bit representation of 23 is 00010111.
The 3 least significant bits of 00010111 are 111.
The remaining k − m = 8 − 3 = 5 most significant bits are
00010 = (2)10, and the unary code of 2 is 110,
so the codeword is 111110.
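In Python (a sketch of the option as described above; names are illustrative):

```python
def split_sample(n, k, m):
    """m-th split-sample code of a k-bit number n: the m least
    significant bits of n followed by the unary code of the
    k - m most significant bits."""
    lsb = format(n, f"0{k}b")[-m:] if m else ""
    msb_value = n >> m                 # the k - m most significant bits
    return lsb + "1" * msb_value + "0"

print(split_sample(23, 8, 3))   # 111 + 110 = 111110
```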
Data Compression
Rice Codes:
2 Steps for Coding (Binary Coder): cont..
– Second Extension Option
• This option is useful for sequences with low entropy, where
many values of {xi} are zero.
• Consecutive values are paired, each pair (xi, xi+1) is
transformed into the single value
γ = (xi + xi+1)(xi + xi+1 + 1)/2 + xi+1,
and γ is encoded using a unary code.
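A sketch of the pairing in Python (assuming the transformation quoted above; the function name is illustrative):

```python
def second_extension(x):
    """Map each pair (a, b) to gamma and send gamma in unary."""
    out = []
    for a, b in zip(x[0::2], x[1::2]):
        gamma = (a + b) * (a + b + 1) // 2 + b
        out.append("1" * gamma + "0")
    return "".join(out)

# A low-entropy block: pairs (0,0), (0,1), (0,0), (1,0)
# give gammas 0, 2, 0, 1.
print(second_extension([0, 0, 0, 1, 0, 0, 1, 0]))   # 0110010
```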


Data Compression
Rice Codes:
2 Steps for Coding (Binary Coder): cont..
– Zero Block Option
• It is used when one or more blocks of {xi} are zero, generally
when we have long sequences of {yi} that have the same value.
• The ROS (Remainder Of Segment) code is used when the last five or more
blocks in a segment are all zero.
Data Compression
Rice Codes:
Ex: Encode the following sequence of 16 values by the Rice code with J = 8,
using the one-split-sample option:
32, 33, 35, 39, 37, 38, 39, 40, 40, 40, 40, 39, 40, 40, 41, 40
Sol.
Assume the prediction for the first element of the sequence is zero (0),
with ymax = 41 and ymin = 0.

yi            32 33 35 39 37 38 39 40 40 40 40 39 40 40 41 40
ỹi = yi−1      0 32 33 35 39 37 38 39 40 40 40 40 39 40 40 41
di = yi − ỹi  32  1  2  4 -2  1  1  1  0  0  0 -1  1  0  1 -1
ymax − ỹi     41  9  8  6  2  4  3  2  1  1  1  1  2  1  1  0
ỹi − ymin      0 32 33 35 39 37 38 39 40 40 40 40 39 40 40 41
Ti             0  9  8  6  2  4  3  2  1  1  1  1  2  1  1  0

where Ti = min { ymax − ỹi , ỹi − ymin }.
Data Compression
Rice Codes:
di  32 1 2 4 -2 1 1 1 0 0 0 -1 1 0 1 -1
Ti   0 9 8 6  2 4 3 2 1 1 1  1 2 1 1  0
xi  32 2 4 8  3 2 2 2 0 0 0  1 2 0 2  1

Now apply the mapping to obtain each xi:
For x1:  d = 32, T = 0: 3rd case: Ti + |di| = 0 + 32 = 32
For x2:  d = 1, T = 9: 1st case: 2di = 2·1 = 2
For x3:  d = 2, T = 8: 1st case: 2di = 2·2 = 4
For x4:  d = 4, T = 6: 1st case: 2di = 2·4 = 8
For x5:  d = -2, T = 2: 2nd case: 2|di| − 1 = 2·2 − 1 = 3
For x6:  d = 1, T = 4: 1st case: 2di = 2
For x7:  d = 1, T = 3: 1st case: 2di = 2
For x8:  d = 1, T = 2: 1st case: 2di = 2
For x9, x10, x11, x14: d = 0: 1st case: 2di = 0
For x12: d = -1, T = 1: 2nd case: 2|di| − 1 = 2·1 − 1 = 1
For x13: d = 1, T = 2: 1st case: 2di = 2
For x15: d = 1, T = 1: 1st case: 2di = 2
For x16: d = -1, T = 0: 3rd case: Ti + |di| = 0 + 1 = 1
Data Compression
Rice Codes:
di  32 1 2 4 -2 1 1 1 0 0 0 -1 1 0 1 -1
Ti   0 9 8 6  2 4 3 2 1 1 1  1 2 1 1  0
xi  32 2 4 8  3 2 2 2 0 0 0  1 2 0 2  1

Since J = 8, we divide the sequence into two blocks:
the first block contains the first eight values: 32 2 4 8 3 2 2 2;
the second block contains the last eight values: 0 0 0 1 2 0 2 1.

Coding with the fundamental sequence (unary code of n: n 1s followed by a 0):

32 → thirty-two 1s followed by a 0 (33 bits)
2 → 110   4 → 11110   8 → 111111110   3 → 1110   2 → 110   2 → 110   2 → 110

0 → 0   0 → 0   0 → 0   1 → 10   2 → 110   0 → 0   2 → 110   1 → 10
Data Compression
Rice Codes:
For coding with the one-split-sample option (m = 1), take the first
block: 32 2 4 8 3 2 2 2. The codes are shown in the table that follows.


Data Compression
Rice Codes:
One-split-sample option (m = 1), with k = 6-bit numbers:

Number  Binary   LSB (m=1)  MSBs (k−m=5)  Decimal  Unary of MSBs       Codeword
32      100000   0          10000         16       16 1s followed by 0  011111111111111110
2       000010   0          00001         1        10                   010
4       000100   0          00010         2        110                  0110
8       001000   0          00100         4        11110                011110
3       000011   1          00001         1        10                   110
2       000010   0          00001         1        10                   010
2       000010   0          00001         1        10                   010
2       000010   0          00001         1        10                   010
Data Compression
Tunstall Codes:
• In the Tunstall code, all codewords are of
equal length.
• Each codeword can represent a different
number of source letters.
• The main advantage of this code is that errors
in codewords do not propagate, unlike
variable-length codes.
Data Compression
Tunstall Codes:
• To design an n-bit Tunstall code (2^n codewords)
for a source with alphabet size N:
1. Start with the N letters of the source alphabet.
2. Remove the most probable entry from the codebook and
add N new entries formed by concatenating each source
letter to it.
3. Repeat step 2 K times, where N + K(N − 1) ≤ 2^n.
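A Python sketch of this procedure (illustrative; it relies on dict insertion order for the codeword numbering, which happens to reproduce the table in the example below):

```python
def tunstall(probs, n):
    """Build an n-bit Tunstall codebook from {letter: probability}."""
    N = len(probs)
    entries = dict(probs)                      # start with the N letters
    K = (2 ** n - N) // (N - 1)                # number of expansions
    for _ in range(K):
        best = max(entries, key=entries.get)   # most probable entry
        p = entries.pop(best)
        for letter, q in probs.items():        # expand it with each letter
            entries[best + letter] = p * q
    return {s: format(i, f"0{n}b") for i, s in enumerate(entries)}

book = tunstall({"A": 0.6, "B": 0.3, "C": 0.1}, 3)
for seq, cw in book.items():
    print(seq, cw)
# B 000, C 001, AB 010, AC 011, AAA 100, AAB 101, AAC 110
# (one 3-bit codeword remains unused)
```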
Data Compression
• Ex: Design a 3-bit Tunstall code for
A = {A, B, C}, P(A) = 0.6, P(B) = 0.3, P(C) = 0.1.
Sol: Since the codewords have 3 bits, the total number of
codewords is 2^n = 2^3 = 8.
Arrange the letters in descending order of probability:
Letters Probability
A 0.6
B 0.3
C 0.1
Remove the most probable entry (A) and concatenate it with every
letter to make new entries; after the first iteration:

Letters Probability
B 0.3
C 0.1
AA 0.36
AB 0.18
AC 0.06
Data Compression
Sol:
Remove the most probable entry (AA, 0.36) and concatenate it with
every letter. The codebook now has the required number of entries,
so this second iteration is the last, giving the final codebook:
Letters Probability Codewords
B 0.3 000
C 0.1 001
AB 0.18 010
AC 0.06 011
AAA 0.216 100
AAB 0.108 101
AAC 0.036 110
Data Compression
Application of Huffman Coding:
• Lossless Image Compression
• Text Compression
• Audio Compression
Data Compression
PROBLEMS:
1. Alphabet A={a,b,c,d,e} with
P(a)=0.15, P(b)=0.04, P(c)=0.26,
P(d)=0.05 and P(e)=0.50.
Find:
– Entropy
– Huffman code for the source
– Average length of this code
Data Compression
Ans.
Entropy=1.82 bits/symbol
LETTER CODE
a 110
b 1111
c 10
d 1110
e 0

Average Length=1.83 bits
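These answers can be verified directly (Python, illustrative):

```python
from math import log2

p = {"a": 0.15, "b": 0.04, "c": 0.26, "d": 0.05, "e": 0.50}
code = {"a": "110", "b": "1111", "c": "10", "d": "1110", "e": "0"}

entropy = -sum(pi * log2(pi) for pi in p.values())
avg_len = sum(p[s] * len(cw) for s, cw in code.items())
print(f"entropy = {entropy:.2f} bits/symbol")   # 1.82
print(f"average length = {avg_len:.2f} bits")   # 1.83
```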


Data Compression
PROBLEMS:
2. Find the Golomb code for n = 24 with m = 5.
Ans. Golomb code of 24: 11110111
