Chapter Four
Lossless Compression Algorithms
Types of Compression
[Figure: two compression pipelines side by side.
Lossless compression: multimedia data M is compressed without loss to m, then uncompressed back to M, so the original data is recovered exactly.
Lossy compression: M is compressed with loss to m, then uncompressed to M', an approximation of the original.
M = multimedia data transmitted.]
Introduction
Compression: Encoding data with fewer bits
than the original representation of the
information.
It is a state of being made smaller or more
pressed together.
Why data compression?
Compressed data takes less storage space.
Once large multimedia data has been compressed and transferred somewhere, it can be decompressed and reconstructed to its original size.
Compressed data can be transferred quickly.
Because compressed data is smaller, it is transferred more easily and requires less communication bandwidth.
Information Theory
Lossless compression: accuracy matters when we want to decompress the data or information back to its original form.
No information or quality is lost in lossless compression.
The original data can be recreated from the compressed data.
Run Length Encoding (RLE)
RLE is a lossless data compression technique in which runs of data (sequences where the same data value occurs in many consecutive elements) are stored as a single data value and a count.
Example
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWW
WWWWWWWWWBWWWWWWWWWWWWWW
Encoded and compressed as: 12W1B12W3B24W1B14W
Cont.…
Interpreted as: a sequence of twelve Ws, one B, twelve Ws, three Bs, etc.
The original string was 67 characters; after compression it is represented by only 18 characters.
The encoding can also be represented by splitting the numbers and characters in different ways.
Exercise: given an input string, write a function that returns the run-length-encoded string for that input (a sketch follows below).
For example, if the input string is “wwwwaaadexxxxxx”, the output would be “w4a3d1e1x6”.
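A minimal Python sketch of such a function (the name rle_encode and the character-then-count output format are illustrative assumptions, not from the slides):

def rle_encode(s):
    # Scan the string and count how many times each character repeats consecutively.
    if not s:
        return ""
    out = []
    count = 1
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            count += 1
        else:
            out.append(prev + str(count))
            count = 1
    out.append(s[-1] + str(count))   # flush the final run
    return "".join(out)

print(rle_encode("wwwwaaadexxxxxx"))  # w4a3d1e1x6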
RLE
A A A A A A A A B B B B B B B C C
Run Length Encoding
A 8 B 7 C 2
Variable Length Coding(VLC)
VLC is a code which maps source symbols t0 a variable number
of bits.
can allow sources to be compressed and decompressed
with zero error (lossless data compression) and still be read
back symbol by symbol.
Some examples of well-known variable-length coding strategies
are
Huffman coding,
arithmetic coding,
Huffman Coding
The Huffman coding procedure finds the optimum, uniquely decodable,
variable length code associated with a set of events, given their probabilities of
occurrence
Huffman coding creates a binary code tree:
Nodes are connected by branches and end in leaves.
The top node is the root.
Two branches (labeled 0 and 1) leave each node.
[Figure: binary code tree with the root at the top, branches labeled 0 and 1, and leaves A, B, C, D.]
Huffman Coding
Code table (read from the adjacent tree):
A = 0
B = 10
C = 110
D = 111
Given the adjacent Huffman code tree, decode the following sequence: 1101001110
110 10 0 111 0
C   B  A D   A
[Figure: the same Huffman code tree, with the root at the top, branches labeled 0 (left) and 1 (right), and leaves A, B, C, D.]
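A small Python sketch of decoding with this prefix code (the code table is the one shown above; the function name and structure are illustrative):

# Prefix code from the tree: A = 0, B = 10, C = 110, D = 111
code = {"0": "A", "10": "B", "110": "C", "111": "D"}

def huffman_decode(bits, code):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in code:          # a complete codeword has been read
            out.append(code[buf])
            buf = ""
    return "".join(out)

print(huffman_decode("1101001110", code))  # CBADA

Reading bit by bit and matching against the table works here because this is a prefix code: no codeword is the beginning of another codeword.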
Huffman Coding Examples
Let's take a sequence of characters as follows:
Message = BCCABBDDAECCBBAEDDCC
First, to know the length of the message, we count the characters: 20.
How is each character sent? The computer understands alphabetic characters through their ASCII codes.
An ASCII code is 8 bits.
E.g. the ASCII value of A = 65 = 01000001.
Huffman Coding Examples
So, to know the length of all the characters: we have an 8-bit ASCII value for each character and a message length of 20, so 8 × 20 = 160 bits.
But this is not efficient, since we use only 5 letters (A to E); taking 8 bits to represent each character is wasteful.
Huffman Coding Examples
Character   Frequency (count/occurrence)   Code
A           3                              000
B           5                              001
C           6                              010
D           4                              011
E           2                              111
With this fixed 3-bit code, the message length is: length × number of bits = 20 × 3 = 60 bits.
For the code table: the characters take 5 × 8 bits = 40, the codes take 5 × 3 = 15, so 40 + 15 = 55 bits.
Huffman Coding Examples
Character   Count   Code   Size
A           3       001    3 × 3 = 9
B           5       10     5 × 2 = 10
C           6       11     6 × 2 = 12
D           4       01     4 × 2 = 8
E           2       000    2 × 3 = 6
Total = 45 bits, the bit size of the message.
To build the tree, arrange the characters from smallest to largest count: E A D B C (2 3 4 5 6), which sum to 20.
Starting from the root node, assign 0 to the left side and 1 to the right side.
[Figure: the Huffman tree built from the counts 2, 3, 4, 5, 6, with 0/1 labels on the branches.]
If a series of codes is given, you can decode it in the same way, e.g. with B = 10 and C = 11 the start of the message BCCABBD… can be read off the tree.
Huffman Coding Examples
To know the size of the tree/table:
Number of characters × ASCII bits = 5 × 8 = 40 bits
Number of new code bits = 3 + 2 + 2 + 2 + 3 = 12 bits
In total, 52 bits.
The bit size of the message is 45 and the tree/table size is 52, so in total we need 97 bits.
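A minimal Python sketch of building the Huffman code for this example with a priority queue; tie-breaking may give different codewords than the slide, but the code lengths and the 45-bit message size come out the same (the function name and structure are illustrative):

import heapq

def huffman_codes(freqs):
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a symbol or a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # repeatedly merge the two smallest frequencies
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 to the left branch, 1 to the right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

freqs = {"A": 3, "B": 5, "C": 6, "D": 4, "E": 2}
codes = huffman_codes(freqs)
print(codes)
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 45 bits for the message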
For this example (count × code length per character):
a: 45 × 3 = 135
e: 65 × 2 = 130
l: 13 × 4 = 52
n: 45 × 3 = 135
o: 18 × 4 = 72
s: 22 × 3 = 66
t: 53 × 2 = 106
Total bit size of the message = 696
Characters: 7 × 8 = 56 bits
Code-word bits = 21
Total = 696 + 56 + 21 = 773 bits
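The same arithmetic as a short Python check (the counts and code lengths are the ones listed above):

counts  = {"a": 45, "e": 65, "l": 13, "n": 45, "o": 18, "s": 22, "t": 53}
lengths = {"a": 3,  "e": 2,  "l": 4,  "n": 3,  "o": 4,  "s": 3,  "t": 2}

message_bits = sum(counts[c] * lengths[c] for c in counts)   # 696
table_chars  = len(counts) * 8                               # 7 characters * 8 ASCII bits = 56
table_codes  = sum(lengths.values())                         # 21 code-word bits
print(message_bits + table_chars + table_codes)              # 773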
Lempel-Ziv-Welch (LZW)
This algorithm is typically used in GIF and optionally in PDF and
TIFF.
It is lossless, meaning no data is lost when compressing.
The algorithm is simple to implement and has the potential for
very high throughput in hardware implementations.
It is widely used in the Unix file-compression utility compress, and in the GIF image format.
Example
Encode the following data sequence using LZW encoding:
AABABBBABAABABBBABBABB
Divide the data sequence into data segments:
A AB ABB B ABA ABAB BB ABBA BB
Each segment should be the shortest segment not encountered previously.
Example
All segments are referred to as phrases.
Encode phrase by phrase: each phrase is represented by the position of its longest previously seen prefix plus the new symbol; the last phrase, if seen before, is represented by its position alone (see the sketch after the table).
Sequence:                    A   AB   ABB   B   ABA   ABAB   BB   ABBA   BB
Position:                    1   2    3     4   5     6      7    8      9
Numerical representation:    A   1B   2B    B   2A    5B     4B   3A     7
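A small Python sketch of the phrase-based parsing shown above (this is the LZ78-style parse used on the slide; the function name and the (index, symbol) tuple output are illustrative assumptions):

def lz78_encode(data):
    dictionary = {}                       # maps each previously seen phrase to its position (1-based)
    output = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                  # keep extending while the phrase is already known
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                            # the final phrase may already be in the dictionary
        output.append((dictionary[phrase], ""))
    return output

print(lz78_encode("AABABBBABAABABBBABBABB"))
# [(0,'A'), (1,'B'), (2,'B'), (0,'B'), (2,'A'), (5,'B'), (4,'B'), (3,'A'), (7,'')]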
Example 2
Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ algorithm.
The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)
Note: the above is just a representation; the commas and parentheses are not transmitted.
Example: Decompression
Decode (i.e., decompress) the sequence (0, A) (0, B) (2, C) (3, A)
(2, A) (4, A) (6, B)
The decompressed message is:
ABBCBCABABCAABCAAB
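A matching Python sketch of the decoder for this (index, symbol) representation (same assumptions as the encoder sketch above):

def lz78_decode(pairs):
    phrases = [""]                        # index 0 stands for the empty phrase
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch      # previously decoded phrase plus the new symbol
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

pairs = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
print(lz78_decode(pairs))  # ABBCBCABABCAABCAAB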
Summary Questions
What is compression?
What are the different types of compression?
What is lossless compression?
Why do we need compression?
List some lossless compression algorithms.
Encode (compress) the following strings using the Lempel-Ziv
algorithm.
1. aaababbbaaabaaaaaaabaabb
2. ABBCBCABABCAABCAAB
3. SATATASACITASA.