Image Compression
To reduce the volume of data to be transmitted (text, fax, images).
To reduce storage requirements.
IMAGE COMPRESSION
Bytes required by a 3 hr movie? ~$2 \times 10^{12}$
Total size of movie: 2000 GB, so 236 DVDs are required.
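The figures above can be reproduced with a short calculation. This is only a sketch: the slide does not state the frame size, colour depth, frame rate, or DVD capacity, so the values below (1920x1080 pixels, 3 bytes per pixel, 30 frames/s, 8.5 GB per dual-layer DVD) are assumptions chosen because they match the quoted numbers, and `raw_video_bytes` is an illustrative helper name.

```python
import math


def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size in bytes of uncompressed video with the given parameters."""
    return width * height * bytes_per_pixel * fps * seconds


# Assumed parameters (not given on the slide): 1920x1080, 24-bit colour, 30 frames/s.
total_bytes = raw_video_bytes(1920, 1080, 3, 30, 3 * 60 * 60)

# Assumed capacity: 8.5 GB per dual-layer DVD, with 1 GB = 10**9 bytes.
dvds_for_2000_gb = math.ceil(2000e9 / 8.5e9)

print(total_bytes)       # about 2 x 10**12 bytes
print(dvds_for_2000_gb)  # number of DVDs for the rounded 2000 GB figure
```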
APR 14, 11
DATA COMPRESSION
INFORMATION VS DATA
REDUNDANT DATA
INFORMATION
RELATIVE DATA REDUNDANCY
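The formula for relative data redundancy did not survive on this slide. The sketch below assumes the usual textbook definitions: if b and b' are the numbers of bits in two representations of the same information, the compression ratio is C = b / b' and the relative data redundancy is R = 1 - 1/C. The function names are illustrative.

```python
def compression_ratio(bits_original, bits_compressed):
    """C = b / b' (assumed standard definition)."""
    return bits_original / bits_compressed


def relative_redundancy(bits_original, bits_compressed):
    """R = 1 - 1/C (assumed standard definition)."""
    return 1 - 1 / compression_ratio(bits_original, bits_compressed)


# C = 10 means 90% of the original data is redundant.
print(relative_redundancy(10, 1))
# A negative R means the second representation is the larger one.
print(relative_redundancy(1, 2))
```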
Larger representation
I. CODING REDUNDANCY
$p_r(r_k) = \dfrac{n_k}{MN}$
r_k      p_r(r_k)   Code
0        0.4        00000000
128      0.1        10000000
200      0.3        11001000
255      0.2        11111111
others   0.0
What is the average number of bits required to represent each pixel? 8 bits

r_k      p_r(r_k)   Code
0        0.4        10
128      0.1        110
200      0.3        0
255      0.2        111
others   0.0
11
arrives
No codeword should be a prefix of another codeword.
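The average number of bits per pixel and the prefix condition can both be checked mechanically. The sketch below assumes the usual average length, the sum over k of l(r_k) p_r(r_k), and it assumes that the variable-length codewords 10, 110, 0, 111 belong to gray levels 0, 128, 200, 255 in that order; that assignment is a guess, since the slide layout was lost.

```python
def average_length(probabilities, code):
    """Average codeword length: sum of p(symbol) * len(codeword)."""
    return sum(p * len(code[symbol]) for symbol, p in probabilities.items())


def is_prefix_free(code):
    """True if no codeword is a prefix of another codeword.

    After sorting, a codeword that is a prefix of any other codeword is
    also a prefix of its immediate successor, so adjacent pairs suffice.
    """
    words = sorted(code.values())
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


probabilities = {0: 0.4, 128: 0.1, 200: 0.3, 255: 0.2}

# Fixed-length code: the 8-bit binary value of each gray level.
fixed_code = {level: format(level, "08b") for level in probabilities}

# Variable-length code; the symbol-to-codeword assignment is an assumption.
variable_code = {0: "10", 128: "110", 200: "0", 255: "111"}

print(average_length(probabilities, fixed_code))
print(average_length(probabilities, variable_code))
print(is_prefix_free(variable_code))
```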
III. IRRELEVANT INFORMATION
AKA PSYCHOVISUAL REDUNDANCY
Fundamental Premise
ENTROPY
AVERAGE INFORMATION PER SOURCE OUTPUT
The base of the log determines the unit used to measure information.
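The entropy formula itself is missing from the extracted slide. As a sketch, assuming the standard Shannon definition H = -sum of p log p, the function below computes it for the gray-level probabilities used in the coding redundancy example (0.4, 0.1, 0.3, 0.2). The `base` parameter illustrates the point about units: base 2 gives bits, base e gives nats.

```python
import math


def entropy(probabilities, base=2):
    """Average information per source output, H = -sum(p * log_base(p)).

    Zero-probability symbols contribute nothing and are skipped.
    """
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


p = [0.4, 0.1, 0.3, 0.2]
print(entropy(p))          # in bits, roughly 1.85
print(entropy(p, math.e))  # the same quantity in nats
```

No uniquely decodable code for this source can average fewer bits per pixel than the base-2 value.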
BASIC COMPRESSION METHODS
HUFFMAN CODING
Codeword length   Codeword   Probability
2                 01         0.25          0.3  (00)   0.45 (1)    0.55 (0)
2                 10         0.25          0.25 (01)   0.3  (00)   0.45 (1)
2                 11         0.2           0.25 (10)   0.25 (01)
3                 000        0.15          0.2  (11)
3                 001        0.15
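EXAMPLE
Ax = { a, b, c, d, e }
Symbol   Probability   Codeword
a        0.25          00
b        0.25          10
c        0.2           11
d        0.15          010
e        0.15          011
Code tree: 1.0 -> 0.55 (0), 0.45 (1); 0.55 -> a (0), 0.3 (1); 0.3 -> d (0), e (1); 0.45 -> b (0), c (1)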
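The construction in this example (repeatedly merge the two least probable nodes, prefixing 0 to one side and 1 to the other) can be sketched with a priority queue. This is a minimal illustration, not the slide's own procedure: `huffman_code` is a hypothetical helper, and ties between equal probabilities are broken by insertion order, so the individual codewords may differ from the ones shown while the codeword lengths stay the same.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return a dict symbol -> binary codeword string."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two least probable nodes and merge them.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in codes0.items()}
        merged.update({s: "1" + w for s, w in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


probs = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15}
code = huffman_code(probs)
average = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(average)  # about 2.3 bits per symbol
```

For this ensemble the lengths come out as 2, 2, 2, 3, 3, which matches both tables above.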
DISADVANTAGES OF THE HUFFMAN CODE
Changing ensemble
VARIATIONS
Generalizing
GOLOMB CODING
n   quotient   remainder   Code
0   0          0           000
1   0          1           001
2   0          2           010
3   0          3           011
4   1          0           1000
5   1          1           1001
6   1          2           1010
7   1          3           1011
8   2          0           11000
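The Golomb table is consistent with a divisor m = 4: the quotient of n / m is written in unary (that many 1s followed by a 0) and the remainder in two binary bits. The sketch below reproduces that; the name `golomb_encode` and the truncated-binary handling of divisors that are not powers of two go beyond what the slide shows and should be read as assumptions.

```python
def golomb_encode(n, m):
    """Golomb codeword (as a bit string) of a non-negative integer n with divisor m."""
    quotient, remainder = divmod(n, m)
    unary = "1" * quotient + "0"
    if m == 1:
        return unary  # no remainder bits are needed
    k = (m - 1).bit_length()   # ceil(log2(m))
    cutoff = (1 << k) - m      # number of remainders that get the short (k-1 bit) form
    if remainder < cutoff:
        tail = format(remainder, "b").zfill(k - 1)
    else:
        tail = format(remainder + cutoff, "b").zfill(k)
    return unary + tail


for n in range(9):
    print(n, golomb_encode(n, 4))
```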
ARITHMETIC CODING
ALGORITHM FOR ARITHMETIC CODING
Symbol   New "a" interval
a        [0.0, 0.04)
b        [0.04, 0.1)
c        [0.1, 0.12)
d        [0.12, 0.2)

Symbol   Interval
c        [0.16, 0.168)
d        [0.168, 0.2)
DISADVANTAGES
Although arithmetic coding (AC) usually gives a better result than the widespread Huffman code, it is rarely applied.
One reason is that the whole codeword must be received before decoding of the symbols can start, and a single corrupt bit in the codeword can corrupt the entire message.
Another is that the precision of the number that can be encoded is limited, which limits the number of symbols that can be encoded within a codeword.
Many patents also exist on arithmetic coding, so using some of the algorithms requires royalty fees. The companies IBM, AT&T and
Encoding Strings
lowerbound = 0
upperbound = 1
while there are still symbols to encode
    currentrange = upperbound - lowerbound
    upperbound = lowerbound + (currentrange * upperboundofnewsymbol)
    lowerbound = lowerbound + (currentrange * lowerboundofnewsymbol)
end while
Any value between the computed lower and upper probability bounds now encodes the input string.
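A runnable version of this loop might look as follows. It is a sketch under stated assumptions: the symbol intervals are those of the four-symbol model (a, b, c, d) used in this section's worked example, exact fractions are used to sidestep floating-point rounding, and the names `MODEL` and `arithmetic_encode` are illustrative.

```python
from fractions import Fraction

# Symbol -> (lower bound, upper bound) of its sub-interval of [0, 1).
MODEL = {
    "a": (Fraction(0), Fraction(1, 5)),     # [0.0, 0.2)
    "b": (Fraction(1, 5), Fraction(1, 2)),  # [0.2, 0.5)
    "c": (Fraction(1, 2), Fraction(3, 5)),  # [0.5, 0.6)
    "d": (Fraction(3, 5), Fraction(1)),     # [0.6, 1.0)
}


def arithmetic_encode(message, model):
    """Return the final (lower, upper) interval that encodes the message."""
    lower, upper = Fraction(0), Fraction(1)
    for symbol in message:
        symbol_low, symbol_high = model[symbol]
        current_range = upper - lower
        # Both updates use the old lower bound, so upper is computed first.
        upper = lower + current_range * symbol_high
        lower = lower + current_range * symbol_low
    return lower, upper


low, high = arithmetic_encode("add", MODEL)
print(float(low), float(high))
```

Any number in the half-open interval [low, high) identifies the string, provided the decoder also knows how many symbols to read.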
Let us encode add
Symbol   Probability   Interval
a        0.2           [0.0, 0.2)
b        0.3           [0.2, 0.5)
c        0.1           [0.5, 0.6)
d        0.4           [0.6, 1.0)
Start with lower and upper probability bounds of 0 and 1.
Encode 'a'
current range = 1 - 0 = 1
upper bound = 0 + (1 * 0.2) = 0.2
lower bound = 0 + (1 * 0.0) = 0.0
Encode 'd'
current range = 0.2 - 0.0 = 0.2
upper bound = 0 + (0.2 * 1.0) = 0.2
lower bound = 0 + (0.2 * 0.6) = 0.12
Encode 'd'
current range = 0.2 - 0.12 = 0.08
upper bound = 0.12 + (0.08 * 1.0) = 0.2
lower bound = 0.12 + (0.08 * 0.6) = 0.168
DECODING ALGORITHM
Decode 0.177
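The decoding steps are not preserved on this slide, so the following is only a sketch of the usual approach: find the symbol whose interval contains the value, output it, rescale the value into [0, 1) relative to that interval, and repeat. It assumes the same four-symbol model as the encoding example and a message length known in advance; the `length` parameter is a hypothetical stand-in for whatever termination rule the lecture used.

```python
from fractions import Fraction

# Symbol -> (lower bound, upper bound), as in the encoding example.
MODEL = {
    "a": (Fraction(0), Fraction(1, 5)),
    "b": (Fraction(1, 5), Fraction(1, 2)),
    "c": (Fraction(1, 2), Fraction(3, 5)),
    "d": (Fraction(3, 5), Fraction(1)),
}


def arithmetic_decode(value, model, length):
    """Decode `length` symbols from a number in [0, 1)."""
    value = Fraction(value)
    symbols = []
    for _ in range(length):
        for symbol, (low, high) in model.items():
            if low <= value < high:
                symbols.append(symbol)
                # Rescale so the chosen sub-interval becomes [0, 1) again.
                value = (value - low) / (high - low)
                break
        else:
            raise ValueError("value must lie in [0, 1)")
    return "".join(symbols)


print(arithmetic_decode(Fraction("0.177"), MODEL, 3))
```

Step by step: 0.177 falls in [0.0, 0.2), giving a; the rescaled value 0.885 falls in [0.6, 1.0), giving d; the next rescaled value 0.7125 again gives d.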
LZW CODING
"ABABBABCABABBA"
Encode the given image. (Did you notice the vertical edge?)
39   39   126   126
39   39   126   126
39   39   126   126
39   39   126   126

Recognized sequence   Pixel   o/p   Code   String
                      39
39                    39      39    256    39-39
39                    126     39    257    39-126
126                   126     126   258    126-126
126                   39      126   259    126-39
39                    39
39-39                 126     256   260    39-39-126
126                   126
126-126               39      258   261    126-126-39
39                    39
39-39                 126
39-39-126             126     260   262    39-39-126-126
126                   39
126-39                39      259   263    126-39-39
39                    126
39-126                126     257   264    39-126-126
126                           126
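The encoding trace for this image can be reproduced with a few lines of code. This is a minimal sketch that assumes 8-bit pixels (initial dictionary codes 0 to 255, new entries from 256 upward) and keeps the dictionary unbounded; `lzw_encode` is an illustrative name.

```python
def lzw_encode(pixels, alphabet_size=256):
    """Return (output codes, final dictionary) for a sequence of integer pixels."""
    table = {(value,): value for value in range(alphabet_size)}
    output = []
    recognized = ()  # the currently recognized sequence
    for pixel in pixels:
        candidate = recognized + (pixel,)
        if candidate in table:
            recognized = candidate
        else:
            output.append(table[recognized])
            table[candidate] = len(table)  # next free dictionary location
            recognized = (pixel,)
    if recognized:
        output.append(table[recognized])
    return output, table


image = [39, 39, 126, 126] * 4  # the 4x4 image, scanned row by row
codes, table = lzw_encode(image)
print(codes)
```

Ten codes are emitted for the sixteen pixels.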
LZW DECOMPRESSION
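The decompression slides did not survive extraction, so the following is only a sketch of the standard idea: the decoder rebuilds the same dictionary one step behind the encoder, adding "previous string + first pixel of the current string" after each code. It assumes 8-bit pixels and an unbounded dictionary, and `lzw_decode` is an illustrative name.

```python
def lzw_decode(codes, alphabet_size=256):
    """Rebuild the pixel sequence from a list of LZW codes."""
    table = {value: (value,) for value in range(alphabet_size)}
    code_iter = iter(codes)
    try:
        previous = table[next(code_iter)]
    except StopIteration:
        return []
    pixels = list(previous)
    for code in code_iter:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + (previous[0],)
        else:
            raise ValueError(f"invalid LZW code: {code}")
        pixels.extend(entry)
        table[len(table)] = previous + (entry[0],)
        previous = entry
    return pixels


print(lzw_decode([39, 39, 126, 126, 256, 258, 260, 259, 257, 126]))
```

The dictionary itself is never transmitted; it is reconstructed from the code stream alone.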
LZW: Limitations
What happens when the dictionary gets too large (i.e., when all 4096 locations have been used)?
Here are some options usually implemented:
Simply forget about adding any more entries and use the table as is.
Throw the dictionary away when it is no longer effective at compression.
Clear entries 256-4095 and start building the dictionary again.
Some clever schemes rebuild a string table from the last N input characters.
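Two of these options can be sketched in an encoder with a bounded dictionary. This is an illustration under assumptions, not a complete codec: the `policy` names "freeze" (stop adding entries) and "clear" (reset to the 256 single-byte entries) are made up here, the input is a byte string, and a matching decoder would have to apply the same policy (GIF, for instance, signals a reset with an explicit clear code).

```python
def lzw_encode_bounded(data, max_size=4096, policy="freeze"):
    """LZW-encode bytes with at most max_size dictionary entries."""
    if policy not in ("freeze", "clear"):
        raise ValueError("policy must be 'freeze' or 'clear'")

    def fresh_table():
        return {bytes([i]): i for i in range(256)}

    table = fresh_table()
    output = []
    recognized = b""
    for byte in data:
        symbol = bytes([byte])
        if recognized + symbol in table:
            recognized += symbol
            continue
        output.append(table[recognized])
        if len(table) < max_size:
            table[recognized + symbol] = len(table)
        elif policy == "clear":
            table = fresh_table()  # discard entries 256 and above, start again
        # with "freeze" the full table is simply used as is
        recognized = symbol
    if recognized:
        output.append(table[recognized])
    return output


print(lzw_encode_bounded(b"ABABBABCABABBA"))
```

With the default limit the short example string never fills the table, so both policies give the same output.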