Chapter 5 Data Compression

Chapter 5 discusses basic data compression techniques, categorizing them into lossless and lossy methods. It covers concepts such as redundancy, variable length coding, Huffman encoding, run-length encoding, and quantization, explaining how these methods reduce data size while preserving or approximating the original information. The chapter highlights the trade-offs between compression efficiency and data fidelity, particularly in the context of different types of data.


Chapter 5

Basic Compression

• Data Compression
• Redundancy
• Variable Length Coding
• Huffman encoding
• Run Length Encoding (RLE)
• Quantization (Lossy)

Data Compression
Two categories
• Information Preserving
– Error free compression
– Original data can be recovered completely
• Lossy
– Original data is approximated
– Less than perfect
– Generally allows much higher compression

Basics
Data Compression
– Process of reducing the amount of data required to
represent a given quantity of information
Data vs. Information
– Data and Information are not the same thing
– Data
• the means by which information is conveyed
• various amounts of data can convey the same
information
– Information
• “A signal that contains no uncertainty”
Redundancy
Redundancy
– “data” that provides no relevant information
– “data” that restates what is already known
• For example
– Consider that N1 and N2 denote the number of "data
units" in two sets that represent the same information
– The compression ratio is Cr = N1 / N2
• EXAMPLE
N1 = 10 and N2 = 1 data units can encode the same information. The compression ratio is
Cr = N1/N2 = 10 (or 10:1)
implying 90% of the data in N1 is redundant (redundancy R = 1 − 1/Cr = 1 − 1/10 = 0.9)
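As a quick check, here is a minimal Python sketch of these two formulas (the function names are illustrative, not from the slides):

```python
def compression_ratio(n1: int, n2: int) -> float:
    """Cr = N1 / N2, where N1 and N2 count the data units of
    two representations of the same information."""
    return n1 / n2

def redundancy(cr: float) -> float:
    """Relative data redundancy: R = 1 - 1/Cr."""
    return 1.0 - 1.0 / cr

cr = compression_ratio(10, 1)  # the slide's example: N1 = 10, N2 = 1
print(cr)                      # 10.0, i.e. 10:1
print(redundancy(cr))          # 0.9, i.e. 90% of the data in N1 is redundant
```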
Variable Length Coding (1)
Our usual binary representation (called natural binary) is not always
efficient at representing data from a compression point of view.

Consider the following string of data:


abaaaaabbbcccccaaaaaaaabbbbdaaaaaaa

There are 4 different pieces of information (let’s say 4 symbols)


a, b, c, d

In natural binary we would need at least 2 bits to represent this, assigning bits as
follows:
a=00, b=01, c=10, d=11

There are 35 pieces of data, that is 35 × 2 bits = 70 bits
Variable Length Coding (2)
Now, consider the occurrence of each symbol:
a,b,c,d
abaaaaabbbcccccaaaaaaaabbbbdaaaaaaa

a = 21/35 (60%)
b = 8/35  (23%)
c = 5/35  (14%)
d = 1/35  (3%)

Variable Length Coding (3)
Idea of variable-length coding: assign fewer bits to more frequent
symbols, and more bits to less frequent symbols.

Bit assignment using VLC:

a = 1, b = 01, c = 001, d = 0001

Now, compute the number of bits used to encode the same data:

21*(1) + 8*(2) + 5*(3) + 1*(4) = 56 bits

So, we have a compression ratio of 70:56, or 1.25, meaning
20% of the data using natural binary encoding is redundant.
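A small Python sketch of this comparison, using the slide's string and both code tables (the variable names are mine):

```python
from collections import Counter

data = "abaaaaabbbcccccaaaaaaaabbbbdaaaaaaa"   # the slide's 35 symbols

fixed = {"a": "00", "b": "01", "c": "10", "d": "11"}    # natural binary
vlc   = {"a": "1", "b": "01", "c": "001", "d": "0001"}  # the slide's VLC

print(Counter(data))  # Counter({'a': 21, 'b': 8, 'c': 5, 'd': 1})

fixed_bits = sum(len(fixed[s]) for s in data)  # 35 * 2 = 70 bits
vlc_bits   = sum(len(vlc[s]) for s in data)    # 21 + 16 + 15 + 4 = 56 bits
print(fixed_bits, vlc_bits, fixed_bits / vlc_bits)  # 70 56 1.25
```

Note the VLC above is a prefix code (no codeword is a prefix of another), so the 56-bit stream decodes unambiguously.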
Huffman encoding
[Photo: David Huffman]

This is an example of error-free coding: the information is completely the same,
but the data is different

• This idea of variable length coding is used in many places


– Area codes for cities in large countries (China/India)

• Prof. David A. Huffman developed an algorithm to take a data set and compute
its “optimal” encoding (bit assignment)
– He developed this as a student at MIT

• This is very commonly applied to many compression techniques as a final stage

• There are other VLC techniques, such as Arithmetic coding and LZW coding (ZIP)
Huffman encoding
It is the same idea as VLC, but uses the histogram to measure the
frequency of each gray level in the grayscale image.

See next example:


Note: the image is coded with 3 bits per pixel (8 gray levels).
Huffman encoding

p(i) = h(i) / n, where:
• p(i) is the probability of occurrence of gray level i
• h(i) is the frequency of occurrence (histogram count) of gray level i
• n is the total number of pixels in the image

Gray level    p
0             0.000
1             0.012
2             0.071
3             0.019
4             0.853
5             0.023
6             0.019
7             0.003

Source: Nick Efford, Digital Image Processing: A Practical Introduction Using Java
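A minimal sketch of p(i) = h(i)/n in Python; the tiny pixel list here is made up purely for illustration:

```python
def gray_probabilities(pixels, levels=8):
    """Normalized histogram of a grayscale image with the given
    number of gray levels: p(i) = h(i) / n."""
    n = len(pixels)
    hist = [0] * levels           # h(i): count of pixels at gray level i
    for v in pixels:
        hist[v] += 1
    return [h / n for h in hist]  # p(i)

pixels = [4, 4, 4, 2, 4, 5, 4, 3]  # hypothetical tiny 3-bit image
print(gray_probabilities(pixels))
```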
Huffman encoding

Build the code by repeatedly merging the two smallest probabilities,
labelling one branch 0 and the other 1:

0.000 + 0.003 = 0.003
0.003 + 0.012 = 0.015
0.015 + 0.019 = 0.034
0.019 + 0.023 = 0.042
0.034 + 0.042 = 0.076
0.071 + 0.076 = 0.147
0.147 + 0.853 = 1.000

Tracing the branch labels from the root back to each gray level gives
the codewords shown in the next table.

Source: Nick Efford, Digital Image Processing: A Practical Introduction Using Java
Huffman encoding

Gray level    p       codeword   l
0             0.000   111111     6
1             0.012   11110      5
2             0.071   10         2
3             0.019   1101       4
4             0.853   0          1
5             0.023   1100       4
6             0.019   1110       4
7             0.003   111110     6

Source: Nick Efford, Digital Image Processing: A Practical Introduction Using Java
Huffman encoding

Gray level    p       codeword   l   l × p
0             0.000   111111     6   0.000
1             0.012   11110      5   0.060
2             0.071   10         2   0.142
3             0.019   1101       4   0.076
4             0.853   0          1   0.853
5             0.023   1100       4   0.092
6             0.019   1110       4   0.076
7             0.003   111110     6   0.018

Source: Nick Efford, Digital Image Processing: A Practical Introduction Using Java
Huffman encoding

Summing the l × p column gives the average code length:

L = Σ p(i) · l(i) = 0.000 + 0.060 + 0.142 + 0.076 + 0.853 + 0.092 + 0.076 + 0.018 = 1.317 bits/pixel

The compression ratio achieved by Huffman coding is 3/1.317 = 2.28.
That means there is 1 − 1/2.28 ≈ 56% data redundancy.

Source: Nick Efford, Digital Image Processing: A Practical Introduction Using Java
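For reference, here is a compact, generic Huffman implementation in Python using the standard-library heapq. It is a sketch of the textbook algorithm, not code from Efford's book; run on the table's probabilities it reproduces the codeword lengths above (the exact bit patterns depend on how the 0/1 branches are labelled):

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """probs: dict mapping symbol -> probability. Returns symbol -> codeword."""
    tie = count()  # unique tie-breaker so the heap never compares the code dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # two smallest probabilities
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}  # prepend the branch bit
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

p = {0: 0.000, 1: 0.012, 2: 0.071, 3: 0.019,
     4: 0.853, 5: 0.023, 6: 0.019, 7: 0.003}
codes = huffman_codes(p)
print({s: codes[s] for s in sorted(codes)})  # lengths 6, 5, 2, 4, 1, 4, 4, 6
avg_len = sum(p[s] * len(codes[s]) for s in codes)
print(avg_len, 3 / avg_len)                  # ~1.317 bits/pixel, ratio ~2.28
```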
Run-length-encoding (RLE)
Consider the following
aaaaaaaabbbaaaaaaaaaaaaaaaaaaabbbba

We could instead represent this by “runs”


8a,3b,19a,4b,1a

RLE is often more compact, especially when the data contains lots of runs
of the same symbol(s)

RLE is also lossless, you can always reconstruct the original

Fax machines transmit RLE-encoded linescans of black-and-white documents.
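A minimal RLE encoder/decoder sketch in Python, using itertools.groupby to find the runs (the pair-list representation is one choice among many):

```python
from itertools import groupby

def rle_encode(s):
    """Collapse each run of equal symbols into a (count, symbol) pair."""
    return [(len(list(g)), ch) for ch, g in groupby(s)]

def rle_decode(runs):
    """Expand (count, symbol) pairs back into the original string."""
    return "".join(ch * n for n, ch in runs)

data = "aaaaaaaa" + "bbb" + "a" * 19 + "bbbb" + "a"   # the slide's string
runs = rle_encode(data)
print(runs)                      # [(8, 'a'), (3, 'b'), (19, 'a'), (4, 'b'), (1, 'a')]
assert rle_decode(runs) == data  # lossless: the original is fully recovered
```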
Quantization (Lossy)
Another thing we can do is actually quantize the data such that it cannot be
recovered completely

Consider the following string of numbers:


ORIGINAL = {10, 12, 15, 18, 20, 3, 5, 2, 13}
(all the numbers are unique so Huffman coding won’t help)

Integer divide this by 5 (i.e. quantize it)


QUANTIZED = {2, 2, 3, 3, 4, 0, 1, 0, 2}

The natural binary range is now smaller, and since the quantized values are
no longer all unique, we could also apply Huffman encoding to get a bit
more compression.

Of course, the values are not the same; on "reconstruction" (multiply by 5) we get
only an approximation of the original:

RECOVERED = {10, 10, 15, 15, 20, 0, 5, 0, 10}
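A sketch of this quantize/reconstruct round trip in Python (STEP is the slide's divisor of 5):

```python
STEP = 5  # quantization step size from the slide

original  = [10, 12, 15, 18, 20, 3, 5, 2, 13]
quantized = [x // STEP for x in original]  # integer division: [2, 2, 3, 3, 4, 0, 1, 0, 2]
recovered = [q * STEP for q in quantized]  # approximation:    [10, 10, 15, 15, 20, 0, 5, 0, 10]

print(quantized)
print(recovered)
# The reconstruction error is bounded by the step size:
print([abs(a - b) for a, b in zip(original, recovered)])  # every error < STEP
```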
Lossy vs. Lossless
• For things like text documents and computer data files, lossy compression
doesn’t make sense
– An approximation of the original is no good!

• But for data like audio or images, small errors are not easily detected by
our senses
– An approximation is acceptable

• This is one reason we can get significant compression of images and audio,
vs. other types of data
– Lossless 10:1 is typically possible
– Lossy 300:1 is possible with no significant perceptual loss

• With lossy compression we can even talk about "quality"
– The more like the original, the higher the quality
– The less like the original, the lower the quality
