
CC743 Data Compression and Digital Image Processing

Chapter 3: Data Compression

Dr. Rowayda A. Sadek
[email protected]
Data Compression Basics
• Discrete source
– Information=uncertainty
– Quantification of uncertainty
– Source entropy
• Variable length codes
– Motivation
– Prefix condition
– Huffman coding algorithm
• Lempel-Ziv coding*
Information
• What do we mean by information?
– “A numerical measure of the uncertainty of an
experimental outcome” – Webster Dictionary
• How to quantitatively measure and
represent information?
– Shannon proposes a probabilistic approach
• Let us first look at how we assess the
amount of information in our daily lives
using common sense
Information = Uncertainty
• Zero information
– Pittsburgh Steelers won Super Bowl XL (past news, no uncertainty)
– Yao Ming plays for the Houston Rockets (celebrity fact, no uncertainty)
• Little information
– It will be very cold in Chicago tomorrow (not much uncertainty
since this is winter time)
– It is going to rain in Seattle next week (not much uncertainty since
it rains nine months a year in NW)
• Large information
– An earthquake is going to hit CA in July 2006 (are you sure? an
unlikely event)
– Someone has shown P=NP (Wow! Really? Who did it?)
Shannon’s Picture on
Communication (1948)
    source → source encoder → channel encoder → channel → channel decoder → source decoder → destination

    (the channel encoder, the channel, and the channel decoder together form the “super-channel”)

The goal of communication is to move information from here to there and from now to then

Examples of source:
Human speeches, photos, text messages, computer programs …

Examples of channel:
storage media, telephone lines, wireless transmission …
Source-Channel Separation Principle

The role of channel coding:


Fight against channel errors for reliable transmission of information
(design of channel encoder/decoder is considered in EE461)

We simply assume the super-channel achieves error-free transmission

The role of source coding (data compression):


Facilitate storage and transmission by eliminating source redundancy
Our goal is to remove as much source redundancy as possible
by intelligently designing the source encoder/decoder
Discrete Source
• A discrete source is characterized by a
discrete random variable X
• Examples
– Coin flipping: P(X=H)=P(X=T)=1/2
– Dice tossing: P(X=k)=1/6, k=1-6
– Playing-card drawing:
P(X=S)=P(X=H)=P(X=D)=P(X=C)=1/4
What is the redundancy with a discrete source?
Two Extreme Cases

Case 1: tossing a fair coin → source encoder → channel → source decoder

    P(X=H) = P(X=T) = 1/2  (maximum uncertainty)
    → Minimum (zero) redundancy, compression impossible

Case 2: tossing a coin with two identical sides (Head or Tail?) → HHHH… or TTTT… → channel → duplication

    P(X=H) = 1, P(X=T) = 0  (minimum uncertainty)
    → Maximum redundancy, compression trivial (1 bit is enough)

Redundancy is the opposite of uncertainty


Quantifying Uncertainty of an Event

Self-information:

    I(p) = -log2(p),   where p is the probability of the event x
    (e.g., x can be X=H or X=T)

    p      I(p)    notes
    1      0       must happen (no uncertainty)
    0      ∞       unlikely to happen (infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty associated with event x
Weighted Self-information

    I_w(p) = p · I(p) = -p·log2(p)

    p      I(p)    I_w(p)
    0      ∞       0
    1/2    1       1/2
    1      0       0

As p goes from 0 to 1, the weighted self-information I_w(p) = -p·log2(p) first increases and then decreases.

Question: Which value of p maximizes I_w(p)?


*Maximum of Weighted Self-information

The maximum is attained at p = 1/e, where

    I_w(p) = 1 / (e·ln 2)
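As a quick numerical check (an added sketch, not part of the original slides), we can confirm the p = 1/e maximum by evaluating I_w(p) on a fine grid:

    import numpy as np

    # Weighted self-information I_w(p) = -p * log2(p)
    p = np.linspace(1e-9, 1.0, 1_000_000)
    iw = -p * np.log2(p)

    print(p[np.argmax(iw)], 1 / np.e)          # both are about 0.3679
    print(iw.max(), 1 / (np.e * np.log(2)))    # both are about 0.5307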
Quantification of Uncertainty of a Discrete Source

 A discrete source (random variable) is a collection (set) of individual events
 whose probabilities sum to 1:

    X is a discrete random variable,  x ∈ {1, 2, …, N}
    p_i = Prob(x = i), i = 1, 2, …, N,   with  Σ_{i=1}^{N} p_i = 1

 To quantify the uncertainty of a discrete source, we simply take the summation
 of the weighted self-information over the whole set.
Shannon’s Source Entropy Formula

    H(X) = Σ_{i=1}^{N} I_w(p_i) = -Σ_{i=1}^{N} p_i·log2(p_i)   (bits/sample, or bps)

The probabilities p_i act as the weighting coefficients.
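The following short Python function (an illustrative sketch added here, not from the slides) computes source entropy directly from this formula and is used to check the worked examples below:

    import math

    def entropy(probs):
        """Source entropy H(X) = -sum(p*log2(p)) in bits per symbol (bps)."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))                  # fair coin: 1.0 bps
    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 4-way random walk: 1.75 bps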
Source Entropy Examples

 Example 1: (binary Bernoulli source)
 Flipping a coin with probability of head being p (0<p<1):

    p = Prob(x=0),  q = 1-p = Prob(x=1)

    H(X) = -(p·log2(p) + q·log2(q))

 Check the two extreme cases:
 As p goes to zero, H(X) goes to 0 bps → compression gains the most
 As p goes to one half, H(X) goes to 1 bps → no compression can help

(Figure: entropy of the binary Bernoulli source as a function of p)
Source Entropy Examples

 Example 2: (4-way random walk over the directions N, S, E, W)

    Prob(x=S) = 1/2,  Prob(x=N) = 1/4,  Prob(x=E) = Prob(x=W) = 1/8

    H(X) = -( (1/2)·log2(1/2) + (1/4)·log2(1/4) + (1/8)·log2(1/8) + (1/8)·log2(1/8) ) = 1.75 bps
Source Entropy Examples (Con’t)

 Example 3: (source with geometric distribution)

 A jar contains the same number of balls of two different colors: blue and red.
 Each time, a ball is randomly picked out of the jar and then put back. Consider
 the event that the k-th pick is the first time a red ball is seen. What is the
 probability of this event?

    p = Prob(x = red) = 1/2,   1-p = Prob(x = blue) = 1/2

    Prob(event) = Prob(blue in the first k-1 picks) · Prob(red in the k-th pick)
                = (1/2)^(k-1) · (1/2) = (1/2)^k
Source Entropy Calculation

If we consider all possible events, the sum of their probabilities must be one.

    Check:  Σ_{k=1}^{∞} (1/2)^k = 1

Then we can define a discrete random variable X with P(x = k) = (1/2)^k.

Entropy:

    H(X) = -Σ_{k=1}^{∞} p_k·log2(p_k) = Σ_{k=1}^{∞} k·(1/2)^k = 2 bps

Problem 1 in HW3 is slightly more complex than this example.


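A quick numerical check of the two infinite sums above (an added sketch; truncating the series at a large k is sufficient):

    # Geometric source P(x=k) = (1/2)^k
    print(sum(0.5**k for k in range(1, 200)))        # sum of probabilities: about 1.0
    print(sum(k * 0.5**k for k in range(1, 200)))    # entropy: about 2.0 bps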
Properties of Source Entropy
• Nonnegative and concave
• Achieves its maximum when the source follows a uniform distribution (i.e.,
P(x=k) = 1/N, k = 1, …, N)
• Goes to zero (its minimum) as the source becomes more and more skewed (i.e.,
P(x=k) → 1, P(x≠k) → 0)
What is the use of H(X)?

Shannon’s first theorem (noiseless coding theorem)


For a memoryless discrete source X, its entropy H(X)
defines the minimum average code length required to
noiselessly code the source.

Notes:
1. Memoryless means that the events are independently
generated (e.g., the outcomes of flipping a coin N times
are independent events)
2. Source redundancy can be then understood as the
difference between raw data rate and source entropy
*Code Redundancy

Code redundancy = practical performance minus theoretical bound:

    r = l̄ - H(X) ≥ 0

    Average code length:  l̄ = Σ_{i=1}^{N} p_i·l_i     (l_i: length of the codeword assigned to the i-th symbol)

    H(X) = Σ_{i=1}^{N} p_i·log2(1/p_i)

Note: if we represent each symbol by q bits (fixed-length codes),
then the redundancy is simply q - H(X) bps.
How to achieve source entropy?

    discrete source X  →  entropy coding  →  binary bit stream
        (with P(X) known)

Note: The above entropy coding problem is based on the simplifying assumptions
that the discrete source X is memoryless and that P(X) is completely known.
These assumptions often do not hold for real-world data such as images, and we
will revisit them later.
Data Compression Basics
• Discrete source
– Information=uncertainty
– Quantification of uncertainty
– Source entropy
• Variable length codes
– Motivation
– Prefix condition
– Huffman coding algorithm
• Lempel-Ziv coding*
Variable Length Codes (VLC)
Recall:
    Self-information  I(p) = -log2(p)

It follows from the above formula that a small-probability event carries much
information and is therefore worth many bits to represent. Conversely, if some
event occurs frequently, it is probably a good idea to use as few bits as
possible to represent it. This observation leads to the idea of varying the
code lengths based on the events’ probabilities.

Assign a long codeword to an event with small probability


Assign a short codeword to an event with large probability
4-way Random Walk Example

    symbol k   p_k     fixed-length codeword   variable-length codeword
    S          0.5     00                      0
    N          0.25    01                      10
    E          0.125   10                      110
    W          0.125   11                      111

symbol stream: SSNWSENNWSSSNESS

fixed length:    00 00 01 11 00 10 01 01 11 00 00 00 01 10 00 00   → 32 bits
variable length: 0 0 10 111 0 110 10 10 111 0 0 0 10 110 0 0       → 28 bits

4 bits of savings achieved by VLC (redundancy eliminated)
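A small added sketch (illustrative code, not from the slides) that encodes the same symbol stream with both code tables and compares the bit counts:

    fixed = {"S": "00", "N": "01", "E": "10", "W": "11"}
    vlc   = {"S": "0",  "N": "10", "E": "110", "W": "111"}

    stream = "SSNWSENNWSSSNESS"
    fixed_bits = "".join(fixed[s] for s in stream)
    vlc_bits   = "".join(vlc[s] for s in stream)
    print(len(fixed_bits), len(vlc_bits))   # 32 vs 28 bits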
Toy Example (Con’t)

• source entropy:
    H(X) = -Σ_{k=1}^{4} p_k·log2(p_k)
         = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3
         = 1.75 bits/symbol

• average code length:
    l̄ = Nb / Ns   (bps),  where Nb = total number of bits, Ns = total number of symbols

    fixed-length:    l̄ = 2 bps    > H(X)
    variable-length: l̄ = 1.75 bps = H(X)
Problems with VLC

• When codewords have fixed lengths, the boundaries between codewords are
  always identifiable.
• For codewords with variable lengths, the boundaries can become ambiguous.

Example: with the (non-prefix) code S→0, N→1, E→10, W→11, the encoded stream
0 0 1 11 0 10 … can be parsed either as 0|0|1|11|0|10|… = S S N W S E …
or as 0|0|11|1|0|10|… = S S W N S E …
Uniquely Decodable Codes
• To avoid the ambiguity in decoding, we
need to enforce certain conditions with
VLC to make them uniquely decodable
• Since ambiguity arises when some
codeword becomes the prefix of the other,
it is natural to consider prefix condition
Example: p ⊂ pr ⊂ pre ⊂ pref ⊂ prefi ⊂ prefix
(a ⊂ b means: a is a prefix of b)


Prefix condition

No codeword is allowed to
be the prefix of any other
codeword.

We will graphically illustrate this condition


with the aid of binary codeword tree
Binary Codeword Tree

    root
    Level 1:  1, 0                     (# of codewords: 2)
    Level 2:  11, 10, 01, 00           (# of codewords: 2^2)
    ……
    Level k:  all k-bit strings        (# of codewords: 2^k)
Prefix Condition Examples

    symbol x   codeword 1   codeword 2
    S          0            0
    N          1            10
    E          10           110
    W          11           111

Codeword set 1 violates the prefix condition (“1” is a prefix of “10” and “11”),
while codeword set 2 satisfies it: its codewords all sit at leaves of the binary
codeword tree.
How to satisfy the prefix condition?

• Basic rule: if a node is used as a codeword, then none of its descendants
  can be used as a codeword.

Example: the prefix code {0, 10, 110, 111} from the previous slide follows this
rule: once “0” is taken as a codeword, the remaining codewords must come from
the “1” subtree, and so on down the tree.


Property of Prefix Codes

Kraft’s inequality:

    Σ_{i=1}^{N} 2^(-l_i) ≤ 1

    l_i: length of the i-th codeword   (proof skipped)

Example:
    symbol x   VLC-1   VLC-2
    S          0       0
    N          1       10
    E          10      110
    W          11      111

    VLC-1:  Σ 2^(-l_i) = 1/2 + 1/2 + 1/4 + 1/4 = 1.5 > 1   (violates Kraft’s inequality)
    VLC-2:  Σ 2^(-l_i) = 1/2 + 1/4 + 1/8 + 1/8 = 1 ≤ 1
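A short added sketch that checks both the Kraft sum and the prefix condition for the two code tables above:

    def kraft_sum(codewords):
        """Sum of 2^(-l_i) over all codeword lengths."""
        return sum(2 ** -len(c) for c in codewords)

    def is_prefix_free(codewords):
        """True if no codeword is a prefix of another codeword."""
        return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

    vlc1 = ["0", "1", "10", "11"]
    vlc2 = ["0", "10", "110", "111"]
    print(kraft_sum(vlc1), is_prefix_free(vlc1))   # 1.5  False
    print(kraft_sum(vlc2), is_prefix_free(vlc2))   # 1.0  True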
Two Goals of VLC design

• achieve optimal code length (i.e., minimal redundancy)
  For an event x with probability p(x), the optimal code length is ⌈-log2 p(x)⌉,
  where ⌈x⌉ denotes the smallest integer not less than x (e.g., ⌈3.4⌉ = 4)

  code redundancy:  r = l̄ - H(X) ≥ 0

  Unless the probabilities of the events are all powers of 2, we often have r > 0

• satisfy the prefix condition


Huffman Codes (Huffman’1952)
• Coding Procedures for an N-symbol source
– Source reduction
• List all probabilities in a descending order
• Merge the two symbols with smallest probabilities
into a new compound symbol
• Repeat the above two steps N-2 times
– Codeword assignment
• Start from the smallest source and work back to the
original source
• Each merging point corresponds to a node in binary
codeword tree
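This procedure maps directly onto a priority queue. The sketch below (illustrative Python added here, with hypothetical function names) builds the Huffman code bottom-up by repeatedly merging the two least probable entries:

    import heapq

    def huffman_code(probs):
        """Build a Huffman code for {symbol: probability}; returns {symbol: codeword}."""
        # Each heap item: (probability, tie-breaker, {symbol: partial codeword})
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, code1 = heapq.heappop(heap)   # smallest probability
            p2, _, code2 = heapq.heappop(heap)   # second smallest
            # each merge point becomes a node: prepend a branch label on each side
            merged = {s: "0" + c for s, c in code1.items()}
            merged.update({s: "1" + c for s, c in code2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))
            count += 1
        return heap[0][2]

    probs = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
    code = huffman_code(probs)
    print(code)                                             # one valid (non-unique) assignment
    print(sum(p * len(code[s]) for s, p in probs.items()))  # 2.2 bps, as in Example-II below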
Example-I

Step 1: Source reduction

    symbol x   p(x)
    S          0.5
    N          0.25
    E          0.125
    W          0.125

    compound symbols:  (EW) = {E, W}, p = 0.25;   (NEW) = {N, E, W}, p = 0.5
Example-I (Con’t)

Step 2: Codeword assignment (work back from the smallest source, labelling the
two branches at each merge point with “1” and “0”)

    symbol x   p(x)     codeword
    S          0.5      0
    N          0.25     10
    E          0.125    110
    W          0.125    111

    ((NEW) gets the prefix “1”, (EW) gets the prefix “11”)
Example-I (Con’t)

The codeword assignment is not unique. For example, S→1, N→01, E→001, W→000 is
an equally valid assignment. In fact, at each merging point (node) we can
arbitrarily assign “0” and “1” to the two branches; the average code length is
the same.
Example-II

Step 1: Source reduction

    symbol x   p(x)
    e          0.4
    a          0.2
    i          0.2
    o          0.1
    u          0.1

    compound symbols:  (ou) p = 0.2  →  (iou) = i + (ou), p = 0.4  →  (aiou) = a + (iou), p = 0.6
Example-II (Con’t)

Step 2: Codeword assignment

    symbol x   p(x)    codeword
    e          0.4     1
    a          0.2     01
    i          0.2     000
    o          0.1     0010
    u          0.1     0011

    compound symbols: (aiou) → 0,  (iou) → 00,  (ou) → 001
Example-II (Con’t)

Binary codeword tree representation:

    root:    0 → (aiou),   1 → e
    (aiou):  00 → (iou),   01 → a
    (iou):   000 → i,      001 → (ou)
    (ou):    0010 → o,     0011 → u
Example-II (Con’t)

    symbol x   p(x)    codeword   length
    e          0.4     1          1
    a          0.2     01         2
    i          0.2     000        3
    o          0.1     0010       4
    u          0.1     0011       4

    l̄ = Σ_{i=1}^{5} p_i·l_i = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bps

    H(X) = -Σ_{i=1}^{5} p_i·log2(p_i) = 2.122 bps        r = l̄ - H(X) = 0.078 bps

If we used fixed-length codes, we would have to spend three bits per sample,
which gives a code redundancy of 3 - 2.122 = 0.878 bps.
Example-III

Step 1: Source reduction (figure with compound symbols, not reproduced)

Example-III (Con’t)

Step 2: Codeword assignment (figure with compound symbols, not reproduced)
Summary of Huffman Coding Algorithm

• Achieves minimal redundancy subject to the constraint that the source symbols
  are coded one at a time
• Sorting symbols in descending order of probability is the key step in source
  reduction
• The codeword assignment is not unique: exchanging the “0” and “1” labels at
  any node of the binary codeword tree produces another solution that works
  equally well
• Only works for a source with a finite number of symbols (otherwise, it does
  not know where to start)
Variation: Golomb Codes

Optimal VLC for a geometric source: P(X=k) = (1/2)^k, k = 1, 2, …

    k    codeword
    1    0
    2    10
    3    110
    4    1110
    5    11110
    6    111110
    7    1111110
    8    11111110
    …    ……
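A one-line rule captures this table (an added sketch): the codeword for k is k-1 ones followed by a single zero, and its expected length under P(X=k) = (1/2)^k equals the source entropy of 2 bps computed earlier, so the code has zero redundancy:

    def unary_codeword(k):
        """Codeword for symbol k: (k-1) ones followed by a zero."""
        return "1" * (k - 1) + "0"

    print([unary_codeword(k) for k in range(1, 6)])    # ['0', '10', '110', '1110', '11110']
    # expected length = sum of k * (1/2)^k over k, which approaches 2 bits/sample
    print(sum(len(unary_codeword(k)) * 0.5**k for k in range(1, 200)))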
Data Compression Basics
• Discrete source
– Information=uncertainty
– Quantification of uncertainty
– Source entropy
• Variable length codes
– Motivation
– Prefix condition
– Huffman coding algorithm
• Lempel-Ziv coding*
History of Lempel-Ziv Coding
• Invented by Lempel and Ziv in 1977
• Numerous variations and improvements
since then
• Widely used in different applications
– Unix system: compress command
– Winzip software (LZW algorithm)
– TIF/TIFF image format
– Dial-up modem (to speed up the transmission)
Dictionary-based Coding
• Use a dictionary
– Think about the evolution of an English dictionary
• It is structured - if any random combination of
alphabets formed a word, the dictionary would not
exist
• It is dynamic - more and more words are put into the
dictionary as time moves on
– Data compression is similar in the sense that redundancy reveals itself as
repeated patterns, just like English words in a dictionary
Toy Example

    I took a walk in town one day
    And met a cat along the way.
    What do you think that cat did say?
    Meow, Meow, Meow

    I took a walk in town one day
    And met a pig along the way.
    What do you think that pig did say?
    Oink, Oink, Oink

    I took a walk in town one day
    And met a cow along the way.
    What do you think that cow did say?
    Moo, Moo, Moo

    - cited from “Wee Sing for Baby”

    entry   pattern
    1       I took a walk in town one day
    2       And met a
    3       along the way
    4       What do you think that
    5       did say?
    6       cat
    7       meow
    …       ……
Basic Ideas
• Build up the dictionary on-the-fly (based
on causal past such that decoder can
duplicate the process)
• Achieve the goal of compression by
replacing a repeated string by a reference
to an earlier occurrence
• Unlike VLC (fixed-to-variable), LZ parsing
goes the other way (variable-to-fixed)
Lempel-Ziv Parsing

• Initialization:
  – D = {all single-symbol strings}
  – L is the smallest integer such that all codewords whose length is smaller
    than L have appeared in D (L = 1 at the beginning)
• Iterations: w ← next parsed block of L symbols
  – Rule A: if w ∈ D, then represent w by its entry in D and update D by adding
    a new entry with w concatenated with its next input symbol
  – Rule B: if w ∉ D, then represent the first L-1 symbols of w by their entry
    in D and update D by adding w as a new entry
Example of Parsing a Binary Stream

    input:   0 1 1 0 0 1 1 1 0 1 1 …
    w:       0, 11, 10, 00, 01, 110, 011, …
    rule:    A   B   B   B   A   B    A
    output:  1, 2, 2, 1, 3, 4, 7, …   (entries in D)

    Dictionary D:
    entry   pattern
    1       0
    2       1        (L=1)
    3       01
    4       11
    5       10
    6       00       (L=2)
    7       011
    8       110      (L=3)

    Step 1: w=0,   Rule A, output 1, add 01 to D,  L=L+1=2
    Step 2: w=11,  Rule B, output 2, add 11 to D
    Step 3: w=10,  Rule B, output 2, add 10 to D
    Step 4: w=00,  Rule B, output 1, add 00 to D
    Step 5: w=01,  Rule A, output 3, add 011 to D, L=L+1=3
    Step 6: w=110, Rule B, output 4, add 110 to D

    Note: the dictionary indices (output) are fixed-length, while the parsed
    blocks are variable-length.
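For comparison, here is a sketch of the closely related LZ78/LZW-style parsing used by tools such as Unix compress (a standard textbook variant written for illustration; it is not the exact variable-to-fixed scheme of the slide above). The dictionary again grows on-the-fly from the causal past, so the decoder can rebuild it without side information:

    def lzw_encode(data):
        """LZW-style encoder: returns a list of dictionary indices."""
        dictionary = {symbol: i for i, symbol in enumerate(sorted(set(data)))}
        output = []
        w = ""
        for symbol in data:
            if w + symbol in dictionary:
                w += symbol                                # keep extending the current match
            else:
                output.append(dictionary[w])               # emit the longest match found
                dictionary[w + symbol] = len(dictionary)   # learn a new pattern
                w = symbol
        if w:
            output.append(dictionary[w])
        return output

    print(lzw_encode("01100111011"))   # [0, 1, 1, 0, 2, 3, 6] for the stream above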
Compression
• Goal of data compression is to represent an
information source (e.g., a data file, an image, a
speech signal, or a video signal) as accurately
as possible using the fewest number of bits.
• Compression: process of coding that will
effectively reduce the total number of bits
needed to represent certain information.
• Compression is either lossless or lossy
Why Compression?

• Multimedia data are too big
  – “A picture is worth a thousand words!”
• Any medium has “redundancy”. Compression attempts to eliminate this redundancy.

File size for a one-minute QCIF video clip:

    Frame Size      176 x 144 pixels
    Frame Rate      30 frames/sec
    Bits / pixel    12
    Bit-rate        9,123,840 bps
    File Size       68,428,800 Bytes
Compression

• Objective:
  – to reduce the data size
• Approach:
  – reduce redundancy
• Uncompressed multimedia objects are large. Objects are kept in compressed form to:
  – Save storage space
  – Save retrieval bandwidth
  – Allow decompression and retrieval in parallel
  – Save processing time
• Example: To transmit a digitized color 35 mm slide scanned at 3,000 x 2,000
  pixels and 24 bits/pixel over a 28.8 kbaud link would take about

    (3000 x 2000 pixels)(24 bits/pixel) / (28.8 x 1024 bits/second) ≈ 4883 seconds ≈ 81 minutes ≈ 1.35 hours
Lossless vs Lossy Compression
• If the compression and decompression
processes induce no information loss,
then the compression scheme is
lossless; otherwise, it is lossy.
• compression ratio = B0/B1
– B0 = number of bits before compression
– B1 = number of bits after compression
Compression Techniques
• Lossless compression
– Used on programs, DB records, critical
information
• Lossy compression
– Used on images, audio, video, non-critical
data
• Hybrid compression
– JPEG, JPEG-LS, JPEG 2000, MPEG-1,
MPEG-2
Image Compression
• Objective:
– to reduce the data size
• Approach:
– to reduce redundancy
• Compression Techniques
– Lossless compression
• Used on programs, DB records, critical information
– Lossy compression
• Used on images, audio, video, non-critical data
– Hybrid compression
• JPEG, JPEG-LS, JPEG 2000, MPEG-1, MPEG-2
Lossless Compression
• Encode into a form to represent the
original in fewer bits
• The original representation can be
perfectly recovered
• Compression ratios:
– Text 2:1
– Bilevel images 15:1
– Facsimile transmission 50:1
Lossless Compression

Main lossless (noiseless) techniques: Huffman coding, Arithmetic coding,
Lempel-Ziv coding, Run-length coding
Lossy Compression

• Can be decoded into a representation that humans find similar to the original
• Compression ratios:
  – JPEG image 15:1
  – MPEG video 200:1

Taxonomy of lossy methods:
  – Predictive: motion compensation
  – Frequency oriented: transform coding, subband coding
  – Importance oriented: filtering, subsampling, quantization, bit allocation
  – Hybrid: JPEG, MPEG, JPEG 2000
Why is Compression possible?

• Information Redundancy
• Attempt to eliminate redundancy

    Uncompressed text ABCCCCCCCCDEFGGG → Run-Length Encoder → Compressed text AB8CDEF3G

• In a digital image, neighboring pixels on a scan line are normally similar
  (spatial redundancy)
• Adjacent audio samples are similar (predictive encoding); samples
  corresponding to silence can be removed (silence removal)
• In digital video, in addition to spatial redundancy, neighboring images in a
  video sequence may be similar (temporal redundancy)
• Question: How is “information” measured ?
• Information is related to probability. Information is a
measure of uncertainty (or “surprise”)
Self-information

• Shannon’s Definition [1948]:
  – Self-information of an event A:

        i(A) = log_b(1 / P(A)) = -log_b P(A)

    If b = 2, the unit of information is bits.

(Figure: -log_b P(A) plotted against P(A) over the range 0 to 1)
Entropy

• Suppose:
  – a data source generates output sequences from a set {A1, A2, …, AN}
  – P(Ai): probability of Ai
• First-Order Entropy:
  – the average self-information of the data set

        H = -Σ_i P(Ai)·log2 P(Ai)

• -log2 P(Ai) indicates the amount of information contained in symbol Ai,
  i.e., the number of bits needed to code symbol Ai.
Information Theory
• First-order entropy represents the minimal number of bits
needed to losslessly represent one output of the source.
• Entropy is a measure of how much information is encoded in a message. The
  higher the entropy, the higher the information content.
– We could also say entropy is a measure of uncertainty in a
message. Information and Uncertainty are equivalent concepts.
• The units (in coding theory) of entropy are bits per symbol.
– It is determined by the base of the logarithm:
2: binary (bit);
10: decimal (digit).
• Entropy gives the actual number of bits of information
contained in a message source.
Fundamental Ideas

• Run-length encoding
• Average Information Entropy
• For a source S generating symbols S1 through SN:
  – Self-information:  I(s_i) = log2(1/p_i) = -log2 p_i
  – Entropy of source S:  H(S) = -Σ_i p_i·log2 p_i
  – Average number of bits needed to encode the source ≥ H(S)   (Shannon)

        l̄ ≥ H(S) = Σ_{i=1}^{n} p_i·log2(1/p_i) = -Σ_{i=1}^{n} p_i·log2 p_i

• Example: If the probability of the character ‘e’ appearing in this slide is
  1/16, then the information content of this character is 4 bits. So the
  character string “eeeee” has a total content of 20 bits (contrast this with
  an 8-bit ASCII coding, which would use 40 bits to represent “eeeee”).
Example

• Consider throwing a die; the whole alphabet is {1, 2, 3, 4, 5, 6}
• The predicted probability of any number is 1/6:

    I(s) = -log2(1/6) = 2.585 bits

• As the predicted probabilities of all symbols are the same, we have

    H = 6 × (1/6) × 2.585 = 2.585 bits
Example
• For an image with a uniform distribution of gray-level intensities (i.e.
  pi = 1/256), the number of bits needed to code each gray level is 8, and the
  entropy of the image is 8.
• What is the entropy of a source with M symbols
where each symbol is equally likely?
– Entropy, H(S) = log2 M

• How about an image in which half of the pixels


are white and half are black?
– Entropy, H(S) = 1
Example 3

(a) Histogram of an image with a uniform distribution of gray-level
    intensities, i.e., pi = 1/256. Entropy = log2(256) = 8.
(b) Histogram of an image with two possible values. Entropy = 0.92.
Run Length Coding (RLC)

• A lossless compression method that works by counting the number of adjacent
  pixels (in the case of an image, for example) with the same gray-level value.
  This count, called the run length, is then coded and stored.
• Text: aaaabbc

    Uncompressed text: aaaabbc  →  Run-Length Encoder  →  Compressed text: 4a2b1c
    7 x 8 = 56 bits (ASCII)                                1 + 2 + 3 + 3x8 = 30 bits

    The compression ratio is 56/30 ≈ 1.87:1

• A run is a sequence of a certain length containing only one symbol
• The length of the sequence is called the run count, and the symbol is the
  run value
Run Length Coding (RLC)

• Given the following 8x8, 4-bit image, apply horizontal RLC to encode it and
  calculate the compression ratio after applying the RLC encoding.

    10  7  7 10 10  8  8  8
    10 10  6 10 10 12 12 12
    10  6 10 10 10 12 12 12
     0  0  0 10 10 10  0 12
     5  5  5  0  0  0  0  0
     5  5  5 10 10  9  9 10
     5  5  5  4  4  4  0  0
     0  0  0  0  0  0  0  0

• Solution (value, run-count pairs per row):
  – First row:   10,1, 7,2, 10,2, 8,3
  – Second row:  10,2, 6,1, 10,2, 12,3
  – Third row:   10,1, 6,1, 10,3, 12,3
  – Fourth row:  0,3, 10,3, 0,2
  – Fifth row:   5,3, 0,5
  – Sixth row:   5,3, 10,2, 9,2, 10,1
  – Seventh row: 5,3, 4,3, 0,2
  – Eighth row:  0,8
• These numbers are then stored in the RLC compressed file as:
  10,1,7,2,10,2,8,3,10,2,6,1,10,2,12,3,10,1,6,1,10,3,12,3,0,3,10,3,0,2,5,3,0,5,3,10,2,9,2,10,1,5,3,4,3,0,2,0,8
• Compression ratio = 64x4 bits : 49x4 bits
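A minimal sketch of horizontal run-length encoding (illustrative Python added here); applied row by row, it reproduces the (value, count) pairs listed above:

    def rle_row(row):
        """Encode one row as a list of (value, run_count) pairs."""
        pairs = []
        for value in row:
            if pairs and pairs[-1][0] == value:
                pairs[-1][1] += 1           # extend the current run
            else:
                pairs.append([value, 1])    # start a new run
        return [tuple(p) for p in pairs]

    print(rle_row([10, 7, 7, 10, 10, 8, 8, 8]))   # [(10, 1), (7, 2), (10, 2), (8, 3)]
    print(rle_row([0, 0, 0, 0, 0, 0, 0, 0]))      # [(0, 8)]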
Variable Length Codes (VLC)

Since the entropy indicates the information content of an information source S,
it leads to a family of coding methods commonly known as entropy coding
methods. VLC is one of the best known of these methods.

The main VLC methods: Shannon-Fano Algorithm, Huffman Coding, Adaptive Huffman Coding


Shannon-Fano Algorithm
• a top-down approach
1. Sort the symbols according to the frequency
count of their occurrences.
2. Recursively divide the symbols into two
parts, each with approximately the same
number of counts, until all parts contain only
one symbol.
Examples – Coding of “HELO” (figures not reproduced)
Note: the outcome of Shannon-Fano coding is not necessarily unique.
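A compact sketch of the top-down split (illustrative code added here, using hypothetical letter counts such as {'H': 1, 'E': 1, 'L': 2, 'O': 1}); because ties in the split can be broken differently, the outcome is indeed not unique:

    def shannon_fano(freqs):
        """Shannon-Fano codes for {symbol: count}."""
        def split(items):
            if len(items) == 1:
                return {items[0][0]: ""}
            total, running = sum(c for _, c in items), 0
            best_i, best_diff = 1, float("inf")
            for i in range(1, len(items)):          # find the most balanced split point
                running += items[i - 1][1]
                diff = abs(total - 2 * running)
                if diff < best_diff:
                    best_i, best_diff = i, diff
            codes = {s: "0" + c for s, c in split(items[:best_i]).items()}
            codes.update({s: "1" + c for s, c in split(items[best_i:]).items()})
            return codes
        return split(sorted(freqs.items(), key=lambda kv: kv[1], reverse=True))

    print(shannon_fano({"H": 1, "E": 1, "L": 2, "O": 1}))
    # one valid outcome: {'L': '0', 'H': '10', 'E': '110', 'O': '111'}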
Huffman Coding Algorithm
• Huffman Coding Algorithm - a bottom-up approach:
• Construct the Huffman tree from the bottom up
  – Start with the two nodes with the smallest probabilities
  – Create a new node as the parent of these two nodes, with probability equal
    to the sum of its two children’s probabilities
  – Repeat the process, choosing the 2 nodes with the smallest probabilities
    and ignoring nodes that are already children
  – Stop when there is only one parent node (the root)
• Each branch is then labelled either 0 or 1.
• The codeword for a symbol is read off the branch labels from the root down to
  that symbol.
Example – Huffman Coding of “HELO” (figure not reproduced)
Decoding for the Huffman Coding
• Decoding for the Huffman coding is trivial
as long as the statistics and/or coding
tree are sent before the data to be
compressed (in the header file, say).
This overhead becomes negligible if the
data file is sufficiently large.
Properties of Huffman Coding
• Unique Prefix Property: No Huffman code is a prefix of
any other Huffman code – precludes any ambiguity in
decoding.
• Optimality: minimum redundancy code - proved optimal
for a given data model (i.e., a given, accurate,
probability distribution):
– The two least frequent symbols will have the same length for
their Huffman codes, differing only at the last bit.
– Symbols that occur more frequently will have shorter Huffman
codes than symbols that occur less frequently.
  – The average code length l̄ for an information source S is strictly less
    than η + 1, where η is the entropy. Thus

        η ≤ l̄ < η + 1
Adaptive Huffman Coding
• Huffman coding requires prior statistical knowledge about the information
  source, and such information is often not available, e.g. for live streaming.
• An adaptive Huffman coding algorithm can be
used, in which statistics are gathered and
updated dynamically as the data-stream
arrives.
– The probabilities are no longer based on prior
knowledge but on the actual data received so far.
Huffman Coding

• Idea: a code outputs short codewords for likely symbols and long codewords
  for rare symbols
• Objective: to reduce the average length of codewords

    Symbol   Probability   Codeword
    a        0.05          0000
    b        0.05          0001
    c        0.1           001
    d        0.2           01
    e        0.3           10
    f        0.2           110
    g        0.1           111
Huffman Tree (construction, figure sequence)

    Leaf probabilities:  a 0.05, b 0.05, c 0.1, d 0.2, e 0.3, f 0.2, g 0.1

    Merging order (each step joins the two smallest remaining probabilities):
        a + b              → 0.1
        (ab) + c           → 0.2
        f + g              → 0.3
        (abc) + d          → 0.4
        e + (fg)           → 0.6
        (abcd) + (efg)     → 1.0 (root)

    Reading the branch labels from the root gives the codeword table on the
    previous slide.
Solve

• Encode the string “AAABBCCCCCCDEEEEE” using run-length encoding.
• For the message: DECACBCDCABCEDBECDCC
  – How many bits would be required to encode it using ASCII (8 bits/character)?
  – Encode it using Shannon-Fano codes.
  – Encode it using Huffman coding.
  – What is the entropy of the message?
  – What is the efficiency of both codes?
• Image Compression
– Transform coding
Typical image compression steps
• Transform Coding to de-correlate the signal using
a signal transform such as
– Discrete Cosine Transform (DCT).
– Discrete wavelet Transform (DWT).
• Quantization: This is where the information loss
occurs.
• Entropy coding (lossless compression). Thus
lossless compression is part of lossy
compression.
DCT-based Encoder

    uncompressed block → Forward Discrete Cosine Transform (FDCT) → Quantizer → Entropy Encoder → compressed block

    (the quantizer and the entropy encoder each use their own table specification)
FDCT output

• After preprocessing, each 8x8 source block is transformed into a 64-point
  discrete signal that is a function of the two spatial dimensions x and y
  (spatial frequencies, or DCT coefficients)
• Input range [-255, 255]  →  output range [-2048, 2048]
• One DC coefficient: F(0,0)
• 63 AC coefficients: F(0,1), F(0,2), …, F(7,7)
• In a typical 8x8 block, most of the spatial frequencies have zero or
  near-zero amplitude and need not be encoded after quantization; this is the
  foundation for compression
DCT Coefficients Example

8x8 block of image data:

    217 216 217 222 228 234 233 232
    216 215 217 221 224 227 229 229
    216 216 217 220 221 223 226 226
    217 217 219 219 221 222 223 226
    219 219 219 219 221 222 223 225
    221 220 219 218 221 221 223 223
    226 223 222 222 223 223 224 222
    233 229 225 224 225 226 225 222

DCT coefficients after transformation (DC coefficient at the top-left):

    1779  19.3   4.0  -5.2   0.6  -0.6   0.1  -0.6
    -3.0  15.7  -2.4   0.4   0.2  -0.1   0.4   0.8
    14.7  -0.1  -0.5  -4.2  -0.5  -0.9  -0.2   0.6
    -0.8   3.5  -0.2  -0.8  -0.2   0.8   0.2  -0.3
     5.6  -0.3   0.1  -0.9  -0.3   0.9   0.8   0.5
    -0.2   0.7  -0.6   0.0   0.0  -0.1   0.3  -0.1
     0.0  -0.1   0.5  -0.6  -0.5   0.2  -0.4   0.1
     0.7   0.1  -0.2  -0.2  -0.1  -0.4  -0.3   0.4
AC Quantized Coefficients

• Order the coefficients into the “zig-zag” sequence, from DC and AC01 through
  AC77
• This helps to facilitate entropy coding by placing low-frequency coefficients
  (more likely to be non-zero) before high-frequency coefficients
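A small added sketch (not the JPEG reference code) that generates a zig-zag scan order for an 8x8 block by sorting positions along the anti-diagonals and alternating direction:

    def zigzag_order(n=8):
        """(row, col) positions of an n x n block in zig-zag scan order."""
        positions = [(r, c) for r in range(n) for c in range(n)]
        # sort by anti-diagonal r+c; alternate the scan direction on odd/even diagonals
        return sorted(positions, key=lambda rc: (rc[0] + rc[1],
                                                 rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]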
Probability Distribution of DCT Coefficients

(Figure: probability of each DCT coefficient being non-zero, plotted against the
zig-zag index 0–63, on a scale from 0.0 to 1.0)
Entropy Coding
• JPEG specifies 2 entropy coding
methods
– Huffman coding
– Arithmetic coding
• The baseline sequential codec uses
Huffman coding
• Entropy coding is a 2-step process
1. The quantized coefficients are converted into
intermediate sequence of symbols
2. Symbols are converted into a data stream
JPEG: Modes of Operations
• Sequential Encoding
– Each image component is encoded in a single left-to-right,
top to bottom scan
• Progressive Encoding
– The image is encoded in multiple scans for applications in
which transmission time is long
• Lossless Encoding
– The image is encoded to guarantee exact recovery of every
source image sample value
– Low compression ratio
• Hierarchical Encoding (also called Layered encoding)
– The image is encoded at multiple resolutions
– Lower resolution versions may be accessed without first
having to decompress the image at its full resolution
JPEG Picture Quality

For colour images with moderately complex scenes:
• 0.25 – 0.5 bits/pixel:   moderate to good quality
• 0.5 – 0.75 bits/pixel:   good to very good quality
• 0.75 – 1.5 bits/pixel:   excellent quality
• 1.5 – 2.0 bits/pixel:    indistinguishable from the original
JPEG Sequential Encoding
Summary
• An image is divided into components
• Each component is divided into blocks with 8x8
pixels in each block
• Each block goes through
– Forward Discrete Cosine Transform (FDCT)
  – The transformed values go through the quantizer
  – The quantized values are fed through an entropy encoder
• The resultant code stream is the compressed
image
Block Transform Encoding

    image block → DCT → Quantize → Zig-zag → Run-length Code → Huffman Code → 011010001011101...
Block Encoding

    original image block:     DCT coefficients:        after quantization:
    139 144 149 153           1260   -1  -12   -5       79   0  -1   0
    144 151 153 156            -23  -17   -6   -3       -2  -1   0   0
    150 155 160 163            -11   -9   -2    2       -1  -1   0   0
    159 161 162 160             -7   -2    0    1        0   0   0   0

    (1260 is the DC component; the remaining coefficients are the AC components)

    zig-zag scan:  79 0 -2 -1 -1 -1 0 0 -1 0 0 0 0 0 0 0

    run-length code (zero-run, value):
    (0, 79) (1, -2) (0, -1) (0, -1) (0, -1) (2, -1) (0, 0 = end of block)

    Huffman code → coded bitstream 10011011100011...   (< 10 bits, 0.55 bits/pixel)
Result of Coding/Decoding

    original block:       reconstructed block:      errors:
    139 144 149 153       144 146 149 152           -5 -2  0  1
    144 151 153 156       148 150 152 154           -4  1  1  2
    150 155 160 163       155 156 157 158           -5 -1  3  5
    159 161 162 160       160 161 161 162           -1  0  1 -2
Examples

    Uncompressed (262 KB)   |   Compressed (50) (22 KB, 12:1)   |   Compressed (1) (6 KB, 43:1)
JPEG 2000
• Hybrid compression supports both lossless and lossy
methods
• Implements compression at low bit rates
• Designed for images sent over low-bandwidth transmission links
• Each image is divided into components
• Each component is sub-divided into tiles (<=4096
pixels)
• Performs color transform, wavelet/subband coding,
quantization, progression, and rate control

Source: Michael Adams, “The JPEG-2000 Still Image Compression Standard”


JPEG 2000 Encoder Structure

    Original Image → Preprocessing → Forward Intercomponent Transform →
    Forward Intracomponent Transform → Quantization → Tier-1 Encoder →
    Tier-2 Encoder → Coded Image

    (a rate-control module guides the Tier-2 encoder)

Source: Michael Adams, “The JPEG-2000 Still Image Compression Standard”


IntraComponent Transform

2D wavelet/subband coding:
• A subband image of 1/2 resolution is formed by using the mean of the sample
  values in the higher resolution.
• This is applied in both the horizontal and vertical directions to form 4
  subbands (LL, HL, LH, HH).
• The decomposition is performed recursively on the LL subband of the previous
  level (giving LL0, …, HLR-2, LHR-2, HHR-2, HLR-1, LHR-1, HHR-1).
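A toy illustration of one decomposition level (a Haar-style average/difference sketch in NumPy, added for illustration; JPEG 2000 itself uses the 5/3 and 9/7 lifting wavelets):

    import numpy as np

    def haar_level(image):
        """One Haar-style 2D subband split: returns (LL, HL, LH, HH)."""
        a = np.asarray(image, dtype=float)
        lo = (a[:, 0::2] + a[:, 1::2]) / 2      # horizontal averages
        hi = (a[:, 0::2] - a[:, 1::2]) / 2      # horizontal differences
        ll = (lo[0::2, :] + lo[1::2, :]) / 2    # vertical averages of each half
        lh = (lo[0::2, :] - lo[1::2, :]) / 2
        hl = (hi[0::2, :] + hi[1::2, :]) / 2
        hh = (hi[0::2, :] - hi[1::2, :]) / 2
        return ll, hl, lh, hh

    ll, hl, lh, hh = haar_level(np.arange(64).reshape(8, 8))
    print(ll.shape)   # (4, 4): the half-resolution LL approximation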
Progressive Encoding
• Progressive encoding
– Increases resolution when more data are received
– Suitable for low bandwidth transmission links
• E.g. (figure comparing non-progressive and progressive transmission)
JP2 File Format

• A JP2 file contains a number of boxes
• Each box has
  – LBox (32 bits): box length
  – TBox (32 bits): box type
  – XLBox (64 bits): true length of the box when LBox is 1
  – DBox (variable): box data
• The valid file structure is:
  JPEG-2000 signature box → File type box → JP2 header box (Image header box,
  Color spec box, …) → Contiguous code stream box
Summary to JPEG 2000
• JPEG 2000 compression standard
  – Is a hybrid compression method for continuous-tone images
  – Implements compression at low bit rates (50:1)
  – Compresses each tile (up to 4096 pixels) using 2D wavelets, quantization,
    multiple passes, progression, rate control, and region-of-interest coding
From JPEG to JPEG2000

    JPEG (CR=64): discrete cosine transform based   vs   JPEG2000 (CR=64): wavelet transform based
    (figure: side-by-side comparison at compression ratio 64)
