Lecture 3

Mei-Chen Yeh
2010/03/16
Announcements (1)
• Assignment formats:
– A word or a pdf file
– Subject (主旨):
Multimedia System Design-Assignment #X-your
student id-your name
Multimedia System Design-Assignment #2-697470731-游宗毅
– File name (檔名)
Assignment #X-your student id-your name
Assignment #2-697470731-游宗毅.doc
Announcements (2)
• For assignment #2…
• If you did not use either a pdf or a doc file, please re-send
your report to the TA using the format above.
• Due 03/16 (today)
…and based on TA’s clock
Announcements (3)
• The reading list is finally released!
• Sources:
– Proceedings of ACM MM 2008
– Proceedings of ACM MM 2009
– The best paper in MM 2006 and MM 2007
• Interesting papers not on the list? Let me
know!
Announcements (4)
• So you need to…
– Browse the papers
– Discuss the paper choice with your partners
– …and do it as soon as possible! ("I love that paper!" … "That paper sucks…")


How to access the papers?
• The ACM digital library
– http://portal.acm.org/
– Should be able to download the papers if you connect to the site on
campus

• MM08 paper on the web


– http://mpac.ee.ntu.edu.tw/multimediaconf/acmmm2008.html

• Google search
Next week in class
• Bidding papers! (論文競標)
• Each team will get a ticket, where you put
your points.

Ticket # 7
Team name: 夜遊隊
Team members: 葉梅珍 游宗毅
-------------------------------------------------------------------
paper1: 50   paper2: 10   paper3: 15   paper4: 20   …   paper 25: 5
Total : 100 points
Bidding rules
• The team with the most points gets the paper.
• Every team gets one paper.
• When a tie happens…

…and the winner takes the paper.


More about the bidding process
• Just, fair, and open!
公平 公正 公開

• I will assign a paper to any team in which no one shows up for the bid.

Questions
Multimedia Compression (1)
Outline
• Introduction
• Information theory
• Entropy (熵) coding
– Huffman coding
– Arithmetic coding
Why data compression?
• Transmission and storage

Approximate bit rates for uncompressed sources [table omitted]

– For uncompressed video (≈ 221 Mbps):
• A CD-ROM (650 MB) could store 650 MB × 8 / 221 Mbps ≈ 23.5 seconds
• A DVD-5 (4.7 GB) could store about 3 minutes
What is data compression?
• To represent information in a compact
form (as few bits as possible)
• Technique
Original data → Compression → Compressed data → Reconstruction → Reconstructed data

Codec = encoder + decoder
Technique (cont.)
• Lossless
– The reconstruction is identical to the original.
– Required for data where a small change alters the meaning, e.g., "Do not send money!" vs. "Do now send money!"

• Lossy
– Involves loss of information
Example: an image codec (source → Encoder → Decoder) is lossy — the reconstruction is not necessarily identical to the source.
Performance Measures
How do we judge whether a method is good or bad?

• The amount of compression


• How closely the reconstruction matches the original
• How fast the algorithm performs
• The memory required to implement the
algorithm
• The complexity of the algorithm
• …
Two phases: modeling and coding
Original data → Encoder → Compressed data (fewer bits!)

• Modeling
– Discover the structure in the data
– Extract information about any redundancy
• Coding
– Describe the model and the residual (how the
data differ from the model)
Example (1)

• 5 bits * 12 samples = 60 bits


• Representation using fewer bits?
Example: Modeling

• Model: x̂n = n + 8, n = 1, 2, …
Example: Coding
Original data xn:       9 11 11 11 14 13 15 17 16 17 20 21
Model x̂n = n + 8:       9 10 11 12 13 14 15 16 17 18 19 20
Residual en = xn - x̂n:  0  1  0 -1  1 -1  0  1 -1 -1  1  1

• Residual values lie in {-1, 0, 1}
• 2 bits * 12 samples = 24 bits (compared with
60 bits before compression)
We use the model to predict the value, then encode the residual!
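A minimal sketch of this model-plus-residual idea in Python (the sample values are reconstructed as model + residual from the table above, so treat them as illustrative):

```python
# Predictive modeling + residual coding sketch.
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]   # assumed samples; 5 bits each

model = [n + 8 for n in range(1, len(data) + 1)]          # x̂n = n + 8
residual = [x - xh for x, xh in zip(data, model)]         # en = xn - x̂n
print(residual)          # values in {-1, 0, 1} -> 2 bits/sample suffice

# The decoder knows the model, so adding the residuals back recovers the data exactly.
assert [xh + e for xh, e in zip(model, residual)] == data
```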
Another Example
• Morse Code (1838)

Shorter codes are assigned to letters that


occur more frequently!
A Brief Introduction to Information
Theory
Information Theory (1)
• A quantitative (量化的) measure of information
– You will win the lottery tomorrow.
– The sun will rise in the east tomorrow.
• Self-information [Shannon 1948]
P(A): the probability that the event A will happen
i(A) = logb(1 / P(A)) = -logb P(A)
b determines the unit of information

The amount of surprise or uncertainty in the message


Information Theory (2)
• Example: flipping a coin
– If the coin is fair
P(H) = P(T) = ½
i(H) = i(T) = -log2(½) = 1 bit

– If the coin is not fair


P(H) = 1/8, P(T)=7/8
i(H) = 3 bits, i(T) = 0.193 bits
The occurrence of a HEAD conveys more information!
Information Theory (3)
• For a set of independent events Ai, where ∪i Ai = S

– Entropy (the average self-information)


H(S) = Σ P(Ai) i(Ai) = -Σ P(Ai) logb P(Ai)

– The coin example


• Fair coin (1/2, 1/2): H=P(H)i(H) + P(T)i(T) = 1
• Unfair coin (1/8, 7/8): H=0.544
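These two quantities are easy to compute directly; a small illustrative sketch (the function names are not from the lecture):

```python
import math

def self_information(p, b=2):
    """i(A) = -log_b P(A); bits when b = 2."""
    return -math.log(p, b)

def entropy(probs):
    """H(S) = -sum of P(Ai) * log2 P(Ai), in bits."""
    return sum(p * self_information(p) for p in probs if p > 0)

print(self_information(0.5))                              # fair coin: 1 bit per outcome
print(self_information(1 / 8), self_information(7 / 8))   # 3 bits, ~0.193 bits
print(entropy([0.5, 0.5]))                                # fair coin: H = 1
print(entropy([1 / 8, 7 / 8]))                            # unfair coin: H ~ 0.544
```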
Information Theory (4)
• Entropy
– The best a lossless compression scheme can do
– Not possible to know for a physical source
– Estimate (guess)!
• Depends on our assumptions about the structure of
the data
Estimation of Entropy (1)
• 1 2 3 2 3 4 5 4 5 6 7 8 9 8 9 10
– Assume the sequence is i.i.d.
• P(1)=P(6)=P(7)=P(10)=1/16
P(2) =P(3)=P(4)=P(5)=P(8)=P(9)=2/16
• H = 3.25 bits
– Assume sample-to-sample correlation exists
• Model: xn = xn-1 + rn
• Residuals: 1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 1
• P(1)=13/16, P(-1)=3/16
• H = 0.7 bits
Estimation of the Entropy (2)
• 12123333123333123312
– One symbol at a time
• P(1) = P(2) = ¼, P(3) = ½
• H = 1.5 bits/symbol
• 30 (1.5*20) bits are required in total
– In blocks of two
• P(1 2) = ½, P(3 3)=½
• H = 1 bit/block
• 10 (1*10) bits are required in total
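A quick check of the symbol-at-a-time and block estimates (a sketch; the sequence is copied from the slide):

```python
import math
from collections import Counter

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

seq = "12123333123333123312"

# One symbol at a time
sym = Counter(seq)
H1 = entropy([c / len(seq) for c in sym.values()])
print(H1, H1 * len(seq))            # 1.5 bits/symbol, 30 bits in total

# In blocks of two
blocks = [seq[i:i + 2] for i in range(0, len(seq), 2)]
blk = Counter(blocks)
H2 = entropy([c / len(blocks) for c in blk.values()])
print(H2, H2 * len(blocks))         # 1 bit/block, 10 bits in total
```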
Coding
Coding (1)
• The assignment of binary sequences to elements of an
alphabet

Each letter of the alphabet is mapped to a codeword; the set of codewords is the code.

• Rate of the code: average number of bits per symbol


• Fixed-length code and variable-length code
Coding (2)
• Classes of codes:
– Ambiguous / not uniquely decodable
– Uniquely decodable with a one-symbol delay
– Prefix (instantaneous) code
Coding (3)
• Example of a code that is not uniquely decodable

Letters  Code
a1       0
a2       1
a3       00
a4       11

a2 a3 => 100 and a2 a1 a1 => 100: the same bit string has two different parses
Coding (4)
• Not instantaneous, but uniquely decodable
code
a2

Oops!

a2 a3 a3 a3 a3 a3 a3 a3 a3
Prefix Codes
• No codeword is a prefix to another codeword
• Uniquely decodable
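A tiny illustrative helper for checking the prefix property of a candidate code:

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (instantaneous code)."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

print(is_prefix_code(["0", "1", "00", "11"]))      # False: the not-uniquely-decodable code above
print(is_prefix_code(["0", "10", "110", "111"]))   # True: a prefix code
```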
Huffman Coding

• Basic algorithm
• Extended form
• Adaptive coding
Huffman Coding
• Observations of prefix codes

– Frequent symbols have shorter codewords


– The two symbols that occur least frequently have codewords of the same length
Huffman procedure: the codewords of the two least frequent symbols differ only in the last bit, e.g., m0 and m1.
Algorithm
• A = {a1, a2, a3, a4, a5}
• P(a1) = 0.2, P(a2) = 0.4, P(a3) = 0.2, P(a4) = 0.1, P(a5) = 0.1
Merge the two least probable nodes repeatedly:
a4(0.1) + a5(0.1) → a'4(0.2)
a3(0.2) + a'4(0.2) → a'3(0.4)
a1(0.2) + a'3(0.4) → a'1(0.6)
a'1(0.6) + a2(0.4) → root(1)

Resulting codewords:
a3: 000   a4: 0010   a5: 0011   a1: 01   a2: 1
Algorithm
• Entropy: 2.122 bits/symbol
• Average length: 2.2 bits/symbol

Two valid Huffman codes for the same source:

Code 1: a1: 01   a2: 1    a3: 000  a4: 0010  a5: 0011   (lengths 2, 1, 3, 4, 4)
Code 2: a1: 10   a2: 00   a3: 11   a4: 010   a5: 011    (lengths 2, 2, 2, 3, 3)

Both have average length 2.2 bits/symbol.

Which code is preferred? The one with minimum variance!
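A compact Huffman construction for this example (a sketch using Python's heapq; tie-breaking is arbitrary, so the exact codewords may differ from the trees above even though the lengths and average are the same):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: codeword}."""
    tie = count()                      # tie-breaker so dicts are never compared
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable node
        p1, _, code1 = heapq.heappop(heap)   # second least probable node
        merged = {s: "0" + c for s, c in code0.items()}         # prepend 0 in one subtree
        merged.update({s: "1" + c for s, c in code1.items()})   # and 1 in the other
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

P = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman_code(P)
print(code, sum(P[s] * len(code[s]) for s in P))   # average length: 2.2 bits/symbol
```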


Exercise
e h l o p t w
30.5% 13.4% 9.8% 16.1% 5% 22.8% 2.4%

e (30.5) h (13.4) o (16.1) p (5.0) w (2.4) l (9.8) t (22.8)


Length of Huffman Codes
• A source S with A = {a1,…,ak} and {P(a1),…,P(ak)}
– Average codeword length: l̄ = Σ(i=1..K) P(ai) li

• Lower and upper bounds:

H(S) ≤ l̄ < H(S) + 1

where H(S) is the entropy of the source and l̄ is the average code length
Extended Huffman Codes (1)
• Consider small alphabet and skewed probabilities
– Example:
Symbol  Prob.  Codeword
a       0.9    0
b       0.1    1
1 bit / letter — no compression!

• Block multiple symbols together (a quick check follows below):

Symbol  Prob.  Codeword
aa      0.81   0
ab      0.09   10
ba      0.09   110
bb      0.01   111
0.645 bit / letter
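A quick numeric check of the blocking gain (a sketch; the pair codewords are taken from the table above and the source is assumed i.i.d.):

```python
P = {"a": 0.9, "b": 0.1}
pairs = {x + y: P[x] * P[y] for x in P for y in P}            # i.i.d. pair probabilities
code = {"aa": "0", "ab": "10", "ba": "110", "bb": "111"}      # codewords from the slide
bits_per_pair = sum(pairs[s] * len(code[s]) for s in pairs)
print(bits_per_pair / 2)   # ~0.645 bits/letter, versus 1 bit/letter without blocking
```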
Extended Huffman Codes (2)
Another example (single-symbol code vs. blocks of two):
• H = 0.816 bits/symbol; single-symbol Huffman code: l̄ = 1.2 bits/symbol
• Bound for blocks of n symbols: H(S) ≤ l̄ < H(S) + 1/(block size)
• Blocks of two: H = 1.6315 bits/block = 0.816 bits/symbol; l̄ = 1.7228 / 2 = 0.8614 bits/symbol
Adaptive Huffman Coding (1)
• No initial knowledge of source distribution
• One-pass procedure
• Based on statistics of encountered symbols

Maintain a Huffman Tree!


Adaptive Huffman Coding (2)
• Huffman tree
• Node = (id, weight); weight = number of occurrences
• Sibling property
– w(parent) = sum of w(children)
– ids are ordered with non-decreasing weights

Id: 1  2  3  4  5  6  7  8  9 10 11
w:  2  3  5  5  5  6 10 11 11 21 32   (non-decreasing!)
Adaptive Huffman Coding (3)
• NYT (not yet transmitted) code
– w(NYT) = 0
– Transmitted when seeing
a new letter
– Smallest id in the tree
• Uncompressed code: a fixed code used for a letter's first occurrence (e.g., an alphabet of m letters)
Adaptive Huffman Coding: Encode (1)
Input: [a a r d v a r k] (Alphabet: 26 lowercase letters)
Initial tree

a → 00000 (NYT code is empty)   a → 1   NYT → 0   r → 10001

Output: 00000 1 0 10001


Adaptive Huffman Coding: Encode (2)

Input: [a a r d v a r k] (Alphabet: 26 lowercase letters)


NYT → 00   d → 00011   NYT → 000   v → 1011

Output: 00000 1 0 10001 00 00011 000 1011


Adaptive Huffman Coding: Encode (3)
Tree update
[tree figure omitted]
Swap nodes 47 and 48 to restore the sibling property.
Adaptive Huffman Coding: Encode (4)
Tree update
[tree figure omitted]
Swap nodes 49 and 50 to restore the sibling property.
Adaptive Huffman Coding: Decode (1)
Input: 0000010100010000011001011…
Initial tree

Read 0000 — not a complete uncompressed code, so read one more bit: 00000 → a. Next bit: 1 → a.

Output: a a
Adaptive Huffman Coding: Decode (2)
Input: 0000010100010000011001011…

Next bit 0 → NYT; read 1000 — not a complete uncompressed code, so read one more bit: 10001 → r. Then 00 → NYT, …

Output: a a r …
Arithmetic Coding
• Basic algorithm
• Implementation
Cases where Huffman coding doesn't work well: a small alphabet with skewed probabilities

Letter  Probability  Codeword
a1      0.95         0
a2      0.02         11
a3      0.03         10

H = -0.95*log(0.95) - 0.02*log(0.02) - 0.03*log(0.03) = 0.335 bits/symbol
Average length = 0.95*1 + 0.02*2 + 0.03*2 = 1.05 bits/symbol
Blocking two symbols: average length = 1.222 bits/block = 0.611 bits/symbol


Huffman codes for large blocks
• The number of codewords grows exponentially with the block size
– N symbols, grouped m per block => N^m codewords
• Codes must be generated for all sequences of length m
• Not efficient!
Arithmetic Coding: Generate a tag
• View the entire sequence as a big block
– Step 1: Map the sequence into a unique tag
Ex: A = {a1, a2, a3}, P(a1) = 0.7, P(a2) = 0.1, P(a3) = 0.2
Encode a1, a2, a3, …
Start: [0.0, 1.0), cut points at 0.7 and 0.8
After a1: [0.0, 0.7), cut points at 0.49 and 0.56
After a1 a2: [0.49, 0.56), cut points at 0.539 and 0.546
After a1 a2 a3: [0.546, 0.56), cut points at 0.5558 and 0.5572 — any value in this interval (e.g., 0.553) identifies the sequence (a code sketch follows below)

– Step 2: Generate a binary code for the tag
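Before turning to Step 2, here is a minimal sketch of Step 1 (the interval narrowing shown above); the cumulative ranges simply restate P(a1) = 0.7, P(a2) = 0.1, P(a3) = 0.2:

```python
cum = {"a1": (0.0, 0.7), "a2": (0.7, 0.8), "a3": (0.8, 1.0)}   # cumulative ranges

def encode_interval(seq):
    low, high = 0.0, 1.0
    for s in seq:
        width = high - low
        lo_s, hi_s = cum[s]
        low, high = low + width * lo_s, low + width * hi_s     # narrow the interval
    return low, high            # any number in [low, high) can serve as the tag

print(encode_interval(["a1", "a2", "a3"]))    # approximately (0.546, 0.56)
```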


Arithmetic Coding: Interpret the tag
Ex: tag = 0.553
0.553 lies in [0.0, 0.7) → decode a1; update the number: (0.553 - 0.0)/(0.7 - 0.0) = 0.79
0.79 lies in [0.7, 0.8) → decode a2; update the number: (0.553 - 0.49)/(0.56 - 0.49) = 0.9
0.9 lies in [0.8, 1.0) → decode a3

Interval updates during encoding:
l(1) = 0.0, u(1) = 0.7
l(2) = 0.0 + (0.7 - 0.0)*0.7 = 0.49, u(2) = 0.0 + (0.7 - 0.0)*0.8 = 0.56
l(3) = 0.49 + (0.56 - 0.49)*0.8 = 0.546, u(3) = 0.49 + (0.56 - 0.49)*1.0 = 0.56
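A matching decoding sketch (illustrative; the symbol ranges restate the same probabilities):

```python
ranges = {"a1": (0.0, 0.7), "a2": (0.7, 0.8), "a3": (0.8, 1.0)}

def decode(tag, n_symbols):
    out = []
    for _ in range(n_symbols):
        for sym, (lo, hi) in ranges.items():   # find the range containing the tag
            if lo <= tag < hi:
                out.append(sym)
                tag = (tag - lo) / (hi - lo)   # rescale the tag into [0, 1)
                break
    return out

print(decode(0.553, 3))   # ['a1', 'a2', 'a3'], as in the example above
```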
One more example
• A = {a1, a2, a3}, P(a1) = 0.8, P(a2) = 0.02, P(a3) = 0.18
Encode a1, a3, a2, a1 (cut points at 0.8 and 0.82):
Start: [0.0, 1.0)
After a1: [0.0, 0.8)
After a1 a3: [0.656, 0.8)
After a1 a3 a2: [0.7712, 0.77408)
After a1 a3 a2 a1: [0.7712, 0.773504); tag = (0.7712 + 0.773504)/2 = 0.772352

Interpreting the tag 0.772352:
0.772352 lies in [0.0, 0.8) → decode a1; update: (0.772352 - 0.0)/(0.8 - 0.0) = 0.96544
0.96544 lies in [0.82, 1.0) → decode a3; update: (0.772352 - 0.656)/(0.8 - 0.656) = 0.808
0.808 lies in [0.8, 0.82) → decode a2; update: (0.772352 - 0.7712)/(0.77408 - 0.7712) = 0.4
0.4 lies in [0.0, 0.8) → decode a1

Interval updates during encoding:
l(1) = 0.0, u(1) = 0.8
l(2) = 0.0 + (0.8 - 0.0)*0.82 = 0.656, u(2) = 0.0 + (0.8 - 0.0)*1.0 = 0.8
l(3) = 0.656 + (0.8 - 0.656)*0.8 = 0.7712, u(3) = 0.656 + (0.8 - 0.656)*0.82 = 0.77408
Generating a binary code
• Use the binary representation of the tag
– Truncate it to ⌈log2(1/P)⌉ + 1 bits
– probability↗, interval↗, required bits↘
Ex:
Symbol  Prob.   Cdf    Tag     In binary  ⌈log2(1/P)⌉+1  Code
a1      0.5     0.5    0.25    .0100      2              01
a2      0.25    0.75   0.625   .1010      3              101
a3      0.125   0.875  0.8125  .1101      4              1101
a4      0.125   1.0    0.9375  .1111      4              1111

(An extreme case where the sequence has only one letter.)
– Bounds: H(S) ≤ l̄ < H(S) + 2/(sequence length)
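A sketch of the tag-to-code step (assuming the ⌈log2(1/P)⌉ + 1 truncation rule reflected in the table and the bound above):

```python
import math

symbols = [("a1", 0.5), ("a2", 0.25), ("a3", 0.125), ("a4", 0.125)]

cdf_lo = 0.0
for sym, p in symbols:
    tag = cdf_lo + p / 2                        # midpoint of the symbol's interval
    n_bits = math.ceil(math.log2(1 / p)) + 1    # truncation length
    frac, code = tag, ""
    for _ in range(n_bits):                     # binary expansion of the tag
        frac *= 2
        bit = int(frac)
        code += str(bit)
        frac -= bit
    print(sym, tag, code)   # a1 0.25 01, a2 0.625 101, a3 0.8125 1101, a4 0.9375 1111
    cdf_lo += p
```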
Adaptive Arithmetic Coding
• A = {a1 , a2 , a3}
• Input sequence: a2 a3 a3 a2

Counts start at 1 for every symbol and are updated after each encoded symbol:
Start (1/3, 1/3, 1/3): a1 [0.0, 0.33), a2 [0.33, 0.67), a3 [0.67, 1.0)
Encode a2 → [0.33, 0.67); update to (1/4, 2/4, 1/4): a1 [0.33, 0.42), a2 [0.42, 0.58), a3 [0.58, 0.67)
Encode a3 → [0.58, 0.67); update to (1/5, 2/5, 2/5): a1 [0.58, 0.60), a2 [0.60, 0.63), a3 [0.63, 0.67)
Encode a3 → [0.63, 0.67); update to (1/6, 2/6, 3/6): a1 [0.63, 0.64), a2 [0.64, 0.65), a3 [0.65, 0.67)
Encode a2 → [0.64, 0.65); update to (1/7, 3/7, 3/7)
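A minimal sketch of this count-based adaptation (illustrative code, not taken from the lecture):

```python
symbols = ["a1", "a2", "a3"]
counts = {s: 1 for s in symbols}          # every symbol starts with count 1

low, high = 0.0, 1.0
for x in ["a2", "a3", "a3", "a2"]:
    total = sum(counts.values())
    width = high - low
    cum = 0.0
    for s in symbols:                     # locate x's slice of the current interval
        p = counts[s] / total
        if s == x:
            low, high = low + width * cum, low + width * (cum + p)
            break
        cum += p
    counts[x] += 1                        # the model adapts after each symbol
    print(x, round(low, 4), round(high, 4))   # ends at roughly [0.6389, 0.65)
```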
Implementation
• Two problems
– Finite precision → synchronized rescaling!
– The first bit could only be sent after seeing the entire sequence → incremental encoding!
Implementation: Encode (1)
• Incremental coding
– Send the MSB when l(k) and u(k) have a common
prefix
• Rescaling
– Map the half interval containing the code to [0, 1)

E1: [0, 0.5) -> [0, 1); E1(x) = 2x — send 0, left shift 1 bit
E2: [0.5, 1) -> [0, 1); E2(x) = 2(x-0.5) — send 1, left shift 1 bit
Implementation: Encode (2)
Example: a1, a3, a2, a1 (tag: 0.7734375), with a1 → [0, 0.8), a2 → [0.8, 0.82), a3 → [0.82, 1.0)

After a1: [0, 0.8) — no rescaling possible
After a3: [0.656, 0.8) — E2, send 1 → [0.312, 0.6)
After a2: [0.5424, 0.54816) — E2, send 1; E1, send 0; E1, send 0; E1, send 0; E2, send 1 → [0.3568, 0.54112)

E1: [0, 0.5) -> [0, 1); E1(x) = 2x
E2: [0.5, 1) -> [0, 1); E2(x) = 2(x-0.5)
Implementation: Encode (3)

send 0 Use 0.5


send 10…0

a1
0.7734375
send 1 =(.1100011)2

How to stop?
1.Send the stream size
2.Use EOF (00…0)
Implementation: Decode (1)
Input: 11000110…0
Word length: the smallest symbol interval is 0.82 - 0.8 = 0.02, and 2^-6 < 0.02 => read 6 bits at a time

.110001 = 0.765625 → in [0, 0.8) → decode a1
Update code: (0.765625 - 0)/(0.8 - 0) = 0.957 → in [0.82, 1.0) → decode a3
Mimic the encoder's rescalings and shift in new bits: .100011 = 0.546875
Update code: (0.546875 - 0.312)/(0.6 - 0.312) = 0.8155 → in [0.8, 0.82) → decode a2
Implementation: Decode (2)
After further rescalings the window reads .10…0 = 0.5
Update code: (0.5 - 0.3568)/(0.54112 - 0.3568) = 0.7769 → in [0, 0.8) → decode a1
Enhancement (optional)
• One more mapping:
– E3: [0.25, 0.75) -> [0, 1); E3(x) = 2(x-0.25)

How do we transfer information about an E3 mapping to the decoder?
E3 mapping
• E3 followed by E1 — the original interval lies in [¼, ½): send 0 1
• E3 followed by E2 — the original interval lies in [½, ¾): send 1 0
E3 mapping
• E3…E3 (m times) followed by E1 — the original interval lies in [¼, ½), [¼+⅛, ½), …, [¼+⅛+…, ½): send 0 followed by m 1s (0 1 1 … 1)
• E3…E3 (m times) followed by E2: send 1 followed by m 0s (1 0 0 … 0)
E3 mapping: Encode
Example: a1, a3, a2, a1 (tag: 0.7734375); m counts the pending E3 mappings

After a1: [0, 0.8) — no mapping, m = 0
After a3: [0.656, 0.8) — E2, send 1; then E3, m = 1 → [0.124, 0.7)
After a2: [0.5848, 0.59632) — E2, send 1 followed by m = 1 zero (1 0), m = 0; then E1, send 0; E1, send 0; E2, send 1; E3, m = 1 → [0.2136, 0.58224)

E1: [0, 0.5) -> [0, 1); E1(x) = 2x
E2: [0.5, 1) -> [0, 1); E2(x) = 2(x-0.5)
E3: [0.25, 0.75) -> [0, 1); E3(x) = 2(x-0.25)
E3 mapping: Encode
After a1: [0.2136, 0.508512) — no further mapping
To finish, use 0.5 → send 1, then the pending m = 1 zero, then trailing 0s (send 10…0)

Output: 11000110…0
E3 mapping: Decode
Input: 11000110…0
Word length: the smallest symbol interval is 0.82 - 0.8 = 0.02, and 2^-6 < 0.02 => read 6 bits at a time

.110001 = 0.765625 → in [0, 0.8) → decode a1 (m = 0)
Update code: (0.765625 - 0)/(0.8 - 0) = 0.957 → in [0.82, 1.0) → decode a3
Mimic the encoder (E2, then E3 with m = 1); the window now reads .100011 = 0.546875
Apply E3 to the window value: 2*(0.546875 - 0.25) = 0.5938
Update code: (0.5938 - 0.124)/(0.7 - 0.124) = 0.8155 → in [0.8, 0.82) → decode a2
E3 mapping: Decode
The window now reads .10…0 = 0.5; apply E3 (m = 1): 2*(0.5 - 0.25) = 0.5
Update code: (0.5 - 0.2136)/(0.58224 - 0.2136) = 0.7769 → in [0, 0.8) → decode a1

Output:
a1 a3 a2 a1
Next week
• In-class paper bidding
• Decide with your partners how you will distribute your points before coming to class!
