ECE 5578 Multimedia Communication
Lec 05
Arithmetic Coding
Zhu Li
Dept of CSEE, UMKC
web: http://l.web.umkc.edu/lizhu
phone: x2346
Outline
Lecture 04 ReCap
Arithmetic Coding
About Homework-1 and Lab
JPEG Coding
Block (8x8 pel) based coding
DCT transform to find a sparse representation
Quantization reflects the human visual system
Zig-zag scan to convert the 2D block into a 1D string
Run-Level pairs to get an even more compact representation
Huffman coding on the Level category; fixed-length code for the level within the category
Coding of AC Coefficients
Zigzag scanning example (the top-left entry 8 is the DC coefficient, coded separately; the AC scan starts at 24):
8 24 -2 0 0 0 0 0
-31 -4 6 -1 0 0 0 0
0 -12 -1 2 0 0 0 0
0 0 -2 -1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Example: zigzag scanning result
24 -31 0 -4 -2 0 6 -12 0 0 0 -1 -1 0 0 0 2 -2 0 0 0 0 0 -1 EOB
(Run, level) representation:
(0, 24), (0, -31), (1, -4), (0, -2), (1, 6), (0, -12), (3, -1), (0, -1),
(3, 2), (0, -2), (5, -1), EOB
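
Below is a minimal Python sketch (my illustration, not the JPEG reference code) of the (run, level) pairing above; runs longer than 15 zeros, which JPEG handles with ZRL, are ignored for brevity:

def run_level_pairs(ac_coeffs):
    """Convert a zigzag-scanned AC coefficient list into (run, level) pairs + EOB."""
    pairs, run = [], 0
    last = len(ac_coeffs)
    while last > 0 and ac_coeffs[last - 1] == 0:
        last -= 1                      # trailing zeros are summarized by EOB
    for c in ac_coeffs[:last]:
        if c == 0:
            run += 1                   # count zeros before the next nonzero level
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs

ac = [24, -31, 0, -4, -2, 0, 6, -12, 0, 0, 0, -1, -1,
      0, 0, 0, 2, -2, 0, 0, 0, 0, 0, -1]
print(run_level_pairs(ac))
# [(0, 24), (0, -31), (1, -4), (0, -2), (1, 6), (0, -12),
#  (3, -1), (0, -1), (3, 2), (0, -2), (5, -1), 'EOB']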
Coding of AC Coefficients

Run/Cat.  Base codeword    Run/Cat.  Base codeword    Run/Cat.  Base codeword
EOB       1010             …         …                ZRL       1111 1111 001
0/1       00               1/1       1100             15/1      1111 1111 1111 0101
0/2       01               1/2       11011            15/2      1111 1111 1111 0110
0/3       100              1/3       1111001          15/3      1111 1111 1111 0111
0/4       1011             1/4       111110110        15/4      1111 1111 1111 1000
0/5       11010            1/5       11111110110      15/5      1111 1111 1111 1001
…         …                …         …                …         …
ZRL: represents a run of 16 zeros, used when the number of zeros exceeds 15.
Example: 20 zeros followed by -1: (ZRL), (4, -1).
(Run, Level) sequence: (0, 24), (0, -31), (1, -4), ……
Run/Cat. sequence: 0/5, 0/5, 1/3, …
24 is the 24th entry in Category 5, so (0, 24) is coded as 11010 11000
-4 is the 3rd entry in Category 3, so (1, -4) is coded as 1111001 011
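
A hedged sketch of the category / amplitude-bit convention used above (category = number of bits of |v|; a negative level is sent as the one's complement of |v|, equivalently (v - 1) masked to that many bits); the function names are mine:

def category(v):
    """JPEG size category of a nonzero level: floor(log2(|v|)) + 1."""
    s, a = 0, abs(v)
    while a:
        s, a = s + 1, a >> 1
    return s

def amplitude_bits(v):
    """Within-category bits: v itself if positive, one's complement of |v| if negative."""
    s = category(v)
    code = v if v > 0 else (v - 1) & ((1 << s) - 1)
    return format(code, f"0{s}b")

print(category(24), amplitude_bits(24))   # 5 11000
print(category(-4), amplitude_bits(-4))   # 3 011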
Outline
Lecture 04 ReCap
Arithmetic Coding
Basic Encoding and Decoding
Uniqueness and Efficiency
Scaling and Incremental Coding
Integer Implementation
About Homework-1 and Lab
Arithmetic Coding – The SciFi Story
When I was in my 5th grade….
Aliens visit earth….
Huffman Coding
Replaces an input symbol with a codeword
Needs a probability distribution
Hard to adapt to changing statistics
Needs to store the codeword table
Minimum codeword length is 1 bit

Arithmetic Coding
Replaces the entire input with a single floating-point number
Does not need the probability distribution in advance
Adaptive coding is very easy
No need to keep and send a codeword table
Fractional codeword length
Introduction
Recall table look-up decoding of Huffman code:
N: alphabet size
L: max codeword length
Divide [0, 2^L) into N intervals, one interval for one symbol
Interval size is roughly proportional to symbol probability
[Figure: codeword tree with codewords 1, 00, 010, 011 placed over the interval endpoints 000, 010, 011, 100]
Arithmetic coding applies this idea recursively:
Normalize the range [0, 2^L) to [0, 1)
Map an input sequence (multiple symbols) to a unique tag in [0, 1)
[Figure: sequences abcd… through dcba… mapped onto the line from 0 to 1]
Arithmetic Coding
Symbol set and probabilities: a (0.8), b (0.02), c (0.18)
Disjoint and complete partition of the range [0, 1):
[0, 0.8) for a, [0.8, 0.82) for b, [0.82, 1) for c
Each interval corresponds to one symbol
Interval size is proportional to symbol probability
The first symbol restricts the tag position to one of the intervals
The reduced interval is partitioned recursively as more symbols are processed
Observation: once the tag falls into an interval, it never gets out of it
Some Questions to think about:
Why is compression achieved this way?
How can it be implemented efficiently?
How is the sequence decoded?
Why is it better than Huffman code?
Example:

Symbol   Prob.
1        0.8
2        0.02
3        0.18

Map to the real line range [0, 1)
Order does not matter, but the decoder needs to use the same order
Disjoint but complete partition:
1: [0, 0.8):    0, 0.799999…9
2: [0.8, 0.82): 0.8, 0.819999…9
3: [0.82, 1):   0.82, 0.999999…9
Concept of the encoding (not practical)
Input sequence: "1321"

Start:   [0, 1), range 1, partitioned at 0.8 and 0.82
Input 1: [0, 0.8), range 0.8, partitioned at 0.64 and 0.656
Input 3: [0.656, 0.8), range 0.144, partitioned at 0.7712 and 0.77408
Input 2: [0.7712, 0.77408), range 0.00288, partitioned at 0.773504 and 0.7735616
Input 1: [0.7712, 0.773504)

Termination: encode the lower end or midpoint to signal the end.
Difficulties:
1. The shrinking interval requires very high precision for a long sequence.
2. No output is generated until the entire sequence has been processed.
Encoder Pseudo Code

Cumulative Distribution Function (CDF):
For a continuous distribution: $F_X(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\,dt$
For a discrete distribution: $F_X(i) = P(X \le i) = \sum_{k=-\infty}^{i} P(X = k)$
so that $P(X = i) = F_X(i) - F_X(i-1)$.

[Figure: PMF with P(X=1) = 0.4, P(X=2) = P(X=3) = P(X=4) = 0.2, and the corresponding staircase CDF rising through 0.4, 0.6, 0.8, 1.0]

Properties of the discrete CDF:
Non-decreasing
Piece-wise constant
Each segment is closed at the lower end.
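
A small helper for the discrete case above, used by the encoder/decoder sketches that follow (my convention: symbols are 1-based and cdf[0] = F_X(0) = 0):

import itertools

def make_cdf(probs):                       # probs[k] = P(X = k + 1)
    return [0.0] + list(itertools.accumulate(probs))

print(make_cdf([0.8, 0.02, 0.18]))         # [0.0, 0.8, 0.82, 1.0] (up to rounding)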
Encoder Pseudo Code

LOW = 0.0; HIGH = 1.0;
while (not EOF) {
    n = ReadSymbol();
    RANGE = HIGH - LOW;
    HIGH = LOW + RANGE * CDF(n);
    LOW = LOW + RANGE * CDF(n-1);
}
output LOW;

Keep track of LOW, HIGH, RANGE. Any two are sufficient, e.g., LOW and RANGE.

Trace for input "1321":

Input    HIGH                              LOW                              RANGE
Initial  1.0                               0.0                              1.0
1        0.0 + 1.0*0.8 = 0.8               0.0 + 1.0*0 = 0.0                0.8
3        0.0 + 0.8*1 = 0.8                 0.0 + 0.8*0.82 = 0.656           0.144
2        0.656 + 0.144*0.82 = 0.77408      0.656 + 0.144*0.8 = 0.7712       0.00288
1        0.7712 + 0.00288*0.8 = 0.773504   0.7712 + 0.00288*0 = 0.7712      0.002304
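
A runnable Python version of this pseudo code. Exact rational arithmetic (fractions.Fraction) is used only so this short example reproduces the trace exactly; a practical coder uses the integer implementation discussed later:

from fractions import Fraction as F

CDF = [F(0), F(8, 10), F(82, 100), F(1)]   # symbols 1, 2, 3 with prob 0.8, 0.02, 0.18

def encode(symbols, cdf=CDF):
    low, high = F(0), F(1)
    for n in symbols:                      # n is a 1-based symbol index
        rng = high - low
        high = low + rng * cdf[n]
        low = low + rng * cdf[n - 1]
    return low                             # lower end of the final interval

print(float(encode([1, 3, 2, 1])))         # 0.7712, matching the trace above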
Concept of the Decoding
Arithmetic decoding (conceptual): zoom in step by step, repartitioning each interval by the CDF.
Suppose the encoder encodes the lower end: 0.7712.

Decode 1: 0.7712 falls in [0, 0.8) within [0, 1) (partitioned at 0.8, 0.82)
Decode 3: 0.7712 falls in [0.656, 0.8) within [0, 0.8) (partitioned at 0.64, 0.656)
Decode 2: 0.7712 falls in [0.7712, 0.77408) within [0.656, 0.8) (partitioned at 0.7712, 0.77408)
Decode 1: 0.7712 falls in [0.7712, 0.773504) within [0.7712, 0.77408) (partitioned at 0.773504, 0.7735616)

Drawback: need to recalculate all thresholds each time.
Simplified Decoding (Floating Pt Ops)
Normalize RANGE to [0, 1) each time: x ← (x - low) / range
No need to recalculate the thresholds.

Receive 0.7712
Decode 1: 0.7712 ∈ [0, 0.8);  x = (0.7712 - 0) / 0.8 = 0.964
Decode 3: 0.964 ∈ [0.82, 1);  x = (0.964 - 0.82) / 0.18 = 0.8
Decode 2: 0.8 ∈ [0.8, 0.82);  x = (0.8 - 0.8) / 0.02 = 0
Decode 1: 0 ∈ [0, 0.8).  Stop.
Decoder Pseudo Code

low = 0; high = 1;
x = GetEncodedNumber();
while (x != low) {
    n = DecodeOneSymbol(x);
    output symbol n;
    x = (x - CDF(n-1)) / (CDF(n) - CDF(n-1));
}
But this method still needs high-precision floating-point operations, and it must read the entire input before decoding can start.
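
A matching decoder sketch. Because a lower-end tag is ambiguous at termination (0.7712 is also the lower end for "132"), this version decodes a known message length m instead of testing x against LOW; exact rationals again are only for illustration:

from fractions import Fraction as F

CDF = [F(0), F(8, 10), F(82, 100), F(1)]   # symbols 1, 2, 3

def decode(x, m, cdf=CDF):                 # m = number of symbols to decode
    out = []
    for _ in range(m):
        n = 1
        while x >= cdf[n]:                 # find the interval containing x
            n += 1
        out.append(n)
        x = (x - cdf[n - 1]) / (cdf[n] - cdf[n - 1])
    return out

print(decode(F(7712, 10000), 4))           # [1, 3, 2, 1]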
Outline
Lecture 04 ReCap
Arithmetic Coding
Basic Encoding and Decoding
Uniqueness and Efficiency
Scaling and Incremental Coding
About Homework-1 and Lab
Uniqueness

Final interval for "1321": [0.7712, 0.77408), range 0.00288, partitioned at 0.773504 and 0.7735616.
Termination: encode the midpoint (or lower end) to signal the end.

How to represent the final tag uniquely and efficiently?
Answer: take the binary value of the tag T(X) and truncate it to
$l(X) = \left\lceil \log \frac{1}{p(X)} \right\rceil + 1$ bits (1 bit longer than the Shannon code),
where X is the sequence {x1, …, xm}, not an individual symbol.

Proof: assume the midpoint tag $T(X) = F(X-1) + \frac{1}{2} p(X)$, with p(X) > 0.
First show the truncated code is unique, i.e., the code stays within [F(X-1), F(X)):
1) $\lfloor T(X) \rfloor_{l(X)} \le T(X) < F(X)$,
so $\lfloor T(X) \rfloor_{l(X)}$ is below the high end of the interval.
Uniqueness and Efficiency

2) $2^{-l(X)} = 2^{-\left(\left\lceil \log \frac{1}{p(X)} \right\rceil + 1\right)} \le 2^{-\left(\log \frac{1}{p(X)} + 1\right)} = \frac{p(X)}{2}$

By definition, $T(X) = F(X-1) + \frac{1}{2} p(X)$ with p(X) > 0, so
$T(X) - F(X-1) = \frac{1}{2} p(X) \ge 2^{-l(X)}$.
Together with $T(X) - \lfloor T(X) \rfloor_{l(X)} \le 2^{-l(X)}$, this gives
$\lfloor T(X) \rfloor_{l(X)} \ge F(X-1)$.
Thus $F(X-1) \le \lfloor T(X) \rfloor_{l(X)} < F(X)$:
the truncated code is still in the interval. This proves the uniqueness.
Uniqueness and Efficiency

Prove the code is uniquely decodable (prefix free):
Any code with prefix $\lfloor T(X) \rfloor_{l(X)}$ lies in
$\left[\, \lfloor T(X) \rfloor_{l(X)},\ \lfloor T(X) \rfloor_{l(X)} + 2^{-l(X)} \,\right)$.
Need to show that this interval is inside [F(X-1), F(X)).
We already showed $\lfloor T(X) \rfloor_{l(X)} \ge F(X-1)$.
Only need to show $F(X) - \lfloor T(X) \rfloor_{l(X)} \ge 2^{-l(X)}$:
$F(X) - \lfloor T(X) \rfloor_{l(X)} \ge F(X) - T(X) = \frac{p(X)}{2} \ge 2^{-l(X)}$.
So $\lfloor T(X) \rfloor_{l(X)}$ is prefix free if $l(X) = \left\lceil \log \frac{1}{p(X)} \right\rceil + 1$ bits.
Uniqueness and Efficiency

Efficiency of the arithmetic code:
Let $X_1^m = \{x_1, \ldots, x_m\}$ and $l(X) = \left\lceil \log \frac{1}{p(X_1^m)} \right\rceil + 1$ bits.

$L = E[\,l(X)\,] = \sum P(X_1^m)\left( \left\lceil \log \frac{1}{p(X_1^m)} \right\rceil + 1 \right) \le \sum P(X_1^m)\left( \log \frac{1}{p(X_1^m)} + 1 + 1 \right) = H(X_1^m) + 2$

l(X) is the number of bits to code the whole sequence {x1, x2, …, xm}.
Assuming an i.i.d. sequence, $H(X_1^m) = m H(X)$, so
$H(X) \le \frac{L}{m} \le H(X) + \frac{2}{m}$,
and L/m → H(X) for large m: stronger than the previous results.
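
A quick numeric check of this bound for the running example source (my calculation, assuming i.i.d. symbols with probabilities 0.8, 0.02, 0.18):

from math import log2

p = [0.8, 0.02, 0.18]
H = -sum(q * log2(q) for q in p)
print(f"H(X) = {H:.4f} bits/symbol")               # ≈ 0.8157
for m in (10, 100, 1000):
    print(f"m = {m:5d}: rate <= {H + 2 / m:.4f}")  # bound H(X) + 2/m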
Uniqueness and Efficiency

Comparison with Huffman code:
Expected length of Huffman code: $H(X) \le L^* \le H(X) + 1$
Huffman code can reduce the overhead by jointly encoding more symbols, but needs a much larger alphabet size.
Arithmetic coding is more efficient for longer sequences.
Binary Arithmetic Coding

Arithmetic coding is slow in general:
to decode a symbol, we need a series of decisions and multiplications:

While (Tag > LOW + RANGE * Sum(n) / N - 1) {
    n++;
}

The complexity is greatly reduced if we have only two symbols: 0 and 1.
Only two intervals: [0, x) for symbol 0, [x, 1) for symbol 1.
Encoding of Binary Arithmetic Coding
HIGH ¬ LOW + RANGE ´ CDF (n)
LOW ¬ LOW + RANGE ´ CDF (n - 1)
LOW = 0, HIGH = 1 Prob(0)=0.6. Sequence: 0110
LOW = 0, HIGH = 0.6
0 0.6 1
LOW = 0.36, HIGH = 0.6
0 0.36 0.6
LOW = 0.504, HIGH = 0.6
0.36 0.504 0.6
LOW = 0.504, HIGH = 0.5616
0.504 0.5616 0.6
Only need to update LOW or HIGH for each symbol.
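
A small sketch reproducing this trace; with a binary alphabet, each input bit moves only LOW (symbol 1) or only HIGH (symbol 0):

def binary_encode_trace(bits, p0):
    low, high = 0.0, 1.0
    for b in bits:
        rng = high - low
        if b == 0:
            high = low + rng * p0          # CDF(0) = p0
        else:
            low = low + rng * p0           # CDF(1) = 1, so HIGH is unchanged
        print(f"Input {b}: LOW = {low:.4f}, HIGH = {high:.4f}")

binary_encode_trace([0, 1, 1, 0], 0.6)
# Input 0: LOW = 0.0000, HIGH = 0.6000
# Input 1: LOW = 0.3600, HIGH = 0.6000
# Input 1: LOW = 0.5040, HIGH = 0.6000
# Input 0: LOW = 0.5040, HIGH = 0.5616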
Decoding of Binary Arithmetic Coding

General case (integer implementation):

While (Tag > LOW + RANGE * Sum(n) / N - 1) {
    n++;
}

Binary case: only one condition to check:

if (Tag > LOW + RANGE * Sum(Symbol0) / N - 1) {
    n = 1;
} else {
    n = 0;
}
Applications of Binary Arithmetic Coding

Increasingly popular: JBIG, JBIG2, JPEG2000, H.264
Convert non-binary signals into binary:
Golomb-Rice code: used in H.264.
Bit-plane coding: used in JPEG2000.
B = [B0, B1, ……, BK-1]: binary representation of B.
Chain rule: H(B) = H(B0) + H(B1 | B0) + …… + H(BK-1 | B0, …, BK-2).
To code B0, we need P0(0) = Prob(B0 = 0).
To code B1, we need P1(0 | i) = Prob(B1 = 0 | B0 = i), i = 0, 1.
……
More details:
AVC: Context-Adaptive Binary Arithmetic Coding (CABAC)
HEVC arithmetic coding
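
To illustrate the conditional probabilities the chain rule calls for, the toy sketch below estimates P(B1 = 0 | B0 = i) by counting over sample values of B; this is my construction for illustration, not JPEG2000 code:

from collections import Counter

def bitplanes(v, k):                       # MSB-first K-bit representation of v
    return [(v >> (k - 1 - j)) & 1 for j in range(k)]

samples = [3, 1, 0, 2, 3, 3, 1, 0]         # toy data, K = 2 bit-planes
counts = Counter()
for v in samples:
    b0, b1 = bitplanes(v, 2)
    counts[(b0, b1)] += 1

for i in (0, 1):
    total = counts[(i, 0)] + counts[(i, 1)]
    print(f"P(B1=0 | B0={i}) = {counts[(i, 0)] / total:.2f}")   # 0.50 and 0.25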
Outline
Lecture 04 ReCap
Arithmetic Coding
Basic Encoding and Decoding
Uniqueness and Efficiency
Scaling and Incremental Coding
About Homework-1 and Lab
Scaling and Incremental Coding

Problems with the previous examples:
Need high precision
No output is generated until the entire sequence is encoded
Decoder needs to read all input before decoding

Key observation:
As the RANGE reduces, many MSBs of LOW and HIGH become identical.
Example: binary forms of 0.7712 and 0.773504:
0.1100010…, 0.1100011…
We can output the identical MSBs and re-scale the rest: incremental encoding.
Can achieve arbitrary precision with finite-precision registers.
Three scaling scenarios: E1, E2, E3.
Important rule: apply as many scalings as possible before further encoding and decoding.
E1 (lower half) and E2 (upper half) Scaling

E1: [LOW, HIGH) in [0, 0.5)
LOW: 0.0xxxxxxx (binary), HIGH: 0.0xxxxxxx
Output 0, then shift left by 1 bit
[0, 0.5) → [0, 1): E1(x) = 2x

E2: [LOW, HIGH) in [0.5, 1)
LOW: 0.1xxxxxxx, HIGH: 0.1xxxxxxx
Output 1, subtract 0.5, then shift left by 1 bit
[0.5, 1) → [0, 1): E2(x) = 2(x - 0.5)

The third scaling, E3 (middle), will be studied later:
LOW < 0.5, HIGH > 0.5, but RANGE < 0.5.
Encoding with E1 and E2

Symbol probabilities: 1 (0.8), 2 (0.02), 3 (0.18). Input: 1321.

Input 1: [0, 0.8)            no scaling
Input 3: [0.656, 0.8)        E2: output 1, 2(x - 0.5) → [0.312, 0.6)
Input 2: [0.5424, 0.54816)   E2: output 1 → [0.0848, 0.09632)
                             E1: output 0, 2x → [0.1696, 0.19264)
                             E1: output 0 → [0.3392, 0.38528)
                             E1: output 0 → [0.6784, 0.77056)
                             E2: output 1 → [0.3568, 0.54112)
Input 1: [0.3568, 0.504256)  no scaling

Encode any value in the final interval, e.g., 0.5: output 1.
All output: 1100011 (0.1100011 binary = 0.7734375)
To verify

After input 2, before any scaling:
LOW = 0.5424 (0.10001010… in binary),
HIGH = 0.54816 (0.10001100… in binary).
So we can send out the common prefix 10001 (0.53125),
equivalent to E2 E1 E1 E1 E2.
After the left shift by 5 bits:
LOW = (0.5424 - 0.53125) × 32 = 0.3568
HIGH = (0.54816 - 0.53125) × 32 = 0.54112
Same as the result on the previous page.
(In this example, 7 bits are enough for the decoding.)
Comparison with Huffman

Rule: complete all possible scalings before encoding the next symbol.
Symbol probabilities: 1 (0.8), 2 (0.02), 3 (0.18).

Input symbol 1 does not cause any output
Input symbol 3 generates 1 bit
Input symbol 2 generates 5 bits

Symbols with larger probabilities generate fewer bits; sometimes no bit is generated at all.
Advantage over Huffman coding:
Large probabilities are desired in arithmetic coding
Context-adaptive methods can create larger probabilities and improve the compression ratio.
Incremental Decoding

If the input is 1100011, the first symbol can be decoded without ambiguity after 5 bits are read. After 6 bits are read (tag 110001 = 0.765625), the status is:

Decode 1: 0.765625 ∈ [0, 0.8); no scaling
Decode 3: 0.765625 ∈ [0.656, 0.8); E2: shift out 1 bit, read 1 bit,
          tag 100011 (0.546875) → [0.312, 0.6)
Decode 2: 0.546875 ∈ [0.5424, 0.54816)
          E2: tag 000110 (0.09375) → [0.0848, 0.09632)
          E1: tag 001100 (0.1875) → [0.1696, 0.19264)
          E1: tag 011000 (0.375) → [0.3392, 0.38528)
          E1: tag 110000 (0.75) → [0.6784, 0.77056)
          E2: tag 100000 (0.5) → [0.3568, 0.54112)
Decode 1: 0.5 ∈ [0.3568, 0.504256)
Encoding Pseudo Code with E1, E2

Rule: complete all possible scalings before encoding the next symbol; adjust LOW and HIGH together. (For floating-point implementation.)

Symbol  Prob.  CDF
1       0.8    0.8
2       0.02   0.82
3       0.18   1

EncodeSymbol(n) {
    // Update variables
    RANGE = HIGH - LOW;
    HIGH = LOW + RANGE * CDF(n);
    LOW = LOW + RANGE * CDF(n-1);
    // Apply all possible scalings before encoding the next symbol
    while (LOW, HIGH in [0, 0.5) or [0.5, 1)) {
        output 0 for E1 and 1 for E2;
        scale LOW, HIGH by the E1 or E2 rule;
    }
}
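
A runnable floating-point sketch of this encoder (E3 is omitted here; it is added on the following slides). After the scaling loop the interval always straddles 0.5, so appending a single 1 (tag = 0.5) is a valid termination; the sketch reproduces the 1100011 output of the worked example:

CDF = [0.0, 0.8, 0.82, 1.0]                # symbols 1, 2, 3

def encode_e1e2(symbols, cdf=CDF):
    low, high, out = 0.0, 1.0, []
    for n in symbols:
        rng = high - low
        high = low + rng * cdf[n]
        low = low + rng * cdf[n - 1]
        # Apply all possible scalings before the next symbol
        while high <= 0.5 or low >= 0.5:
            if high <= 0.5:                # E1: output 0, x -> 2x
                out.append(0)
                low, high = 2 * low, 2 * high
            else:                          # E2: output 1, x -> 2(x - 0.5)
                out.append(1)
                low, high = 2 * (low - 0.5), 2 * (high - 0.5)
    out.append(1)                          # terminate with tag 0.5
    return out

print(encode_e1e2([1, 3, 2, 1]))           # [1, 1, 0, 0, 0, 1, 1]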
Decoding Pseudo Code with E1, E2

(For floating-point implementation.)

Symbol  Prob.  CDF
1       0.8    0.8
2       0.02   0.82
3       0.18   1

DecodeSymbol(Tag) {
    RANGE = HIGH - LOW;
    n = 1;
    while ((Tag - LOW) / RANGE >= CDF(n)) {
        n++;
    }
    HIGH = LOW + RANGE * CDF(n);
    LOW = LOW + RANGE * CDF(n-1);
    // Keep scaling before decoding the next symbol
    while (LOW, HIGH in [0, 0.5) or [0.5, 1)) {
        scale LOW, HIGH by the E1 or E2 rule;
        read one more bit, update Tag;
    }
    return n;
}
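
A matching decoder sketch. The tag is kept as a finite w-bit window and shifted in lockstep with the E1/E2 rescaling of LOW and HIGH; the bit stream is padded with 0s once the input bits run out, and the message length m is assumed known:

CDF = [0.0, 0.8, 0.82, 1.0]                # symbols 1, 2, 3

def decode_e1e2(bits, m, w=16, cdf=CDF):
    stream = iter(bits + [0] * w)          # pad with 0s past the input
    tag = 0                                # w-bit integer tag window
    for _ in range(w):
        tag = (tag << 1) | next(stream)
    low, high, out = 0.0, 1.0, []
    for _ in range(m):
        rng = high - low
        x = (tag / 2**w - low) / rng       # normalized tag in [0, 1)
        n = 1
        while x >= cdf[n]:
            n += 1
        out.append(n)
        high = low + rng * cdf[n]
        low = low + rng * cdf[n - 1]
        while high <= 0.5 or low >= 0.5:   # E1 and E2 shift the tag identically
            if high <= 0.5:
                low, high = 2 * low, 2 * high
            else:
                low, high = 2 * (low - 0.5), 2 * (high - 0.5)
            tag = ((tag << 1) & (2**w - 1)) | next(stream)
    return out

print(decode_e1e2([1, 1, 0, 0, 0, 1, 1], 4))   # [1, 3, 2, 1]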
E3 Scaling: [0.25, 0.75) → [0, 1)

If the interval straddles 1/2, E1 and E2 cannot be applied, but the range can still be quite small.
Example: LOW = 0.4999, HIGH = 0.5001
Binary: LOW = 0.01111…, HIGH = 0.1000…
We may not have enough bits to represent the interval.

E3 scaling: [0.25, 0.75) → [0, 1): E3(x) = 2(x - 0.25)
Example

Previous example without E3:
Input 1: [0, 0.8)
Input 3: [0.656, 0.8), E2: output 1 → [0.312, 0.6)
Input 2: [0.5424, 0.54816), E2: output 1 → [0.0848, 0.09632),
         E1: output 0 → [0.1696, 0.19264)

With E3:
After input 3 and E2: [0.312, 0.6), E3: 2(x - 0.25) → [0.124, 0.7)
Input 2: [0.5848, 0.59632), E2: output 1 → [0.1696, 0.19264)

State after E2∘E3 = state after E1∘E2, but the outputs are different…
Encoding Operation with E3

Without E3:
Input 2: [0.5424, 0.54816), E2: output 1 → [0.0848, 0.09632),
         E1: output 0 → [0.1696, 0.19264)

With E3:
[0.312, 0.6), E3 (no output) → [0.124, 0.7)
Input 2: [0.5848, 0.59632), E2: output 1, then output 0 here! → [0.1696, 0.19264)

Don't send anything when E3 is used, but send a 0 after the next E2:
same output, same final state → equivalent operations.
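
The standard generalization of this rule (as in the Witten-Neal-Cleary coder) counts pending E3 scalings and, after the next E1 or E2 bit, emits that many complements of the bit just sent; the slide's "send a 0 after E2" is the pending = 1 case. A sketch of the rescaling step only, under that assumption:

def rescale(low, high, out, pending):
    while True:
        if high <= 0.5:                    # E1: emit 0, then the pending 1s
            out += [0] + [1] * pending
            pending = 0
            low, high = 2 * low, 2 * high
        elif low >= 0.5:                   # E2: emit 1, then the pending 0s
            out += [1] + [0] * pending
            pending = 0
            low, high = 2 * (low - 0.5), 2 * (high - 0.5)
        elif low >= 0.25 and high <= 0.75: # E3: no output, just remember it
            pending += 1
            low, high = 2 * (low - 0.25), 2 * (high - 0.25)
        else:
            return low, high, out, pending

# Slide example: after input 3 and E2, the interval [0.312, 0.6) allows E3:
print(rescale(0.312, 0.6, [], 0))          # ≈ (0.124, 0.7, [], 1): one E3 pending

Running rescale again on the post-input-2 interval [0.5848, 0.59632) would emit 1 followed by the pending 0, matching the slide, before continuing with the usual E1/E2 scalings.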
Decoding for E3 (input 1100011)

Read 6 bits: tag 110001 (0.765625).

Without E3:
Decode 1: [0, 0.8)
Decode 3: [0.656, 0.8), E2 scaling, tag 100011 (0.546875) → [0.312, 0.6)
Decode 2: [0.5424, 0.54816), E2 scaling, tag 000110 (0.09375) → [0.0848, 0.09632)
          E1: tag 001100 (0.1875) → [0.1696, 0.19264)

With E3:
[0.312, 0.6), tag 100011 (0.546875)
E3: tag 100110 (0.59375) → [0.124, 0.7)
Decode 2: [0.5848, 0.59632), E2 scaling, tag 001100 (0.1875) → [0.1696, 0.19264)

Apply E3 whenever it is possible; everything else is the same.
Summary of Different Scalings

Interval entirely in [0, 0.5): E1 scaling
Interval entirely in [0.5, 1.0): E2 scaling
Interval straddles 0.5 but lies within [0.25, 0.75): E3 scaling
Otherwise: no scaling is required; continue to encode/decode the next symbol.
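
A tiny helper expressing this decision table (my naming, for clarity):

def which_scaling(low, high):
    if high <= 0.5:
        return "E1"
    if low >= 0.5:
        return "E2"
    if low >= 0.25 and high <= 0.75:
        return "E3"
    return None                            # no scaling: continue coding

print(which_scaling(0.312, 0.6))           # E3
print(which_scaling(0.1, 0.6))             # None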
HW-1
Info Theory: Entropy, Conditional Entropy, Mutual Info,
Relative Entropy (KL Divergence)
Huffman Coding
Residual Image Error statistics and Golomb coding
Pixel value prediction filtering
Residual error distribution modeling
Optimal m selection in Golomb coding
Summary

VLC is the real-world image coding solution
Elements of Huffman and Golomb coding schemes are incorporated
JPEG: introduced DC prediction, AC zigzag scan, and run-level VLC
H.264: introduced reverse-order coding.