
Arithmetic coding
Klein S. T. and Wiseman Y.

Let L be a set of items.
Every item i ∈ L has a probability Pᵢ ∈ [0,1], such that ∑_{i∈L} Pᵢ = 1.
Every item i is represented by the interval [∑_{j<i} Pⱼ, ∑_{j≤i} Pⱼ).
The current interval is divided into subintervals according to the items' probabilities.
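
As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this construction; it assumes the items are given as an ordered mapping from item to probability:

```python
def build_intervals(probs):
    """Map each item i to [sum of P_j for j < i, sum of P_j for j <= i)."""
    intervals = {}
    cumulative = 0.0
    for item, p in probs.items():
        intervals[item] = (cumulative, cumulative + p)
        cumulative += p
    return intervals

# Reproduces the table in the example below (up to floating-point rounding):
print(build_intervals({'a': 0.2, 'e': 0.3, 'i': 0.1,
                       'o': 0.2, 'u': 0.1, '#': 0.1}))
```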

Example

Letter   Probability   Interval
a        0.2           [0,0.2)
e        0.3           [0.2,0.5)
i        0.1           [0.5,0.6)
o        0.2           [0.6,0.8)
u        0.1           [0.8,0.9)
#        0.1           [0.9,1)


Compressing the string "eaii#"

Starting from [0,1), each symbol narrows the current interval to its subinterval:

Start:  [0,1)
e:      [0.2,0.5)
a:      [0.2,0.26)
i:      [0.23,0.236)
i:      [0.233,0.2336)
#:      [0.23354,0.2336)

The message can be any x ∈ [0.23354,0.2336).
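
A Python sketch of this narrowing loop (an illustration, not the authors' code; the helper names are assumptions, and exact endpoints are subject to floating-point rounding):

```python
def encode(message, intervals):
    """Shrink [0,1) to the subinterval of each symbol in turn."""
    low, high = 0.0, 1.0
    for symbol in message:
        s_low, s_high = intervals[symbol]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high  # any x in [low, high) encodes the message

intervals = {'a': (0.0, 0.2), 'e': (0.2, 0.5), 'i': (0.5, 0.6),
             'o': (0.6, 0.8), 'u': (0.8, 0.9), '#': (0.9, 1.0)}
print(encode('eaii#', intervals))  # approximately (0.23354, 0.2336)
```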

Decompression
The shortest binary fraction in this interval is 0.00111011110011, which is 0.23358154296875.
This number is greater than 0.2 and less than 0.5, so the first item is 'e'.
Two methods to stop decompression:
a special character for EOF, or
attaching the number of items.
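
A matching decoder sketch, using the first stopping rule (a special EOF character, here '#'); again a hedged illustration rather than the authors' code:

```python
def decode(x, intervals, eof='#'):
    """Read the symbols back out of a number x taken from the final interval."""
    out = []
    while True:
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= x < s_high:
                out.append(symbol)
                # Rescale x into the chosen subinterval for the next symbol.
                x = (x - s_low) / (s_high - s_low)
                break
        if out[-1] == eof:
            return ''.join(out)

intervals = {'a': (0.0, 0.2), 'e': (0.2, 0.5), 'i': (0.5, 0.6),
             'o': (0.6, 0.8), 'u': (0.8, 0.9), '#': (0.9, 1.0)}
print(decode(0.23358154296875, intervals))  # 'eaii#'
```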

Scaling
The intervals in arithmetic coding shrink very fast.
Many digits must be used to provide sufficient precision.
Multiplications and divisions on such long numbers are slow and complicated.

The scaling procedure

To keep the interval large, apply scaling:
If the entire interval lies in [0,0.5), the boundaries are doubled and "0" is sent to the output.
If the entire interval lies in [0.5,1), the distance from each boundary to 1 is doubled and "1" is sent to the output.
If the entire interval lies in [0.25,0.75) and includes 0.5, the distance from each boundary to 0.5 is doubled and a "follow" bit is recorded; when the next bit is actually output, each pending follow bit is output after it with the inverted value.
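
A Python sketch of the full procedure, combining interval narrowing with the three scaling rules above. The termination step, emitting the bits of 0.5, which always lies inside the interval once no rule applies, is an assumption chosen to match the worked example that follows:

```python
def encode_scaled(message, intervals):
    """Arithmetic encoding with the interval scaling described above."""
    low, high = 0.0, 1.0
    out = []
    pending = 0  # "follow" bits waiting for the next real output bit

    def emit(bit):
        nonlocal pending
        out.append(bit)
        # Each pending follow bit is the inverse of the bit just emitted.
        out.append(('1' if bit == '0' else '0') * pending)
        pending = 0

    for symbol in message:
        s_low, s_high = intervals[symbol]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
        while True:
            if high <= 0.5:        # rule 1: interval inside [0,0.5)
                emit('0')
                low, high = 2 * low, 2 * high
            elif low >= 0.5:       # rule 2: interval inside [0.5,1)
                emit('1')
                low, high = 2 * low - 1, 2 * high - 1
            elif low >= 0.25 and high <= 0.75:  # rule 3: straddles 0.5
                pending += 1
                low, high = 2 * low - 0.5, 2 * high - 0.5
            else:
                break
    emit('1')  # 0.5 lies in the final interval; its bits close the code
    return ''.join(out)

intervals = {'a': (0.0, 0.1), 'i': (0.1, 0.3), 'r': (0.3, 0.6), 'y': (0.6, 1.0)}
print(encode_scaled('yair', intervals))  # '100110111', matching the example
```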

Example
Items:
a-10%
i-20%
r-30%
y-40%

Intervals:
a-[0,0.1)
i-[0.1,0.3)
r-[0.3,0.6)
y-[0.6,1)

String to be encoded: "yair".



Example (Cont.)

Start:               [0,1)
y:  [0.6,1)          output "1"           scale to [0.2,1)
a:  [0.2,0.28)       output "0"           scale to [0.4,0.56)
                     follow               scale to [0.3,0.62)
                     follow               scale to [0.1,0.74)
i:  [0.164,0.292)    output "0ff" = 011   scale to [0.328,0.584)
                     follow               scale to [0.156,0.668)
r:  [0.3096,0.4632)  output "0f" = 01     scale to [0.6192,0.9264)
                     output "1"           scale to [0.2384,0.8528)

("f" marks a pending follow bit, emitted inverted after the next output bit.)

Output string is 10011011.

Example (cont.)
The shortest binary fraction in the interval [0.2384,0.8528) is 0.5, which is 0.1 in binary.
So the final output is 100110111.
Without scaling we would get:
Start: [0,1)
y: [0.6,1)
a: [0.6,0.64)
i: [0.604,0.612)
r: [0.6064,0.6088)
The shortest binary fraction in this interval is 0.100110111, which is about 0.607422, so both methods produce the same bits.

Arithmetic code is optimal

The information content of each item is −log₂ Pᵢ.
Huffman gives each item a codeword of integer length, so Huffman is optimal only when every −log₂ Pᵢ is an integer.
Such a distribution is called a dyadic distribution.
A code whose average codeword length is H = −∑_{i=1}^{n} Pᵢ log₂ Pᵢ, which is called the entropy, is an optimal code.
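
A quick numeric illustration of the dyadic case (an assumed example, not from the slides): for probabilities that are powers of 1/2, the Huffman codeword lengths are exactly −log₂ Pᵢ, and the average length meets the entropy.

```python
from math import log2

probs   = [0.5, 0.25, 0.125, 0.125]  # a dyadic distribution
lengths = [1, 2, 3, 3]               # Huffman codeword lengths for it

entropy = -sum(p * log2(p) for p in probs)
average = sum(p * l for p, l in zip(probs, lengths))
print(entropy, average)  # both print 1.75: Huffman is optimal here
```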

Definitions
File X: x₁,...,xₙ.
Probabilities of the items at the positions of file X: p₁,...,pₙ.
The set of items: a₁,...,aₘ, where xᵢ ∈ {a₁,...,aₘ}.
Probabilities of the items: q₁,...,qₘ.
Frequencies of the items in X: f₁,...,fₘ.
qᵢ = fᵢ / n.

Proof of optimality
The information content of file X is −∑_{i=1}^{n} log₂ pᵢ.
Grouping the n positions by item (item aᵢ occurs fᵢ times, each occurrence having probability qᵢ):
−∑_{i=1}^{n} log₂ pᵢ = −∑_{i=1}^{m} fᵢ log₂ qᵢ = −n ∑_{i=1}^{m} (fᵢ/n) log₂ qᵢ = −n ∑_{i=1}^{m} qᵢ log₂ qᵢ = n·H.
This holds for a file of n items, so one item is encoded by exactly H bits.
