Data Compression (Source Coding)

Boğaziçi University
Fall 2024
What is a Source Code?

• A source code C for a random variable X is a mapping from 𝒳, the range of X, to D*, the set of
  finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding
  to x and let l(x) denote the length of C(x).
• Example: C(1) = 00, C(2) = 01, C(3) = 11, C(4) = 10, with alphabet D = {0, 1}.
• Example: the Morse code, the best-known source code.
• The expected length L(C) of a source code C(x) for a random variable X with probability mass
  function p(x) is given by

      L(C) = Σ_{x∈𝒳} p(x) l(x),

  where l(x) is the length of the codeword associated with x.
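As a quick illustration (a sketch of my own, not from the slides), the expected length of the binary
example above can be computed directly; the PMF here is assumed for illustration, since the slide does
not specify one:

    p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}   # assumed PMF, for illustration only
    C = {1: "00", 2: "01", 3: "11", 4: "10"}    # the fixed-length example code above
    L = sum(p[x] * len(C[x]) for x in p)
    print(L)                                    # 2.0 bits: every codeword has length 2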
• Without loss of generality, we can assume that the D-ary alphabet is D = {0, 1, . . . , D − 1}.
• A code is said to be nonsingular if every element of the range of X maps into a different string in
  D*; that is,

      x ≠ x′ ⟹ C(x) ≠ C(x′).
What is a Good Source Code?

• The extension C* of a code C is the mapping from finite-length strings of 𝒳 to finite-length
  strings of D, defined by

      C(x1 x2 . . . xn) = C(x1)C(x2) . . . C(xn),

  where C(x1)C(x2) . . . C(xn) indicates concatenation of the corresponding codewords.

• A code is called uniquely decodable if its extension is nonsingular.

• A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other
  codeword.

TABLE 5.1 Classes of Codes

  X   Singular   Nonsingular, But Not   Uniquely Decodable      Instantaneous
                 Uniquely Decodable     But Not Instantaneous
  1   0          0                      10                      0
  2   0          010                    00                      10
  3   0          01                     11                      110
  4   0          10                     110                     111

• The nesting of these definitions is shown in Figure 5.1: all codes ⊃ nonsingular codes ⊃ uniquely
  decodable codes ⊃ instantaneous codes.

• To illustrate the differences between the various kinds of codes, consider the codeword assignments
  C(x) to x ∈ 𝒳 in Table 5.1. For the nonsingular code, the code string 010 has three possible
  source sequences: 2 or 14 or 31, and hence the code is not uniquely decodable. The uniquely
  decodable code is not prefix-free and hence is not instantaneous. To see that it is uniquely
  decodable, take any code string and start from the beginning. If the first two bits are 00 or 10,
  they can be decoded immediately. If the first two bits are 11, we must look at the following bits.
  If the next bit is a 1, the first source symbol is a 3. If the length of the string of 0's
  immediately following the 11 is odd, the first codeword must be 110 and the first source symbol
  must be 4; if the length of the string of 0's is even, the first source symbol is a 3. By repeating
  this argument, we can see that this code is uniquely decodable. Sardinas and Patterson [455] have
  devised a finite test for unique decodability, which involves forming sets of possible suffixes to
  the codewords and eliminating them systematically; the test is described more fully in Problem
  5.5.27. The fact that the last code in Table 5.1 is instantaneous is obvious, since no codeword is
  a prefix of any other.
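To make these classes concrete, here is a small Python sketch (my own illustration, not from the
slides) that tests the prefix condition directly and implements the Sardinas-Patterson suffix test
described above; the function names are mine:

    def is_prefix_free(code):
        # Instantaneous (prefix) code: no codeword is a prefix of another.
        return not any(a != b and b.startswith(a) for a in code for b in code)

    def dangling(a, b):
        # The suffix w with a + w == b, if a is a proper prefix of b.
        return b[len(a):] if b.startswith(a) and len(b) > len(a) else None

    def uniquely_decodable(code):
        # Sardinas-Patterson test: form sets of dangling suffixes and check
        # whether any suffix is itself a codeword.
        c = set(code)
        s = {dangling(a, b) for a in c for b in c} - {None}
        seen = set()
        while s and not (s & c):
            seen |= s
            s = {d for w in s for u in c for d in (dangling(u, w), dangling(w, u))}
            s = (s - {None}) - seen        # stop once no new suffixes appear
        return not (s & c)

    for code in (["0", "010", "01", "10"],     # nonsingular, not uniquely decodable
                 ["10", "00", "11", "110"],    # uniquely decodable, not instantaneous
                 ["0", "10", "110", "111"]):   # instantaneous
        print(code, uniquely_decodable(code), is_prefix_free(code))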
Kraft's Inequality for Instantaneous (Prefix) Codes

• A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other
  codeword.

• Kraft Inequality: For any instantaneous code (prefix code) over an alphabet of size D, the codeword
  lengths l1, l2, . . . , lm must satisfy

      Σ_i D^{-li} ≤ 1.

• Conversely, given any set of codeword lengths l1, l2, . . . , lm that satisfy the Kraft inequality,
  there exists an instantaneous code with these codeword lengths: we can always construct a D-ary
  code tree in which each codeword is represented by a leaf, and the path from the root traces out
  the symbols of the codeword (Figure 5.2: code tree for the Kraft inequality).
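Both directions of the Kraft inequality are easy to illustrate in code. The sketch below (my own, not
from the slides) checks the sum and constructs a prefix code from feasible lengths by assigning
codewords as consecutive leaves of a D-ary tree, shortest lengths first:

    def kraft_sum(lengths, D=2):
        # An instantaneous code with these lengths exists iff this sum is <= 1.
        return sum(D ** -l for l in lengths)

    def prefix_code_from_lengths(lengths, D=2):
        # Allocate codewords in order of increasing length; the Kraft
        # inequality guarantees we never run out of leaves.
        assert kraft_sum(lengths, D) <= 1, "no instantaneous code exists"
        code, c, prev = [], 0, 0
        for l in sorted(lengths):
            c *= D ** (l - prev)              # descend to depth l
            digits, v = [], c
            for _ in range(l):                # write c as an l-digit base-D string
                digits.append(str(v % D))
                v //= D
            code.append("".join(reversed(digits)))
            c, prev = c + 1, l                # next unused node at this depth
        return code

    print(kraft_sum([1, 2, 3, 3]))                 # 1.0 -> feasible (with equality)
    print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
    print(kraft_sum([1, 1, 2]))                    # 1.25 -> no prefix code exists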
Optimal Source Codes

• Let us limit our scope only to (prefix) instantaneous source codes with binary codewords, i.e.,
  D = 2, so that the Kraft inequality reads

      Σ_i 2^{-li} ≤ 1.

• Let us also find the optimal prefix codes with the minimum expected length.

• The solution of constrained optimization problems relies on combining the original cost function
  and the constraint(s) in a linear relationship, which results in the combined function called the
  Lagrangian:

      J(l1, l2, . . . , lm, λ) = Σ_i pi li + λ (Σ_i 2^{-li} − 1).

• Then the optimum solution should satisfy the derivative conditions for each and every one of the
  variables:

      ∂J/∂lj = pj − λ 2^{-lj} log_e 2 = 0,   j = 1, . . . , m,     (1)

      ∂J/∂λ = Σ_i 2^{-li} − 1 = 0.     (2)

• Notice that the derivative w.r.t. λ guarantees that the solution satisfies the corresponding
  constraint.

• From (1), 2^{-li} = pi / (λ log_e 2) for i = 1, . . . , m. Substituting into (2) and using
  Σ_i pi = 1 gives λ log_e 2 = 1, so that

      pi = 2^{-li}  ⟺  li* = −log pi,   i = 1, . . . , m.
• The resulting minimum expected length equals the entropy:

      L* = Σ_i pi li* = −Σ_i pi log pi = H(X).

• Notice that some li* = −log pi 's may not be integer valued. So the optimal lengths are actually
  found as

      li* = ⌈log(1/pi)⌉,   so that   log(1/pi) ≤ li* < log(1/pi) + 1.

• Correspondingly, H(X) ≤ L* ≤ H(X) + 1. (Proof is easy: just plug the upper and lower bounds on
  li* into the L* formula.)
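A quick numeric check of these bounds (my own sketch), using a PMF that also appears in the Huffman
examples below:

    import math

    def entropy(p):
        return -sum(pi * math.log2(pi) for pi in p)

    p = [0.25, 0.25, 0.2, 0.15, 0.15]
    l = [math.ceil(math.log2(1 / pi)) for pi in p]   # rounded-up optimal lengths
    L = sum(pi * li for pi, li in zip(p, l))
    print(l)                        # [2, 2, 3, 3, 3]
    print(entropy(p), L)            # H(X) ~ 2.285 <= L = 2.5 <= H(X) + 1
    assert sum(2 ** -li for li in l) <= 1            # ceiling lengths satisfy Kraft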
• This is an important result: it says that the lower bound on how much you can compress the data is
  the entropy H(X) bits per symbol.

• This result also says that the average number of bits per symbol is upper bounded by the entropy
  plus one bit.

• Notice that if we are to encode not one symbol but n symbols in total, (X1, X2, . . . , Xn), then
  we consider the expected length of the block code,

      E[L(X1, X2, . . . , Xn)] = Σ_{x1} · · · Σ_{xn} p(x1, . . . , xn) l(x1, . . . , xn).

• If X1, X2, . . . , Xn are i.i.d., then H(X1, X2, . . . , Xn) = Σ_i H(Xi) = nH(X). So we can show
  that

      H(X) ≤ E[L(X1, X2, . . . , Xn)] / n ≤ H(X) + 1/n.
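To see the 1/n overhead shrink, the following sketch (my own, with a made-up non-dyadic PMF)
enumerates all i.i.d. blocks of length n and computes the per-symbol expected length under the
rounded-up lengths ⌈log(1/p(x1, . . . , xn))⌉:

    import itertools, math

    def per_symbol_length(p, n):
        # E[L(X1,...,Xn)]/n with l(x^n) = ceil(log2 1/p(x^n)), symbols i.i.d.
        total = 0.0
        for block in itertools.product(p, repeat=n):
            q = math.prod(block)              # p(x1,...,xn) = p(x1)...p(xn)
            total += q * math.ceil(-math.log2(q))
        return total / n

    p = [0.7, 0.15, 0.15]                     # hypothetical PMF; H(X) ~ 1.18
    for n in (1, 2, 4, 8):
        print(n, per_symbol_length(p, n))     # upper-bounded by H(X) + 1/n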
• That is to say, for long sequences of symbols, the optimal number of bits per symbol is determined
  by the entropy H(X).

• This is actually known as the Shannon source coding scheme. It is useful in identifying the
  optimality conditions and bounds and in determining the optimal code lengths, but it does not help
  us construct the codewords.

• It is all about the lengths of the codewords and not about the codewords themselves.
How can we construct optimal source codes given a PMF? First…

Properties of An Optimal Code

• Without loss of generality, we can assume that the probability masses are ordered, so that
  p1 ≥ p2 ≥ ··· ≥ pm.

• Recall that a code is optimal if Σ_i pi li is minimal.

• Then for any distribution, there exists an optimal instantaneous code (with minimum expected
  length) that satisfies the following properties:

  1) The lengths are ordered inversely with the probabilities (i.e., if pj > pk, then lj ≤ lk).

  2) The two longest codewords have the same length.

  3) Two of the longest codewords differ only in the last bit and correspond to the two least likely
     symbols.
Huffman Codes

• Huffman proposed a method for generating an optimal source code satisfying these conditions. The
  procedure is as follows (a sketch in code is given after the list):

  1) Sort all symbols and probabilities in descending order of probability. If multiple probabilities
     are the same, it does not matter in which order they are written.

  2) Merge the two symbols with the lowest probabilities into a new symbol and sum up the
     probabilities. Re-sort the new probabilities in descending order. Label one branch with bit 1
     and the other with bit 0.

  3) Repeat step 2 with the new probabilities. (Be careful and be consistent with your ordering and
     labeling throughout.)

  4) When you sum up to a single symbol with probability 1, trace back the branches to come up with
     the codewords.
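A compact Python sketch of this procedure (my own illustration; a heap stands in for the explicit
re-sorting in step 2, and the 0/1 branch labels are assigned at each merge):

    import heapq
    from itertools import count

    def huffman(pmf):
        # Repeatedly merge the two least likely entries, prepending the
        # branch bit to every codeword inside the merged subtree.
        ties = count()                        # tie-breaker so the heap never compares dicts
        heap = [(p, next(ties), {x: ""}) for x, p in pmf.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)   # lowest probability
            p1, _, c1 = heapq.heappop(heap)   # second lowest
            merged = {x: "0" + w for x, w in c0.items()}
            merged.update({x: "1" + w for x, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, next(ties), merged))
        return heap[0][2]

    print(huffman({1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}))
    # {1: '0', 2: '10', 3: '110', 4: '111'} (up to relabeling of branches)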
• If we consider D-ary codewords, in each step we combine D probabilities. Notice that if D > 2, in
  order to have D probabilities to combine at each stage, you may need to add some 0-probability
  dummy symbols (see the non-binary examples below).
• Example: X ∈ {1, 2, 3, 4} with probabilities (0.5, 0.25, 0.125, 0.125):

  Codewords   X   Probability
  1           1   0.5     0.5    0.5    1
  01          2   0.25    0.25   0.5
  000         3   0.125   0.25
  001         4   0.125

• L(C) = H(X) = 1.75 bits. Why? Because the probabilities are dyadic (negative integer powers of 2),
  so the optimal lengths li* = −log pi are all integers and the Huffman code meets the entropy bound
  with equality.

• Huffman codes are not necessarily unique: swapping the 0/1 branch labels or breaking ties in a
  different order gives another optimal code, e.g., one with C(4) = 111.
• Example: X ∈ {1, 2, 3, 4} with probabilities (1/3, 1/3, 1/4, 1/12); one possible Huffman code:

  Codewords   X   Probability
  00          1   1/3    1/3    2/3    1
  01          2   1/3    1/3
  10          3   1/4    1/3
  11          4   1/12

• Here H(X) ≈ 1.85 < L(C) = 2: the probabilities are not dyadic, so the expected length cannot reach
  the entropy.

• Again, Huffman codes are not necessarily unique: merging differently at the tie gives another
  optimal code, e.g., {1, 00, 010, 011}, with lengths (1, 2, 3, 3) and the same expected length
  L(C) = 2.
• Example: X ∈ {1, 2, 3, 4, 5} with probabilities (0.25, 0.25, 0.2, 0.15, 0.15):

  Codewords   X   Probability
  01          1   0.25   0.3    0.45   0.55   1
  10          2   0.25   0.25   0.3    0.45
  11          3   0.2    0.25   0.25
  000         4   0.15   0.2
  001         5   0.15

  This code has an average length of 2.3 bits.
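Using the huffman sketch from earlier, the average length of this example can be checked directly:

    pmf = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
    code = huffman(pmf)                                    # sketch defined above
    print(sum(pmf[x] * len(w) for x, w in code.items()))   # 2.3 bits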
Non-binary Huffman Codes (D > 2)

• D = {0, 1, 2}.

• Example: Consider a ternary code for the same random variable. Now we combine the three least
  likely symbols into one supersymbol and obtain the following table:

  Codeword   X   Probability
  1          1   0.25   0.5   1
  2          2   0.25   0.25
  00         3   0.2    0.25
  01         4   0.15
  02         5   0.15

  This code has an average length of 1.5 ternary digits.
• If D ≥ 3, we may not have a sufficient number of symbols so that we can combine them D at a time.
  In such a case, we add dummy symbols to the end of the set of symbols. The dummy symbols have
  probability 0 and are inserted to fill the tree. Since at each stage of the reduction the number
  of symbols is reduced by D − 1, we want the total number of symbols to be 1 + k(D − 1), where k is
  the number of merges. Hence, we add enough dummy symbols so that the total number of symbols is of
  this form.
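The number of dummy symbols this rule requires is easy to compute; a one-line sketch (my own):

    def num_dummies(m, D):
        # Smallest d >= 0 with m + d = 1 + k(D - 1) for some integer k.
        return (1 - m) % (D - 1)

    print(num_dummies(5, 3))   # 0 -> the ternary example above needs no dummies
    print(num_dummies(6, 3))   # 1 -> six symbols need one zero-probability dummy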