
EE 477

Introduction to Digital Communications

Data Compression
(Source Coding)
Boğaziçi University
Fall 2024

What is a Source Code?

• A source code C for a random variable X is a mapping from 𝒳, the range of X, to D*, the set of finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding to x, and let l(x) denote the length of C(x).

• Example: C(1) = 00, C(2) = 01, C(3) = 11, C(4) = 10, with alphabet D = {0, 1}.

• Best-known example: Morse code.

• The expected length L(C) of a source code C(x) for a random variable X with probability mass function p(x) is given by

  L(C) = Σ_{x∈𝒳} p(x) l(x),

  where l(x) is the length of the codeword associated with x.

• Without loss of generality, we can assume that the D-ary alphabet is D = {0, 1, . . . , D − 1}.

• A code is said to be nonsingular if every element of the range of X maps into a different string in D*; that is,

  x ≠ x′ ⟹ C(x) ≠ C(x′).

• The extension C* of a code C is the mapping from finite-length strings of 𝒳 to finite-length strings of D, defined by

  C(x1 x2 . . . xn) = C(x1)C(x2) . . . C(xn),

  where C(x1)C(x2) . . . C(xn) indicates concatenation of the corresponding codewords.

• A code is called uniquely decodable if its extension is nonsingular.
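As a quick numerical illustration of the expected-length formula (not from the slides; the PMF below is a hypothetical choice, since the slide's example does not specify one):

```python
# Expected length L(C) = sum_x p(x) * l(x) for the example code above:
# C(1) = 00, C(2) = 01, C(3) = 11, C(4) = 10.
code = {1: "00", 2: "01", 3: "11", 4: "10"}
p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}  # assumed PMF, for illustration only

L = sum(p[x] * len(code[x]) for x in code)
print(L)  # 2.0 bits: every codeword has length 2, so L(C) = 2 for any PMF
```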


What is a Good Source Code?

• A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.

TABLE 5.1 Classes of Codes

  X   Singular   Nonsingular, But Not    Uniquely Decodable,     Instantaneous
                 Uniquely Decodable      But Not Instantaneous
  1   0          0                       10                      0
  2   0          010                     00                      10
  3   0          01                      11                      110
  4   0          10                      110                     111

The nesting of these definitions is shown in Figure 5.1. To illustrate the differences between the various kinds of codes, consider the codeword assignments C(x) to x ∈ 𝒳 in Table 5.1. For the nonsingular code, the code string 010 has three possible source sequences: 2, or 14, or 31, and hence the code is not uniquely decodable. The uniquely decodable code is not prefix-free and hence is not instantaneous. To see that it is uniquely decodable, take any code string and start from the beginning. If the first two bits are 00 or 10, they can be decoded immediately. If the first two bits are 11, we must look at the following bits. If the next bit is a 1, the first source symbol is a 3. If the length of the string of 0's immediately following the 11 is odd, the first codeword must be 110 and the first source symbol must be 4; if the length of the string of 0's is even, the first source symbol is a 3. By repeating this argument, we can see that this code is uniquely decodable. Sardinas and Patterson [455] have devised a finite test for unique decodability, which involves forming sets of possible suffixes to the codewords and eliminating them systematically. The test is described more fully in Problem 5.5.27. The fact that the last code in Table 5.1 is instantaneous is obvious, since no codeword is a prefix of any other.

[Figure 5.1 (Classes of codes): nested sets, with instantaneous codes ⊂ uniquely decodable codes ⊂ nonsingular codes ⊂ all codes.]

Kraft's Inequality for Instantaneous (Prefix) Codes

• Kraft Inequality: For any instantaneous code (prefix code) over an alphabet of size D, the codeword lengths l1, l2, . . . , lm must satisfy the inequality

  Σ_i D^(−li) ≤ 1.

Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.
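As an illustrative sketch (not part of the lecture notes), the inequality is easy to check numerically, and the converse can be made constructive by assigning codewords level by level in a D-ary tree; the canonical construction below is one standard way to do this.

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality: sum_i D**(-l_i)."""
    return sum(D ** (-l) for l in lengths)

def prefix_code_from_lengths(lengths, D=2):
    """Construct an instantaneous code with the given lengths (canonical
    construction; assumes the Kraft inequality holds and D <= 10)."""
    assert kraft_sum(lengths, D) <= 1 + 1e-12, "Kraft inequality violated"
    digits = "0123456789"[:D]
    codewords, node, prev = [], 0, 0
    for l in sorted(lengths):
        node *= D ** (l - prev)        # descend to depth l in the D-ary tree
        word, v = "", node
        for _ in range(l):             # write `node` as l base-D digits
            word = digits[v % D] + word
            v //= D
        codewords.append(word)
        node += 1                      # next unused node at this depth
        prev = l
    return codewords

print(kraft_sum([1, 2, 3, 3]))                 # 1.0  -> lengths are feasible
print(kraft_sum([2, 2, 2, 2, 2]))              # 1.25 -> no instantaneous code
print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```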


• Proof: Consider a D-ary tree in which each codeword is represented by a leaf; the path from the root traces out the symbols of the codeword. A binary example of such a tree is shown in Figure 5.2. The prefix condition on the codewords implies that no codeword is an ancestor of any other codeword on the tree. Hence, each codeword eliminates its descendants as possible codewords.

Let lmax be the length of the longest codeword of the set of codewords. Consider all nodes of the tree at level lmax. Some of them are codewords, some are descendants of codewords, and some are neither. A codeword at level li has D^(lmax − li) descendants at level lmax. Each of these descendant sets must be disjoint. Also, the total number of nodes in these sets must be less than or equal to D^lmax. Hence, summing over all the codewords, we have

  Σ_i D^(lmax − li) ≤ D^lmax,

or

  Σ_i D^(−li) ≤ 1,

which is the Kraft inequality.

Conversely, given any set of codeword lengths l1, l2, . . . , lm that satisfy the Kraft inequality, we can always construct a tree like the one in Figure 5.2 and label its leaves with the codewords.

[Figure 5.2 (Code tree for the Kraft inequality): a binary tree whose leaves are the codewords 0, 10, 110, 111.]

Optimal Source Codes



• Let us limit our scope only to (prefix) instantaneous source codes with binary codewords, i.e., D = 2.

• Let us also find the optimal prefix codes, i.e., those with the minimum expected length.

• This problem becomes minimizing

  L = Σ_i p_i l_i

over all integers l1, l2, . . . , lm while also satisfying the constraint

  Σ_i 2^(−li) ≤ 1.

• The solution of constrained optimization problems relies on combining the original cost function and the constraint(s) in a linear relationship, which results in the combined function called the Lagrangian:

  J(l1, l2, . . . , lm, λ) = Σ_i p_i l_i + λ (Σ_i 2^(−li) − 1).

• Then (neglecting for the moment the integer constraint on the l_i) the optimum solution should satisfy the derivative conditions for each and every one of the variables l1, l2, . . . , lm and also λ, which is the Lagrange multiplier:

  ∂J/∂lj = pj − λ 2^(−lj) log_e 2 = 0,   j = 1, . . . , m.   (1)

  ∂J/∂λ = Σ_i 2^(−li) − 1 = 0.   (2)

• Notice that the derivative w.r.t. λ guarantees that the solution satisfies the corresponding constraint.

• The first equation implies that

  2^(−li) = pi / (λ log_e 2),   i = 1, . . . , m.

• Substituting this in the second equation shows that λ log_e 2 = 1. Therefore

  pi = 2^(−li)  ⟺  li* = −log2 pi,   i = 1, . . . , m.
Optimal Source Codes

• The corresponding optimum expected length would be

  L* = Σ_i pi li* = −Σ_i pi log2 pi = H(X).

• Notice that some li* = −log2 pi may not be integer valued. So the optimal integer lengths are actually chosen as

  li* = ⌈log2(1/pi)⌉, which satisfies log2(1/pi) ≤ li* ≤ log2(1/pi) + 1.

• Correspondingly, H(X) ≤ L* ≤ H(X) + 1. (The proof is easy: just plug the upper and lower bounds into the L* formula.)

• This is an important result: the lower bound on how much you can compress the data is determined by the entropy.

• This result also says that, in the optimal case, the average number of bits per symbol is upper bounded by the entropy plus at most 1 bit.
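A short numerical sketch of these bounds (illustrative only; the PMF is an arbitrary choice), computing the lengths li* = ⌈log2(1/pi)⌉ and comparing L* with H(X):

```python
import math

p = [0.4, 0.3, 0.2, 0.1]  # assumed PMF, for illustration only

lengths = [math.ceil(math.log2(1 / pi)) for pi in p]   # l_i* = ceil(log2(1/p_i))
H = -sum(pi * math.log2(pi) for pi in p)               # entropy H(X)
L = sum(pi * li for pi, li in zip(p, lengths))         # expected length L*

print(lengths)                          # [2, 2, 3, 4]
print(f"H(X) = {H:.3f}, L* = {L:.3f}")  # H(X) = 1.846, L* = 2.400
# These lengths satisfy the Kraft inequality (0.25 + 0.25 + 0.125 + 0.0625 <= 1),
# and H(X) <= L* <= H(X) + 1, as claimed above.
```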

Optimal Source Codes

• Notice that if we are to encode not one symbol but n symbols in total, (X1, X2, . . . , Xn), then we can define the expected length of the entire sequence as

  E[L(X1, X2, . . . , Xn)] = Σ_{x1} · · · Σ_{xn} p(x1, . . . , xn) l(x1, . . . , xn).

• We can follow similar steps as before to show that

  H(X1, X2, . . . , Xn) ≤ E[L(X1, X2, . . . , Xn)] ≤ H(X1, X2, . . . , Xn) + 1.

• If X1, X2, . . . , Xn are i.i.d., then H(X1, X2, . . . , Xn) = Σ_i H(Xi) = nH(X). So the expected number of bits per symbol becomes

  H(X) ≤ E[L(X1, X2, . . . , Xn)] / n ≤ H(X) + 1/n.

• That is to say, for long sequences of symbols the optimal number of bits per symbol is determined by the entropy, with a very tight bound.
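For a concrete feel (an illustrative example, not from the slides): an i.i.d. Bernoulli(0.1) binary source has H(X) ≈ 0.469 bits per symbol, yet any code that encodes each symbol separately must spend at least 1 bit per symbol. Encoding blocks of n = 100 symbols jointly guarantees an average of at most H(X) + 1/100 ≈ 0.479 bits per source symbol.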


Optimality is OK, but Where is the Source Code?

• This is actually known as the Shannon source coding scheme. It is useful in identifying the optimality conditions and bounds and in determining the optimal codeword lengths, but it does not help us construct the codewords.

• It is all about the lengths of the codewords and not about the codewords themselves.

• How can we construct optimal source codes given a PMF?

First… Properties of An Optimal Code
• Without loss of generality, we can assume that the probability masses are ordered, so that p1 ≥ p2 ≥ · · · ≥ pm.

• Recall that a code is optimal if Σ_i pi li is minimal.

• Then for any distribution, there exists an optimal instantaneous code (with minimum expected length) that satisfies the following properties:

1) The lengths are ordered inversely with the probabilities (i.e., if pj > pk, then lj ≤ lk).

2) The two longest codewords have the same length.

3) Two of the longest codewords differ only in the last bit and correspond to the two least likely symbols.

Huffman Codes
• Huffman proposed a method for generating an optimal source code satisfying these conditions. The procedure for binary codes is as follows (a minimal Python sketch is given after the list):

1) Sort all symbols and probabilities in descending order of probability. If multiple probabilities are the same, it does not matter in which order they are written.

2) Merge the two symbols with the lowest probabilities into a new symbol and sum up the probabilities. Re-sort the new probabilities in descending order. Label one branch with bit 1 and the other with bit 0.

3) Repeat step 2 with the new probabilities. (Be careful and be consistent with your ordering and labeling throughout.)

4) When you sum up to a single symbol with probability 1, trace back the branches to come up with the codeword assignments.

• If we consider D-ary codewords, in each step we combine D probabilities. Notice that if D > 2, in order to have D probabilities to combine at each stage, you may need to add some zero-probability outcomes at the beginning.
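A minimal binary Huffman sketch (one possible implementation of the procedure above, not the lecture's own code; function and variable names are illustrative). Ties between equal probabilities are broken arbitrarily by the heap, which is one source of the non-uniqueness discussed in the examples below.

```python
import heapq
import itertools

def huffman(pmf):
    """Binary Huffman code for a dict {symbol: probability} (sketch)."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(prob, next(counter), {sym: ""}) for sym, prob in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # the two least likely "supersymbols"
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}        # label one branch 0
        merged.update({s: "1" + w for s, w in c1.items()})  # and the other 1
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

pmf = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}  # the example used below
print(huffman(pmf))
# One optimal code, e.g. {3: '00', 1: '01', 2: '10', 4: '110', 5: '111'};
# any tie-breaking gives the same expected length, L(C) = 2.3 bits.
```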


Huffman Codes

• Example: X ∈ {1, 2, 3, 4, 5} with probabilities 0.25, 0.25, 0.2, 0.15, 0.15 and D = {0, 1}.

  Codewords   X   Probability (successive merges)
  01          1   0.25    0.3     0.45    0.55    1
  10          2   0.25    0.25    0.3     0.45
  11          3   0.2     0.25    0.25
  000         4   0.15    0.2
  001         5   0.15

  This code has average length L(C) = 2.3 bits.

• Example: X ∈ {1, 2, 3, 4} with probabilities 0.5, 0.25, 0.125, 0.125 and D = {0, 1}.

  Codewords   X   Probability (successive merges)
  0           1   0.5      0.5     0.5     1
  10          2   0.25     0.25    0.5
  110         3   0.125    0.25
  111         4   0.125

• L(C) = H(X) = 1.75. Why? (Each pi is a negative integer power of 2, so the optimal lengths li* = −log2 pi are integers and the Huffman code meets the entropy exactly.)

• Huffman codes are not necessarily unique. For the same source, relabeling the branches yields the equally good code C(1) = 1, C(2) = 01, C(3) = 000, C(4) = 001.

• Example: X ∈ {1, 2, 3, 4} with probabilities 1/3, 1/3, 1/4, 1/12 and D = {0, 1}. Depending on how ties among equal probabilities are broken during the merges, the procedure can yield the code {1, 00, 010, 011} or the code {00, 01, 10, 11}. Both are optimal with L(C) = 2 bits, while H(X) ≈ 1.85 < L(C); again, Huffman codes are not necessarily unique.
Non-binary Huffman Codes (D > 2)

• Example: Consider a ternary code (D = {0, 1, 2}) for the same random variable as in the first example above (probabilities 0.25, 0.25, 0.2, 0.15, 0.15). Now we combine the three least likely symbols into one supersymbol and obtain the following table:

  Codeword   X   Probability (successive merges)
  1          1   0.25    0.5     1
  2          2   0.25    0.25
  00         3   0.2     0.25
  01         4   0.15
  02         5   0.15

  This code has an average length of 1.5 ternary digits.

• If D ≥ 3, we may not have a sufficient number of symbols so that we can combine them D at a time. In such a case, we add dummy symbols to the end of the set of symbols. The dummy symbols have probability 0 and are inserted to fill the tree. Since at each stage of the reduction the number of symbols is reduced by D − 1, we want the total number of symbols to be 1 + k(D − 1), where k is the number of merges. Hence, we add enough dummy symbols so that the total number of symbols is of this form. (The worked example is omitted here; the resulting ternary code has an average length of 1.7 ternary digits.)

• Notice that for D-ary codewords, H_D(X) ≤ L* ≤ H_D(X) + 1, where H_D(X) = −Σ_i pi log_D pi.
Summary

• We have learned the limits of data compression and how to construct optimum (in the expected codeword length sense) source codes.

• The entropy serves as the lower limit on the possible average number of bits (or digits) per symbol. You cannot breach the entropy wall in compressing data (on average).

• We have also seen an elegant method of constructing optimal codes: Huffman codes.

• Next, we will move to the channel side and see fundamental limits on channel capacity.
