
Information Theory

Lecture 7: Rate Distortion Theory

Basak Guler

1 / 19
Rate Distortion Theory
• So far in the course we have dealt with compressing a (discrete)
source as well as transmitting a (discrete or continuous) signal over a
channel reliably.

• In the “source coding” part our assumption was that we should be able
to reconstruct the original source from its compressed representation
perfectly. In other words, the compression was lossless.

2 / 19
Rate Distortion Theory
• Rate distortion theory deals with compression (representation) of a
(sequence of) random variables so that the reconstruction is not
perfect but is as good as required with respect to a fidelity criterion.

• This comes naturally for random variables with continuous alphabets, since they are impossible to represent perfectly with finite precision. The same framework can also be applied to discrete alphabets.

• Intuitively, the motivation is to compress a source further by tolerating some loss/distortion during reconstruction, i.e., lossy compression.

3 / 19
Lossless Source Coding

• Let (X_1, . . . , X_n) be a sequence of n i.i.d. realizations of X.

• Now consider an encoding scheme that maps each (X_1, . . . , X_n) sequence to an index from {1, . . . , 2^{nR}}.

• At the end, the decoder makes an estimate X̂^n of X^n.

[Block diagram: source symbols X^n = (X_1, . . . , X_n) → Encoder → f(X^n) ∈ {1, . . . , 2^{nR}} → Decoder → X̂^n]
4 / 19
Lossless Source Coding
• Recall that the amount of information contained in a source X is equal
to its entropy H(X ).

• In Lecture 2 (application of AEP to data compression), we saw that if R > H(X), then lossless reconstruction is possible (error-free decoding as n → ∞).

• This is because there are about 2^{nH(X)} typical sequences.

• If R > H(X), we can assign each typical sequence to a unique index.

• As n → ∞, the total probability of all “atypical” sequences goes to 0, so the reconstruction error goes to zero.

• In lossless source coding, we must have R > H(X) to guarantee P[X^n ≠ X̂^n] ≤ ε as n → ∞ (a small numerical sketch follows this slide).

5 / 19
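The following small numerical sketch (not from the lecture; the parameter values are illustrative) counts the weakly ε-typical sequences of a Bernoulli(p) source and compares the count with 2^{nH(X)}, the approximation used above.

```python
from math import log2, comb

# Illustrative parameters (not from the lecture)
p, n, eps = 0.2, 20, 0.1
H = -p * log2(p) - (1 - p) * log2(1 - p)       # entropy H(X) in bits

# A binary sequence with k ones has probability p^k (1-p)^(n-k);
# it is eps-typical if |-(1/n) log2 p(x^n) - H| <= eps.
typical = 0
for k in range(n + 1):
    logprob = k * log2(p) + (n - k) * log2(1 - p)
    if abs(-logprob / n - H) <= eps:
        typical += comb(n, k)                  # all such sequences are equiprobable

print(f"H(X) = {H:.3f} bits, 2^(nH) = {2 ** (n * H):.3e}")
print(f"number of eps-typical sequences = {typical} (~{typical:.3e})")
```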
Lossy Source Coding
• In lossy compression (rate distortion theory):

1) We have a continuous or discrete random variable X.

2) We represent the source with a rate R < H(X), i.e., using R bits per source symbol on average.

3) We have a fidelity criterion known as a distortion measure.

• Source symbols are from an alphabet x ∈ X, and reconstructed symbols are from an alphabet x̂ ∈ X̂.

• We may have X = X̂ , but they could be different in general.

6 / 19
Lossy Source Coding
• Reconstruction performance is determined by a distortion measure.

• Definition 47. A distortion measure is a mapping

d(x, x̂) : X × X̂ → [0, ∞) (1)

that represents the cost (loss) of representing x by x̂.

• The distortion measure is bounded:

d_max = max_{x∈X, x̂∈X̂} d(x, x̂) < ∞ (2)

7 / 19
Hamming Distortion
• Example 28. A widely used distortion measure is the Hamming distortion (probability of error):

d(x, x̂) = 0 if x = x̂, and 1 if x ≠ x̂ (3)

Then, the expected distortion E[d(X, X̂)] = P[X ≠ X̂] = P_e represents the probability of error.

• Used mainly for discrete sources, e.g., a binary source where X is a Bernoulli random variable (a short numerical sketch follows this slide).

8 / 19
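A minimal sketch (the "always reconstruct 0" decoder and the parameter values are illustrative assumptions, not a scheme from the lecture) showing that the expected Hamming distortion is the probability of error.

```python
import numpy as np

def hamming_distortion(x, xhat):
    # d(x, xhat) = 0 if x == xhat, 1 otherwise
    return 0 if x == xhat else 1

rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(1, p, size=100_000)     # Bernoulli(p) source symbols
xhat = np.zeros_like(x)                  # trivial reconstruction: always guess 0

d = np.mean([hamming_distortion(a, b) for a, b in zip(x, xhat)])
print(f"empirical E[d(X, Xhat)] = {d:.3f}   (close to P[X != Xhat] = p = {p})")
```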
Squared Error Distortion
• Example 29. Another widely used distortion measure is the squared error distortion:

d(x, x̂) = (x − x̂)^2 (4)

Then, the expected distortion E[d(X, X̂)] = E[(X − X̂)^2] represents the mean-squared error.

• Used mainly for continuous sources, e.g., when X is a Gaussian source (see the sketch after this slide).

9 / 19
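A similar illustrative sketch for the squared error distortion of a Gaussian source; reconstructing every symbol by the sample mean is just a stand-in decoder used to produce an expected distortion, not a scheme from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)   # X ~ N(0, 1)
xhat = np.full_like(x, x.mean())                   # reconstruct every symbol by the sample mean

mse = np.mean((x - xhat) ** 2)                     # E[(X - Xhat)^2]
print(f"empirical mean-squared error = {mse:.3f}   (about Var(X) = 1)")
```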
Distortion Between Sequences
• We define the distortion between two sequences as:

d(x^n, x̂^n) = (1/n) Σ_{i=1}^{n} d(x_i, x̂_i) (5)

• The distortion is the average of the per-symbol distortion measure over all n elements of the sequences.

• This is not the only possible way to define the distortion between sequences of symbols (for instance, we could look at the whole sequence instead of symbol by symbol), but it is simple and easy to analyze (a short sketch follows this slide).

10 / 19
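A short sketch of Eq. (5), with illustrative helper names: the distortion between sequences is the average of the per-symbol distortions.

```python
def sequence_distortion(xn, xhatn, per_symbol_d):
    # Eq. (5): d(x^n, xhat^n) = (1/n) * sum_i d(x_i, xhat_i)
    assert len(xn) == len(xhatn)
    return sum(per_symbol_d(a, b) for a, b in zip(xn, xhatn)) / len(xn)

hamming = lambda a, b: 0 if a == b else 1
squared = lambda a, b: (a - b) ** 2

xn    = [0, 1, 1, 0, 1]
xhatn = [0, 1, 0, 0, 1]
print(sequence_distortion(xn, xhatn, hamming))   # 0.2 (one mismatch out of five)
print(sequence_distortion(xn, xhatn, squared))   # also 0.2 for 0/1-valued symbols
```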
Rate Distortion Code
• A (2^{nR}, n) rate distortion code consists of:

1. An encoding function f_n:

f_n : X^n → {1, . . . , 2^{nR}} (6)

2. A decoding function g_n:

g_n : {1, . . . , 2^{nR}} → X̂^n (7)

3. The distortion associated with the code:

D = E[d(X^n, g_n(f_n(X^n)))] = Σ_{x^n} p(x^n) d(x^n, g_n(f_n(x^n))) (8)

(A toy example of such a code is sketched after this slide.)

11 / 19
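The following toy example mirrors the structure of a (2^{nR}, n) rate distortion code under Hamming distortion, assuming a randomly drawn codebook and a minimum-distortion encoder; it is only a sketch of the definitions above, not an optimal code.

```python
import numpy as np

rng = np.random.default_rng(2)
n, R, p = 10, 0.5, 0.3
num_codewords = int(2 ** (n * R))                          # 2^{nR} indices

# Codebook g_n(1), ..., g_n(2^{nR}): here drawn uniformly at random (an assumption)
codebook = rng.binomial(1, 0.5, size=(num_codewords, n))

def encode(xn):
    # f_n: map x^n to the index of the codeword with the smallest Hamming distortion
    distortions = np.mean(codebook != xn, axis=1)
    return int(np.argmin(distortions))

def decode(index):
    # g_n: look the index up in the codebook
    return codebook[index]

xn = rng.binomial(1, p, size=n)
xhatn = decode(encode(xn))
print("x^n    =", xn)
print("xhat^n =", xhatn)
print("d(x^n, xhat^n) =", np.mean(xn != xhatn))
```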
Rate Distortion Code
• We can think of X̂^n = g_n(f_n(X^n)) as a quantized version of X^n.

• The regions denoted by f_n^{-1}(1), . . . , f_n^{-1}(2^{nR}) each map a set of X^n sequences to a specific index, which is then mapped to a specific X̂^n sequence.

• In other words, we partition the set of all X^n sequences into non-overlapping groups, and represent each group with a unique index (corresponding to a specific reconstruction X̂^n).

• At the end, each X^n is replaced by its quantized version X̂^n.

12 / 19
Rate Distortion Code
• We can think of X̂^n = g_n(f_n(X^n)) as a quantized version of X^n.

[Figure: the space of X^n sequences is partitioned into regions f_n^{-1}(1), . . . , f_n^{-1}(2^{nR}), each represented by a reconstruction point X̂^n(1), X̂^n(2), . . . , X̂^n(2^{nR}).]

• The regions f_n^{-1}(1), . . . , f_n^{-1}(2^{nR}) each map a group of X^n sequences to a specific index, which is then mapped to a specific X̂^n sequence.

• In other words, we partition the set of all X^n sequences into non-overlapping groups, and represent each group with a unique index (corresponding to a specific reconstruction X̂^n).

• At the end, each X^n is replaced by its quantized version X̂^n, and g_n(1) = X̂^n(1), . . . , g_n(2^{nR}) = X̂^n(2^{nR}) is called a codebook.
13 / 19
Rate Distortion Pair
• A rate distortion pair (R, D) is said to be achievable if there exists a sequence of (2^{nR}, n) codes (f_n, g_n) with lim_{n→∞} E[d(X^n, X̂^n)] ≤ D.

• X̂^n = g_n(f_n(X^n)) is called the reconstruction (also the quantized version, source code, or estimate) of X^n.

14 / 19
Rate Distortion Region
• The rate distortion region for a source is the closure of the set of all achievable rate distortion pairs (R, D).

15 / 19
Rate Distortion Function
• The rate distortion function R(D) is the infimum of rates R such that (R, D) is in the rate distortion region of the source for a given distortion D.

[Figure: sketch of the rate distortion function R(D) plotted against the distortion D.]
• For a given distortion D, R(D) quantifies the minimum rate necessary.

16 / 19
Rate Distortion Theorem
As before, we will have 3 ingredients. Recall that:

• In lossless compression:
1. How many bits are needed to represent the source: H
2. Achievability: index typical sequences, giving E[l(X^n)/n] ≈ H + ε
3. Converse: solving for the optimal “decodable” codeword length gives H ≤ E[l(X)].

• In channel capacity:
1. How many bits can we transmit through the channel reliably per channel use: C = max_{p(x)} I(X; Y).
2. Achievability: random coding arguments. If R < C → P_e ≤ ε.
3. Converse: Fano's inequality, Data Processing Inequality (DPI). If R > C → P_e > ε.

17 / 19
Rate Distortion Theorem
• In rate distortion theory:
1. How many bits do we need to represent the source X to have a distortion D: R(D)

Definition (Information rate distortion function). Given a source X with PMF p(x), a distortion measure d(x, x̂), and an amount of “tolerable” distortion D, the rate distortion function is equal to:

R(D) = min_{p(x̂|x) : E[d(X,X̂)] ≤ D} I(X; X̂) (9)

= min_{p(x̂|x) : Σ_{x,x̂} p(x) p(x̂|x) d(x,x̂) ≤ D} I(X; X̂) (10)

Solve for p(x̂|x) to minimize I(X; X̂) (a brute-force numerical sketch follows this slide).


2. Achievability
3. Converse

18 / 19
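As an illustration of the minimization in Eqs. (9)-(10), the sketch below brute-forces R(D) for a Bernoulli(p) source under Hamming distortion by searching a grid of test channels p(x̂ | x). This is a naive stand-in for proper methods (e.g., the Blahut-Arimoto algorithm); the parameter values are illustrative.

```python
import numpy as np

def mutual_information(px, channel):
    # I(X; Xhat) in bits, with channel[x, xhat] = p(xhat | x)
    joint = px[:, None] * channel
    pxhat = joint.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(joint / (px[:, None] * pxhat[None, :]))
    return np.nansum(terms)                       # 0 * log 0 treated as 0

def rate_distortion_numeric(p=0.3, D=0.1, grid=101):
    px = np.array([1 - p, p])                     # P[X = 0], P[X = 1]
    best = np.inf
    for a in np.linspace(0, 1, grid):             # p(xhat = 1 | x = 0)
        for b in np.linspace(0, 1, grid):         # p(xhat = 1 | x = 1)
            channel = np.array([[1 - a, a], [1 - b, b]])
            Ed = px[0] * a + px[1] * (1 - b)      # E[d] = P[X != Xhat] under Hamming distortion
            if Ed <= D:
                best = min(best, mutual_information(px, channel))
    return best

print(rate_distortion_numeric())   # roughly H(0.3) - H(0.1), about 0.41 bits
```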
Binary (Bernoulli) source with Hamming Distortion
• Example 30. The rate distortion function of a Bernoulli source X with PMF p(x):

p(x) = p if x = 1, and 1 − p if x = 0 (11)

where p ≤ 1/2, and Hamming distortion

d(x, x̂) = 0 if x = x̂, and 1 if x ≠ x̂ (12)

is, for distortion levels 0 ≤ D ≤ p, given by:

R(D) = H(p) − H(D) (13)

(and R(D) = 0 for D > p). A numerical sketch follows this slide.


• Note that for D ∈ [0, 1/2], H(D) increases as D increases.
• D = 0 corresponds to lossless source coding/compression.
• Hence, the number of bits necessary decreases as the distortion D increases.
19 / 19
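A short sketch of Eq. (13): evaluating R(D) = H(p) − H(D) for a Bernoulli(p) source with p ≤ 1/2 (the clamp to zero for D ≥ p reflects that the trivial reconstruction X̂ = 0 already meets such a distortion target). For p = 0.3 and D = 0.1, the value should roughly match the brute-force sketch above.

```python
from math import log2

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def rate_distortion_bernoulli(p, D):
    # Eq. (13) for 0 <= D <= p (with p <= 1/2); zero rate suffices once D >= p
    if D >= p:
        return 0.0
    return binary_entropy(p) - binary_entropy(D)

for D in (0.0, 0.05, 0.1, 0.2, 0.3):
    print(f"D = {D:.2f}  ->  R(D) = {rate_distortion_bernoulli(0.3, D):.3f} bits/symbol")
```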
