Information Theory and Coding (Lecture 1) : Dr. Farman Ullah
What is Information Theory about?
It is a mathematical theory of information.
Information is usually obtained by getting some "messages" (speech, text,
images, etc.) from others.
When obtaining information from a message, you may care about:
What is the meaning of a message?
How important is the message?
How much information can I get from the message?
Claude E. Shannon
(1916–2001)
Course Overview
Information theory: a mathematical theory of communication
Origin of Information Theory
Shannon's 1948 paper is generally considered the "birth" of information theory and of (modern) digital communication.
It made clear that information theory is about the quantification of information.
In particular, it focuses on characterizing the necessary and sufficient conditions under which a destination terminal can reproduce a message generated by a source terminal.
General Overview of Information Theory
1 Stochastic modeling
It is a unified theory based on stochastic modeling (information source, noisy channel, etc.).
Fundamental Theorems
[Block diagram: Source (entropy H) → Source coder (rate R) → Channel coder → Channel (channel capacity C).]
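As a one-line summary of what the fundamental theorems say about the quantities in this diagram (a standard statement added here for orientation, not reproduced from the slide):

% Source coding: a source of entropy H can be represented losslessly
% at any rate R > H bits/symbol, but at no rate R < H.
% Channel coding: any rate R < C can be sent over the channel with
% arbitrarily small error probability, but no rate R > C can.
\[
  H \;<\; R \;<\; C
  \quad\Longrightarrow\quad
  \text{the source can be communicated reliably over the channel.}
\]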
This will include the following topics: theory of compression and coding of
image and speech signals based on Shannon's information theory;
introduction to information theory (entropy, etc.); characteristics of sources
such as voice and image; sampling theorem; methods and properties of
lossless and lossy coding; vector quantization; transform coding; and
subband coding.
What is Channel Coding Theory?
The Markov Source
[State diagram over the states a, b, c with transition probabilities on the edges; the edge labels visible in the original figure are 0.7, 0.5, 1.0, 0.3, 0.2 and 0.3.]
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0
The Markov Source
So, if Xk+1 = b, we know that Xk+2 will equal c:
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1
[Same state diagram as above.]
If all the states can be reached, the
stationary probabilities for the states can be
calculated from the given transition
probabilities.
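As an illustration, a minimal sketch of how the stationary probabilities can be computed numerically. The rows for states a and b follow the diagram above; the row for state c is an assumption added only to make the example complete.

import numpy as np

# Transition matrix P[i][j] = P(X_{k+1} = j | X_k = i), states ordered (a, b, c).
# Rows for a and b follow the diagram above; the row for c is assumed
# for illustration (it is not fully readable from the slide).
P = np.array([
    [0.3, 0.7, 0.0],   # from a
    [0.0, 0.0, 1.0],   # from b
    [0.5, 0.2, 0.3],   # from c (assumed)
])

# The stationary distribution w satisfies w P = w and sum(w) = 1.
# Solve (P^T - I) w = 0 together with the normalization constraint.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
w, *_ = np.linalg.lstsq(A, b, rcond=None)

print(dict(zip("abc", np.round(w, 4))))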
Markov models can be used to represent sources with dependencies more than one step back.
Use a state diagram with several symbols in each state, as in the sketch below.
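For instance, a source where each symbol depends on the two previous ones can be modelled with states that hold the last two symbols. A minimal sketch of such a state table (the alphabet {a, b, c} matches the earlier diagram, but every probability below is made up purely for illustration):

# Order-2 Markov model: each state is the pair of the two most recent
# symbols, and each state has its own distribution over the next symbol.
# All probabilities here are illustrative assumptions.
model = {
    ("a", "a"): {"a": 0.1, "b": 0.9, "c": 0.0},
    ("a", "b"): {"a": 0.0, "b": 0.0, "c": 1.0},
    ("b", "c"): {"a": 0.5, "b": 0.2, "c": 0.3},
    # ... one entry for every reachable pair of symbols
}

# Reading off a transition probability, e.g. P(X_{k+1} = c | X_{k-1} = a, X_k = b):
print(model[("a", "b")]["c"])   # 1.0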
Analysis and Synthesis
Stochastic models can be used for
analysing a source.
Find a model that well represents the real-world
source, and then analyse the model instead of
the real world.
Stochastic models can be used for
synthesizing a source.
Use a random number generator in each step of
a Markov model to generate a sequence
simulating the source.
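A minimal sketch of such a synthesis step, reusing the three-state chain from the earlier example (the row for state c is again an assumed placeholder):

import random

# Transition probabilities P(next | current) for states a, b, c.
# Rows for a and b follow the earlier diagram; the row for c is assumed.
transitions = {
    "a": {"a": 0.3, "b": 0.7, "c": 0.0},
    "b": {"a": 0.0, "b": 0.0, "c": 1.0},
    "c": {"a": 0.5, "b": 0.2, "c": 0.3},   # assumed
}

def synthesize(start, length):
    """Generate a symbol sequence by a random walk through the Markov model."""
    state, out = start, [start]
    for _ in range(length - 1):
        dist = transitions[state]
        state = random.choices(list(dist), weights=dist.values(), k=1)[0]
        out.append(state)
    return "".join(out)

print(synthesize("a", 20))   # a random sequence simulating the source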
Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of
a coin. How much information do we receive when
we are told that the outcome is heads?
If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say
that the amount of information is 1 bit.
If we already know that it will be (or was) heads, i.e.,
P(heads) = 1, the amount of information is zero!
If the coin is not fair, e.g., P(heads) = 0.9, the amount of
information is more than zero but less than one bit!
Intuitively, the amount of information received is the
same if P(heads) = 0.9 or P(heads) = 0.1.
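To make these numbers concrete, here is the arithmetic (it uses the self-information formula introduced on the next slide, so it slightly anticipates it):

% Information in the outcome "heads", in bits (2-log):
%   P(heads) = 1:    -log2(1)   = 0 bits
%   P(heads) = 0.5:  -log2(0.5) = 1 bit
%   P(heads) = 0.9:  -log2(0.9) ≈ 0.152 bits   (between zero and one bit)
% One way to read the last statement above: the average uncertainty of
% the coin is symmetric in p and 1 - p,
\[
  H(0.9) = H(0.1) = -0.9 \log_2 0.9 - 0.1 \log_2 0.1 \approx 0.469\ \text{bits}.
\]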
Self Information
So, let's look at it the way Shannon did.
Assume a memoryless source with
alphabet A = (a1, ..., an)
symbol probabilities (p1, ..., pn).
How much information do we get when
finding out that the next symbol is ai?
According to Shannon, the self-information of ai is
I(ai) = -log pi = log(1/pi)
Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.
The probability that both occur is pA · pB, and we want the information obtained from observing both to be the sum of the two individual informations. The logarithm turns this product into a sum, as the short derivation below shows.
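In symbols, with I(x) = -log P(x) as defined above:

\[
  I(A \text{ and } B) = -\log\big(p_A\, p_B\big)
  = -\log p_A - \log p_B = I(A) + I(B).
\]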
Which logarithm? Pick the one you like! If you pick the natural log,
you'll measure in nats, if you pick the 10-log, you'll get Hartleys,
if you pick the 2-log (like everyone else), you'll get bits.
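A small sketch of the same point in code, computing the self-information of one outcome in all three units (the probability values are just examples):

import math

def self_information(p, base=2):
    """Self-information -log_base(p) of an outcome with probability p."""
    return -math.log(p, base)

for p in (0.5, 0.9, 0.1):
    bits = self_information(p, 2)          # base 2  -> bits
    nats = self_information(p, math.e)     # base e  -> nats
    hartleys = self_information(p, 10)     # base 10 -> Hartleys
    print(f"p={p}: {bits:.3f} bits = {nats:.3f} nats = {hartleys:.3f} Hartleys")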
Self Information
[Plot: the average information of a binary source as a function of the symbol probability p, for 0 ≤ p ≤ 1; the curve rises from 0, peaks at 1 bit, and falls back to 0.]
The uncertainty (information) is greatest when p = 0.5, i.e., when the two outcomes are equally likely.
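The plotted curve is presumably the binary entropy function; for completeness, its standard form and maximum are:

\[
  h(p) = -p\log_2 p - (1-p)\log_2(1-p),
  \qquad \max_{0\le p\le 1} h(p) = h(0.5) = 1\ \text{bit}.
\]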
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached
when all symbols are equiprobable, i.e.,
pi = 1/N.
3. The difference log N - H is called the
redundancy of the source.
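A minimal sketch that checks these properties numerically (the two example distributions are made up for illustration):

import math

def entropy(probs, base=2):
    """Entropy H = -sum(p * log(p)) of a discrete distribution (0 * log 0 = 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

N = 4
uniform = [1 / N] * N          # equiprobable symbols
skewed  = [0.7, 0.2, 0.05, 0.05]

H_max = math.log2(N)           # log N with the 2-log: 2 bits for N = 4
print(entropy(uniform))                 # 2.0 -> maximum entropy, redundancy 0
print(entropy(skewed))                  # ~1.26 bits
print(H_max - entropy(skewed))          # redundancy of the skewed source, ~0.74 bits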
Part 4: Entropy for Memory Sources
The entropy of a block of n symbols from a source with memory is
H(X1, ..., Xn) = - Σ p(x1, ..., xn) log p(x1, ..., xn).
That is, the summation is done over all possible combinations of n symbols.
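A small sketch of this definition for n = 2, using the Markov chain from the earlier examples (the stationary distribution and the assumed row for state c carry over from that sketch, so the numbers are illustrative only):

import math
import numpy as np

# Transition matrix and stationary distribution from the earlier sketch
# (the row for state c is still an assumption made for illustration).
P = np.array([[0.3, 0.7, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.2, 0.3]])
A = np.vstack([P.T - np.eye(3), np.ones(3)])
w, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)

# Block entropy for n = 2: sum over all pairs (x1, x2) of
# -p(x1, x2) * log2 p(x1, x2), with p(x1, x2) = w(x1) * P(x2 | x1).
H2 = 0.0
for i in range(3):
    for j in range(3):
        p = w[i] * P[i, j]
        if p > 0:
            H2 -= p * math.log2(p)

print(f"H(X1, X2) = {H2:.3f} bits for the two-symbol block")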