ITC Module 1
Information Measures
Basic terms:
Sample space: A set of all possible outcomes.
Mutually Exclusive Events: Events that are mutually exclusive cannot occur at the
same time.
Example: Head and tail can't occur simultaneously after tossing a coin.
Exhaustive Events: A set of events is exhaustive when it covers the entire sample space, i.e., the union of all the events equals the sample space.
Probability: A measure of the likelihood of occurrence of an event, with $0 \le p \le 1$.
Mathematically,
$$p(A) = \lim_{n \to \infty} \frac{n_A}{n}$$
where $n_A$ is the number of occurrences of event $A$ in $n$ trials.
$$p(AB) = p(A)\cdot p(B|A) \quad \cdots (1)$$
where $p(B|A)$ is the probability of occurrence of event $B$ on the condition that event $A$ has already occurred.
Note: This relation, together with its counterpart $p(AB) = p(B)\cdot p(A|B)$, gives Bayes' theorem of conditional probability.
Note that when the two events $A$ and $B$ are independent,
$$p(AB) = p(A)\cdot p(B)$$
For continuous random variables $X$ and $Y$ with joint pdf $p(x, y)$:
$$p(X) = \int p(x, y)\,dy, \qquad p(Y) = \int p(x, y)\,dx \qquad \text{(marginal probabilities)}$$
$$p(X, Y) = \iint p(x, y)\,dx\,dy \qquad \text{(joint probability)}$$
$$p(X|Y) = \frac{p(X, Y)}{p(Y)}, \qquad p(Y|X) = \frac{p(X, Y)}{p(X)} \qquad \text{(conditional probabilities)}$$
Note: The pdf satisfies the conditions
1-D case: $p(x) \ge 0$ and $\int p(x)\,dx = 1$
2-D case: $p(x, y) \ge 0$ and $\iint p(x, y)\,dx\,dy = 1$
For discrete random variables (PMF), mathematically:
$$p(X) = \{p(x_i)\}, \qquad p(Y) = \{p(y_j)\} \qquad \text{(marginal probabilities)}$$
$$p(X, Y) = \{p(x_i, y_j)\} \qquad \text{(joint probabilities)}$$
$$p(X|Y) = \{p(x_i|y_j)\}, \qquad p(Y|X) = \{p(y_j|x_i)\}, \quad \text{where } p(x_i|y_j) = \frac{p(x_i, y_j)}{p(y_j)},\ \ p(y_j|x_i) = \frac{p(x_i, y_j)}{p(x_i)} \qquad \text{(conditional probabilities)}$$
Note: These satisfy
i) $p(x_i, y_j) \ge 0$
ii) $\sum_i \sum_j p(x_i, y_j) = 1$
vi) $\sigma_X^2 = E[X^2] - \big(E[X]\big)^2$ (Variance)
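These marginal and conditional relations can be checked numerically. Below is a minimal Python sketch (not from the notes; the joint PMF values are purely illustrative):

```python
import numpy as np

# Illustrative joint PMF p(x_i, y_j) as a matrix: rows index x, columns index y.
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

assert np.all(p_xy >= 0) and np.isclose(p_xy.sum(), 1.0)  # conditions i) and ii)

p_x = p_xy.sum(axis=1)             # marginal p(x_i) = sum_j p(x_i, y_j)
p_y = p_xy.sum(axis=0)             # marginal p(y_j) = sum_i p(x_i, y_j)
p_x_given_y = p_xy / p_y           # conditional p(x_i | y_j) = p(x_i, y_j) / p(y_j)
p_y_given_x = p_xy / p_x[:, None]  # conditional p(y_j | x_i) = p(x_i, y_j) / p(x_i)

print(p_x, p_y)
print(p_x_given_y)
print(p_y_given_x)
```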
Block Diagram of a Basic Communication System
Introduction to Information Theory
Information Theory: A mathematical approach to the study of the coding, quantification, storage, and communication of information.
Uncertainty ∝ Surprise
Example: Consider a source transmitting the symbols 0 or 1 to a receiver (RX) to represent the weather condition.
[Figure: a source sending 0/1 over a channel to the receiver, with 1-bit and 2-bit symbol representations.]
Discrete Memoryless Source (DMS): A source that emits symbols from a discrete (finite) alphabet, each emitted symbol being statistically independent of the symbols emitted before it.
Information Measure:
The amount of information associated with a message is inversely proportional
to the probability of occurrence of the symbol used to represent the message.
$$\text{Information (or uncertainty, or surprise)} \propto \frac{1}{\text{Probability}}$$
$$\Rightarrow I(X) = \log_2\frac{1}{p(x)} \quad \text{or} \quad I(X) = -\log_2 p(x)\ \ \text{bits}$$
If the logarithmic base is 10, information is measured in "dits", "decits", or "Hartleys".
If the logarithmic base is $e$, information is measured in "nits" or "nats".
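As a quick numerical illustration of these units, here is a small Python sketch (illustrative only; the probability value is arbitrary):

```python
import math

def self_information(p, base=2):
    """I(x) = -log_base p(x); base 2 -> bits, base e -> nats, base 10 -> hartleys."""
    return -math.log(p, base)

p = 0.25
print(self_information(p, 2))        # 2.0 bits
print(self_information(p, math.e))   # ~1.386 nats
print(self_information(p, 10))       # ~0.602 hartleys (dits)
```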
Properties of Information:
1. Information is additive.
That is, if a DMS $X$ emits independent symbols $x_1, \dots, x_n$ with probabilities $p(x_1), \dots, p(x_n)$, then
$$I(X) = I(x_1) + \cdots + I(x_n) = \log_2\frac{1}{p(x_1)} + \cdots + \log_2\frac{1}{p(x_n)}$$
$$\therefore I(X) = \sum_{i=1}^{n}\log_2\frac{1}{p(x_i)} \quad \text{or} \quad I(X) = -\sum_{i=1}^{n}\log_2 p(x_i)\ \ \text{bits}$$
2. If $p(x_i) < p(x_j)$, then $I(x_i) > I(x_j)$.
4. $I(x_i) \ge 0$ for $0 \le p(x_i) \le 1$.
Entropy (or, Average Information) of a Source:
The average information associated with a DMS $X$ consisting of $n$ symbols $x_1, x_2, \dots, x_n$ depends upon the probability of occurrence of each symbol as well as the information conveyed by the respective symbols.
For a DMS $X$ emitting symbols $x_i$ with probabilities $p(x_i)$ and self-information $I(x_i)$:
$$H(X) = \sum_i p(x_i)\,I(x_i)$$
$$\Rightarrow H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i)\ \ \text{bits/symbol}$$
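A minimal Python sketch of this entropy formula (the probability vector is an arbitrary example, not from the notes):

```python
import math

def entropy(probs):
    """H(X) = -sum_i p(x_i) * log2 p(x_i), in bits/symbol; terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```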
Order of a Source and its relation with Entropy:
Second-order extension ($X^2$): for a binary DMS $X$ with the 2 symbols $\{0, 1\}$, there are 4 messages $\{00, 01, 10, 11\}$, and the entropy per message is $H(X^2) = 2\cdot H(X)$.
nth-order source (or, nth-order extension of a source, $X^n$): the messages are blocks of $n$ source symbols ($2^n$ messages for a binary source), and the entropy per message is $H(X^n) = n\cdot H(X)$.
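The relation $H(X^n) = n\cdot H(X)$ can be checked by enumerating all $n$-symbol messages of a DMS. A small Python sketch, assuming an illustrative binary source with $p(0)=0.3$, $p(1)=0.7$:

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = {'0': 0.3, '1': 0.7}                      # illustrative BMS probabilities
n = 3
# nth extension: messages are n-symbol blocks; independence => probabilities multiply
ext = [math.prod(p[s] for s in msg) for msg in product(p, repeat=n)]
print(entropy(ext), n * entropy(p.values()))  # both ~2.644 bits/message
```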
Symbol Rate:
This is defined as the rate at which symbols are generated per second.
$$r_s = \frac{1}{\tau_{avg}}\ \ \text{symbols/second}$$
where, for a DMS $X$ emitting symbols $x_1, \dots, x_n$ with durations $\tau(x_1), \dots, \tau(x_n)$, the average symbol duration is $\tau_{avg} = \sum_i p(x_i)\,\tau(x_i)$.
Note: If two sources $X_1$ (symbols $A_0, A_1, \dots, A_n$) and $X_2$ (symbols $B_0, B_1, \dots, B_n$) have matching symbol probabilities, i.e., $p(A_0) = p(B_0),\ p(A_1) = p(B_1),\ \dots,\ p(A_n) = p(B_n)$, then the entropies of the two sources are the same, i.e., $H(X_1) = H(X_2)$.
Solution: $H(X) = -\sum_i p_i\log_2 p_i$
$= 2\cdot\frac{1}{8}\log_2 8 + 2\cdot\frac{3}{8}\log_2\frac{8}{3} = 1.811$ bits/message
$r = f_s = 2f_m = 2\times 4000 = 8000$ messages/sec
$R = r\cdot H(X) = 8000\times 1.811 \approx 14490$ bits/sec
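The arithmetic of this solution can be reproduced with a short Python sketch (the symbol probabilities $1/8, 1/8, 3/8, 3/8$ and $f_m = 4000$ Hz are taken from the solution above):

```python
import math

probs = [1/8, 1/8, 3/8, 3/8]
H = -sum(p * math.log2(p) for p in probs)   # ~1.811 bits/message
r = 2 * 4000                                # Nyquist rate: 2 * f_m messages/sec
print(H, r * H)                             # ~1.811, ~14490 bits/sec
```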
Solution: $H(X) = -\sum_i p(x_i)\log_2 p(x_i) = 1.846$ bits/message
$r = f_s = 1/(1\,\mu\text{s}) = 1\ \text{M messages/sec}$
$R = r\cdot H(X) = 10^6\times 1.846 \approx 1.8\ \text{Mbits/sec}$
Properties of Entropy:
1. Entropy is always non-negative. That is, for a DMS $X$ with symbols $x_1, \dots, x_n$ and probabilities $p(x_1), \dots, p(x_n)$,
$$H(X) \ge 0$$
In general, $0 \le H(X) \le \log_2 n$.
3. The maximum value of entropy occurs when all the symbols of the source are equally likely.
That is, if $p(x_1) = p(x_2) = \cdots = p(x_n) = \frac{1}{n}$, then $H(X) = H(X)_{max} = \log_2 n$.
Proof: To show $H(X) \le \log_2 n$, i.e., $H(X) - \log_2 n \le 0$:
$$H(X) - \log_2 n = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i) - \sum_{i=1}^{n} p(x_i)\log_2 n \qquad \left(\because \sum_{i} p(x_i) = 1\right)$$
$$= -\sum_{i=1}^{n} p(x_i)\log_2\!\big(n\,p(x_i)\big) = \sum_{i=1}^{n} p(x_i)\log_2\frac{1}{n\,p(x_i)}$$
By the identity $\log_e x \le x - 1$ for $x > 0$:
$$\sum_{i=1}^{n} p(x_i)\,\frac{1}{\log_e 2}\log_e\frac{1}{n\,p(x_i)} \le \frac{1}{\log_e 2}\sum_{i=1}^{n} p(x_i)\!\left(\frac{1}{n\,p(x_i)} - 1\right)$$
$$= \frac{1}{\log_e 2}\!\left(\sum_{i=1}^{n}\frac{1}{n} - \sum_{i=1}^{n} p(x_i)\right) = \frac{1}{\log_e 2}\!\left(\frac{1}{n}\cdot n - 1\right) = 0$$
$$\therefore H(X) - \log_2 n \le 0 \quad \Rightarrow \quad H(X) \le \log_2 n$$
** $H(X) = \log_2 n$ when $\dfrac{1}{n\,p(x_i)} = 1$, or $p(x_i) = \dfrac{1}{n}$.
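The bound $H(X) \le \log_2 n$ and the equality condition can be illustrated numerically; a quick Python sketch with arbitrary example distributions:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))       # ~1.357 < log2(4) = 2
print(entropy([0.25] * n), math.log2(n))   # 2.0 == 2.0 (maximum at equiprobable symbols)
```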
Entropy of a Binary Memoryless Source (BMS):
Let the BMS $X$ emit the symbols $0/1$ with probabilities
$$p \rightarrow x = 0, \qquad 1 - p \rightarrow x = 1$$
$$\therefore H(X) = -p\log_2 p - (1 - p)\log_2(1 - p)$$
For the maximum, set $\dfrac{d}{dp}H(X) = 0$:
$$\Rightarrow -\frac{1}{\ln 2}\left[\ln p + p\cdot\frac{1}{p} + (1 - p)\cdot\frac{1}{1 - p}\cdot(-1) + \ln(1 - p)\cdot(-1)\right] = 0$$
$$\Rightarrow \ln p = \ln(1 - p) \quad \text{or} \quad p = 1 - p \ \Rightarrow\ p = \frac{1}{2}$$
$$\therefore H(X)_{max} = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1\ \text{bit/symbol}$$
In general, for a BMS, $0 \le H(X) \le 1$ bit/symbol, with the maximum of 1 bit/symbol occurring at $p = \frac{1}{2}$.
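The binary entropy function and its maximum at $p = 1/2$ can be tabulated with a short Python sketch (values of $p$ chosen for illustration):

```python
import math

def H_binary(p):
    """Binary entropy: -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(H_binary(p), 3))   # peaks at p = 0.5 with H = 1 bit/symbol
```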
Other related entropies:
Marginal entropies:
For a channel with input $X$ (symbols $x_1, \dots, x_n$) and output $Y$ (symbols $y_1, \dots, y_m$):
$$H(X) = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i), \qquad H(Y) = -\sum_{j=1}^{m} p(y_j)\log_2 p(y_j)$$
Joint entropy:
$$H(X, Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(x_i, y_j)$$
i.e., the average joint information about the source and the receiver.
Conditional entropies:
$$H(X|Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(x_i|y_j)$$
i.e., the average uncertainty about the source that remains after the received symbol is known. Also called equivocation.
$$H(Y|X) = -\sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i, y_j)\log_2 p(y_j|x_i)$$
i.e., the average uncertainty about the received symbol given that a known symbol is transmitted. Also called prevarication.
Relation among entropies:
$$H(X, Y) = -\sum_{i}\sum_{j} p(x_i, y_j)\log_2 p(x_i, y_j)$$
$$= -\sum_{i}\sum_{j} p(x_i, y_j)\log_2\!\big[p(x_i)\,p(y_j|x_i)\big] \qquad (\ast)\ \because p(A, B) = p(A)\cdot p(B|A)$$
Note: When the two events are statistically independent, the joint probability equals the product of the marginal probabilities, i.e., $p(A, B) = p(A)\cdot p(B)$.
$$= -\sum_{i}\sum_{j} p(x_i, y_j)\log_2 p(x_i) - \sum_{i}\sum_{j} p(x_i, y_j)\log_2 p(y_j|x_i)$$
$$= -\sum_{i} p(x_i)\log_2 p(x_i)\sum_{j} p(y_j|x_i) + H(Y|X) \qquad \left(\because \sum_{j} p(y_j|x_i) = 1\right)$$
$$\Rightarrow H(X, Y) = H(X) + H(Y|X)$$
Mutual Information:
It is the measure of the difference between the total information available at the source, $H(X)$, and the uncertainty about the source that remains at the receiver, $H(X|Y)$.
[Diagram: $H(X)$ and $H(Y)$ overlap in $I(X, Y)$; the non-overlapping parts are $H(X|Y)$ and $H(Y|X)$; the union is $H(X, Y)$.]
$$I(X, Y) = H(X) - H(X|Y)$$
Also, $I(Y, X) = H(Y) - H(Y|X)$
Properties of Mutual Information:
1. It is always a non-negative quantity.
i.e., $I(X, Y) \ge 0 \Rightarrow H(X) \ge H(X|Y)$ or $H(Y) \ge H(Y|X)$
2. When $X$ and $Y$ are statistically independent, $I(X, Y) = 0$
$\Rightarrow H(X) = H(X|Y)$, $H(Y) = H(Y|X)$ and $H(X, Y) = H(X) + H(Y)$
[Diagram: $H(X)$ and $H(Y)$ do not overlap; $H(X, Y)$ is their sum.]
3. The mutual information of a channel is always symmetric, i.e., $I(X, Y) = I(Y, X)$.
Proof: $\because H(X, Y) = H(X) + H(Y|X)$
$\Rightarrow H(X) = H(X, Y) - H(Y|X)$
$\because I(X, Y) = H(X) - H(X|Y)$
$\Rightarrow I(X, Y) = H(X, Y) - H(Y|X) - H(X|Y) \quad \cdots (1)$
Similarly, $I(Y, X) = H(Y) - H(Y|X)$
$\Rightarrow I(Y, X) = H(Y, X) - H(X|Y) - H(Y|X) \quad \cdots (2)$
Since the joint entropy is symmetric, $H(X, Y) = H(Y, X)$, the right-hand sides of (1) and (2) are equal,
$\Rightarrow I(X, Y) = I(Y, X)$
Example: Compute $H(X)$, $H(Y)$, $H(X, Y)$, $H(X|Y)$ and $I(X, Y)$ for the joint probability matrix given below. Verify the relationship among these entropies.
$$P(X, Y) = \begin{bmatrix} 0.05 & 0 & 0.2 & 0.05 \\ 0 & 0.1 & 0.1 & 0 \\ 0 & 0 & 0.2 & 0.1 \\ 0.05 & 0.05 & 0 & 0.1 \end{bmatrix}$$
Solution: From the properties of the joint probability matrix, summation along the rows gives the input probabilities, i.e.,
$$p(x_i) = \sum_{j} p(x_i, y_j); \quad \text{for } i = 1, 2, \dots, n$$
We get,
$p(x_1) = 0.05 + 0 + 0.2 + 0.05 = 0.3$
$p(x_2) = 0 + 0.1 + 0.1 + 0 = 0.2$
$p(x_3) = 0 + 0 + 0.2 + 0.1 = 0.3$
$p(x_4) = 0.05 + 0.05 + 0 + 0.1 = 0.2$
$\Rightarrow p(X) = [0.3\ \ 0.2\ \ 0.3\ \ 0.2]$
Similarly, summation along the columns gives the output probabilities, i.e.,
$$p(y_j) = \sum_{i} p(x_i, y_j); \quad \text{for } j = 1, 2, \dots, m$$
We get,
$p(y_1) = 0.05 + 0 + 0 + 0.05 = 0.1$
$p(y_2) = 0 + 0.1 + 0 + 0.05 = 0.15$
$p(y_3) = 0.2 + 0.1 + 0.2 + 0 = 0.5$
$p(y_4) = 0.05 + 0 + 0.1 + 0.1 = 0.25$
$\therefore p(Y) = [0.1\ \ 0.15\ \ 0.5\ \ 0.25]$
Now,
$H(X) = -\sum_i p(x_i)\log_2 p(x_i) = -(2\times 0.3\log_2 0.3 + 2\times 0.2\log_2 0.2) = 1.971$ bits/symbol
$H(Y) = -\sum_j p(y_j)\log_2 p(y_j) = -(0.1\log_2 0.1 + 0.15\log_2 0.15 + 0.5\log_2 0.5 + 0.25\log_2 0.25) = 1.743$ bits/symbol
$H(X, Y) = -\sum_i\sum_j p(x_i, y_j)\log_2 p(x_i, y_j) = 3.122$ bits/symbol
For $H(Y|X) = -\sum_i\sum_j p(x_i, y_j)\log_2 p(y_j|x_i)$, the conditional probabilities are obtained by dividing each row of $P(X, Y)$ by the corresponding $p(x_i)$:
$$P(Y|X) = \frac{P(X, Y)}{p(X)} = \begin{bmatrix} 1/6 & 0 & 2/3 & 1/6 \\ 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 2/3 & 1/3 \\ 1/4 & 1/4 & 0 & 1/2 \end{bmatrix} \ \Rightarrow\ H(Y|X) = 1.151\ \text{bits/symbol}$$
Then,
$H(X|Y) = H(X, Y) - H(Y) = 3.122 - 1.743 = 1.379$ bits/symbol
$I(X, Y) = H(X) - H(X|Y) = 1.971 - 1.379 = 0.592$ bits/symbol
Verification:
$H(Y|X) = H(X, Y) - H(X) = 3.122 - 1.971 = 1.151$ bits/symbol, matching the direct computation above.
$I(X, Y) = H(Y) - H(Y|X) = 1.743 - 1.151 = 0.592$ bits/symbol ✓
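The worked example can also be verified with a short Python sketch using the same joint probability matrix:

```python
import numpy as np

P = np.array([[0.05, 0.00, 0.20, 0.05],
              [0.00, 0.10, 0.10, 0.00],
              [0.00, 0.00, 0.20, 0.10],
              [0.05, 0.05, 0.00, 0.10]])

def H(p):
    p = p[p > 0]                      # drop zero entries (0*log 0 = 0)
    return -np.sum(p * np.log2(p))

p_x, p_y = P.sum(axis=1), P.sum(axis=0)
H_X, H_Y, H_XY = H(p_x), H(p_y), H(P)
print(H_X, H_Y, H_XY)                 # ~1.971, ~1.743, ~3.122
print(H_XY - H_Y, H_XY - H_X)         # H(X|Y) ~1.379, H(Y|X) ~1.151
print(H_X - (H_XY - H_Y))             # I(X,Y) ~0.592
```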
Discrete vs Continuous Information Theory
[Figure: a continuous source $X$ with pdf $p(x)$, $-\infty < x < \infty$, whose output waveform is sampled at instants $t = 0, T, 2T, \dots$]
A discrete information source takes discrete values described by a PMF $p(x)$, whereas a continuous information source takes continuous values, $-\infty < x < \infty$, described by a pdf $p(x)$.
Note: The entropy measures the average amount of information of the random variable. If the variable is continuous, the entropy is unbounded, since in general an infinite number of bits is needed to represent a continuous random variable exactly. In such a case we require an alternative information measure, the differential entropy.
Entropies in the continuous case:
Marginal (differential) entropies:
$$h(X) = -\int_{-\infty}^{\infty} p(x)\log_2 p(x)\,dx, \qquad h(Y) = -\int_{-\infty}^{\infty} p(y)\log_2 p(y)\,dy$$
Conditional entropies:
$$h(X|Y) = -\iint p(x, y)\log_2 p(x|y)\,dx\,dy, \qquad h(Y|X) = -\iint p(x, y)\log_2 p(y|x)\,dx\,dy$$
Differential entropy of a Gaussian source:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-\mu)^2/2\sigma^2} = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2} \qquad (\text{taking } \mu = 0)$$
$$\therefore h(X) = -\int_{-\infty}^{\infty} p(x)\log_e p(x)\,dx$$
$$= -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\,\log_e\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\right) dx$$
$$= -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\left[\log_e\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2}\,\log_e e\right] dx$$
$$= \log_e\!\sqrt{2\pi\sigma^2}\underbrace{\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\,dx}_{=\,1} \;+\; \frac{1}{2\sigma^2}\underbrace{\int_{-\infty}^{\infty} x^2\,\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\,dx}_{=\,\sigma^2} \qquad (\because \log_e e = 1)$$
$$= \log_e\!\sqrt{2\pi\sigma^2} + \frac{1}{2\sigma^2}\cdot\sigma^2 = \log_e\!\sqrt{2\pi\sigma^2} + \frac{1}{2}\log_e e$$
$$\Rightarrow h(X) = \log_e\!\sqrt{2\pi e\sigma^2} = \frac{1}{2}\log_e\!\big(2\pi e\sigma^2\big)\ \ \text{nits/sample}$$
Also, $R = r_s\cdot h = 2f_m\,\log_e\!\sqrt{2\pi e\sigma^2} = f_m\log_e\!\big(2\pi e\sigma^2\big)$ nits/sec
$R = r_s\cdot h = f_m\log_2\!\big(2\pi e\sigma^2\big)$ bits/sec
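The closed-form result $h(X) = \tfrac{1}{2}\log_e(2\pi e\sigma^2)$ can be checked by numerical integration; a minimal Python sketch (σ = 2 is an arbitrary example value):

```python
import numpy as np

sigma = 2.0
x = np.linspace(-10 * sigma, 10 * sigma, 200001)
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

h_numeric = -np.sum(p * np.log(p)) * dx                 # -∫ p(x) ln p(x) dx, in nits
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)    # ½ ln(2πeσ²)
print(h_numeric, h_closed)                              # both ~2.112 nits/sample
```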
Maximum value of Entropy:
The maximum value of the differential entropy, for a fixed variance $\sigma^2$, occurs when the pdf is Gaussian.
Proof (using Lagrange multipliers): maximize
$$\tilde{h}(X) = h(X) + \lambda_1\!\left(\int p(x)\,dx - 1\right) + \lambda_2\!\left(\int (x-\mu)^2\,p(x)\,dx - \sigma^2\right)$$
$$= -\int p(x)\log_e p(x)\,dx + \lambda_1\!\left(\int p(x)\,dx - 1\right) + \lambda_2\!\left(\int (x-\mu)^2\,p(x)\,dx - \sigma^2\right)$$
$$= \int \big[-p(x)\log_e p(x) + \lambda_1\,p(x) + \lambda_2\,(x-\mu)^2\,p(x)\big]\,dx - \lambda_1 - \lambda_2\sigma^2$$
Setting $\dfrac{d\tilde{h}(X)}{dp(x)} = 0$:
$$\Rightarrow -\log_e p(x) - p(x)\cdot\frac{1}{p(x)} + \lambda_1 + \lambda_2\,(x-\mu)^2 = 0$$
$$\Rightarrow \log_e p(x) = \lambda_1 - 1 + \lambda_2\,(x-\mu)^2$$
$$\therefore p(x) = e^{\lambda_1 - 1}\,e^{\lambda_2(x-\mu)^2}$$
$$\Rightarrow p(x) = a\,e^{-b^2(x-\mu)^2}, \quad \text{where } a = e^{\lambda_1 - 1} \text{ and } b^2 = -\lambda_2 = \frac{1}{2\sigma^2},$$
which is the Gaussian pdf.
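As a numerical illustration of this result, the sketch below compares the differential entropy of a Gaussian pdf with that of a uniform pdf having the same variance (values chosen for illustration); the Gaussian is larger, as the proof predicts:

```python
import numpy as np

sigma = 1.0

# Gaussian with variance sigma^2: entropy by numerical integration
x = np.linspace(-10 * sigma, 10 * sigma, 200001)
dx = x[1] - x[0]
p_gauss = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
h_gauss = -np.sum(p_gauss * np.log(p_gauss)) * dx

# Uniform on [-a, a] with the same variance: a^2/3 = sigma^2  =>  a = sigma*sqrt(3)
a = sigma * np.sqrt(3)
h_uniform = np.log(2 * a)          # closed form: h = ln(2a) nits

print(h_gauss, h_uniform)          # ~1.419 > ~1.242 nits: the Gaussian has larger entropy
```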