
Module 1

Information Measures

Dr. Sanjay Kumar Singh


Associate Professor (Senior), Department of Communication Engineering,
School of Electronics Engineering, Vellore Institute of Technology, Vellore,
Tamil Nadu, India.

Fall Semester 2024‐25


Review of Probability Theory
Probability Theory: Branch of mathematics that deals with the interpretation of
random events and the likelihood of these events occurring.

Basic terms:
Sample space: A set of all possible outcomes.

Sample point: A single possible outcome of the experiment.


Example: Tossing a coin gives two sample points (head or tail); rolling a die gives six sample points.

Experiment: A process or trial that provides a range of potential results.

Event: A subset of the sample space, i.e., a result or collection of outcomes of an experiment.


Example: getting 1 as an outcome while rolling a die is an event.
Favorable outcome: An outcome that produces the desired or expected event.

Equally Likely Events: Events whose probabilities of occurrence are equal.

Example: When tossing a coin, the chance of getting a head equals the chance of getting a tail.

Mutually Exclusive Events: Events that cannot occur at the same time.
Example: A head and a tail cannot occur simultaneously in a single toss of a coin.

Exhaustive Events: A set of events is exhaustive when its union covers the entire sample space, i.e., the set of all experimental results equals the sample space.
Probability: A measure of the likelihood of occurrence of an event,

0 ≤ Probability ≤ 1

i.e., probability ranges from complete uncertainty (0) to complete certainty (1).

Mathematically,

p(A) = lim_{n→∞} (n_A / n)

where P(A) = probability of occurrence of an event A,
n_A = number of outcomes favorable to A,
n = number of times the experiment is performed
(the limit indicates that the experiment is performed a large number of times).

Similarly, for another event B,

p(B) = lim_{n→∞} (n_B / n)

Note: P(A) and P(B) are called marginal probabilities.


Joint probability: Probability of occurrence of both the events A and B.

p(AB) = lim_{n→∞} (n_AB / n)

Writing n_AB = n_A · (n_AB / n_A),

p(AB) = lim_{n→∞} (n_A / n) · (n_AB / n_A) = p(A) · p(B|A)   … (1)

where p(B|A) = probability of occurrence of event B given that event A has already occurred.

Similarly, writing n_AB = n_B · (n_AB / n_B), we can write

p(AB) = lim_{n→∞} (n_B / n) · (n_AB / n_B) = p(B) · p(A|B)   … (2)

where p(A|B) = probability of occurrence of event A given that event B has already occurred.

Note: The above two equations are called Bayes' theorem of conditional probability.

Note that when the two events A and B are independent, i.e.,

p(A|B) = p(A) and p(B|A) = p(B),

then the joint probability is the product of the marginal probabilities:

p(AB) = p(A) · p(B)  ⇔  A and B are independent.
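As a quick illustration of these limiting-frequency definitions, the following minimal Python sketch (not from the slides) estimates marginal, joint and conditional probabilities for two events defined on a die roll and checks Eq. (1) numerically; the particular events A and B are chosen purely for illustration.

```python
# Minimal sketch: estimating marginal, joint and conditional probabilities
# by repeating an experiment many times, mirroring the limiting-frequency
# definitions p(A) = lim n_A/n and p(AB) = p(A)*p(B|A).
import random

random.seed(0)
n = 100_000
n_A = n_B = n_AB = 0
for _ in range(n):
    die = random.randint(1, 6)
    A = die % 2 == 0          # event A: outcome is even
    B = die > 3               # event B: outcome is 4, 5 or 6
    n_A += A
    n_B += B
    n_AB += A and B

p_A, p_B, p_AB = n_A / n, n_B / n, n_AB / n
p_B_given_A = n_AB / n_A      # relative frequency of B among trials where A occurred

print(f"p(A)={p_A:.3f}  p(B)={p_B:.3f}  p(AB)={p_AB:.3f}")
print(f"p(A)*p(B|A)={p_A * p_B_given_A:.3f}  (matches p(AB), Eq. (1))")
```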


Random variable (RV): A real number assigned to the outcome of an experiment performed on a random phenomenon. The two types of RVs are:

1. Continuous RV: Takes an infinite number of possible values; described by a continuous probability density function (pdf).
Examples: height, weight, the amount of sugar in an orange, the time required to run a mile.
Mathematically, for two continuous RVs X and Y with joint pdf p(x, y):

Marginal pdfs:      p(x) = ∫ p(x, y) dy,   p(y) = ∫ p(x, y) dx

Joint probability:  obtained from the joint pdf as ∬ p(x, y) dx dy over the region of interest

Conditional pdfs:   p(x|y) = p(x, y) / p(y),   p(y|x) = p(x, y) / p(x)
Note: A pdf satisfies the conditions,

1-D case:
i)   ∫ p(x) dx = 1             (area under the pdf)
ii)  ∫ x p(x) dx = μ           (expected or mean value)
iii) ∫ (x − μ)² p(x) dx = σ²   (the mean-square value about the mean equals the variance σ²)

2-D case:
∬ p(x, y) dx dy = 1            (total probability under the joint pdf)


2. Discrete RV: Takes a finite number of distinct values; described by a probability distribution function (also called the probability mass function, PMF).
Examples: tossing a coin, rolling a die.

Mathematically, for two discrete RVs X and Y:

Marginal probabilities (PMFs):  p(xᵢ) = Σⱼ p(xᵢ, yⱼ),   p(yⱼ) = Σᵢ p(xᵢ, yⱼ)

Joint probabilities:  p(xᵢ, yⱼ)

Conditional probabilities:  p(xᵢ|yⱼ) = p(xᵢ, yⱼ) / p(yⱼ),   p(yⱼ|xᵢ) = p(xᵢ, yⱼ) / p(xᵢ)

Note:
i)   p(xᵢ, yⱼ) ≥ 0
ii)  Σᵢ Σⱼ p(xᵢ, yⱼ) = 1
iii) p(X, Y) = p(X) · p(Y) when X and Y are statistically independent
iv)  E[X] = X̄ = Σᵢ xᵢ · p(xᵢ)       (expected value or mean)
v)   E[X²] = Σᵢ xᵢ² · p(xᵢ)          (mean-square value)
vi)  σ_X² = E[X²] − (E[X])²           (variance)
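For concreteness, here is a minimal Python sketch (not part of the slides) that evaluates properties (iv)–(vi) above for a fair die; the die example is an assumption used purely for illustration.

```python
# Minimal sketch: expected value, mean-square value and variance of a
# discrete RV (a fair die), matching properties (iv)-(vi) above.
outcomes = [1, 2, 3, 4, 5, 6]
p = [1 / 6] * 6                      # PMF of a fair die

mean = sum(x * px for x, px in zip(outcomes, p))          # E[X]
mean_sq = sum(x**2 * px for x, px in zip(outcomes, p))    # E[X^2]
variance = mean_sq - mean**2                              # sigma^2 = E[X^2] - (E[X])^2

print(mean, mean_sq, variance)       # 3.5, 15.1667, 2.9167
```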
Block Diagram of a Basic Communication System
Introduction to Information Theory
Information Theory: A mathematical approach to the study of coding, quantification, storage, and communication of information.

Claude Shannon: the father of information theory.
Associated Fields
Information:
 Definition: the amount of uncertainty removed on the part of the receiver by the occurrence of a symbol or message from the transmitter.
 Thus, information is a measure of uncertainty.
 Information in a message is meaningful only if the receiver is able to interpret it correctly.
 Information is about something that adds to the receiver's knowledge.

Example: Consider the following statements:


1) VIT exams will be conducted by VIT (certain event, none will be surprised).
2) VIT exams will be conducted by IIT Madras (Unlikely event, some will be
surprised).
3) No attendance is required, no exams will be conducted and all will receive
grade A (Almost impossible, All will be surprised)

Uncertainty ∝ Surprise
Example: Consider a source transmitting the symbol 0 or 1 to represent a weather condition (two possible states):

Source → 0/1 → RX   (1 bit)

Thus, to remove the uncertainty on the part of the receiver, the source must transmit a minimum of 1 bit (0/1). The source is said to convey 1 bit of information.

Example: Consider a source transmitting a sequence of two symbols from {0, 1} to represent a weather condition (four possible states):

Source → 00/01/10/11 → RX   (2 bits)

To remove the uncertainty on the part of the receiver, the source must transmit 2 bits (00, 01, 10, 11). The source is said to convey 2 bits of information.
Classification of Information Sources

Information sources (or communication sources), or simply sources, can be classified as:

Discrete information sources
 Discrete memoryless sources (DMS)
 Sources with memory (e.g., Markov sources)

Continuous information sources
 Non-Gaussian
 Gaussian
Discrete Memoryless Source (DMS):

1. A source is said to be discrete if the symbols or outcomes associated with it assume only discrete values.
Example: An English-text source is a discrete source with 26 symbols associated with it.

2. A discrete source is said to be memoryless if the occurrence of one symbol does not affect the occurrence of any other (the symbols are statistically independent).
Information Measure:
The amount of information associated with a message is inversely proportional to the probability of occurrence of the symbol used to represent the message.

Information (or uncertainty, or surprise) ∝ 1 / Probability

Consider a DMS X transmitting a symbol with probability p(x). Then the information is

I(X) ∝ 1/p(x)

or,  I(X) = log(1/p(x))

⟹ I(X) = log₂(1/p(x)) = −log₂ p(x)   bits

 If the logarithmic base is 10, information is measured in "dits", "decits" or "Hartleys".
 If the logarithmic base is e, information is measured in "nits" or "nats".
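A minimal Python sketch of the information measure just defined, showing the same probability expressed in bits, nats and Hartleys (bases 2, e and 10); the function name self_information is illustrative, not from the slides.

```python
# Minimal sketch: self-information I(x) = -log p(x) in bits, nats and hartleys
# (logarithm bases 2, e and 10 respectively).
import math

def self_information(p, base=2):
    """Information conveyed by a symbol of probability p (0 < p <= 1)."""
    return -math.log(p, base)

p = 0.25
print(self_information(p, 2))        # 2.0 bits
print(self_information(p, math.e))   # ~1.386 nats
print(self_information(p, 10))       # ~0.602 hartleys (dits)
```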
Properties of Information:
1. Information is additive. That is, if a DMS X emits the independent symbols x₁, …, x_n with probabilities p(x₁), …, p(x_n), then

I(X) = I(x₁) + ⋯ + I(x_n) = log₂(1/p(x₁)) + ⋯ + log₂(1/p(x_n))

∴ I(X) = Σ_{i=1}^{n} log₂(1/p(xᵢ)) = −Σ_{i=1}^{n} log₂ p(xᵢ)   bits

2. If p(xᵢ) < p(xⱼ), then I(xᵢ) > I(xⱼ).

3. If p(xᵢ) = 1 (the event is certain to occur), then I(xᵢ) = 0 (no information is conveyed). Similarly, if p(xᵢ) → 0 (the event is almost impossible), then I(xᵢ) → ∞ (maximum information is conveyed).

4. I(xᵢ) ≥ 0 for 0 ≤ p(xᵢ) ≤ 1.
Entropy (or Average Information) of a Source:
The information associated with a DMS X consisting of the n symbols x₁, x₂, …, x_n depends upon the probability of occurrence of each symbol as well as the information conveyed by the respective symbol.

The average information associated with a source is defined as the entropy of the source and is given by

H(X) = E[I(X)] = Σ_{i=1}^{n} p(xᵢ) I(xᵢ)

⇒ H(X) = −Σ_{i=1}^{n} p(xᵢ) log₂ p(xᵢ)   bits/symbol
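The entropy formula above translates directly into code; the following minimal Python sketch (illustrative, not from the slides) computes H(X) for a fair binary source and for the four-message source used in a later example.

```python
# Minimal sketch: entropy H(X) = -sum p(x) log2 p(x) of a DMS.
import math

def entropy(probs):
    """Average information (bits/symbol) of a DMS with the given PMF."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))             # 1.0 bit/symbol
print(entropy([0.4, 0.3, 0.2, 0.1]))   # ~1.846 bits/symbol
```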
Order of a Source and its Relation with Entropy:

Entropy per message = n × Entropy per symbol

where n = number of symbols (bits) per message.

1st-order source: each symbol associated with the source independently represents a message (1 symbol ↔ 1 message).
Entropy/message = Entropy/symbol = H(X)

2nd-order source: fewer symbols are used to represent more messages; each message consists of 2 symbols, so a binary source gives 4 messages (00, 01, 10, 11).
Entropy/message: H(X²) = 2·H(X)

nth-order source (the nth-order extension of a source): each message consists of n symbols, giving 2ⁿ messages for a binary source.
Entropy/message: H(Xⁿ) = n·H(X)

Note: A higher-order source can be converted to a 1st-order source by increasing the number of symbols. For example, the four weather messages (sunny day → 0, very hot → 1, rainy → 2, heavy rain → 3) can be transmitted directly with the quaternary alphabet X = {0, 1, 2, 3} (no pulse, 1 V, 2 V, 3 V). In general, if there are n messages, an alphabet of n symbols is required to form a 1st-order source. Thus, in this example, the time required to convey the information is half the time required by the 2nd-order binary source to convey the same information.
Example:
Consider a DMS S which emits either 0 or 1 with the probabilities p(0) = 0.8 and p(1) = 0.2. Find H(S) and H(S²).

Solution:
H(S) = −0.8 log₂ 0.8 − 0.2 log₂ 0.2 ≈ 0.722 bits/symbol
H(S²) = 2·H(S) ≈ 1.444 bits/message
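A minimal Python sketch (not part of the slides) that verifies this result, computing H(S²) directly from the four second-extension messages of the memoryless source rather than from the relation H(S²) = 2·H(S).

```python
# Minimal sketch verifying the example: H(S) for p(0)=0.8, p(1)=0.2 and
# H(S^2) computed directly from the four second-extension messages.
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_S = entropy([0.8, 0.2])
pairs = [0.8 * 0.8, 0.8 * 0.2, 0.2 * 0.8, 0.2 * 0.2]   # 00, 01, 10, 11 (memoryless)
H_S2 = entropy(pairs)

print(H_S)          # ~0.722 bits/symbol
print(H_S2)         # ~1.444 bits/message = 2 * H(S)
```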
Symbol Rate:
This is defined as the rate at which symbols are generated per second:

r_s = 1/τ_avg   symbols/second

where τ_avg is the average duration of the symbols associated with the source. If the n symbols x₁, …, x_n of a DMS X have durations τ(x₁), …, τ(x_n), the average duration is

τ_avg(X) = Σ_{i=1}^{n} τ(xᵢ) p(xᵢ)   seconds/symbol
Information Rate:
Consider two sources X₁ and X₂ generating the symbols {A₀, A₁, …, A_n} and {B₀, B₁, …, B_n} respectively. If p(A₀) = p(B₀), p(A₁) = p(B₁), …, p(A_n) = p(B_n), then the entropies of the two sources are the same, i.e., H(X₁) = H(X₂).

In such cases the sources are compared in terms of their rate of information, given by

R = r_s · H   (symbols/second) × (bits/symbol) = bits/second

A source with higher R is said to convey information at a faster rate.


Example: An analog signal is bandlimited to 4 kHz. It is sampled at the Nyquist rate and the samples are quantized into 4 levels Q₁, Q₂, Q₃, Q₄, which are independent messages with probabilities p₁ = p₄ = 1/8 and p₂ = p₃ = 3/8. Find the information rate.

Solution:
H(X) = −Σ pᵢ log₂ pᵢ = −2·(1/8) log₂(1/8) − 2·(3/8) log₂(3/8) ≈ 1.811 bits/message
r_s = f_s = 2f_m = 2 × 4000 = 8000 messages/sec
R = r_s · H(X) = 8000 × 1.811 ≈ 14490 bits/sec

Example: A source generates one of 4 independent messages every microsecond with probabilities 0.4, 0.3, 0.2 and 0.1. Find the information rate.

Solution:
H(X) = −Σ p(xᵢ) log₂ p(xᵢ) = −0.4 log₂ 0.4 − 0.3 log₂ 0.3 − 0.2 log₂ 0.2 − 0.1 log₂ 0.1 ≈ 1.846 bits/message
r_s = 1/(1 μs) = 10⁶ messages/sec
R = r_s · H(X) = 10⁶ × 1.846 ≈ 1.846 Mbits/sec
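Both information-rate examples can be checked with a few lines of Python; this is an illustrative sketch, not part of the slides.

```python
# Minimal sketch reproducing both information-rate examples: R = r_s * H(X).
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example 1: four quantization levels, sampled at the Nyquist rate of 8000 samples/sec
H1 = entropy([1/8, 3/8, 3/8, 1/8])
print(H1, 8000 * H1)        # ~1.811 bits/message, ~14490 bits/sec

# Example 2: four independent messages every microsecond
H2 = entropy([0.4, 0.3, 0.2, 0.1])
print(H2, 1e6 * H2)         # ~1.846 bits/message, ~1.846 Mbits/sec
```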
Properties of Entropy:
1. Entropy is always non-negative. That is, for a DMS X with symbols x₁, …, x_n,

H(X) ≥ 0

2. The maximum value of entropy is

H(X)_max = log₂ n

where n is the number of symbols associated with the source. In general, H(X) ≤ log₂ n.

3. The maximum value of entropy occurs when all the symbols associated with the source are equally likely.
That is, if p(x₁) = p(x₂) = ⋯ = p(x_n) = 1/n, then H(X) = H(X)_max = log₂ n.
Proof that H(X) ≤ log₂ n:
We show H(X) − log₂ n ≤ 0.

Consider,
H(X) − log₂ n = −Σᵢ p(xᵢ) log₂ p(xᵢ) − log₂ n
             = −Σᵢ p(xᵢ) log₂ p(xᵢ) − Σᵢ p(xᵢ) log₂ n          (since Σᵢ p(xᵢ) = 1)
             = −Σᵢ p(xᵢ) log₂[n p(xᵢ)]
             = Σᵢ p(xᵢ) log₂[1/(n p(xᵢ))]
             = (1/log_e 2) Σᵢ p(xᵢ) log_e[1/(n p(xᵢ))]

By the identity log_e x ≤ x − 1 for x > 0,

H(X) − log₂ n ≤ (1/log_e 2) Σᵢ p(xᵢ) [1/(n p(xᵢ)) − 1]
             = (1/log_e 2) [Σᵢ (1/n) − Σᵢ p(xᵢ)]
             = (1/log_e 2) [n·(1/n) − 1] = 0

⇒ H(X) − log₂ n ≤ 0, i.e., H(X) ≤ log₂ n.  Hence proved.

** H(X) = log₂ n when 1/(n p(xᵢ)) = 1, i.e., p(xᵢ) = 1/n for every i.
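A minimal Python sketch (illustrative, not from the slides) that checks the bound numerically for a few randomly generated PMFs and confirms equality for the equally likely case.

```python
# Minimal sketch: numerically checking H(X) <= log2(n), with equality
# for the equally likely (uniform) PMF.
import math, random

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

random.seed(1)
n = 5
for _ in range(3):
    w = [random.random() for _ in range(n)]
    pmf = [x / sum(w) for x in w]                 # random PMF over n symbols
    print(entropy(pmf) <= math.log2(n) + 1e-12)   # True

print(entropy([1 / n] * n), math.log2(n))         # both ~2.3219
```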
Entropy of a Binary Memoryless Source (BMS):

Let the BMS X emit 0 with probability p and 1 with probability 1 − p.

∴ H(X) = −p log₂ p − (1 − p) log₂(1 − p)

To find the maximum, set dH(X)/dp = 0:

(1/log_e 2) [ −log_e p − p·(1/p) + log_e(1 − p) + (1 − p)·(1/(1 − p)) ] = 0

⟹ log_e p = log_e(1 − p), or p = 1 − p  ⟹ p = 1/2

∴ H(X)_max = −(1/2) log₂(1/2) − (1/2) log₂(1/2) = 1 bit/symbol

Also, when p = 1 (or p = 0), H(X) = −1·log₂ 1 = 0.

p          H(X) (bits/symbol)
0.0 & 1.0  0
0.1 & 0.9  0.469
0.2 & 0.8  0.722
0.3 & 0.7  0.881
0.4 & 0.6  0.971
0.5        1.0

In general, H(X) rises from 0 at p = 0 to a maximum of 1 bit/symbol at p = 0.5 and falls back to 0 at p = 1 (the binary entropy curve).
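The table can be reproduced with a minimal Python sketch of the binary entropy function (illustrative, not from the slides):

```python
# Minimal sketch: the binary entropy function H(p) = -p log2 p - (1-p) log2(1-p),
# reproducing the table above (maximum of 1 bit/symbol at p = 0.5).
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, round(binary_entropy(p), 3))   # 0.0, 0.469, 0.722, 0.881, 0.971, 1.0
```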
Other related entropies:

Marginal entropies:

Consider a channel with input X (symbols x₁, …, x_n) and output Y (symbols y₁, …, y_m); in general several transmitted symbols may be mapped onto the same received symbol (e.g., x₁, x₂ → y₁; x₃, x₄ → y₂; …; x_{m−1}, x_m → y_m).

H(X) = −Σ_{i=1}^{n} p(xᵢ) log₂ p(xᵢ)   ⟹ source entropy, i.e., the average information associated with the source (or conveyed to the receiver).

H(Y) = −Σ_{j=1}^{m} p(yⱼ) log₂ p(yⱼ)   ⟹ receiver's entropy, i.e., the average information gained by the receiver.

Joint entropy:

H(X, Y) = −Σ_{i=1}^{n} Σ_{j=1}^{m} p(xᵢ, yⱼ) log₂ p(xᵢ, yⱼ)
i.e., the average joint information about the source and the receiver.

Conditional entropies:

H(X|Y) = −Σ_{i=1}^{n} Σ_{j=1}^{m} p(xᵢ, yⱼ) log₂ p(xᵢ|yⱼ)
i.e., the average uncertainty about the source that remains after the received symbol is known (the information lost in the channel). Also called equivocation.

H(Y|X) = −Σ_{i=1}^{n} Σ_{j=1}^{m} p(xᵢ, yⱼ) log₂ p(yⱼ|xᵢ)
i.e., the average uncertainty about the received symbol given that a known symbol was transmitted. Also called prevarication.
Relation among the entropies:

H(X, Y) = −Σᵢ Σⱼ p(xᵢ, yⱼ) log₂ p(xᵢ, yⱼ)
        = −Σᵢ Σⱼ p(xᵢ, yⱼ) log₂ [p(xᵢ) · p(yⱼ|xᵢ)]            (since p(A, B) = p(A)·p(B|A))
        = −Σᵢ Σⱼ p(xᵢ, yⱼ) log₂ p(xᵢ) − Σᵢ Σⱼ p(xᵢ, yⱼ) log₂ p(yⱼ|xᵢ)
        = −Σᵢ p(xᵢ) log₂ p(xᵢ) + H(Y|X)                        (since Σⱼ p(xᵢ, yⱼ) = p(xᵢ))

⟹ H(X, Y) = H(X) + H(Y|X)

Similarly, we can also prove H(Y, X) = H(Y) + H(X|Y).

Note: When the two events are statistically independent, the joint probability equals the product of the marginal probabilities, i.e., p(A, B) = p(A)·p(B), and hence H(X, Y) = H(X) + H(Y).


Mutual Information:
 This is defined as the actual information shared between the source and the receiver. It is also called trans-information.
 It is the measure of the difference between the total information available at the source, H(X), and the equivocation H(X|Y), i.e., the information about the source that is lost in the channel.

[Figure: Venn diagram of the entropies — the circles H(X) and H(Y) overlap in I(X, Y); the remaining parts are H(X|Y) and H(Y|X), and the union of the two circles is H(X, Y).]

I(X, Y) = H(X) − H(X|Y)

Also, I(Y, X) = H(Y) − H(Y|X)
Properties of Mutual Information:
1. It is always a non-negative quantity,
i.e., I(X, Y) ≥ 0 ⟹ H(X) ≥ H(X|Y) or H(Y) ≥ H(Y|X).

2. If X and Y are statistically independent then I(X, Y) = 0, and vice-versa.
⟹ H(X) = H(X|Y), H(Y) = H(Y|X) and H(X, Y) = H(X) + H(Y)

[Figure: with I(X, Y) = 0 the circles H(X) and H(Y) do not overlap, so H(X|Y) = H(X), H(Y|X) = H(Y) and H(X, Y) is their disjoint union.]

3. The mutual information of a channel is always symmetric, i.e., I(X, Y) = I(Y, X).

Proof: Since H(X, Y) = H(X) + H(Y|X),
⟹ H(X) = H(X, Y) − H(Y|X)

Since I(X, Y) = H(X) − H(X|Y),
⟹ I(X, Y) = H(X, Y) − H(Y|X) − H(X|Y)   … (1)

Similarly, I(Y, X) = H(Y) − H(Y|X)
⟹ I(Y, X) = H(Y, X) − H(X|Y) − H(Y|X)   … (2)

Since the joint entropy is symmetric, H(X, Y) = H(Y, X), the right-hand sides of (1) and (2) are equal,
⟹ I(X, Y) = I(Y, X).
Example: Compute H(X), H(Y), H(X, Y), H(X|Y) and I(X, Y) for the joint probability matrix given below. Verify the relationships among these entropies.

P(X, Y) =
  0.05  0     0.2   0.05
  0     0.1   0.1   0
  0     0     0.2   0.1
  0.05  0.05  0     0.1

Solution: From the properties of the joint probability matrix, summation along the rows gives the input probabilities, i.e.,

p(xᵢ) = Σⱼ p(xᵢ, yⱼ),  for i = 1, 2, …, n

We get,
p(x₁) = 0.05 + 0 + 0.2 + 0.05 = 0.3
p(x₂) = 0 + 0.1 + 0.1 + 0 = 0.2
p(x₃) = 0 + 0 + 0.2 + 0.1 = 0.3
p(x₄) = 0.05 + 0.05 + 0 + 0.1 = 0.2

⟹ p(X) = [0.3  0.2  0.3  0.2], and we can verify that Σᵢ p(xᵢ) = 0.3 + 0.2 + 0.3 + 0.2 = 1.

Summation along the columns gives the output probabilities, i.e.,

p(yⱼ) = Σᵢ p(xᵢ, yⱼ),  for j = 1, 2, …, m

We get,
p(y₁) = 0.05 + 0 + 0 + 0.05 = 0.1
p(y₂) = 0 + 0.1 + 0 + 0.05 = 0.15
p(y₃) = 0.2 + 0.1 + 0.2 + 0 = 0.5
p(y₄) = 0.05 + 0 + 0.1 + 0.1 = 0.25

∴ p(Y) = [0.1  0.15  0.5  0.25], and clearly Σⱼ p(yⱼ) = 0.1 + 0.15 + 0.5 + 0.25 = 1.

Now,

H(X) = −Σᵢ p(xᵢ) log₂ p(xᵢ) = −2(0.3 log₂ 0.3) − 2(0.2 log₂ 0.2) ≈ 1.971 bits/symbol

H(Y) = −Σⱼ p(yⱼ) log₂ p(yⱼ) = −0.1 log₂ 0.1 − 0.15 log₂ 0.15 − 0.5 log₂ 0.5 − 0.25 log₂ 0.25 ≈ 1.743 bits/symbol

H(X, Y) = −Σᵢ Σⱼ p(xᵢ, yⱼ) log₂ p(xᵢ, yⱼ) = −4(0.05 log₂ 0.05) − 4(0.1 log₂ 0.1) − 2(0.2 log₂ 0.2) ≈ 3.122 bits/symbol

H(X|Y) = −Σᵢ Σⱼ p(xᵢ, yⱼ) log₂ p(xᵢ|yⱼ), where p(xᵢ|yⱼ) = p(xᵢ, yⱼ)/p(yⱼ) is obtained by dividing each column of P(X, Y) by the corresponding p(yⱼ):

p(X|Y) =
  1/2   0     2/5   1/5
  0     2/3   1/5   0
  0     0     2/5   2/5
  1/2   1/3   0     2/5

Evaluating the double sum with these conditional probabilities gives H(X|Y) ≈ 1.379 bits/symbol.

H(Y|X) = −Σᵢ Σⱼ p(xᵢ, yⱼ) log₂ p(yⱼ|xᵢ), where p(yⱼ|xᵢ) = p(xᵢ, yⱼ)/p(xᵢ) is obtained by dividing each row of P(X, Y) by the corresponding p(xᵢ):

p(Y|X) =
  1/6   0     2/3   1/6
  0     1/2   1/2   0
  0     0     2/3   1/3
  1/4   1/4   0     1/2

Evaluating the double sum with these conditional probabilities gives H(Y|X) ≈ 1.151 bits/symbol.

Verification:
H(X|Y) = H(X, Y) − H(Y) = 3.122 − 1.743 = 1.379 bits/symbol
H(Y|X) = H(X, Y) − H(X) = 3.122 − 1.971 = 1.151 bits/symbol

I(X, Y) = H(X) − H(X|Y) = 1.971 − 1.379 = 0.592 bits/symbol

Verification:
I(X, Y) = H(Y) − H(Y|X) = 1.743 − 1.151 = 0.592 bits/symbol
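The whole example can be reproduced with a minimal Python sketch (illustrative, not from the slides) that starts from the joint probability matrix and uses the chain-rule relations verified above:

```python
# Minimal sketch reproducing the worked example from the joint probability
# matrix P(X,Y): marginals, joint/conditional entropies and I(X;Y).
import math

P = [[0.05, 0.00, 0.20, 0.05],
     [0.00, 0.10, 0.10, 0.00],
     [0.00, 0.00, 0.20, 0.10],
     [0.05, 0.05, 0.00, 0.10]]

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [sum(row) for row in P]                       # row sums -> p(x_i)
p_y = [sum(col) for col in zip(*P)]                 # column sums -> p(y_j)

H_X, H_Y = entropy(p_x), entropy(p_y)
H_XY = entropy([p for row in P for p in row])       # joint entropy H(X,Y)
H_X_given_Y = H_XY - H_Y                            # equivocation
H_Y_given_X = H_XY - H_X
I_XY = H_X - H_X_given_Y                            # mutual information

print(H_X, H_Y, H_XY)            # ~1.971, ~1.743, ~3.122 bits/symbol
print(H_X_given_Y, H_Y_given_X)  # ~1.379, ~1.151 bits/symbol
print(I_XY)                      # ~0.592 bits/symbol
```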


Continuous Information Theory
Whenever the outcome of a source assumes continuous values, the information conveyed by a sample of the continuous signal is called continuous information, and it is measured in bits.

[Source X described by a pdf p(x), −∞ < x < ∞]

For example: if variations of temperature are to be transmitted in the form of analog voltages, and the temperature can assume an infinite number of values, then each voltage level corresponding to a sampling instant (e.g., 17 °C at t = 0, 20 °C at t = T, 15 °C at t = 2T, …) conveys some amount of information.
Discrete vs Continuous Information Theory

 A discrete information source emits discrete values (described by a PMF) and generates information at a finite rate; the entropy rate, which measures the information generated, is finite.
 A continuous information source (described by a pdf p(x), −∞ < x < ∞) can assume any one of an infinite number of amplitude values and so requires an infinite number of binary digits for its exact specification; its entropy rate is infinite. An immediate consequence of this is that, in order to transmit the output of a continuous information source and recover it exactly, a channel of infinite capacity is required.

 Discrete entropy deals with probability mass functions (PMFs); continuous entropy deals with probability density functions (pdfs).
 Discrete entropy uses a sum over the individual outcomes; continuous entropy uses an integral over the entire range of values.
 Discrete entropy is always non-negative; continuous entropy can be negative. (A negative differential entropy suggests that the distribution is more "peaked" or concentrated around certain values, implying fewer choices or less variability.)
 Discrete entropy is measured in bits; continuous entropy is also measured in bits, but the interpretation is more nuanced due to the continuous nature.

Note: The entropy measures the average amount of information of the random variable. If the variable is continuous, the entropy is apparently unbounded, since in general an infinite number of bits is needed to represent a continuous random variable exactly. In such a case we require an alternative information measure: the differential entropy.
Entropies in the continuous case:

Marginal (differential) entropies:
h(X) = −∫ p(x) log p(x) dx
h(Y) = −∫ p(y) log p(y) dy

Joint entropy:
h(X, Y) = −∬ p(x, y) log p(x, y) dx dy

Conditional entropies:
h(X|Y) = −∬ p(x, y) log p(x|y) dx dy
h(Y|X) = −∬ p(x, y) log p(y|x) dx dy

Note: The pdf satisfies the conditions,
i)  ∫ p(x) dx = 1                (area under the pdf)
ii) ∫ (x − μ)² p(x) dx = σ²      (the mean-square value about the mean equals the variance σ²)
Entropy for the uniform distribution:
For a uniform pdf, the random variations of the source output are such that the probability of the signal assuming any value between −a and a is equally likely:

p(x) = 1/(2a) for −a ≤ x ≤ a, and p(x) = 0 otherwise.

[Figure: rectangular pdf of height 1/(2a) over −a ≤ x ≤ a, and a sample waveform f(t) bounded between −a and a.]

h(X) = −∫_{−a}^{a} p(x) log_e p(x) dx = −∫_{−a}^{a} (1/(2a)) log_e(1/(2a)) dx
     = (1/(2a)) log_e(2a) ∫_{−a}^{a} dx = (1/(2a)) log_e(2a) · [x]_{−a}^{a} = (1/(2a)) log_e(2a) · 2a

∴ h(X) = log_e(2a)  nats/sample

Using log₂ x = log_e x / log_e 2,

⟹ h(X) = log₂(2a)  bits/sample

Note: If the sampling is done as per the Nyquist sampling theorem, the sampling rate is r_s = 2f_m (samples/sec) and the information rate is

R = r_s · h(X) = 2f_m log_e(2a)  nats/sec

Or, R = r_s · h(X) = 2f_m log₂(2a)  bits/sec
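A minimal Python sketch (illustrative, not from the slides) evaluating these expressions; the values a = 2 and f_m = 4 kHz are assumptions chosen only for illustration.

```python
# Minimal sketch: differential entropy of a uniform pdf on [-a, a] in nats and
# bits, and the corresponding information rate at the Nyquist rate r_s = 2*f_m.
import math

a = 2.0                             # assumed amplitude range [-a, a]
h_nats = math.log(2 * a)            # h(X) = ln(2a) nats/sample
h_bits = math.log2(2 * a)           # = log2(2a) bits/sample

f_m = 4000                          # assumed bandwidth (Hz), for illustration
R_bits = 2 * f_m * h_bits           # R = r_s * h(X) bits/sec

print(h_nats, h_bits, R_bits)       # ~1.386 nats, 2.0 bits, 16000 bits/sec
```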


Entropy for the Gaussian (or normal) distribution:

p(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}

∴ h(X) = −∫ p(x) log_e p(x) dx
       = −∫ p(x) [ log_e(1/√(2πσ²)) − (x−μ)²/(2σ²) ] dx
       = log_e√(2πσ²) ∫ p(x) dx + (1/(2σ²)) ∫ (x−μ)² p(x) dx
       = log_e√(2πσ²) · 1 + (1/(2σ²)) · σ²            (∫ p(x) dx = 1 and ∫ (x−μ)² p(x) dx = σ²)
       = log_e√(2πσ²) + 1/2 = log_e√(2πσ²) + log_e√e

⟹ h(X) = log_e√(2πeσ²) = (1/2) log_e(2πeσ²)  nats/sample

Also, R = r_s · h(X) = 2f_m log_e√(2πeσ²) = f_m log_e(2πeσ²)  nats/sec

Or, R = r_s · h(X) = f_m log₂(2πeσ²)  bits/sec
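A minimal Python sketch (illustrative, not from the slides) that verifies the closed-form result by numerically integrating −p(x) ln p(x); the value σ = 1.5 is an assumption chosen only for illustration.

```python
# Minimal sketch: numerically verifying h(X) = (1/2) ln(2*pi*e*sigma^2) for a
# Gaussian pdf by a Riemann (midpoint) sum of -p(x) ln p(x) over a wide interval.
import math

sigma, mu = 1.5, 0.0                # assumed standard deviation and mean
f = lambda x: math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

lo, hi, N = mu - 10 * sigma, mu + 10 * sigma, 200_000
dx = (hi - lo) / N
h_numeric = sum(-f(lo + (k + 0.5) * dx) * math.log(f(lo + (k + 0.5) * dx)) * dx
                for k in range(N))

h_closed = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(h_numeric, h_closed)          # both ~1.824 nats/sample
```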
Maximum value of entropy:
For a given variance σ², the maximum value of the differential entropy occurs when the pdf is Gaussian.

Maximization by the Lagrange method of multipliers (λ₁ and λ₂), with h̃(X) the augmented objective:

h̃(X) = h(X) + λ₁ [∫ p(x) dx − 1] + λ₂ [∫ (x−μ)² p(x) dx − σ²]
      = −∫ p(x) log_e p(x) dx + λ₁ [∫ p(x) dx − 1] + λ₂ [∫ (x−μ)² p(x) dx − σ²]
      = ∫ p(x) [−log_e p(x) + λ₁ + λ₂(x−μ)²] dx − λ₁ − λ₂σ²

Setting the derivative with respect to p(x) to zero,

d h̃(X)/d p(x) = 0 ⟹ −log_e p(x) − 1 + λ₁ + λ₂(x−μ)² = 0

⟹ log_e p(x) = (λ₁ − 1) + λ₂(x−μ)²

∴ p(x) = e^{λ₁−1} · e^{λ₂(x−μ)²}

⟹ p(x) = a e^{−b²(x−μ)²},  where a = e^{λ₁−1} and b² = −λ₂ = 1/(2σ²)

which is the Gaussian pdf; hence the Gaussian distribution maximizes the differential entropy for a fixed variance.
