Summary

The document summarizes key concepts from Claude Shannon's seminal 1948 paper, 'A Mathematical Theory of Communication', which established information theory. It discusses Shannon's proof that the only mathematical function able to quantify information is the logarithm, with the unit of information being the bit. It describes how source coding aims to make symbol probabilities equal so as to maximize entropy, states the source coding theorem (optimal codes can achieve the source entropy rate), and introduces channel capacity and mutual information.

The Information Theory

C.E. Shannon, ‘A Mathematical Theory of Communication’,
The Bell System Technical Journal, Vol. 27, pp. 379–423, July 1948
“If the rate of information is less than the channel capacity, then there exists a coding technique such that the information can be transmitted over the channel with an arbitrarily small probability of error, despite the presence of noise.”
A Block Diagram of a Digital Communication System

[Figure: Message source → Source coder → Error-control coder → Line coder → Pulse shaping → Modulator → Channel → Demodulator → Detector → Decision, with binary streams (e.g. 1011, 110010) flowing between the coding blocks.]

The Digital Signals
• The source draws from an alphabet of M ≥ 2 different symbols and produces output symbols at some average rate r.
• A typical computer terminal has approximately M = 90, and the user acts as a discrete source at roughly r = 5 symbols/sec.
• The computer itself, however, works with just M = 2 internal symbols, represented as LOW and HIGH electrical states called binary digits, and r may be of the order of 10^8.
Electrical Communication System

[Figure: Message (the physical manifestation of information) → Input transducer → Transmitter → Channel → Receiver → Output transducer → Message; electrical signals flow between the transducers. The channel may be twisted copper wire, coaxial cable, optical fiber, or free space.]
What is Information?
Whatever ‘information’ may mean to a layman, a quantitative measure of it should have the following properties:
• The amount of information Ij associated with any happening ‘j’ should increase as its probability of occurrence decreases.
• Ijk = Ij + Ik, if events j and k are independent.
Technical Aspects of Information
• Shannon proved that the only mathematical function which retains the properties stated above, for a symbol produced by a discrete source, is
Ii = log(1/Pi)
The base of the logarithm defines the unit of information; with base 2 the unit is the bit.
• A single binary digit (binit) may carry more or less than one bit of information (not necessarily an integer amount), depending on its source probability.
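As a quick illustration of the two properties and the logarithmic measure (a minimal Python sketch; the probabilities are arbitrary illustrative values):

import math

def self_information(p):
    """Self-information of an event with probability p, in bits."""
    return math.log2(1 / p)

# Less probable events carry more information.
for p in (0.5, 0.25, 0.125):
    print(f"P = {p:5.3f}  ->  I = {self_information(p):.3f} bits")

# Additivity for independent events j and k: P(j,k) = Pj * Pk.
pj, pk = 0.5, 0.25
assert math.isclose(self_information(pj * pk),
                    self_information(pj) + self_information(pk))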
Source Entropy
• Defined as the average amount of information produced by the source, denoted by H(x).
• Find H(x) for a discrete source which can produce ‘n’ different symbols in a random fashion.
• There is a binary source with symbol probabilities ‘p’ and (1−p). Find the maximum and minimum values of H(x).
• The average of a discrete random variable is E[X] = ∑ xi·P(xi); for a continuous one it is ∫ x·p(x) dx. (Relative-frequency view: E[X] = (N1·x1 + N2·x2 + …)/N.)
• Averaging the self-information gives H(x) = ∑ Pi·log(1/Pi).
• For the binary source, H(x) = Ω(p) = p·log(1/p) + (1−p)·log(1/(1−p)), whose maximum and minimum can be found as a simple maxima–minima problem.
[Figure: plot of Ω(p) versus p for 0 ≤ p ≤ 1; Ω(p) rises from 0 at p = 0 to a maximum of 1 at p = 0.5 and falls back to 0 at p = 1.]
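A minimal sketch of Ω(p), locating its maximum and minimum numerically (the grid search is just one illustrative way to do it):

import math

def binary_entropy(p):
    """Omega(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with Omega(0) = Omega(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Maximum of 1 bit at p = 0.5, minimum of 0 at p = 0 or p = 1.
grid = [i / 1000 for i in range(1001)]
p_max = max(grid, key=binary_entropy)
print(p_max, binary_entropy(p_max))              # ~0.5, ~1.0
print(binary_entropy(0.0), binary_entropy(1.0))  # 0.0 0.0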
Entropy of an M-ary Source
• There is a known mathematical inequality: ln V ≤ V − 1, with equality only at V = 1. (Scaling by log2(e) extends it to base-2 logarithms without changing the sign of what follows.)
• Let V = Qi/Pi, where ∑Qi = ∑Pi = 1. (P may be taken as the set of source symbol probabilities, and Q as another, independent set of probabilities with the same number of elements.)
[Figure: plots of (V − 1) and log(V) versus V, touching at V = 1.]
• Thus, {(Qi/Pi) − 1} ≥ ln(Qi/Pi)
• Pi·{(Qi/Pi) − 1} ≥ Pi·ln(Qi/Pi)
• ∑ Pi·{(Qi/Pi) − 1} ≥ ∑ Pi·ln(Qi/Pi)
• {∑ Qi − ∑ Pi} = 0 ≥ ∑ Pi·ln(Qi/Pi), hence ∑ Pi·log(Qi/Pi) ≤ 0 in any base.
• Let Qi = 1/M (all events equally likely):
∑ Pi·log(1/(M·Pi)) ≤ 0
∑ Pi·log(1/Pi) − log(M)·∑ Pi ≤ 0
H(x) ≤ log(M)
• Equality holds when V = 1, i.e. Pi = Qi = 1/M, i.e. P must also be a set of equally likely events.

• Conclusion:
“A source which generates equally likely symbols will have maximum average information.”
“Source coding is done to achieve this.”
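A quick numeric check of this conclusion for M = 4, comparing an equally likely source with an arbitrarily skewed one (the skewed probabilities are made-up values):

import math

def entropy(probs):
    """H(X) = sum of P_i * log2(1/P_i), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

M = 4
uniform = [1 / M] * M
skewed = [0.7, 0.15, 0.1, 0.05]

print(entropy(uniform), math.log2(M))  # both 2.0 bits
print(entropy(skewed))                 # about 1.32 bits, below log2(M)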


The Entropy: 2nd Law of Thermodynamics
• German physicist Rudolf Clausius coined the term ENTROPY. On April 24, 1865, he stated the best-known phrasing of the second law of thermodynamics:
The entropy of the universe tends to a maximum.
• The statistical entropy function introduced by Ludwig Boltzmann in 1872: SB = −N·kB·∑ pi·log pi

• Shannon said, “Von Neumann told me, ‘You should call it (average
information) entropy, for two reasons.
– In the first place your uncertainty function has been used in statistical
mechanics under that name, so it already has a name.
– In the second place, and more important, nobody knows what entropy
really is, so in a debate you will always have the advantage.”
http://hyperphysics.phy-astr.gsu.edu/hbase/therm/entrop.html
Coding for a Memoryless Source
• Generally the information source is not of the designer's choice; source coding is therefore done so that the source appears equally likely to the channel.
• Coding should neither generate nor destroy any information produced by the source, i.e. the rate of information at the input and output of a source coder should be the same.
Rate of Information
• If the rate of symbol generation of a source with entropy H(x) is r symbols/sec, then
R = r·H(x) and R ≤ r·log(M)
• If a binary encoder with binit rate rb is used, then
output information rate = rb·Ω(p) ≤ rb
(with equality if the 0's and 1's are equally likely in the coded sequence)
• Thus, by the basic principle of coding,
R {= r·H(x)} ≤ rb; H(x) ≤ rb/r; H(x) ≤ N
where N = rb/r is the average number of code digits per source symbol.
• Code efficiency = H(x)/N ≤ 100%
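A worked sketch of these relations; the source (four symbols at r = 1000 symbols/sec) and the binary code lengths are assumed illustrative values, not taken from the slides:

import math

# Assumed source: 4 symbols at r = 1000 symbols/sec.
probs = [0.5, 0.25, 0.125, 0.125]
r = 1000                                      # symbols/sec

H = sum(p * math.log2(1 / p) for p in probs)  # source entropy, bits/symbol
R = r * H                                     # information rate, bits/sec
print(H, R)                                   # 1.75 bits/symbol, 1750 bits/sec

# Binary encoder with codeword lengths 1, 2, 3, 3 (e.g. the code 0, 10, 110, 111).
N = sum(p * n for p, n in zip(probs, [1, 2, 3, 3]))  # avg code digits per symbol
rb = r * N                                    # binit rate out of the coder
efficiency = H / N
print(N, rb, efficiency)                      # 1.75, 1750, 1.0 (100%)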
Unique Decipherability (Kraft’s Inequality)
• A source can produce four symbols
{A(1/2, 0); B(1/4, 1); C(1/8, 10); D(1/8, 11)}
[symbol (probability, code)]
Then H(x) = 1.75 and N = 1.25, so efficiency > 1.
Where is the problem?
• Kraft’s inequality:
K = ∑ 2^(−Ni) ≤ 1
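A minimal check of Kraft's inequality for the code above, and for a prefix-free alternative (the code 0, 10, 110, 111 is an illustrative fix, not from the slide):

def kraft(code_lengths):
    """K = sum of 2**(-Ni) over all codeword lengths Ni."""
    return sum(2 ** -n for n in code_lengths)

# The slide's code A=0, B=1, C=10, D=11  ->  K > 1, not uniquely decipherable.
print(kraft([1, 1, 2, 2]))  # 1.5

# A prefix-free code A=0, B=10, C=110, D=111  ->  K = 1, uniquely decipherable.
print(kraft([1, 2, 3, 3]))  # 1.0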
Source Coding Algorithms
• Comma code (each word starts with ‘0’ and carries one extra ‘1’ at the end; first code = 0)
• Tree code (no codeword appears as a prefix of another codeword; first code = 0)
• Shannon–Fano (bi-partitioning until the last two elements; ‘0’ in the upper/lower part and ‘1’ in the lower/upper part)
• Huffman (adding the two least symbol probabilities and rearranging until two elements remain, then back-tracing for the code; see the sketch after this list)
• nth extension (form a group by combining ‘n’ consecutive symbols, then code the group)
• Lempel–Ziv (table formation for compressing binary data)
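A minimal sketch of the Huffman procedure named in the list (merge the two least probable entries, repeat until two remain, then back-trace the codes), applied to the four-symbol example used earlier; the implementation details are mine, not the slides':

import heapq

def huffman(prob_map):
    """Return a {symbol: code} dict built by repeatedly merging the two least probable nodes."""
    # Heap items: (probability, tie-breaker, {symbol: partial code}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(prob_map.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        # Prefix '0' to one branch and '1' to the other while merging (the back-trace).
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})
print(codes)  # e.g. {'A': '0', 'B': '10', 'C': '110', 'D': '111'} (0/1 labels may differ)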
Source Coding Theorem
H(x) ≤ N < H(x) + φ, where φ can be made arbitrarily small.
Proof (of the lower bound):
• It is known that ∑ Pi·log(Qi/Pi) ≤ 0.
• From Kraft’s inequality, K = ∑ 2^(−Ni), so 1 = (1/K)·∑ 2^(−Ni); it can therefore be assumed that Qi = 2^(−Ni)/K (so that all the Qi add to 1).
• Thus, ∑ Pi·{log(1/Pi) − Ni − log(K)} ≤ 0
• H(x) − N − log(K) ≤ 0; H(x) ≤ N + log(K)
• Since log(K) ≤ 0 (as 0 < K ≤ 1), H(x) ≤ N.
• For optimum codes, K = 1 and Pi = Qi.
Symbol Probability vs Code Length
• We know that an optimum code requires K = 1 and Pi = Qi.
• Thus Pi = Qi = 2^(−Ni)/K with K = 1, so Ni = log(1/Pi).
• Ni = Ii
(the length of a codeword should equal the symbol's self-information: less probable symbols get longer codes)
Samuel Morse applied this principle long before Shannon proved it mathematically.
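A small sketch of this principle using Shannon-style lengths Ni = ceil(log2(1/Pi)); the rounding up is an assumption for non-dyadic probabilities (for dyadic probabilities, as here, Ni = Ii exactly):

import math

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

for sym, p in probs.items():
    info = math.log2(1 / p)      # self-information Ii in bits
    length = math.ceil(info)     # Shannon-style code length Ni
    print(sym, p, info, length)  # less probable -> longer codeword

# The Kraft sum for these lengths never exceeds 1, so a prefix code with them exists.
print(sum(2 ** -math.ceil(math.log2(1 / p)) for p in probs.values()))  # 1.0 here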
Predictive Run Encoding

n      Encoding            Decoding
0      00…00 (k digits)    1
1      00…01               01
2      00…10               001
…      …                   …
m−1    11…10               00…01
≥ m    11…11               00…00 (m zeros)

• A ‘run of n’ means ‘n’ successive 0’s followed by a 1.
• m = 2^k − 1
• A k-digit binary codeword is sent in place of a ‘run of n’, for 0 ≤ n ≤ m − 1.
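A minimal sketch of the run-to-codeword mapping in the table above (k-digit codewords, m = 2^k − 1); the function name and the example bit stream are mine:

def encode_runs(bits, k):
    """Replace each 'run of n' (n zeros followed by a 1) with k-digit words per the table."""
    m = 2 ** k - 1
    words, n = [], 0
    for b in bits:
        if b == 0:
            n += 1
            if n == m:                            # run of at least m: send all-ones word, keep counting
                words.append(format(m, f"0{k}b"))  # '11...1'
                n = 0
        else:                                     # the terminating 1: send the count 0 <= n <= m-1
            words.append(format(n, f"0{k}b"))
            n = 0
    return words

# Example: runs of 0, 2 and 5 zeros, with k = 2 (so m = 3).
print(encode_runs([1, 0, 0, 1, 0, 0, 0, 0, 0, 1], k=2))
# ['00', '10', '11', '10']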
Designing Parameters
• A run of n contains n + 1 source bits in total. If ‘p’ is the probability of correct prediction by the predictor, then the probability of a run of n is P(n) = p^n·(1 − p).
• E = ∑ (n+1)·P(n) (for 0 ≤ n ≤ ∞) = 1/(1 − p), the average number of source bits per run.
• The series (1 − v)^(−2) = 1 + 2v + 3v² + …, for v² < 1, is used.
• If n is such that (L − 1)·m ≤ n ≤ L·m − 1, then the number of codeword bits required to represent the run is N = L·k.
• Write an expression for the average number of code digits per run:
N̄ = k·∑ P(n) for 0 ≤ n ≤ m−1
  + 2k·∑ P(n) for m ≤ n ≤ 2m−1
  + 3k·∑ P(n) for 2m ≤ n ≤ 3m−1 + …
• It can be solved to give N̄ = k/(1 − p^m).
• There is an optimal value of k which minimizes N̄ for a given predictor.
• N̄/E = rb/r measures the compression ratio; it should be as low as possible.
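A numeric sketch of these formulas, E = 1/(1 − p) and N̄ = k/(1 − p^m), scanning k to find the value that minimizes the ratio N̄/E (p = 0.95 is an arbitrary illustrative prediction probability):

def compression_ratio(p, k):
    """N_bar / E = (k / (1 - p**m)) * (1 - p), with m = 2**k - 1."""
    m = 2 ** k - 1
    E = 1 / (1 - p)           # average source bits per run
    N_bar = k / (1 - p ** m)  # average code digits per run
    return N_bar / E

p = 0.95                      # assumed probability of correct prediction
best_k = min(range(1, 11), key=lambda k: compression_ratio(p, k))
print(best_k, compression_ratio(p, best_k))  # the minimizing k and the ratio achieved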
Discrete Channel Examples
• Binary Erasure Channel (BEC): 2 source and 3 receiver symbols (two-threshold detection).
• Binary Symmetric Channel (BSC): 2 source and 2 receiver symbols (single-threshold detection).

P(xi): probability that the source selects symbol xi for transmission.
P(yj): probability that symbol yj is received.
P(yj|xi): the forward transition probability.

Mutual information measures the amount of information transferred when xi is transmitted and yj is received.
Mutual Information (MI)
• If we happen to have an ideal noiseless channel, then each yj uniquely identifies a particular xi; then P(xi|yj) = 1 and the MI is expected to equal the self-information of xi.
• On the other hand, if channel noise has such a large effect that yj is totally unrelated to xi, then P(xi|yj) = P(xi) and the MI is expected to be zero.
• All real channels fall between these two extremes.
• Shannon suggested the following expression for MI, which satisfies both of the above conditions:
I(xi;yj) = log {P(xi|yj) / P(xi)} bits
I(X;Y) = ∑ P(xi,yj)·I(xi;yj) (over all possible values of i and j)

Discuss the physical significance of H(X|Y) and H(Y|X).


Mutual information of BSC
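A minimal numeric sketch for the BSC, computing I(X;Y) directly from the joint distribution and checking it against Ω(α + p − 2pα) − Ω(α); the crossover probability α = 0.1 and source probability p = 0.5 are illustrative values:

import math

def omega(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def bsc_mutual_information(alpha, p):
    """I(X;Y) = sum over i,j of P(xi,yj) * log2(P(yj|xi)/P(yj)) for a BSC."""
    px = {0: 1 - p, 1: p}
    pyx = {(0, 0): 1 - alpha, (1, 0): alpha,   # P(y|x), keyed as (y, x)
           (0, 1): alpha, (1, 1): 1 - alpha}
    py = {y: sum(pyx[(y, x)] * px[x] for x in (0, 1)) for y in (0, 1)}
    # log2(P(y|x)/P(y)) equals log2(P(x|y)/P(x)) by Bayes' rule.
    return sum(px[x] * pyx[(y, x)] * math.log2(pyx[(y, x)] / py[y])
               for x in (0, 1) for y in (0, 1))

alpha, p = 0.1, 0.5
print(bsc_mutual_information(alpha, p))                 # equals 1 - omega(alpha) at p = 0.5
print(omega(alpha + p - 2 * p * alpha) - omega(alpha))  # same value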
Discrete Channel Capacity
• Discrete channel capacity (Cs) = max I(X;Y), the maximum being taken over the source symbol probabilities.
• If ‘s’ symbols/sec is the maximum symbol rate allowed by the channel, then the channel capacity is C = s·Cs bits/sec, i.e. the maximum rate of information transfer.
• Shannon’s Fundamental Theorem:
“If R < C, then there exists a coding technique such that the output of a source can be transmitted over the channel with an arbitrarily small frequency of errors.”
The general proof of the theorem is well beyond the scope of this course, but the following cases may be considered to make it plausible.
(a) Ideal Noiseless Channel
• Let the source generate m = 2^k symbols; then
Cs = max I(X;Y) = max H(x) = log(m) = k and C = s·k.
• Error-free transmission rests on the fact that the channel itself is noiseless.
• In accordance with the coding principle, the rate of information out of the binary encoder should equal the rate of information over the channel (as if the source were connected directly to the channel):
Ω(p)·rb = s·H(X); taking the maximum of both sides, rb = s·k = C.
• We have already shown that rb ≥ R (otherwise Kraft’s inequality would be violated), thus C ≥ R.
[Figure: plot of Ω(α) versus α for 0 ≤ α ≤ 1 (peak of 1 at α = 0.5) alongside the corresponding capacity C.]

(b) Binary Symmetric Channel
• I(X;Y) = Ω(α + p − 2·p·α) − Ω(α); Ω(α) is constant for a given α.
Cs = max I(X;Y) = 1 − Ω(α) and C = s·{1 − Ω(α)}.
• Ω(α + p − 2·p·α) varies with the source probability p and reaches a maximum value of unity at (α + p − 2·p·α) = 1/2.
• Ω(α + p − 2·p·α) = 1 if p = 1/2, irrespective of α (it was already shown that Ω(1/2) = 1).
Hartley–Shannon Law
C = B·log2(1 + S/N) bits/sec
• Bandwidth compression (B/R < 1) requires a drastic increase in signal power.
• What will be the capacity of an infinite-bandwidth channel?
• Find the minimum required value of S/(N0·R) for bandwidth expansion (B/R > 1).
http://web.stanford.edu/class/ee368b/Handouts/04-RateDistortionTheory.pdf
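A short sketch of the law; the bandwidth and SNR values are arbitrary illustrative numbers:

import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# A 3 kHz channel at 30 dB SNR (S/N = 1000).
B = 3000.0
snr = 10 ** (30.0 / 10)
print(shannon_capacity(B, snr))  # roughly 29.9 kbit/s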
Do it yourself
• Coding for the binary symmetric channel
• Derivation of continuous channel capacity and the ideal communication system with AWGN
• Ideal communication systems
• System comparisons
• Signal space theory