Ministerial Course Pack, Stage 4: Information Theory
1- Random Variables
A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon. There are two types of random variables, discrete
and continuous. All random variables have a cumulative distribution function. It is a
function giving the probability that the random variable X is less than or equal to x, for
every value x.
A discrete random variable is one which may take on only a countable number of distinct
values such as 0,1,2,3,4,........ If a random variable can take only a finite number of
distinct values, then it must be discrete. Examples of discrete random variables include
the number of children in a family, the number of defective light bulbs in a box of ten.
The probability distribution of a discrete random variable is a list of probabilities
associated with each of its possible values. It is also sometimes called the probability
function or the probability mass function.
When the sample space Ω has a finite number of equally likely outcomes, the discrete uniform probability law applies. Then the probability of any event A is given by:
P(A) = (Number of elements of A) / (Number of elements of Ω)
This distribution may also be described by the probability histogram. Suppose a random
variable X may take k different values, with the probability that X = 𝑥𝑖 defined to be
P(X = 𝑥𝑖 ) =𝑃𝑖 . The probabilities 𝑃𝑖 must satisfy the following:
∑_{i=1}^{k} P_i = 1
Example
Suppose a variable X can take the values 1, 2, 3, or 4. The probabilities associated
with each outcome are described by the following table:
Outcome 1 2 3 4
Probability 0.1 0.3 0.4 0.2
A continuous random variable is one which takes an infinite number of possible values.
Continuous random variables are usually measurements. Examples include height,
weight and the amount of sugar in an orange. A continuous random variable is not
defined at specific values. Instead, it is defined over an interval of values, and is
represented by the area under a curve. The curve, which represents a density function p(x), must satisfy p(x) ≥ 0 with a total area under it equal to 1. A common example is the normal (Gaussian) density:
p(x) = (1/(σ√2π)) e^(−0.5((x−μ)/σ)²)
2- Joint Probability:
Joint probability is the probability of event Y occurring at the same time event X
occurs. Its notation is 𝑃(𝑋 ∩ 𝑌)𝑜𝑟 𝑃(𝑋, 𝑌), which reads; the joint probability of
X and Y.
If X and Y are independent, then:
P(X, Y) = P(X) × P(Y)
The joint probability mass function f(x, y) must satisfy:
∑_x ∑_y f(x, y) = 1
Example:
For discrete random variable, if the probability of rolling a four on one die is
𝑃(𝑋) and if the probability of rolling a four on second die is 𝑃(𝑌). Find 𝑃(𝑋, 𝑌).
Solution:
We have 𝑃(𝑋) = 𝑃(𝑌) = 1/6
P(X, Y) = P(X) × P(Y) = (1/6) × (1/6) = 1/36 ≈ 0.0278 = 2.8%
3- Conditional Probabilities:
Conditional probability arises when events are dependent. We use the symbol "|" to mean "given", and we write:
P(A | B) = P(A ∩ B) / P(B)
Example: A box contains 5 green pencils and 7 yellow pencils. Two pencils are chosen
at random from the box without replacement. What is the probability they are different
colors?
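A minimal Python sketch of this calculation, checking the answer by enumerating ordered two-pencil draws (the enumeration approach and variable names are only illustrative); it should give 70/132 = 35/66 ≈ 0.53.

```python
from itertools import permutations

# Box contents: 5 green (G) and 7 yellow (Y) pencils.
box = ["G"] * 5 + ["Y"] * 7

# All ordered draws of two distinct pencils (without replacement).
pairs = list(permutations(range(len(box)), 2))
different = sum(1 for i, j in pairs if box[i] != box[j])

print("P(different colors) =", different / len(pairs))   # 70/132 = 35/66 ≈ 0.5303
```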
4- Bayes’ Theorem
P(A | B) = P(A ∩ B) / P(B),   P(B) ≠ 0
P(B | A) = P(A ∩ B) / P(A),   P(A) ≠ 0
P(A ∩ B) = P(A | B) × P(B) = P(B | A) × P(A)
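As a quick numeric illustration of these relations, the following minimal Python sketch (with assumed example values for P(B), P(A) and P(A | B)) checks that both products give the same joint probability:

```python
# Hypothetical numbers chosen only to illustrate Bayes' theorem.
P_B = 0.4           # P(B)
P_A = 0.2           # P(A)
P_A_given_B = 0.25  # P(A | B)

# Joint probability from the definition of conditional probability.
P_A_and_B = P_A_given_B * P_B            # P(A ∩ B) = P(A|B) P(B) = 0.1

# Bayes' theorem: P(B | A) = P(A ∩ B) / P(A)
P_B_given_A = P_A_and_B / P_A            # = 0.5

print(P_A_and_B, P_B_given_A)
print(abs(P_B_given_A * P_A - P_A_given_B * P_B) < 1e-12)  # the identity holds
```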
6- Venn Diagram:
A Venn diagram is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple
plane, and sets as regions inside closed curves. A Venn diagram consists of multiple
overlapping closed curves, usually circles, each representing a set. The points inside
a curve labelled S represent elements of the set S, while points outside the boundary
represent elements not in the set S. Fig. 5 shows the set 𝐴 = {1, 2, 3}, 𝐵 =
{4, 5} 𝑎𝑛𝑑 𝑈 = {1, 2, 3, 4, 5, 6}.
[Fig. 5: Venn diagram of U = {1, 2, 3, 4, 5, 6} with A = {1, 2, 3} and B = {4, 5}.]
From the adjoining Venn diagram of Fig. 6, find the following sets:
(𝐵 ∪ 𝐶)′….
Solution:
𝑋 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
𝐵′ = {1, 3, 7, 8, 9, 10},
(𝐶 − 𝐴) = {3, 4, 6, 7, 10},
(𝐵 − 𝐶) = {1, 2, 4, 7, 10},
(𝐴 ∪ 𝐵) = {1, 2, 3, 4, 5, 6},
(𝐴 ∩ 𝐵) = {4, 5},
Basic block diagram of Shannon information theory: information moves from the source to the final destination, with the overall goal of minimizing both the noise and the total amount of required content. An information source is a device which randomly delivers symbols from an alphabet. As an example, a PC (Personal Computer) connected to the Internet is an information source which produces binary digits from the binary alphabet {0, 1}.
1. A source encoder allows one to represent the data source more compactly by eliminating redundancy: it aims to reduce the data rate.
8- Self-information:
The self-information of an outcome x_i with probability P(x_i) is I(x_i) = −log₂ P(x_i) bits. A nat (sometimes also nit or nepit) is the natural unit of information or entropy, based on natural logarithms and powers of e, rather than the powers of 2 and base-2 logarithms which define the bit.
Example 1:
A fair die is thrown; find the amount of information gained if you are told that a 4 will appear.
Solution:
P(1) = P(2) = ⋯ = P(6) = 1/6
I(4) = −log₂(1/6) = ln(6)/ln(2) = 2.585 bits
Example 2:
A biased coin has P(Head)=0.3. Find the amount of information gained if you are told
that a tail will appear.
Solution:
P(tail) = 1 − 0.3 = 0.7
I(tail) = −log₂(0.7) = −ln(0.7)/ln(2) = 0.515 bits
HW
A communication system source emits symbols with the following probabilities: A = 1/2, B = 1/4, C = 1/8. Calculate the information conveyed by each source output.
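A minimal Python check of this HW, using the stated probabilities and I(x) = −log₂ P(x):

```python
from math import log2

# Source symbol probabilities from the HW statement.
probs = {"A": 1/2, "B": 1/4, "C": 1/8}

# Self-information I(x) = -log2 P(x) in bits.
for sym, p in probs.items():
    print(f"I({sym}) = {-log2(p):.0f} bits")
# Expected: I(A) = 1, I(B) = 2, I(C) = 3 bits
```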
[Plot: self-information I(xi) in bits versus probability P(xi), decreasing from large values near P(xi) = 0 to zero at P(xi) = 1.]
Solution:
I(A) = −log₂(1/2) = 1 bit, I(B) = −log₂(1/4) = 2 bits, I(C) = −log₂(1/8) = 3 bits.
Source Entropy:
The average amount of information per source symbol is the source entropy:
H(X) = −∑_{i=1}^{n} P(x_i) log₂ P(x_i)   bits/symbol
For a binary source with probabilities P(0_T) and P(1_T) = 1 − P(0_T), then:
H_b(X) = −[P(0_T) log₂ P(0_T) + (1 − P(0_T)) log₂(1 − P(0_T))]   bits/symbol
If P(0_T) = 0.2, then P(1_T) = 1 − 0.2 = 0.8, and substituting in the above equation:
H_b(X) = −[0.2 log₂(0.2) + 0.8 log₂(0.8)] = 0.722 bits/symbol
For a binary source, if P(0_T) = P(1_T) = 0.5, then the entropy is:
H_b(X) = −[0.5 log₂(0.5) + 0.5 log₂(0.5)] = −log₂(1/2) = log₂(2) = 1 bit/symbol
which is the maximum value of the source entropy. Also, H(X) = 0 if one of the messages is a certain event, i.e. p(x) = 1.
The source entropy rate (in bits/sec) is R(X) = H(X)/τ̅, where τ̅ is the average time duration of the symbols:
τ̅ = ∑_{i=1}^{n} τ_i P(x_i)
and τ_i is the time duration of the symbol x_i.
Example 1:
A source produces dots '.' and dashes '-' with P(dot) = 0.65. The time duration of a dot is 200 ms and that of a dash is 800 ms. Find the average source entropy rate.
Solution:
P(dash) = 1 − P(dot) = 1 − 0.65 = 0.35
H(X) = −[0.65 log₂(0.65) + 0.35 log₂(0.35)] = 0.934 bits/symbol
τ̅ = 0.2 × 0.65 + 0.8 × 0.35 = 0.41 sec
R(X) = H(X)/τ̅ = 0.934/0.41 = 2.278 bps
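The same calculation can be scripted; the sketch below (plain Python, with the values of Example 1) computes H(X), τ̅ and the entropy rate:

```python
from math import log2

# Dot/dash source from Example 1.
p = {".": 0.65, "-": 0.35}          # symbol probabilities
tau = {".": 0.2, "-": 0.8}          # symbol durations in seconds

H = -sum(pi * log2(pi) for pi in p.values())   # entropy, bits/symbol
tau_avg = sum(p[s] * tau[s] for s in p)        # average symbol duration, sec
R = H / tau_avg                                # entropy rate, bits/sec

print(f"H(X) = {H:.3f} bits/symbol, tau_avg = {tau_avg:.2f} s, R = {R:.3f} bps")
# ~0.934 bits/symbol, 0.41 s, ~2.28 bps
```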
Example 2:
A discrete source emits one of five symbols once every millisecond. The symbol
probabilities are 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Calculate the information rate.
Solution:
H(X) = ∑_{i=1}^{5} P_i log₂(1/P_i)
H(X) = (1/2) log₂ 2 + (1/4) log₂ 4 + (1/8) log₂ 8 + (1/16) log₂ 16 + (1/16) log₂ 16 = 1.875 bits/symbol
The symbol rate is r = 1 symbol per msec = 1000 symbols/sec, so the information rate is:
R = r × H(X) = 1000 × 1.875 = 1875 bps
HW: A source produces dots and dashes; the probability of the dot is twice the probability of the dash. The duration of the dot is 10 msec and the duration of the dash is three times the duration of the dot. Calculate the source entropy rate.
The mutual information of a symbol pair is I(x_i, y_j) = log₂[P(x_i ∣ y_j)/P(x_i)] = log₂[P(y_j ∣ x_i)/P(y_j)].
Properties of I(x_i, y_j):
Example:
Show that I(X, Y) is zero for extremely noisy channel.
Solution:
For an extremely noisy channel, y_j gives no information about x_i; the receiver cannot decide anything about x_i, as if we transmit a deterministic signal x_i but the receiver receives a noise-like signal y_j that has no correlation with x_i. Then x_i and y_j are statistically independent, so that P(x_i ∣ y_j) = P(x_i) and P(y_j ∣ x_i) = P(y_j) for all i and j, then:
I(x_i, y_j) = log₂ 1 = 0 for all i & j, hence I(X, Y) = 0
Marginal entropy is a term usually used to denote both the source entropy H(X), defined as before, and the receiver entropy H(Y) given by:
H(Y) = −∑_{j=1}^{m} P(y_j) log₂ P(y_j)   bits/symbol
𝐻( 𝑌 ∣ 𝑋 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑋)
𝐻( 𝑋 ∣ 𝑌 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑌)
where H(X ∣ Y) is the losses (equivocation) entropy and H(Y ∣ X) is the noise entropy.
Also we have:
𝐼(𝑋, 𝑌) = 𝐻(𝑋) − 𝐻(𝑋 ∣ 𝑌)
𝐼(𝑋, 𝑌) = 𝐻(𝑌) − 𝐻(𝑌 ∣ 𝑋)
H(X, Y) = −[0.5 ln(0.5) + 0.25 ln(0.25) + 0.125 ln(0.125) + 2 × 0.0625 ln(0.0625)]/ln 2 = 1.875 bits/symbol
3- H(Y ∣ X) = H(X, Y) − H(X) = 1.875 − 1.06127 = 0.813 bits/symbol
Q.1: A source emits 3 characters (A, B, C) with probabilities of (0.25, 0.35, 0.4)
respectively, calculate:
1) The source entropy H(x)
2) The information rate R(x) if τ(A)= 2msec, τ(B) =3msec, τ(C)=5msec.
Solution:
1) H(x) = −∑_{i=1}^{n} P(x_i) log₂ P(x_i) = −[0.4 ln(0.4) + 0.25 ln(0.25) + 0.35 ln(0.35)]/ln 2
= 1.558 bits/symbol
2) τ̅ = ∑_{i=1}^{n} τ_i P(x_i) = 2 × 0.25 + 3 × 0.35 + 5 × 0.4 = 3.55 msec
R(x) = H(x)/τ̅ = 1.558/(3.55 × 10⁻³) = 0.439 kbps
Q.2: A source emits 8 characters with equal probability, one every 1 msec. Calculate:
1) The probability of each character
2) The self information for each character
3) Source entropy H(x)
4) information rate R(x)
Solution:
1) P(x_i) = 1/n = 1/8
2) I(x_i) = −log₂ P(x_i) = log₂ 8 = 3 bits
3) H(x) = log₂ n = log₂ 8 = 3 bits/symbol
4) R(x) = H(x)/τ = 3/(1 × 10⁻³) = 3 kbps
Q.3: A source emits 12 characters with equal probability, one every 3 msec. Calculate:
1) Source entropy H(x)
Solution: H(x) = log₂ 12 = 3.585 bits/symbol.
For a source with H(X) = 1.85677 bits/symbol and average symbol duration τ̅ = 1.199 μsec, the information rate is:
R(X) = H(X)/τ̅ = 1.85677/(1.199 × 10⁻⁶) = 1.54 Mbps
Q.4: A source produces a stream of twelve letters drawn from (Y, E, S) with probabilities P(Y) = P(S) = (1/2)P(E) and P(E) = 0.5. Calculate the source entropy H(x).
Solution:
p(E) = 0.5
p(Y) + p(S) + p(E) = 1
p(Y) = p(S) = (1/2) p(E) = (1/2) × 0.5 = 0.25
H(x) = −∑ p(x) log₂ p(x)
H(x) = −(2 × 0.25 × log₂ 0.25 + 0.5 log₂ 0.5) = 1.5 bit/symbol
2.1- Channel:
In telecommunications and computer networking, a communication channel
or channel, refers either to a physical transmission medium such as a wire, or to
a logical connection over a multiplexed medium such as a radio channel. A channel is
used to convey an information signal, for example a digital bit stream, from one or
several senders (or transmitters) to one or several receivers. A channel has a certain
capacity for transmitting information, often measured by its bandwidth in Hz or
its data rate in bits per second.
As an example, the binary symmetric channel (BSC) with error probability Pe has:
P(y₁ ∣ x₁) = P(y₂ ∣ x₂) = 1 − Pe
P(y₂ ∣ x₁) = P(y₁ ∣ x₂) = Pe
P(Y ∣ X) = [ 1−Pe   Pe
             Pe     1−Pe ]
The ternary symmetric channel (TSC) has the transition matrix:
P(Y ∣ X) = [ 1−2Pe   Pe      Pe
             Pe      1−2Pe   Pe
             Pe      Pe      1−2Pe ]
The TSC is symmetric but not very practical, since in practice x₁ and x₃ are not affected so much as x₂; in fact the interference between x₁ and x₃ is much less than the interference between x₁ and x₂ or x₂ and x₃.
[Channel diagram of the TSC: x₁→y₁, x₂→y₂ and x₃→y₃ each with probability 1−2Pe, and every cross transition with probability Pe.]
Hence the more practical but nonsymmetric channel has the transition probability matrix:
P(Y ∣ X) = [ 1−Pe   Pe      0
             Pe     1−2Pe   Pe
             0      Pe      1−Pe ]
where x₂ interferes with x₁ exactly the same as the interference between x₂ and x₃, but x₁ and x₃ do not interfere.
[Channel diagram of this nonsymmetric channel: x₁→y₁ and x₃→y₃ with probability 1−Pe, x₂→y₂ with probability 1−2Pe, and adjacent cross transitions with probability Pe.]
1- Lossless channel: It has only one nonzero element in each column of the transition matrix P(Y∣X). This channel has H(X∣Y) = 0 and I(X, Y) = H(X), i.e. zero losses entropy.
2- Deterministic channel: It has only one nonzero element in each row of the transition matrix P(Y∣X). This channel has H(Y∣X) = 0 and I(X, Y) = H(Y), i.e. zero noise entropy.
3- Noiseless channel: It has only one nonzero element in each row and each column of the transition matrix P(Y∣X), i.e. it is an identity matrix, as an example:
P(Y ∣ X) = [ 1  0  0
             0  1  0
             0  0  1 ]
The Shannon-Hartley theorem states that the channel capacity is given by:
C = B log₂(1 + S/N)
where C is the capacity in bits per second, B is the bandwidth of the channel in Hertz, and S/N is the signal-to-noise ratio.
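A small Python helper for this formula (the 3 kHz bandwidth and 30 dB SNR below are assumed, illustrative values):

```python
from math import log2

def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Channel capacity C = B log2(1 + S/N) in bits per second."""
    return bandwidth_hz * log2(1 + snr_linear)

# Hypothetical telephone-like channel: B = 3 kHz, SNR = 30 dB (= 1000 linear).
print(shannon_hartley_capacity(3000, 10 ** (30 / 10)))   # ≈ 29.9 kbps
```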
The Discrete Memoryless Channel (DMC) has an input X and an output Y. At any
given time (t), the channel output Y= y only depends on the input X = x at that time (t)
and it does not depend on the past history of the input. DMC is represented by the
conditional probability of the output Y = y given the input X = x, P(y ∣ x).
[Block diagram: X → Channel P(y ∣ x) → Y]
The Binary Erasure Channel (BEC) model is widely used to represent channels or links that "lose" data. Prime examples of such channels are Internet links and routes. A BEC channel has a binary input X and a ternary output Y.
[Channel diagram of the BEC: x₁→y₁ and x₂→y₂ with probability 1−Pe; x₁ and x₂ each go to the erasure output with probability Pe.]
Note that for the BEC, the probability of “bit error” is zero. In other words, the
following conditional probabilities hold for any BEC model:
P(y = 0 ∣ x = 1) = 0
P(y = 1 ∣ x = 0) = 0
P(y = 0 ∣ x = 0) = 1 − Pe
P(y = 1 ∣ x = 1) = 1 − Pe
P(y = erasure ∣ x = 0) = Pe
P(y = erasure ∣ x = 1) = Pe
The channel capacity is defined as:
C = max over all input distributions P(X) of I(X, Y)   bits/symbol
Physically it is the maximum amount of information each symbol can carry to the receiver. Sometimes this capacity is also expressed in bits/sec if related to the rate of producing symbols r (symbols/sec):
C (bits/sec) = r × C (bits/symbol) = C (bits/symbol)/τ̅
The conditional probability matrices of the various channel types are as shown above. For a symmetric channel the capacity can be found directly. We have:
I(X, Y) = H(Y) − H(Y ∣ X)
H(Y ∣ X) = −∑_i ∑_j P(x_i) P(y_j ∣ x_i) log₂ P(y_j ∣ x_i) = −∑_i P(x_i) ∑_j P(y_j ∣ x_i) log₂ P(y_j ∣ x_i)
For a symmetric channel, every row of P(Y ∣ X) is a permutation of the same set of probabilities, so the inner sum
∑_j P(y_j ∣ x_i) log₂ P(y_j ∣ x_i) = K
where K is constant and independent of the row number i, so that the equation becomes:
I(X, Y) = H(Y) + K ∑_i P(x_i) = H(Y) + K
Hence C = max I(X, Y) = log₂ m + K, where m is the number of channel outputs.
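A minimal Python sketch of this result for symmetric channels; it takes one row of P(Y ∣ X) and returns C = log₂ m + K (the BSC and TSC rows below use assumed error probabilities):

```python
from math import log2

def symmetric_channel_capacity(row):
    """C = log2(m) + sum_j p_j log2 p_j, computed from one row of a symmetric P(Y|X)."""
    K = sum(p * log2(p) for p in row if p > 0)
    return log2(len(row)) + K

# BSC with Pe = 0.3 and TSC with Pe = 0.1 (illustrative values).
print(symmetric_channel_capacity([0.7, 0.3]))        # ≈ 0.1187 bits/symbol
print(symmetric_channel_capacity([0.8, 0.1, 0.1]))   # TSC row [1-2Pe, Pe, Pe]
```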
Example 9:
For the BSC shown, the correct-transition probability is 0.7 and the error probability is 0.3:
[Channel diagram: x₁→y₁ and x₂→y₂ with probability 0.7; cross transitions with probability 0.3.]
P(Y ∣ X) = [ 0.7  0.3
             0.3  0.7 ]
The channel is symmetric, so K = 0.7 log₂ 0.7 + 0.3 log₂ 0.3 = −0.8813 and:
C = log₂ 2 + K = 1 − 0.8813 = 0.1187 bits/symbol
The channel efficiency is η = I(X, Y)/C.
Example 10:
Find the channel capacity for a channel having a nonsymmetric 2×2 transition matrix P(Y ∣ X).
Solution: First note that the channel is not symmetric, since the 1st row is not a permutation of the 2nd row, so C must be found by maximizing I(X, Y) directly. Let the input probabilities be P(X) = [p, 1 − p]; then:
I(X, Y) = H(Y) − H(Y ∣ X)
P(Y) = P(X) · P(Y ∣ X)
H(Y ∣ X) = −∑_i ∑_j P(x_i) P(y_j ∣ x_i) log₂ P(y_j ∣ x_i)
H(Y) = −∑_j P(y_j) log₂ P(y_j)
Both H(Y) and H(Y ∣ X) are functions of p. Setting dI(X, Y)/dp = 0 gives the optimum input probability p, and substituting it back gives the channel capacity C = max I(X, Y).
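The maximization can also be done numerically; the sketch below (illustrative Python with an assumed 2×2 transition matrix) evaluates I(X, Y) over a grid of input probabilities to approximate C:

```python
from math import log2

def mutual_information(p, PYgX):
    """I(X;Y) in bits for input distribution [p, 1-p] and 2x2 transition matrix PYgX."""
    PX = [p, 1 - p]
    PY = [sum(PX[i] * PYgX[i][j] for i in range(2)) for j in range(2)]
    HY = -sum(q * log2(q) for q in PY if q > 0)
    HYgX = -sum(PX[i] * PYgX[i][j] * log2(PYgX[i][j])
                for i in range(2) for j in range(2) if PYgX[i][j] > 0)
    return HY - HYgX

# Hypothetical nonsymmetric binary channel (values assumed only for illustration).
PYgX = [[0.9, 0.1],
        [0.3, 0.7]]

# Brute-force search over the input probability p to approximate C = max I(X;Y).
best_p, C = max(((p / 1000, mutual_information(p / 1000, PYgX))
                 for p in range(1, 1000)), key=lambda t: t[1])
print(f"C ≈ {C:.4f} bits/symbol at P(x1) ≈ {best_p:.3f}")
```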
Review questions:
A binary source sends x₁ with a probability of 0.4 and x₂ with a probability of 0.6 through a channel with error probabilities of 0.1 for x₁ and 0.2 for x₂. Determine:
1- Source entropy.
2- Marginal entropy.
3- Joint entropy.
4- Conditional entropy ( ).
5- Losses entropy ( ).
6- Transinformation.
Solution:
1- The channel diagram:
[Channel diagram: P(x₁) = 0.4, P(x₂) = 0.6; x₁→y₁ with 0.9, x₁→y₂ with 0.1, x₂→y₂ with 0.8, x₂→y₁ with 0.2.]
Or  P(Y ∣ X) = [ 0.9  0.1
                 0.2  0.8 ]
H(X) = −∑ P(x_i) log₂ P(x_i) = −[0.4 log₂(0.4) + 0.6 log₂(0.6)] = 0.971 bits/symbol
2- P(Y) = P(X) · P(Y ∣ X)
P(y₁) = 0.4 × 0.9 + 0.6 × 0.2 = 0.48,   P(y₂) = 0.4 × 0.1 + 0.6 × 0.8 = 0.52
H(Y) = −[0.48 log₂(0.48) + 0.52 log₂(0.52)] = 0.999 bits/symbol
3- P(X, Y) = [ 0.36  0.04
               0.12  0.48 ]
H(X, Y) = −∑∑ P(x_i, y_j) log₂ P(x_i, y_j)
= −[0.36 log₂(0.36) + 0.04 log₂(0.04) + 0.12 log₂(0.12) + 0.48 log₂(0.48)]
= 1.592 bits/symbol
4- H(Y ∣ X) = −∑∑ P(x_i, y_j) log₂ P(y_j ∣ x_i)
= −[0.36 log₂(0.9) + 0.04 log₂(0.1) + 0.12 log₂(0.2) + 0.48 log₂(0.8)]
= 0.621 bits/symbol
Or H(Y ∣ X) = H(X, Y) − H(X) = 1.592 − 0.971 = 0.621 bits/symbol
5- H(X ∣ Y) = H(X, Y) − H(Y) = 1.592 − 0.999 = 0.593 bits/symbol
6- I(X, Y) = H(X) − H(X ∣ Y) = 0.971 − 0.593 = 0.378 bits/symbol
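These values can be verified with a short Python sketch built from the joint distribution P(X, Y) of this question:

```python
from math import log2

# Joint distribution P(x, y): P(X) = [0.4, 0.6], P(Y|X) = [[0.9, 0.1], [0.2, 0.8]].
PXY = [[0.36, 0.04],
       [0.12, 0.48]]

PX = [sum(row) for row in PXY]                               # [0.4, 0.6]
PY = [sum(PXY[i][j] for i in range(2)) for j in range(2)]    # [0.48, 0.52]

def H(dist):
    return -sum(p * log2(p) for p in dist if p > 0)

HX, HY = H(PX), H(PY)
HXY = H([p for row in PXY for p in row])
HYgX = HXY - HX          # conditional (noise) entropy
HXgY = HXY - HY          # losses entropy
IXY = HX - HXgY          # transinformation

print(HX, HY, HXY, HYgX, HXgY, IXY)
# ≈ 0.971, 0.999, 1.592, 0.621, 0.593, 0.378
```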
3- Cascading of Channels
If two channels are cascaded, then the overall transition matrix is the product of the
two transition matrices.
p(z/x) = p(y/x) · p(z/y)
(n × k matrix) = (n × m matrix) × (m × k matrix)
[Block diagram: symbols 1…n → Channel 1 → symbols 1…m → Channel 2 → symbols 1…k.]
Example:
Find the transition matrix p(z/x) for the cascaded channel shown.
[Cascade diagram: channel 1 transitions 0.8, 0.2, 0.3, 0.7; channel 2 transitions 0.7, 0.3, 1, 1.]
p(y/x) = [ 0.8  0.2  0
           0.3  0    0.7 ] ,    p(z/y) = [ 0.7  0.3
                                           1    0
                                           1    0 ]
p(z/x) = p(y/x) · p(z/y) = [ 0.76  0.24
                             0.91  0.09 ]
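The same product can be computed with NumPy; a minimal sketch using the two matrices of this example:

```python
import numpy as np

# Transition matrices of the two cascaded channels from the example.
p_y_x = np.array([[0.8, 0.2, 0.0],
                  [0.3, 0.0, 0.7]])
p_z_y = np.array([[0.7, 0.3],
                  [1.0, 0.0],
                  [1.0, 0.0]])

# Overall transition matrix of the cascade: p(z/x) = p(y/x) · p(z/y).
p_z_x = p_y_x @ p_z_y
print(p_z_x)            # [[0.76 0.24]
                        #  [0.91 0.09]]
```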
1- Sampling theorem:
To prove the sampling theorem, let x(t) be the continuous-time signal shown in the figure below, whose bandwidth does not contain any frequency components higher than W Hz. A sampling function samples this signal regularly at the rate of f_s samples per second.
Notice that the Fourier transform of an impulse train is another impulse train, so that sampling x(t) by an impulse train of period T_s = 1/f_s gives:
x_s(t) = x(t) ∑_n δ(t − nT_s)   ⟹   X_s(f) = (1/T_s) ∑_n X(f − n f_s)
When the sampling rate is chosen as f_s = 2W, each spectral replicate is separated from each of its neighbors by a frequency band exactly equal to f_s hertz, and the analog waveform can theoretically be completely recovered from the samples by the use of filtering. It should be clear that if f_s > 2W, the replications move farther apart in frequency, making it easier to perform the filtering operation.
When the sampling rate is reduced, such that f_s < 2W, the replications will overlap, as shown in the figure below, and some information will be lost. This phenomenon is called aliasing.
[Figure: sampled spectrum, showing the spectral replicas overlapping (aliasing) when f_s < 2W.]
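A small numeric illustration of the aliasing condition (the 7 kHz tone and the 10 kHz sampling rate are assumed values):

```python
# Nyquist-rate bookkeeping for a sinusoid of frequency f0, sampled at fs.
f0 = 7000.0          # signal frequency in Hz (assumed for illustration)
fs = 10000.0         # sampling rate in Hz

nyquist_rate = 2 * f0
print("Nyquist rate =", nyquist_rate, "samples/sec")    # 14000 > fs -> aliasing

# The sampled sinusoid is indistinguishable from one at the aliased frequency:
f_alias = abs(f0 - round(f0 / fs) * fs)
print("Apparent (aliased) frequency =", f_alias, "Hz")  # 3000 Hz
```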
Example: Find the Nyquist rate and Nyquist interval for the following signals.
i-
ii-
Solution:
For a signal whose highest frequency component is W Hz:
Nyquist rate = 2W samples/sec,   Nyquist interval = 1/(2W) sec
H. W:
Find the Nyquist interval and Nyquist rate for the following:
ii-
Example:
Number of sample in
2- Source coding:
An important problem in communications is the efficient representation of data
generated by a discrete source. The process by which this representation is
accomplished is called source encoding. An efficient source encoder must satisfy two functional requirements: the code words produced by the encoder should be uniquely decodable, and the average code word length should be as small as possible.
The overall code length, L_C, can be defined as the average code word length:
L_C = ∑_{i=1}^{n} l_i p(x_i)   bits/message
where l_i is the length of the code word assigned to message x_i, and the code efficiency is η = H(X)/L_C × 100%.
Note that for a fixed-length code:
1- L_C = log₂ n bits/message if n = 2^r (n = 2, 4, 8, 16, ... and r is an integer), which gives η = 100%.
2- L_C = Int[log₂ n] + 1 bits/message if n ≠ 2^r.
Example
For ten equiprobable messages coded in a fixed-length code:
p(x_i) = 1/10   and   L_C = Int[log₂ 10] + 1 = 4 bits
η = H(X)/L_C × 100% = (log₂ 10)/4 × 100% = 83.048%
Example: For eight equiprobable messages coded in a fixed-length code:
p(x_i) = 1/8   and   L_C = log₂ 8 = 3 bits   and   η = (3/3) × 100% = 100%
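A minimal Python sketch of the fixed-length code relations used in these examples:

```python
from math import ceil, log2

def fixed_length_code(n: int):
    """L_C (bits) and efficiency for n equiprobable messages with a fixed-length binary code."""
    Lc = ceil(log2(n))            # L_C = log2(n), rounded up when n is not a power of 2
    eta = log2(n) / Lc * 100      # H(X) = log2(n) for equiprobable messages
    return Lc, eta

for n in (8, 10, 216):
    Lc, eta = fixed_length_code(n)
    print(f"n = {n}: L_C = {Lc} bits, efficiency = {eta:.3f}%")
# n=8 -> 3 bits, 100%;  n=10 -> 4 bits, 83.048%;  n=216 -> 8 bits, 96.94%
```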
Example: Find the efficiency of a fixed length code used to encode messages obtained
from throwing a fair die (a) once, (b) twice, (c) 3 times.
Solution
a- For one throw the possible messages are n = 6 with equal probabilities, so L_C = Int[log₂ 6] + 1 = 3 bits and η = (log₂ 6)/3 × 100% = 86.2%.
b- For two throws n = 6 × 6 = 36, so L_C = Int[log₂ 36] + 1 = 6 bits and η = (log₂ 36)/6 × 100% = 86.2%.
c- For three throws the possible messages are n = 6 × 6 × 6 = 216 with equal probabilities, so L_C = Int[log₂ 216] + 1 = 8 bits and η = (log₂ 216)/8 × 100% = 96.9%.
Consider the following variable-length code C for the alphabet X = {a, b, c}:
C(a) = 0,   l(a) = 1
C(b) = 10,  l(b) = 2
C(c) = 11,  l(c) = 2
The major property that is usually required from any variable-length code is that of unique decodability. The above code C is easily shown to be uniquely decodable; however, not every variable-length code is uniquely decodable.
When message probabilities are not equal, then we use variable length codes. The
following properties need to be considered when attempting to use variable length
codes:
1) Unique decoding:
Example
Consider an alphabet of 4 symbols represented by binary digits as follows:
A → 0
B → 01
C → 11
D → 00
If we receive the code word 0011 it is not known whether the transmission was DC
or AAC . This example is not, therefore, uniquely decodable.
2) Instantaneous decoding:
Example
Consider an alphabet of 4 symbols represented by binary digits as follows:
A → 0
B → 10
C → 110
D → 111
No code word is a prefix of another, so each symbol can be decoded as soon as its last bit is received: the code is instantaneous.
Example
Consider an alphabet of 4 symbols represented by binary digits as follows:
A → 0
B → 01
C → 011
D → 111
The code is identical to the previous example but the bits are time reversed. It is still
uniquely decodable but no longer instantaneous, since early codewords are now
prefixes of later ones.
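A short Python check of the prefix (instantaneous) property for the codes above:

```python
def is_prefix_free(codewords):
    """A code is instantaneous (prefix-free) if no codeword is a prefix of another."""
    words = list(codewords)
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True  -> instantaneous
print(is_prefix_free(["0", "01", "011", "111"]))   # False -> not instantaneous
print(is_prefix_free(["0", "01", "11", "00"]))     # False ("0" is a prefix of "01" and "00")
```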
Shannon Code
For messages x₁, x₂, x₃, …, x_n with probabilities p(x₁), p(x₂), p(x₃), …, p(x_n):
1) l_i = −log₂ p(x_i)            if p(x_i) = (1/2)^r, i.e. p(x_i) ∈ {1/2, 1/4, 1/8, ...}
2) l_i = Int[−log₂ p(x_i)] + 1   if p(x_i) ≠ (1/2)^r
Also define F_i = ∑_{k=1}^{i−1} p(x_k), with F₁ = 0.
The code word C_i is the binary expansion of F_i truncated to l_i bits, i.e. C_i = Int[F_i × 2^{l_i}] written as an l_i-bit binary number.
Example
Develop the Shannon code for the set of messages with p(x) = [0.3  0.2  0.15  0.12  0.1  0.08  0.05], then find the code efficiency and p(0) at the encoder output.
x_i    p(x_i)   l_i   F_i    C_i      n₀(i)
x₁     0.3      2     0      00       2
x₂     0.2      3     0.3    010      2
x₃     0.15     3     0.5    100      2
x₄     0.12     4     0.65   1010     2
x₅     0.1      4     0.77   1100     2
x₆     0.08     4     0.87   1101     1
x₇     0.05     5     0.95   11110    1
(a) H(X) = −∑_{i=1}^{7} p(x_i) log₂ p(x_i) = 2.6029 bits/message
L_C = ∑_{i=1}^{7} l_i p(x_i) = 3.1 bits/message
η = H(X)/L_C × 100% = (2.6029/3.1) × 100% = 83.965%
(b) p(0) at the encoder output is
p(0) = ∑_i p(x_i) n₀(i) / L_C = 1.87/3.1 ≈ 0.603
where n₀(i) is the number of 0s in the code word C_i.
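A minimal Python sketch of the binary Shannon coding rule above (l_i = ⌈−log₂ p_i⌉ and C_i taken from the binary expansion of F_i); applied to this example it reproduces the table:

```python
from math import ceil, log2

def shannon_code(probs):
    """Binary Shannon code: l_i = ceil(-log2 p_i), C_i = first l_i bits of F_i."""
    probs = sorted(probs, reverse=True)          # arrange in decreasing probability
    codes, F = [], 0.0
    for p in probs:
        l = ceil(-log2(p))                       # = Int[-log2 p] + 1 unless p is a power of 1/2
        code, f = "", F
        for _ in range(l):                       # binary expansion of F_i to l_i bits
            f *= 2
            bit = int(f)
            code += str(bit)
            f -= bit
        codes.append((p, l, code))
        F += p
    return codes

table = shannon_code([0.3, 0.2, 0.15, 0.12, 0.1, 0.08, 0.05])
for p, l, c in table:
    print(p, l, c)
Lc = sum(p * l for p, l, _ in table)
print("L_C =", Lc, "bits/message")               # 3.1, efficiency ≈ 83.97 %
```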
Example
Repeat the previous example using ternary coding.
Solution
1) l_i = −log₃ p(x_i)            if p(x_i) = (1/3)^r, i.e. p(x_i) ∈ {1/3, 1/9, 1/27, ...}
2) l_i = Int[−log₃ p(x_i)] + 1   if p(x_i) ≠ (1/3)^r
and C_i is the ternary expansion of F_i truncated to l_i digits.
x_i    p(x_i)   l_i   F_i    C_i    n₀(i)
x₁     0.3      2     0      00     2
x₂     0.2      2     0.3    02     1
x₃     0.15     2     0.5    11     0
x₄     0.12     2     0.65   12     0
x₅     0.1      3     0.77   202    1
x₆     0.08     3     0.87   212    0
x₇     0.05     3     0.95   221    0
(a) H₃(X) = −∑_{i=1}^{7} p(x_i) log₃ p(x_i) = 1.642 ternary unit/message
L_C = ∑_{i=1}^{7} l_i p(x_i) = 2.23 ternary unit/message
η = H₃(X)/L_C × 100% = (1.642/2.23) × 100% = 73.632%
(b) p(0) at the encoder output is
p(0) = ∑_i p(x_i) n₀(i) / L_C = 0.9/2.23 ≈ 0.404
In Shannon–Fano coding, the symbols are arranged in order from most probable to
least probable, and then divided into two sets whose total probabilities are as close
as possible to being equal. All symbols then have the first digits of their codes
assigned; symbols in the first set receive "0" and symbols in the second set receive
"1". As long as any sets with more than one member remain, the same process is
repeated on those sets, to determine successive digits of their codes.
Example:
Given five symbols with their frequencies and probabilities, design a suitable Shannon-Fano binary code and calculate the average code length L_C = ∑ l_i p(x_i), the source entropy and the efficiency.
Example
Develop the Shannon - Fano code for the following set of messages,
p( x) [0.35 0.2 0.15 0.12 0.1 0.08] then find the code efficiency.
Solution
x_i    p(x_i)   Code    l_i
x₁     0.35     0 0     2
x₂     0.2      0 1     2
x₃     0.15     1 0 0   3
x₄     0.12     1 0 1   3
x₅     0.10     1 1 0   3
x₆     0.08     1 1 1   3
L_C = ∑_{i=1}^{6} l_i p(x_i) = 2.45 bits/symbol
H(X) = −∑_{i=1}^{6} p(x_i) log₂ p(x_i) = 2.396 bits/symbol
η = H(X)/L_C × 100% = 97.796%
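A recursive Python sketch of the Shannon-Fano splitting procedure; applied to the probabilities of this example it reproduces the code above (tie-breaking may change individual code words):

```python
from math import log2

def shannon_fano(symbols):
    """symbols: list of (symbol, probability), assumed sorted in descending order."""
    if len(symbols) <= 1:
        return {symbols[0][0]: ""} if symbols else {}
    total, acc, split, best_diff = sum(p for _, p in symbols), 0.0, 1, float("inf")
    # Find the split that makes the two group probabilities as equal as possible.
    for i in range(1, len(symbols)):
        acc += symbols[i - 1][1]
        diff = abs(acc - (total - acc))
        if diff < best_diff:
            best_diff, split = diff, i
    code = {}
    for sym, c in shannon_fano(symbols[:split]).items():
        code[sym] = "0" + c
    for sym, c in shannon_fano(symbols[split:]).items():
        code[sym] = "1" + c
    return code

probs = [("x1", 0.35), ("x2", 0.2), ("x3", 0.15), ("x4", 0.12), ("x5", 0.10), ("x6", 0.08)]
code = shannon_fano(probs)
print(code)                      # x1:00, x2:01, x3:100, x4:101, x5:110, x6:111
Lc = sum(p * len(code[s]) for s, p in probs)
H = -sum(p * log2(p) for _, p in probs)
print(f"L_C = {Lc:.2f} bits/symbol, efficiency = {H / Lc * 100:.2f}%")
```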
Example
Repeat the previous example using a ternary code (r = 3).
Solution
x_i    p(x_i)   Code   l_i
x₁     0.35     0      1
x₂     0.2      1 0    2
x₃     0.15     1 1    2
x₄     0.12     2 0    2
x₅     0.10     2 1    2
x₆     0.08     2 2    2
L_C = ∑_{i=1}^{6} l_i p(x_i) = 1.65 ternary unit/symbol
H₃(X) = −∑_{i=1}^{6} p(x_i) log₃ p(x_i) = 1.512 ternary unit/symbol
η = H₃(X)/L_C × 100% = 91.636%
Huffman Code
The Huffman coding algorithm comprises two steps, reduction and splitting. These steps can be summarized as follows:
1) Reduction: arrange the symbols in descending order of probability, combine the two least probable symbols into a single symbol whose probability is the sum of the two, and repeat the reduction until only two symbols remain.
2) Splitting: working backwards through the reductions, assign 0 and 1 to each pair of combined symbols, so that each original symbol accumulates its code word.
The average code word length is still 2.2 bits/symbol. But variances are different!
Example
Symbols A, B, C, D, E, F, G, H have the probabilities 0.10, 0.18, 0.40, 0.05, 0.06, 0.10, 0.07, 0.04 respectively.
[Huffman reduction table: at each stage the two lowest probabilities are combined and 0/1 are assigned to the combined pair. The successive probability columns are:
B: 0.18  0.18  0.18  0.19  0.23  0.37  0.40
A: 0.10  0.10  0.13  0.18  0.19  0.23
F: 0.10  0.10  0.10  0.13  0.18
G: 0.07  0.09  0.10  0.10
E: 0.06  0.07  0.09
D: 0.05  0.06
H: 0.04]
The resulting code word lengths are:
Symbol:  A  B  C  D  E  F  G  H
l_i:     3  3  1  5  4  4  4  5
H(X) = −∑_{i=1}^{8} p(x_i) log₂ p(x_i) = 2.552 bits/symbol
L_C = ∑_{i=1}^{8} l_i p(x_i) = 2.61 bits/symbol
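A minimal Python sketch of Huffman coding using a heap; with the probabilities of this example it gives L_C = 2.61 bits/symbol (individual code words may differ with tie-breaking, but the average length does not):

```python
import heapq
from math import log2

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> binary codeword."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"A": 0.10, "B": 0.18, "C": 0.40, "D": 0.05,
         "E": 0.06, "F": 0.10, "G": 0.07, "H": 0.04}
code = huffman_code(probs)
Lc = sum(probs[s] * len(code[s]) for s in probs)
H = -sum(p * log2(p) for p in probs.values())
print(code)
print(f"H(X) = {H:.3f} bits/symbol, L_C = {Lc:.2f} bits/symbol")   # ≈ 2.552, 2.61
```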
Note:
For r-ary Huffman coding, the condition on the number of symbols n is that (n − r)/(r − 1) must be an integer value; otherwise, add redundant symbols with probabilities equal to zero so that the condition is satisfied.
Data Compression:
In computer science and information theory, data compression, source coding, or bit-
rate reduction involves encoding information using fewer bits than the original
representation. Compression can be either lossy or lossless.
Run-length encoding (RLE) is a simple lossless compression method that replaces each run of repeated values by a (count, value) pair. The input message to the RLE encoder is of variable length while the output code word is fixed, unlike the Huffman code, where the input is fixed while the output is of variable length.
Example: Consider these repeated pixel values in an image: 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 0 0 0 0 0 0 0 0. We could represent them more efficiently as (12, 0)(4, 5)(8, 0); 24 bytes are reduced to 6, which gives a compression ratio of 24/6 = 4:1.
Example: The original sequence (1 row) 111122233333311112222 can be encoded as (4,1),(3,2),(6,3),(4,1),(4,2); 21 bytes are reduced to 10, which gives a compression ratio of 21/10 = 21:10.
Example: The original sequence (1 row) HHHHHHHUFFFFFFFFFFFFFF can be encoded as (7,H),(1,U),(14,F); 22 bytes are reduced to 6, which gives a compression ratio of 22/6 = 11:3.
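A minimal Python sketch of RLE encoding, reproducing the first example above:

```python
def rle_encode(seq):
    """Run-length encode a sequence into (count, value) pairs."""
    pairs = []
    for v in seq:
        if pairs and pairs[-1][1] == v:
            pairs[-1][0] += 1
        else:
            pairs.append([1, v])
    return [(c, v) for c, v in pairs]

row = [0] * 12 + [5] * 4 + [0] * 8
encoded = rle_encode(row)
print(encoded)                                                 # [(12, 0), (4, 5), (8, 0)]
print("compression ratio =", len(row) / (2 * len(encoded)))    # 24/6 = 4.0
```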
Savings Ratio: the savings ratio is related to the compression ratio and is a measure of the amount of redundancy between two representations (compressed and uncompressed). Let:
N1 = the total number of bytes required to store an uncompressed (raw) source image.
N2 = the total number of bytes required to store the compressed data.
The compression ratio Cr is then defined as:
Cr = N1 / N2
and the savings ratio is:
Sr = (N1 − N2) / N1
A higher savings ratio indicates more effective compression, while negative ratios are possible and indicate that the compressed image has a larger memory size than the original.
The idea of error detection and/or correction is to add extra bits to the digital
message that enable the receiver to detect or correct errors with limited
capabilities. These extra bits are called parity bits. If we have k data bits and r parity bits are added, then the number of transmitted digits is:
n = k + r
Here n is the code word length, denoted as (n, k). The efficiency or code rate is equal to k/n.
Ideally, FEC codes can be used to generate encoding symbols that are
transmitted in packets in such a way that each received packet is fully useful to a
receiver to reassemble the object regardless of previous packet reception
patterns. The most common applications of FEC are:
Compact Disc (CD) applications, digital audio and video, the Global System for Mobile communications (GSM), and mobile communications.
However, if the data bits are spread or changed in the output code word, then the code is said to be nonsystematic: the k data bits do not appear unchanged in the (n, k) output code word.
The simple parity-check code is a linear block code (systematic code). In this code, an extra bit is added to each k information bits, and hence the code rate (efficiency) is k/(k + 1). At the receiver, if the number of 1's is odd (for even parity) then an error is detected. The minimum Hamming distance for this category is dmin = 2, which means that the simple parity code is a single-bit error-detecting code; it cannot correct any error.
There are two categories in this type: even parity (ensures that a code word has
an even number of 1's) and odd parity (ensures that a code word has an odd
number of 1's) in the code word.
Example: an even parity-check code of (5, 4), which means that k = 4 and n = 5.
The above table can be repeated with odd parity-check code of (5, 4) as follow:
Data word Code word Data word Code word
0010 00100 0110 01101
1010 10101 1000 10000
Note:
Error detection was used in early ARQ (Automatic Repeat reQuest) systems. If the receiver detects an error, it asks the transmitter (through another backward channel) to retransmit.
The sender calculates the parity bit to be added to the data word to form a code word. At the receiver, a syndrome is calculated and passed to the decision logic analyzer. If the syndrome is 0, there is no error in the received code word and the data portion of the received code word is accepted as the data word; if the syndrome is 1, the data portion of the received code word is discarded and the data word is not created, as shown in the figure below.
The repetition code is one of the most basic error-correcting codes. The idea of
the repetition code is to just repeat the message several times. The encoder is a
simple device that repeats, r times.
For example, if we have a (3, 1) repetition code, then encoding the signal
m=101001 yields a code c=111000111000000111.
Suppose we received a (3, 1) repetition code and we are decoding the signal
c = 110 001 111. Using majority voting on each block, the decoded message is m = 101. An (r, 1) repetition code has minimum distance d_min = r and an error-correcting capability of t = ⌊(r − 1)/2⌋, i.e. it will correct up to ⌊(r − 1)/2⌋ errors in any code word; the correction capability therefore increases with r. Although this code is very simple, it is also inefficient and wasteful, because even using only a (2, 1) repetition code would mean doubling the bandwidth, which means doubling the cost.
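A minimal Python sketch of (r, 1) repetition encoding and majority-vote decoding, using the bit strings from this example:

```python
def rep_encode(bits, r=3):
    """(r, 1) repetition code: repeat every message bit r times."""
    return "".join(b * r for b in bits)

def rep_decode(code, r=3):
    """Majority-vote decoding of an (r, 1) repetition code."""
    blocks = [code[i:i + r] for i in range(0, len(code), r)]
    return "".join("1" if b.count("1") > r // 2 else "0" for b in blocks)

c = rep_encode("101001")          # '111000111000000111'
print(c)
print(rep_decode("110001111"))    # '101' (single errors in the first two blocks are corrected)
```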
Linear block codes extend the parity-check code by using a larger number of parity bits to either detect more than one error or correct one or more errors. An (n, k) binary block code selects 2^k codewords from the 2^n possibilities to form the code, such that each k-bit information block is uniquely mapped to one of these 2^k codewords. In linear codes the sum of any two codewords is a codeword: the code is said to be linear if, and only if, C_i ⊕ C_j is also a code vector, where C_i and C_j are codeword vectors and ⊕ represents modulo-2 addition.
In the codeword there are k data bits and r = n − k redundant (check) bits, giving a total of n codeword bits.
1. r parity bits are added to a k-bit data word, forming a code word of n bits.
This table describes which parity bits cover which transmitted bits in the encoded word. For example, p2 provides even parity for bits 2, 3, 6, and 7. Reading down a column also shows which parity bits cover a given transmitted bit; for example, d1 is covered by p1 and p2 but not p3. This table has a striking resemblance to the parity-check matrix (H).
Example:
Suppose we want to transmit the data 1011 over noisy communication channel.
Determine the Hamming code word.
Solution:
The first step is to calculate the parity bit values and put them in the corresponding positions. With the layout p1 p2 d1 p3 d2 d3 d4 and data d1 d2 d3 d4 = 1011:
p1 (covers positions 1, 3, 5, 7) = d1 ⊕ d2 ⊕ d4 = 1 ⊕ 0 ⊕ 1 = 0
p2 (covers positions 2, 3, 6, 7) = d1 ⊕ d3 ⊕ d4 = 1 ⊕ 1 ⊕ 1 = 1
p3 (covers positions 4, 5, 6, 7) = d2 ⊕ d3 ⊕ d4 = 0 ⊕ 1 ⊕ 1 = 0
so the transmitted code word is 0 1 1 0 0 1 1.
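A minimal Python sketch consistent with the parity coverage stated above (function names are illustrative); it encodes the data word 1011 and computes the syndrome of a received word with one flipped bit:

```python
# Hamming (7,4) with the bit layout p1 p2 d1 p3 d2 d3 d4 used above.
def hamming74_encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # even parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # even parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # even parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_syndrome(c):
    """Returns the erroneous bit position (1..7), or 0 if no single-bit error."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    return s1 * 1 + s2 * 2 + s3 * 4

codeword = hamming74_encode([1, 0, 1, 1])
print(codeword)                          # [0, 1, 1, 0, 0, 1, 1]
received = codeword.copy()
received[4] ^= 1                         # flip bit in position 5 to simulate noise
print("error at position", hamming74_syndrome(received))   # 5
```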
Suppose the following noise is added to the code word, then the received code
becomes as:
The noise:
Hamming matrices: