Information Theory – Stage 4 (Ministerial Course Package)
Prof. Dr. Mahmood

Chapter one

1- Random Variables

A random variable, usually written X, is a variable whose possible values are numerical
outcomes of a random phenomenon. There are two types of random variables, discrete
and continuous. All random variables have a cumulative distribution function. It is a
function giving the probability that the random variable X is less than or equal to x, for
every value x.

1-1 Discrete Random Variables

A discrete random variable is one which may take on only a countable number of distinct
values such as 0,1,2,3,4,........ If a random variable can take only a finite number of
distinct values, then it must be discrete. Examples of discrete random variables include
the number of children in a family, the number of defective light bulbs in a box of ten.
The probability distribution of a discrete random variable is a list of probabilities
associated with each of its possible values. It is also sometimes called the probability
function or the probability mass function.

When the sample space Ω has a finite number of equally likely outcomes, the discrete uniform probability law applies, and the probability of any event A is given by:

P(A) = (Number of elements of A) / (Number of elements of Ω)

This distribution may also be described by the probability histogram. Suppose a random
variable X may take k different values, with the probability that X = 𝑥𝑖 defined to be
P(X = 𝑥𝑖 ) =𝑃𝑖 . The probabilities 𝑃𝑖 must satisfy the following:

1- 0 < 𝑃𝑖 < 1 for each i



2- P1 + P2 + ⋯ + Pk = 1, i.e.

∑_{i=1}^{k} Pi = 1

Example
Suppose a variable X can take the values 1, 2, 3, or 4. The probabilities associated
with each outcome are described by the following table:

Outcome 1 2 3 4
Probability 0.1 0.3 0.4 0.2

Figure 1: probability distribution

The cumulative distribution function for the above probability distribution is


calculated as follows:
The probability that X is less than or equal to 1 is 0.1,
the probability that X is less than or equal to 2 is 0.1+0.3 = 0.4,
the probability that X is less than or equal to 3 is 0.1+0.3+0.4 = 0.8, and
the probability that X is less than or equal to 4 is 0.1+0.3+0.4+0.2 = 1.



H.W: Having a text of (ABCAABDCAA). Calculate the probability of each letter,
plot the probability distribution and the cumulative distribution.
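A minimal Python sketch of this kind of calculation (the string and variable names are only illustrative):

```python
from collections import Counter

text = "ABCAABDCAA"
counts = Counter(text)
n = len(text)

# probability of each letter
probs = {ch: c / n for ch, c in sorted(counts.items())}

# cumulative distribution over the sorted letters
cdf, running = {}, 0.0
for ch, p in probs.items():
    running += p
    cdf[ch] = running

print(probs)  # {'A': 0.5, 'B': 0.2, 'C': 0.2, 'D': 0.1}
print(cdf)    # {'A': 0.5, 'B': 0.7, 'C': 0.9, 'D': 1.0}
```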

Figure 2: cumulative distribution

1-2 Continuous Random Variables

A continuous random variable is one which takes an infinite number of possible values.
Continuous random variables are usually measurements. Examples include height,
weight and the amount of sugar in an orange. A continuous random variable is not
defined at specific values. Instead, it is defined over an interval of values, and is
represented by the area under a curve. The curve, which represents a function p(x),
must satisfy the following:

1: The curve has no negative values (p(x) ≥ 0 for all x)


2: The total area under the curve is equal to 1.

A curve meeting these requirements is known as a density curve. If every interval of equal width has the same probability, the curve describing the distribution is a rectangle with constant height across the interval and zero height elsewhere; such a distribution is known as a uniform distribution.



Figure 3: Uniform distribution

Another type of distribution is the normal distribution having a bell-shaped density


curve described by its mean 𝜇 and standard deviation 𝜎. The height of a normal
density curve at a given point x is given by:

h = (1 / (σ√(2π))) · e^(−0.5·((x−μ)/σ)²)

Figure 4: The Standard normal curve

2- Joint Probability:

Joint probability is the probability of event Y occurring at the same time as event X. Its notation is P(X ∩ Y) or P(X, Y), read as the joint probability of X and Y. When X and Y are independent,
P(X, Y) = P(X) × P(Y)



If X and Y are discrete random variables, then 𝑓(𝑥, 𝑦) must satisfy:
0 ≤ 𝑓(𝑥, 𝑦) ≤ 1 and,

∑ ∑ 𝑓(𝑥, 𝑦) = 1
𝑥 𝑦

If X and Y are continuous random variables, then 𝑓(𝑥, 𝑦) must satisfy:


𝑓(𝑥, 𝑦) ≥ 0 and,
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

Example:
For discrete random variable, if the probability of rolling a four on one die is
𝑃(𝑋) and if the probability of rolling a four on second die is 𝑃(𝑌). Find 𝑃(𝑋, 𝑌).
Solution:
We have 𝑃(𝑋) = 𝑃(𝑌) = 1/6
P(X, Y) = P(X) × P(Y) = (1/6) × (1/6) = 1/36 ≈ 0.0278 = 2.8%

3- Conditional Probabilities:

It arises when events are dependent. We use the symbol "|" to mean "given":

- P(B|A) means "Event B given Event A has occurred".

- P(B|A) is also called the "Conditional Probability" of B given A has occurred .

- And we write it as

P(A | B) = (number of elements of A and B) / (number of elements of B)



Or

𝑃(𝐴 ∩ 𝐵)
𝑃(𝐴 | 𝐵) =
𝑃(𝐵)

Where 𝑃(𝐵) > 0

Example: A box contains 5 green pencils and 7 yellow pencils. Two pencils are chosen
at random from the box without replacement. What is the probability they are different
colors?

Solution: Using a tree diagram, P(different colors) = P(G then Y) + P(Y then G) = (5/12)(7/11) + (7/12)(5/11) = 70/132 ≈ 0.53.

4- Bayes’ Theorem

Bayes’ theorem: an equation that allows us to manipulate conditional probabilities.


For two events, A and B, Bayes’ theorem lets us go from P(B|A) to P(A|B).

𝑃(𝐴 ∩ 𝐵)
𝑃(𝐴 | 𝐵) = 𝑃(𝐵) ≠ 0
𝑃(𝐵)
𝑃(𝐴 ∩ 𝐵)
𝑃(𝐵 | 𝐴) = 𝑃(𝐴) ≠ 0
𝑃(𝐴)
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴 | 𝐵) × 𝑃(𝐵) = 𝑃(𝐵| 𝐴) × 𝑃(𝐴)



P(A | B) = P(B | A) × P(A) / P(B),    P(B) ≠ 0
Example:

If 𝑃(𝑋 = 0) = 0.2, 𝑃(𝑋 = 1) = 0.3, 𝑃(𝑋 = 2) = 0.5, 𝑃(𝑌 = 0) =


0.4 𝑎𝑛𝑑 𝑃(𝑌 = 1) = 0.6. Determine 𝑃(𝑋 = 0 |𝑌 = 0), 𝑃(𝑋 = 1 |𝑌 = 0)

5- Independence of Two Variables:

The concept of independent random variables is very similar to independent events.


If two events A and B are independent, we have P(A,B)=P(A)P(B)=P(A∩B). For
example, let’s say you wanted to know the average weight of a bag of sugar so you
randomly sample 50 bags from various grocery stores. You wouldn’t expect the
weight of one bag to affect another, so the variables are independent.

6- Venn's Diagram:

A Venn diagram is a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the
plane, and sets as regions inside closed curves. A Venn diagram consists of multiple
overlapping closed curves, usually circles, each representing a set. The points inside
a curve labelled S represent elements of the set S, while points outside the boundary
represent elements not in the set S. Fig. 5 shows the set 𝐴 = {1, 2, 3}, 𝐵 =
{4, 5} 𝑎𝑛𝑑 𝑈 = {1, 2, 3, 4, 5, 6}.


Figure 5: An example of Venn's Diagrams



Example:

From the adjoining Venn diagram of Fig. 6, find the following sets:

A, B, C, X, A', B', C-A, B-C, A ∪ B, A ∩ B

(𝐵 ∪ 𝐶)′….

Solution:

𝐴 = {1,3, 4, 5}, 𝐵 = {2, 4, 5, 6}, 𝐶 = {1, 5, 6, 7, 10}

𝑋 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

𝐴′ = {2, 6, 7, 8, 9, 10}, Figure 6:Venn's Diagram

𝐵′ = {1, 3, 7, 8, 9, 10},

(C − A) = {6, 7, 10},

(B − C) = {2, 4},

(𝐴 ∪ 𝐵) = {1, 2, 3, 4, 5, 6},

(𝐴 ∩ 𝐵) = {4, 5},

(𝐵 ∪ 𝐶)′ = {3, 8, 9}……..

7- Shannon Information Theory:

Fig. 7 shows the basic block diagram of Shannon information theory. Information moves from the source to the final destination, with the overall goal of minimizing both the noise and the total amount of required content. An information source is a device which randomly delivers symbols from an alphabet. As an example, a PC (Personal Computer) connected to the internet is an information source which produces binary digits from the binary alphabet {0, 1}.

1. A source encoder allows one to represent the data source more compactly by
eliminating redundancy: it aims to reduce the data rate.



2. A channel encoder adds redundancy to protect the transmitted signal against
transmission errors.

Noise

Figure 7: Shannon paradigm


3. A channel is a system which links a transmitter to a receiver. It includes signaling
equipment and pair of copper wires or coaxial cable or optical fiber, among other
possibilities.
4. The remaining blocks form the receiver end; each block performs the inverse processing of the corresponding transmitter block.

8- Self- information:

In information theory, self-information is a measure of the information


content associated with the outcome of a random variable. It is expressed in a unit of
information, for example bits, nats, or hartleys, depending on the base of the logarithm
used in its calculation.



A bit is the basic unit of information in computing and digital communications. A bit
can have only one of two values, and may therefore be physically implemented with a
two-state device. These values are most commonly represented as 0 and 1.

A nat is the natural unit of information, sometimes also nit or nepit, is a unit
of information or entropy, based on natural logarithms and powers of e, rather than the
powers of 2 and base 2 logarithms which define the bit. This unit is also known by its
unit symbol, the nat.

The hartley (symbol Hart) is a unit of information defined by International


Standard IEC 80000-13 of the International Electrotechnical Commission. One hartley
is the information content of an event if the probability of that event occurring is 1/10. It
is therefore equal to the information contained in one decimal digit (or dit).

1 Hart ≈ 3.322 Sh ≈ 2.303 nat.


9- Logarithmic measure of information:
The amount of self-information contained in a probabilistic event depends only on
the probability of that event: the smaller its probability, the larger the self-information
associated with receiving the information that the event indeed occurred as shown in
Fig.8.
i- Information is zero if 𝑃(𝑥𝑖 ) = 1 (certain event)
ii- Information increase as 𝑃(𝑥𝑖 ) decrease to zero
iii- Information is a +ve quantity
The log function satisfies all previous three points hence:
I(xi) = log_a (1 / P(xi)) = −log_a P(xi)
Where 𝐼(𝑥𝑖 ) is self information of (𝑥𝑖 ) and if:
i- If “a” =2 , then 𝐼(𝑥𝑖 ) has the unit of bits
ii- If “a”= e = 2.71828, then 𝐼(𝑥𝑖 ) has the unit of nats
iii- If “a” = 10, then I(xi) has the unit of hartleys



Recall that log_a x = ln x / ln a

Example 1:

A fair die is thrown, find the amount of information gained if you are told that 4 will
appear.

Solution:

P(1) = P(2) = ⋯ = P(6) = 1/6

I(4) = −log2(1/6) = −ln(1/6)/ln 2 = 2.5849 bits

Example 2:

A biased coin has P(Head)=0.3. Find the amount of information gained if you are told
that a tail will appear.

Solution:

𝑃(𝑡𝑎𝑖𝑙) = 1 − 𝑃(𝐻𝑒𝑎𝑑) = 1 − 0.3 = 0.7

I(tail) = −log2(0.7) = −ln 0.7 / ln 2 = 0.5146 bits

HW
A communication system source emits the following information with their
corresponding probabilities as follows: A=1/2, B=1/4, C=1/8. Calculate the information
conveyed by each source outputs.
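A minimal Python sketch for evaluating self-information in the three units (the probabilities are the ones given in this homework; the function name is illustrative):

```python
import math

def self_information(p, base=2):
    """I(x) = -log_base p(x); base 2 -> bits, e -> nats, 10 -> hartleys."""
    return -math.log(p, base)

for sym, p in {"A": 1/2, "B": 1/4, "C": 1/8}.items():
    print(sym,
          round(self_information(p, 2), 4), "bits,",
          round(self_information(p, math.e), 4), "nats,",
          round(self_information(p, 10), 4), "hartleys")
```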




Figure 8: Relation between probability and self-information

10- Average information (entropy):

In information theory, entropy is the average amount of information contained in


each message received. Here, message stands for an event, sample or character
drawn from a distribution or data stream. Entropy thus characterizes our uncertainty
about our source of information.

10-1 Source Entropy:

If the source produces messages that are not equiprobable, then 𝐼(𝑥𝑖 ), 𝑖 = 1, 2, … … . . , 𝑛 are


different. Then the statistical average of 𝐼(𝑥𝑖 ) over i will give the average amount of
uncertainty associated with source X. This average is called source entropy and
denoted by 𝐻(𝑋), given by:
H(X) = ∑_{i=1}^{n} P(xi) I(xi)

∴ H(X) = − ∑_{i=1}^{n} P(xi) log_a P(xi)

Example:
Find the entropy of the source producing the following messages:



𝑃𝑥1 = 0.25, 𝑃𝑥2 = 0.1, 𝑃𝑥3 = 0.15, 𝑎𝑛𝑑 𝑃𝑥4 = 0.5

Solution:
H(X) = − ∑_{i=1}^{n} P(xi) log_a P(xi)
     = −[0.25 ln 0.25 + 0.1 ln 0.1 + 0.15 ln 0.15 + 0.5 ln 0.5] / ln 2

H(X) = 1.7427 bits/symbol
10-2 Binary Source entropy:

In information theory, the binary entropy function, denoted Hb(X), is defined as the entropy of a Bernoulli process with probability p of one of two values. Mathematically, the Bernoulli trial is modelled as a random variable X that can take on only two values, 0 and 1:

𝑃(0 𝑇 ) + 𝑃(1 𝑇 ) = 1 → 𝑃(1 𝑇 ) = 1 − 𝑃(0 𝑇 )


Example: find the binary source entropy if P(0_T) = 0.2.

H(X) = − ∑_{i=1}^{n} P(xi) log_a P(xi)

For the binary case (n = 2):

Hb(X) = −[P(0_T) log2 P(0_T) + (1 − P(0_T)) log2(1 − P(0_T))] bits/symbol

If P(0_T) = 0.2, then P(1_T) = 1 − 0.2 = 0.8, and substituting in the above equation:

Hb(X) = −[0.2 log2(0.2) + 0.8 log2(0.8)] = 0.7219 bits/symbol

10-3 Maximum Source Entropy:

For binary source, if 𝑃(0 𝑇 ) = 𝑃(1 𝑇 ) = 0.5, then the entropy is:

Hb(X) = −[0.5 log2(0.5) + 0.5 log2(0.5)] = −log2(1/2) = log2(2) = 1 bit



Note that Hb(X) is maximum and equal to 1 bit when P(0_T) = P(1_T) = 0.5; the entropy of a binary source (or any source having only two values) is distributed as shown in Fig. 9.

Figure 9: Entropy of a binary source distribution
For any non-binary source, if all messages are equiprobable, then 𝑃(𝑥𝑖 ) = 1/𝑛 so that:
H(X) = H(X)_max = −[ (1/n) log_a(1/n) ] × n = −log_a(1/n) = log_a n bits/symbol, which

is the maximum value of source entropy. Also, 𝐻(𝑋) = 0 if one of the message has the
probability of a certain event or p(x) =1.

10-4 Source Entropy Rate:


It is the average rate of amount of information produced per second.
R(X) = H(X) × rate of producing the symbols    [bits/sec = bps]
The unit of H(X) is bits/symbol and the rate of producing the symbols is symbol/sec, so
that the unit of R(X) is bits/sec.
Sometimes
R(X) = H(X) / τ̅ ,



where

τ̅ = ∑_{i=1}^{n} τi P(xi)

τ̅ is the average time duration of the symbols, and τi is the time duration of the symbol xi.

Example 1:
A source produces dots '.' and dashes '-' with P(dot) = 0.65. If the time duration of a dot is 200 ms and that of a dash is 800 ms, find the average source entropy rate.
Solution:
P(dash) = 1 − P(dot) = 1 − 0.65 = 0.35
H(X) = −[0.65 log2(0.65) + 0.35 log2(0.35)] = 0.934 bits/symbol
τ̅ = 0.2 × 0.65 + 0.8 × 0.35 = 0.41 sec
R(X) = H(X)/τ̅ = 0.934/0.41 = 2.278 bps
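A minimal Python sketch of this entropy-rate calculation (variable names are illustrative):

```python
import math

p   = {"dot": 0.65, "dash": 0.35}    # symbol probabilities
tau = {"dot": 0.2,  "dash": 0.8}     # symbol durations in seconds

H = -sum(pi * math.log2(pi) for pi in p.values())   # bits/symbol
tau_bar = sum(p[s] * tau[s] for s in p)             # average symbol duration
print(H, tau_bar, H / tau_bar)  # ~0.934 bits/symbol, 0.41 s, ~2.278 bps
```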
Example 2:
A discrete source emits one of five symbols once every millisecond. The symbol
probabilities are 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Calculate the information rate.

Solution:
H = ∑_{i=1}^{5} Pi log2(1/Pi)

H = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16

H = 0.5 + 0.5 + 0.375 + 0.25 + 0.25 = 1.875 bits/symbol

R = H/τ = 1.875 / 10⁻³ = 1.875 kbps



HW:

A source produces dots and dashes; the probability of the dot is twice the probability
of the dash. The duration of the dot is 10msec and the duration of the dash is set to
three times the duration of the dot. Calculate the source entropy rate.

11- Mutual information for noisy channel:


Consider the set of symbols x1, x2, …, xn that the transmitter Tx may produce. The receiver Rx may receive y1, y2, …, ym. Theoretically, if noise and jamming are neglected, then the set X = set Y. However, due to noise and jamming, there will be a conditional probability P(yj ∣ xi). Define:
1- P(xi) to be the a priori probability of the symbol xi, which is the probability of selecting xi for transmission.
2- P(xi ∣ yj) to be the a posteriori probability of the symbol xi after the reception of yj.
The amount of information that yj provides about xi is called the mutual information between xi and yj. This is given by:

I(xi, yj) = log2( a posteriori prob / a priori prob ) = log2( P(xi ∣ yj) / P(xi) )

Properties of 𝑰(𝒙𝒊 , 𝒚𝒋 ):

1- It is symmetric, 𝐼(𝑥𝑖 , 𝑦𝑗 ) = 𝐼(𝑦𝑗 , 𝑥𝑖 ).


2- 𝐼(𝑥𝑖 , 𝑦𝑗 ) > 0 if aposteriori probability > a priori probability, 𝑦𝑗 provides +ve
information about 𝑥𝑖 .



3- 𝐼(𝑥𝑖 , 𝑦𝑗 ) = 0 if aposteriori probability = a priori probability, which is the case of
statistical independence when 𝑦𝑗 provides no information about 𝑥𝑖 .
4- 𝐼(𝑥𝑖 , 𝑦𝑗 ) < 0 if aposteriori probability < a priori probability, 𝑦𝑗 provides -ve
information about 𝑥𝑖 , or 𝑦𝑗 adds ambiguity.
Also I(xi, yj) = log2( P(yj ∣ xi) / P(yj) )
𝑃(𝑦𝑗 )

Example:
Show that I(X, Y) is zero for extremely noisy channel.
Solution:
For an extremely noisy channel, yj gives no information about xi; the receiver cannot decide anything about xi, as if we transmit a deterministic signal xi but the receiver receives a noise-like signal yj that has no correlation with xi. Then xi and yj are statistically independent, so that P(xi ∣ yj) = P(xi) and P(yj ∣ xi) = P(yj) for all i and j, then:
I(xi, yj) = log2 1 = 0 for all i & j, then I(X, Y) = 0

10.1 Joint entropy:

In information theory, joint entropy is a measure of the uncertainty associated with a


set of variables.
𝒎 𝒏

𝑯(𝑿, 𝒀) = 𝑯(𝑿𝒀) = − ∑ ∑ 𝑷(𝒙𝒊 , 𝒚𝒋 ) 𝐥𝐨𝐠 𝟐 𝑷(𝒙𝒊 , 𝒚𝒋 ) 𝒃𝒊𝒕𝒔/𝒔𝒚𝒎𝒃𝒐𝒍


𝒋=𝟏 𝒊=𝟏

10.2 Conditional entropy:

In information theory, the conditional entropy quantifies the amount of information


needed to describe the outcome of a random variable Y given that the value of another
random variable X is known.



𝑚 𝑛

𝐻(𝑌 ∣ 𝑋) = − ∑ ∑ 𝑃(𝑥𝑖 , 𝑦𝑗 ) log 2 𝑃(𝑦𝑗 ∣ 𝑥𝑖 ) 𝑏𝑖𝑡𝑠/𝑠𝑦𝑚𝑏𝑜𝑙


𝑗=1 𝑖=1

10.3 Marginal Entropies:

Marginal entropies is a term usually used to denote both source entropy H(X) defined
as before and the receiver entropy H(Y) given by:

𝑚
𝑏𝑖𝑡𝑠
𝐻(𝑌) = − ∑ 𝑃(𝑦𝑗 ) log 2 𝑃(𝑦𝑗 )
𝑠𝑦𝑚𝑏𝑜𝑙
𝑗=1

10.4 Transinformation (average mutual information):


It is the statistical average of all pair 𝐼(𝑥𝑖 , 𝑦𝑗 ) , 𝑖 = 1, 2, … . . , 𝑛, 𝑗 = 1, 2, … . . , 𝑚.
This is denoted by 𝐼(𝑋, 𝑌) and is given by:
𝑛 𝑚

𝐼(𝑋, 𝑌) = ∑ ∑ 𝐼(𝑥𝑖 , 𝑦𝑗 )𝑃(𝑥𝑖 , 𝑦𝑗 )


𝑖=1 𝑗=1
𝑛 𝑚
𝑃( 𝑦𝑗 ∣∣ 𝑥𝑖 ) 𝑏𝑖𝑡𝑠
𝐼(𝑋, 𝑌) = ∑ ∑ 𝑃(𝑥𝑖 , 𝑦𝑗 ) log 2 ( )
𝑃(𝑦𝑗 ) 𝑠𝑦𝑚𝑏𝑜𝑙
𝑖=1 𝑗=1
or
𝑛 𝑚
𝑃( 𝑥𝑖 ∣∣ 𝑦𝑗 )
𝐼(𝑋, 𝑌) = ∑ ∑ 𝑃(𝑥𝑖 , 𝑦𝑗 ) log 2 ( ) 𝑏𝑖𝑡𝑠/𝑠𝑦𝑚𝑏𝑜𝑙
𝑃(𝑥𝑖 )
𝑖=1 𝑗=1
10.5 Relationship between joint, conditional and transinformation:

𝐻( 𝑌 ∣ 𝑋 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑋)
𝐻( 𝑋 ∣ 𝑌 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑌)
Where, the 𝐻( 𝑋 ∣ 𝑌 ) is the losses entropy.
Also we have:
𝐼(𝑋, 𝑌) = 𝐻(𝑋) − 𝐻(𝑋 ∣ 𝑌)
𝐼(𝑋, 𝑌) = 𝐻(𝑌) − 𝐻(𝑌 ∣ 𝑋)



Example:
The joint probability of a system is given by:
𝑥1 0.5 0.25
𝑃(𝑋, 𝑌) = 𝑥2 [ 0 0.125 ]
𝑥3 0.0625 0.0625
Find:
1- Marginal entropies.
2- Joint entropy
3- Conditional entropies.
4- The transinformation.
1- P(X) = [x1 x2 x3] = [0.75  0.125  0.125],   P(Y) = [y1 y2] = [0.5625  0.4375]
H(X) = −[0.75 ln(0.75) + 2 × 0.125 ln(0.125)]/ln 2 = 1.06127 bits/symbol
H(Y) = −[0.5625 ln(0.5625) + 0.4375 ln(0.4375)]/ln 2 = 0.9887 bits/symbol
2-
𝑚 𝑛

𝐻(𝑋, 𝑌) = − ∑ ∑ 𝑃(𝑥𝑖 , 𝑦𝑗 ) log 2 𝑃(𝑥𝑖 , 𝑦𝑗 )


𝑗=1 𝑖=1

𝐻(𝑋, 𝑌)
[0.5ln(0.5) + 0.25 ln(0.25) + 0.125 ln(0.125) + 2 × 0.0625 ln(0.0625)]
=−
𝑙𝑛2
= 1.875 𝑏𝑖𝑡𝑠/𝑠𝑦𝑚𝑏𝑜𝑙
𝑏𝑖𝑡𝑠
3- 𝐻( 𝑌 ∣ 𝑋 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑋) = 1.875 − 1.06127 = 0.813
𝑠𝑦𝑚𝑏𝑜𝑙

𝐻( 𝑋 ∣ 𝑌 ) = 𝐻(𝑋, 𝑌) − 𝐻(𝑌) = 1.875 − 0.9887 = 0.886 𝑏𝑖𝑡𝑠/𝑠𝑦𝑚𝑏𝑜𝑙


4- 𝐼(𝑋, 𝑌) = 𝐻(𝑋) − 𝐻( 𝑋 ∣ 𝑌 ) = 1.06127 − 0.8863 = 0.17497 𝑏𝑖𝑡𝑠/
𝑠𝑦𝑚𝑏𝑜𝑙.
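A short Python/NumPy sketch that reproduces these results from the joint probability matrix (the helper H is illustrative):

```python
import numpy as np

P = np.array([[0.5,    0.25],
              [0.0,    0.125],
              [0.0625, 0.0625]])        # P(X, Y) from the example

def H(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

Px, Py = P.sum(axis=1), P.sum(axis=0)   # marginals
Hx, Hy, Hxy = H(Px), H(Py), H(P)
print(Hx, Hy, Hxy)                      # ~1.061, ~0.989, 1.875
print(Hxy - Hx, Hxy - Hy)               # H(Y|X), H(X|Y)
print(Hx - (Hxy - Hy))                  # I(X,Y) ~ 0.175
```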



Review Questions

Q.1: A source emits 3 characters (A, B, C) with probabilities of (0.25, 0.35, 0.4)
respectively, calculate:
1) The source entropy H(x)
2) The information rate R(x) if τ(A)= 2msec, τ(B) =3msec, τ(C)=5msec.

Solution:
𝑛
0.4𝑙𝑛0.4 + 0.25𝑙𝑛0.25 + 0.35𝑙𝑛0.35
1)𝐻(𝑥) = − ∑ 𝑃(𝑥𝑖)𝑙𝑜𝑔2 𝑃(𝑥𝑖) = − [ ]
𝑙𝑛2
𝑖=1
= 1.558 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
2) 𝜏 = ∑𝑛𝑖=1 𝜏𝑖 𝑃(𝑥𝑖) = 2 ∗ 0.25 + 3 ∗ 0.35 + 5 ∗ 0.4 = 3.55 𝑚𝑠𝑒𝑐

𝐻(𝑥) 1.558
𝑅(𝑥) = = = 0.439 𝑘𝑏𝑝𝑠
𝜏 3.55 ∗ 10−3
Q.2 A source emits 8 characters with equal probability, one each 1m sec calculate:
1) The probability of each character
2) The self information for each character
3) Source entropy H(x)
4) information rate R(x)
Solution:
1 1
1) 𝑃(𝑥𝑖) = 𝑛 = 8 ,
2) 𝐼(𝑥𝑖) = −𝑙𝑜𝑔2 𝑃(𝑥𝑖) = 3𝑏𝑖𝑡
3) 𝐻(𝑥) = 𝑙𝑜𝑔2 𝑛 = 𝑙𝑜𝑔2 8 = 3 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
𝐻(𝑥) 3
4) 𝑅(𝑥) = = = 3 𝑘𝑏𝑝𝑠
𝜏 1 ∗ 10−3

Q.3: A source emits 12 characters with equal probability, one every 3msec, calculate:
1) Source entropy H(x)



2) The information rate R(x)
Solution:
1) 𝐻(𝑥) = 𝑙𝑜𝑔2 𝑛 = 𝑙𝑜𝑔2 12 = 3.584 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
2) R(x) = H(x)/τ = 3.584/(3 × 10⁻³) = 1.194 kbps

Q.4) Having the text (A A A B A C B C C B B A A A D B B B C D). If 𝜏(A) = 𝜏(B) = 𝜏(C)


= 0.1 𝜇𝑠𝑒𝑐 and 𝜏(D) = 0.2 𝜇𝑠𝑒𝑐. Calculate average Source Entropy Rate R(x).
Solution:
H(x) = −∑ P(xi) log2 P(xi)

P(A) = 7/20, P(B) = 7/20, P(C) = 4/20, P(D) = 2/20

H(x) = −[2 × (7/20) log2(7/20) + (1/5) log2(1/5) + (1/10) log2(1/10)] = 1.857 bits/symbol

R(X) = H(X)/τ̅

τ̅ = ∑ τi P(xi) = (7/20 + 7/20 + 1/5) × 0.1 × 10⁻⁶ + (1/10) × 0.2 × 10⁻⁶ = 0.11 μsec

R(X) = 1.857 / (0.11 × 10⁻⁶) ≈ 16.88 Mbps
Q.5) A source produces a stream of the three letters (Y, E, S) with probabilities P(Y) = P(S) = ½ P(E), and P(E) = 0.5. Calculate the source entropy H(x).

solution:
𝑝(𝐸) = 0.5
𝑝(𝑌) + 𝑝(𝑆) + 𝑝(𝐸) = 1
p(Y) = p(S) = ½ p(E) = ½ × 0.5 = 0.25
H(x) = −∑ p(x) log2 p(x)
H(x) = −(2 × 0.25 × log2 0.25 + 0.5 log2 0.5) = 1.5 bit/symbol



Q.6) The joint probability of a channel is:
p(x, y) = [0.2   0.1
           0.25  0.1
           0.25  0.1]
Find:
1) The receiver entropy H(y)
2) Joint entropy H(x,y)
3) Noise entropy H(y/x)
4) Loss entropy H(x/y)
Solution:
P(x)= [ 0.3, 0.35, 0.35], P(y)=[0.7, 0.3]
𝑚 0.7 𝑙𝑛0.7 + 0.3𝑙𝑛0.3
1) 𝐻(𝑦) = − ∑ 𝑃(𝑦𝑗)𝑙𝑜𝑔2 𝑃(𝑦𝑗) = − [ ] = 0.881 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
𝑙𝑛2
𝑗=1
𝑚 𝑛

2) 𝐻(𝑥, 𝑦) = − ∑ ∑ 𝑃(𝑥𝑖, 𝑦𝑗)𝑙𝑜𝑔2 𝑃(𝑥𝑖, 𝑦𝑗)


𝑗=1 𝑖=1
0.2𝑙𝑛0.2 + 3 ∗ 0.1𝑙𝑛0.1 + 2 ∗ 0.25𝑙𝑛0.25
= −[ ]= 2.460 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
𝑙𝑛2
3) 𝐻(𝑦⁄𝑥) = 𝐻(𝑥, 𝑦) − 𝐻(𝑥)
𝑛
0.3𝑙𝑛0.3 + 2 ∗ 0.35𝑙𝑛0.35
𝐻(𝑥) = − ∑ 𝑃(𝑥𝑖)𝑙𝑜𝑔2 𝑃(𝑥𝑖) = − [ ] = 1.581 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
𝑙𝑛2
𝑖=1

∴ H(y∕x) = 2.460 − 1.581 = 0.879 bit/symbol

4) H(x∕y) = H(x, y) − H(y) = 2.460 − 0.881 = 1.579 bit/symbol

Q.7) The joint probability of a channel is:
p(x, y) = [0.4  0.05  0.05
           0.5  0     0   ]
Find:
1) Source entropy H(x)
2) Joint entropy H(x,y)
3) noise entropy H(y/x)
4) loss entropy H(x/y)



Solution: P(x)= [0.5, 0.5], P(y)=[0.9, 0.05, 0.05]
𝑛
2 ∗ 0.5𝑙𝑛0.5
1) 𝐻(𝑥) = − ∑ 𝑃(𝑥𝑖)𝑙𝑜𝑔2 𝑃(𝑥𝑖) = − [ ] = 1 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
𝑙𝑛2
𝑖=1
𝑚 𝑛
0.4𝑙𝑛0.4 + 2 ∗ 0.05𝑙𝑛0.05 + 0.5𝑙𝑛0.5
2) 𝐻(𝑥, 𝑦) = − ∑ ∑ 𝑃(𝑥𝑖, 𝑦𝑗)𝑙𝑜𝑔2 𝑃(𝑥𝑖, 𝑦𝑗) = − [ ]
𝑙𝑛2
𝑗=1 𝑖=1
= 1.460 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙

3) 𝐻(𝑦⁄𝑥) = 𝐻(𝑥, 𝑦) − 𝐻(𝑥) = 1.460 − 1 = 0.460 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙


4) 𝐻(𝑥 ⁄𝑦) = 𝐻(𝑥, 𝑦) − 𝐻(𝑦)
𝑚
0.9𝑙𝑛0.9 + 2 ∗ 0.05𝑙𝑛0.05
𝐻(𝑦) = − ∑ 𝑃(𝑦𝑗)𝑙𝑜𝑔2 𝑃(𝑦𝑗) = − [ ] = 0.568 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙
𝑙𝑛2
𝑗=1

4) 𝐻(𝑥 ⁄𝑦) = 𝐻(𝑥, 𝑦) − 𝐻(𝑦) = 1.460 − 0.568 = 0.892 𝑏𝑖𝑡/𝑠𝑦𝑚𝑏𝑜𝑙



Chapter Two

2.1- Channel:
In telecommunications and computer networking, a communication channel
or channel, refers either to a physical transmission medium such as a wire, or to
a logical connection over a multiplexed medium such as a radio channel. A channel is
used to convey an information signal, for example a digital bit stream, from one or
several senders (or transmitters) to one or several receivers. A channel has a certain
capacity for transmitting information, often measured by its bandwidth in Hz or
its data rate in bits per second.

2.2- Binary symmetric channel (BSC)


It is a common communications channel model used in coding theory and information
theory. In this model, a transmitter wishes to send a bit (a zero or a one), and the
receiver receives a bit. It is assumed that the bit is usually transmitted correctly, but
that it will be "flipped" with a small probability (the "crossover probability").
1-P
0 0
P
model of BSC
P

1 1
1-P

A binary symmetric channel with crossover probability p, denoted BSCp, is a channel with binary input and binary output and probability of error p; that is, if X is the transmitted random variable and Y the received variable, then the channel is characterized by the conditional probabilities:

P(Y = 0 ∣ X = 0) = 1 − p
P(Y = 0 ∣ X = 1) = p
P(Y = 1 ∣ X = 0) = p
P(Y = 1 ∣ X = 1) = 1 − p



2.3- Ternary symmetric channel (TSC):

The transitional probability of the TSC is:

P(Y ∣ X) = [1 − 2Pe   Pe        Pe
            Pe        1 − 2Pe   Pe
            Pe        Pe        1 − 2Pe]

The TSC is symmetric but not very practical, since in practice the interference between x1 and x3 is much less than the interference between x1 and x2, or between x2 and x3.

[model of TSC: each xi goes to its own yi with probability 1 − 2Pe and to each of the other two outputs with probability Pe]

Hence the more practical, but nonsymmetric, channel has the transition probability:

P(Y ∣ X) = [1 − Pe   Pe        0
            Pe       1 − 2Pe   Pe
            0        Pe        1 − Pe]

where x1 interferes with x2 exactly the same as x2 interferes with x3, but x1 and x3 do not interfere.

[model of the practical channel: x1 → y1 with 1 − Pe, x2 → y2 with 1 − 2Pe, x3 → y3 with 1 − Pe, with probability Pe to the adjacent output(s) only]



2.4- Special Channels:

1- Lossless channel: It has only one nonzero element in each column of the transitional matrix P(Y∣X), for example:

P(Y ∣ X) = [3/4  1/4  0    0    0
            0    0    1/3  2/3  0
            0    0    0    0    1]

This channel has H(X∣Y) = 0 and I(X, Y) = H(X), i.e. zero loss entropy.
2- Deterministic channel: It has only one nonzero element in each row of the transitional matrix P(Y∣X), as an example:

P(Y ∣ X) = [1  0  0
            1  0  0
            0  1  0
            0  1  0
            0  0  1]

This channel has H(Y∣X) = 0 and I(Y, X) = H(Y), i.e. zero noise entropy.
3- Noiseless channel: It has only one nonzero element in each row and column of the transitional matrix P(Y∣X), i.e. it is an identity matrix, as an example:

P(Y ∣ X) = [1  0  0
            0  1  0
            0  0  1]

This channel has H(Y∣X) = H(X∣Y) = 0 and I(Y, X) = H(Y) = H(X).

2.5- Shannon’s theorem:

1- A given communication system has a maximum rate of information C known as


the channel capacity.
2- If the information rate R is less than C, then one can approach arbitrarily small
error probabilities by using intelligent coding techniques.
3- To get lower error probabilities, the encoder has to work on longer blocks of
signal data. This entails longer delays and higher computational requirements.



Thus, if R ≤ C then transmission may be accomplished without error in the
presence of noise. The negation of this theorem is also true: if R > C, then errors
cannot be avoided regardless of the coding technique used.

Consider a bandlimited Gaussian channel operating in the presence of additive


Gaussian noise:

The Shannon-Hartley theorem states that the channel capacity is given by:

C = B log2(1 + S/N)

Where C is the capacity in bits per second, B is the bandwidth of the channel in Hertz,
and S/N is the signal-to-noise ratio.

2.6- Discrete Memoryless Channel:

The Discrete Memoryless Channel (DMC) has an input X and an output Y. At any
given time (t), the channel output Y= y only depends on the input X = x at that time (t)
and it does not depend on the past history of the input. DMC is represented by the
conditional probability of the output Y = y given the input X = x, or P(Y = y ∣ X = x).

X  →  [ Channel  P(y ∣ x) ]  →  Y

2.7 Binary Erasure Channel (BEC):

The Binary Erasure Channel (BEC) model is widely used to represent channels or links that "lose" data. Prime examples of such channels are Internet links and routes. A BEC channel has a binary input X and a ternary output Y.
1-Pe
X1 Y1
Pe
Erasure

X2 Y2
1-Pe

Note that for the BEC, the probability of “bit error” is zero. In other words, the
following conditional probabilities hold for any BEC model:

P(Y = y1 ∣ X = x1) = 1 − Pe

P(Y = erasure ∣ X = x1) = Pe

P(Y = y2 ∣ X = x1) = 0

P(Y = y2 ∣ X = x2) = 1 − Pe

P(Y = erasure ∣ X = x2) = Pe

P(Y = y1 ∣ X = x2) = 0



Channel Capacity (Discrete channel)

This is defined as the maximum of I(X, Y) over all input distributions:

C = max_{P(X)} [ I(X, Y) ]   bits/symbol

Physically it is the maximum amount of information each symbol can carry to the receiver. Sometimes this capacity is also expressed in bits/sec if related to the rate of producing symbols r:

C (bits/sec) = r × C (bits/symbol) = C / τ̅

1- Channel capacity of symmetric channels:

A symmetric channel satisfies the following conditions:
a- Equal number of symbols in X and Y, i.e. P(Y∣X) is a square matrix.
b- Any row in the P(Y∣X) matrix is a permutation of the other rows.

For example, consider the conditional probability matrices of the following channel types:

a- P(Y ∣ X) = [1 − p   p
               p       1 − p]
is a BSC, because it is a square matrix and the 1st row is a permutation of the 2nd row.

b- P(Y ∣ X) = [1 − 2p   p        p
               p        1 − 2p   p
               p        p        1 − 2p]
is a TSC, because it is a square matrix and each row is a permutation of the others.

c- A P(Y ∣ X) matrix that is not square is non-symmetric, even if each row is a permutation of the others.

d- A square P(Y ∣ X) matrix in which the 2nd row is not a permutation of the other rows is also non-symmetric, although it is square.


The channel capacity is defined as C = max_{P(X)}[I(X, Y)]:

I(X, Y) = H(Y) − H(Y ∣ X)

I(X, Y) = H(Y) + ∑_i ∑_j P(xi, yj) log2 P(yj ∣ xi)

But we have P(xi, yj) = P(xi) P(yj ∣ xi), so

I(X, Y) = H(Y) + ∑_i ∑_j P(xi) P(yj ∣ xi) log2 P(yj ∣ xi)

If the channel is symmetric, the quantity

K = ∑_j P(yj ∣ xi) log2 P(yj ∣ xi)

is a constant independent of the row number i, so that the equation becomes:

I(X, Y) = H(Y) + K ∑_i P(xi) = H(Y) + K

Hence I(X, Y) = H(Y) + K for symmetric channels.

Max of I(X, Y) = max[H(Y) + K] = max[H(Y)] + K
When Y has equiprobable symbols then max[H(Y)] = log2 m
Then
C = log2 m + K

Example 9:
For the BSC shown, with direct transition probabilities 0.7 (and crossover probabilities 0.3):

x1 → y1 : 0.7,  x1 → y2 : 0.3
x2 → y2 : 0.7,  x2 → y1 : 0.3

Find the channel capacity and the efficiency for the given input probabilities P(x1) and P(x2).
Solution:

P(Y ∣ X) = [0.7  0.3
            0.3  0.7]

Since the channel is symmetric:

K = 0.7 log2(0.7) + 0.3 log2(0.3) = −0.8813

and

C = log2 2 + K = 1 − 0.8813 = 0.1187 bits/symbol

The channel efficiency is

η = I(X, Y) / C

where, for a symmetric channel, I(X, Y) = H(Y) − H(Y ∣ X) = H(Y) + K, and H(Y) is obtained from P(Y) = P(X) P(Y ∣ X) for the given input probabilities. Then η = (H(Y) + K)/C.
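A minimal Python/NumPy sketch of this capacity calculation; the input distribution P(X) = [0.3, 0.7] used in the efficiency part is only an assumed illustration (any given P(X) can be substituted):

```python
import numpy as np

Pyx = np.array([[0.7, 0.3],
                [0.3, 0.7]])              # P(Y|X) of the BSC

K = (Pyx[0] * np.log2(Pyx[0])).sum()      # row term, same for every row of a symmetric channel
C = np.log2(Pyx.shape[1]) + K             # capacity ~0.1187 bits/symbol

Px = np.array([0.3, 0.7])                 # assumed input distribution, for the efficiency only
Py = Px @ Pyx
Hy = -(Py * np.log2(Py)).sum()
I = Hy + K                                # I(X,Y) = H(Y) + K for a symmetric channel
print(C, I, I / C)                        # capacity, transinformation, efficiency
```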

2- Channel capacity of nonsymmetric channels:

We can find the channel capacity of a nonsymmetric channel by the following steps:
a- Find I(X, Y) as a function of the input probabilities:
   I(X, Y) = f(P(x1), P(x2), … , P(xn))
   and use the constraint ∑ P(xi) = 1 to reduce the number of variables by 1.
b- Partially differentiate I(X, Y) with respect to the (n − 1) input probabilities, then equate these partial derivatives to zero.
c- Solve the (n − 1) equations simultaneously to find the values of P(x1), P(x2), …, P(xn) that give maximum I(X, Y).
d- Put the resulting values of the input probabilities in the function of step (a) to find C = max[I(X, Y)].

Example 10:

Find the channel capacity for a channel whose 2 × 2 transition matrix P(Y ∣ X) is not symmetric (the 1st row is not a permutation of the 2nd row).

Solution outline (applying the steps above):

a- Let P(x1) = P and P(x2) = 1 − P, so that instead of two variables there is only one variable P. Then
   P(Y) = P(X) P(Y ∣ X),  H(Y) = −∑_j P(yj) log2 P(yj),
   H(Y ∣ X) = −∑_i ∑_j P(xi) P(yj ∣ xi) log2 P(yj ∣ xi),
   and I(X, Y) = H(Y) − H(Y ∣ X) becomes a function of P only.
b- Differentiate I(X, Y) with respect to P and set dI(X, Y)/dP = 0, using d/dP [y log2 y] = (1 + ln y)(dy/dP)/ln 2.
c- Solve the resulting equation for P; this gives the input distribution that maximizes I(X, Y).
d- Substitute this value of P back into H(Y) and H(Y ∣ X) to obtain C = max[I(X, Y)].

Review questions:
A binary source sends x1 with a probability of 0.4 and x2 with probability 0.6 through a channel with probabilities of error of 0.1 for x1 and 0.2 for x2. Determine:
1- Source entropy.
2- Marginal entropy.
3- Joint entropy.
4- Conditional entropy H(Y ∣ X).
5- Loss entropy H(X ∣ Y).
6- Transinformation.
Solution:
1- The channel diagram:
   x1 (0.4): → y1 with 0.9, → y2 with 0.1
   x2 (0.6): → y2 with 0.8, → y1 with 0.2

Or P(Y ∣ X) = [0.9  0.1
               0.2  0.8]

H(X) = −∑ P(xi) log2 P(xi) = −[0.4 log2(0.4) + 0.6 log2(0.6)] = 0.971 bits/symbol

2- P(Y) = P(X) P(Y ∣ X) = [0.4  0.6] × [0.9  0.1
                                        0.2  0.8] = [0.48  0.52]

H(Y) = −∑ P(yj) log2 P(yj) = −[0.48 log2(0.48) + 0.52 log2(0.52)] = 0.999 bits/symbol

3- P(X, Y) = [0.36  0.04
              0.12  0.48]

H(X, Y) = −∑∑ P(xi, yj) log2 P(xi, yj)
        = −[0.36 log2(0.36) + 0.04 log2(0.04) + 0.12 log2(0.12) + 0.48 log2(0.48)]
        = 1.592 bits/symbol

4- H(Y ∣ X) = −∑∑ P(xi, yj) log2 P(yj ∣ xi)
            = −[0.36 log2(0.9) + 0.04 log2(0.1) + 0.12 log2(0.2) + 0.48 log2(0.8)]
            = 0.621 bits/symbol
   Or H(Y ∣ X) = H(X, Y) − H(X) = 1.592 − 0.971 = 0.621 bits/symbol
5- H(X ∣ Y) = H(X, Y) − H(Y) = 1.592 − 0.999 = 0.593 bits/symbol
6- I(X, Y) = H(X) − H(X ∣ Y) = 0.971 − 0.593 = 0.378 bits/symbol
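A short Python/NumPy sketch that reproduces these results (array names are illustrative):

```python
import numpy as np

Px  = np.array([0.4, 0.6])                       # source probabilities
Pyx = np.array([[0.9, 0.1],                      # P(Y|X): error prob. 0.1 for x1
                [0.2, 0.8]])                     #          and 0.2 for x2

Pxy = Px[:, None] * Pyx                          # joint P(X, Y)
Py  = Pxy.sum(axis=0)

H = lambda p: -(p[p > 0] * np.log2(p[p > 0])).sum()
Hx, Hy, Hxy = H(Px), H(Py), H(Pxy)
print(Hx, Hy, Hxy)                               # ~0.971, ~0.999, ~1.592
print(Hxy - Hx, Hxy - Hy, Hx - (Hxy - Hy))       # H(Y|X), H(X|Y), I(X,Y)
```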

3- Cascading of Channels
If two channels are cascaded, then the overall transition matrix is the product of the two transition matrices:

p(z ∣ x) = p(y ∣ x) · p(z ∣ y)
(n × k)    (n × m)    (m × k)
matrix     matrix     matrix

X (n symbols) → [Channel 1] → Y (m symbols) → [Channel 2] → Z (k symbols)



For cascaded (series) channels, the overall channel capacity cannot exceed that of any individual channel:

I(X, Z) ≤ min[ I(X, Y), I(Y, Z) ]

Example:
Find the transition matrix p(z ∣ x) for the cascaded channel shown, where

p(y ∣ x) = [0.8  0.2  0
            0.3  0    0.7] ,     p(z ∣ y) = [0.7  0.3
                                             1    0
                                             1    0  ]

p(z ∣ x) = p(y ∣ x) · p(z ∣ y) = [0.76  0.24
                                  0.91  0.09]



Chapter Three
Source Coding

1- Sampling theorem:

Sampling of signals is the fundamental operation in digital communication. A continuous-time signal is first converted to a discrete-time signal by the sampling process. It should also be possible to recover or reconstruct the signal completely from its samples.

The sampling theorem states that:

i- A band limited signal of finite energy, which has no frequency components


higher than W Hz, is completely described by specifying the values of the signal
at instant of time separated by 1/2W second and
ii- A band limited signal of finite energy, which has no frequency components
higher than W Hz, may be completely recovered from the knowledge of its
samples taken at the rate of 2W samples per second.

Proof of the sampling theorem: Let x(t) be the continuous-time signal shown in the figure below, whose bandwidth contains no frequency components higher than W Hz. A sampling function samples this signal regularly at the rate of fs samples per second.

Assume an analog waveform x(t) with a Fourier transform X(f) which is zero outside the interval (−W ≤ f ≤ W). The sampling of x(t) can be viewed as the product of x(t) with a periodic train of unit impulse functions:

x_δ(t) = ∑_{n=−∞}^{∞} δ(t − nTs)

The sifting property of the unit impulse states that

x(t) δ(t − t0) = x(t0) δ(t − t0)



Using this property:

x_s(t) = x(t) x_δ(t) = ∑_{n=−∞}^{∞} x(nTs) δ(t − nTs)

Notice that the Fourier transform of an impulse train is another impulse train:

X_δ(f) = (1/Ts) ∑_{n=−∞}^{∞} δ(f − n fs)

Convolution with an impulse function simply shifts the original function:

X(f) ∗ δ(f − n fs) = X(f − n fs)

We can now solve for the transform of the sampled waveform:

X_s(f) = X(f) ∗ X_δ(f) = (1/Ts) ∑_{n=−∞}^{∞} X(f − n fs)

When the sampling rate is chosen as fs = 2W, each spectral replicate is separated from each of its neighbors by a frequency band exactly equal to fs hertz, and the analog waveform can theoretically be completely recovered from the samples by the use of filtering. If fs > 2W, the replications move farther apart in frequency, making it easier to perform the filtering operation.

When the sampling rate is reduced, such that fs < 2W, the replications overlap, as shown in the figure below, and some information is lost. This phenomenon is called aliasing.

[Figure: sampled spectrum for fs ≥ 2W and for fs < 2W (aliasing)]

A bandlimited signal having no spectral components above W hertz can be determined uniquely by values sampled at uniform intervals of Ts ≤ 1/(2W).

The sampling rate is fs = 1/Ts, so that fs ≥ 2W. The minimum sampling rate fs = 2W is called the Nyquist rate.

Example: Find the Nyquist rate and Nyquist interval for a signal whose highest frequency component is 2500 Hz.

Solution:

The highest frequency is W = 2500 Hz, so:

Nyquist interval = 1/(2W) = 1/(2 × 2500) = 0.2 ms

Nyquist rate = 2W = 5000 samples/sec

H. W:

Find the Nyquist interval and Nyquist rate for the following:



i-

ii-

Example:

A waveform x(t) = 20 + 20 sin(500t + 30°) is to be sampled periodically and reproduced from these sample values. Find the maximum allowable time interval between sample values, and how many sample values need to be stored in order to reproduce 1 sec of this waveform.
Solution:

The signal frequency is f = 500/(2π) ≈ 79.6 Hz, and the minimum sampling rate is twice the signal frequency:

fs = 2f ≈ 159.2 samples/sec, so the maximum allowable time interval is Ts = 1/fs ≈ 6.28 ms.

Number of samples in 1 sec = fs × 1 sec ≈ 160 samples.

2- Source coding:
An important problem in communications is the efficient representation of data
generated by a discrete source. The process by which this representation is
accomplished is called source encoding. An efficient source encoder must satisfies two
functional requirements:

i- The code words produced by the encoder are in binary form.


ii- The source code is uniquely decodable, so that the original source sequence can
be reconstructed perfectly from the encoded binary sequence.



The entropy for a source with statistically independent symbols is:

H(X) = − ∑_{i=1}^{n} p(xi) log2 p(xi)

The overall code length, L_C, is defined as the average code word length:

L_C = ∑_{i=1}^{n} p(xi) li

where li is the length (in bits) of the codeword assigned to xi. The code efficiency is then:

η = H(X) / L_C × 100%

Note that η ≤ 100%, since L_C ≥ H(X).

i- Fixed- Length Code Words:


If the alphabet X consists of the 7 symbols {a, b, c, d, e, f, g}, then the
following fixed-length code of block length L = 3 could be used.
C(a) = 000
C(b) = 001
C(c) = 010
C(d) = 011



C(e) = 100
C(f) = 101
C(g) = 110.
The encoded output contains L bits per source symbol. For the above example
the source sequence bad... would be encoded into 001000011... . Note that the
output bits are simply run together (or, more technically, concatenated). This
method is nonprobabilistic; it takes no account of whether some symbols occur
more frequently than others, and it works robustly regardless of the symbol
frequencies.
This is used when the source produces almost equiprobable messages, i.e. p(x1) ≈ p(x2) ≈ … ≈ p(xn); then l1 = l2 = … = ln = L_C and, for binary coding:

1- L_C = log2 n bits/message if n = 2^r (n = 2, 4, 8, 16, …, r an integer), which gives η = 100%.

2- L_C = Int[log2 n] + 1 bits/message if n ≠ 2^r, which gives less efficiency.

Example
For ten equiprobable messages coded in a fixed-length code:
p(xi) = 1/10 and L_C = Int[log2 10] + 1 = 4 bits
η = H(X)/L_C × 100% = (log2 10)/4 × 100% = 83.048%

Example: For eight equiprobable messages coded in a fixed-length code:
p(xi) = 1/8 and L_C = log2 8 = 3 bits, so η = 3/3 × 100% = 100%
Example: Find the efficiency of a fixed length code used to encode messages obtained
from throwing a fair die (a) once, (b) twice, (c) 3 times.
Solution



a- For a fair die, the messages obtained from it are equiprobable with probability p(xi) = 1/6 and n = 6.
L_C = Int[log2 6] + 1 = 3 bits/message
η = H(X)/L_C × 100% = (log2 6)/3 × 100% = 86.165%

b- For two throws, the possible messages are n = 6 × 6 = 36, with equal probabilities.
L_C = Int[log2 36] + 1 = 6 bits/message = 6 bits/2 symbols
while H(X) = log2 6 bits/symbol, so η = 2 H(X)/L_C × 100% = 86.165%

c- For three throws, the possible messages are n = 6 × 6 × 6 = 216, with equal probabilities.
L_C = Int[log2 216] + 1 = 8 bits/message = 8 bits/3 symbols
while H(X) = log2 6 bits/symbol, so η = 3 H(X)/L_C × 100% = 96.936%

ii- Variable-Length Code Words


When the source symbols are not equally probable, a more efficient encoding
method is to use variable-length code words. For example, a variable-length code
for the alphabet X = {a, b, c} and its lengths might be given by

C(a)= 0 l(a)=1
C(b)= 10 l(b)=2
C(c)= 11 l(c)=2
The major property that is usually required from any variable-length code is unique decodability. The code C above for the alphabet X = {a, b, c} is easily shown to be uniquely decodable. However, the code C(a) = 0, C(b) = 1, C(c) = 01 is not uniquely decodable, even though the codewords are all different: if the source decoder observes 01, it cannot determine whether the source emitted (a b) or (c).

Prefix-free codes: A prefix code is a type of code system (typically a variable-


length code) distinguished by its possession of the "prefix property", which
requires that there is no code word in the system that is a prefix (initial segment) of
any other code word in the system. For example:

{0, 10, 110, 111}

When message probabilities are not equal, then we use variable length codes. The
following properties need to be considered when attempting to use variable length
codes:

1) Unique decoding:
Example
Consider a 4 alphabet symbols with symbols represented by binary digits as
follows:
A0
B  01
C  11
D  00
If we receive the code word 0011 it is not known whether the transmission was DC
or AAC . This example is not, therefore, uniquely decodable.
2) Instantaneous decoding:
Example
Consider a 4 alphabet symbols with symbols represented by binary digits as
follows:
A0
B  10
C  110

DR. MAHMOOD 2017-10-15 03


D  111
This code can be instantaneously decoded since no complete codeword is a prefix of a
larger codeword. This is in contrast to the previous example where A is a prefix of
both B and D . This example is also a ‘comma code’ as the symbol zero indicates the
end of a codeword except for the all ones word whose length is known.

Example
Consider a 4 alphabet symbols with symbols represented by binary digits as follows:
A0
B  01
C  011
D  111
The code is identical to the previous example but the bits are time reversed. It is still
uniquely decodable but no longer instantaneous, since early codewords are now
prefixes of later ones.

Shannon Code
For messages x1, x2, x3, …, xn with probabilities p(x1), p(x2), p(x3), …, p(xn):

1) li = −log2 p(xi)            if p(xi) = (1/2)^r ∈ {1/2, 1/4, 1/8, …}

2) li = Int[−log2 p(xi)] + 1   if p(xi) ≠ (1/2)^r

Also define

Fi = ∑_{k=1}^{i−1} p(xk),   with F1 = 0.

Then the codeword of xi is the binary equivalent of Fi consisting of li bits:

Ci = (Fi)2 truncated to li bits

where Ci is the binary equivalent of Fi up to li bits. In encoding, messages must be arranged in decreasing order of probability.



Example
Develop the Shannon code for the following set of messages,
p( x)  [0.3 0.2 0.15 0.12 0.1 0.08 0.05]
then find:
(a) Code efficiency,
(b) p(0) at the encoder output.
Solution
xi p( xi ) li Fi Ci 0i

x1 0.3 2 0 00 2

x2 0.2 3 0.3 010 2

x3 0.15 3 0.5 100 2

x4 0.12 4 0.65 1010 2

x5 0.10 4 0.77 1100 2

x6 0.08 4 0.87 1101 1

x7 0.05 5 0.95 11110 1

The codewords Ci are obtained by expanding each Fi in binary to li bits (multiply the fraction repeatedly by 2 and take the integer parts). For example, for F2 = 0.3 with l2 = 3: 0.3 × 2 = 0.6 → 0, 0.6 × 2 = 1.2 → 1, 0.2 × 2 = 0.4 → 0, giving C2 = 010; similarly F4 = 0.65 with l4 = 4 gives C4 = 1010, and F7 = 0.95 with l7 = 5 gives C7 = 11110.


(a) To find the code efficiency, we have

L_C = ∑_{i=1}^{7} li p(xi) = 3.1 bits/message.

H(X) = −∑_{i=1}^{7} p(xi) log2 p(xi) = 2.6029 bits/message.

η = H(X)/L_C × 100% = 83.965%

(b) p(0) at the encoder output is

p(0) = [∑_{i=1}^{7} 0i p(xi)] / L_C = (0.6 + 0.4 + 0.3 + 0.24 + 0.2 + 0.08 + 0.05)/3.1

p(0) = 0.603
Example
Repeat the previous example using ternary coding.
Solution
For ternary (r = 3) coding:

1) li = −log3 p(xi)            if p(xi) = (1/3)^r ∈ {1/3, 1/9, 1/27, …}

2) li = Int[−log3 p(xi)] + 1   if p(xi) ≠ (1/3)^r

and Ci is the ternary equivalent of Fi up to li ternary digits.

xi p( xi ) li Fi Ci 0i

x1 0.3 2 0 00 2

x2 0.2 2 0.3 02 1

x3 0.15 2 0.5 11 0

x4 0.12 2 0.65 12 0

x5 0.10 3 0.77 202 1



x6 0.08 3 0.87 212 0

x7 0.05 3 0.95 221 0

The codewords Ci are obtained by expanding each Fi in ternary to li digits (multiply the fraction repeatedly by 3 and take the integer parts). For example, F2 = 0.3 with l2 = 2: 0.3 × 3 = 0.9 → 0, 0.9 × 3 = 2.7 → 2, giving C2 = 02; similarly F5 = 0.77 with l5 = 3 gives C5 = 202.

(a) To find the code efficiency, we have

L_C = ∑_{i=1}^{7} li p(xi) = 2.23 ternary unit/message.

H3(X) = −∑_{i=1}^{7} p(xi) log3 p(xi) = 1.642 ternary unit/message.

η = H3(X)/L_C × 100% = 73.632%

(b) p(0) at the encoder output is

p(0) = [∑_{i=1}^{7} 0i p(xi)] / L_C = (0.6 + 0.2 + 0.1)/2.23

p(0) = 0.404



Shannon- Fano Code:

In Shannon–Fano coding, the symbols are arranged in order from most probable to
least probable, and then divided into two sets whose total probabilities are as close
as possible to being equal. All symbols then have the first digits of their codes
assigned; symbols in the first set receive "0" and symbols in the second set receive
"1". As long as any sets with more than one member remain, the same process is
repeated on those sets, to determine successive digits of their codes.

Example:

The five symbols which have the following frequency and probabilities, design
suitable Shannon-Fano binary code. Calculate average code length, source entropy
and efficiency.

Symbol count Probabilities Binary Length


codes
A 15 0.385 00 2
B 7 0.1795 01 2
C 6 0.154 10 2
D 6 0.154 110 3
E 5 0.128 111 3

The average code word length:

L_C = ∑ li p(xi) = 2 × (0.385 + 0.1795 + 0.154) + 3 × (0.154 + 0.128) = 2.283 bits/symbol

The source entropy is:

H(X) = −∑ p(xi) log2 p(xi) = 2.186 bits/symbol

The code efficiency:

η = H(X)/L_C × 100% ≈ 95.7%

Example
Develop the Shannon–Fano code for the following set of messages,
p(x) = [0.35  0.2  0.15  0.12  0.1  0.08], then find the code efficiency.
Solution
xi p( xi ) Code li

x1 0.35 0 0 2

x2 0.2 0 1 2

x3 0.15 1 0 0 3

x4 0.12 1 0 1 3

x5 0.10 1 1 0 3

x6 0.08 1 1 1 3

L_C = ∑_{i=1}^{6} li p(xi) = 2.45 bits/symbol

H(X) = −∑_{i=1}^{6} p(xi) log2 p(xi) = 2.396 bits/symbol

η = H(X)/L_C × 100% = 97.796%

Example
Repeat the previous example using ternary coding (r = 3).
Solution

xi p( xi ) Code li

x1 0.35 0 1

x2 0.2 1 0 2

x3 0.15 1 1 2

x4 0.12 2 0 2

x5 0.10 2 1 2

x6 0.08 2 2 2

L_C = ∑_{i=1}^{6} li p(xi) = 1.65 ternary unit/symbol

H3(X) = −∑_{i=1}^{6} p(xi) log3 p(xi) = 1.512 ternary unit/symbol

η = H3(X)/L_C × 100% = 91.636%

Huffman Code

The Huffman coding algorithm comprises two steps, reduction and splitting.
These steps can be summarized as follows:



1) Reduction
a) List the symbols in descending order of probability.

b) Reduce the r least probable symbols to one symbol with a probability


equal to their combined probability.

c) Reorder in descending order of probability at each stage.

d) Repeat the reduction step until only two symbols remain.

2) Splitting

a) Assign 0,1,...r to the r final symbols and work backwards.

b) Expand or lengthen the code to cope with each successive split.

Example: Design Huffman codes for the symbols {x1, x2, x3, x4, x5} having the probabilities {0.4, 0.2, 0.2, 0.1, 0.1}.

Applying the reduction and splitting steps gives code word lengths {1, 2, 3, 4, 4}.

The average code word length:

L_C = ∑ li p(xi) = 1(0.4) + 2(0.2) + 3(0.2) + 4(0.1) + 4(0.1) = 2.2 bits/symbol

The source entropy:

H(X) = −∑ p(xi) log2 p(xi) = 2.122 bits/symbol

The code efficiency:

η = H(X)/L_C × 100% = 96.45%

Huffman codes can also be designed with minimum variance of the code word lengths (by placing each combined symbol as high as possible when reordering); this gives lengths {2, 2, 2, 3, 3}. The average code word length is still 2.2 bits/symbol, but the variances are different!

Example

Develop the Huffman code for the following set of symbols

Symbol A B C D E F G H

Probability 0.1 0.18 0.4 0.05 0.06 0.1 0.07 0.04



Solution
0
C 0.40 0.40 0.40 0.40 0.40 0.40 0.60 1.0

0
B 0.18 0.18 0.18 0.19 0.23 0.37 0.40
1

0
A 0.10 0.10 0.13 0.18 0.19 0.23
1

0
F 0.10 0.10 0.10 0.13 0.18
1

0
G 0.07 0.09 0.10 0.10
1
0
E 0.06 0.07 0.09 1

0
D 0.05 0.06 1

H 0.04 1

So we obtain the following codes


Symbol A B C D E F G H
Probability 0.1 0.18 0.4 0.05 0.06 0.1 0.07 0.04
Codeword 011 001 1 00010 0101 0000 0100 00011

li         3     3     1     5      4     4     4     5

H(X) = −∑_{i=1}^{8} p(xi) log2 p(xi) = 2.552 bits/symbol

L_C = ∑_{i=1}^{8} li p(xi) = 2.61 bits/symbol

η = H(X)/L_C × 100% = 97.778%

Note:
To encode n symbols with r-ary Huffman coding, the quantity (n − r)/(r − 1) must be an integer; otherwise, add redundant symbols with probabilities equal to zero until the condition is satisfied.
Data Compression:
In computer science and information theory, data compression, source coding, or bit-
rate reduction involves encoding information using fewer bits than the original
representation. Compression can be either lossy or lossless.

Lossless data compression algorithms usually exploit statistical redundancy to


represent data more concisely without losing information, so that the process is
reversible. Lossless compression is possible because most real-world data has
statistical redundancy. For example, an image may have areas of color that do not
change over several pixels.

Lossy data compression is the converse of lossless data compression. In these


schemes, some loss of information is acceptable. Dropping nonessential detail from
the data source can save storage space. There is a corresponding trade-off between
preserving information and reducing size.

Run-Length Encoding (RLE):


Run-Length Encoding is a very simple lossless data compression technique that
replaces runs of two or more of the same character with a number which represents
the length of the run, followed by the original character; single characters are coded
as runs of 1. RLE is useful for highly-redundant data, indexed images with many
pixels of the same color in a row.
Example:



Input: AAABBCCCCDEEEEEEAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAA
Output: 3A2B4C1D6E38A

The input message to RLE encoder is a variable while the output code word is fixed,
unlike Huffman code where the input is fixed while the output is varied.
Example : Consider these repeated pixels values in an image … 0 0 0 0 0 0 0 0 0 0 0 0
5 5 5 5 0 0 0 0 0 0 0 0 We could represent them more efficiently as (12, 0)(4, 5)(8, 0)
24 bytes reduced to 6 which gives a compression ratio of 24/6 = 4:1.
Example :Original Sequence (1 Row): 111122233333311112222 can be encoded as:
(4,1),(3,2),(6,3),(4,1),(4,2). 21 bytes reduced to 10 gives a compression ratio of 21/10
= 21:10.
Example : Original Sequence (1 Row): – HHHHHHHUFFFFFFFFFFFFFF can be
encoded as: (7,H),(1,U),(14,F) . 22 bytes reduced to 6 gives a compression ratio of
22/6 = 11:3 .
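A minimal Python sketch of RLE using (count, character) pairs, as in the examples above:

```python
from itertools import groupby

def rle_encode(data):
    """Replace each run of identical characters by a (count, character) pair."""
    return [(len(list(group)), ch) for ch, group in groupby(data)]

def rle_decode(pairs):
    return "".join(ch * count for count, ch in pairs)

encoded = rle_encode("HHHHHHHUFFFFFFFFFFFFFF")
print(encoded)                 # [(7, 'H'), (1, 'U'), (14, 'F')]
print(rle_decode(encoded))     # original string back
```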
Savings Ratio : the savings ratio is related to the compression ratio and is a measure
of the amount of redundancy between two representations (compressed and
uncompressed). Let:
N1 = the total number of bytes required to store an uncompressed (raw) source image.
N2 = the total number of bytes required to store the compressed data.
The compression ratio Cr is then defined as:

Cr = N1 / N2
 Larger compression ratios indicate more effective compression


 Smaller compression ratios indicate less effective compression
 Compression ratios less than one indicate that the uncompressed representation
has high degree of irregularity.
The saving ratio Sr is then defined as:

Sr = (N1 − N2) / N1
 Higher saving ratio indicate more effective compression while negative ratios
are possible and indicate that the compressed image has larger memory size
than the original.



Example: a 5 Megabyte image is compressed into a 1 Megabyte image, the savings
ratio is defined as (5-1)/5 or 4/5 or 80%.
This ratio indicates that 80% of the uncompressed data has been eliminated in the
compressed encoding.



Chapter Four
Channel coding

1- Error detection and correction codes:

The idea of error detection and/or correction is to add extra bits to the digital message that enable the receiver to detect or correct errors with limited capabilities. These extra bits are called parity bits. If we have k data bits and r parity bits are added, then the number of transmitted digits is:

n = k + r

Here n is the code word length, and the code is denoted as (n, k). The efficiency or code rate is equal to k/n.

Two basic approaches to error correction are available, which are:

a- Automatic-repeat-request (ARQ): Discard those frames in which errors


are detected.
- For frames in which no error was detected, the receiver returns a positive
acknowledgment to the sender.
- For the frame in which errors have been detected, the receiver returns
negative acknowledgement to the sender.
b- Forward error correction (FEC):

Ideally, FEC codes can be used to generate encoding symbols that are
transmitted in packets in such a way that each received packet is fully useful to a
receiver to reassemble the object regardless of previous packet reception
patterns. The most applications of FEC are:
Compact Disc (CD) applications, digital audio and video, Global System Mobile (GSM) and
Mobile communications.



2- Basic definitions:
- Systematic and nonsystematic codes: If the information bits (a's) are unchanged in their values and positions in the transmitted code word, then the code is said to be systematic (also called a block code), where:
Input data D = (a1, a2, …, ak); the output systematic code word of an (n, k) code keeps these k bits in place and appends the parity bits, e.g.:

C = (a1, a2, …, ak, p1, p2, …, pr)

However, if the data bits are spread or changed in the output code word, then the code is said to be nonsystematic: the output nonsystematic code word of an (n, k) code no longer contains the data bits in their original positions.
- Hamming Distance (HD): It is an important parameter for measuring the ability of error detection. It is the number of bits that differ between any two codewords, denoted d(ci, cj). For a binary (n, k) code with 2^k possible codewords, the minimum HD is

d_min = min d(ci, cj)   over all pairs of distinct codewords.

For any code, the number of errors that can be detected is:

number of detectable errors ≤ d_min − 1

For example, if d_min = 4, then it is possible to detect 3 errors or less. The number of errors that can be corrected is:

t ≤ (d_min − 1)/2

So that for d_min = 4, it is possible to correct only one bit.



Example (2): Find the minimum HD between a given set of codewords. Also determine the possible number of detected errors and the number of correctable error bits.

Solution: Compute d(ci, cj) for every pair of codewords and take the smallest value as d_min; then the number of detectable errors is d_min − 1 and the number of correctable errors is ⌊(d_min − 1)/2⌋.

- Hamming Weight: It is the number of 1's in a non-zero codeword, denoted by w(c); it equals the Hamming distance between c and the all-zeros codeword. If the only two valid codewords are the all-ones and all-zeros words of length n, then d_min = n.
3- Parity check codes (Error detection):

It is a linear block code (systematic code). In this code, an extra bit is added for each k information bits, and hence the code rate (efficiency) is k/(k + 1). At the
receiver if the number of 1’s is odd then the error is detected. The minimum
Hamming distance for this category is dmin =2, which means that the simple
parity code is a single-bit error-detecting code; it cannot correct any error.
There are two categories in this type: even parity (ensures that a code word has
an even number of 1's) and odd parity (ensures that a code word has an odd
number of 1's) in the code word.

Example: an even parity-check code of (5, 4), which means that k = 4 and n = 5.



Data word Code word Data word Code word
0010 00101 0110 01100
1010 10100 1000 10001

The above table can be repeated with odd parity-check code of (5, 4) as follow:
Data word Code word Data word Code word
0010 00100 0110 01101
1010 10101 1000 10000

Note:
Error detection was used in early ARQ (Automatic Repeat on Request) systems.
If the receiver detects an error, it asks the transmitter (through another
backward channel) to retransmit.

The sender calculates the parity bit to be added to the data word to form a code word. At the receiver, a syndrome is calculated and passed to the decision logic analyzer. If the syndrome is 0, there is no error in the received codeword and the data portion of the received codeword is accepted as the data word; if the syndrome is 1, the data portion of the received codeword is discarded and the data word is not created, as shown in the figure below.
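A minimal Python sketch of even-parity encoding and syndrome checking for the (5, 4) code above (function names are illustrative):

```python
def add_even_parity(data_bits):
    """Append one parity bit so that the total number of 1's is even."""
    return data_bits + [sum(data_bits) % 2]

def syndrome(codeword):
    """0 -> accept the data word, 1 -> error detected, discard."""
    return sum(codeword) % 2

cw = add_even_parity([0, 0, 1, 0])   # -> [0, 0, 1, 0, 1], as in the table
print(cw, syndrome(cw))              # syndrome 0: accepted
cw[1] ^= 1                           # one bit flipped in the channel
print(cw, syndrome(cw))              # syndrome 1: error detected
```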



4- Repetition codes:

The repetition code is one of the most basic error-correcting codes. The idea of
the repetition code is to just repeat the message several times. The encoder is a
simple device that repeats each message bit r times.

For example, if we have a (3, 1) repetition code, then encoding the signal
m=101001 yields a code c=111000111000000111.

Suppose we received a (3, 1) repetition code and we are decoding the signal c = 110001111. The decoded message (by majority vote in each block of three) is m = 101. An (r, 1) repetition code has an error-correcting capacity of ⌊(r − 1)/2⌋ (i.e. it will correct up to ⌊(r − 1)/2⌋ errors in any code word); in other words d_min = r, so the correction capability increases with r. Although this code is very simple, it is also inefficient and wasteful: even a (2, 1) repetition code means we have to double the size of the bandwidth, which means doubling the cost.
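A minimal Python sketch of (3, 1) repetition encoding and majority-vote decoding:

```python
def rep_encode(bits, r=3):
    return [b for b in bits for _ in range(r)]

def rep_decode(code, r=3):
    blocks = [code[i:i + r] for i in range(0, len(code), r)]
    return [1 if sum(b) > r // 2 else 0 for b in blocks]   # majority vote per block

c = rep_encode([1, 0, 1, 0, 0, 1])
print("".join(map(str, c)))                       # 111000111000000111
print(rep_decode([1, 1, 0, 0, 0, 1, 1, 1, 1]))    # received 110001111 -> [1, 0, 1]
```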



5- Linear Block Codes:

Linear block codes extend of parity check code by using a larger number of
parity bits to either detect more than one error or correct for one or more errors.
An (n, k) binary block code selects 2^k codewords from the 2^n possibilities to form the code, such that each k-bit information block is uniquely mapped to one of these 2^k codewords. In linear codes the sum of any two codewords is a codeword: the code is said to be linear if, and only if, Ci ⊕ Cj is also a code vector for any codeword vectors Ci and Cj, where ⊕ represents modulo-2 addition.

5-1 Hamming Codes

Hamming codes are a family of linear error-correcting codes that generalize


the Hamming (7,4) code, and were invented by Richard Hamming in 1950.
Hamming codes can detect up to two-bit errors or correct one-bit errors without
detection of uncorrected errors. Hamming codes are perfect codes, that is, they
achieve the highest possible rate for codes with their block length and minimum
distance of three.

In the codeword, there are k data bits and r = n − k redundant (check) bits, giving a total of n codeword bits.

Hamming Code Algorithm:

General algorithm for hamming code is as follows:

1. r parity bits are added to an k - bit data word, forming a code word of n
bits .



2. The bit positions are numbered in sequence from 1 to n.
3. The positions that are powers of two (1, 2, 4, …) are reserved for the parity bits, and the remaining positions hold the data bits.
4. Parity bits are calculated by XOR operation of some combination of data
bits. Combination of data bits are shown below following the rule.
5. It is characterized by (n, k) = (2^r − 1, 2^r − 1 − r).

5-2 Exampl: Hamming(7,4)

This table describes which parity bits cover which transmitted bits in the
encoded word. For example, p2 provides an even parity for bits 2, 3, 6, and 7.
It also details which transmitted by which parity bit by reading the column.
For example, d1 is covered by p1 and p2 but not p3. This table will have a
striking resemblance to the parity-check matrix (H).

Alternatively, the parity bits can be calculated from the following equations (⊕ denotes XOR):

p1 = d1 ⊕ d2 ⊕ d4
p2 = d1 ⊕ d3 ⊕ d4
p3 = d2 ⊕ d3 ⊕ d4

The parity-bit generating circuit implements these three XOR equations:



At the receiver, the first step in error correction is to calculate the syndrome bits, which indicate whether an error occurred. The value of the syndrome, read as the binary number CBA, gives the error position. The equations for generating the syndrome bits used to detect the position of the error are:

A = p1 ⊕ d1 ⊕ d2 ⊕ d4   (checks positions 1, 3, 5, 7)
B = p2 ⊕ d1 ⊕ d3 ⊕ d4   (checks positions 2, 3, 6, 7)
C = p3 ⊕ d2 ⊕ d3 ⊕ d4   (checks positions 4, 5, 6, 7)

Example:

Suppose we want to transmit the data 1011 over noisy communication channel.
Determine the Hamming code word.

Solution:

The first step is to calculate the parity bit values and put each in its corresponding position. With the data word d1 d2 d3 d4 = 1011:

p1 = d1 ⊕ d2 ⊕ d4 = 1 ⊕ 0 ⊕ 1 = 0
p2 = d1 ⊕ d3 ⊕ d4 = 1 ⊕ 1 ⊕ 1 = 1
p3 = d2 ⊕ d3 ⊕ d4 = 0 ⊕ 1 ⊕ 1 = 0

Bit position   1    2    3    4    5    6    7
Bit name       p1   p2   d1   p3   d2   d3   d4
Bit value      0    1    1    0    0    1    1

Then the codeword is c = 0110011.

Suppose the following noise is added to the code word; the received code becomes:

The noise:               n = 0 0 0 0 1 0 0
The received code word:  r = 0 1 1 0 1 1 1

Now calculate the syndrome:

A = r1 ⊕ r3 ⊕ r5 ⊕ r7 = 0 ⊕ 1 ⊕ 1 ⊕ 1 = 1
B = r2 ⊕ r3 ⊕ r6 ⊕ r7 = 1 ⊕ 1 ⊕ 1 ⊕ 1 = 0
C = r4 ⊕ r5 ⊕ r6 ⊕ r7 = 0 ⊕ 1 ⊕ 1 ⊕ 1 = 1

So that CBA = 101 = 5, which indicates an error in the fifth bit.

Hamming matrices:

Hamming codes can be computed in linear-algebra terms through matrices, because Hamming codes are linear codes. For the purposes of Hamming codes, two Hamming matrices can be defined: the code generator matrix G and the parity-check matrix H. With the bit ordering p1 p2 d1 p3 d2 d3 d4 used above,

G = [1 1 1 0 0 0 0
     1 0 0 1 1 0 0
     0 1 0 1 0 1 0
     1 1 0 1 0 0 1]      (one row per data bit d1 … d4)

H = [1 0 1 0 1 0 1
     0 1 1 0 0 1 1
     0 0 0 1 1 1 1]

Example:
Suppose we want to transmit the data 1011 over a noisy communication channel. The transmitted codeword is x = dG (mod 2) = 0110011, i.e. 0110011 would be transmitted instead of transmitting 1011. If no error occurs during transmission, the received codeword r is identical to the transmitted codeword x: r = x. The receiver multiplies H and r to obtain the syndrome vector z = H rᵀ (mod 2), which indicates whether an error has occurred, and if so, for which codeword bit.

Suppose we have introduced a bit error on bit 5, so r = 0110111. Then z = H rᵀ = (1, 0, 1)ᵀ, which, read as the binary position number CBA = 101 = 5, points to the fifth codeword bit as the one in error.
