
EECS 121: Coding for Digital Communication and Beyond Fall 2013

Lecture 7 — September 24
Lecturer: Anant Sahai Scribe: Ankush Gupta

7.1 Overview
This lecture introduces affine and linear codes. Orthogonal signalling and random codes
are reviewed first. The impracticality of random codes (codebook size exponential in the
number of bits) motivates affine codes. Linear codes then follow naturally from the discussion
of affine codes.

7.2 Review
7.2.1 Orthogonal Signalling
Consider the problem of communicating $k$ bits simultaneously. In orthogonal signalling, each
of the $2^k$ combinations is encoded as a function $x(t)$ from the family of orthogonal cosine
functions. The received signal is decoded by passing it through a bank of correlators followed
by thresholding the correlation value. This process can be made more robust by adding an
additional layer of coding, such as Reed-Solomon error-correcting codes.
This process of communication is summarized in the following schematic:

[Figure 7.1: Communication through a noisy channel and its abstractions. The schematic shows an error-correcting encoder, a modulator, additive channel noise $N(t)$, a demodulator, and an error-correcting decoder; the whole chain is abstracted as a binary symmetric channel that flips each bit with probability $p$ and leaves it unchanged with probability $1-p$.]
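
To make the correlator-bank picture concrete, here is a minimal simulation sketch; the symbol size, carrier frequencies, noise level, and the use of a max decision in place of per-correlator thresholding are illustrative assumptions rather than anything specified in the lecture.

```python
import numpy as np

# Minimal sketch of orthogonal signalling (illustrative parameters).
rng = np.random.default_rng(0)
k = 3                                   # bits per symbol
M = 2 ** k                              # number of orthogonal waveforms needed
fs, T = 1000, 1.0                       # sample rate (Hz) and symbol duration (s)
t = np.arange(0, T, 1 / fs)

# One cosine per symbol; distinct integer frequencies over [0, T) are mutually orthogonal.
waveforms = np.array([np.cos(2 * np.pi * (f + 1) * t) for f in range(M)])

def transmit(symbol, noise_std=2.0):
    """Send the waveform for `symbol` through an additive-noise channel."""
    return waveforms[symbol] + rng.normal(0, noise_std, size=t.size)

def detect(received):
    """Bank of correlators followed by picking the largest correlation."""
    correlations = waveforms @ received
    return int(np.argmax(correlations))

symbol = 5
print("sent:", symbol, "decoded:", detect(transmit(symbol)))
```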


7.2.2 Random Codes


A limitation of orthogonal signalling is that it uses a lot of bandwidth — to transmit $k$
bits simultaneously it requires a bandwidth of $2^k$ frequencies. This is due to the requirement
of using orthogonal functions to represent symbols. These orthogonal functions span a
subspace, and since the number of mutually orthogonal vectors in a subspace equals its
dimension, we are embedding $2^k$ symbols in a $2^k$-dimensional space, thereby assigning one
dimension to each symbol. By relaxing the requirement of perfect orthogonality and instead
using approximate orthogonality, these symbols can be embedded in a much lower dimensional
space. This is achieved using random codes.

Encoding/Decoding

In random codes, each of the $2^k$ symbols is encoded as an i.i.d. Bernoulli($\frac{1}{2}$) codeword
$\vec{X} \in \{0,1\}^n$ of length $n > k$. During transmission, a codeword is corrupted by i.i.d.
Bernoulli($p$) noise $\vec{N} \in \{0,1\}^n$. The noise is assumed to be independent of the codebook,
i.e., the table of the symbols and their corresponding codewords.¹ The received signal can
then be represented as $\vec{Y} = \vec{X} + \vec{N}$, and it is decoded to the maximum-matching codeword.²
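
A minimal sketch of this encode/corrupt/decode chain, with illustrative values of $k$, $n$, and $p$ (the table-based codebook below is exactly the exponentially large object discussed later):

```python
import numpy as np

# Minimal sketch of a random code (illustrative parameters).
rng = np.random.default_rng(1)
k, n, p = 4, 30, 0.1                              # message bits, codeword length, noise probability
codebook = rng.integers(0, 2, size=(2 ** k, n))   # one i.i.d. Bernoulli(1/2) row per symbol

def encode(symbol):
    return codebook[symbol]

def channel(x):
    noise = (rng.random(n) < p).astype(int)       # i.i.d. Bernoulli(p) noise
    return (x + noise) % 2                        # addition in Z_2

def decode(y):
    # Pick the codeword matching y in the most positions (minimum Hamming distance).
    distances = np.sum(codebook != y, axis=1)
    return int(np.argmin(distances))

symbol = 9
print("sent:", symbol, "decoded:", decode(channel(encode(symbol))))
```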

Probability of Error
The noise moves a codeword around in $\{0,1\}^n$. Let the 'decoding box' of a codeword be the
region in $\{0,1\}^n$ such that if the received signal $\vec{Y}$ lies in that region, it is decoded to that
codeword. An error in decoding occurs if:

• Noise pushes $\vec{Y}$ out of the decoding box of the true codeword $\vec{X}_{true}$.

• Noise pushes $\vec{Y}$ into the decoding box of some false codeword $\vec{X}_{false}$.

Let us formally define the decoding box of a codeword $\vec{X}$ as the set

$$\text{Decoding-Box}(\vec{X}) = \{\vec{y} \in \{0,1\}^n \mid \#\text{ of 1's in } (\vec{y} - \vec{X}) \le (p+\epsilon)n\}$$

where $p$ is the probability of noise and $n$ is the length of the codeword. Note that all the
codewords are identically distributed (as they are all $\{\text{Bernoulli}(1/2)\}^n$). Therefore, setting
$\vec{X}$ to $\vec{0}$ above, we have a definition of a decoding box independent of the codeword:

$$\text{Decoding-Box} = \{\vec{b} \in \{0,1\}^n \mid \#\text{ of 1's in } \vec{b} \le (p+\epsilon)n\}$$

¹ This is a valid assumption as the noise originates in nature and should be independent of our choice of the bit-string.
² The distance (or cost) function for this matching is like the Hamming distance, $\text{dist}(\vec{Y}, \vec{X}) = \sum_{i=1}^{n} \delta(Y_i - X_i)$, where $\delta(\cdot)$ is the Kronecker delta function.


Let us now examine each of the above two possibilities of errors:

1. Error due to $\vec{Y}$ falling out of the decoding box of the true codeword $\vec{X}_{true}$:

$$P(\vec{Y} \notin \text{Decoding-Box}(\vec{X}_{true})) = P(\#\text{ of 1's in noise } \vec{N} \ge (p+\epsilon)n)$$
$$= \sum_{i=(p+\epsilon)n}^{n} \binom{n}{i} p^i (1-p)^{n-i}$$
$$\approx Q\left(\frac{n(p+\epsilon) - np}{\sqrt{np(1-p)}}\right) \qquad \text{(central limit theorem)}$$
$$= Q\left(\frac{\epsilon\sqrt{n}}{\sqrt{p(1-p)}}\right) \le \exp\left(-n\,\frac{\epsilon^2}{2p(1-p)}\right)$$

We note that this probability decays exponentially with $n$.
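
A small numerical check of this exponential decay, using the exact binomial tail rather than the Gaussian approximation; the values of $p$, $\epsilon$, and $n$ are illustrative:

```python
import math

# Exact P(# of 1's in the noise >= (p + eps) * n) for growing n (illustrative p, eps).
p, eps = 0.1, 0.05
for n in (50, 100, 200, 400, 800):
    threshold = math.ceil((p + eps) * n - 1e-9)   # guard against float round-off
    tail = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(threshold, n + 1))
    print(f"n={n:4d}  P(case-1 error) = {tail:.3e}")
```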


2. Error due to Y~ being in the decoding box of some of false codeword X
~ f alse .

We first note that any two codewords, X ~ 1 and X ~ 2 are independent : this is because
1
they are made up of i.i.d. Bernoulli( 2 ) random entries. Further, the noise is i.i.d.
Bernoulli(p) and independent of the code-book. Hence, a received signal Y~ = X ~ +N ~,
is independent of all the false code-words X ~ f alse : this is because, Y~ is a function of
two variables which are independent of X ~ f alse . Moreover, the marginal distribution
of the received signal is uniform in {0, 1}n , this is because it is a sum of noise and
{Bernoulli( 21 )}n codeword.3

Then the probability that a received signal Y~ lies in some false codeword’s decoding
box is the ratio of the number of vectors in this decoding box and the total number of
vectors in {0, 1}n :
~
~ f alse claims Y~ ) = |Decoding Box(Xf alse )|
P (X by independence of Y~ and Xf alse
2n
and, uniformity of Y~ in {0, 1}n
~ f alse claims Y~ ) ≤ 2k |Decoding Box|
⇒ P (∃X by identically-distributed and union bound
2n
Note, above the number of false codewords is actually 2k − 1, but 2k is used for sim-
plicity.
One way of knowing the size of the decoding box is to use the Asymptotic Equipartition
Principle.⁴ To apply the A.E.P., we need a slight modification in our definition of the
typical set (the decoding box):

$$\text{Decoding-Box} = \{\vec{b} \in \{0,1\}^n \mid \#\text{ of 1's in } \vec{b} \le (p+\epsilon)n \text{ and } \#\text{ of 1's in } \vec{b} \ge (p-\epsilon)n\}$$
$$= \{\vec{b} \in \{0,1\}^n \mid |(\#\text{ of 1's in } \vec{b}) - pn| \le \epsilon n\}$$

³ Proof in the Appendix.
⁴ The other way is to use Stirling's approximation for factorials and evaluate the sum of the binomial distribution.

The probability of each bit-string in the typical set is basically the same and is equal to
$p^{\#\text{1's}}(1-p)^{\#\text{0's}}$, and they all have approximately the same number of ones, $\approx np$. The
size of the decoding box according to the A.E.P. is then

$$|\text{Decoding-Box}| \le 2^{n(H(p)+\epsilon')}$$

[Figure 7.2: The binary entropy function $H(p) = p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p}$. An interval of width $\epsilon$ around $p$ on the horizontal axis maps to an interval of width $\epsilon'$ around $H(p)$ on the vertical axis.]

where $H(p) = p\log_2(1/p) + (1-p)\log_2(1/(1-p))$ is the entropy of the noise and $\epsilon'$ is such
that $\epsilon' \to 0$ as $\epsilon \to 0$ (see Figure 7.2). Hence, we have

$$P(\exists\, \vec{X}_{false} \text{ claiming } \vec{Y}) \le 2^{k-n+n(H(p)+\epsilon')} = 2^{-n(1-(H(p)+\epsilon')-R)}$$

if $k = Rn$, where $R$ is the channel rate. For exponential decay of the error probability with
the codeword length, we require $R < 1 - (H(p) + \epsilon')$. Note that the noise probability
bounds the channel rate.

From the analysis of the two cases of decoding error, we conclude that the probability of
error in decoding decays exponentially in the length of the codeword.
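
As a numerical companion, the sketch below evaluates the binary entropy $H(p)$, compares the exact decoding-box size with the A.E.P. estimate, and prints the resulting rate threshold $1 - H(p)$; the values of $p$, $\epsilon$, and $n$ are illustrative:

```python
import math

# Binary entropy, decoding-box size vs. its A.E.P. estimate, and the rate threshold.
def H(q):
    """Binary entropy in bits."""
    return q * math.log2(1 / q) + (1 - q) * math.log2(1 / (1 - q))

p, eps, n = 0.1, 0.02, 1000
lo = math.ceil((p - eps) * n - 1e-9)
hi = math.floor((p + eps) * n + 1e-9)
box_size = sum(math.comb(n, i) for i in range(lo, hi + 1))

print(f"H(p)              = {H(p):.4f} bits")
print(f"log2|Box| / n     = {math.log2(box_size) / n:.4f}  (<= H(p + eps) = {H(p + eps):.4f})")
print(f"max rate 1 - H(p) = {1 - H(p):.4f}")
```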


Towards Affine Codes


Even though random codes remedy the problem of using too much bandwidth, they
suffer from a practical problem — the size of the codebook. For each of the $2^k$ symbols there
is a corresponding length-$n$ codeword, hence the size of this codebook is $n \times 2^k$; this is
exponential in the number of bits $k$. This is remedied by using affine codes.

7.3 Affine Codes


7.3.1 Encoding
Consider the following coding scheme: a message vector $\vec{m} \in \{0,1\}^k$ is encoded as

$$\vec{X} = G\vec{m} + \vec{b} \tag{7.1}$$

where $G \in \{0,1\}^{n\times k}$ and $\vec{b} \in \{0,1\}^n$. The entries of the matrix $G$, called the generator matrix,
and of the vector $\vec{b}$ are i.i.d. Bernoulli($\frac{1}{2}$). Note that this is a very compact representation of the
codebook — it requires only $n \times k$ entries for $G$ and $n$ entries for $\vec{b}$, i.e., a total of $Rn^2 + n$
entries, if $k = Rn$, $R < 1$.
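
A minimal sketch of affine encoding over $\mathbb{Z}_2$, with illustrative sizes, showing how little storage the pair $(G, \vec{b})$ needs compared with a full table:

```python
import numpy as np

# Minimal sketch of affine encoding X = G m + b over Z_2 (illustrative sizes).
rng = np.random.default_rng(2)
k, n = 4, 12
G = rng.integers(0, 2, size=(n, k))    # generator matrix, i.i.d. Bernoulli(1/2) entries
b = rng.integers(0, 2, size=n)         # affine offset, i.i.d. Bernoulli(1/2) entries

def affine_encode(m):
    """Encode a length-k message vector m as G m + b over Z_2."""
    return (G @ m + b) % 2

m = np.array([1, 0, 1, 1])
print("codeword:", affine_encode(m))
# Codebook storage: n*k bits for G plus n bits for b, instead of n * 2**k table entries.
print("stored bits:", G.size + b.size, "vs. full table:", n * 2 ** k)
```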

7.3.2 Probability of Error in Decoding


Let us now consider the probability of error in decoding. The error probability for random
codes decayed exponentially with the length of the codeword. We would like to have similar
error probability characteristics for affine codes. The error analysis for random codes rests
on the following three important independence assumptions:
1. Noise is i.i.d. and independent of the codebook.
2. Codewords are marginally (individually) distributed uniformly over $\{0,1\}^n$.

3. Codewords are independent; hence, the received signal $\vec{Y} = \vec{X} + \vec{N}$ is independent of all
false codewords $\vec{X}_{false}$. In random coding we have full mutual independence among all
codewords, but the analysis only requires pairwise independence among the codewords.

Let us inspect if the above three assumptions hold for affine codes.
1. The first assumption holds true for affine codes as the noise originates in nature and
should be independent of what coding scheme is used.
2. $\vec{X} = G\vec{m} + \vec{b}$, i.e., $\vec{X}$ is a sum in which one of the terms, viz. $\vec{b}$, is $\{\text{Bernoulli}(\frac{1}{2})\}^n$. But
the sum of any random variable and an independent Bernoulli($\frac{1}{2}$) random variable is
Bernoulli($\frac{1}{2}$).⁵ Hence, $\vec{X} \sim \{\text{Bernoulli}(\frac{1}{2})\}^n$, i.e., $\vec{X}$ is uniformly distributed in $\{0,1\}^n$.

⁵ See Appendix for a proof of this.

3. $\vec{X}_1$ and $\vec{X}_2$ are (pairwise) independent, where $\vec{X}_1 = G\vec{m}_1 + \vec{b}$ and $\vec{X}_2 = G\vec{m}_2 + \vec{b}$ for some
$\vec{m}_1, \vec{m}_2 \in \{0,1\}^k$, $\vec{m}_1 \ne \vec{m}_2$.

Proof: Let $G = [\vec{g}_1\ \vec{g}_2\ \ldots\ \vec{g}_k]$, where $\forall i \in \{1,2,\ldots,k\}$, $\vec{g}_i \in \{0,1\}^n \sim \{\text{Bernoulli}(\frac{1}{2})\}^n$.
Further, let $\vec{m}_1 = (b_1, \ldots, b_k)^T$ and $\vec{m}_2 = (\tilde{b}_1, \ldots, \tilde{b}_k)^T$.

Define $S = \{i \mid i \in 1,\ldots,k,\ b_i = \tilde{b}_i\}$, i.e., the set of indices where the bits of $\vec{m}_1$ and
$\vec{m}_2$ agree, and similarly $D = \{i \mid i \in 1,\ldots,k,\ b_i \ne \tilde{b}_i\}$, i.e., the set of indices where the bits
are different. Then, we know:

$$\vec{X}_1 = \vec{b} + \sum_{i=1}^{k} b_i\vec{g}_i, \quad \text{and}$$
$$\vec{X}_2 = \vec{b} + \sum_{i=1}^{k} \tilde{b}_i\vec{g}_i = \vec{b} + \sum_{i\in S} b_i\vec{g}_i + \sum_{i\in D} \tilde{b}_i\vec{g}_i$$

But note that $\forall i \in D$, $\tilde{b}_i\vec{g}_i = (b_i + 1)\vec{g}_i = b_i\vec{g}_i + \vec{g}_i$, because $b_i$ and $\tilde{b}_i$ are complements.
Therefore,

$$\vec{X}_2 = \vec{b} + \sum_{i\in S} b_i\vec{g}_i + \sum_{i\in D} b_i\vec{g}_i + \sum_{i\in D} \vec{g}_i$$
$$\Rightarrow\ \vec{X}_1 = \vec{X}_2 + \sum_{i\in D} \vec{g}_i$$

Without loss of generality, there must exist $j \in D$ such that $\vec{g}_j$ is not included in $\vec{X}_2$, i.e.,
$\tilde{b}_j = 0$ (if not, then since $b_j = \tilde{b}_j + 1$, this must be the case for $\vec{X}_1$). Therefore, $\vec{g}_j$ and
$\vec{X}_2$ are independent.

Hence, $\vec{X}_1$ is a sum of $\vec{X}_2$ and some $\vec{g}_j \in \{\text{Bernoulli}(\frac{1}{2})\}^n$ independent of $\vec{X}_2$. Therefore,
$\vec{X}_1$ and $\vec{X}_2$ are independent.⁶ □

As all three independence assumptions hold true for affine codes, the same analysis of the
probability of error applies as the one used for random codes. Therefore, even for affine
codes, the probability of error decays exponentially with the length of the codewords.
⁶ The fact that $X = Y + B$ is independent of $Y \sim \text{Bernoulli}(p)$, if $B \sim \text{Bernoulli}(\frac{1}{2})$ and independent of $Y$, is also used in point (2) and proved in the Appendix.
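
A quick Monte Carlo sanity check of the pairwise-independence claim above (the message pair, block sizes, and trial count are illustrative choices): over random draws of $(G, \vec{b})$, the joint distribution of the two codewords should be approximately uniform over $\{0,1\}^n \times \{0,1\}^n$.

```python
from collections import Counter
import numpy as np

# Empirical check of pairwise independence of two affine codewords (illustrative sizes).
rng = np.random.default_rng(3)
k, n, trials = 3, 2, 50_000
m1 = np.array([1, 0, 1])
m2 = np.array([0, 1, 1])

counts = Counter()
for _ in range(trials):
    G = rng.integers(0, 2, size=(n, k))
    b = rng.integers(0, 2, size=n)
    x1 = tuple(((G @ m1 + b) % 2).tolist())
    x2 = tuple(((G @ m2 + b) % 2).tolist())
    counts[(x1, x2)] += 1

expected = trials / 2 ** (2 * n)       # uniform probability 2^(-2n) per pair
worst = max(abs(c - expected) / expected for c in counts.values())
print(f"distinct pairs seen: {len(counts)} of {2 ** (2 * n)}; worst relative deviation: {worst:.3f}")
```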


7.3.3 Decoding and Linear Codes


Now consider the decoder. A message $\vec{m}$ was encoded as $\vec{X} = G\vec{m} + \vec{b}$. The received signal
is $\vec{Y} = \vec{X} + \vec{N} = G\vec{m} + \vec{b} + \vec{N}$, where $\vec{N}$ is noise $\sim \{\text{Bernoulli}(p)\}^n$ (i.i.d.).

Hence, the decoder might first subtract $\vec{b}$ from $\vec{Y}$ and then correlate $\vec{Y} - \vec{b} = G\vec{m} + \vec{N}$
with all $G\vec{m}_i$. But subtraction in $\mathbb{Z}_2$ is the same as addition.⁷ Hence, we have the following
diagram for affine codes:

[Figure 7.3: Encoding and decoding with affine codes. The encoder computes $G\vec{m} + \vec{b}$, the channel adds noise $N[i] \sim \text{Bernoulli}(p)$, and the decoder adds $\vec{b}$ back and correlates the result with $G\vec{m}_0, G\vec{m}_1, \ldots$]

But note that the noise $\vec{N}$ is additive and independent of the signal $\vec{X}$. Hence, in the
above encoding/decoding scheme the addition of the noise and the addition of the vector $\vec{b}$
commute. Note that had the noise not been additive, or had it depended on the signal, it would
be incorrect to change the order of application of the noise and the addition of $\vec{b}$.

So the equivalent encoding/decoding scheme now looks like the one shown in Figure 7.4.


Therefore, we can consider the equivalent encoding scheme to be $\vec{m} \mapsto G\vec{m} + \vec{b} + \vec{b}$.
But, in $\mathbb{Z}_2$, we have $G\vec{m} + \vec{b} + \vec{b} = G\vec{m} + 2\vec{b} \bmod 2 = G\vec{m}$. Therefore, the new encoding is:

$$\vec{m} \mapsto G\vec{m}, \qquad G \in \{\text{Bernoulli}(\tfrac{1}{2})\}^{n\times k} \tag{7.2}$$
We note that the above encoding scheme is a linear function of the message $\vec{m}$. Hence,
linear codes suffice.⁸ Linear codes inherit the same exponentially decaying probability of
error with $n$, the length of the codeword, and have a compact representation of the
codebook ($Rn^2$ terms)!

⁷ $x - y \bmod 2 = x + (-1)y \bmod 2 = x + y \bmod 2$.
⁸ The affine part $\vec{b}$ is used only in the proof (see independence assumption 2); it is not required in a practical system with independent additive noise.


[Figure 7.4: Affine codes to linear codes: an equivalent system with the order of $\vec{b}$ and $\vec{N}$ switched. The two additions of $\vec{b}$ cancel, so the encoder reduces to $\vec{m} \mapsto G\vec{m}$ followed by the noisy channel and the decoder.]
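
A brute-force sketch of such a linear code, with illustrative sizes: the encoder stores only $G$, and the decoder regenerates all $2^k$ codewords $G\vec{m}_i$ from $G$ and picks the one closest in Hamming distance to the received word.

```python
import numpy as np

# Sketch of a linear code m -> G m over Z_2 with brute-force minimum-distance decoding.
rng = np.random.default_rng(4)
k, n, p = 4, 16, 0.02
G = rng.integers(0, 2, size=(n, k))

messages = np.array([[(i >> j) & 1 for j in range(k)] for i in range(2 ** k)])
codewords = (messages @ G.T) % 2            # all G m_i, regenerated from G alone

def decode(y):
    """Return the message whose codeword is closest to y in Hamming distance."""
    distances = np.sum(codewords != y, axis=1)
    return messages[int(np.argmin(distances))]

m = np.array([1, 1, 0, 1])
noise = (rng.random(n) < p).astype(int)     # i.i.d. Bernoulli(p) noise
y = ((G @ m) % 2 + noise) % 2
print("sent:", m, "decoded:", decode(y))
```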

7.3.4 Further Explorations


We have proved that the probability of error in decoding decays exponentially with the
length of the codeword. But the number of codewords also increases exponentially with the
length. Hence, one might ask if there exist some messages, for a given coding strategy ($G$
and $\vec{b}$), for which the probability of error is large.

We first note that the codewords form a subspace of $\{0,1\}^n$. Subspaces are closed under
addition, i.e., adding two vectors in a subspace results in another vector that is also in the
subspace. Further, addition does the 'same thing' to all the vectors — it is not biased against
any special vector of the subspace. Now, if there existed a particular message $\vec{m}$ for which
the probability of error was large, that would mean that the noise, when added to the encoded
$\vec{m}$, takes it closer to another codeword or out of the subspace. But then the noise must do
the same thing to all the codewords! Hence, no such exceptionally bad message $\vec{m}$ should exist.

It is important to note that if the subspace had some 'boundaries', i.e., if it were not
closed under addition, then there could exist some messages with large probabilities of error.

The symmetry of codewords discussed above is used when analyzing communication
systems. As no codeword is special, in our analysis of the probability of erroneous decoding we
can pick a particular codeword, say the all-zero codeword, and analyze the error probability
of that. The results would then generalize to all the codewords.


7.4 Appendix
1. Claim: If $Y = X + B$, $X \sim \text{Bernoulli}(p)$, $B \sim \text{Bernoulli}(\frac{1}{2})$, and $B$ and $X$ are independent,
then:

(a) $Y \sim \text{Bernoulli}(\frac{1}{2})$
(b) $Y$ is independent of $X$

Proof:

(a)
$$P(Y = 1) = P(X = 0)P(B = 1) + P(X = 1)P(B = 0) = \frac{1}{2}(1-p) + \frac{1}{2}p = \frac{1}{2}$$
$\Rightarrow P(Y = 1) = P(Y = 0) = \frac{1}{2}$. Hence, $Y \sim \text{Bernoulli}(\frac{1}{2})$.
Trivially, the above claim generalizes to vectors with independent random-variable entries.

(b) $P(Y = y, X = x) = P(Y = y \mid X = x)P(X = x) = P(B = y - x)P(X = x) = \frac{1}{2}P(X = x) = P(Y = y)P(X = x)$ $\Rightarrow$ $Y$ and $X$ are independent.
