Lecture 7 — September 24
Lecturer: Anant Sahai Scribe: Ankush Gupta
7.1 Overview
This lecture introduces affine and linear codes. Orthogonal signalling and random codes
are reviewed first. The impracticality of random codes (table size exponential in the number
of bits) motivates affine codes. Linear codes then fall out naturally from the discussion of
affine codes.
7.2 Review
7.2.1 Orthogonal Signalling
Consider the problem of communicating $k$ bits simultaneously. In orthogonal signalling, each
of the $2^k$ bit combinations is encoded as a function $x(t)$ from a family of orthogonal cosine
functions. The received signal is decoded by passing it through a bank of correlators and then
thresholding the correlation values. This process can be made more robust by adding an
additional layer of coding, such as Reed-Solomon error correcting codes.
This process of communication is summarized in the following schematic:
[Schematic: bits → Error Correcting Code → Modulator → $+$ (noise $N(t)$) → Demodulator → Error Correcting Decode → bits. The modulator/channel/demodulator chain is abstracted as a binary symmetric channel: $0 \to 0$ and $1 \to 1$ with probability $(1-p)$, and $0 \to 1$, $1 \to 0$ with probability $p$.]
7.2.2 Random Codes

Encoding/Decoding
In random codes, each of the $2^k$ symbols is encoded as an i.i.d. Bernoulli(1/2) codeword
$\vec{X} \in \{0,1\}^n$ of length $n > k$. During transmission, a codeword is corrupted by i.i.d.
Bernoulli($p$) noise $\vec{N} \in \{0,1\}^n$. The noise is assumed to be independent of the codebook,
i.e., the table of the symbols and their corresponding codewords.[1] Then, the received
signal $\vec{Y}$ can be represented as $\vec{Y} = \vec{X} + \vec{N}$. The received signal $\vec{Y}$ is decoded to the
maximum-matching codeword.[2]
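For concreteness, here is a minimal Python/NumPy sketch of this encode/corrupt/decode loop (not part of the original notes; the function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_codebook(k, n):
    # one i.i.d. Bernoulli(1/2) codeword of length n for each of the 2^k symbols
    return rng.integers(0, 2, size=(2**k, n))

def transmit(x, p):
    # corrupt the codeword with i.i.d. Bernoulli(p) noise; addition in Z_2 is XOR
    noise = (rng.random(x.shape) < p).astype(int)
    return x ^ noise

def decode(y, codebook):
    # maximum-matching decoding: pick the codeword that agrees with y in the
    # most positions (equivalently, the one at minimum Hamming distance)
    matches = (codebook == y).sum(axis=1)
    return int(np.argmax(matches))

k, n, p = 4, 64, 0.1
codebook = make_codebook(k, n)   # note: the table has 2^k rows
y = transmit(codebook[5], p)
print(decode(y, codebook))       # prints 5 with high probability
```

Note that the table has $2^k$ rows; this exponential blow-up is exactly the impracticality that motivates affine codes later in the lecture.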
Probability of Error
The noise moves a codeword around in $\{0,1\}^n$. Let the ‘decoding box’ of a codeword be the
region in $\{0,1\}^n$ such that if the received signal $\vec{Y}$ lies in that region, it is decoded to that
codeword. An error in decoding occurs if:
• the noise pushes $\vec{Y}$ out of the decoding box of the true codeword $\vec{X}_{\text{true}}$, or
• $\vec{Y}$ lands in the decoding box of some false codeword $\vec{X}_{\text{false}}$.
Let us formally define the decoding-box of a codeword $\vec{X}$ as the set
$$\text{Decoding-Box}(\vec{X}) = \{\vec{y} \in \{0,1\}^n \mid \#\text{ of 1's in } \vec{y} - \vec{X} \le (p+\epsilon)n\}$$
where $p$ is the noise probability and $n$ is the length of the codeword. Note that all the
codewords are identically distributed (as they are all $\{\text{Bernoulli}(1/2)\}^n$). Therefore, setting
$\vec{X}$ to $\vec{0}$ above, we have a definition of a decoding-box that is independent of the codeword:
$$\text{Decoding-Box} = \{\vec{b} \in \{0,1\}^n \mid \#\text{ of 1's in } \vec{b} \le (p+\epsilon)n\}$$
[1] This is a valid assumption, as the noise originates in nature and should be independent of our choice of bit-string.
[2] The distance (or cost) function for this matching is like the Hamming distance: $\text{dist}(\vec{Y}, \vec{X}) = \sum_{i=1}^{n} \delta(Y_i - X_i)$, where $\delta(\cdot)$ is the Kronecker delta function; this sum counts the positions where $\vec{Y}$ and $\vec{X}$ agree, and the decoder maximizes it.
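In code, the decoding-box test is just a Hamming-weight threshold; a small sketch (ours, with hypothetical names):

```python
import numpy as np

def in_decoding_box(y, x, p, eps):
    # y lies in the decoding box of codeword x iff the number of 1's in
    # y - x (over Z_2, this is y XOR x) is at most (p + eps) * n
    n = len(y)
    return int(np.sum(y ^ x)) <= (p + eps) * n

x = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y = np.array([0, 1, 0, 0, 1, 0, 0, 1])        # x with one bit flipped
print(in_decoding_box(y, x, p=0.1, eps=0.1))  # True: weight 1 <= 0.2 * 8
```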
We first note that any two codewords $\vec{X}_1$ and $\vec{X}_2$ are independent: this is because
they are made up of i.i.d. Bernoulli(1/2) random entries. Further, the noise is i.i.d.
Bernoulli($p$) and independent of the codebook. Hence, a received signal $\vec{Y} = \vec{X} + \vec{N}$
is independent of all the false codewords $\vec{X}_{\text{false}}$: this is because $\vec{Y}$ is a function of
two variables which are independent of $\vec{X}_{\text{false}}$. Moreover, the marginal distribution
of the received signal is uniform in $\{0,1\}^n$; this is because it is a sum of the noise and a
$\{\text{Bernoulli}(1/2)\}^n$ codeword.[3]
Then the probability that a received signal $\vec{Y}$ lies in some false codeword's decoding
box is the ratio of the number of vectors in this decoding box to the total number of
vectors in $\{0,1\}^n$:
$$P(\vec{X}_{\text{false}} \text{ claims } \vec{Y}) = \frac{|\text{Decoding-Box}(\vec{X}_{\text{false}})|}{2^n} \qquad \text{by independence of } \vec{Y} \text{ and } \vec{X}_{\text{false}}, \text{ and uniformity of } \vec{Y} \text{ in } \{0,1\}^n$$
$$\Rightarrow P(\exists\, \vec{X}_{\text{false}} \text{ claims } \vec{Y}) \le \frac{2^k\, |\text{Decoding-Box}|}{2^n} \qquad \text{by identical distribution and the union bound}$$
Note that the number of false codewords above is actually $2^k - 1$, but $2^k$ is used for simplicity.
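The union bound can be evaluated exactly for small parameters by counting the box with binomial coefficients; a sketch (ours), assuming the one-sided box definition above:

```python
from math import comb

def box_size(n, p, eps):
    # |Decoding-Box|: number of length-n bit-strings with at most (p + eps)n ones
    t = int((p + eps) * n)
    return sum(comb(n, w) for w in range(t + 1))

def union_bound(k, n, p, eps):
    # P(some false codeword claims Y) <= 2^k * |Decoding-Box| / 2^n
    return 2**k * box_size(n, p, eps) / 2**n

# the bound decays as n grows with the rate k/n held fixed
for n in (40, 80, 160):
    print(n, union_bound(k=n // 4, n=n, p=0.1, eps=0.02))
```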
One way of finding the size of the decoding-box is to use the Asymptotic Equipartition
Principle.[4] To apply the A.E.P., we need a slight modification in our definition of the
Decoding-Box:
$$\begin{aligned}
\text{Decoding-Box} &= \{\vec{b} \in \{0,1\}^n \mid \#\text{ of 1's in } \vec{b} \le (p+\epsilon)n \text{ and } \#\text{ of 1's in } \vec{b} \ge (p-\epsilon)n\}\\
&= \{\vec{b} \in \{0,1\}^n \mid |(\#\text{ of 1's in } \vec{b}) - pn| \le \epsilon n\}
\end{aligned}$$

[3] Proof in the Appendix.
[4] The other way is to use Stirling's approximation for factorials and evaluate the sum of the binomial distribution.
The probabilities of the bit-strings in the typical set are all essentially the same, each equal to
$p^{\#1\text{'s}}(1-p)^{\#0\text{'s}}$, and they all have approximately the same number of ones, $\approx np$. The
size of the decoding box according to the A.E.P. is then
$$|\text{Decoding-Box}| \le 2^{n(H(p)+\epsilon')}$$
where $H(p) = p\log_2(1/p) + (1-p)\log_2(1/(1-p))$ is the entropy of the noise and $\epsilon'$
is such that $\epsilon' \to 0$ as $\epsilon \to 0$ (see Figure 7.2).

[Figure 7.2: the binary entropy function $H(p)$, bounded by 1.0, with the $\epsilon'$ margins around $H(p)$ indicated.]

Hence, we have
$$P(\exists\, \vec{X}_{\text{false}} \text{ claims } \vec{Y}) \le \frac{2^k \cdot 2^{n(H(p)+\epsilon')}}{2^n} = 2^{-n\left(1 - R - (H(p)+\epsilon')\right)}$$
if $k = Rn$, where $R$ is the channel rate. For exponential decay of the error probability with
codeword length, we require $R < 1 - (H(p)+\epsilon')$. Note that the noise probability bounds
the channel rate.
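To see these quantities numerically, here is a small sketch (ours, not from the notes) that compares the exact size of the two-sided box with the A.E.P. exponent and checks the rate threshold:

```python
from math import comb, log2

def H(p):
    # binary entropy of the noise, in bits
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

def typical_box_size(n, p, eps):
    # two-sided box: strings whose number of 1's is within eps*n of p*n
    lo, hi = int((p - eps) * n), int((p + eps) * n)
    return sum(comb(n, w) for w in range(lo, hi + 1))

n, p, eps = 200, 0.1, 0.02
print(log2(typical_box_size(n, p, eps)) / n)  # exponent of the box size...
print(H(p + eps))                             # ...is close to H(p) + eps'

# error exponent 1 - R - (H(p) + eps'): positive iff the bound decays
for R in (0.25, 0.60):
    print(R, 1 - R - H(p + eps))              # positive, then negative
```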
From the analysis of the two cases of decoding error, we conclude that the probability of
error in decoding decays exponentially in the length of the codeword.
7.3 Affine Codes

The analysis above shows that random codes achieve an exponentially decaying probability of error, but they are impractical: the codebook is a table with $2^k$ entries. Affine codes fix this. In an affine code, the codeword for a message $\vec{m} \in \{0,1\}^k$ is
$$\vec{X} = G\vec{m} + \vec{b} \tag{7.1}$$
where $G \in \{0,1\}^{n \times k}$ and $\vec{b} \in \{0,1\}^n$. The entries of the matrix $G$, called the generator matrix,
and of the vector $\vec{b}$ are i.i.d. Bernoulli(1/2). Note that this is a very compact representation of the
codebook: it requires only $n \times k$ entries for $G$ and $n$ entries for $\vec{b}$, i.e., a total of $Rn^2 + n$
entries, if $k = Rn$, $R < 1$.
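A minimal sketch of affine encoding (ours; equation (7.1) over $\mathbb{Z}_2$), showing that only $G$ and $\vec{b}$ need to be stored:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 8, 32

G = rng.integers(0, 2, size=(n, k))  # generator matrix, i.i.d. Bernoulli(1/2)
b = rng.integers(0, 2, size=n)       # offset vector,    i.i.d. Bernoulli(1/2)

def affine_encode(m):
    # X = G m + b over Z_2; the codebook is represented by n*k + n stored bits,
    # instead of a table with 2^k rows
    return (G @ m + b) % 2

print(affine_encode(rng.integers(0, 2, size=k)))
```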
Recall that the probability-of-error analysis for random codes relied on three assumptions: (1) the noise is independent of the codebook, (2) each codeword is marginally $\{\text{Bernoulli}(1/2)\}^n$ (so the received signal is uniform), and (3) any two codewords are independent. Let us inspect whether these three assumptions hold for affine codes.
1. The first assumption holds true for affine codes as the noise originates in nature and
should be independent of what coding scheme is used.
2. $\vec{X} = G\vec{m} + \vec{b}$, i.e., $\vec{X}$ is a sum in which one of the terms, viz. $\vec{b}$, is $\{\text{Bernoulli}(1/2)\}^n$. But
the sum of any random variable and an independent Bernoulli(1/2) random variable is Bernoulli(1/2).[5]
[5] See the Appendix for a proof of this.
3. $\vec{X}_1, \vec{X}_2$ are (pairwise) independent, where $\vec{X}_1 = G\vec{m}_1 + \vec{b}$ and $\vec{X}_2 = G\vec{m}_2 + \vec{b}$ for some
$\vec{m}_1, \vec{m}_2 \in \{0,1\}^k$, $\vec{m}_1 \neq \vec{m}_2$.

Proof: Let $G = [\vec{g}_1\ \vec{g}_2\ \ldots\ \vec{g}_k]$, where $\forall i \in \{1, 2, \ldots, k\}$, $\vec{g}_i \in \{0,1\}^n \sim \{\text{Bernoulli}(1/2)\}^n$.
Further, let $\vec{m}_1 = (b_1, \ldots, b_k)^T$ and $\vec{m}_2 = (\tilde{b}_1, \ldots, \tilde{b}_k)^T$.

Define $S = \{i \mid i \in \{1, \ldots, k\},\ b_i = \tilde{b}_i\}$, i.e., the set of indices where the bits of $\vec{m}_1$ and
$\vec{m}_2$ agree, and similarly $D = \{i \mid i \in \{1, \ldots, k\},\ b_i \neq \tilde{b}_i\}$, i.e., the set of indices where the bits
differ. Then, we know:
$$\vec{X}_1 = \vec{b} + \sum_{i=1}^{k} b_i \vec{g}_i \qquad \text{and} \qquad \vec{X}_2 = \vec{b} + \sum_{i=1}^{k} \tilde{b}_i \vec{g}_i = \vec{b} + \sum_{i \in S} b_i \vec{g}_i + \sum_{i \in D} \tilde{b}_i \vec{g}_i$$
But note that $\forall i \in D$, $\tilde{b}_i \vec{g}_i = (b_i + 1)\vec{g}_i = b_i \vec{g}_i + \vec{g}_i$, because $b_i$ and $\tilde{b}_i$ are complements. Therefore,
$$\vec{X}_2 = \vec{b} + \sum_{i \in S} b_i \vec{g}_i + \sum_{i \in D} b_i \vec{g}_i + \sum_{i \in D} \vec{g}_i \quad \Rightarrow \quad \vec{X}_1 = \vec{X}_2 + \sum_{i \in D} \vec{g}_i$$
Without loss of generality, there must exist $j \in D$ such that $\vec{g}_j$ is not included in $\vec{X}_2$, i.e.,
$\tilde{b}_j = 0$ (if not, then since we know $b_j = \tilde{b}_j + 1$, this must instead be the case for $\vec{X}_1$). Therefore,
$\vec{g}_j$ is $\{\text{Bernoulli}(1/2)\}^n$ and independent of $\vec{X}_2$, so conditioned on $\vec{X}_2$ (and the remaining columns),
$\vec{X}_1 = \vec{X}_2 + \sum_{i \in D} \vec{g}_i$ is still uniform; i.e., $\vec{X}_1$ and $\vec{X}_2$ are independent.[6]
As all three independence assumptions hold true for affine codes, the same probability-of-error
analysis applies as for random codes. Therefore, even for affine codes, the probability
of error decays exponentially with the length of the codewords.
[6] The fact that $X = Y + B$ is independent of $Y \sim \text{Bernoulli}(p)$, if $B \sim \text{Bernoulli}(1/2)$ and independent of $Y$, is also used in point (2) and proved in the Appendix.
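These properties are easy to check empirically; a quick Monte-Carlo sanity check (ours, not part of the notes) of the marginal uniformity of $\vec{X}_1$ and the pairwise behaviour of $\vec{X}_1, \vec{X}_2$:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, trials = 6, 16, 20000

m1 = rng.integers(0, 2, size=k)
m2 = m1.copy()
m2[0] ^= 1                        # m2 differs from m1 in exactly one bit

ones, agree = 0, 0
for _ in range(trials):
    G = rng.integers(0, 2, size=(n, k))
    b = rng.integers(0, 2, size=n)
    x1 = (G @ m1 + b) % 2
    x2 = (G @ m2 + b) % 2
    ones += int(x1.sum())         # each bit of X1 should be Bernoulli(1/2)
    agree += int(x1[0] == x2[0])  # independent uniform bits agree half the time

print(ones / (trials * n))        # ~0.5
print(agree / trials)             # ~0.5
```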
Hence, the decoder might first subtract $\vec{b}$ from $\vec{Y}$ and then correlate $\vec{Y} - \vec{b} = G\vec{m} + \vec{N}$
with all the $G\vec{m}_i$. But subtraction in $\mathbb{Z}_2$ is the same as addition (since $-1 = 1$ in $\mathbb{Z}_2$). Hence, we have the following
diagram for affine codes:
[Figure 7.3: Encoder/decoder for affine codes. Encoder: $\vec{m} \to G \to {+\vec{b}}$; Channel: ${+\vec{N}}$ with $N[i] \sim \text{Bernoulli}(p)$; Decoder: ${+\vec{b}}$, then correlators against $G\vec{m}_0, G\vec{m}_1, \ldots$ (correlation 0, correlation 1, ...).]
But note that the noise $\vec{N}$ is additive and independent of the signal $\vec{X}$. Hence, in the
above encoding/decoding scheme, the addition of the noise and the addition of the vector $\vec{b}$
commute. Note that had the noise not been additive, or had it depended on the signal, it would
be incorrect to change the order of the application of the noise and the addition of $\vec{b}$.
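The commutation is immediate to verify, since addition over $\mathbb{Z}_2$ is XOR; a tiny check (ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
x     = rng.integers(0, 2, size=n)   # the codeword G m
b     = rng.integers(0, 2, size=n)
noise = rng.integers(0, 2, size=n)

# adding b and then the noise equals adding the noise and then b,
# because addition in Z_2 (XOR) is commutative and associative
print(np.array_equal((x ^ b) ^ noise, (x ^ noise) ^ b))  # True
```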
[Figure 7.4: Affine codes to linear codes: an equivalent system with the order of $\vec{b}$ and $\vec{N}$ switched. Top: $\vec{m} \to G \to {+\vec{b}} \to {+\vec{b}} \to {+\vec{N}} \to$ Decoder, with $N[i] \sim \text{Bernoulli}(p)$; this is equivalent to the bottom system $\vec{m} \to G \to {+\vec{N}} \to$ Decoder, since the two additions of $\vec{b}$ cancel in $\mathbb{Z}_2$.]
~ switched.
of error with n — the length of the codeword and have a compact representation of the
codebook (Rn2 terms)!
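A sketch of the resulting linear code (ours): drop $\vec{b}$ from the affine encoder, leaving only the generator matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 8, 32
G = rng.integers(0, 2, size=(n, k))  # with k = R*n, this is R*n^2 stored entries

def linear_encode(m):
    # X = G m over Z_2: the offset b has been dropped entirely
    return (G @ m) % 2

print(linear_encode(rng.integers(0, 2, size=k)))
```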
We first note that the codewords form a sub-space of $\{0,1\}^n$. Sub-spaces are closed under
addition, i.e., adding two vectors in a sub-space results in another vector that is also in the sub-space.
Further, addition does the ‘same thing’ to all the vectors: it is not biased against
any special vector of the sub-space. Now, if there existed a particular message $\vec{m}$ for which
the probability of error was large, that would mean that the noise, when added to the encoded $\vec{m}$, takes
it closer to another codeword or out of the sub-space. But then the noise must do the same
thing to all the codewords! Hence, no such exceptionally bad message $\vec{m}$ should exist.
It is important to note that if the sub-space had some ‘boundaries’, i.e., if it were not
closed under addition, then there could exist some messages with large probabilities of error.
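Closure under addition is easy to check exhaustively for a small linear code; a sketch (ours):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
k, n = 4, 10
G = rng.integers(0, 2, size=(n, k))

# all 2^k codewords G m (mod 2) of the linear code
codewords = {tuple(int(v) for v in (G @ np.array(m)) % 2)
             for m in product((0, 1), repeat=k)}

# the sum (XOR) of any two codewords is again a codeword
print(all(tuple(a ^ b for a, b in zip(c1, c2)) in codewords
          for c1 in codewords for c2 in codewords))  # True
```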
7.4 Appendix
1. Claim: If $Y = X + B$, $X \sim \text{Bernoulli}(p)$, $B \sim \text{Bernoulli}(1/2)$, and $B$ and $X$ are independent, then:

(a) $Y \sim \text{Bernoulli}(1/2)$

(b) $Y$ is independent of $X$

Proof:

(a)
$$P(Y = 1) = P(X = 0)P(B = 1) + P(X = 1)P(B = 0) = (1 - p)\cdot\tfrac{1}{2} + p\cdot\tfrac{1}{2} = \tfrac{1}{2}$$
$\Rightarrow P(Y = 1) = P(Y = 0) = \tfrac{1}{2}$. Hence, $Y \sim \text{Bernoulli}(1/2)$.

Trivially, the above claim generalizes to vectors with independent random-variable entries.

(b) $P(Y = y, X = x) = P(Y = y \mid X = x)P(X = x) = P(B = y - x)P(X = x) = \tfrac{1}{2}P(X = x) = P(Y = y)P(X = x)$ $\Rightarrow$ $Y$ and $X$ are independent.
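The claim can also be checked numerically; a quick simulation (ours, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(5)
p, trials = 0.3, 200000

X = (rng.random(trials) < p).astype(int)  # X ~ Bernoulli(p)
B = rng.integers(0, 2, size=trials)       # B ~ Bernoulli(1/2), independent of X
Y = X ^ B                                 # Y = X + B over Z_2

print(Y.mean())          # ~0.5 : Y ~ Bernoulli(1/2)
print(Y[X == 0].mean())  # ~0.5 : conditional on X = 0
print(Y[X == 1].mean())  # ~0.5 : conditional on X = 1, hence independence
```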