
EE/Stats 376A: Information theory                                    Winter 2017

Lecture 13 — February 23
Lecturer: David Tse                           Scribe: David L, Tong M, Vivek B

13.1 Outline

• Polar Codes

13.1.1 Reading

• CT: 8.1, 8.3 − 8.6, 9.1, 9.2

13.2 Recap - Polar Coding Introduction

Last time, we modified the repetition coding scheme to obtain a capacity-achieving
coding scheme, as shown in Figure 13.1. The modified coding scheme takes in a two-bit
message U, V and transmits X1 = U and X2 = U ⊕ V in two uses of the channel P.

Figure 13.1: The coding scheme on the left is a plain-vanilla repetition code for two uses
of the channel. It is modified (right) by adding an extra bit V to the second use of the
channel.

Equivalence: As shown in Figure 13.1, the modified coding scheme is equivalent to
transmitting U via a channel P+ and transmitting V via a channel P−, where P+ has
higher capacity than P, and P− has lower capacity than P.

Even though the underlying physical channels do not transform/split, for ease of
exposition we shall loosely use phrases like "channel P splits into channels P+ and P−"
and write P, P ⇔ P+, P− to refer to the above equivalence; we shall also refer to P+
and P− as the bit channels.

13.2.1 Example: BEC(p)

In the previous lecture, we explicitly characterized the bit channels P+ and P− for
P = BEC(p) (refer to Figure 13.2). In particular, we showed that C(P+) + C(P−) = 2C(P).
Figure 13.2: P+ = BEC(p²) and P− = BEC(1 − (1 − p)²).
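Since both bit channels of a BEC are again erasure channels, this capacity-conservation
identity is easy to check numerically. The snippet below is a minimal Python sketch,
assuming only the erasure probabilities p² and 1 − (1 − p)² shown in Figure 13.2:

    def bec_capacity(eps):
        # Capacity of a binary erasure channel with erasure probability eps.
        return 1.0 - eps

    for p in [0.1, 0.25, 0.5, 0.9]:
        c_plus = bec_capacity(p * p)              # C(P+): erasure prob. p^2
        c_minus = bec_capacity(1 - (1 - p) ** 2)  # C(P-): erasure prob. 1-(1-p)^2
        assert abs(c_plus + c_minus - 2 * bec_capacity(p)) < 1e-12
        print(f"p = {p}: C(P+) = {c_plus:.4f}, C(P-) = {c_minus:.4f}")

Indeed, C(P+) = 1 − p² and C(P−) = (1 − p)², whose sum is 2(1 − p) = 2C(P) for every p.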
In the next section, we will give the idea behind polar codes and show that they achieve
capacity.

13.3 Idea

Note: some of the figures in this section are reproduced from Erdal Arikan's slides.¹
Without loss of generality, we will restrict ourselves to symmetric channels P whose
capacity satisfies 0 ≤ C(P) ≤ 1. We know that channel coding is trivial for two types of
channels:

1. Noiseless channel - a channel with no noise, i.e., C(P) = 1.

2. Useless channel - a channel with zero capacity, i.e., C(P) = 0.
The main idea behind polar codes is to transform the message bits in such a way that a
C(P) fraction of the bits is 'effectively' passed through a noiseless channel, whereas
the remaining (1 − C(P)) fraction is 'effectively' passed through a useless channel
(Figure 13.3). Polar codes achieve this by successively applying the P, P ⇔ P+, P−
transformation until 'most' of the bit channels are either noiseless or useless.

Figure 13.3: Idea behind polar codes. The message is carried over the nC (effectively)
noiseless bit channels to the decoder, while the remaining n(1 − C) useless bit channels
carry no information.
13.3.1 Successive splitting
Up until now, we've applied a transformation which effectively splits a pair of P
channels into P+ and P− channels. We now apply the same transformation on pairs of
P+, P− channels, and split them further into P++, P+− and P−+, P−− channels
respectively. As illustrated in Figure 13.4, this is equivalent to splitting four P
channels into four channels: P++, P+−, P−+ and P−−.
¹ https://simons.berkeley.edu/sites/default/files/docs/2691/slidesarikan.pdf


Figure 13.4: (a) Splitting pairs of channels P+, P− into P++, P+− and P−+, P−−
respectively. (b) Overall transformation from 4 P channels.
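For P = BEC(p), both one-stage maps send an erasure channel to another erasure channel,
so the four second-stage channels can be computed exactly by composing the maps
ε ↦ 2ε − ε² and ε ↦ ε². The Python sketch below is illustrative (the helper names are
ours):

    def minus(e): return 2 * e - e * e   # erasure prob. of the '-' bit channel
    def plus(e): return e * e            # erasure prob. of the '+' bit channel

    p = 0.5
    stage1 = {"-": minus(p), "+": plus(p)}
    stage2 = {s + t: f(stage1[s])
              for s in stage1 for (t, f) in (("-", minus), ("+", plus))}
    # Capacities of P--, P-+, P+-, P++; they always sum to 4*C(P).
    print({name: 1 - e for name, e in stage2.items()})

For p = 1/2 this gives capacities 0.0625, 0.4375, 0.5625 and 0.9375 for P−−, P−+, P+−,
P++ respectively, which sum to 4C(P) = 2: total capacity is preserved at every stage
while the individual bit channels spread apart.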

Applying the same transformation to pairs of the above four channels will further split
them into 8 different channels, whose transformation is shown in Figure 13.5.

Figure 13.5: Third stage of the transformation with 8 P channels, transformed bits U ,
codeword bits X, and output Y .

As we keep on splitting further, the bit channels, denoted by P_i^(n), converge to
either a noiseless or a useless channel. This statement is made precise in the following
theorem:

Theorem 1. As the number of channels grows, the capacities of the 'bit' channels
converge to either 0 or 1, and the fractions are given by

    |{i : C(P_i^(n)) > 1 − δ}| / n  →  C(P)       as n → ∞, ∀δ > 0,

    |{i : C(P_i^(n)) < δ}| / n  →  1 − C(P)       as n → ∞, ∀δ > 0,       (13.1)
where n is the total number of ‘original’ P channels.


Proof. Out of the scope of this class. The interested reader can refer to the original paper [Arikan].

Corollary 1. The above equations (13.1) directly imply that the capacities of all the
P_i^(n) channels gravitate to either 0 or 1, i.e.,

    |{i : δ < C(P_i^(n)) < 1 − δ}| / n  →  0      as n → ∞.
This phenomenon of polarization of capacities is illustrated in Figures 13.6 and 13.7.

Figure 13.6: Plotting the capacity of the bit channels as we apply the polar code
transformation in successive stages. Note that there are n = 2^k bit channels at the kth
stage. In this figure the channels are denoted by W instead of P.

Figure 13.7: Plotting capacity of the bit channels for a BEC(1/2) with n = 64 (left) and
n = 1024 (right). Observe that capacities concentrate near 0 or 1 as n increases.
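For the BEC this polarization can be reproduced exactly in a few lines, since each
splitting stage only applies the two erasure maps from Section 13.2.1 to every current
bit channel. The following Python sketch (illustrative; δ = 0.01 is an arbitrary
threshold) mimics Figure 13.7:

    def bit_channel_erasures(eps, stages):
        # Erasure probabilities of all 2**stages bit channels of BEC(eps).
        errs = [eps]
        for _ in range(stages):
            errs = [e for x in errs for e in (2 * x - x * x, x * x)]  # (-, +)
        return errs

    delta = 0.01
    for k in (6, 10):                  # n = 64 and n = 1024, as in Figure 13.7
        errs = bit_channel_erasures(0.5, k)
        n = len(errs)
        noiseless = sum(e < delta for e in errs) / n    # capacity > 1 - delta
        useless = sum(e > 1 - delta for e in errs) / n  # capacity < delta
        print(f"n = {n}: {noiseless:.3f} noiseless, {useless:.3f} useless")

Both fractions creep towards C(P) = 1/2 and 1 − C(P) = 1/2 as n grows, exactly as
Theorem 1 predicts.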


13.3.2 Why study polar codes?


In the last few lectures, we showed that random codes achieve capacity, but their
encoding and decoding are inefficient because they do not have a 'nice structure'.
Keeping this in mind, we added structure to the coding scheme by restricting ourselves
to random linear codes, and this change made the encoding efficient while still
achieving capacity. However, this structure is destroyed by the channel noise, which
makes decoding inefficient for general linear codes.
Polar codes overcome this problem by splitting the channels into noiseless and useless
channels. The noiseless channels preserve the encoding structure and therefore, for the
information passed only over these noiseless channels, decoding can be done efficiently.
Therefore, the next logical step is to obtain an encoding scheme to transmit maximum
information over the noiseless channels. For example, polar codes split 512 BEC(1/2)
channels into 256 channels with C ≈ 1 and 256 channels with C ≈ 0, and our goal here is
to transmit information only using channels with C ≈ 1.

13.4 Encoding
13.4.1 Linear representation of Polar codes
2nd stage - 4 channels splitting
Let us first consider the case of 4 channels. From Figure 13.8 it is easy to see that
the codeword bits for message bits U1, U2, U3 and U4 are

1. X1 = U1 ⊕ U2 ⊕ U3 ⊕ U4

2. X2 = U2 ⊕ U4

3. X3 = U3 ⊕ U4

4. X4 = U4 .

We can express this in the linear form A4 U = X:

    [1 1 1 1] [U1]   [X1]
    [0 1 0 1] [U2] = [X2]
    [0 0 1 1] [U3]   [X3]
    [0 0 0 1] [U4]   [X4]

We see that the top-left, top-right, and bottom-right 2 × 2 blocks of A4 are all equal to

    [1 1]
    [0 1]

while the bottom-left block is all zeros; this is a direct consequence of the successive
splitting (Section 13.3.1) of the channels.

Figure 13.8: The polar transform over n = 4 uses of the channel P.

3rd stage - 8 channels splitting


Similarly, we can extend the above calculations for eight channels to obtain a linear trans-
formation A8 U = X:
    
1 1 1 1 1 1 1 1 U1 X1
0 1 1 0 0 1 1 0 U2  X2 
    
0 0 1 1 0 0 1 1 U3  X3 
   

0 0 0 1 0 0 0 1 U5  = X5 
   

0 0 0 0 1 1 1 1 U4  X4 
   

0 0 0 0 0 1 1 0 U6  X6 
   

0 0 0 0 0 0 1 1 U7  X7 
0 0 0 0 0 0 0 1 U8 X8

which can be verified from Figure 13.9. Here too we have a recursive structure:

         [A4  A4]
    A8 = [ 0  A4] .                                                       (13.2)
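The recursion (13.2) translates directly into code; the short Python helper below (ours,
for illustration) builds A_n for any power of two and reproduces the A4 and A8 matrices
displayed above.

    def polar_matrix(n):
        # A_n = [[A_{n/2}, A_{n/2}], [0, A_{n/2}]], with A_1 = [1].
        if n == 1:
            return [[1]]
        half = polar_matrix(n // 2)
        top = [row + row for row in half]                # [A_{n/2}  A_{n/2}]
        bottom = [[0] * (n // 2) + row for row in half]  # [   0     A_{n/2}]
        return top + bottom

    for row in polar_matrix(8):
        print(row)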
Up until this point, we have only established an equivalence between the splitting of
the channels and the linear transformation from message bits U to codeword bits X. How
do we obtain a working coding scheme from this? The answer lies in the capacities of the
bit channels corresponding to U1, U2, . . .

13.4.2 Encoding
For n = 8 and P = BEC(1/2), the effective capacities are listed in Figure 13.9. The
encoding scheme sends data over the nC = 4 best bit channels and 'sends' no information
(i.e., 0's) over the remaining bit channels. Note that the bit channel capacities are
not monotonic in the index, so rather than the top four indices (U8, U7, U6, U5), in
this case we would use the bits U8, U7, U6, and U4 to send the data.
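The selection in Figure 13.9 can be reproduced with the BEC erasure recursion from
Section 13.2.1: compute all eight bit-channel capacities, then keep the nC = 4 largest.
A minimal sketch (the helper names are ours):

    def bit_channel_capacities(eps, stages):
        errs = [eps]
        for _ in range(stages):
            errs = [e for x in errs for e in (2 * x - x * x, x * x)]  # (-, +)
        return [1 - e for e in errs]

    caps = bit_channel_capacities(0.5, 3)            # C for U1, ..., U8
    best = sorted(range(8), key=lambda i: -caps[i])[:4]
    print([f"U{i+1}: {c:.4f}" for i, c in enumerate(caps)])
    print("data bits:", sorted(f"U{i+1}" for i in best))   # -> U4, U6, U7, U8

This reproduces the ranking in Figure 13.9: the data set is {U4, U6, U7, U8} and the
frozen set is {U1, U2, U3, U5}.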



Figure 13.9: Diagram showing the polar code transformation for P = BEC(1/2), n = 8. The
bit channel capacities are shown in the ‘I(Wi )’ column, and their rank, in order of descending
magnitude, is shown in the ‘Rank’ column. The bits corresponding to high-capacity channels
are the ones which should be used to send information (marked ‘data’), and no information
should be sent using bits corresponding to low-capacity channels (marked ‘frozen’).

In the above example, the sum of the capacities of the top four bit channels is only
about 84% of the total capacity nC = 4; hence, a block length of n = 8 is not large
enough for the asymptotics of equations (13.1) to kick in. Therefore, in practice we use
much larger block lengths, e.g., n = 1024.

Encoding Complexity
Since encoding for polar codes is a linear transformation (Section 13.4.1), it can be
achieved in O(n²) steps. However, in the previous subsection we showed that the
generator matrix of polar codes has an interesting 'recursive' structure, and using
this, the running time can be improved to O(n log n).
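Concretely, the recursion (13.2) means X is obtained by encoding each half of U and
XOR-ing the two encoded halves into the top output, giving the recurrence
T(n) = 2T(n/2) + O(n) = O(n log n). A minimal Python sketch of this encoder (ours,
following the X = A_n U convention above):

    def polar_encode(u):
        # Compute X = A_n U over GF(2); len(u) must be a power of two.
        n = len(u)
        if n == 1:
            return list(u)
        top = polar_encode(u[: n // 2])
        bottom = polar_encode(u[n // 2:])
        return [a ^ b for a, b in zip(top, bottom)] + bottom

    print(polar_encode([1, 0, 1, 1]))  # -> [1, 1, 0, 1], matching A_4 above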

13.5 Decoding
For the two-channel case, decoding the P+ channel requires knowledge of Y1, Y2, and V.
Thus we must decode the P− channel (which only depends on Y1 and Y2) before decoding
P+, in order to determine V. Notice that the less reliable channel P− is decoded before
P+. Similarly, in the four-channel case (see Fig. 13.8) we first decode U1 (the least
reliable channel), then U2, then U3, and finally U4. An alternative argument: the value
of U4 corresponds to the repetition code 'embedded' in X1, X2, X3, and X4, so it clearly
must be decoded last.


Let us go through a detailed example for the n = 8 BEC(1/2) channel shown in Fig. 13.9.
As per Section 13.4.2, we encode the message bits by 'freezing' (sending 0 on) U1, U2,
U3, U5, because these bit channels have the lowest capacities. The message is thus sent
only over U4, U6, U7, U8 (the high-capacity bit channels). In the following, we step
through decoding the output Y1, . . . , Y8 to recover the message.

1. Decode U1 (Frozen) → U1 = 0.

2. Decode U2 (Frozen) → U2 = 0.

3. Decode U3 (Frozen) → U3 = 0.

4. Decode U4 (Data!) We use Y1, Y2, . . . , Y8 and U1, U2, U3 to decode this signal. Let the decoded bit be denoted by Û4.

5. Decode U5 (Frozen) → U5 = 0.

6. Decode U6 (Data!) We use Y1, Y2, . . . , Y8 and U1, . . . , U5 to decode this signal. We know U1, U2, U3, and U5 exactly since they are frozen, and we use the estimate Û4 in place of U4.

7. Continue in the same fashion for U7 and U8 .

So in general, we need to know the 'previous' bits Û1, . . . , Ûi−1 to decode Ui. From
the above bit-by-bit decoding scheme, we conclude that decoding a message of length n
requires O(nm) steps, where m is the time required to decode each bit.
To find the complexity of decoding each bit, we look at the kth splitting stage shown in
Figure 13.10:

Figure 13.10: The kth splitting stage.

Decoding comprises two steps: (i) decoding Ua from Ya, Yb, and then (ii) decoding Ub
from Ya, Yb, Ua. This procedure has to be repeated for the k − 1 lower splitting stages.
Thus, the complexity of decoding at the kth stage is twice the complexity of decoding at
the (k − 1)th stage, which implies m is on the order of 2^k = n. Thus, the total running
time of this recursive algorithm is O(n²). As with encoding, the 'recursive' structure
of the generator matrix can be exploited to reuse intermediate computations and thereby
reduce the running time from O(n²) to O(n log n).
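For the BEC, the whole successive-cancellation decoder fits in a short recursion that
mirrors the encoder: decode the less reliable top half from the pairwise XOR of the two
output halves, re-encode it, then decode the bottom half using two looks at each output.
Below is a minimal, illustrative Python sketch under the conventions above (None marks
an erasure and frozen bits are 0; it re-encodes the decoded half at each level rather
than reusing intermediate values, so it is not yet the optimized O(n log n) version):

    def polar_encode(u):
        n = len(u)
        if n == 1:
            return list(u)
        top, bottom = polar_encode(u[: n // 2]), polar_encode(u[n // 2:])
        return [a ^ b for a, b in zip(top, bottom)] + bottom

    def sc_decode(y, frozen, offset=0):
        # y: received symbols (0, 1, or None = erasure); frozen: frozen indices.
        n = len(y)
        if n == 1:
            if offset in frozen:
                return [0]                            # frozen bits are known
            return [y[0] if y[0] is not None else 0]  # unresolved erasure: guess
        ya, yb = y[: n // 2], y[n // 2:]
        # Top ('-') half: X_a XOR X_b encodes it; erased unless both looks survive.
        y_minus = [a ^ b if a is not None and b is not None else None
                   for a, b in zip(ya, yb)]
        u_top = sc_decode(y_minus, frozen, offset)
        x_top = polar_encode(u_top)
        # Bottom ('+') half: two looks, yb directly and ya XOR re-encoded top half.
        y_plus = [b if b is not None else (a ^ x if a is not None else None)
                  for a, b, x in zip(ya, yb, x_top)]
        return u_top + sc_decode(y_plus, frozen, offset + n // 2)

    # Example: the n = 8 code of Figure 13.9, frozen set {U1, U2, U3, U5}.
    u = [0, 0, 0, 1, 0, 1, 1, 0]              # data 1, 1, 1, 0 in U4, U6, U7, U8
    y = polar_encode(u)
    y[1] = y[4] = None                        # erase two of the eight outputs
    print(sc_decode(y, frozen={0, 1, 2, 4}))  # -> [0, 0, 0, 1, 0, 1, 1, 0]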
With encoding and decoding running times as efficient as O(n log n), polar codes can be
applied in practice. In fact, polar codes have been incorporated into the latest 5G
wireless standard.

Bibliography

[Arikan] Arikan, Erdal. "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels." IEEE Transactions on Information Theory 55.7 (2009): 3051-3073.
