Scaling Deep Learning-based Decoding of Polar Codes via Partitioning


Sebastian Cammerer∗ , Tobias Gruber∗ , Jakob Hoydis† , and Stephan ten Brink∗
∗ Institute of Telecommunications, Pfaffenwaldring 47, University of Stuttgart, 70659 Stuttgart, Germany
{cammerer,gruber,tenbrink}@inue.uni-stuttgart.de
† Nokia Bell Labs, Route de Villejust, 91620 Nozay, France

[email protected]

Abstract—The training complexity of deep learning-based channel decoders scales exponentially with the codebook size and therefore with the number of information bits. Thus, neural network decoding (NND) is currently only feasible for very short block lengths. In this work, we show that the conventional iterative decoding algorithm for polar codes can be enhanced when sub-blocks of the decoder are replaced by neural network (NN) based components. Thus, we partition the encoding graph into smaller sub-blocks and train them individually, closely approaching maximum a posteriori (MAP) performance per sub-block. These blocks are then connected via the remaining conventional belief propagation decoding stage(s). The resulting decoding algorithm is non-iterative and inherently enables a high level of parallelization, while showing a competitive bit error rate (BER) performance. We examine the degradation through partitioning and compare the resulting decoder to state-of-the-art polar decoders such as successive cancellation list and belief propagation decoding.

I. INTRODUCTION

Non-iterative and consequently low-latency decoding, together with close to maximum a posteriori (MAP) decoding performance, are two advantages of deep learning-based channel decoding. However, this concept is mainly restricted by its limited scalability in terms of the supported block lengths, known as the curse of dimensionality [1]. For k information bits, the neural network (NN) needs to distinguish between 2^k different codewords, which results in an exponential training complexity in case the full codebook needs to be learned. In this work we focus on NN decoding of polar codes and show that the scalability can be significantly improved towards practical lengths when the NN only replaces sub-components of the current decoding algorithm.

List decoding [2] with manageable list sizes and excellent bit error rate (BER) performance for short block lengths makes polar codes [3] a potential candidate for future communication standards such as the upcoming 5G standard or internet of things (IoT) applications. As polar codes are currently proposed for the 5G control channel [4], decoding algorithms for very short block lengths are of practical importance [5]. Besides that, the rate can be adjusted in a completely flexible way with single-bit granularity. However, the price to pay is an inherently serial decoding algorithm which is in general hard to accelerate, e.g., through parallel processing [6]. This leads to high decoding latency when compared to state-of-the-art low density parity check (LDPC) codes/decoders [7]. Thus, there is a demand for alternative decoding strategies. Besides algorithmic optimizations [7] of the existing algorithms, modifying the code structure [8] can be considered to overcome this issue. However, once standardized, the encoder cannot be changed. In this work, we propose an alternative approach by applying machine learning techniques to find an alternative decoding algorithm instead of changing the code structure. Once trained, the final decoding algorithm (i.e., the weights of a deep neural network) itself is static and can be efficiently implemented and parallelized on a graphical processing unit (GPU), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC).

A first investigation of the topic of learning to decode was already done in [1]. The authors showed that the main difficulty lies within the curse of dimensionality, meaning that for k information bits 2^k classes exist, leading to exponential complexity during the training phase. In other applications, such as computer vision, the number of possible output classes is typically limited, e.g., to the number of different objects. In contrast to many other machine learning fields, an unlimited amount of labeled training data is available, since the encoding function and the channel model are well known. Additionally, a clear benchmark with existing decoders is possible. Although very powerful machine learning libraries such as Theano [9] and TensorFlow [10] are available nowadays and the computation power has increased by orders of magnitude, the exponential complexity still hinders straightforward learning of practical code lengths, as shown in [11]. It was observed in [11] that there is a certain generalization of NN decoding, meaning that the NN can infer from certain codewords to others it has never seen before. This is essential for learning longer codes and gives hope that neural network decoding (NND) can be scaled to longer codes. However, to the best of our knowledge, the naive approach of learning to decode only works for rather small block lengths.

The authors in [12] proposed the idea of using machine learning techniques to train the weights of a belief propagation factor graph in order to improve its decoding performance for high density parity check (HDPC) codes. As the Tanner graph is already given initially and only its weights are refined, their approach scales very well for larger block lengths and does not suffer from the curse of dimensionality. However, in this case, the use of machine learning refines an existing solution. The decoding algorithm itself is not learned, since the iterative nature of the BP algorithm is kept.

In our work, we tackle the problem of completely replacing the polar code decoder by a machine learning approach. As it turns out, only small codeword lengths can be trained efficiently, and thus we divide the polar encoding graph into several sub-graphs (cf. [13]). We learn sub-block-wise decoding and couple the components by conventional belief propagation (BP) stages. This scales the results from [11] towards practical block lengths.
II. POLAR CODES

An encoder for polar codes maps the k information bits onto the k most reliable bit positions of the vector u of length N, denoted as information set A, while the remaining N − k positions are treated as frozen positions. These frozen positions are denoted as Ā and must be known at the decoder side. Now the input block u is encoded according to x = u · G_N, where G_N = F^⊗n is the generator matrix and F^⊗n denotes the nth Kronecker power of the kernel F = [1 0; 1 1]. The resulting encoding circuit is depicted in Fig. 1 for N = 8, which also defines the decoding graph. This factor graph consists of n + 1 = log2(N) + 1 stages, each consisting of N nodes.

Fig. 1: Polar encoding circuit for N = 8; blue boxes indicate the independent partitions of the code for M = 2 and green boxes for M = 4.

The BER performance of a polar code highly depends on the type of decoder used and has been one of the most exciting and active areas of research related to polar coding. There are two main algorithmic avenues to tackle the polar decoding problem:
1) successive cancellation-based decoding, following a serial "channel layer unwrapping" decoding strategy [3],
2) belief propagation-based decoding based on Gallager's BP iterative algorithm [14].
Throughout this work, we stick with the BP decoder as its structure is a better match to neural networks and enables parallel processing. For details about successive cancellation (SC) decoding and its list extension, called successive cancellation list (SCL) decoding, we refer the interested reader to [2] and [3]. The BP decoder describes an iterative message passing algorithm with soft values, i.e., log-likelihood ratio (LLR) values, over the encoding graph. For the sake of simplicity, we assume binary phase shift keying (BPSK) modulation and an additive white Gaussian noise (AWGN) channel. However, other channels can be implemented straightforwardly. For a received value y, it holds that

    LLR(y) = ln( P(x = 0|y) / P(x = 1|y) ) = 2y/σ²

where σ² is the noise variance. There are two types of LLR messages: the right-to-left messages (L-messages) and the left-to-right messages (R-messages). One BP iteration consists of two update propagations:
1) Left-to-right propagation: The R-messages are updated starting from the leftmost stage (i.e., the stage of a priori information) until reaching the rightmost stage.
2) Right-to-left propagation: The L-messages are updated starting from the rightmost stage (i.e., the stage of channel information) until reaching the leftmost stage.
The output from two nodes becomes the input to a specific neighboring processing element (PE) (for more details we refer to [15]). One PE updates the L- and R-messages as follows [15]:

    L_out,1 = f(L_in,1, L_in,2 + R_in,2)
    R_out,1 = f(R_in,1, L_in,2 + R_in,2)
    L_out,2 = f(R_in,1, L_in,1) + L_in,2
    R_out,2 = f(R_in,1, L_in,1) + R_in,2          (1)

where

    f(a, b) = ln( (1 + e^(a+b)) / (e^a + e^b) ).

For initialization, all messages are set to zero, except for the first and last stage, where

    L_i,n+1 = L_i,ch,  and
    R_i,1 = Lmax  if i ∈ Ā,  0 otherwise,          (2)

with Lmax denoting the clipping value of the decoder (in theory: Lmax → ∞), as all values within the simulation are clipped to be within (−Lmax, Lmax). This prevents numerical instabilities.

A. Partitionable Codes

As opposed to other random-like channel codes with close-to-capacity performance, polar codes exhibit a very regular (algebraic) structure. It is instructive to realize that the encoding graph, as visualized in Fig. 1 for N = 8, can be partitioned into independent sub-graphs [13], [16], i.e., there is no interconnection in the first log2(N_p) stages, where N_p denotes the number of bits per sub-block. We define a partitionable code in the sense that each sub-block can be decoded independently (i.e., no interconnections within the same stage exist, leading to a tree-like factor graph). This algorithm can be adopted for all partitionable codes and is not necessarily limited to polar codes. Each sub-graph (in the following called sub-block) is now coupled with the other sub-blocks only via the remaining polar stages, as depicted in Fig. 1. In order to simplify polar decoding, several sub-blocks B_i can now be decoded on a per-sub-block basis [13]. The set of frozen bit positions Ā needs to be split into sub-sets Ā_i corresponding to the sub-blocks B_i (with information vector u_i) and, thus, each sub-block might show a different code rate.

In a more abstract view, we follow the spirit of [17]. Their simplified successive cancellation (SSC) algorithm partitions the decoding tree into single-parity checks (SPC) and repetition codes (RC). This turns out to be a very efficient way of improving the overall throughput, as the SPC and RC sub-decoders can be efficiently implemented. Our approach replaces these partitions by NNs.
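To make the above concrete, the following minimal NumPy sketch (our own illustration, not the authors' implementation) shows the polar transform x = u · G_N, the channel LLR of a BPSK symbol over AWGN, and one processing element update according to (1); all helper names are ours.

    import numpy as np

    def polar_transform(u):
        """Polar encoding x = u * G_N with G_N = F^(kron n), F = [1 0; 1 1], arithmetic mod 2."""
        x = np.array(u, dtype=int)
        N = len(x)
        step = 1
        while step < N:
            for i in range(0, N, 2 * step):
                # butterfly stage: upper half is XORed with the lower half
                x[i:i + step] = (x[i:i + step] + x[i + step:i + 2 * step]) % 2
            step *= 2
        return x

    def channel_llr(y, sigma2):
        """LLR(y) = ln P(x=0|y)/P(x=1|y) = 2y/sigma^2 for BPSK (0 -> +1, 1 -> -1) over AWGN."""
        return 2.0 * y / sigma2

    def f_boxplus(a, b):
        """f(a, b) = ln((1 + e^(a+b)) / (e^a + e^b)), evaluated in a numerically stable way."""
        return np.logaddexp(0.0, a + b) - np.logaddexp(a, b)

    def pe_update(L_in1, L_in2, R_in1, R_in2):
        """One BP processing element update, cf. (1)."""
        L_out1 = f_boxplus(L_in1, L_in2 + R_in2)
        R_out1 = f_boxplus(R_in1, L_in2 + R_in2)
        L_out2 = f_boxplus(R_in1, L_in1) + L_in2
        R_out2 = f_boxplus(R_in1, L_in1) + R_in2
        return L_out1, L_out2, R_out1, R_out2

    # example for N = 8 as in Fig. 1:
    # polar_transform([0, 0, 0, 0, 0, 0, 0, 1]) -> array([1, 1, 1, 1, 1, 1, 1, 1])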
B. Deep-Learning for Channel Coding

For the fundamentals of deep learning, we refer the reader to [18]. However, for the sake of terminology, we provide a brief overview of the topic of machine learning. For a more detailed explanation of how to train the sub-block NND we refer to [11]. Feedforward NNs consist of many connected neurons which are arranged in L layers without feedback connections. The output y of each neuron depends on its weighted inputs θ_i x_i and its activation function g, given by

    y = g( Σ_i θ_i x_i + θ_0 ).          (3)

The whole network composes many different functions f^(l), one per layer l, and describes an input-output mapping

    w = f(v; Θ) = f^(L−1)( f^(L−2)( ... f^(0)(v) ) )          (4)

where v, w and Θ denote the input vector, output vector and the weights of the NN, respectively. It was shown in [19] that such a multi-layer NN with L = 2 and nonlinear activation functions can theoretically approximate any continuous function on a bounded region arbitrarily closely—if the number of neurons is large enough. A training set of known input-output mappings is required in order to find the weights Θ of the NN with gradient descent optimization methods and the backpropagation algorithm [20]. After training, the NN is able to find the right output even for unknown inputs, which is called generalization.

As described in [11], we use a feed-forward deep NN that can learn to map a noisy version of the codeword to its corresponding information bits. Each hidden layer employs rectified linear unit (ReLU) activation functions and the final stage is realized with a sigmoid activation function [18] in order to obtain output values between zero and one, giving the probability that the output bit is "1". In order to keep the training set small, we extend the decoder NN with additional layers which model an abstract channel [11], i.e., a training set containing every codeword is sufficient to train with as many training samples as desired.

It was shown in [11] that it is possible to decode polar codes with MAP performance for small block lengths. The BER performance gap between an NND and MAP decoding is shown in Fig. 2. It illustrates that learning to decode is limited through exponential complexity as the number of information bits in the codewords increases.

Fig. 2: Scalability shown by the gap to MAP performance in dB at BER = 0.01 for 32/64 bit-length polar codes with different code rates for fixed learning epochs.

III. PARTITIONED POLAR NEURAL NETWORK DECODING

Instead of using sequential SCL sub-block decoders as in [13], we replace them by NN decoders. Every sub-block NN_i covers k_i information bits. As the number of possible encoder output states (i.e., different codewords) is 2^k_i, efficient training is only possible for small sub-blocks containing a small number of information bits k_i [11]. The advantage of this concept is that each sub-block NN decoder can be trained independently. However, each NN has its own corresponding frozen bit positions A_i and a specific block length N_i. As we deal with man-made signals, the training and validation dataset can be created from random input data, a conventional polar encoder and a channel. Thus, an infinite amount of labeled training data is available.

Now, the NN decoder can be efficiently trained offline to MAP performance [11], since the effective block size per sub-block N_i is reduced by the number of sub-blocks M. Each NN decoder outputs the decoded codeword (and/or the extracted information bits) either as soft values, i.e., probabilities, or after a hard decision, which might require re-encoding. These bits are then treated as known values.

Our proposed decoder consists of two stages (see Fig. 1 and Fig. 3):
1) M deep learning blocks, trained to decode the corresponding sub-codeword with length N_i and frozen bit position vector A_i,
2) a conventional BP part, where the already decoded sub-blocks are propagated via the coupling stages of the remaining polar encoding graph.
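The training data generation described above (random information bits, a conventional polar encoder and a channel) and the feedforward structure of (3)-(4) can be sketched as follows. This is our own illustration under assumed parameters, not the authors' pipeline; polar_transform refers to the encoding snippet given earlier, and the Eb/N0-to-noise-variance mapping is an assumption.

    import numpy as np

    rng = np.random.default_rng(0)

    def generate_batch(batch_size, N_i, info_set, ebno_db, code_rate):
        """Labeled training data for one sub-block NND: random info bits -> polar encode -> BPSK/AWGN -> LLR."""
        sigma2 = 1.0 / (2.0 * code_rate * 10.0 ** (ebno_db / 10.0))    # assumed Eb/N0 -> noise variance
        u = np.zeros((batch_size, N_i), dtype=int)                      # frozen positions remain 0
        labels = rng.integers(0, 2, size=(batch_size, len(info_set)))   # information bits = training labels
        u[:, info_set] = labels
        x = np.array([polar_transform(row) for row in u])               # sub-block encoding (see earlier sketch)
        y = (1.0 - 2.0 * x) + rng.normal(0.0, np.sqrt(sigma2), size=x.shape)  # BPSK over AWGN
        return 2.0 * y / sigma2, labels                                 # channel LLRs and labels

    def nnd_forward(llr, weights, biases):
        """Feedforward NND as in (3)-(4): ReLU hidden layers, sigmoid output giving P(bit = 1)."""
        h = llr
        for W, b in zip(weights[:-1], biases[:-1]):
            h = np.maximum(h @ W + b, 0.0)                              # hidden layer with ReLU activation
        return 1.0 / (1.0 + np.exp(-(h @ weights[-1] + biases[-1])))    # sigmoid output layer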
Fig. 3: Partitioned neural network polar decoding structure.

Fig. 4: Pipelined implementation of the proposed PNN decoder.

After initializing the rightmost stage with the received channel LLR values, messages are propagated from stage to stage according to the BP update rules in (1) until the first NN decoder stage n_NN is reached (see Fig. 1). Then, the first NN decoder estimates the received sub-block NN_1. After having decoded the first sub-block, the results are propagated via the conventional BP decoding algorithm through the remaining coupling stages (see Fig. 1 and 3). Thus, this algorithm is block-wise sequential, where the M sub-blocks NN_i are sequentially decoded from top to bottom. The detailed decoding process works as follows:
1) Initialize stage n + 1 with the received channel LLR values.
2) Update stages n_NN to n + 1 according to the BP algorithm rules (propagate LLRs).
3) Decode the next sub-block (top to bottom).
4) Re-encode the results and treat them as perfectly known, i.e., as frozen bits.
5) If not all sub-blocks are decoded, go to step 2.
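The schedule above can be written down compactly. The sketch below is only meant to expose the control flow; bp_propagate, nn_decode and reencode stand in for the BP sweep of (1) over the coupling stages, the trained sub-block NND and the sub-block re-encoder, and all names as well as the clipping value are our own placeholders.

    import numpy as np

    def pnn_one_shot_decode(channel_llr, partitions, bp_propagate, nn_decode, reencode, L_max=20.0):
        """One-shot partitioned decoding sketch: each sub-block NND is used exactly once, top to bottom.

        channel_llr  : LLRs of the rightmost stage (step 1)
        partitions   : list of index arrays, one per sub-block, ordered top to bottom
        bp_propagate : callable(channel_llr, left_llr) -> LLRs at stage n_NN        (step 2, rules (1))
        nn_decode    : callable(sub_block_llr) -> hard bit decisions of that sub-block (step 3)
        reencode     : callable(hard_bits) -> re-encoded sub-block codeword            (step 4)
        """
        N = len(channel_llr)
        left_llr = np.zeros(N)                  # a-priori LLRs at the partitioned stage
        decisions = []
        for idx in partitions:                  # step 5: repeat until all sub-blocks are decoded
            stage_llr = bp_propagate(channel_llr, left_llr)       # step 2
            u_hat = nn_decode(stage_llr[idx])                     # step 3
            x_hat = reencode(u_hat)
            # step 4: treat the re-encoded bits as perfectly known, i.e., like frozen bits
            left_llr[idx] = L_max * (1.0 - 2.0 * x_hat)
            decisions.append(u_hat)
        return decisions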
The interface between the NND and the BP stages depends on the trained input format of the NN. Typically, the NN prefers normalized input values between 0 and 1. Fortunately, it was observed in [11] that the NN can handle both input formats and effective training is possible.

In summary, the system itself can be modeled as one large NN as well. Each BP update iteration defines additional layers, which are deterministic and thus do not affect the training complexity, similar to regularization layers [18]. This finally leads to a pipelined structure as depicted in Fig. 4. As each NN is only passed once, and to emphasize the difference compared to iterative decoding, we term this kind of decoding one-shot decoding.

A. Further Optimizations

As can be seen in Fig. 2, the limiting parameter is the number of information bits per sub-block k_i, since it defines the number of possible estimates of the NND. One further improvement in terms of sub-block size can be made by merging multiple equally-sized sub-blocks such that Σ_i k_i < k_max, leading to an unequal sub-block size. This helps to obtain as few sub-blocks as possible (see the sketch at the end of this subsection). The results for M = 8 partitions are shown in Fig. 5.

Additionally, fine-tuning learning can be applied to the overall network in order to adjust the independently trained components to the overall system. This means that the whole decoding setup is used to re-train the system such that the conventional stages are taken into consideration for decoding. This prevents performance degradation due to potentially non-Gaussian NN input distributions, since a Gaussian input distribution was assumed during training. Such an effect can be observed whenever clipping of the LLRs is involved. However, the basic structure is already fixed and thus only a small amount of the 2^k possible codewords is sufficient for good training results. The required training set is created with the free-running decoder.

The coupling could also be done by an SC stage (as originally proposed in [13]) without additional iterations. However, the BP structure suits the NN structure better, as both algorithms can be efficiently described by a graph and their corresponding edge weights. Thus, the BP algorithm is preferred.

For cyclic redundancy check (CRC) aided decoding (a CRC check over the whole codeword), the CRC check can be split into smaller parts as in [13], where each CRC only protects one sub-block, i.e., the CRC can be considered by the NN decoders and thus straightforwardly learned. However, this requires at least some larger NN sub-blocks, otherwise the rate loss due to the CRC checks becomes prohibitive; it is thus not considered at the moment.
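A simple way to realize the merging mentioned at the beginning of this subsection is a greedy pass over the equally sized partitions that fuses neighbours as long as the accumulated number of information bits stays at most k_max. This is our own illustration of the idea, not the procedure used to generate Table I, and the example values are arbitrary.

    def merge_partitions(k_per_block, k_max):
        """Greedily merge adjacent sub-blocks while the merged block keeps at most k_max information bits."""
        merged = []                 # each entry: list of original sub-block indices forming one new partition
        current, k_sum = [], 0
        for i, k_i in enumerate(k_per_block):
            if current and k_sum + k_i > k_max:
                merged.append(current)
                current, k_sum = [], 0
            current.append(i)
            k_sum += k_i
        if current:
            merged.append(current)
        return merged

    # arbitrary example: merge_partitions([1, 3, 2, 5, 4, 6], k_max=8) -> [[0, 1, 2], [3], [4], [5]]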
IV. COMPARISON WITH SCL/BP

In general, a fair comparison with existing solutions is hard, as many possible optimizations need to be considered. We see this idea as an alternative approach, for instance in cases where low latency is required. The BER results for N = 128 and different decoding algorithms are shown in Fig. 5. As shown in Tab. I, the size of the partitions is chosen such that each partition does not contain more than k_max = 12 information bits, which facilitates learning the sub-blocks. If a sub-block does not contain any frozen bits, a simple hard-decision decoder is used. Basically, the concept of partitioning polar codes helps to scale NND to longer codes.

TABLE I: Number of information bits k_i for each sub-block
sub-block i           1   2   3   4   5   6   7   8
sub-block size N_i   32  16  16  16  16   8   8  16
information bits k_i  1   3  11   5  13   7   8  16

Fig. 5: BER performance of the proposed partitioned NN decoder for N = 128 and M = 8 partitions with variable size in comparison to state-of-the-art SC, BP and SCL (L = 32) decoding.

We denote a partitioned neural network (PNN) with M partitions as PNN_M. Although each NN in the PNN can be learned to approximately MAP performance, there is a loss λ between PNN decoding and conventional SCL decoding with list size L. We limit ourselves in this work to L = 32. The loss λ can be explained by two main reasons:
1) Loss through partitioning λ_part, as the concept only applies sub-block MAP decoding.
2) Loss due to sub-optimality of the NND λ_NN, caused by insufficient training or non-Gaussian input distributions.¹

¹ We did not focus on how to find the best NN structure for each NN because we want to introduce the concept of partitioned polar codes in order to scale NND. We expect that for sufficient training and hyperparameter tuning [21] this loss vanishes.

For further analysis of this problem, we replace the NNDs in each partition by SCL decoders with list size L = 32 as in [13] and obtain a partitioned successive cancellation list (PSCL) decoder. As before, PSCL_M denotes a PSCL decoder with M partitions. This enables the investigation of larger partition sizes and thus of the effect of partitioning, which is currently not possible with NNs due to the limited training length. Since SCL decoding approximates MAP performance for properly chosen parameters [2], the PSCL gives a lower bound on the expected BER according to the actual number of partitions. This enables us to observe λ_part and λ_NN separately.

In order to quantify the loss, we introduce a similar concept as in [11]: the normalized error (NE) for a decoding concept, e.g., PNN or PSCL, which is given by

    NE = (1/D) Σ_{t=1}^{D} BER(ρ_t) / BER_SCL(ρ_t)          (5)

where ρ_t is a certain signal-to-noise ratio (SNR) value, BER(ρ_t) denotes the decoder's BER for this specific SNR and BER_SCL(ρ_t) is the corresponding BER of the SCL decoder, respectively. Thus, NE compares the decoding performance over a range of D different SNR values ρ_1, . . . , ρ_D.

Fig. 6: Effect of the partitioning on the BER performance.

Fig. 7: Normalized error NE_PSCL due to λ_part (blue) and NE_PNN due to λ_NN (green) for fixed size of partitions N_i = 16.

It can be observed in Fig. 6 that NE_PSCL increases with the number of partitions, i.e., decreases with the partition sizes. Fig. 7 relates the effects of λ_part and λ_NN. It can be observed that the main error originates from partitioning and only a small part from suboptimal NNs. The number of sub-blocks with larger k_i increases with larger code length, and at the same time it is more difficult to achieve MAP performance for these blocks [11]. Therefore, the loss λ_NN becomes more important for longer codes. The training complexity limits the feasible sub-block size of PNN decoding and thus long codes require a lot of partitions. The larger the number of partitions, the larger λ_part. However, the fast progress in the machine learning domain might enable larger sub-blocks, which should improve λ_part and therefore the overall performance of the PNN concept.
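In code, the normalized error of (5) is a simple average; the BER arrays below are placeholders for simulated curves of the decoder under test and of the SCL reference, evaluated at the same D SNR points.

    import numpy as np

    def normalized_error(ber_decoder, ber_scl):
        """NE = (1/D) * sum_t BER(rho_t) / BER_SCL(rho_t), cf. (5)."""
        ber_decoder = np.asarray(ber_decoder, dtype=float)
        ber_scl = np.asarray(ber_scl, dtype=float)
        return float(np.mean(ber_decoder / ber_scl))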
Fig. 8: Comparison of required synchronization steps for different polar decoding approaches.

In order to approximate the decoding latency in Fig. 8, we count the number of synchronization steps, as operations can be done in parallel but need to be synchronized after each synchronization step. The latency of the SCL algorithm can be described by O(N log N) because it estimates each bit sequentially. Thus, the number of required synchronization steps is

    S_SCL = N log N.

The BP algorithm scales much better with O(log N) because synchronization is only required after each BP stage, but it depends on the number of iterations I, namely,

    S_BP = 2 I log N.

The PNN decoder requires fewer BP updates, and the NNDs themselves only synchronize after each layer, so that

    S_PNN = (N/N_P) · N_H + 2 · (N/N_P) · log(N/N_P)

where N_P and N_H respectively denote the size of each partition and the number of hidden layers in the NN. Fig. 8 is given for N_P = 16 and N_H = 3. To sum up, our decoding approach enables a latency reduction compared to BP decoding, while being competitive with the SC and BP BER performance.
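The three expressions can be evaluated directly. The sketch below uses the stated N_P = 16 and N_H = 3 and takes log as log2; the number of BP iterations I is a placeholder of ours, since the text does not fix it for Fig. 8.

    import numpy as np

    def sync_steps(N, N_P=16, N_H=3, bp_iters=10):
        """Synchronization-step counts for SCL, BP and PNN decoding (bp_iters is an assumed value)."""
        n = np.log2(N)
        s_scl = N * n                              # S_SCL = N log N
        s_bp = 2 * bp_iters * n                    # S_BP = 2 I log N
        m = N / N_P                                # number of partitions
        s_pnn = m * N_H + 2 * m * np.log2(m)       # S_PNN = (N/N_P) N_H + 2 (N/N_P) log(N/N_P)
        return s_scl, s_bp, s_pnn

    # e.g. sync_steps(512) -> (4608.0, 180.0, 416.0)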
V. CONCLUSION

In this work, we have shown that one way to reach scalability of deep learning-based channel decoding is to replace sub-components of an existing decoder by NN-based components. This enables scalability in terms of block length and number of information bits towards practical lengths. Meanwhile, the length is still limited to short codes, as the degradation through partitioning limits the overall performance of this concept. The BER performance of our decoder turns out to be similar to the SC and BP performance. However, the latency is reduced considerably when the inherent parallel structure of this algorithm is exploited, since one-shot decoding (i.e., non-iterative decoding) becomes possible. We have shown that the performance degradation is mainly a result of the small partitions, as the sub-block size is currently strictly limited by the training. Nonetheless, the proposed setup would scale very well for larger sub-blocks; therefore, future work needs to be done on potential improvements of the NN structure such that larger components become available.

REFERENCES

[1] X.-A. Wang and S. B. Wicker, "An artificial neural net Viterbi decoder," IEEE Trans. Commun., vol. 44, no. 2, pp. 165–171, Feb. 1996.
[2] I. Tal and A. Vardy, "List decoding of polar codes," IEEE Trans. Inform. Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
[3] E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inform. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[4] "RAN1 meeting #87," http://www.3gpp.org/DynaReport/TDocExMtg--R1-87--31665.htm, accessed: 2017-01-09.
[5] G. Liva, L. Gaudio, T. Ninacs, and T. Jerkovits, "Code design for short blocks: A survey," CoRR, vol. abs/1610.00873, 2016. [Online]. Available: http://arxiv.org/abs/1610.00873
[6] S. Cammerer, B. Leible, M. Stahl, J. Hoydis, and S. ten Brink, "Combining belief propagation and successive cancellation list decoding of polar codes on a GPU platform," in accepted for 42nd Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2017.
[7] P. Giard, G. Sarkis, A. Balatsoukas-Stimming, Y. Fan, C.-Y. Tsui, A. Burg, C. Thibeault, and W. J. Gross, "Hardware decoders for polar codes: An overview," in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), May 2016, pp. 149–152.
[8] J. Guo, M. Qin, A. Guillén i Fàbregas, and P. H. Siegel, "Enhanced belief propagation decoding of polar codes through concatenation," in 2014 IEEE Internat. Symp. Inf. Theory, June 2014, pp. 2987–2991.
[9] R. Al-Rfou, G. Alain, A. Almahairi et al., "Theano: A Python framework for fast computation of mathematical expressions," CoRR, vol. abs/1605.02688, 2016. [Online]. Available: http://arxiv.org/abs/1605.02688
[10] M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," CoRR, vol. abs/1603.04467, 2016. [Online]. Available: http://arxiv.org/abs/1603.04467
[11] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, "On deep learning-based channel decoding," in Proc. of CISS, March 2017.
[12] E. Nachmani, Y. Be'ery, and D. Burshtein, "Learning to decode linear codes using deep learning," in Proc. of the Allerton Conf. on Commun., Control, and Computing, 2016.
[13] S. A. Hashemi, A. Balatsoukas-Stimming, P. Giard, C. Thibeault, and W. J. Gross, "Partitioned successive-cancellation list decoding of polar codes," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, pp. 957–960.
[14] R. Gallager, "Low-density parity-check codes," IEEE Trans. Inform. Theory, vol. 8, no. 1, pp. 21–28, January 1962.
[15] J. Xu, T. Che, and G. Choi, "XJ-BP: Express journey belief propagation decoding for polar codes," in IEEE Global Communications Conference (GLOBECOM), Dec 2014, pp. 1–6.
[16] B. Li, H. Shen, and D. Tse, "Parallel decoders of polar codes," CoRR, vol. abs/1309.1026, 2013. [Online]. Available: http://arxiv.org/abs/1309.1026
[17] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, "Fast list decoders for polar codes," IEEE Journal on Selected Areas in Communications, vol. 34, no. 2, pp. 318–328, Feb 2016.
[18] I. Goodfellow, Y. Bengio, and A. Courville, "Deep Learning," 2016, book in preparation for MIT Press. [Online]. Available: http://www.deeplearningbook.org
[19] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[20] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1," Cambridge, MA, USA: MIT Press, 1986, ch. Learning Internal Representations by Error Propagation, pp. 318–362.
[21] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, "Algorithms for hyper-parameter optimization," in Advances in Neural Information Processing Systems 24. Curran Associates, Inc., 2011, pp. 2546–2554.
