
A Simple Circular-Shift Network

There is an increasing need for configurable quasicyclic low-density parity-check (QC-LDPC) decoders that can support a family of structurally compatible codes instead of a single code. The key component in a configurable QC-LDPC decoder is a programmable circular-shift network that supports cyclic shifts of any size up to a predefined maximum submatrix size. This paper presents a QC-LDPC shift network (QSN), which has two key advantages over state-of-the-art solutions in recent literature.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 57, NO. 10, OCTOBER 2010

QSN—A Simple Circular-Shift Network for Reconfigurable Quasi-Cyclic LDPC Decoders
Xiaoheng Chen, Shu Lin, Life Fellow, IEEE, and Venkatesh Akella

Abstract—There is an increasing need for configurable quasi-cyclic low-density parity-check (QC-LDPC) decoders that can support a family of structurally compatible codes instead of a single code. The key component in a configurable QC-LDPC decoder is a programmable circular-shift network that supports cyclic shifts of any size up to a predefined maximum submatrix size. This paper presents a QC-LDPC shift network (QSN), which has two key advantages over state-of-the-art solutions in recent literature. First, the QSN reduces the number of stages in the critical path, which improves the clock frequency and makes it scalable, particularly in a field-programmable gate array (FPGA)-based implementation where the interconnect delay is dominant. Second, the QSN's control logic is simple to generate and occupies a significantly smaller area. The QSNs for a variety of codes suitable for emerging applications are implemented, targeting both a 180-nm Taiwan Semiconductor Manufacturing Company Ltd. complementary metal–oxide–semiconductor library and a Xilinx Virtex 4 FPGA. The proposed implementation is shown to be 2.1 times faster than the best known implementation in literature and requires almost eight times less control area. Furthermore, this paper presents analytical models of the critical-path and data-path complexity for arbitrary-sized submatrices and proves that the QSN indeed generates all the output combinations required for implementing reconfigurable QC-LDPC decoders.

Index Terms—Benes network, error correction codes, quasi-cyclic low-density parity-check (QC-LDPC) codes, very large scale integration, WiFi, WiMAX.

Manuscript received April 2, 2010; revised May 21, 2010; accepted July 23, 2010. Date of publication September 30, 2010; date of current version October 15, 2010. This work was supported by the National Science Foundation under Grant CCF-0727478, by the National Aeronautics and Space Administration under Grant NNX07AK50G and Grant NNX09AI21G, and by the gift grants from Intel and Northrop Grumman Space Technology. This paper was recommended by Associate Editor C.-Y. Lee.

The authors are with the Department of Electrical and Computer Engineering, University of California, Davis, CA 95616 USA (e-mail: xhchen@ucdavis.edu; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TCSII.2010.2067811

1549-7747/$26.00 © 2010 IEEE

I. INTRODUCTION

LOW-DENSITY parity-check (LDPC) codes, discovered by Gallager in 1962, were rediscovered and shown to approach the Shannon capacity in the late 1990s. The quasi-cyclic (QC) LDPC codes [1] have received significant attention because their structure eases hardware implementation and they have excellent error performance over noisy channels. The parity-check matrices of the QC-LDPC codes consist of circulant permutation matrices (CPMs) and/or zero matrices of the same size, which determine the interconnection of the check-node processing units (CNUs) and the variable-node processing units (VNUs). For the QC-LDPC codes, the interconnection network is highly structured and can be characterized by the submatrix size and the circular-shift value of each CPM. Thus, the parity-check matrices for the QC-LDPC codes can be partitioned blockwise or CPM-wise and implemented with partially parallel decoder architectures [2], [3], which achieve an efficient tradeoff between very-large-scale-integration complexity and decoding throughput. For a given code, the interconnect network between the CNUs and the VNUs is predetermined; therefore, it can be optimized for that code. However, emerging applications such as 802.11n, 802.16e, and DVB-S2 need decoders that work for a set of codes, not just a single code. A decoder that can implement a set of structurally compatible codes is called a reconfigurable (or sometimes just configurable or flexible) decoder. A reconfigurable QC-LDPC decoder requires a programmable shift network to accommodate different submatrix sizes, code rates, and block lengths.

The field-programmable gate array (FPGA)-based emulation of LDPC decoders is widely used to design and optimize LDPC codes. However, the design and the optimization of an efficient LDPC decoder on an FPGA are formidable and time-consuming tasks. A single reconfigurable decoder that can operate for a family of related codes [1] is highly desirable. Such a reconfigurable decoder once again requires a programmable shift network to implement different codes that are derived from a mother code through algebraic transformations, such as masking and row and column decomposition [1]. The dominance of the interconnect (or routing) delay makes the design of a fast and scalable circular-shift network even more challenging on an FPGA. An architecture that minimizes the interconnect complexity and, hence, the interconnect delay is more desirable.

In this paper, we describe a QC-LDPC shift network (QSN) that is simple and efficient for both FPGA and application-specific integrated circuit (ASIC) implementations and more scalable and flexible than networks proposed in recent literature [4]–[14]. Specifically, for a given maximum submatrix size $P_M \times P_M$, the QSN can perform any circular shift $c$ ($0 \le c < p$) on an array of $2 \le p \le P_M$ messages.

One question that immediately comes to mind is why not use a Benes network [15], which has been proven to be an optimal nonblocking permutation network. There are three reasons why a regular Benes network is not appropriate for a reconfigurable QC-LDPC decoder. First, we need only cyclic permutations for a reconfigurable QC-LDPC decoder, not all the permutations. Second, $P_M$ is not necessarily a power of 2. For example, in emerging applications of interest, $P_M$ could be 96; using the nearest power of 2, which is 128, is not efficient. Third, as noted above, the submatrix size $p$ is variable, i.e., $2 \le p \le P_M$. Consequently, we need to support only $\left[\sum_{p=2}^{P_M}(p-1)+1\right]$ output combinations instead of $P_M!$ combinations.

Over the past several years, researchers have proposed techniques to modify the Benes network to make it less general and,

hence, more efficient for use in a reconfigurable QC-LDPC decoder. For example, in [4] and [5], a standard Benes network is augmented with a lookup table to hold configurations of the subset of permutations required for a given reconfigurable decoder. In [6] and [7], a second Benes network is employed to compute the control signal on the fly efficiently, which reduces the control logic complexity compared with that of a generic Benes-network implementation. Barrel-shifting networks composed of modular cells and multisize circular-shifting networks that support predefined rather than arbitrary expansion factors for specific LDPC decoders, such as the IEEE 802.16e or 802.11n systems, have also been proposed. Although the shifter-based permutation structures are implemented with a small area, designing them for full flexibility increases the latency because a large number of stages is required for the worst case [8]–[10]. Liu et al. [14] designed a self-routing network for the IEEE 802.11n and 802.16e standards with a barrel shifter and two extra stages of multiplexors. The most recent approach in this category is by Oh and Parhi [11], [13], which we call the Oh–Parhi network (OPN) in the rest of this paper. The key insight in the OPN is as follows: given that $P_M$ is not always a power of two, instead of using only 2 × 2 switches, the OPN proposes the use of other types of primitive switches, such as 3 × 3 or 5 × 5, to reduce the area and control complexity. When $P_M$ is of the form $2^i 3^j 5^k$, the OPN results in an efficient implementation compared with a generic Benes network. For example, a 12 × 12 switch can be built with 36 2 × 2 switches instead of the 56 2 × 2 switches that a 16 × 16 network would need. However, the critical path in the OPN is still at least $2\lceil \log_2 P_M \rceil - 1$ stages [11]–[13].

In this paper, we propose a low-complexity shift network, which is called QSN, that can perform any circular shift $c$ ($0 \le c < p$) for a given maximum submatrix size $P_M \times P_M$, where $2 \le p \le P_M$, which is appropriate for realizing reconfigurable QC-LDPC decoders on ASICs and FPGAs.

The QSN has two main advantages compared with recently proposed configurable shift networks for reconfigurable QC-LDPC decoders, such as [13]. First, the QSN requires fewer logic stages, which reduces the length of the critical path; this in turn reduces the interconnect delay and improves the clock cycle time. This is particularly advantageous for an FPGA implementation of a reconfigurable decoder. Specifically, for a $P_M \times P_M$ shift network, the critical path in the QSN has $\lceil \log_2 P_M \rceil + 1$ multiplexors. In contrast, the Benes-topology-based networks such as the OPN require $2\lceil \log_2 P_M \rceil - 1$ multiplexor stages. Second, the control logic for the QSN is simple, and all the control signals can be generated in a single pass as opposed to the complex recursive algorithm used in [13]. As a result, the QSN is more scalable than the OPN, i.e., the QSN can be used to build networks when $P_M$ is larger than 96. This is important because we are interested not only in flexible decoders for 802.11n, 802.16e, DVB-S2, etc. but also in realizing universal decoders for the FPGA-based emulation of large and complex QC-LDPC codes.

This paper is organized as follows. Section II briefly reviews the QC-LDPC codes and decoders. This will illustrate the requirements for the design of the QSN. In Section III, we present the architecture and the implementation of the QSN and prove that it generates all the output combinations necessary for the proposed applications. In Section IV, we compare the results of an implementation of the QSN with that of the OPN on a 180-nm ASIC realization. We also present results for an FPGA realization of the QSN.

Fig. 1. (a) CPM measuring 7 × 7 and (b) top-level architecture of a reconfigurable QC-LDPC decoder. QSN (7,5,8) indicates that $P_M$ is 8, $c$ is 5, and $p$ is 7. Note that the last output is not used here; therefore, it is marked as a do-not-care (X).

II. BACKGROUND

A QC-LDPC code is given by the null space of a parity-check matrix $H$ over GF(2), which is a $\gamma \times \rho$ array (or block) of circulants or CPMs and/or zero matrices of the same size, e.g., $p \times p$, of the following form:

$$H = \begin{bmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,\rho-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,\rho-1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{\gamma-1,0} & A_{\gamma-1,1} & \cdots & A_{\gamma-1,\rho-1} \end{bmatrix}. \tag{1}$$

Then, $H$ is a $\gamma p \times \rho p$ matrix over GF(2). The QC-LDPC code given by the null space of the $H$ matrix has a length of $\rho p$ and a rate of at least $1 - (\gamma/\rho)$.

The QC structure is advantageous in terms of the encoder and decoder implementations. The encoding of a QC-LDPC code can be implemented with shift registers with complexity linearly proportional to the number of parity-check bits or to the code length [16]. The QC-LDPC codes can be decoded efficiently using a partially parallel architecture proposed in [4] and [5], which is illustrated in Fig. 1(b). The processing units $P_0, P_1, \ldots, P_8$ are connected to a set of memory banks that hold the intermediate messages. The QSN is responsible for shuffling (circularly shifting) the messages read from the memory into the correct order before they are processed by the processing units. The processing units process the messages serially in a reconfigurable decoder. A $P_M \times P_M$ QSN can shuffle up to $P_M$ messages in one clock cycle.

Fig. 1(a) shows a 7 × 7 CPM with the extrinsic messages $\{M_0, M_1, \ldots, M_6\}$ corresponding to the entries of 1 and stored in the memory in the column-major order. Fig. 1(b) shows a simplified diagram of a configurable decoder with $P_M = 8$. The QSN reorders the messages to the row-major order $\{M_5, M_6, M_0, \ldots, M_4\}$.

III. QSN—ARCHITECTURE AND IMPLEMENTATION

A $P_M \times P_M$ QSN is required for a reconfigurable LDPC decoder with a submatrix size of $2 \le p \le P_M$ and a CPM
offset of $0 \le c < p$ for the incoming messages of memory banks. For example, let $\{I[0], I[1], \ldots, I[P_M-1]\}$ and $\{O[0], O[1], \ldots, O[P_M-1]\}$ be the input and output messages of the QSN, respectively. When $p = P_M$, $O[i] = I[(i+c) \bmod p]$ for all $0 \le i \le P_M - 1$. When $p < P_M$, the input messages $\{I[p], I[p+1], \ldots, I[P_M-1]\}$ and the output messages $\{O[p], O[p+1], \ldots, O[P_M-1]\}$ are ineffective, and the related ports are not used, as shown in Fig. 1(b). The other ports maintain the circular-shift property, i.e., $O[i] = I[(i+c) \bmod p]$ for $0 \le i \le p - 1$. Overall, the number of output combinations for the QSN is $\sum_{p=2}^{P_M}(p-1)+1$, since each input size $p$ has $p - 1$ shifted combinations, and when $c = 0$, the combinations for all $p$ are the same.

In the succeeding subsections, we propose a low-complexity switch network design for the configurable LDPC decoders, which can be implemented with a small area. Also, an efficient algorithm and its hardware implementation to generate all the control signals of the proposed switch network are discussed.

A. Overall Architecture

The output of the QSN can be divided into two parts: 1) the left part, i.e., for $0 \le i < p - c$, $O[i] = I[i + c]$; 2) the right part, i.e., for $p - c \le i < p$, $O[i] = I[i - (p - c)]$. Based on this observation, the output of the QSN can be viewed as the combination of two shifted arrays of the inputs. Thus, the generation of the circular-shifted array has three steps, which are listed below:

Step 1) Left shift: Generate the left part of the final output messages by performing a left-shift operation on the array; let $L[i]$ be the left-shift output, then $L[i] \leftarrow I[i + c]$.
Step 2) Right shift: Generate the right part of the final output messages by performing a right-shift operation on the array; let $R[i]$ be the right-shift output, then $R[i] \leftarrow I[i - (p - c)]$.
Step 3) Merge: Extract the useful part from the left-shift output and the right-shift output. When $0 \le i < (p - c)$, $O[i] \leftarrow L[i]$; when $(p - c) \le i < p$, $O[i] \leftarrow R[i]$.

Steps 1 and 2 are independent and, thus, can be performed in parallel. Step 3 depends on the output of step 1 and step 2. The overall architecture is shown in Fig. 2.

Fig. 2. QSN architecture for configurable QC-LDPC-code decoders. Only $P_M - 2$ signals are generated for the left-shift network, since $O[P_M - 1]$ is always connected to $R[P_M - 1]$.

As shown in Fig. 2, the QSN has three components, i.e., the left-shift network, the right-shift network, and the merge network, which correspond to steps 1, 2, and 3 described above. Let $b$ denote the number of bits in the unsigned binary representation for $P_M$, i.e., $2^{b-1} < P_M \le 2^b$. There are three groups of signals for the left-shift network: 1) $I[0, 1, \ldots, P_M - 1]$ denotes the array of input messages; 2) $L[0, 1, \ldots, P_M - 2]$ denotes the array of output messages; and 3) $l$ represents the $b$-bit control signal for the network. There are $b$ stages of multiplexors in the left-shift network. $l[i]$ is the control signal for the $i$th ($0 \le i < b$) stage. When $l[i] = 0$, the messages are directly routed to the next stage without modification of their positions. When $l[i] = 1$, the messages are shifted to the left by an amount of $2^i$. Stage $i$ has $P_M - 2^i$ two-to-one multiplexors whose data width is the same as the message width. Therefore, the total number of multiplexors $N_{\text{left}}$ in the left-shift network is given by

$$N_{\text{left}} = \sum_{i=0}^{b-1}\left(P_M - 2^i\right) = bP_M - \sum_{i=0}^{b-1} 2^i = bP_M - 2^b + 1. \tag{2}$$

Similarly, the total number of multiplexors $N_{\text{right}}$ in the right-shift network is equal to $bP_M - 2^b + 1$.

The merge network chooses the proper output messages from $L[0, 1, \ldots, P_M - 1]$ and $R[0, 1, \ldots, P_M - 1]$, based on a $(P_M - 1)$-bit control signal $m$. $m[i]$ corresponds to the multiplexor whose inputs are $L[i]$ and $R[i]$. When $m[i] = 0$, $O[i] \leftarrow R[i]$; otherwise, $O[i] \leftarrow L[i]$. $R[P_M - 1]$ is routed directly to $O[P_M - 1]$. The total number of multiplexors in the merge network is denoted by $N_{\text{merge}}$ and is equal to $P_M - 1$. Therefore, the total number of 2:1 multiplexors (width equal to the input-message width) is given by

$$N_{\text{total}} = N_{\text{left}} + N_{\text{right}} + N_{\text{merge}} = 2\left(bP_M - 2^b + 1\right) + (P_M - 1) = (2b + 1)P_M - 2^{b+1} + 1. \tag{3}$$

Fig. 3 shows an example of an 11 × 11 QSN and its output when $c = 7$ and $p = 11$. The $p - c = 4$ effective outputs of the left-shift network $\{I_7, I_8, I_9, I_{10}\}$ and the $c = 7$ effective outputs of the right-shift network $\{I_0, I_1, I_2, I_3, I_4, I_5, I_6\}$ are reorganized at the merge network and generate the final output, i.e., $\{I_7, I_8, I_9, I_{10}, I_0, I_1, I_2, I_3, I_4, I_5, I_6\}$. The number of multiplexors used is 68, which agrees with the calculation shown in (3). Furthermore, the number of stages in our design is only five. In contrast, the OPN would need a seven-stage network with 72 multiplexors to implement the same functionality, as shown in Fig. 4.

B. Proof of Correctness

Next, we prove that the QSN implementation can actually generate all the required output combinations, given by $O[i] = I[(i + c) \bmod p]$ for all $0 \le i \le p - 1$. The output of the left-shift network is $L[i] = I[i + c]$ for $0 \le i < (p - c)$. The output of the right-shift network is $R[i] = I[i - (p - c)]$ for $(p - c) \le i < p$. Based on our merge-network control-signal generating function, the final output is $O[i] = I[i + c]$ for $0 \le i < (p - c)$ and $O[i] = I[i - (p - c)]$ for $(p - c) \le i < p$. Since $I[(i + c) \bmod p] = I[i + c]$ when $0 \le i < (p - c)$ and $I[(i + c) \bmod p] = I[i + c - p] = I[i - (p - c)]$ when $(p - c) \le i < p$, the final output equals $I[(i + c) \bmod p]$ over the whole range, which completes the proof.

TABLE I
ALGORITHM TO GENERATE CONTROL SIGNALS IN PSEUDO VERILOG

Fig. 5. Circuit for generating the control signals of the merge network for the
11 × 11 QSN.

C. Efficient Control-Signal Generation for the QSN


Fig. 3. Example of an 11 × 11 QSN. $c = 7$ and $p = 11$.

A total of $2b + P_M - 1$ control signals have to be generated for a $P_M \times P_M$ QSN: $b$ bits each for $l$ and $r$, and $P_M - 1$ bits for $m$. The control logic can be generated on the fly in a single pass using the algorithm described in Table I.

For example, in Fig. 3, the 11 × 11 QSN generates the output based on $c = 7$ and $p = 11$. The control signal $l$ for the left-shift network is the binary version of $c = 7$, i.e., $l[3:0] = 0111$. The control signal $r$ for the right-shift network is the binary version of $p - c = 4$, i.e., $r[3:0] = 0100$. The control bits for the merge network are $m[9:0] = 0000001111$.
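The body of Table I is not reproduced in this text, so the following Python sketch is only a reading of the worked example above, not the authors' pseudo-Verilog. It assumes that `l` is the binary value of `c`, that `r` is the binary value of `p - c`, and that `m[i]` is 1 exactly when `i < p - c`. The function name `qsn_control_signals` is hypothetical, and mapping `c = 0` to `r = 0` through `(p - c) % p` is an added assumption.

```python
def qsn_control_signals(P_M, p, c):
    """Return (l, r, m) as bit strings, most significant bit first,
    for a P_M x P_M QSN with submatrix size p and circular shift c."""
    assert 2 <= p <= P_M and 0 <= c < p
    b = (P_M - 1).bit_length()          # 2**(b-1) < P_M <= 2**b

    l = c                               # left-shift amount
    r = (p - c) % p                     # right-shift amount (c = 0 -> 0, assumed)
    # m[i] = 1 selects L[i], m[i] = 0 selects R[i]; only P_M - 1 bits
    # exist because O[P_M - 1] is wired directly to R[P_M - 1].
    m = [1 if i < p - c else 0 for i in range(P_M - 1)]

    l_bits = format(l, f"0{b}b")
    r_bits = format(r, f"0{b}b")
    m_bits = "".join(str(bit) for bit in reversed(m))   # m[P_M-2] ... m[0]
    return l_bits, r_bits, m_bits


if __name__ == "__main__":
    print(qsn_control_signals(11, 11, 7))
    # ('0111', '0100', '0000001111')
```

The three strings together hold 4 + 4 + 10 = 18 bits for the 11 × 11 case, which matches the count 2b + P_M − 1 with b = 4. Each bit depends only on c and p, with no recursion over the network, which is the single-pass property the text describes.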
Fig. 5 shows the control logic for the merge network in an 11 × 11 QSN. Generally, $b$ stages of logic gates are required. At the $i$th stage ($0 \le i < b$), $2^{i+1} - 1$ AND gates and $2^{i+1} - 1$ OR gates are needed. Thus, in total, $2^{b+1} - (b + 1)$ AND gates and $2^{b+1} - (b + 1)$ OR gates are needed for a $P_M \times P_M$ network when $P_M = 2^b$. Otherwise, $2^{b+1} - (b + 1) - 2^b + P_M = 2^b + P_M - (b + 1)$ AND gates and $2^{b+1} - (b + 1)$ OR gates are needed.

Fig. 4. OPN network measuring 12 × 12 to implement $p = 11$. X means do-not-care, adapted from [13].

IV. RESULTS AND DISCUSSION

We synthesized the proposed QSN implementation using the Synopsys design flow and mapped it to a Taiwan Semiconductor Manufacturing Company Ltd. (TSMC) 0.18-μm standard-cell complementary metal–oxide–semiconductor library. Table II compares the datapath and the controller area and clock frequency of the QSN with the OPN (which represents the state of the art in the published research literature) and a more conventional Benes-network-based implementation. Note that the control complexity, as quantified by the area required to implement the control logic, is almost a factor of 8 smaller than that required by the OPN (0.015 versus 0.114 mm²). Also, because of the reduced critical-path delay, the clock frequency of our implementation is 200 MHz, as compared with the 94 MHz reported for the OPN in [13].

TABLE II
HARDWARE COMPLEXITY COMPARISONS FOR CONFIGURABLE SWITCH NETWORKS (8-BIT WORD LENGTH)

As mentioned before, our goal is to use the QSN to build a single decoder to emulate a family of complex QC-LDPC codes. Therefore, we also mapped the design to an FPGA and compared our results with the approach described in [13], as shown in Table III. Based on our design scheme, we designed a reconfigurable decoder for the IEEE 802.11n and 802.16e standards ($P_M = 96$) and present the results in Table IV.

TABLE III
FPGA RESULTS ON VIRTEX 4 LX160-10 AFTER PLACE AND ROUTING (8-BIT WORD LENGTH). NETWORK ONLY, NO CONTROLLER

TABLE IV
FPGA RESULTS ON VIRTEX 4 LX160-10 FOR A CONFIGURABLE DECODER SUPPORTING THE IEEE 802.11n AND 802.16e STANDARDS

V. CONCLUSIONS

We have presented the architecture and the implementation of the QSN, a simple circular-shift network that can be used to implement reconfigurable QC-LDPC decoders efficiently. Unlike existing solutions to this problem in the research literature, the QSN is not derived from a Benes topology, which results in simpler control logic and fewer stages in the critical path. Consequently, the proposed network is suitable for both the ASIC and FPGA implementation of low-overhead and fast circular-shift networks for decoding QC-LDPC codes.

ACKNOWLEDGMENT

The authors would like to thank D. Truong for helping with the TSMC ASIC library and synthesis.

REFERENCES

[1] L. Lan, L. Zeng, Y. Tai, L. Chen, S. Lin, and K. Abdel-Ghaffar, “Construction of quasi-cyclic LDPC codes for AWGN and binary erasure channels: A finite field approach,” IEEE Trans. Inf. Theory, vol. 53, no. 7, pp. 2429–2458, Jul. 2007.
[2] Y. Chen and K. Parhi, “Overlapped message passing for quasi-cyclic low-density parity check codes,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 6, pp. 1106–1113, Jun. 2004.
[3] Y. Dai, Z. Yan, and N. Chen, “Optimal overlapped message passing decoding of quasi-cyclic LDPC codes,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 5, pp. 565–578, May 2008.
[4] G. Masera, F. Quaglio, and F. Vacca, “Implementation of a flexible LDPC decoder,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 6, pp. 542–546, Jun. 2007.
[5] M. Karkooti, P. Radosavljevic, and J. Cavallaro, “Configurable LDPC decoder architectures for regular and irregular codes,” J. Signal Process. Syst., vol. 53, no. 1, pp. 73–88, 2008.
[6] J. Tang, T. Bhatt, V. Sundaramurthy, and K. Parhi, “Reconfigurable shuffle network design in LDPC decoders,” in Proc. Int. Conf. Appl.-Specific Syst. Archit. Process., Sep. 2006, pp. 81–86.
[7] K. Gunnam, G. Choi, M. Yeary, and M. Atiquzzaman, “VLSI architectures for layered decoding for irregular LDPC codes of WiMax,” in Proc. IEEE ICC, 2007, pp. 4542–4547.
[8] T. Brack, M. Alles, F. Kienle, and N. Wehn, “A synthesizable IP core for WIMAX 802.16E LDPC code decoding,” in Proc. IEEE Int. Symp. Pers., Indoor, Mobile Radio Commun., Sep. 2006, pp. 1–5.
[9] M. Rovini, G. Gentile, and L. Fanucci, “Multi-size circular shifting networks for decoders of structured LDPC codes,” Electron. Lett., vol. 43, no. 17, pp. 938–940, Aug. 2007.
[10] C.-H. Liu, C.-C. Lin, H.-C. Chang, C.-Y. Lee, and Y. Hsua, “Multi-mode message passing switch networks applied for QC-LDPC decoder,” in Proc. IEEE Int. Symp. Circuits Syst., May 2008, pp. 752–755.
[11] D. Oh and K. Parhi, “Area efficient controller design of barrel shifters for reconfigurable LDPC decoders,” in Proc. IEEE Int. Symp. Circuits Syst., May 2008, pp. 240–243.
[12] J. Lin, Z. Wang, L. Li, J. Sha, and M. Gao, “Efficient shuffle network architecture and application for WiMAX LDPC decoders,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 3, pp. 215–219, Mar. 2009.
[13] D. Oh and K. Parhi, “Low-complexity switch network for reconfigurable LDPC decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 1, pp. 85–94, Jan. 2010.
[14] C. Liu, C. Lin, S. Yen, C. Chen, H. Chang, C. Lee, Y. Hsu, and S. Jou, “Design of a multimode QC-LDPC decoder based on shift-routing network,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 9, pp. 734–738, Sep. 2009.
[15] V. Benes, “Optimal rearrangeable multistage connecting networks,” Bell Syst. Tech. J., vol. 43, no. 7, pp. 1641–1656, 1964.
[16] Z. Li, L. Chen, L. Zeng, S. Lin, and W. Fong, “Efficient encoding of quasi-cyclic low-density parity-check codes,” IEEE Trans. Commun., vol. 53, no. 11, p. 1973, Nov. 2005.
