
Multi-Rate_QC-LDPC_Encoder

This paper proposes a multi-rate memory-efficient encoder for low-density parity-check (LDPC) codes using a shift-register-adder-accumulator (SRAA) algorithm, which simplifies computation and reduces complexity. The encoder is designed based on quasi-cyclic LDPC codes, which require less memory for storing parity-check matrices, making it suitable for different bit-rate options in the Chinese digital TV standard (DMB-T). Simulations indicate that the proposed encoder meets the DMB-T requirements while maintaining lower complexity compared to traditional single-rate encoders.


Multi-rate QC-LDPC Encoder

Huxing Zhang
Electronic Engineering
University of Electronic Science and Technology
Chengdu, China
Email: [email protected]

Hongyang Yu
Electronic Engineering
University of Electronic Science and Technology
Chengdu, China
Email: [email protected]

Abstract—A multi-rate, memory-efficient encoder for low-density parity-check (LDPC) codes is proposed in this paper, based on the shift-register-adder-accumulator (SRAA) circuit. The SRAA algorithm simplifies the encoder's computation module and reduces the complexity of the operation. In the Chinese digital TV terrestrial broadcasting standard (DMB-T), the LDPC generator matrix is built from many quasi-cyclic square matrices, and the encoder exploits this quasi-cyclic structure to reduce the memory cost. Simulations demonstrate that the proposed encoder supports the three DMB-T code rates, selected by the bit-rate option, with lower complexity.

Keywords-Low-density parity-check codes; DTV; encoder; error correcting code.

I. INTRODUCTION

LDPC codes (low-density parity-check codes) are a class of linear block error-correcting codes defined by a very sparse parity-check matrix or, equivalently, a bipartite graph. They were first proposed by Gallager [1], and MacKay later rediscovered their superiority [2]. In recent years, many studies have shown that LDPC codes perform remarkably close to the Shannon limit over additive white Gaussian noise (AWGN) channels. For rate 1/2, the best code found has a threshold within 0.0045 dB of the Shannon limit of the binary-input AWGN channel, and simulation results with a somewhat simpler code show that the performance can come within 0.04 dB of the Shannon limit at a bit error rate of $10^{-6}$ using a block length of $10^{7}$ [3]. LDPC codes have been used as the core of channel coding in a number of international standards, such as Europe's next-generation Satellite Digital TV Broadcasting Standard (DVB-S2) [6] and the China Digital Television Terrestrial Broadcasting Standard (DMB-T) [8].

For an LDPC code, the parity-check matrix is sparse but the generator matrix is not, so general-purpose encoders work from the parity-check matrix to simplify encoding, as in the RU algorithm [3]. The RU algorithm, also called the lower-triangular matrix algorithm, brings the check matrix into a simplified form by Gaussian elimination, and the mathematical expression of the check bits is then obtained from the lower-triangular matrix. In DMB-T, the LDPC code is very long, and the RU algorithm would increase the complexity and consume a large amount of FPGA hardware resources. This paper therefore adopts an approach based on the SRAA algorithm, which simplifies encoding and reduces the memory cost by exploiting the quasi-cyclic structure of the LDPC generator matrix. Compared with traditional single-rate encoders, this encoder supports all three DMB-T rates.

II. QUASI-CYCLIC LDPC CODES

Cyclic codes are an important subclass of linear block codes. Thanks to their inherent algebraic structure, they admit a variety of simple, practical decoding algorithms with lower complexity and higher efficiency. The disadvantage of general LDPC codes is that a significant amount of memory is needed to store their parity-check matrices [5]. Quasi-cyclic LDPC (QC-LDPC) codes are a good candidate to solve the memory problem, since their parity-check matrices consist of circulant permutation matrices or the zero matrix. In fact, the memory required to store them is reduced by a factor of 1/L when L × L circulant permutation matrices are employed. A good example of QC-LDPC codes is the array codes [4]. QC-LDPC codes are a very important category of LDPC codes; they are linear, systematic, and quasi-cyclic [7]. Compared with Turbo codes, they have much lower decoding complexity. The circulant matrices used in QC-LDPC codes are square matrices in which every row is a right rotation of the first row and every column is a downward rotation of the first column. As a result, a circulant matrix is fully determined by its first row (or its first column). This section briefly introduces the traditional RU encoding algorithm and then derives the encoding algorithm used for DMB-T.

A. Brief Introduction to RU Algorithm

The most popular encoding algorithm for random LDPC codes is currently the RU algorithm, which works from a parity-check matrix in lower triangular form or in approximate lower triangular form. The latter is more efficient, so we take it as the example. Assume we are given an m × n sparse parity-check matrix H. By performing row and column permutations only, we can bring H into the form indicated in Figure 2-1.

Figure 2-1 H in approximate lower triangular form (column blocks of width n−m, g, and m−g; row blocks of height m−g and g; blocks A, B, T in the top row and C, D, E in the bottom row; T is lower triangular with ones on its diagonal)

Note that since this transformation was accomplished solely by permutations, the matrix is still sparse. Assume that we bring the matrix in the form

Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI. Downloaded on November 16,2023 at 19:04:52 UTC from IEEE Xplore. Restrictions apply.
978-1-4244-2587-7/09/$25.00 ©2009 IEEE
$$H = \begin{bmatrix} A & B & T \\ C & D & E \end{bmatrix} \tag{1}$$

where $A$ is $(m-g)\times(n-m)$, $B$ is $(m-g)\times g$, $C$ is $g\times(n-m)$, $D$ is $g\times g$, $E$ is $g\times(m-g)$, and finally $T$ is $(m-g)\times(m-g)$. Multiplying this matrix from the left by

$$\begin{bmatrix} I & 0 \\ -ET^{-1} & I \end{bmatrix}$$

we get

$$\begin{bmatrix} A & B & T \\ -ET^{-1}A + C & -ET^{-1}B + D & 0 \end{bmatrix}.$$

Let $x = (s, p_1, p_2)$, where $s$ denotes the systematic part, $p_1$ and $p_2$ combined denote the parity part, $p_1$ has length $g$, and $p_2$ has length $m-g$. The defining equation $Hx^T = 0^T$ splits naturally into two equations, namely

$$As^T + Bp_1^T + Tp_2^T = 0 \tag{2}$$

and

$$(-ET^{-1}A + C)s^T + (-ET^{-1}B + D)p_1^T = 0. \tag{3}$$

Define $\phi = -ET^{-1}B + D$ and assume $\phi$ is nonsingular. Then we get

$$p_1^T = -\phi^{-1}(-ET^{-1}A + C)s^T \tag{4}$$

$$p_2^T = -T^{-1}(As^T + Bp_1^T) \tag{5}$$

All arithmetic is over GF(2), so the minus signs have no effect on the computed bits. Once $p_1$ and $p_2$ are obtained, $x$ is known and the encoding is finished. In DMB-T, the LDPC parity-check matrix is very large, so the RU encoding algorithm is not suitable.

B. The Encoding Algorithm for DMB-T

Assuming the code length is n and the information length is k, encoding of a linear block code can be expressed as the multiplication

$$C_{1\times n} = m_{1\times k}\,G_{k\times n} \tag{6}$$

where $C_{1\times n}$ is the LDPC codeword, $m_{1\times k}$ is the vector of information bits, and $G_{k\times n}$ is the generator matrix. In DMB-T, the LDPC generator matrix has the general form

$$G_{qc} = \begin{bmatrix} G_{0,0} & G_{0,1} & \cdots & G_{0,C-1} & I & O & \cdots & O \\ G_{1,0} & G_{1,1} & \cdots & G_{1,C-1} & O & I & \cdots & O \\ \vdots & \vdots & G_{i,j} & \vdots & \vdots & \vdots & \ddots & \vdots \\ G_{K-1,0} & G_{K-1,1} & \cdots & G_{K-1,C-1} & O & O & \cdots & I \end{bmatrix}$$

where $I$ is a $b\times b$ identity matrix, $O$ is a $b\times b$ zero matrix, and $G_{i,j}$ is a $b\times b$ cyclic matrix, with $0 \le i \le K-1$, $0 \le j \le C-1$, and $b = 127$. $C$ and $K$ are determined by the DMB-T code rate. Because $G_{i,j}$ is cyclic, a hardware implementation only needs to store its first row; the remaining rows are obtained by rotating the first row to the right. The memory cost is therefore reduced to 1/127 of the original cost.

If the first row of $G_{i,j}$ is defined as $g_{i,j}(0) = [g_0, g_1, \ldots, g_{126}]^T$, then $G_{i,j}$ can be expressed as

$$G_{i,j} = \begin{bmatrix} g_0 & g_1 & \cdots & g_{125} & g_{126} \\ g_{126} & g_0 & \cdots & g_{124} & g_{125} \\ \vdots & \vdots & & \vdots & \vdots \\ g_2 & g_3 & \cdots & g_0 & g_1 \\ g_1 & g_2 & \cdots & g_{126} & g_0 \end{bmatrix} = \begin{bmatrix} g_{i,j}(0) & g_{i,j}(1) & \cdots & g_{i,j}(125) & g_{i,j}(126) \end{bmatrix}^T \tag{7}$$

Because $G_{qc}$ is systematic, the check-bit part of (6) can be expressed as

$$R_{1\times(n-k)} = \begin{bmatrix} M_0 & M_1 & \cdots & M_{K-1} \end{bmatrix} \begin{bmatrix} G_{0,0} & \cdots & \cdots & G_{0,C-1} \\ \vdots & & & \vdots \\ \vdots & & & \vdots \\ G_{K-1,0} & \cdots & \cdots & G_{K-1,C-1} \end{bmatrix} \tag{8}$$

where $M_i = [m_{i\times127}\ \ m_{i\times127+1}\ \ \cdots\ \ m_{i\times127+125}\ \ m_{i\times127+126}]$ holds the 127 bits of the $i$th information section. Expanding the product gives

$$R_{1\times(n-k)} = \begin{bmatrix} \sum_{i=0}^{K-1} M_i G_{i,0} & \cdots & \sum_{i=0}^{K-1} M_i G_{i,j} & \cdots & \sum_{i=0}^{K-1} M_i G_{i,C-1} \end{bmatrix} \tag{9}$$

From (9), the multiplication of a long vector by a large matrix breaks down into $K \times C$ multiplications of a 127-bit vector by a $127\times127$ matrix and $(K-1)\times C$ vector additions. Each block product is

$$M_i G_{i,j} = \begin{bmatrix} m_{i\times127} \\ m_{i\times127+1} \\ \vdots \\ m_{i\times127+125} \\ m_{i\times127+126} \end{bmatrix}^T \begin{bmatrix} g_0 & g_1 & \cdots & g_{125} & g_{126} \\ g_{126} & g_0 & \cdots & g_{124} & g_{125} \\ \vdots & \vdots & & \vdots & \vdots \\ g_2 & g_3 & \cdots & g_0 & g_1 \\ g_1 & g_2 & \cdots & g_{126} & g_0 \end{bmatrix} = m_{i\times127}\,g_{i,j}^T(0) + m_{i\times127+1}\,g_{i,j}^T(1) + \cdots + m_{i\times127+126}\,g_{i,j}^T(126) \tag{10}$$

The block-matrix operations are now broken down into single bits multiplied by the corresponding row vectors, followed by additions. As a result, the check bits can be computed using the following equation:

$$r_j = \sum_{i=0}^{K-1} \left[ m_{i\times127}\,g_{i,j}^T(0) + m_{i\times127+1}\,g_{i,j}^T(1) + \cdots + m_{i\times127+126}\,g_{i,j}^T(126) \right] \tag{11}$$

In (11), $r_j$ is a 127-bit vector, $i$ runs from 0 to $K-1$, and $j$ runs from 0 to $C-1$, which yields all the check bits. The check bits of the 0.8-rate LDPC (7493, 6096) code in DMB-T are computed using the following equation:

$$r_j = \sum_{i=0}^{47} \left[ m_{i\times127}\,g_{i,j}^T(0) + m_{i\times127+1}\,g_{i,j}^T(1) + \cdots + m_{i\times127+126}\,g_{i,j}^T(126) \right] \tag{12}$$
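The decomposition in (9) to (11) can be sketched in software as follows. This is a minimal illustration rather than the authors' Verilog design: the function names (`rotate_right`, `sraa_block`, `encode_check_bits`, `circulant`, `reference_check_bits`) are hypothetical, and the first rows are random placeholders, not the DMB-T generator matrices. The sketch computes each $r_j$ with the shift-and-accumulate procedure and compares it against a direct evaluation with fully expanded circulant blocks.

```python
import random

B = 127  # circulant block size b used in DMB-T (from the text)


def rotate_right(row):
    """Cyclically rotate a list of bits one position to the right."""
    return row[-1:] + row[:-1]


def sraa_block(msg_bits, first_row, acc):
    """Accumulate M_i * G_ij into acc over GF(2), term by term as in (10)."""
    reg_a = list(first_row)              # holds the current row of G_ij
    for m in msg_bits:                   # one information bit per step
        if m:                            # AND: the bit gates the row
            acc = [a ^ g for a, g in zip(acc, reg_a)]  # XOR-accumulate
        reg_a = rotate_right(reg_a)      # move to the next circulant row
    return acc


def encode_check_bits(msg, first_rows, K, C, b=B):
    """Compute r_0..r_{C-1} as in (11); first_rows[i][j] is row 0 of G_ij."""
    checks = []
    for j in range(C):
        acc = [0] * b
        for i in range(K):
            acc = sraa_block(msg[i * b:(i + 1) * b], first_rows[i][j], acc)
        checks.append(acc)
    return checks


def circulant(first_row):
    """Expand a first row into the full circulant matrix of (7)."""
    rows, row = [], list(first_row)
    for _ in range(len(first_row)):
        rows.append(row)
        row = rotate_right(row)
    return rows


def reference_check_bits(msg, first_rows, K, C, b=B):
    """Direct evaluation of (9) with expanded blocks, for comparison only."""
    checks = []
    for j in range(C):
        r = [0] * b
        for i in range(K):
            G = circulant(first_rows[i][j])
            M = msg[i * b:(i + 1) * b]
            for col in range(b):
                r[col] ^= sum(M[t] & G[t][col] for t in range(b)) & 1
        checks.append(r)
    return checks


if __name__ == "__main__":
    # Toy sizes and random first rows (assumptions, not DMB-T data).
    random.seed(0)
    K, C, b = 2, 3, 7
    rows = [[[random.randint(0, 1) for _ in range(b)] for _ in range(C)]
            for _ in range(K)]
    msg = [random.randint(0, 1) for _ in range(K * b)]
    fast = encode_check_bits(msg, rows, K, C, b)
    slow = reference_check_bits(msg, rows, K, C, b)
    print("shift-accumulate matches direct product:", fast == slow)
```

Only the first row of each block is ever stored in `first_rows`, which mirrors the 1/127 memory saving described above; `circulant` exists solely so the result can be checked against the unstructured matrix product.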

In (12), i runs from 0 to 47 and j runs from 0 to 10, since the 0.8-rate generator matrix has 48 row blocks and 11 column blocks. The following discusses the FPGA hardware implementation of the encoder based on this algorithm.

III. LDPC ENCODER BASED ON FPGA

A. The Computing Module Design

The block-matrix operations break down into multiplications and additions. On an FPGA, the multiplications are implemented with AND gates and the additions with XOR gates. Take the computing module for the 0.8-rate generator matrix as an example: the generator matrix of the QC-LDPC (7493, 6096) code has 48 row blocks and 11 column blocks. The first rows of the 48 row-block matrices can be stored in a memory 127 bits wide and 48 words deep. There are 1397 check bits (127 × 11). To increase the encoding speed, 11 SRAA (shift-register-adder-accumulator) circuits (Fig. 3-1) are run in parallel to compute all the check bits [9]. If a single ROM stored the first rows of all the block matrices, loading the eleven 127-bit shift registers of the parallel SRAAs would take at least 11 clock cycles each time, and processing the 48 row blocks would take at least 11 × 48 clock cycles just to read the whole generator matrix. Instead, we configure a separate ROM for each SRAA to store the first rows of its block matrices, so that all the A registers load from their ROMs in parallel, which increases the encoding speed.

Fig.3-1 SRAA circuit

In Fig. 3-1, a 127-bit shift register (register A) stores the first row of $G_{i,j}$. On each rising clock edge, the serial input bit is multiplied by register A and the product is added into register B; at the same time, register A rotates right. After 127 clock periods, 127 input bits have been processed, which completes the calculation for one $G_{i,j}$. Each column block consists of 48 matrices $G_{i,j}$, so a total of 48 × 127 clock cycles are needed to calculate its 127 check bits, which are held in the accumulating register B.

B. The Whole Structure of the Encoder

DMB-T defines three encoding rates, 0.4, 0.6, and 0.8, each with its own generator matrix parameters. These parameters are shown in Table 1.

Table 1 Parameters of the generator matrices

Rate    K     C     b
0.4     24    35    127
0.6     36    23    127
0.8     48    11    127

As with the 0.8-rate computing module above, the 0.6-rate LDPC (7493, 4572) code can run 23 SRAA circuits in parallel to compute its 2921 check bits; after 36 × 127 clock cycles, all the check bits have been computed and are stored in the accumulating registers B. The 0.4-rate LDPC (7493, 3048) code can run 35 SRAA circuits in parallel to compute its 4445 check bits; after 24 × 127 clock cycles, all the check bits have been computed and are stored in the accumulating registers B. Simply combining the three single-rate encoders would require 11 + 23 + 35 = 69 SRAA circuits. Since only one rate is selected at a time, the SRAA circuits can be reused to avoid wasting resources, so 35 SRAA circuits are enough for three-rate encoding. The overall structure of the encoder is shown in Fig. 3-2.

Fig.3-2 Whole structure of multi-rate encoder

The Control Module coordinates every part of the encoder. The G-SRAA Module has a total of 35 sub-modules that store the generator matrices for the three rates and compute the check bits. The Parallel to Serial Module reads the check bits in parallel from register B and outputs them serially to the FIFO, which buffers the data. The RAM stores the input data and sends them out after the check bits. The Rate_control signal selects among the three rates, and Data_in is the input signal. The 0.4 rate uses G0-SRAA to G34-SRAA, the 0.6 rate uses G0-SRAA to G22-SRAA, and the 0.8 rate uses G0-SRAA to G10-SRAA. G0 to G10 each contain three ROMs, which store the first 11 column blocks of the 0.4-rate, 0.6-rate, and 0.8-rate matrices; G11 to G22 each contain two ROMs, which store column blocks 12 to 23 of the 0.4-rate and 0.6-rate matrices; G23 to G34 each contain one ROM, which stores column blocks 24 to 35 of the 0.4-rate matrix. The multi-rate encoder uses several separate ROMs so that the SRAA shift registers can be updated simultaneously, which avoids the delay of reading data from the ROMs.

C. Comparison of Complexity and Resource

This subsection compares the RU and SRAA encoding algorithms in complexity and resource usage, and then compares single-rate and multi-rate encoders. In DMB-T, the information part of the QC-LDPC code has length $m = k \times b$, and the whole code has length $n = (k + c) \times b$. For RU algorithm, the computing complexity is calculated

by $p_1$ and $p_2$. The parity-check matrix has to be split into A, B, C, D, E, and T, which are stored separately, so the quasi-cyclic nature of H cannot be used to reduce the memory. The complexity and resource usage of the RU algorithm and the SRAA encoding algorithm are compared in Table 2.

Table 2 Comparison of RU and SRAA

Algorithm    Complexity       Memory (bit)
RU           O(n + g^2)       (k + c) × c × b^2
SRAA         O(n)             k × c × b

Table 2 shows that the SRAA algorithm has lower complexity and needs less memory than the RU algorithm, so it is more suitable for QC-LDPC codes in DMB-T. Table 3 compares the resources of a single-rate QC-LDPC encoder and the multi-rate encoder, both based on the SRAA algorithm.

Table 3 Comparison of single-rate and multi-rate

Rate      Encoding speed (clock cycles)    Flip-flops       XOR/AND gates    Memory (bit)
Single    k_i × b                          2 × c_i × b      c_i × b          k_i × c_i × b
Multi     k_max × b                        2 × c_max × b    c_max × b        b × Σ_{i=1}^{3} (k_i × c_i)

In DMB-T, $k_1 = 24$, $k_2 = 36$, $k_3 = 48$; $c_1 = 35$, $c_2 = 23$, $c_3 = 11$; $b = 127$; $i$ runs from 1 to 3; $k_{max}$ is the maximum of the $k_i$ and $c_{max}$ is the maximum of the $c_i$. The three rates are 0.4, 0.6, and 0.8. The specific resource consumption is compared in Fig. 3-3 and Fig. 3-4.

Fig.3-3 Flip-flops cost of different-rate encoders (bar chart; horizontal axis: rate 0.4, 0.6, 0.8, multi; vertical axis: flip-flops)

Fig.3-4 Memory cost of different-rate encoders

The computing modules of the multi-rate QC-LDPC encoder and the 0.4-rate encoder consume almost the same number of flip-flops and two-input XOR and AND gates, and the control module of the multi-rate encoder costs a few more gates. The memory cost of the multi-rate encoder is the sum of the memory costs of the three single-rate encoders.

D. Simulations and Verification

The multi-rate QC-LDPC encoder based on the SRAA algorithm was described in the Verilog hardware description language, implemented on the Xilinx ISE 9.1 platform, and targeted at a Xilinx Virtex-II Pro xc2vp100 FPGA. The XST synthesis report showed that it uses 9707 slices, 9841 flip-flops, and 17573 4-input LUTs. For clarity, a simulation of the SRAA structure in the Xilinx ISE simulator is shown in Fig. 3-5.

Fig.3-5 The simulation of SRAA encoding structure

In Fig. 3-5, data_in carries the information bits; the rate_contr signal is 2'b11, which selects rate 0.8; once computed, the check bits are sent to sraa_data_out in parallel. This is the key part of the multi-rate encoder.

To verify the encoder, we used a verification program on the MATLAB platform. A random binary sequence is generated and fed to both the encoder and the MATLAB simulation. Comparing the two encoding results verifies the correctness of the LDPC encoder proposed in this paper.

IV. CONCLUSIONS

This paper proposes a multi-rate LDPC encoder that uses SRAA circuits and requires less memory, based on the quasi-cyclic structure of the DMB-T generator matrix. Using separate ROMs to store the block matrices is effective in reducing delays in the system. Compared with a traditional LDPC encoder, this encoder supports three rates, occupies fewer resources, and has more practical value.

REFERENCES

[1] R. G. Gallager, Low-Density Parity-Check Codes [M]. Cambridge: MIT Press, 1963.
[2] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
[3] Sae-Young Chung, “The Design of Low-Density Parity-Check Codes within 0.0045 dB of the Shannon Limit,” IEEE Comm. Letters, 5(2):58~60.
[4] J. L. Fan, “Array codes as low-density parity-check codes,” in Proc. 2nd Int. Symp. Turbo Codes, Brest, France, Sep. 2000, pp. 543–546.
[5] Seho Myung, Kyeongcheol Yang, “Quasi-Cyclic LDPC Codes for Fast Encoding,” IEEE Trans. Inform. Theory, 51(8), Aug. 2005, pp. 2894~2896.
[6] WEN Hong, “The principle and application of LDPC codes,” UESTC Publishing House.
[7] JIANG Huiyuan, TIAN Bin and YI Kechu, “Design of Quasi-regular LDPC Codes Encoder Base on Q-matrix,” VIDEO ENGINEERING, No. 11, Vol. 31, 2007.
[8] GB 20600-2006, Framing structure, channel coding and modulation for digital television terrestrial broadcasting system. [S]
[9] Zongwang Li, Lei Chen and Lingqi Zeng, “IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2005 proceedings.”, pp. 1205~1208.
