IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 53, NO. 11, NOVEMBER 2006

Area-Efficient VLSI Design of Reed–Solomon Decoder for 10GBase-LX4 Optical Communication Systems
Huai-Yi Hsu, Student Member, IEEE, An-Yeu (Andy) Wu, Member, IEEE, and Jih-Chiang Yeo

Abstract—The Reed–Solomon (RS) code is a widely used forward error correction technique for coping with channel impairments in fiber communication systems. The typical parallel RS architecture incurs a large hardware cost to reach the very high data rates of optical systems. This brief presents an area-efficient VLSI architecture of the RS decoder based on a novel just-in-time folding modified Euclidean algorithm (JIT-FMEA). The JIT-FMEA VLSI architecture reduces the hardware complexity by about 50% compared with the fully expanded parallel RS architecture, while still achieving the very high throughput rate required by the 10GBase-LX4 optical communication system. The proposed RS decoder architecture has been designed and implemented in 0.18-µm CMOS standard cell technology at a supply voltage of 1.8 V. The postlayout simulation results show that the design requires only about 20 K gates and achieves a data processing rate of 3.2 Gb/s at a clock frequency of 400 MHz.

Index Terms—Forward error correction (FEC), just-in-time folding modified Euclidean algorithm (JIT-FMEA), key equation solver (KES), Reed–Solomon (RS) codec, 10GBase-LX4 optical system.

Manuscript received April 1, 2005. This work was supported in part by MediaTek Incorporation and by the National Science Council, R.O.C., under Grant NSC 92-2220-E-002-012. This paper was recommended by Associate Editor A. Apsel.
The authors are with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: [email protected])
Digital Object Identifier 10.1109/TCSII.2006.882360
1057-7130/$20.00 © 2006 IEEE

I. INTRODUCTION

THE CAPACITY of optical transmission systems has increased drastically over the past ten years. When the data rate of multimode fiber (MMF) optical systems reaches the range of tens of gigabits per second, the channel impairments become more and more severe, and the degradation of the optical signals limits the data transmission distance. Therefore, advanced digital signal processing (DSP) techniques, such as equalization (EQ) and forward error correction (FEC) codecs, are now employed to enhance the transmission capacity of optical systems [1]. These DSP techniques help overcome channel impairments, thereby improving the transmission quality and increasing the transmission distance.

The Reed–Solomon (RS) code is one of the most widely used FEC techniques. It provides excellent error correcting capability for both random and burst errors [2]. System simulation of the optical system shows that the RS(255, 239) code can provide approximately 5.5-dB coding gain in bit error rate (BER) when correcting random errors [3]. Fig. 1(a) shows the overall architecture of the RS decoder.

Fig. 1. (a) Proposed JIT-FMEA RS decoder architecture. (b) 10GBase-LX4 optical system.

For optical applications, the transmission rate reaches several gigabits per second. Hence, the data processing rate, the hardware complexity, and the power consumption become very challenging issues in VLSI implementations. Conventionally, many high-speed RS decoder designs have adopted fully expanded parallel architectures to meet the high throughput requirement [5]–[10]. However, their hardware utilization is not efficient. In this brief, we develop a new decoding algorithm called the just-in-time folding modified Euclidean algorithm (JIT-FMEA). It results in an area-efficient VLSI architecture that reduces the total hardware complexity. The key idea is to use a “precomputation scheme” to eliminate the idle cycles as well as to reduce hardware cost. In addition, the JIT-FMEA architecture can be retimed to shorten the critical paths that form the bottleneck of the RS decoding procedure, and thus achieve the high throughput rate required in optical systems.

The proposed folding RS decoding architecture reduces the hardware complexity by about 50% compared with the fully parallel architecture [13]. We implement the design in 0.18-µm CMOS technology. Postlayout simulation shows that our VLSI design can operate at up to 400 MHz with a throughput rate of 3.2 Gb/s, which meets the requirement of 10GBase-LX4 optical communication systems. Moreover, based on the developed RS decoder, we can easily construct a four-way parallel FEC decoder architecture for 10GBase-LX4 optical transmission systems, as shown in Fig. 1(b).

The rest of this brief is organized as follows. Section II discusses the existing RS decoding architectures. Section III presents the proposed JIT-FMEA scheme. Section IV shows the proposed area-efficient RS decoding architecture. The comparison results are discussed in Section V. Finally, the conclusions are given in Section VI.
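The operating point quoted in the Introduction can be sanity-checked with a few lines of arithmetic. The sketch below is only illustrative: the 8-bit symbol width, the one-symbol-per-cycle assumption, and the 3.125-Gb/s per-lane rate of 10GBase-LX4 are assumptions that are not stated in this brief, and the function names are hypothetical.

```python
# Sanity check of the RS(255, 239) operating point quoted in the text.
# Assumptions (not stated in the brief): 8-bit symbols, one symbol
# decoded per clock cycle, and a 3.125 Gb/s line rate per LX4 lane.

N_SYMBOLS = 255                  # codeword length n
K_SYMBOLS = 239                  # message length k
SYMBOL_BITS = 8                  # assumed: n = 2**8 - 1 implies GF(2^8) symbols
CLOCK_HZ = 400_000_000           # reported postlayout clock frequency
LANE_RATE_BPS = 3_125_000_000    # assumed per-lane rate of 10GBase-LX4
LANES = 4                        # four-way parallel FEC decoder, Fig. 1(b)


def correctable_symbols(n: int, k: int) -> int:
    """Maximum number of symbol errors t that an RS(n, k) code can correct."""
    return (n - k) // 2


def throughput_bps(clock_hz: int, bits_per_cycle: int) -> int:
    """Data processing rate when one symbol is consumed per clock cycle."""
    return clock_hz * bits_per_cycle


t = correctable_symbols(N_SYMBOLS, K_SYMBOLS)
rate = throughput_bps(CLOCK_HZ, SYMBOL_BITS)

print(f"t = {t} symbols, 2t = {2 * t} parity symbols")
print(f"per-decoder rate = {rate / 1e9:.1f} Gb/s")
print(f"meets assumed lane rate: {rate >= LANE_RATE_BPS}")
print(f"aggregate over {LANES} decoders = {LANES * rate / 1e9:.1f} Gb/s")
```

Under these assumptions, a single decoder clocked at 400 MHz covers one lane, which is consistent with the four-way arrangement of Fig. 1(b).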

II. DRAWBACK OF EXISTING RS DECODING ARCHITECTURES

The syndrome-based RS decoding scheme consists of three components: 1) the syndrome calculator (SC); 2) the key equation solver (KES); and 3) the error corrector (EC), as shown in Fig. 1(a) [4]. In addition, a delay buffer is usually used to hold the received symbols for the latency of these components.

Since the KES has the highest computational complexity in the RS decoding procedure, it determines the speed and hardware complexity of the RS decoder. Hence, the throughput bottleneck of an RS decoder lies in the KES block. The other blocks of the RS decoder can simply be pipelined because of their feedforward structures. To meet the optical rate requirement, the target RS decoder must combine two design features: a high data processing rate and a low hardware complexity.

A. Problem in Fully Parallel MEA Architecture

To achieve a high data processing rate, many RS decoders employ pipelined and/or parallel architectures. For example, the designs in [6] and [10] use pipelining to raise the maximum operating frequency. However, pipelining requires many more registers to handle timing matching, which greatly increases the latency and hardware cost.

In general, the fully parallel architecture requires $2t$ cycles to finish the modified Euclidean algorithm (MEA) and find the error location polynomial and the error magnitude polynomial. However, fully parallel architectures usually employ coarse-grain processing elements (PEs) for the KES block [11]. Therefore, implementing the fully expanded parallel method requires a large hardware cost and involves higher power consumption [6], [11].

The timing chart of the parallel architecture is shown in Fig. 2(a). As we can see, the parallel MEA architecture uses only $2t$ cycles to execute the KES operations, so there are many idle cycles during the decoding procedure. Consequently, the conventional parallel architecture is inefficient and wastes silicon area.

Fig. 2. (a) Timing chart for the parallel architecture. (b) Timing chart of direct folding by $2t$.

B. Problem in Direct Folding MEA Architecture

To reduce hardware complexity, some RS decoders have adopted resource sharing and/or time multiplexing schemes. In general, the direct folding-by-$2t$ method finishes the KES operation in $2t \times 2t = 256$ cycles, which is greater than the received code length $n = 255$. The timing chart of the direct folding-by-$2t$ architecture is shown in Fig. 2(b). Since the direct folding-by-$2t$ method needs one more cycle than the code length, it is not suitable for continuous real-time RS decoding.

In addition, some approaches apply the resource sharing concept so that one parallel MEA block is reused four times in a four-way parallel FEC decoder architecture, which reduces the overall hardware complexity [10], [11]. However, the resource sharing method requires large input and output buffers to handle the synchronization of the four channels. Hence, both methods are inefficient in terms of area utilization and power consumption.

Because of the disadvantages of these two existing RS decoding architectures, in this brief we propose an area-efficient architecture that reduces the hardware complexity compared with the parallel MEA architecture. The major feature of our method is the use of the precalculation scheme (PCS) to eliminate idle cycles and reduce hardware cost, as discussed below.

III. PROPOSED JIT-FMEA

In general, it takes a total of $2t \times 2t = 256$ cycles to execute the MEA for the KES repeatedly on only a single PE. Nevertheless, the calculation in the SC or EC blocks needs only 255 cycles. For this reason, the direct folding-by-$2t$ method has difficulty meeting the timing requirement of the whole RS decoder under the pipelining scheme. Hence, we need to reduce the number of operation cycles so that the decoding time of the KES block is smaller than 256 cycles.

A. PCS

To eliminate the additional cycles, we employ the PCS to eliminate the first iteration of the MEA. Thus, the total processing time can be reduced from $2t$ to $2t - 1$ iterations. Based on the Euclidean algorithm, we have the initial conditions

(1)
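The cycle counts discussed so far can be tabulated with a short script. This is only a sketch under stated assumptions: it takes $t = 8$ for RS(255, 239), assumes that the fully parallel MEA spends one cycle per iteration, and assumes that one folded iteration occupies $2t$ cycles, which is what makes $2t$ iterations add up to the 256 cycles quoted above. The function names are hypothetical.

```python
# Cycle budget of the KES block for RS(255, 239), with t = 8.
# Assumption: one folded MEA iteration occupies 2t cycles, so that
# 2t iterations add up to the 256 cycles mentioned in the text.

T = 8               # correctable symbol errors
CODE_LENGTH = 255   # cycles needed by the SC and EC blocks per codeword


def parallel_kes_cycles(t: int) -> int:
    """Fully parallel MEA: 2t iterations, assumed one cycle each."""
    return 2 * t


def direct_fold_kes_cycles(t: int) -> int:
    """Direct folding by 2t: 2t iterations of 2t cycles on a single PE."""
    return (2 * t) * (2 * t)


def pcs_fold_kes_cycles(t: int) -> int:
    """Folded MEA after the PCS has removed the first iteration."""
    return (2 * t - 1) * (2 * t)


schedules = {
    "fully parallel": parallel_kes_cycles(T),
    "direct folding by 2t": direct_fold_kes_cycles(T),
    "folding by 2t with PCS": pcs_fold_kes_cycles(T),
}

for name, cycles in schedules.items():
    slack = CODE_LENGTH - cycles  # negative slack means the KES is too slow
    print(f"{name:24s} {cycles:4d} cycles, slack {slack:4d}")
```

With these assumptions, direct folding overruns the 255-cycle budget by one cycle, while removing one iteration brings the folded KES back under it with some cycles to spare.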
Since the syndrome polynomial has already been calculated by the SC block, we can easily use it to precalculate one iteration of the MEA before the KES starts. Hence, we obtain the new initial values of the MEA by using the PCS method, as shown on the right-hand side of (1).

We can see that the new initialization values of the MEA are also already known from (1). Therefore, the PCS method not only saves one iteration of the MEA but also reduces the operating complexity and the power consumption without any extra circuits.

B. JIT-FMEA

We employ the regular parallel MEA architecture and fold it by a factor of $2t$ [13]. The folded architecture requires only $1/(2t)$ of the PE hardware cost to perform the KES operation. Hence, we can greatly reduce the total hardware cost. Nevertheless, the direct folding method yields a pipeline that is unsuitable for continuous data processing. Hence, we adopt the PCS method in the folded MEA to save the first iteration, i.e., we save $2t$ operating cycles in the KES block.

However, this method introduces idle cycles into the RS decoding procedure: the KES now takes $(2t-1) \times 2t = 240$ cycles, so $255 - 240 = 15$ cycles are redundant. Therefore, the hardware is not fully utilized.

To avoid this problem, in each iteration we use one additional cycle to lock the leading coefficients and ensure correct results. Moreover, the control flow can then be implemented easily. Each iteration needs $2t + 1$ symbol cycles. Hence, the FMEA architecture needs $(2t-1)(2t+1) = 255$ cycles (i.e., 255 symbols) to complete the KES procedure. We call this method JIT-FMEA. It has no idle cycles during the RS decoding procedure. Therefore, the algorithm achieves full hardware utilization, in contrast to the conventional parallel architecture.

Table I summarizes the proposed JIT-FMEA algorithm, where $R$ denotes an R-type register and the three indices denote, respectively, the iteration number, the folding cycle within the folded architecture, and the order of the polynomial coefficient.

TABLE I
SUMMARY OF THE PROPOSED JIT-FMEA ALGORITHM

The new timing chart of the proposed JIT-FMEA is shown in Fig. 3. We can see that no idle cycles occur during the KES procedure. With these modifications, the decoding method can be easily applied to 10GBase-LX4 systems without the additional buffers at the input and output ports that are needed to handle the synchronization problem in [11].

Fig. 3. New timing chart of the JIT-FMEA algorithm. It does not have any idle cycles in the KES procedure.

IV. FOLDED JIT-FMEA VLSI ARCHITECTURE

In [13], we proposed a scalable parallel MEA architecture, as shown in Fig. 4. It can be easily folded by $2t$. Moreover, according to the JIT-FMEA algorithm, we can easily construct the regular data path for the JIT-FMEA architecture, as shown in Fig. 5. The control flow of the JIT-FMEA architecture is shown in Fig. 6.

Fig. 4. (a) Parallel MED architecture. (b) Parallel MEM architecture.

The JIT-FMEA algorithm consists of two major operations. One is the RQ part for the modified Euclidean division (MED) operation. It performs the long division operation to obtain the error magnitude polynomial. The other is the LU part for the modified Euclidean multiplication (MEM) operation. It performs the multiplication and accumulation in the polynomial domain and is used to obtain the error location polynomial. Moreover, the equations in Table I describe the data path control flow. The KES operation can be separated into two modes:
exchange and unexchange. All odd iterations operate in the exchange mode, and all even iterations operate in the unexchange mode. The iterations alternate between the exchange and unexchange modes until the stop condition is satisfied.

According to the initial values of the PCS method, we need to initialize all types of registers, R, Q, L, and U, at the starting phase of the KES block. The two major operations, MED and MEM, can be folded by $2t$ into the RQ and LU computational blocks, respectively. The architecture is shown in Fig. 5. Moreover, using one additional cycle to lock the leading coefficients simplifies the control flow. Hence, solving the key equation with the proposed JIT-FMEA algorithm requires just $(2t-1)(2t+1) = 255$ cycles. Finally, we obtain the error magnitude polynomial from the MSB of the R-type registers and the error location polynomial from the MSB of the L-type registers.

Fig. 5. Proposed JIT-FMEA architecture.

Fig. 6. Flowchart of the data path control for the proposed JIT-FMEA architecture.

The proposed JIT-FMEA architecture can greatly reduce the hardware cost. From the synthesis results, we save about 50% of the hardware cost of the MEA relative to the parallel architecture, while still satisfying the speed requirement of the optical system.

V. IMPLEMENTATION RESULT AND COMPARISON

The VLSI architecture based on this JIT-FMEA algorithm has a gate count of only 20 614 and a small core size. Postlayout simulation shows that the proposed RS decoding chip can operate at a clock frequency of 400 MHz and has a data processing rate of 3.2 Gb/s in 0.18-µm CMOS technology at 1.8 V. Hence, it can meet the speed requirement of the 10GBase-LX4 system. The chip layout and the chip feature summary are shown in Fig. 7.

We compare our design with other existing chip solutions in Table II. For a fair comparison, we define the normalized throughput rate (NT) as the maximum data throughput rate over the total number of gates, i.e.,

$$\mathrm{NT} = \frac{\text{throughput rate}}{\#\ \text{of total gates}} \quad \text{(Mb/s/Kgate)} \tag{8}$$

The NT index shows the computing capability that every kilogate can deliver for a given design with its available implementation technology.

Note that the NT index is “technology dependent” since the gate speed increases as the process advances. Under constant-electric-field scaling, the gate delay usually scales down in proportion to the technology scaling factor [14, Ch. 4]. Therefore, the NT value depends on the fabrication technology. To eliminate the scaling effect in our comparison, we adopt another performance index called the technology-scaled NT rate (TSNT):

$$\mathrm{TSNT} = \frac{\text{throughput rate} \times (\text{technology} / 0.13\ \mu\text{m})}{\#\ \text{of total gates}} \tag{9}$$

TSNT is the NT index normalized to 0.13 µm and helps evaluate the architectural advantage of a given VLSI design independent of the implementation technology. Hence, for different RS decoding architectures, we can approximately normalize their gate speed to 0.13 µm and compare their architectural advantages fairly. From Table II, we can see that our design has the smallest gate cost but can still meet the stringent speed requirement of the 10GBase-LX4 system. In addition, the NT and TSNT indexes of our design are also the highest among all designs, which demonstrates the architectural advantage of the proposed decoding scheme.

VI. CONCLUSION

In this brief, we developed the VLSI architecture of an area-efficient RS decoder for 10GBase-LX4 optical communication systems. The folding architecture reduces the hardware complexity by about 50% compared with the fully parallel RS decoding architecture. The proposed design is very area efficient and can meet the stringent speed requirement of the 10GBase-LX4 optical system.
Fig. 7. Chip layout of the JIT-FMEA RS decoder.

TABLE II
COMPARISON RESULTS OF VARIOUS RS DECODER DESIGNS

ACKNOWLEDGMENT

The authors would like to thank the National Chip Implementation Center for the IC design flow.

REFERENCES

[1] K. Azadet, E. F. Haratsch, H. Kim, F. Saibi, J. H. Saunders, M. Shaffer, L. Song, and M. L. Yu, “Equalization and FEC techniques for optical transceivers,” IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 317–327, Mar. 2002.
[2] S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[3] Forward Error Correction for Submarine Systems, G.975, Telecommunication Standardization Section, International Telecommunication Union, 1996.
[4] A. Raghupathy and K. J. R. Liu, “Algorithm-based low-power/high-speed Reed–Solomon decoder design,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 11, pp. 1254–1270, Nov. 2000.
[5] H. Lee, “An area-efficient Euclidean algorithm block for Reed–Solomon decoder,” in Proc. IEEE Symp. VLSI, 2003, pp. 209–210.
[6] H. Lee, “A VLSI design of a high-speed Reed–Solomon decoder,” in Proc. 14th Annu. IEEE Int. ASIC/SOC Conf., Sep. 12–15, 2001, pp. 316–320.
[7] H. Lee, “Modified Euclidean algorithm block for high-speed Reed–Solomon decoder,” Electron. Lett., vol. 37, no. 14, pp. 903–904, Jul. 2001.
[8] H. Lee, M.-L. Yu, and L. Song, “VLSI design of Reed–Solomon decoder architectures,” in Proc. IEEE Int. Symp. Circuits and Syst., Geneva, Switzerland, May 28–31, 2000, pp. v-705–v-708.
[9] Y. X. You, J. X. Wang, F. C. Lai, and Y. Z. Ye, “Design and implementation of high-speed Reed–Solomon decoder,” in Proc. ICCSC, 2002, pp. 146–149.
[10] L. Song, M. L. Yu, and M. S. Shaffer, “10- and 40-Gb/s forward error correction devices for optical communications,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1565–1573, Nov. 2002.
[11] H. Lee, “High-speed VLSI architecture for parallel Reed–Solomon decoder,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 2, pp. 288–294, Apr. 2003.
[12] S. Le-Ngoc and Z. Young, “An approach to double error correction Reed–Solomon decoding without Chien search,” in Proc. 36th Midwest Symp., 1993, vol. 1, pp. 534–537.
[13] H.-Y. Hsu and A.-Y. Wu, “VLSI design of a reconfigurable multimode Reed–Solomon codec for high-speed communication systems,” in Proc. IEEE AP-ASIC, Aug. 2002, pp. 359–362.
[14] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A System Perspective, 2nd ed. Reading, MA: Addison-Wesley, Jun. 1997.
[15] H. Lee, “A high-speed low-complexity Reed–Solomon decoder for optical communications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 8, pp. 461–465, Aug. 2005.
