FPGA: Field Programmable Gate Array
One of the first flexible-logic designs can be traced back to the 1960s, when Estrin's variable logic was experimented with in CPUs. The 1970s marked the invention of the first PLAs / PLDs (Programmable Logic Arrays / Devices). In 1985 the first field programmable gate arrays were introduced. In the 1990s FPGAs were combined to form multi-FPGA CCMs (custom computing machines), which marked the first steps in designing and implementing high-power, task-adaptable computing devices. In the late 1990s the density of FPGAs kept increasing, allowing more and more complex designs to be laid out on a single device. In the early 2000s FPGAs started to adopt the SOC (system on a chip) approach: FPGAs are now offered with DSP blocks, soft cores (which can be programmed in by choosing a macro in the design tool), and dedicated functional blocks (multipliers / ALUs). The flagship devices (such as Xilinx's Virtex and Spartan and Altera's Stratix) offer extensive I/O support and higher operating speeds (200 to 400 MHz).
Although it may seem that FPGAs may never catch up to the performance of CPUs, 32 bits added in parallel (for example) at 300 MHz is a lot faster than the same 32 bits added serially at 1 GHz: the parallel adder completes one addition per clock cycle, while a bit-serial adder handling one bit per cycle needs 32 cycles per addition. Of course, as with anything else, there are plenty of gotchas when using FPGAs (some of which will be discussed later in this paper).
So who would be a typical user of an FPGA today? The communication industry is probably one of the major users to date (it definitely has the volume). FPGAs are used for protocol translation (e.g. two devices that are trying to talk to each other can do so through an FPGA in which the decoder is implemented in hardware, allowing for very fast translation). Cell phone and base station producers are using FPGAs in 3G / 4G communications. High-speed modem designers use FPGAs to ensure the upgradeability of the final modem products (ADSL, ISDN, HDSL, etc.).
In the signal processing field, FPGAs allow for a whole range of hardware-coded DSP functions such as FIR and IIR filtering and FFT processing. Video imaging and compression is another area where FPGAs are becoming widely accepted, used as hardwired standard decoders (MPEG, for example) and other processing blocks. This gives systems a significant performance boost, as these functions would otherwise be performed in software with a tremendous loss in system speed.
FPGA Architecture
We should start the FPGA description by introducing a basic unit of the FPGA still used today: the LE (Logic Element). It can be used as a common unit of comparison among FPGA designs from the same or different manufacturers. LEs (also known as LBs, Logic Blocks; CLBs, Configurable Logic Blocks; or PLBs, Programmable Logic Blocks) for now still take up the major portion of the real estate (Figure 1). Devices can have anywhere from several hundred to several hundred thousand of these embedded in an interconnection network (more on the network later). A typical CLB includes a LUT (Look-Up Table) and a D flip-flop (which may or may not be used) and provides logic or arithmetic (with ripple carry) capabilities, or can act as RAM / ROM. So it can emulate a Boolean function or be hard-coded to store a value, which can be combined (i.e. multiplied or added) with the input much faster than doing so in software.
Figure 1 (diagram not reproduced): an array of logic blocks (LB) and I/O blocks (IOB).
LUT contents: addresses 0 through 14 hold 0; address 15 (inputs 1 1 1 1) holds 1.
Figure 2. LUT emulating a four-input gate: only when all four input lines (which can be thought of as address lines) are set to 1 will the output be 1; otherwise it will be 0. By analogy, see the AND gate on the right.
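The behaviour in Figure 2 can be sketched in software. The following is a minimal model, not vendor code: the names make_lut and lut_read are hypothetical, and the first input is assumed to be the most significant address bit. It shows that a LUT is just a 16-entry memory whose contents decide which Boolean function it emulates.

```python
from itertools import product

def make_lut(truth_fn, n_inputs=4):
    """Fill the LUT: one stored bit per input combination (address)."""
    return [truth_fn(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def lut_read(lut, *inputs):
    """Use the input lines as address lines and return the stored bit."""
    address = 0
    for bit in inputs:  # first input is the most significant address bit
        address = (address << 1) | bit
    return lut[address]

# "Programming" the LUT as a four-input AND gate, as in Figure 2.
and4 = make_lut(lambda a, b, c, d: a & b & c & d)

# The same structure holds a different function when loaded with different contents.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)

print(and4)                        # 15 zeros followed by a single 1
print(lut_read(and4, 1, 1, 1, 1))  # all inputs high
print(lut_read(and4, 1, 0, 1, 1))  # one input low
```

Changing the function never changes lut_read; only the stored table changes, which is the sense in which the logic is programmable.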
In addition to the LEs, other components should be considered in today's devices. For example, a multiplier is very inefficient to implement by routing it using LEs (too many LEs would be consumed and the final layout would still have insufficient performance). As a result, major FPGA manufacturers hard-wire such components into the FPGA fabric. In higher-end devices (such as Xilinx Virtex-II Pro and Altera Stratix GX) the user is also given pre-built functional blocks such as DSP blocks (Stratix GX) or a CPU core (PowerPC 405 in Virtex-II Pro). Also, today's FPGAs provide an astounding amount of support for peripheral standards. For example, Xilinx's Virtex-II family provides blocks of I/O. Each can be thought of as a group of I/O pins that can support various interfaces (PCI-X, for example) as long as all pins in the group (or bank) are used at the same voltage level. This is very useful to a designer: instead of being tied to a particular set of pins for a certain I/O standard, the designer can place them as needed and for whatever signaling level or standard is desired (2.5 V, 3.3 V, 1.8 V, LVDS).
Design Flow
At present, a high-end FPGA device provides thousands of Logic Elements along with built-in components such as multipliers, DSP blocks or complete CPU cores (PowerPC 405 in Virtex-II Pro), on-board RAM (up to 1 MB in Virtex-II Pro) and a number of I/O support blocks, which can be adapted to various signaling standards. But as wonderful as these devices may sound, the actual algorithm or operation has to be processed and converted into the FPGA fabric layout. Figure 3 shows the cycle that has to be performed to achieve that objective. The logic has to be thought out first, and then it has to be translated into the component language. Components then have to be laid out on the device (automatically or manually). Physical optimization is important here, as even minor amounts of optimization can result in a significant performance boost, reduced latency and improved timing.
Comparisons
CPU vs. FPGA
FPGA-based systems can provide a number of run-time advantages over sequential machines. Although today's CPUs claim multi-GHz speeds, most of that power is wasted in the peripherals and software-based execution. True hardware-accelerated execution can beat a software-based process running on a much faster processor. Consider the following example of sequential execution of a few simple functions vs. the same functions performed in hardware, utilizing some parallelism. FPGAs can provide massive amounts of reprogrammable hardware acceleration, which can be further boosted by wide use of parallelism. In addition, due to their structure and variable logic, FPGAs can inherently utilize pipelining, where the pipeline or pipelines can be tailored to the needs of the problem.
Figure 4. Sequential execution via software and sequential hardware vs. execution in hardware (where in an FPGA every constant could be programmed into the LUT, since it is essentially a RAM unit).
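The trade-off between clock rate and parallelism can be made concrete with a simple cycle-count model. This is only a sketch under stated assumptions: every operation takes one clock cycle, the operations are independent, and the function names are hypothetical. It reuses the clock rates from the 32-bit addition example earlier in this paper (1 GHz sequential, 300 MHz parallel).

```python
def sequential_cycles(n_ops, cycles_per_op=1):
    """One execution unit: operations run one after another."""
    return n_ops * cycles_per_op

def parallel_cycles(n_ops, n_units, cycles_per_op=1):
    """n_units identical units work on independent operations at once."""
    batches = -(-n_ops // n_units)  # ceiling division
    return batches * cycles_per_op

def pipelined_cycles(n_items, n_stages):
    """Fill the pipeline once, then one result leaves per cycle."""
    return n_stages + n_items - 1

def seconds(cycles, clock_hz):
    return cycles / clock_hz

# 32 one-cycle steps at 1 GHz vs. the same 32 steps done at once at 300 MHz.
t_seq = seconds(sequential_cycles(32), 1e9)
t_par = seconds(parallel_cycles(32, n_units=32), 300e6)
speedup = t_seq / t_par

print(f"sequential: {t_seq * 1e9:.2f} ns, parallel: {t_par * 1e9:.2f} ns")
print(f"speedup: {speedup:.1f}x")

# Pipelining: 1000 items through a 5-stage pipeline vs. 5 cycles per item.
print(pipelined_cycles(1000, 5), "cycles pipelined vs.", sequential_cycles(1000, 5))
```

Under these assumptions the slower-clocked parallel version finishes roughly ten times sooner, and the pipelined version approaches one result per cycle once the pipeline is full. Real designs rarely reach these ideal figures because of data dependencies and routing delays.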
Recent advances in FPGAs' programmability on the fly have secured them as the tools of choice for reconfigurable computing (RCC) implementation. It is fine that the device can be changed after a reset, but what if only a small part of the system needs to be changed? Modern FPGAs support the ability to be reprogrammed at run time and in sections (the Virtex-II Pro has over 10 separate clock domains, so if designed properly all the other system blocks can keep running while one of them is changed).
DSP vs. FPGA
DSPs are still accepted as the popular choice for signal processing functions, but FPGAs are starting to beat them due to their massively parallel processing abilities vs. cost per unit. To make up for the lack of parallel processing, DSP designers are bumping up the clock rates (1+ GHz); however, this makes for a very hard analog design of the supporting PCBs (Printed Circuit Boards): suddenly the board designers have to fall back on high-speed analog communications theory.
Function | Industry's Fastest DSP Processor Core | Xilinx Virtex-II Pro Platform
8x8 Multiply Accumulate (MAC) | 4.8 Billion MAC/s (fclk = 600 MHz) | 1 Trillion MAC/s (fclk = 300 MHz)
FIR Filter - 256 Taps, Linear phase - 16-bit data/coefficients | 9.3 MSPS (fclk = 600 MHz) | 300 MSPS (fclk = 300 MHz)
Complex FFT - 1024 point, 16-bit data | 10 µs (fclk = 600 MHz) | 1 µs (fclk = 150 MHz)
Reed-Solomon Decoding Throughput | 4.1 Mbps (fclk = 600 MHz) | 10 Gbps** (OC-192 rates) (fclk = 85 MHz)
Table 1. Xilinx's recent comparison between a DSP core and the Virtex-II Pro performance.
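One way to read Table 1 is to divide each throughput by its clock rate, which shows how much work is done per clock cycle. The sketch below does this for the three rows whose units are given as rates (MAC/s, MSPS, bps). The figures are copied from the table; the dictionary layout and the function name compare are illustrative only.

```python
# Each entry: (throughput per second, clock in Hz) for the DSP core,
# then the same pair for the Virtex-II Pro. Values copied from Table 1.
TABLE_1 = {
    "8x8 MAC (MAC/s)": ((4.8e9, 600e6), (1e12, 300e6)),
    "FIR filter, 256 taps (samples/s)": ((9.3e6, 600e6), (300e6, 300e6)),
    "Reed-Solomon decoding (bit/s)": ((4.1e6, 600e6), (10e9, 85e6)),
}

def compare(dsp, fpga):
    """Return the overall speedup and the work done per clock cycle."""
    dsp_rate, dsp_clk = dsp
    fpga_rate, fpga_clk = fpga
    return {
        "speedup": fpga_rate / dsp_rate,
        "dsp_per_cycle": dsp_rate / dsp_clk,
        "fpga_per_cycle": fpga_rate / fpga_clk,
    }

results = {name: compare(dsp, fpga) for name, (dsp, fpga) in TABLE_1.items()}

for name, r in results.items():
    print(f"{name}: {r['speedup']:.1f}x overall, "
          f"{r['dsp_per_cycle']:.4g} vs. {r['fpga_per_cycle']:.4g} per cycle")
```

For the MAC row this works out to 8 operations per cycle on the DSP against several thousand per cycle on the FPGA, even though the FPGA runs at half the clock rate. That gap is the parallelism argument of this section expressed as a number. These are the vendor's own figures, so they should be read as a best case.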
Conclusions
Despite all of its appealing qualities, the FPGA is not a do-all device (at least for now). FPGAs have a few issues that do not always make them the weapon of choice:
1. The Logic Blocks may not be fully utilized. Perhaps this is not an issue to a small-scale designer, but it could be a problem.
2. Floating-point problems. FPGAs don't handle floating point well, but then again, what discrete device does? Everything is done by approximation at some point in the digital world.
3. Reprogramming speeds are an issue and steps are being taken to address them, but still, the bigger the device, the more time it takes to update it. This is yet another reason why partial programmability is important.
4. The FPGA is not the best solution for sequential algorithms. The CPU still has a job; yet CPUs are being integrated into the FPGA fabric, so this may not be a problem for too long.
FPGAs provide ever-growing flexibility in logic design and implementation, and the advances in their capabilities help to propel research into the RCC paradigm, making it a considerable (re-emerging) segment of Computer Science.