PRESENTER’S MANUAL
Department of ECE
Course Name: System Design with FPGA
Course Code: EC20552
What is Paavai Teaching Methodology?
At Paavai Educational Institutions, inclusive, flexible and insightful learning aims to provide engaging
educational experiences and meet the needs of learners from all backgrounds. Teachers should align
their teaching with everyday life, which makes learning meaningful.
This semester is based on the teaching methodology following three cardinal components:
1. Concept Class
Concept class: This is a theory class that focuses on the concepts. Whenever required, this session will
also demonstrate how these concepts get translated into Mechanical Engineering topics.
Directed Learning Class: Learning and application may be challenging for some students. One of the
oldest and most comprehensive ways of delivering information, the self-directed class allows students to
apply themselves in a manner that makes understanding content more accessible. In this process, learners
take initiative in their own learning by planning, implementing and evaluating their learning.
Concept focused
EC20552 SYSTEM DESIGN WITH FPGA 3 0 0 3
COURSE OBJECTIVES
Introduction: Programmable Logic Array: Programmable Logic Devices, Generic Array Logic-
Architecture of Xilinx CoolRunner XCR3064XL CPLD: CPLD implementation of Parallel adder.
Counter Examples: Fast Video Controller, Position Tracker for a Robot Manipulator, Design Counters with
ACT devices, Designing Adders and Accumulators with the ACT architecture
TOTAL PERIODS 45
Course Outcomes
TEXT BOOKS
1. Charles H. Roth Jr., Lizy Kurian John and Byeong Kil Lee, “Digital System Design using Verilog”,
Cengage Learning, 2016.
2. Stephen M. Trimberger, “Field Programmable Gate Array Technology”, Springer International
Edition, 1994.
REFERENCE
1. John V. Oldfield and Richard C. Dorf, “Field Programmable Gate Arrays”, Wiley India, 1995.
2. Pak K. Chan and Samiha Mourad, “Digital Design using FPGA”, Pearson Low Price Edition, 2009.
3. Ian Grout, “Digital System Design using FPGA and CPLDs”, Elsevier Newnes, 2008.
4. Wayne Wolf, “FPGA based System Design”, Prentice Hall Modern Semiconductor Design Series, 2004.
Pre-requisites for taking this course
Optimize Designs: Apply optimization techniques to improve the performance, area, and power
consumption of FPGA-based designs.
CO/PO MAPPING:
CO1 3 3 3 2 - - - - 2 - - 3 3 3
CO2 3 3 3 2 - - - - 2 - - - 3 3
CO3 3 3 3 - - - - - - - - - 3 3
CO4 3 3 3 2 - - - - - - - 3 3 3
CO5 3 3 3 2 - - - - 2 - - 3 3 3
I. CONCEPT CLASS
LESSON PLAN
1,2   Xilinx XC3000 Architecture; Configurable Logic Block   CC   2   10/08/24, 12/08/24   11-15   1   L/CB
3,4   I/O Blocks; Programmable Interconnect   CC   2   14/08/24, 17/08/24   17-21   1   L/CB
5,6   XC4000 Architecture; Configurable Logic Block   CC   2   19/08/24, 19/08/24   22-24   1   L/CB
UNIT-III
1. TECHNICAL TERMS
Xilinx XC3000 Architecture
& Configurable Logic Block
S.No Time Structure
1. 2 mins Attendance
2. 2 mins Pranayama
3. 3 mins Introduction to Xilinx XC3000 Architecture
What are the features of XC3000?
What is the use of the Logic Block?
8. 3 mins Summarizing
Introduction of Programmable logic block Architectures
XILINX XC3000 ARCHITECTURE
Introduced in 1987/88, XC3000 is the industry’s most successful family of FPGAs, with over 10 million
devices shipped. In 1992/93, Xilinx introduced three additional families, offering more speed,
functionality, and a new supply-voltage option.
There are now five distinct family groupings within the XC3000 class of LCA devices.
• XC3000 Family
• XC3000A Family (use for new designs)
• XC3000L Family (use for new designs)
• XC3100 Family
• XC3100A Family (use for new designs)
All five families share a common architecture, development software, design and programming
methodology, and also common package pin-outs. An extensive Product Description covers these
common aspects (page 2-99), together with information for the individual product families.
XC3000 Family
The basic XC3000 family forms the cornerstone for the rest of the XC3000 class of devices.
XC3000A Family
The XC3000A is an enhanced version of the basic XC3000 family, featuring additional interconnect
resources and other user-friendly enhancements. The ease of use of the XC3000A family makes it the
obvious choice for all new designs that do not require the speed of the XC3100 or the 3-V operation of the
XC3000L.
XC3000L Family
The XC3000L is identical in architecture and features to the XC3000A family, but operates at a
nominal supply voltage of 3.3 V. The XC3000L is the right solution for battery-operated and
low-power applications.
XC3100 Family
The XC3100 is a performance-optimized relative of the basic XC3000 family. While both families
are bit stream and footprint compatible, the XC3100 family extends toggle rates to 270 MHz and
in-system performance to 80 MHz. The XC3100 family also offers one additional array size, the
XC3195. The XC3100 is best suited for designs that require the highest clock speed or the shortest
net delays.
XC3100A Family
The XC3100A combines the enhanced feature set of the XC3000A with the performance of the XC3100.
It offers the highest functionality, speed and capacity of all XC3000 families.
The figure below illustrates the relationships between the families. Compared to the original
XC3000 family, XC3000A offers additional functionality and, coming soon, increased speed. The
XC3000L family offers the same additional functionality, but reduced speed due to its lower supply
voltage of 3.3 V. The XC3100 family offers no additional functionality, but substantially higher
speed, and higher density with its new member, the XC3195.
ARCHITECTURE
[Figure: Logic Cell Array structure (package pins, 3-state buffers)]
The perimeter of configurable IOBs provides a programmable interface between the internal logic array and the
device package pins. The array of CLBs performs user-specified logic functions. The interconnect resources are
programmed to form networks, carrying logic signals among blocks, analogous to printed circuit board traces
connecting MSI/SSI packages.
The block logic functions are implemented by programmed look-up tables. Functional options are implemented
by program-controlled multiplexers. Interconnecting networks between blocks are implemented with metal
segments joined by program-controlled pass transistors.
These LCA functions are established by a configuration program which is loaded into an internal,
distributed array of configuration memory cells.
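The idea that a logic block is just a small memory addressed by its inputs can be sketched in a few lines of Python. This is an illustrative model only, not Xilinx software: the names `make_lut`, `config_mux`, `stored_bits` and `lut4` are hypothetical, and the 16-bit pattern stands in for the bits a configuration program would load.

```python
from itertools import product


def make_lut(truth_bits):
    """Build a look-up-table function from a list of stored output bits.

    truth_bits[i] is the output for the input combination that, read as a
    binary number with the first input as most significant bit, equals i.
    """
    n_inputs = (len(truth_bits) - 1).bit_length()
    if len(truth_bits) != 2 ** n_inputs:
        raise ValueError("truth table length must be a power of two")

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} inputs")
        index = 0
        for bit in inputs:
            index = (index << 1) | (1 if bit else 0)
        return truth_bits[index]

    return lut


def config_mux(select_bit, option_0, option_1):
    """Program-controlled multiplexer: select_bit is a stored configuration bit."""
    return option_1 if select_bit else option_0


# Hypothetical "configuration": 16 stored bits that make a 4-input table
# compute (a AND b) XOR (c OR d). A different bit pattern gives a different function.
stored_bits = [(a & b) ^ (c | d) for a, b, c, d in product((0, 1), repeat=4)]
lut4 = make_lut(stored_bits)

print(lut4(1, 1, 0, 0))  # (1 AND 1) XOR (0 OR 0), prints 1
```

Because the output is read from memory rather than computed through gates, changing the function only means loading different bits; the lookup itself stays the same.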
Configuration Memory
The static memory cell used for the configuration memory in the Logic Cell Array has been
designed specifically for high reliability and noise immunity. Integrity of the LCA device configuration
memory based on this design is assured even under adverse conditions. Compared with other programming
alternatives, static memory provides the best combination of high density, high performance, high reliability
and comprehensive testability. As shown in Figure 2, the basic memory cell consists of two CMOS inverters
plus a pass transistor used for writing and reading cell data. The cell is only written during configuration and
only read during read back. During normal operation, the cell provides continuous control and the pass
transistor is off and does not affect cell stability. This is quite different from the operation of conventional
memory devices, in which the cells are frequently read and rewritten.
[Figure: Configurable Logic Block, with data input (DI), logic variables, combinatorial function outputs F and G, and flip-flop outputs QX and QY]
[Figure: CLB function options (dual function of 4 variables, function of 5 variables) and a counter example with Count Enable, Parallel Enable, Clock and Terminal Count signals]
I/O Blocks &
Programmable Interconnect
I/O BLOCK
[Figure: Input/Output Block, showing the 3-state (output enable) control, output and input flip-flops, direct and registered inputs, TTL or CMOS input threshold, I/O pad, clocks OK and IK, and global reset]
Each IOB includes input and output storage elements and I/O options selected by configuration
memory cells. A choice of two clocks is available on each die edge. The polarity of each clock line (not each
flip-flop or latch) is programmable. A clock line that triggers the flip-flop on the rising edge is an active-Low Latch
Enable (Latch transparent) signal and vice versa. Passive pull-up can only be enabled on inputs, not on outputs.
All user inputs are programmed for TTL or CMOS threshold. The input-buffer portion of each IOB provides
threshold detection to translate external signals applied to the package pin to internal logic levels. The global
input-buffer threshold of the IOBs can be programmed to be compatible with either TTL or CMOS levels. The
buffered input signal drives the data input of a storage element, which may be configured as either a flip-flop or a
latch. The clocking polarity (rising/falling edge-triggered flip-flop, High/Low transparent latch) is programmable
for each of the two clock lines on each of the four die edges. Note that a clock line driving a rising edge-triggered
flip-flop makes any latch driven by the same line on the same edge Low-level transparent and vice versa (falling
edge, High transparent). All Xilinx primitives in the supported schematic-entry packages, however, are positive
edge-triggered flip-flops or High transparent latches. When one clock line must drive flip-flops as well as
latches, it is necessary to compensate for the difference in clocking polarities with an additional inverter either in
the flip-flop clock input or the latch-enable input. I/O storage elements are reset during configuration or by the
active-Low chip RESET input. Both direct input (from IOB pin I) and registered input (from IOB pin Q) signals
are available.
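The coupling between flip-flop edge and latch transparency on a shared clock line can be illustrated with a small behavioural model. This is a sketch under stated assumptions: the class names `ClockLine` and `InputStorage` are hypothetical, each clock line carries a single polarity bit, and both elements start with Q = 0 and a previous clock level of 0.

```python
class ClockLine:
    """One IOB clock line; its polarity is a single configuration bit."""

    def __init__(self, falling_edge=False):
        self.falling_edge = falling_edge


class InputStorage:
    """IOB input storage element, configured as a flip-flop or a latch."""

    def __init__(self, clock_line, as_latch):
        self.clock_line = clock_line
        self.as_latch = as_latch
        self.q = 0            # assumed reset state
        self._prev_clk = 0    # assumed initial clock level

    def update(self, clk, d):
        if self.as_latch:
            # Rising-edge line -> Low-transparent latch;
            # falling-edge line -> High-transparent latch.
            transparent_level = 1 if self.clock_line.falling_edge else 0
            if clk == transparent_level:
                self.q = d
        else:
            active_edge = (1, 0) if self.clock_line.falling_edge else (0, 1)
            if (self._prev_clk, clk) == active_edge:
                self.q = d
        self._prev_clk = clk
        return self.q


line = ClockLine(falling_edge=False)
flip_flop = InputStorage(line, as_latch=False)
latch = InputStorage(line, as_latch=True)

# (clock, data) samples applied to both elements on the same clock line.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0)]
trace = [(flip_flop.update(clk, d), latch.update(clk, d)) for clk, d in stimulus]
print(trace)
```

With the line set for rising-edge flip-flops, the latch follows its data only while the clock is Low, which is why an extra inverter is needed when both kinds of element must behave alike on one line.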
For reliable operation, inputs should have transition times of less than 100 ns and should not be left floating.
Floating CMOS input-pin circuits might be at threshold and produce oscillations. This can produce additional
power dissipation and system noise. A typical hysteresis of about 300 mV reduces sensitivity to input noise. Each
user IOB includes a programmable high-impedance pull-up resistor, which may be selected by the program to
provide a constant High for otherwise undriven package pins. Although the Logic Cell Array provides circuitry
for input protection against electrostatic discharge, normal CMOS handling precautions should be observed.
Flip-flop loop delays for the IOB and logic-block flip-flops are about 3 ns. This short delay provides good
performance under asynchronous clock and data conditions. Short loop delays minimize the probability of a
metastable condition that can result from assertion of the clock during data transitions. Because of the
short-loop-delay characteristic in the Logic Cell Array, the IOB flip-flops can be used to synchronize external signals
applied to the device. Once synchronized in the IOB, the signals can be used internally without further
consideration of their clock relative timing, except as it applies to the internal logic and routing-path delays. IOB
output buffers provide CMOS-compatible 4-mA source-or-sink drive for high fan-out CMOS or TTL-compatible
signal levels (8 mA in the XC3100 family). The network driving IOB pin O becomes the registered or
direct data source for the output buffer.
PROGRAMMABLE INTERCONNECT
Programmable-interconnection resources in the Logic Cell Array provide routing paths to connect inputs
and outputs of the IOBs and CLBs into logic networks. Interconnections between blocks are composed of a
two-layer grid of metal segments. Specially designed pass transistors, each controlled by a configuration bit, form
programmable interconnect points (PIPs) and switching matrices used to implement the necessary connections
between selected metal segments and block pins. Figure 7 is an example of a routed net. The XACT development
system provides automatic routing of these interconnections. Interactive routing (Editnet) is also available for
design optimization. The inputs of the CLBs or IOBs are multiplexers which can be programmed to select an
input network from the adjacent interconnect segments.
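How configuration bits turn a grid of metal segments into a routed net can be sketched as a small graph search. This is a simplified model, not the XACT router: segment names such as `h1` and `CLB_A.X` are made up for illustration, and each PIP is treated as a bidirectional switch between two segments, closed when its configuration bit is 1.

```python
from collections import defaultdict, deque


def routed_net(pips, config_bits, start):
    """Return every segment or pin electrically joined to `start`.

    pips        -- list of (segment_a, segment_b) pairs, one per PIP
    config_bits -- parallel list of 0/1 values; 1 turns the pass transistor on
    """
    adjacency = defaultdict(set)
    for (a, b), bit in zip(pips, config_bits):
        if bit:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = {start}
    queue = deque([start])
    while queue:
        segment = queue.popleft()
        for neighbour in adjacency[segment]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical fragment of the interconnect around two CLBs and one IOB.
pips = [
    ("CLB_A.X", "h1"),
    ("h1", "v1"),
    ("v1", "CLB_B.A"),
    ("h1", "v2"),
    ("v2", "IOB_3.O"),
]
config_bits = [1, 1, 1, 0, 0]

net = routed_net(pips, config_bits, "CLB_A.X")
print(sorted(net))
```

Only the PIPs whose bits are set take part in the net, so the same metal segments can serve a different connection pattern after the device is reconfigured.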
• Long lines (multiplexed busses and wide AND gates)
LONGLINES
The Long lines bypass the switch matrices and are intended primarily for signals that must travel a long
distance, or must have minimum skew among multiple destinations. Long lines, shown in Figure 13, run
vertically and horizontally the height or width of the interconnect area. Each interconnection column has three
vertical Long lines, and each interconnection row has two horizontal Long lines. Two additional Long lines
are located adjacent to the outer sets of switching matrices. In devices larger than the XC3020, two vertical
Long lines in each column are connectable half-length lines. On the XC3020, only the outer Long lines are
connectable half-length lines. Long lines can be driven by a logic block or IOB output on a
column-by-column basis. This capability provides a common low-skew control or clock line within each column of logic
blocks. Interconnections of these Long lines are shown in Figure 14. Isolation buffers are provided at each
input to a Long line and are enabled automatically by the development system when a connection is made.
A buffer in the upper left corner of the LCA chip drives a global net which is available to all K inputs of
logic blocks. Using the global buffer for a clock signal provides a skew-free, high fan-out, synchronized clock
for use at any or all of the IOBs and CLBs. Configuration bits for the K input to each logic block can select
this global line or another routing resource as the clock source for its flip-flops. This net may also be
programmed to drive the die edge clock lines for IOB use. An enhanced speed, CMOS threshold, direct
access to this buffer is available at the second pad from the top of the left die edge.
A buffer in the lower right corner of the array drives a horizontal Long line that can drive programmed
connections to a vertical Long line in each interconnection column. This alternate buffer also has low skew
and high fan-out. The network formed by this alternate buffer’s Long lines can be selected to drive the K
inputs of the CLBs. CMOS threshold, high speed access to this buffer is available from the third pad from
the bottom of the right die edge.
DIRECT INTERCONNECT
Direct interconnect, shown in Figure 11, provides the most efficient implementation of networks between
adjacent CLBs or I/O Blocks. Signals routed from block to block using the direct interconnect exhibit
minimum interconnect propagation delay and use no general interconnect resources. Where logic blocks are
adjacent to IOBs, direct connect is provided alternately to the IOB inputs (I) and outputs (O) on all four edges
of the die. The right edge provides additional direct connects from CLB outputs to adjacent IOBs. Direct
interconnections of IOBs with CLBs are shown in Fig.
XC4000 Architecture
& Configurable Logic Block
S.No Time Structure
1. 2 mins Attendance
2. 2 mins Pranayama
3. 3 mins Introduction to Xilinx XC4000 Architecture
What are the features of XC4000?
What is the use of the Logic Block?
8. 3 mins Summarizing
Introduction of Programmable logic block Architectures
ARCHITECTURE
The XC4000 families achieve high speed through advanced semiconductor technology and through
improved architecture, and support system clock rates of up to 50 MHz. Compared to older Xilinx FPGA
families, the XC4000 families are more powerful, offering on-chip RAM and wide-input decoders. They are
more versatile in their applications, and design cycles are faster due to a combination of increased routing
resources and more sophisticated software. And last, but not least, they more than double the available
complexity, up to the 20,000-gate level.
Xilinx high-density user-programmable gate arrays include three major configurable elements:
configurable logic blocks (CLBs), input/output blocks (IOBs), and interconnections. The CLBs provide the
functional elements for constructing the user’s logic. The IOBs provide the interface between the package
pins and internal signal lines. The programmable interconnect resources provide routing paths to connect the
inputs and outputs of the CLBs and IOBs onto the appropriate networks. Customized configuration is
established by programming internal static memory cells that determine the logic functions and
interconnections implemented in the LCA device. The first generation of LCA devices, the XC2000 family, was
introduced in 1985. It featured logic blocks consisting of a combinatorial function generator capable of
implementing 4-input Boolean functions and a single storage element. The XC2000 family has two members
ranging in complexity from 800 to 1500 gates. In the second-generation XC3000 LCA devices, introduced in
1987, the logic block was expanded to implement wider Boolean functions and to incorporate a second
flip-flop in each logic block. Today, the XC3000 devices range in complexity from 1,300 to 10,000 usable gates.
They have a maximum guaranteed toggle frequency ranging from 70 to 270 MHz, equivalent to maximum
system clock frequencies of up to 80 MHz. The third generation of LCA devices further extends this
architecture with a yet more powerful and flexible logic block. I/O block functions and interconnection
options have also been enhanced with each successive generation, further extending the range of applications
that can be implemented with an LCA device.
CONFIGURABLE LOGIC BLOCK
[Figure 1: Principal elements of the XC4000 CLB, showing function-generator inputs G1-G4, DIN, the clock input and control inputs C1-C4]
A number of architectural improvements contribute to the increased logic density and performance levels
of the XC4000 families. The most important one is a more powerful and flexible CLB surrounded by a
versatile set of routing resources, resulting in more “effective gates per CLB.” The principal CLB
elements are shown in Figure 1. Each new CLB also packs a pair of flip-flops and two independent
4-input function generators. The two function generators offer designers plenty of flexibility because
most combinatorial logic functions need fewer than four inputs, so the design software can treat each
function generator independently, thus improving cell usage.
Thirteen CLB inputs and four CLB outputs provide access to the function generators and flip-flops. More than
double the number available in the XC3000 families, these inputs and outputs connect to the programmable
interconnect resources outside the block. Four independent inputs are provided to each of two function generators
(F1 – F4 and G1 – G4). These function generators, whose outputs are labeled F' and G', are each capable of
implementing any arbitrarily defined Boolean function of their four inputs. The function generators are
implemented as memory look-up tables; therefore, the propagation delay is independent of the function being
implemented. A third function generator, labeled H', can implement any Boolean function of its three inputs: F' and
G' and a third input from outside the block (H1). Signals from the function generators can exit the CLB on two
outputs; F' or H' can be connected to the X output, and G' or H' can be connected to the Y output. Thus, a CLB can
be used to implement any two independent functions of up to four variables, or any single function of five
variables, or any function of four variables together with some functions of five variables, or it can implement even
some functions of up to nine variables. Implementing wide functions in a single block reduces both the number of
blocks required and the delay in the signal path, achieving both increased density and speed. Each flip-flop can be
triggered on either the rising or falling clock edge. The source of a flip-flop data input is programmable: it is driven
either by the functions F', G', and H', or the Direct In (DIN) block input. The flip-flops drive the XQ and YQ CLB
outputs. Multiplexers in the CLB map the four control inputs, labeled C1 through C4 in Figure 1, into the four
internal control signals (H1, DIN, S/R, and EC) in any arbitrary manner. The flexibility and symmetry of the CLB
architecture facilitates the placement and routing of a given application. Since the function generators and flip-flops
have independent inputs and outputs, each can be treated as a separate entity during placement to achieve high
packing density. Inputs, outputs, and the functions themselves can freely swap positions within a CLB to avoid
routing congestion during the placement and routing operation.
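One way to see why a single CLB covers any function of five variables is Shannon expansion on the fifth input: F' holds the function with that input at 0, G' holds it with that input at 1, and H' selects between them using H1. The sketch below assumes this mapping (the data sheet text does not spell it out), and the helper names `lut`, `clb_five_input`, `parity5` and `mixed5` are hypothetical.

```python
from itertools import product


def lut(truth_bits):
    """Look-up table: the inputs form an index (first input = MSB) into truth_bits."""
    def f(*inputs):
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return truth_bits[index]
    return f


def clb_five_input(func5):
    """Map a 5-variable function onto F', G' and H' by Shannon expansion on e."""
    # F' and G' see the same four variables; they differ in the assumed value of e.
    f_bits = [func5(a, b, c, d, 0) for a, b, c, d in product((0, 1), repeat=4)]
    g_bits = [func5(a, b, c, d, 1) for a, b, c, d in product((0, 1), repeat=4)]
    # H' takes (F', G', H1) and passes F' when H1 = 0, G' when H1 = 1.
    h_bits = [(g if h1 else f) for f, g, h1 in product((0, 1), repeat=3)]
    F, G, H = lut(f_bits), lut(g_bits), lut(h_bits)

    def clb(a, b, c, d, e):
        return H(F(a, b, c, d), G(a, b, c, d), e)

    return clb


def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


def mixed5(a, b, c, d, e):
    return (a & (1 - b)) | (c & d & e)


clb_parity = clb_five_input(parity5)
clb_mixed = clb_five_input(mixed5)
print(clb_parity(1, 0, 1, 1, 0))  # parity of three ones, prints 1
```

If F' and G' are instead given separate inputs, H' can combine two different four-variable results with H1, which is how some (not all) functions of nine variables fit, for example a nine-input AND.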
I/O Blocks &
Programmable Interconnect
I/O BLOCK
User-configurable IOBs provide the interface between external package pins and the internal logic Fig.
Each IOB controls one package pin and can be defined for input, output, or bidirectional signals. Two
paths, labeled I1 and I2, bring input signals into the array. Inputs are routed to an input register that can
be programmed as either an edge-triggered flip-flop or a level-sensitive transparent latch. Optionally, the
data input to the register can be delayed by several nanoseconds to compensate for the delay on the clock
signal, which must first pass through a global buffer before arriving at the IOB. This eliminates the
possibility of a data hold-time requirement at the external pin. The I1 and I2 signals that exit the block
can each carry either the direct or registered input signal.
[Figure: XC4000 Input/Output Block, with Out and OE signals]
Output signals can be inverted or not inverted, and can pass directly to the pad or be stored in an
edge-triggered flip-flop. Optionally, an output enable signal can be used to place the output buffer in a
high-impedance state, implementing 3-state outputs or bidirectional I/O. Under configuration control, the output
(OUT) and output enable (OE) signals can be inverted, and the slew rate of the output buffer can be reduced to
minimize power bus transients when switching non-critical signals. Each XC4000-family output buffer is
capable of sinking 12 mA; two adjacent output buffers can be wire-ANDed externally to sink up to 24 mA. In
the XC4000A and XC4000H families, each output buffer can sink 24 mA. There are a number of other
programmable options in the IOB. Programmable pull-up and pull-down resistors are useful for tying unused
pins to VCC or ground to minimize power consumption. Separate clock signals are provided for the input and
output registers; these clocks can be inverted, generating either falling-edge or rising-edge triggered flip-flops.
As is the case with the CLB registers, a global set/reset signal can be used to set or clear the input and output
registers whenever the RESET net is active.
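The direct (unregistered) output path described above can be summarised as a small truth function. This is a minimal sketch with assumptions: `iob_output` is a hypothetical name, OE is taken as active High before any programmed inversion, and `None` stands for the high-impedance state.

```python
def iob_output(out, oe, invert_out=False, invert_oe=False):
    """Return the level driven onto the pad: 0, 1, or None when high-impedance.

    invert_out and invert_oe model the configuration bits that optionally
    invert the OUT and OE signals. OE is assumed active High (an assumption,
    not stated in the text).
    """
    enabled = bool(oe) != invert_oe
    if not enabled:
        return None  # buffer is 3-stated; the pad is free for input use
    return int(bool(out) != invert_out)


print(iob_output(1, 1))                   # driven High, prints 1
print(iob_output(1, 0))                   # 3-stated, prints None
print(iob_output(1, 1, invert_out=True))  # inverted data, prints 0
```

Returning `None` for the disabled case mirrors how a bidirectional pin works: when the buffer is off, the pad level is set by whatever drives it externally, not by OUT.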
PROGRAMMABLE INTERCONNECT
All internal connections are composed of metal segments with programmable switching points to implement
the desired routing. An abundance of different routing resources is provided to achieve efficient automated
routing. The number of routing channels is scaled to the size of the array; i.e., it increases with array size. In
previous generations of LCAs, the logic-block inputs were located on the top, left, and bottom of the block;
outputs exited the block on the right, favoring left-to-right data flow through the device. In the XC4000
families, the CLB inputs and outputs are distributed on all four sides of the block, providing
additional routing flexibility (Figure 6). In general, the entire architecture is more symmetrical and regular
than that of earlier generations, and is more suited to well-established placement and routing algorithms
developed for conventional mask-programmed gate-array design.
There are three main types of interconnect, distinguished by the relative length of their
segments: single-length lines, double-length lines, and Long lines. Note: The number of routing
channels shown in Figures 6 and 9 are for illustration purposes only; the actual number of routing
channels varies with array size. The routing scheme was designed for minimum resistance and
capacitance of the average routing path, resulting in significant performance improvements. Compared
to the previous generations of LCA architectures, the number of possible connections through the
Switch Matrix has been reduced. This decreases capacitive loading and minimizes routing delays, thus
increasing performance. However, a much more versatile set of connections between the single-length
lines and the CLB inputs and outputs more than compensates for the reduction in Switch Matrix options,
resulting in overall increased routability.
[Figure: CLB with inputs and outputs on all four sides (F1-F4, G1-G4, C1-C4, K, X, Y, YQ), surrounded by Switch Matrices]
The double-length lines (Figure 8) consist of a grid of metal segments twice as long as the single-length lines;
i.e., a double-length line runs past two CLBs before entering a Switch Matrix. Double-length lines are grouped
in pairs with the Switch Matrices staggered so that each line goes through a Switch Matrix at every other CLB
location in that row or column.
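The staggering of a double-length pair can be pictured by listing, for each line of the pair, the CLB positions at which it enters a Switch Matrix. The sketch below is illustrative only: the function name is hypothetical, positions are numbered from 0 along one row, and which line of the pair takes the even positions is an arbitrary assumption.

```python
def enters_switch_matrix(line_in_pair, clb_position):
    """True if the given line (0 or 1) of a double-length pair enters the
    Switch Matrix at this CLB position along the row or column."""
    if line_in_pair not in (0, 1):
        raise ValueError("a pair has exactly two lines, numbered 0 and 1")
    return clb_position % 2 == line_in_pair


positions = range(6)
pattern = {
    line: [p for p in positions if enters_switch_matrix(line, p)]
    for line in (0, 1)
}
print(pattern)  # prints {0: [0, 2, 4], 1: [1, 3, 5]}
```

Each line skips every other Switch Matrix, yet between them the two lines of a pair offer a switching point at every CLB location.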