programmable-logic
$P_1 = \bar{A}B\bar{C} + \bar{A}BC + A\bar{B}C + ABC \quad (1)$

or, equivalently, pick the rows where P = 0 and multiply the corresponding
maxterms, to form the product-of-sums,

This could be expressed in an 8-element LUT ($8 = 2^3$, where 3 is the number
of inputs) with all elements zero except for the element at address 6 (110 in
binary) if S1 was the MSB of the address.
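As a rough illustration of how a sum-of-products maps onto a LUT, the sketch below tabulates Equation (1) into an 8-element table and then rebuilds the same function as a product-of-sums from the rows where the output is 0. It is a Python model rather than HDL; the function names and the choice of A as the address MSB are assumptions made for the example.

```python
from itertools import product


def p1(a, b, c):
    """Equation (1): OR of the four minterms (inputs are 0 or 1)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & b & nc) | (na & b & c) | (a & nb & c) | (a & b & c)


# Fill the 8-element LUT. Assumed address ordering: A is the MSB, C the LSB.
lut = [p1(a, b, c) for a, b, c in product((0, 1), repeat=3)]


def lut_lookup(a, b, c):
    """Evaluate the function the way a LUT does: by indexing, with no gates."""
    return lut[(a << 2) | (b << 1) | c]


def p1_pos(a, b, c):
    """Product-of-sums form: one maxterm for every row where the LUT holds 0."""
    result = 1
    for addr, value in enumerate(lut):
        if value == 0:
            ra, rb, rc = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
            # This maxterm is 0 only when (a, b, c) equals its own row.
            result &= (a ^ ra) | (b ^ rb) | (c ^ rc)
    return result
```

With this ordering the table holds ones at addresses 2, 3, 5 and 7, and the three forms (gates, lookup, product-of-sums) agree on all eight input combinations.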
UW-Physics PHYS623 Introduction to Programmable Logic Devices
Figure 1: Typical PAL internal structure. The small shaded dots
interconnecting the rows and columns are (re-)programmable switches and allow
arbitrary product minterms to be presented to the summing OR gate at right.

2 Programmable Logic Types

2.1 CPLDs

Programmable array logic devices (PALs) implement the sum-of-products using
the structure shown in Figure 1. External and feedback inputs are overlaid on
an array of AND gates. Fuse links connect these inputs into the AND gates and
can be programmed open or closed. Arbitrary sums of products are then formed
using the multiple-input OR gates. The PAL may also contain a number of
registers on the outputs to implement sequential logic.

Complex programmable logic devices, or CPLDs, extended the registered
sum-of-products structure of PAL devices with more flexibility, called the
result a macrocell, and then packed many macrocells, connected by an intricate
network of interconnects, into a single IC. CPLDs replaced the fuses with
switches whose state is held in nonvolatile flash memory: once programmed, the
switch states persist across power-down and reset conditions, which is a nice
convenience. The flash memory can be reprogrammed thousands of times.

CPLDs are typically distinguished from the related logic device to be taken up
shortly, the field-programmable gate array, by their non-volatility,
relatively low cost, clock speeds, logic density, and power consumption. A
popular CPLD, the

2.2 FPGAs

The FPGA, short for field-programmable gate array, has become a popular
solution for digital designers in the past two decades due to its flexibility
and power. Because the FPGA is totally reprogrammable, it is possible to
purchase generic PCBs with hard-wired on-board peripherals plus expansion
headers that connect to off-board custom hardware, and to develop complex
systems with minimal hardware development. The programmable logic can then be
configured with firmware[1] images which define the behavior of the digital
system.

[1] Note, firmware is also often used when discussing the software image
running on an embedded microcontroller. I normally refer to microprocessor and
microcontroller instruction bitstreams as software to disambiguate the various
image files in cases where a soft IP microprocessor core runs on an FPGA and
needs its own executable image or images.

FPGAs, unlike CPLDs, hold the configuration that defines the implemented logic
in static RAM (SRAM) rather than flash. Because SRAM is volatile, the devices
must be reconfigured each time they are powered on or reset. Despite this
inconvenience, the much higher densities and speeds which can be achieved with
FPGAs relative to CPLDs have given them a market edge.

2.2.1 Logic Cells

The fundamental element of the FPGA is the logic element, or LE (Altera);
configurable logic block, or CLB (Xilinx); or programmable logic block, or PLB
(Lattice Semiconductor). They all function similarly, though they differ in
the details. A schematic of the LEs found in Altera's Cyclone IV series of
FPGAs is shown in Figure 2.2.1.

Signals from the global interconnect matrix arrive from the left and exit at
right. Signals at top and bottom are local (more on this later). The Cyclone
IV LEs each contain a single 4-input (i.e., 16-element) LUT and a D register.
The LUT may be configured either as a single 16-element LUT or in a special
arithmetic mode where the LUT is split
into two 8-element LUTs: in addition to the normal output, which is derived
from the upper half of the LUT, there is a fast carry output derived from the
lower half. This is useful for implementing adders and counters. It should be
noted that the carries propagate along low-propagation-delay paths, which
increases the maximum speed of adders and counters. The shaded muxes allow
configuration-time selection of signal paths (i.e., these muxes cannot be
changed dynamically by logic). The D register has a dedicated clock enable
(CE) input. Clock enables are an important technique employed in FPGA designs
to control clock skew. More on this later (see Section 5.1).

In Cyclone IV devices the LEs are arranged in groups of 16 in LABs, or logic
array blocks, which contain, in addition to the LEs, local interconnect lines
which offer low-skew connections to LEs within the same LAB. The LABs are then
packed in array fashion onto the die. The layout of the smallest member of the
Cyclone IV family, the EP4CE6E22 with only 6k LEs, is shown in Figure 3.

2.2.2 Other Resources

In addition to LEs, the FPGA fabric contains other special functions of
general use:

• Random access memory (RAM) blocks. On Altera devices these are arranged in
  9 kbit blocks called M9K blocks. Xilinx calls these BlockRAM and each block
  contains 36 kbit on the newest generation "7 series" FPGAs. In both cases
  the RAMs may be arranged in varying word lengths. In addition to the
  dedicated RAM blocks, the LUTs may also be configured as distributed RAM in
  cases where small blocks or extra flexibility are needed.
• Hardware multipliers (DSP slices);
• Hard processor cores: dedicated resources which implement high-performance
  ARM microprocessors are offered by Altera (SoC variants of the Cyclone,
  Arria, and Stratix families) and Xilinx (Zynq);
• Dedicated DDR physical interfaces;
• Digital clock management tiles (fractional-N PLLs with precision delay
  taps);
• Gigabit transceivers for high-speed serial interfaces;
• PCIe hard endpoints.

2.2.3 Clock Networks

Clocks are handled differently from normal logic in FPGA designs: they are
routed along low-skew networks, interconnect freeways, while normal logic must
travel the surface streets.
Figure 3: Structure of Altera EP4CE6E22 6k LE Cyclone IV FPGA. Each small rectangular tile in the matrix is a logic array
block (LAB) which itself contains 16 logic elements (LEs) and local interconnects (zoom at right). The greyed-out tiles are
unused. The two light blue strips are M9K memory blocks and the light red stripe contains the DSP multiplier blocks. Two
PLLs are located in opposite corners of the die. Around the periphery are located the various I/O banks. Each I/O bank
is capable of supporting several signaling standards; however, because the I/Os within each bank share a power rail,
designers are constrained to common signaling within a bank.
There are only a few entry points to these preferred routes where normal logic
can access the clock networks. Because logic signals can accumulate a
considerable delay in travelling these routes, it is strongly discouraged in
firmware designs to connect the output of LUTs or registers to clock inputs.
The standard workaround for instances where gated clocks or clock dividers
would normally be used is to use clock enables. These are described further in
Section 5.1.

3 Firmware Design Tutorial

Firmware development using a hardware description language (HDL) offers many
advantages over schematic entry; however, it presents a steep learning curve.
We will plunge in, tutorial fashion: after giving some high-level guidance,
the reader is presented with familiar examples of logic gates and shown how
they are modeled in the two common HDLs, in the hope that firmware design
patterns will be recognized and generalized. Then the important subject of
simulation testbenches will be used to demonstrate good design practice (test
early, test often) and explore additional features of HDLs.

3.1 Three Levels of Abstraction

There are three levels on which a firmware designer can, and ought to, regard
the firmware design. Starting with the most concrete, there is the technology
level: at this level the design exists mapped onto the various logic elements
and other resources on the PLD. It is probably impossible to comprehend an
entire design of even middling complexity at this level; nevertheless, it is
often necessary for reasons of optimization to examine critical path elements
at this low level. Also, in order to understand the reports generated by the
logic synthesis tools, resource utilization for example, some familiarity with
the technology level is useful.

At the next level, detaching itself slightly from the reality of the
underlying logic cells, the design exists on the RTL, or register transfer
level, sometimes called the gate level. A designer entering the firmware
design in schematic capture mode inputs directly at this level. It is
considerably easier to navigate a design here: hierarchical structure exists
or can exist at this level, so generic (e.g., counters) and user-defined
composed logic blocks are used to improve design readability. The design,
while not a literal representation of the low-level structure, is a
functionally equivalent view and, moreover, is able to be realized on the PLD,
unlike some constructions found at the higher level. This may seem an odd
comment to make: what's the point of designing firmware that cannot be
implemented in hardware? The answer should become clearer after discussing the
highest level of design abstraction.

This level is called the behavioral modeling level and is described using one
or more text-based HDLs as opposed to graphical schematic capture. The
designer specifies how the logic signals are to behave and how they are linked
together but does not explicitly use gates to do so. Statements to the effect
of "make a group of signals C which are the binary sum of signals A and B" or
"at the rising edge of the clock signal set the state of a state machine to
some specified state if the value of an input signal is high, otherwise remain
in the current state" are how the design is expressed in behavioral modeling.
To be quite honest, designers are free to model at the gate or even technology
level: as will soon be shown, a basic modeling concept present in all HDLs is
the module, which allows encapsulation of firmware sub-circuits in boxes with
input and output ports. It is entirely possible[2] to define boolean gates or
D or JK flip-flops as firmware modules, or even to use vendor-supplied modules
that implement the primitive logic cells, and connect these together in the
firmware source text.

[2] and occasionally necessary for design optimization

In my experience, design expression at the behavioral level in an HDL is the
most efficient method of design entry, and I see the design globally as a
collection of firmware modules. However, within a given firmware module I
always seem to have at least a running guess of how the synthesizer will
render the RTL.

Coming back briefly to the comment about behavioral modeling statements that
are not synthesizable (i.e., unable to be realized in hardware), one may well
guess that there are applications of behavioral modeling beyond firmware
implementation. HDLs are also used for documenting and simulating digital
circuits. Only a subset of each language is used to implement firmware, and
this is a particularly steeply sloped section of the learning curve for HDLs.
It is not always clear which constructs will synthesize; worse, it is not
consistent across different toolchains. An intuitive understanding of the RTL
representation of the behavioral model helps: if a statement would take a
great number of logic gates to implement, be on guard that it may not
synthesize. Having said that, most synthesizers will infer arbitrarily long
adders from a single line of HDL code which adds two signals. Now that many
FPGAs have dedicated hardware multipliers, inference of DSP multiplier blocks
is often supported by synthesis tools. Recommendation: invest time reading the
documentation of each tool to see what it will do.

3.2 VHDL and Verilog

VHDL (Very High Speed Integrated Circuit HDL) and Verilog are the two most
popular HDLs in use currently. These
languages have evolved over time; again, read the tool documentation to see
what standard is supported. Despite an admitted personal bias toward VHDL,
there is not a right choice nor a wrong one, and each has strengths and
weaknesses: VHDL supports some high-level constructs which Verilog does not,
but Verilog syntax is much more succinct. Verilog syntax is close to C and
thus more intuitive to a wider group of people, while VHDL has an Ada-like
syntax which takes some getting used to. SystemVerilog, with its Verilog
syntax and support for more abstraction, is now supported by major toolchains
and so may be the right choice for forward-looking designers. To give a flavor
of both VHDL and Verilog, the basic introductory portions of this tutorial on
HDLs will include and compare both languages. However, the later sections will
present only VHDL.

3.3 Entities and Modules

VHDL and Verilog designs are entered into text files with extension .vhd or
.vhdl for VHDL, and .v for Verilog files. By convention one file holds one
design unit. The basic design unit is the entity (VHDL) or module (Verilog).
All firmware designs start with the top module, whose I/O ports then
correspond to the physical I/O pins of the IC. Modules are then hierarchically
arranged to arbitrary depth.

This tutorial starts with a familiar circuit element, the 2-input NAND gate of
Figure 4.

In the VHDL version, the first two lines inform the synthesizer that the
std_logic_1164 package of the IEEE library must be loaded to gain access to
the std_logic type. VHDL contains many built-in types; however, the type most
used for logic synthesis, std_logic, is contained in an add-on (but
omnipresent) library. VHDL makes a distinction between module interface and
implementation and so requires the designer to declare the input and output
ports in the entity declaration between lines 4 and 6, while putting the
implementation in an architecture block, here between lines 8 and 11.

     1  library ieee;
     2  use ieee.std_logic_1164.all;
     3
     4  entity my_nand is
     5    port (a, b : in std_logic; q : out std_logic);
     6  end entity my_nand;
     7
     8  architecture behavioral of my_nand is
     9  begin
    10    q <= a nand b;
    11  end architecture behavioral;

3.4 Modeling Sequential Logic

While similar concepts exist in both languages for concurrent (combinational)
and sequential logic, Verilog and VHDL differ substantially in how they handle
stateful and stateless signals. Before tackling this more difficult topic,
let's first cover each language's syntax for dealing with sequential events:
Verilog's always blocks and VHDL's process statements, which are of similar
nature. Again, we take a known example from the real digital world: a JK
flip-flop, Figure 5.
    12      else begin
    13        case ({j, k})
    14          2'b00 : ;
    15          2'b01 : q <= 1'b0;
    16          2'b10 : q <= 1'b1;
    17          2'b11 : q <= ~q;
    18          default : ;
    19        endcase
    20      end
    21    end
    22  endmodule

The module's input and output ports are enumerated in lines 2-6. It is more
conventional to write port lists in this manner, one port per line, as opposed
to several per line, unless the list fits on a single line. Note that the
output port has

    16  begin
    17    q <= q_int;
    18    process (clk, rst)
    19    begin
    20      if rst = '1' then
    21        q_int <= '0';
    22      elsif rising_edge(clk) then
    23        case std_logic_vector'(j & k) is
    24          when "00" => null;
    25          when "01" => q_int <= '0';
    26          when "10" => q_int <= '1';
    27          when "11" => q_int <= not q_int;
    28          when others => null;
    29        end case;
    30      end if;
    31    end process;
    32  end architecture behavioral;
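The case statements in the two listing fragments above encode the same JK truth table: hold, reset, set, toggle. As a quick reference, here is a small Python model of that behavior. It is only a sketch for checking expectations against a simulator waveform; the class and method names are invented for this example, and the reset is modeled as a separate call on the assumption that rst is asynchronous and active high, as in the VHDL fragment.

```python
class JKFlipFlop:
    """Behavioral model of a rising-edge JK flip-flop with reset to 0."""

    def __init__(self):
        self.q = 0

    def reset(self):
        """Model of the rst = '1' branch: force the output low."""
        self.q = 0

    def clock(self, j, k):
        """Apply one rising clock edge with inputs j and k; return the new q."""
        if (j, k) == (0, 1):
            self.q = 0          # reset
        elif (j, k) == (1, 0):
            self.q = 1          # set
        elif (j, k) == (1, 1):
            self.q ^= 1         # toggle
        # (0, 0): hold the previous state
        return self.q


ff = JKFlipFlop()
# set, hold, toggle, toggle, reset
trace = [ff.clock(j, k) for j, k in [(1, 0), (0, 0), (1, 1), (1, 1), (0, 1)]]
```

Here trace records the value of q after each clock edge, which can be compared edge by edge with the q output seen in simulation.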
3.5.1 Verilog Testbench

Testbenches are special examples of modules with no inputs and outputs: they
exist only in their own isolated simulation universes. The first line of the
testbench is a directive telling the simulator what the fundamental time unit
of the simulation is. This becomes important for understanding the times in
the # delay statements on lines 11 to 23. Otherwise the file is a normal
Verilog file, with the caveat that it will of course not produce hardware
files.

Module local nets and registers are declared in lines 5–6. The parameter is a
constant giving the clock period. It is defined here so that the clock
half-period delays, described in the very next paragraph, are set in one
place: if the clock period changes, it can be changed in just one place.

Clocks are simulated using continuously retriggered always blocks, seen in
lines 9–13. At the beginning of the block, clk is set to '0'. Line 11 contains
a delay statement, only useful in simulation, where the number after the #
specifies how many time units, as defined by the `timescale directive, should
elapse before the statement is executed. It waits one half clock period, sets
the clock high, then waits another half clock period on line 12 before
repeating the endless loop.

     1  `timescale 1 ns / 1 ps
     2
     3  module jkff_tb();
     4    parameter CLKPER = 10;
     5    reg clk, j, k, rst;
     6    wire q;
     7
     8    // Clock generator
     9    always begin
    10      clk = 1'b0;
    11      #(CLKPER/2) clk = 1'b1;
    12      #(CLKPER/2);
    13    end
    14
    15    // Stimulus generator
    16    initial begin
    17      {j, k} = 2'b00;
    18      rst = 1'b1;
    19      #10 k = 1'b1;
    20      #30 rst = 1'b0;
    21      #50 j = 1'b1;
    22      #412 rst = 1'b1;
    23      #32 rst = 1'b0;
    24    end
    25
    26    // DUT
    27    jkff jk_inst (.clk(clk), .rst(rst),
    28                  .j(j), .k(k), .q(q));
    29  endmodule

The initial block, lines 16–24, is similar to the always block but is executed
only once. It is used here to define stimuli for the device under test (DUT).
It is also used in synthesizable code to set initial conditions for registers,
memories, etc.

The JK flip-flop DUT is placed in lines 27–28. The ports are mapped using the
syntax .port_name(net), where port_name is the name of the port in the module
definition, and net is the name of the wire or register in the current module
to connect to that port.

3.5.2 VHDL Testbench

Similar to Verilog testbenches, VHDL testbenches are entities with no ports.
VHDL entity local signals are declared in the lines between the architecture
keyword and the begin, here lines 8–21. The component declaration, lines
14–21, is needed to declare to VHDL that there exists an entity or module
somewhere with those I/O ports. Note that the module could be written in
Verilog.

     1  library ieee;
     2  use ieee.std_logic_1164.all;
     3
     4  entity jkff_tb is
     5  end entity jkff_tb;
     6
     7  architecture simulation of jkff_tb is
     8    signal clk : std_logic;
     9    signal rst : std_logic;
    10    signal j, k : std_logic;
    11    signal q : std_logic;
    12    constant CLKPER : time := 10 ns;
    13
    14    component jkff is
    15      port (
    16        clk : in std_logic;
    17        rst : in std_logic;
    18        j : in std_logic;
    19        k : in std_logic;
    20        q : out std_logic);
    21    end component jkff;
    22  begin
    23
    24    -- clock generator
    25    clkgen : process
    26    begin
    27      clk <= '0';
    28      wait for CLKPER/2;
    29      clk <= '1';
    30      wait for CLKPER/2;
    31    end process clkgen;
    32
    33    stimuli : process
    34    begin
    35      rst <= '0', '1' after 15 ns, '0' after 40 ns;
    36      j <= '0', '1' after 60 ns;
    37      k <= '0', '1' after 80 ns;
    38      wait;
    39    end process;
    40
    41    jk_inst : jkff port map(clk=>clk, rst=>rst,
    42                            j=>j, k=>k, q=>q);
    43
    44  end architecture simulation;

Lines 25–31 mirror the Verilog clock generation using VHDL wait statements.
The stimuli are generated in another process which terminates in a wait that
halts the process, functionally equivalent to the Verilog initial block. Note
that in VHDL the delays in a signal assignment are all measured from the time
the assignment executes (the assignments act like non-blocking assignments in
Verilog), so rst goes high at 15 ns and then back low at 40 ns, the rising
edge of j happens at 60 ns, and that of k at 80 ns. This is in contrast to the
cumulative delays of Verilog delay statements. Not surprisingly, VHDL wait
statements and delayed assignments are not synthesizable.

4 More on VHDL

N.B.: VHDL is not case sensitive. Verilog is.

4.1 Data Types

VHDL is a strongly typed language and provides many types, each one serving a
specific purpose. Data types specify the nature of signals and variables (sec.
4.3).

4.1.1 Scalar Types

Booleans. Given the duality, noted earlier, between digital logic and Boolean
algebra, one might imagine the fundamental logic type to be a boolean. A
boolean type does exist, with possible values false and true; however, since
logic signals tend to come in groups of many bits, and character strings are
less bulky typographically than boolean arrays, booleans are not the dominant
type. Nonetheless, they are common and quite useful.

Example signal declaration (with initial value):

    signal triggered : boolean := false;

Bits and Standard Logic. Bit types take on (character) values '0' and '1' and
are easier to gang into bit vectors. However, the std_logic type is normally
used in preference to bit types because logic synthesis tools need to model
logic states other than LO and HI: other possible states are tri-stated (high
impedance), and weak pullups or pulldowns, to name a few. To encompass models
with these states, the IEEE has developed a standard package (std_logic_1164)
which defines the std_logic type. Valid std_logic values are: '0' logic LO;
'1' logic HI; 'Z' tri-stated; 'U' uninitialized; 'X' unknown, i.e. driven to
different levels by multiple sources; 'L' weak pull-down; 'H' weak pull-up;
'-' don't care. std_logic objects can be used where bit types are used.

Example signal declarations (with initial values):

    signal clk : bit := '0';
    signal sda : std_logic := 'H';

Note that setting std_logic objects to anything other than '0' or '1' in code
meant for hardware synthesis is likely to result in the synthesizer silently
(!!) ignoring the assignment.

The above declaration would result in allocation of only 8 flip flops instead
of potentially 32. Standard VHDL defines a natural integer subtype which only
includes positive integers and zero.

4.1.2 Array Types

Standard Logic. The type std_logic_vector is an array of std_logic elements
used ubiquitously to describe multibit logic arrays. It is common practice to
order bit arrays with MSB on the left and LSB on the right, the way numbers
are normally written. For example, a signal group to hold the 32-bit sum of
two addends could be declared like this:

    signal sum32 : std_logic_vector(31 downto 0);

std_logic_vector literals are bit strings delimited by double quote marks. By
default the bit strings are binary (base 2) but can be written as hexadecimal
(base 16) by prefixing the bit string with an X:

    sum32 <= X"4000C8B3";

which is equivalent to

    sum32 <= "01000000000000001100100010110011";

Individual elements can be accessed for reading or writing:

    if sum32(11) = '1' then
      -- true
    end if;

as can entire slices

    if sum32(15 downto 12) = "1100" then
      -- also true
    end if;
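The literal and slicing claims above are easy to cross-check outside a simulator. The following Python sketch treats a std_logic_vector(31 downto 0) as a 32-character bit string with the MSB on the left; the helper name slv and the variable names are made up for this example and are not part of any VHDL package.

```python
def slv(bits, hi, lo):
    """Return bits(hi downto lo) of a bit string whose leftmost char is the MSB."""
    width = len(bits)
    return bits[width - 1 - hi : width - lo]


# X"4000C8B3" expanded to 32 binary digits.
sum32_hex = format(0x4000C8B3, "032b")
# The binary literal quoted in the text.
sum32_bin = "01000000000000001100100010110011"

bit11 = slv(sum32_hex, 11, 11)    # sum32(11)
nibble = slv(sum32_hex, 15, 12)   # sum32(15 downto 12)
```

The hexadecimal and binary forms compare equal, bit 11 is '1', and the slice 15 downto 12 is "1100", in agreement with the comments in the two if statements.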
Signed and Unsigned. The signed and unsigned types are used for modeling
numeric computations. They are bit strings, not integers; however, the
packages which define them, numeric_std and numeric_bit, define a number of
operators which allow bitwise and arithmetic operations between signed,
unsigned, and integer types.

    constant u1 : unsigned(7 downto 0) := x"44";
    constant u2 : unsigned(3 downto 0) := b"1010";
    if u2 < 10 then
      -- not true
    end if;

Enumerations. Enumerated data types specify a type which can take on a small
number of discrete values. Finite state machine states are the textbook
example of uses for enumerated types. The type definition defines a type which
is later attached to a data object:

    type state_t is (idle, edge, waiting);
    signal s : state_t := idle;

4.2 Operators

4.2.1 Logical Operators

Taking bit or bit vector arguments of equal length and returning bit or bit
vector, the following logical operators compute the named logic operation:
not, and, nand, or, nor, xor, and xnor. Examples:

    signal a : std_logic_vector(3 downto 0) := "1010";
    signal b : std_logic_vector(3 downto 0) := "0011";
    signal c : std_logic_vector(3 downto 0);
    signal d : std_logic_vector(3 downto 0);
    signal e : std_logic_vector(3 downto 0);
    signal f : std_logic_vector(3 downto 0);

    c <= not b;           -- result = "1100"
    d <= a xor b;         -- result = "1001"
    e <= a nor b;         -- result = "0100"
    f <= b and (c or d);  -- result = "0001"

shift operation. Division ('/'), modulus (mod), and remainder (rem) will
synthesize to division and modulus logic in the Quartus synthesizer:

    signal w : unsigned(15 downto 0) := x"3f0a";
    signal x : unsigned(15 downto 0) := x"0049";
    signal y : unsigned(15 downto 0);
    signal z : unsigned(31 downto 0);

    y <= w / x;
    z <= w * x;

should work but will consume a fair number of logic resources.

Two directions (left, right) and three varieties (rotations, logical,
arithmetic) of shift operation work on bit vectors. Here is the list:

rol  Rotate bits left N places; bits which fall off the left-hand side return
     to fill the rightmost bits;
ror  Rotate bits right N places; bits exiting on the right return to fill the
     leftmost bits;
sll  Logical shift left. Bits shift N places to the left, filling the right
     bits with zeros;
srl  Logical shift right. Bits shift N places to the right, filling the left
     bits with zeros;
sla  Arithmetic shift left. Bits shift N places to the left. The rightmost bit
     holds its state and is propagated to the neighboring bits in the shift;
sra  Arithmetic shift right. Bits shift N places to the right. The leftmost
     bit holds its state and is propagated to the neighboring bits in the
     shift.

4.2.4 Comparison Operators

The comparison operations are equality: =; inequality: /=; less than: <; less
than or equal: <=; greater than: >; greater than or equal: >=. All return
boolean.
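The logical-operator examples and the shift definitions in this section can be sanity-checked with a few lines of Python operating on 4-bit integers. This is only a model for checking results by hand; WIDTH, MASK and the function names are assumptions of the sketch, chosen to mirror the VHDL operator names.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def bits(value):
    """Format a value as a WIDTH-character bit string, MSB on the left."""
    return format(value & MASK, f"0{WIDTH}b")


a, b = 0b1010, 0b0011
c = ~b & MASK          # not b
d = a ^ b              # a xor b
e = ~(a | b) & MASK    # a nor b
f = b & (c | d)        # b and (c or d)


def rol(v, n):
    """Rotate left: bits leaving on the left re-enter on the right."""
    n %= WIDTH
    return ((v << n) | (v >> (WIDTH - n))) & MASK


def sll(v, n):
    """Logical shift left: vacated right bits are filled with zeros."""
    return (v << n) & MASK


def srl(v, n):
    """Logical shift right: vacated left bits are filled with zeros."""
    return (v & MASK) >> n


def sra(v, n):
    """Arithmetic shift right: the leftmost bit is copied into vacated bits."""
    msb = (v >> (WIDTH - 1)) & 1
    for _ in range(n):
        v = (v >> 1) | (msb << (WIDTH - 1))
    return v & MASK
```

For a = "1010" and b = "0011" the model gives not b = "1100", a xor b = "1001", a nor b = "0100" and b and (c or d) = "0001". Shifting "1010" by one place, rol and srl both give "0101", sll gives "0100", and sra gives "1101" because the leading 1 is replicated.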
So far, it has not even been necessary to do any multiplications! The units
attached to the velocities and positions are 0.766 mm/s for velocity and 15.32
µm for position. To convert positions to more useful units, millimeters, it is
necessary to divide by 65.274 (or, equivalently, multiply by 0.01532). This is
pretty close to a power of two, so you might consider your requirements on
accuracy: if the application can tolerate a 2% error, the simplest
approximation is to divide by 64 or, equivalently, to shift the bits to the
right six places:

    x_in_near_mm <= x(15 downto 6);

If higher accuracy is needed, the least expensive method is to use a rational
approximation of the scaling factor where the denominator is a power of two;
again we can profit from the ability to shift right instead of dividing[4].
Luck is on our side: 0.01532 is very close to the rational number 251/16384.
We can get an accuracy of 0.001% (way better than the acceleration
measurement, by the way, so be on guard not to overestimate your accuracy!) by
multiplying by 251 and then right shifting 14 bits. If we want to use the
Cyclone IV's 9×9 bit multiplier hardware, the most efficient VHDL
transformation is:

    tmp := to_signed(251, 9) * x(15 downto 7);
    x_mm <= tmp(15 downto 7);

where tmp is an 18-bit signed variable defined to capture the result of the
9 × 9 multiplication:

    variable tmp : signed(17 downto 0);

Looking at the synthesis reports, I see that the synthesizer has even gotten
around using the multipliers! How? Multiplying x by 251 is equivalent to
multiplying x by 256 (i.e. left shifting by 8), then subtracting x times 4,
and finally subtracting x once more. Keep that trick in your pocket in case
you are using PLDs without hardware multipliers.

5.4 Pseudorandom Numbers

(see Horowitz and Hill section 9.33 pg 655)

Apparently random sequences of bits can be generated using an FSM
configuration known as the linear feedback shift register (LFSR). It is simply
a shift register whose input bit is determined by taking XORs of various bit
positions further into the register; these are often called taps. For example,
a 4-bit LFSR could be constructed like this:

$Q_3 \leftarrow Q_0 \oplus Q_1 \quad (8)$

This would then produce the bit sequence

    0110, 1011, 0101, 1010, 1101, 1110, 1111, 0111
    0011, 0001, 1000, 0100, 0010, 1001, 1100, 0110

which then repeats. Hardly a random sequence, but if you make it long enough
it looks random. In general it is possible, using an n-bit register, to
produce a sequence of repetition length $2^n - 1$. Note that the all-zeros
state must be excluded to prevent the sequence from getting stuck in that
state.

Getting the taps right is not trivial unless you are familiar with advanced
algebra. Fortunately several magic taps have been tabulated[5]. Horowitz and
Hill gives several suggestions for various LFSR lengths.

Producing random integers from the random bits is not difficult. Returning to
the random bits above, by taking every 4th clock cycle (to avoid explicit bit
correlations) a repeating sequence of integers ranging from 1 to 15 can be
produced:

    6, 13, 3, 2, 11, 14, 1, 9, 5, 15, 8, 12, 10, 7, 4, 6, ...

Often, random numbers in the real interval (0, 1) are needed. In the spirit of
fixed-point numbers, discussed above, the random integer sequences can be
interpreted as having implicit scale factors. For example, the sequence above
could be interpreted as representing the numbers

    0.3750, 0.8125, 0.1875, 0.1250,
    0.6875, 0.8750, 0.0625, 0.5625,
    0.3125, 0.9375, 0.5000, 0.7500,
    0.6250, 0.4375, 0.2500, 0.3750...

5.5 ROMs

Read Only Memories are useful constructions to hold constants and implement
functions. Let's start just by introducing the syntax for forcing VHDL to
synthesize a ROM using distributed storage elements (i.e. LEs). The ROM is
just an array of std_logic_vector types (array of bit arrays). The VHDL syntax
requires a two-step type definition: first, define the array element type,
then define the array type; finally, declare a concrete constant object of
this array type (lines 12–14 in the listing below):

     1  library ieee;
     2  use ieee.std_logic_1164.all;
     3
     4  entity romexa is
     5    port (
     6      clk : in std_logic;
     7      ce : in std_logic;
     8      data : out std_logic_vector(7 downto 0));
     9  end romexa;
    10
    11  architecture behavioral of romexa is
    12    subtype word_t is std_logic_vector(7 downto 0);
    13    type rom_t is array (0 to 31) of word_t;
    14    constant mem : rom_t := (

[4] As a general rule, avoid expensive divisions whenever possible.

[5] Of course, this is a security issue: don't use these popular numbers for
crypto applications, otherwise your ciphers will be easily cracked!
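Returning to the 4-bit LFSR of Section 5.4, the sequences quoted there can be reproduced with a short software model. The sketch below is Python rather than VHDL and is meant only for checking a hardware implementation against known-good values; the function names are invented for this example, and the state is held as an integer with Q3 as the most significant bit.

```python
def lfsr4(state):
    """One clock of the 4-bit LFSR: Q3 <- Q0 xor Q1, other bits shift right."""
    q0 = state & 1
    q1 = (state >> 1) & 1
    return ((q0 ^ q1) << 3) | (state >> 1)


def run(seed, steps):
    """Return the list of states visited, starting with the seed itself."""
    states, s = [], seed
    for _ in range(steps):
        states.append(s)
        s = lfsr4(s)
    return states


states = run(0b0110, 15)             # one full period starting from 0110
every_4th = run(0b0110, 61)[::4]     # sample the state every 4th clock
fractions = [n / 16 for n in every_4th]  # implicit scale factor of 1/16
```

Starting from 0110 the model visits all 15 nonzero states before returning to 0110, and sampling every fourth state gives 6, 13, 3, 2, 11, 14, 1, 9, 5, 15, 8, 12, 10, 7, 4, 6. The all-zeros state maps to itself, which is why it has to be excluded.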