DDR4 Memory Controller IP - Xilinx
Architecture-Based FPGAs Memory Interface Solutions v4.2
Product Guide for Vivado Design Suite
IP Facts
Chapter 1: Overview
Feature Summary
Licensing and Ordering Information
Chapter 7: Simulation
Appendix A: Debugging
Finding Help on Xilinx.com
Debug Tools
IP Facts
Overview
Notes:
1. For a complete listing of supported devices, see the
Vivado IP catalog.
2. For the supported versions of the tools, see the Xilinx
Design Tools: Release Notes Guide.
Overview
The Xilinx UltraScale™ architecture DDR3, DDR4, and RLDRAM 3 memory interface cores
provide solutions for interfacing with these SDRAM memory types. Both a complete
Memory Controller and a physical (PHY) layer only solution are supported. The UltraScale
architecture DDR3, DDR4, and RLDRAM 3 memory interface cores are organized in the
following high-level blocks.
• Controller – The controller accepts burst transactions from the User Interface and
generates transactions to and from the SDRAM. The controller takes care of the SDRAM
timing parameters and refresh. It coalesces write and read transactions in order to
reduce the dead cycles involved in turning the bus around. The controller also reorders
commands to improve the utilization of the data bus to the SDRAM.
• Physical Layer – The physical layer provides a high-speed interface to the SDRAM. This
layer includes the hard blocks inside the FPGA and the soft calibration logic
necessary to ensure optimal timing of the hard blocks interfacing to the SDRAM.
The new hard blocks in the UltraScale architecture allow interface rates of up to
2,400 Mb/s to be achieved. In the PHY-only solution, the application logic is
responsible for all SDRAM transactions, timing, and refresh.
The User Interface is layered on top of the Native Interface to the controller. The
Native Interface is accessible by removing the User Interface. The Native Interface has no
buffering and presents return data to the application as it is received from the SDRAM,
which is not necessarily in the original request order. If the Native Interface is used, the
application must buffer the read and write data as needed and reorder the data. In
exchange, the Native Interface provides the lowest possible latency and the least logic
utilization.
[Figure 1-1: Block diagram of the UltraScale Architecture-Based FPGAs Memory Interface
Solution. User FPGA Logic connects through the User Interface Block, the Memory Controller,
and the Physical Layer (by way of the Native Interface and the MC/PHY Interface) to the IOB
and the DDR3/DDR4 SDRAM.
User Interface signals: rst, clk, app_addr, app_cmd, app_en, app_hi_pri, app_wdf_data,
app_wdf_end, app_wdf_mask, app_wdf_wren, app_rdy, app_rd_data, app_rd_data_end,
app_rd_data_valid, app_wdf_rdy, app_sr_req, app_sr_active, app_ref_req, app_ref_ack,
app_zq_req, app_zq_ack.
Physical Interface signals: ddr_addr, ddr_ba, ddr_cas_n, ddr_ck, ddr_ck_n, ddr_cke,
ddr_cs_n, ddr_dm, ddr_odt, ddr_parity, ddr_ras_n, ddr_reset_n, ddr_we_n, ddr_dq,
ddr_dqs_n, ddr_dqs.]
Feature Summary
DDR3 SDRAM
• Component support for interface width of 16 bits
• DDR3 (1.5V)
• 4 Gb density device support
• 8-bank support
• x8 and x16 device support
• 8:1 DQ:DQS ratio support
• 8-word burst support
• Support for 5 to 14 cycles of column-address strobe (CAS) latency (CL)
• On-die termination (ODT) support
• Support for 5 to 10 cycles of CAS write latency
• Write leveling support for DDR3 (fly-by routing topology required for component designs)
• JEDEC®-compliant DDR3 initialization support
• Source code delivery in Verilog
• 4:1 memory to FPGA logic interface clock ratio
• Open, closed, and transaction-based precharge controller policy
DDR4 SDRAM
• Component support for interface width of 16 bits
• 4 Gb density device support
• x8 and x16 device support
• 8:1 DQ:DQS ratio support
• 8-word burst support
• Support for 9 to 24 cycles of column-address strobe (CAS) latency (CL)
• ODT support
• Support for 9 to 18 cycles of CAS write latency
• Write leveling support for DDR4 (fly-by routing topology required for component designs)
• JEDEC-compliant DDR4 initialization support
RLDRAM 3
• Component support for interface width of 36 bits
• x18 and x36 memory device support
• 8-word burst support
• Support for 5 to 16 cycles of Read Latency
• Address Multiplexing Mode support
• ODT support
• JEDEC-compliant RLDRAM 3 initialization support
• Source code delivery in Verilog
• 4:1 memory to FPGA logic interface clock ratio
License Checkers
If the IP requires a license key, the key must be verified. The Vivado® design tools have
several license checkpoints for gating licensed IP through the flow. If the license check
succeeds, the IP can continue generation. Otherwise, generation halts with an error. License
checkpoints are enforced by the following tools:
IMPORTANT: IP license level is ignored at checkpoints. The test confirms a valid license exists. It does
not check IP license level.
Product Specification
Standards
This core complies with the JESD79-3F, DDR3 SDRAM Standard and JESD79-4, DDR4 SDRAM
Standard, JEDEC® Solid State Technology Association [Ref 1].
For more information on UltraScale™ architecture documents, see References, page 129.
Performance
Maximum Frequencies
For more information on the maximum frequencies, see Kintex UltraScale Architecture Data
Sheet, DC and AC Switching Characteristics (DS892) [Ref 2].
Resource Utilization
Kintex UltraScale Devices
Table 2-1 provides approximate resource counts for the UltraScale architecture-based FPGAs
Memory Interface Solutions core on Kintex® UltraScale™ devices. These values were
generated using the Vivado® IP catalog. They are derived from post-synthesis reports and
might change during implementation.
Port Descriptions
There are three port categories at the top-level of the memory interface core called the
“user design.”
• The first category is the memory interface signals that directly interface with the
SDRAM. These are defined by the JEDEC specification.
• The second category is the application interface signals, which can be either the
“native interface” or the simpler “user interface.” These are described in the Protocol
Description, page 55.
• The third category includes other signals necessary for proper operation of the core.
These include the clocks, reset, and status signals from the core. The clocking and reset
signals are described in their respective sections.
The active-High init_calib_complete signal indicates that the initialization and
calibration are complete and that the interface is ready to accept commands.
Core Architecture
This chapter describes the UltraScale™ architecture-based FPGAs Memory Interface
Solutions core with an overview of the modules and interfaces.
Overview
The UltraScale architecture-based FPGAs Memory Interface Solutions core is shown in
Figure 3-1.
[Figure 3-1: Block diagram of the UltraScale Architecture-Based FPGAs Memory Interface
Solution. The User's FPGA Logic connects through the User interface to the Memory
Controller and the Physical Layer, which drives the DDR3/DDR4 SDRAM. An
Initialization/Calibration block is shown with CalDone and Read Data signals.]
Memory Controller
The Memory Controller (MC) is a general purpose design that is suitable for many
applications. The MC balances logic utilization, throughput, latency, and efficiency for
typical situations.
The design of the MC is bounded on one side by the Native Interface and on the other side
by the PHY. These interfaces result in certain design constraints on the Memory Controller.
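[Figure: Memory Controller block diagram (labels recovered from a damaged extraction).
Four mcGroup.v instances (Instance 0 to Instance 3) feed three arbitration paths:
mcArb.v with mcCmdMux (arbitrates for CAS access), mcArb.v with mcCmdMux (arbitrates for
activate access), and mcArbP.v with mcCmdMuxP (arbitrates for pre-charge access). An
mcCtl.v block is also shown.]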
Native Interface
The Native Interface does not offer any opportunity for pipelining data, either read or
write. On writes, data is requested one cycle before it is needed by presenting the data
buffer address, and the data is expected to be supplied on the next cycle. Hence there is no
buffering of any kind for data (except due to the barrel shifting to place the data on a
particular DDR clock).
On reads, the data is offered by the MC on the cycle it is available. Read data, along with a
buffer address, is presented on the Native Interface as soon as it is ready. The data must be
accepted by the Native Interface master.
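Because read data arrives with a buffer address and must be accepted on the cycle it is
offered, an application on the Native Interface typically needs its own reorder buffer. The
following Python model is a minimal sketch of that idea, not the core's implementation. The
class name, method names, and buffer depth are hypothetical and chosen only for
illustration.

```python
class ReadReorderBuffer:
    """Behavioral model: accepts out-of-order read returns, releases them in request order."""

    def __init__(self, depth):
        self.free = list(range(depth))  # buffer addresses not currently in use
        self.pending = []               # buffer addresses in original request order
        self.data = {}                  # buffer address -> data already returned

    def issue(self):
        """Reserve a buffer address for a new read. Returns None when the buffer is full."""
        if not self.free:
            return None
        addr = self.free.pop(0)
        self.pending.append(addr)
        return addr

    def on_read_data(self, buf_addr, data):
        """Store returned data immediately; the master cannot stall the controller."""
        self.data[buf_addr] = data

    def drain(self):
        """Return every word that is now deliverable in original request order."""
        out = []
        while self.pending and self.pending[0] in self.data:
            addr = self.pending.pop(0)
            out.append(self.data.pop(addr))
            self.free.append(addr)
        return out
```

In this model, a read that returns early simply waits in `data` until every older request
has also returned, at which point `drain()` releases them together.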
The number of requests that can be outstanding is dictated by the amount of command
buffering provided in the mcGroup module. Although there are no groups in DDR3, the
name group notionally represents a set of banks. For DDR4 x4 and x8 devices, it is a real
bank group, and the mcGroup module serves the four banks of that group. For DDR3, each
mcGroup module services two banks. For a DDR4 x16 interface, the mcGroup represents 1 bit
of group (there is only one group bit in x16) and 1 bit of bank, so the mcGroup serves two
banks.
Datapath
The read and write data do not pass through the Memory Controller at all, but are directly
connected to the mcCal module. The MC generates the requisite control signals to the
mcRead and mcWrite modules telling them the timing of read and write data. The two
modules acquire or provide the data as required at the right time.
Read and write requests are serviced in two modes governed by the parameters RDCYCLES
(default 256) and WRCYCLES (default 128). To prevent starvation, two 10-bit counters are
maintained and are controlled by these parameters to switch between the two modes of
operation (read mode and write mode). The MC is in either read mode or write mode at any
given instant and services only requests of that type. It switches to the other mode if all
pending requests are of the other type or if the counter expires.
The number of read and write cycles can be changed through the RDCYCLES and WRCYCLES
parameters. It should be observed that very small parameter values might result in too
many switchovers between read and write modes, thus resulting in poor efficiency. Values
that are too large result in higher bus efficiency, but at the expense of read latency.
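The trade-off between switchover count and latency can be explored with a small behavioral
model. The Python sketch below is illustrative only and makes several assumptions that are
not stated in this guide: one request is serviced per cycle, all requests are pending at the
start, the controller starts in read mode, and an expired counter forces a switch only when
the other request type is waiting. The function names are hypothetical.

```python
def schedule(requests, rdcycles=256, wrcycles=128):
    """Return the service order for a list of 'R'/'W' requests pending at time zero."""
    reads = requests.count("R")
    writes = requests.count("W")
    mode, counter, order = "R", 0, []
    while reads or writes:
        pending_same = reads if mode == "R" else writes
        pending_other = writes if mode == "R" else reads
        limit = rdcycles if mode == "R" else wrcycles
        # Switch when only the other type is pending, or when the counter has
        # expired and the other type is waiting (assumption, see lead-in).
        if pending_same == 0 or (counter >= limit and pending_other > 0):
            mode = "W" if mode == "R" else "R"
            counter = 0
            continue
        order.append(mode)
        if mode == "R":
            reads -= 1
        else:
            writes -= 1
        counter += 1
    return order


def count_switches(order):
    """Count read/write turnarounds in a service order."""
    return sum(a != b for a, b in zip(order, order[1:]))
```

With eight reads and eight writes pending, limits of 1 and 1 produce a turnaround after
every request in this model, while the default limits produce a single turnaround at the
cost of the writes waiting behind all eight reads.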
Reordering
The mcGroup module reorders the transactions in a limited manner. The module has a
queue for the commands. The queues are reordered based on any address collisions and
whether they are reads or writes. To achieve high-speed operation, very complex reordering
is not implemented. The address collisions are checked only between banks and not pages.
Reads and writes can pass each other depending on the mode of operation (either read or
write). Reads (writes) can bypass reads (writes) depending on the page status. For instance,
a read (write) can bypass another read (write) if the earlier operation is waiting for the
page to be opened.
Group Machines
In the Memory Controller, there are four group state machines. These state machines are
allocated depending on technology (DDR3 or DDR4) and width (x4, x8, and x16). The
following summarizes the allocation to each group machine. In this description, GM refers
to the Group Machine (0 to 3), BG refers to group address, and BA refers to bank address.
Note that group in the context of a group state machine denotes a notional group and does
not necessarily refer to a real group (except for DDR4 x4 and x8 parts).
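As an illustration of how banks could be distributed across the four group machines, the
Python sketch below maps a (BG, BA) pair to a GM index. The bank counts per machine follow
the Native Interface description (four banks per machine for DDR4 x4 and x8, two banks per
machine for DDR4 x16 and DDR3). Which specific BA bits select the machine is an assumption
made here for the example and is not taken from this guide; the function names are
hypothetical.

```python
from collections import Counter


def group_machine(tech, width, bg, ba):
    """Return the group machine index (0 to 3) for a bank group / bank address pair."""
    if tech == "DDR4" and width in ("x4", "x8"):
        return bg & 0x3                        # real bank group: two BG bits
    if tech == "DDR4" and width == "x16":
        return ((bg & 0x1) << 1) | (ba & 0x1)  # one BG bit plus one BA bit (BA[0] assumed)
    if tech == "DDR3":
        return ba & 0x3                        # two of three BA bits (BA[1:0] assumed)
    raise ValueError("unsupported technology/width combination")


def banks_per_machine(tech, width):
    """Count how many banks each group machine services."""
    if tech == "DDR4" and width in ("x4", "x8"):
        banks = [(bg, ba) for bg in range(4) for ba in range(4)]  # 16 banks
    elif tech == "DDR4":
        banks = [(bg, ba) for bg in range(2) for ba in range(4)]  # 8 banks
    else:
        banks = [(0, ba) for ba in range(8)]                      # 8 banks, no groups
    return Counter(group_machine(tech, width, bg, ba) for bg, ba in banks)
```

Under these assumptions every machine services the same number of banks, which matches the
two-bank and four-bank figures given in the text regardless of the bit choice.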
PHY
PHY is considered the low-level physical interface to an external DDR3 or DDR4 SDRAM
device as well as all calibration logic for ensuring reliable operation of the physical interface
itself. PHY generates the signal timing and sequencing required to interface to the memory
device. It contains the following:
• Clock/address/control-generation logic
• Write and read datapaths
• Logic for initializing the SDRAM after power-up
In addition, PHY contains calibration logic to perform timing training of the read and write
datapaths to account for system static and dynamic delays.
The Memory Controller and calibration logic communicate with this dedicated PHY in the
slow frequency clock domain, which is either divided by 4 or divided by 2. This depends on
the DDR3 or DDR4 memory clock. A more detailed block diagram of the PHY design is
shown in Figure 3-3.
[Figure 3-3: PHY block diagram within the UltraScale Architecture-Based FPGAs Memory
Interface Solution. Labeled blocks: User Interface, Memory Controller, mcCal.v, cal.v,
calAddrDecode.v, config_rom.v, MicroBlaze, mcPI.v, phy.v, iob.v, pll.v, pllGate.
Labeled signals: pllclks, refclks, DDR Address/Control, Write Data, and Mask,
CMD/Write Data, Read Data, Status, CalDone.]
The Memory Controller is designed to separate out the command processing from the
low-level PHY requirements to ensure a clean separation between the controller and
physical layer. The command processing can be replaced with custom logic if desired, while
the logic for interacting with the PHY stays the same and can still be used by the calibration
logic.
The PHY architecture encompasses all of the logic contained in phy.v. The PHY contains
wrappers around dedicated hard blocks to build up the memory interface from smaller
components. A byte lane contains all of the clocks, resets, and datapaths for a given subset
of I/O. Multiple byte lanes are grouped together, along with dedicated clocking resources,
to make up a single bank memory interface. For more information on the hard silicon
physical layer architecture, see the UltraScale™ Architecture-Based FPGAs SelectIO™
Resources User Guide (UG571) [Ref 3].
The address unit connects the MCS to the local register set and the PHY by performing
address decode and control translation on the I/O module bus from spaces in the memory
map and MUXing return data (calAddrDecode.v). In addition, it provides address
translation (also known as “mapping”) from a logical conceptualization of the DRAM
interface to the appropriate pinout-dependent location of the delay control in the PHY
address space.
Although the calibration architecture presents a simple and organized address map for
manipulating the delay elements for individual data, control and command bits, there is
flexibility in how those I/O pins are placed. For a given I/O placement, the path to the FPGA
logic is locked to a given pin. To enable a single binary software file to work with any
memory interface pinout, a translation block converts the simplified RIU addressing into
the pinout-specific RIU address for the target design. The specific address translation is
written by MIG after a pinout is selected. The code shows an example of the RTL structure
that supports this.
In this example, DQ0 is pinned out on Bit[0] of nibble 0 (nibble 0 according to instantiation
order). The RIU address for the ODELAY for Bit[0] is 0x0D (for more details on the RIU
address map, see the RIU specification). When DQ0 is addressed (indicated by address
0x000_4100), this snippet of code is active. It enables nibble 0 (decoded to one-hot
downstream) and forwards the address 0x0D to the RIU address bus.
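The translation step can also be modeled as a lookup from the simplified address to a
(nibble, RIU address) pair. The Python sketch below is an illustrative model under stated
assumptions, not the generated RTL: the table holds only the single DQ0 entry described in
the text, and the names TRANSLATION, translate, and one_hot are hypothetical.

```python
# Pinout-specific table: simplified address -> (nibble index, RIU address).
# Only the DQ0 ODELAY entry from the text is shown; a real design would have
# one entry per delay element, written for the selected pinout.
TRANSLATION = {
    0x000_4100: (0, 0x0D),
}


def translate(logical_addr):
    """Map a simplified calibration address to its nibble index and RIU address."""
    if logical_addr not in TRANSLATION:
        raise KeyError(f"no translation for address {logical_addr:#09x}")
    return TRANSLATION[logical_addr]


def one_hot(nibble):
    """Decode a nibble index into a one-hot nibble select, as done downstream."""
    return 1 << nibble


nibble, riu_addr = translate(0x000_4100)
nibble_select = one_hot(nibble)
```

Keeping the table separate from the lookup logic mirrors the stated goal: the calibration
software sees one address map, and only the table changes with the pinout.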
The MicroBlaze I/O module interface updates at a maximum rate of once every three clock
cycles, which is not always fast enough for implementing all of the functions required in
calibration. A helper circuit implemented in calAddrDecode.v is required to obtain
commands from the registers and translate at least a portion into single-cycle accuracy for
submission to the PHY. In addition, it supports command repetition to enable back-to-back
read transactions and read data comparison.
Figure 3-4 shows the overall flow of memory initialization and the different stages of
calibration.
[Figure 3-4: Flowchart of memory initialization and calibration stages (labels recovered
from a damaged extraction), in order: System Reset; I/O Offset Calibration; BISC;
DDR3/DDR4 SDRAM Initialization; DQS Gate Calibration; Write Leveling; Read Training
(Per-bit deskew); Read Training (DQS centering - MPR clock pattern); Write Training
(Per-bit deskew and centering - clock pattern); Write Calibration; Write Training (VREF -
clock pattern); Read Training (DQS centering and VREF - complex pattern); Write Training
(DQS centering and VREF - complex pattern); Write Training (DM); Enable VT Tracking;
Calibration Done. Two "Rank == 0?" decisions gate some stages, and an "All Done?" decision
increments the rank count in an iterative loop to calibrate more ranks.]
Clocking
The memory interface requires one PLL in each bank that is occupied by the interface. There
are two PLLs per bank. If a bank is shared by two interfaces, both PLLs in that bank are used.
The memory interface requires a high quality reference clock. This clock should come from
a differential pair on the same column of the FPGA that the memory interface occupies. The
memory interface PLLs generate the appropriate frequencies and phases necessary for
proper operation. An output clock is provided for the FPGA logic which includes the
controller and the interface logic. This output clock is 1/4 of the memory interface clock in
the 4:1 memory clock to FPGA logic clock mode.
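As a worked example of the 4:1 relationship, the Python snippet below derives the FPGA
logic clock from a data rate. It relies on the standard DDR property of two transfers per
memory clock cycle; the function name is hypothetical.

```python
def fabric_clock_mhz(data_rate_mbps, ratio=4):
    """FPGA logic clock for a given per-pin data rate in the ratio:1 clocking mode."""
    memory_clock_mhz = data_rate_mbps / 2  # DDR: two transfers per memory clock
    return memory_clock_mhz / ratio


# 2,400 Mb/s -> 1,200 MHz memory clock -> 300 MHz FPGA logic clock
clock = fabric_clock_mhz(2400)
```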
Resets
An asynchronous reset input is provided. This active-High reset must assert for a minimum
of 20 cycles of the controller clock.
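The minimum reset width in time units follows directly from the controller clock frequency.
The short Python sketch below shows the arithmetic; the function name is hypothetical, and
the example frequencies are illustrative rather than values taken from this guide.

```python
def min_reset_ns(controller_clock_mhz, cycles=20):
    """Minimum reset assertion time in nanoseconds for the given controller clock."""
    period_ns = 1000.0 / controller_clock_mhz
    return cycles * period_ns


# Example: a 200 MHz controller clock has a 5 ns period, so 20 cycles is 100 ns.
width_ns = min_reset_ns(200)
```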
As the x16 DRAM component has the same physical width as the x8 DRAM component, the
same design rules apply to a four component (x16 DRAM) 64-bit interface. Other DDR3
memory interface design guidelines will be included in a future version of this document.
IMPORTANT: All guidelines in this section must be followed to achieve the maximum data rates
specified for the DDR3 interface.
Reference Stack-Up
This design guide refers to the stack-up in Table 4-2. All electrical routing constraints are
defined upon the reference stack-up. The actual stack-up might be different from this
reference stack-up. The related constraints such as spacing should be adjusted accordingly.
IMPORTANT: To achieve highest memory interface performance, all DDR3 signals must be routed on
the top signal layers, that is, L3/L5, as shown in Figure 4-1, to minimize FPGA pin field via crosstalk
impact.
All differential signals, clocks, and strobes must be routed as closely coupled differential pairs from
FPGA pins to DRAM pins. Routing signals on bottom layers might degrade the timing margin, as shown
in Figure 4-2.
To determine system timing margins in this design following the Xilinx memory simulation
guidelines, system designers should use I/O Buffer Information Specification (IBIS) or other
simulation tools.
Note: Material for this reference stack-up is Isola High-Tg FR-4, 370H.
[Figure 4-1 (extracted labels only): Memory Controller; PKG Length; Breakout; MAIN; via;
DRAM to DRAM; STUB; RTT; segments P0, L0, L1, L2, L3, L4.]
Table 4-3: Impedance, Length, and Spacing Guidelines for addr/cmd/ctrl Signals
Parameter | L0 (FPGA Breakout) | L1 (Main PCB) | L2 (DRAM Breakout) | L3 | L4 (To RTT) | Units
Trace type | Stripline | Stripline | Stripline | Stripline | Stripline | –
Single-ended impedance Z0 | 50±10% | 36±10% | 50±10% | 50±10% | 39±10% | Ω
Trace width | 4.0 | 7.0 | 4.0 | 4.0 | 6.0 | mil
Trace length | 0.0~0.5 | 1.0~3.0 | 0.0~0.1 | 0.35~0.55 | 0.6~1.0 | inch
Spacing in addr/cmd/ctrl (minimum) | 4.0 | 8.0 | 4.0 | 8.0 | 8.0 | mil
Spacing to clock signals (minimum) | 8.0 | 20 | 8.0 | 20 | 20 | mil
Spacing to other group signals (minimum) | 8.0 | 30 | 30 | 30 | 30 | mil
Maximum PCB via count | 6, see Figure 4-3 for location. | –
[Figure (extracted labels only): Memory Controller; PKG Length; Breakout; MAIN; via;
DRAM to DRAM; STUB; RTT (two terminations); two parallel traces with segments P and L.]
Table 4-4: Impedance, Length, and Spacing Guidelines for Clock Signals
Parameter | L0 (FPGA Breakout) | L1 (Main PCB) | L2 (DRAM Breakout) | L3 | L4 (To RTT) | Units
Trace type | Stripline | Stripline | Stripline | Stripline | Stripline | –
Clock differential impedance Zdiff | 86±10% | 76±10% | 86±10% | 90±10% | 76±10% | Ω
Trace width/space/width | 4.0/4.0/4.0 | 6.0/6.0/6.0 | 4.0/4.0/4.0 | 4.0/5.0/4.0 | 6.0/6.0/6.0 | mil
Trace length | 0.0~0.5 | 1.0~3.0 | 0.0~0.1 | 0.35~0.55 | 0.6~1.0 | inch
Spacing in addr/cmd/ctrl (minimum) | 8.0 | 20 | 8.0 | 20 | 20 | mil
Spacing to other group signals (minimum) | 8.0 | 30 | 30 | 30 | 30 | mil
Maximum PCB via count per signal | 6, see Figure 4-3 and Figure 4-4 for location. | –
[Figure (extracted labels only): Memory Controller; MAIN; DRAM; via; segments P0, L0, L1,
L2.]
Table 4-5: Impedance, Length, and Spacing Guidelines for Data Signals
Parameter | L0 (FPGA Breakout) | L1 (Main PCB) | L2 (DRAM Breakout) | Units
Trace type | Stripline | Stripline | Stripline | –
dq single-ended impedance Z0 | 50±10% | 39±10% | 50±10% | Ω
dqs differential impedance Zdiff | 86±10% | 76±10% | 86±10% | Ω
Trace width (nominal) | 4.0 | 6.0 | 4.0 | mil
Differential trace width/space/width | 4.0/4.0/4.0 | 6.0/6.0/6.0 | 4.0/4.0/4.0 | mil
Trace length (nominal) | 0.0~0.5 | 1.0~5.0 | 0.0~0.1 | inch
Spacing in byte (minimum) | 4.0 | 8.0 | 4.0 | mil
Spacing byte to byte (minimum) | 4.0 | 20 | 4.0 | mil
dq to strobe spacing (minimum) | 4.0 | 20 | 8.0 | mil
Spacing to other group signals (minimum) | 8.0 | 30 | 30 | mil
Maximum PCB via count | 2, see Figure 4-5 for location. | –
The data group length matching constraints are listed in Table 4-8.
IMPORTANT: Package routing length should be included in both total length constraints and length
matching constraints.
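A layout check along these lines can be scripted. The Python sketch below is a minimal
example under stated assumptions: the per-segment ranges are the DDR3 data-signal trace
lengths from Table 4-5, package length is added to every total as required above, and the
matching tolerance is left to the caller because Table 4-8 is not reproduced here. All
function and variable names are hypothetical.

```python
# DDR3 data-signal segment length ranges in inches (from Table 4-5).
DATA_SEGMENT_RANGE_INCH = {"L0": (0.0, 0.5), "L1": (1.0, 5.0), "L2": (0.0, 0.1)}


def check_segments(segments):
    """Return the names of segments whose length falls outside the table range."""
    violations = []
    for name, (low, high) in DATA_SEGMENT_RANGE_INCH.items():
        if not low <= segments[name] <= high:
            violations.append(name)
    return violations


def total_length(segments, package_length):
    """Total routed length of one net, including the package routing length."""
    return package_length + sum(segments.values())


def max_skew(nets):
    """Largest total-length difference across (segments, package_length) pairs."""
    totals = [total_length(segments, package) for segments, package in nets]
    return max(totals) - min(totals)
```

The skew returned by `max_skew` would then be compared against the length matching limits
for the data group.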
4. Keep the routing at least 30 mils away from the reference plane and void edges with the
exception of breakout region, as shown in Figure 4-6.
5. In the breakout region, route signal lines in the middle of the via void aperture. Avoid
routing at the edge of via void, as shown in Figure 4-8.
[Figure 4-8 and following figures (extracted labels only): DRAM Ball Map; cmd/addr/ctrl
routing channel; dq routing channel; FPGA; cmd/addr/ctrl and dq groups repeated for each of
five components.]
Figure 4-10: Component Placement Recommendations for Five Components with Fly-By Topology
ODT Settings
The recommended ODT settings for a four component DDR3 32-bit x8 DRAM or 64-bit x16
DRAM single rank are listed in Table 4-9.
As the x16 DRAM component has the same physical width as the x8 DRAM component, the
same design rules apply to a four component (x16 DRAM) 64-bit interface. Other DDR4
memory interface design guidelines will be included in a future version of this document.
IMPORTANT: All guidelines in this section must be followed to achieve the maximum data rates
specified for the DDR4 interface.
Reference Stack-Up
This design guide refers to the stack-up in Table 4-11. All electrical routing constraints are
defined upon the reference stack-up. The actual stack-up might be different from this
reference stack-up. The related constraints such as spacing should be adjusted accordingly.
IMPORTANT: To achieve highest memory interface performance, all DDR4 signals must be routed on
the top signal layers, that is, L3/L5, as shown in Figure 4-11, to minimize FPGA pin field via crosstalk
impact.
All differential signals, clocks, and strobes must be routed as closely coupled differential pairs from
FPGA pins to DRAM pins. Routing signals on bottom layers might degrade the timing margin, as shown
in Figure 4-12.
To determine system timing margins in this design following the Xilinx memory simulation
guidelines, system designers should use IBIS or other simulation tools.
Note: Material for this reference stack-up is Isola High-Tg FR-4, 370H.
[Figure 4-11 (extracted labels only): Memory Controller; PKG Length; Breakout; MAIN; via;
DRAM to DRAM; STUB; RTT; segments P0, L0, L1, L2, L3, L4.]
Table 4-12: Impedance, Length, and Spacing Guidelines for addr/cmd/ctrl Signals
Parameter | L0 (FPGA Breakout) | L1 (Main PCB) | L2 (DRAM Breakout) | L3 | L4 (To RTT) | Units
Trace type | Stripline | Stripline | Stripline | Stripline | Stripline | –
Single-ended impedance Z0 | 50±10% | 36±10% | 50±10% | 50±10% | 39±10% | Ω
Trace width | 4.0 | 7.0 | 4.0 | 4.0 | 6.0 | mil
Trace length | 0.0~0.5 | 1.0~3.0 | 0.0~0.1 | 0.35~0.55 | 0.6~1.0 | inch
Spacing in addr/cmd/ctrl (minimum) | 4.0 | 8.0 | 4.0 | 8.0 | 8.0 | mil
Spacing to clock signals (minimum) | 8.0 | 20 | 8.0 | 20 | 20 | mil
Spacing to other group signals (minimum) | 8.0 | 30 | 30 | 30 | 30 | mil
Maximum PCB via count | 6, see Figure 4-13 for location. | –
[Figure (extracted labels only): Memory Controller; PKG Length; Breakout; MAIN; via;
DRAM to DRAM; STUB; RTT (two terminations); two parallel traces with segments P and L.]
Table 4-13: Impedance, Length, and Spacing Guidelines for Clock Signals
Parameter | L0 (FPGA Breakout) | L1 (Main PCB) | L2 (DRAM Breakout) | L3 | L4 (To RTT) | Units
Trace type | Stripline | Stripline | Stripline | Stripline | Stripline | –
Clock differential impedance Zdiff | 86 | 76 | 86 | 90 | 76 | Ω
Trace width/space/width | 4.0/4.0/4.0 | 6.0/6.0/6.0 | 4.0/4.0/4.0 | 4.0/5.0/4.0 | 6.0/6.0/6.0 | mil
Trace length | 0.0~0.5 | L1 (Figure 4-13) + 0.09 | 0.0~0.1 | 0.35~0.55 | 0.6~1.0 | inch
Spacing in addr/cmd/ctrl (minimum) | 8.0 | 20 | 8.0 | 20 | 20 | mil
Spacing to other group signals (minimum) | 8.0 | 30 | 30 | 30 | 30 | mil
Maximum PCB via count per signal | 6, see Figure 4-14 for location. | –
[Figure (extracted labels only): Memory Controller; MAIN; DRAM; via; segments P0, L0, L1,
L2.]
Table 4-14: Impedance, Length, and Spacing Guidelines for Data Signals
Parameter | L0 (FPGA Breakout) | L1 (Main PCB) | L2 (DRAM Breakout) | Units
Trace type | Stripline | Stripline | Stripline | –
dq single-ended impedance Z0 | 50±10% | 39±10% | 50±10% | Ω
dqs differential impedance Zdiff | 86 | 76 | 86 | Ω
Trace width (nominal) | 4.0 | 6.0 | 4.0 | mil
Differential trace width/space/width | 4.0/4.0/4.0 | 6.0/6.0/6.0 | 4.0/4.0/4.0 | mil
Trace length | 0.0~0.5 | 1.0~4.0 | 0.0~0.1 | inch
Spacing in byte (minimum) | 4.0 | 8.0 | 4.0 | mil
Spacing byte to byte (minimum) | 4.0 | 20 | 4.0 | mil
dq to strobe spacing (minimum) | 4.0 | 20 | 8.0 | mil
Spacing to other group signals (minimum) | 8.0 | 30 | 30 | mil
Maximum PCB via count | 2, see Figure 4-15 for location. | –
The data group length matching constraints are listed in Table 4-17.
IMPORTANT: Package routing length should be included in both total length constraints and length
matching constraints.
4. Keep the routing at least 30 mils away from the reference plane and void edges with the
exception of breakout region, as shown in Figure 4-16.
5. In the breakout region, route signal lines in the middle of the via void aperture. Avoid
routing at the edge of via void, as shown in Figure 4-18.
[Figure 4-18 and following figures (extracted labels only): DRAM Ball Map; cmd/addr/ctrl
routing channel; dq routing channel; FPGA; cmd/addr/ctrl and dq groups repeated for each of
five components.]
Figure 4-20: Component Placement Recommendations for Five Components with Fly-By Topology
ODT Settings
The recommended ODT settings for a four component DDR4 32-bit x8 DRAM or 64-bit x16
DRAM single rank are listed in Table 4-18.
• Address/control means cs_n, ras_n, cas_n, we_n, ba, ck, cke, a, and odt.
• Pins in a byte lane are numbered N0 to N12.
• Byte lanes in a bank are designated by T0, T1, T2, or T3. Nibbles within a byte lane are
distinguished by a “U” or “L” designator added to the byte lane designator (T0, T1, T2,
or T3). Thus they are T0L, T0U, T1L, T1U, T2L, T2U, T3L, and T3U.
Note: There are two PLLs per bank and a controller uses one PLL in every bank that is being used by
the interface.
1. dqs, dq, and dm location.
a. Designs using x8 or x16 components – dqs must be located on a dedicated byte
clock pair in the upper nibble designated with “U.” dq associated with a dqs must be
in the same byte lane on any of the other pins except pins N1 and N12.
b. Designs using x4 components – dqs must be located on a dedicated byte clock pair
in the nibble. dq associated with a dqs must be in same nibble on any of the other
pins except pins N1 (lower nibble) and pin N12 (upper nibble).
c. dm (if used) must be located on pin N0 in the byte lane with the corresponding dqs.
2. Byte lanes are configured as either data or address/control.
a. Pins N1 and N12 can be used for address/control in a data byte lane.
b. No data signals (dqs, dq, dm) can be placed in an address/control byte lane.
3. Address/control can be on any of the 13 pins in the address/control byte lanes. Address/
control must be contained within the same bank. Address/control must be in the
centermost bank.
4. There is one vr pin per bank and DCI is required. DCI cascade is not permitted. All rules
for the DCI in the UltraScale™ Architecture-Based FPGAs SelectIO™ Resources User Guide
(UG571) [Ref 3] must be followed.
5. ck must be on the PN pair in the center of the byte lane which is designated as the
Upper byte clock pair.
6. reset_n can be on any pin as long as FPGA logic timing is met and I/O standard can be
accommodated for the chosen bank (LVCMOS15 or LVCMOS135).
7. Banks can be shared between two controllers.
a. Each byte lane is dedicated to a specific controller (except for reset_n).
b. Byte lanes from one controller cannot be placed inside the other. For example, with
controllers A and B, “AABB” is allowed, while “ABAB” is not.
8. All I/O banks used by the memory interface must be in the same column.
9. Maximum height of interface is five contiguous banks for 144-bit wide interface.
10. Bank skipping is not allowed.
11. The input clock for the master PLL in the interface must come from a clock-capable
pair in the I/O column used for the memory interface.
12. There are dedicated VREF pins (not included in the rules above). If an external VREF is not
used, the VREF pins should be pulled to ground by a 500Ω resistor. For more information,
see the UltraScale™ Architecture-Based FPGAs SelectIO™ Resources User Guide (UG571)
[Ref 3]. These pins must be connected appropriately for the standard in use.
13. The interface must be contained within the same I/O bank type (High Range or High
Performance). Mixing bank types is not permitted with the exceptions of the reset_n
in step 6 above and the input clock mentioned in step 11 above.
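Some of these rules lend themselves to an automated pre-check of a candidate pinout. The
Python sketch below covers only rule 7b (byte lanes of one controller not placed inside the
other) and the data byte lane rules 1a and 1c for x8 or x16 components. It is illustrative
only: the function names and the pin-kind strings are hypothetical, and treating N6/N7 as
the upper-nibble byte clock pair is an assumption based on where dqs appears in the example
pin table, not a stated rule.

```python
def sharing_is_contiguous(lane_owners):
    """Rule 7b: 'AABB' is allowed, 'ABAB' is not. One character per byte lane, in order."""
    seen = []
    for owner in lane_owners:
        if seen and owner == seen[-1]:
            continue              # still inside the same controller's run of lanes
        if owner in seen:
            return False          # controller reappears after another one intervened
        seen.append(owner)
    return True


def check_data_byte_lane(pins):
    """Rules 1a and 1c for x8/x16 parts. pins maps pin index (0 to 12) to a kind string."""
    errors = []
    for n, kind in pins.items():
        if kind == "dq" and n in (1, 12):
            errors.append(f"dq on N{n}")
        if kind == "dm" and n != 0:
            errors.append(f"dm on N{n}")
        if kind == "dqs" and n not in (6, 7):   # assumed upper-nibble clock pair
            errors.append(f"dqs on N{n}")
    return errors
```

A full checker would also need the bank-level rules (same column, no bank skipping,
address/control in the centermost bank), which depend on device data not modeled here.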
Bank  Signal Name  Byte Group  I/O Type  Special Designation
1 a0 T3U_12 – –
1 a1 T3U_11 N –
1 a2 T3U_10 P –
1 a3 T3U_9 N –
1 a4 T3U_8 P –
1 a5 T3U_7 N DBC-N
1 a6 T3U_6 P DBC-P
1 a7 T3L_5 N –
1 a8 T3L_4 P –
1 a9 T3L_3 N –
1 a10 T3L_2 P –
1 a11 T3L_1 N DBC-N
1 a12 T3L_0 P DBC-P
1 a13 T2U_12 – –
1 a14 T2U_11 N –
1 we T2U_10 P –
1 cas_n T2U_9 N –
1 ras_n T2U_8 P –
1 ck_n T2U_7 N QBC-N
1 ck_p T2U_6 P QBC-P
1 cs_n T2L_5 N –
1 ba0 T2L_4 P –
1 ba1 T2L_3 N –
1 ba2 T2L_2 P –
1 pll refclk_n T2L_1 N QBC-N
1 pll refclk_p T2L_0 P QBC-P
1 cke T1U_12 – –
1 dq15 T1U_11 N –
1 dq14 T1U_10 P –
1 dq13 T1U_9 N –
1 dq12 T1U_8 P –
1 dqs1_n T1U_7 N QBC-N
1 dqs1_p T1U_6 P QBC-P
1 dq11 T1L_5 N –
1 dq10 T1L_4 P –
1 dq9 T1L_3 N –
1 dq8 T1L_2 P –
1 odt T1L_1 N QBC-N
1 dm1 T1L_0 P QBC-P
1 vr T0U_12 – –
1 dq7 T0U_11 N –
1 dq6 T0U_10 P –
1 dq5 T0U_9 N –
1 dq4 T0U_8 P –
1 dqs0_n T0U_7 N DBC-N
• Address/control means cs_n, ras_n, cas_n, we_n, ba, bg, ck, cke, a, odt, act_n,
and par.
• Pins in a byte lane are numbered N0 to N12.
• Byte lanes in a bank are designated by T0, T1, T2, or T3. Nibbles within a byte lane are
distinguished by a “U” or “L” designator added to the byte lane designator (T0, T1, T2,
or T3). Thus they are T0L, T0U, T1L, T1U, T2L, T2U, T3L, and T3U.
Note: There are two PLLs per bank and a controller uses one PLL in every bank that is being used by
the interface.
1. dqs, dq, and dm/dbi location.
a. Designs using x8 or x16 components – dqs must be located on a dedicated byte
clock pair in the upper nibble designated with “U.” dq associated with a dqs must be
in same byte lane on any of the other pins except pins N1 and N12.
b. Designs using x4 components – dqs must be located on a dedicated byte clock pair
in the nibble. dq associated with a dqs must be in same nibble on any of the other
pins except pins N1 (lower nibble) and pin N12 (upper nibble).
c. dm/dbi must be on pin N0 in the byte lane with the associated dqs.
2. Byte lanes are configured as either data or address/control.
a. Pins N1 and N12 can be used for address/control in a data byte lane.
b. No data signals (dqs, dq, dm/dbi) can be placed in an address/control byte lane.
3. Address/control can be on any of the 13 pins in the address/control byte lanes. Address/
control must be contained within the same bank. Address/control must be in the
centermost bank.
4. There is one vr pin per bank and DCI is required. DCI Cascade is not permitted. All rules
for the DCI in the UltraScale™ Architecture-Based FPGAs SelectIO™ Resources User Guide
(UG571) [Ref 3] must be followed.
5. ck must be on the PN pair in the center of the byte lane which is designated as the
Upper byte clock pair.
6. reset_n can be on any pin as long as FPGA logic timing is met and I/O standard can be
accommodated for the chosen bank.
7. Banks can be shared between two controllers.
a. Each byte lane is dedicated to a specific controller (except for reset_n).
b. Byte lanes from one controller cannot be placed inside the other. For example, with
controllers A and B, “AABB” is allowed, while “ABAB” is not.
8. All I/O banks used by the memory interface must be in the same column.
9. Maximum height of interface is five contiguous banks for 144-bit wide interface.
10. Bank skipping is not allowed.
11. The input clock for the master PLL in the interface must come from a clock-capable
pair in the I/O column used for the memory interface.
12. The dedicated VREF pins in the banks used for DDR4 must be tied to ground with a 500Ω
resistor. For more information, see the UltraScale™ Architecture-Based FPGAs SelectIO™
Resources User Guide (UG571) [Ref 3].
13. The interface must be contained within the same I/O bank type (High Range or High
Performance). Mixing bank types is not permitted with the exceptions of the reset_n
in step 6 above and the input clock mentioned in step 11 above.
14. The par input for command and address parity and the alert_n input/output are not
supported by this interface. Consult the memory vendor for information on the proper
connection for these pins when not used.
IMPORTANT: Component interfaces should be created with the same component for all components in
the interface. x16 components have a different number of bank groups than the x8 components. For
example, a 72-bit wide component interface should be created by using nine x8 components or five x16
components where half of one component is not used. Four x16 components and one x8 component is
not permissible.
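The data byte lane rules for x8 or x16 components (rules 1 and 2) can be sketched as a small checker. This is an illustrative sketch only: the function name and the signal-kind strings are hypothetical, and the dqs position on pins N6 and N7 is an assumption read from the example pin tables in this section, where dqs_p and dqs_n occupy the upper-nibble byte clock pair.

```python
# Hypothetical checker for one x8/x16 data byte lane (pins N0 to N12).
DQS_PINS = {6, 7}        # assumed from the example tables (T?U_6 / T?U_7)
DM_PIN = 0               # rule 1c: dm/dbi must be on N0
ADDR_OK_PINS = {1, 12}   # rules 1a and 2a: no dq here, address/control allowed


def check_data_byte_lane(assignment):
    """Return a list of rule violations for one data byte lane.

    assignment maps a pin index (0..12) to a signal kind:
    "dq", "dqs", "dm", or "addr".
    """
    errors = []
    for pin in sorted(assignment):
        kind = assignment[pin]
        if not 0 <= pin <= 12:
            errors.append(f"N{pin}: byte lanes only have pins N0 to N12")
        elif kind == "dqs" and pin not in DQS_PINS:
            errors.append(f"N{pin}: dqs must be on the upper-nibble byte clock pair")
        elif kind == "dm" and pin != DM_PIN:
            errors.append(f"N{pin}: dm/dbi must be on N0")
        elif kind == "dq" and pin in ADDR_OK_PINS | DQS_PINS:
            errors.append(f"N{pin}: dq is not allowed on this pin")
        elif kind == "addr" and pin not in ADDR_OK_PINS:
            errors.append(f"N{pin}: only N1 and N12 may carry address/control")
    return errors


# Byte lane T1 of Bank 2 in the DDR4 example table (cs_n on N12, odt on N1).
lane_t1 = {0: "dm", 1: "addr", 2: "dq", 3: "dq", 4: "dq", 5: "dq",
           6: "dqs", 7: "dqs", 8: "dq", 9: "dq", 10: "dq", 11: "dq", 12: "addr"}
print(check_data_byte_lane(lane_t1))
```

The checker covers only the per-lane rules; bank-level rules such as bank contiguity or the placement of address/control in the centermost bank would need a separate pass.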
Bank 1
1 – T3U_12 – –
1 – T3U_11 N –
1 – T3U_10 P –
1 – T3U_9 N –
1 – T3U_8 P –
1 – T3U_7 N DBC-N
1 – T3U_6 P DBC-P
1 – T3L_5 N –
1 – T3L_4 P –
1 – T3L_3 N –
1 – T3L_2 P –
1 – T3L_1 N DBC-N
1 – T3L_0 P DBC-P
1 – T2U_12 – –
1 – T2U_11 N –
1 – T2U_10 P –
1 – T2U_9 N –
1 – T2U_8 P –
1 – T2U_7 N QBC-N
1 – T2U_6 P QBC-P
1 – T2L_5 N –
1 – T2L_4 P –
1 – T2L_3 N –
1 – T2L_2 P –
1 – T2L_1 N QBC-N
1 – T2L_0 P QBC-P
1 reset_n T1U_12 – –
1 dq31 T1U_11 N –
1 dq30 T1U_10 P –
1 dq29 T1U_9 N –
1 dq28 T1U_8 P –
1 dqs3_n T1U_7 N QBC-N
1 dqs3_p T1U_6 P QBC-P
1 dq27 T1L_5 N –
1 dq26 T1L_4 P –
1 dq25 T1L_3 N –
1 dq24 T1L_2 P –
1 unused T1L_1 N QBC-N
1 dm3/dbi3 T1L_0 P QBC-P
1 vr T0U_12 – –
1 dq23 T0U_11 N –
1 dq22 T0U_10 P –
1 dq21 T0U_9 N –
1 dq20 T0U_8 P –
1 dqs2_n T0U_7 N DBC-N
1 dqs2_p T0U_6 P DBC-P
1 dq19 T0L_5 N –
1 dq18 T0L_4 P –
1 dq17 T0L_3 N –
1 dq16 T0L_2 P –
1 – T0L_1 N DBC-N
1 dm2/dbi2 T0L_0 P DBC-P
Bank 2
2 a0 T3U_12 – –
2 a1 T3U_11 N –
2 a2 T3U_10 P –
2 a3 T3U_9 N –
2 a4 T3U_8 P –
2 a5 T3U_7 N DBC-N
2 a6 T3U_6 P DBC-P
2 a7 T3L_5 N –
2 a8 T3L_4 P –
2 a9 T3L_3 N –
2 a10 T3L_2 P –
2 a11 T3L_1 N DBC-N
2 a12 T3L_0 P DBC-P
2 a13 T2U_12 – –
2 we_n/a14 T2U_11 N –
2 cas_n/a15 T2U_10 P –
2 ras_n/a16 T2U_9 N –
2 act_n T2U_8 P –
2 ck_n T2U_7 N QBC-N
2 ck_p T2U_6 P QBC-P
2 ba0 T2L_5 N –
2 ba1 T2L_4 P –
2 bg0 T2L_3 N –
2 bg1 T2L_2 P –
2 pll refclk_n T2L_1 N QBC-N
2 pll refclk_p T2L_0 P QBC-P
2 cs_n T1U_12 – –
2 dq15 T1U_11 N –
2 dq14 T1U_10 P –
2 dq13 T1U_9 N –
2 dq12 T1U_8 P –
2 dqs1_n T1U_7 N QBC-N
2 dqs1_p T1U_6 P QBC-P
2 dq11 T1L_5 N –
2 dq10 T1L_4 P –
2 dq9 T1L_3 N –
2 dq8 T1L_2 P –
2 odt T1L_1 N QBC-N
2 dm1/dbi1 T1L_0 P QBC-P
2 vr T0U_12 – –
2 dq7 T0U_11 N –
2 dq6 T0U_10 P –
2 dq5 T0U_9 N –
2 dq4 T0U_8 P –
2 dqs0_n T0U_7 N DBC-N
2 dqs0_p T0U_6 P DBC-P
2 dq3 T0L_5 N –
2 dq2 T0L_4 P –
2 dq1 T0L_3 N –
2 dq0 T0L_2 P –
2 cke T0L_1 N DBC-N
2 dm0/dbi0 T0L_0 P DBC-P
Protocol Description
This core has the following interfaces:
• User Interface
• Native Interface
User Interface
The user interface is shown in Table 4-21 and connects to an FPGA user design to allow
access to an external memory device.
app_addr[ADDR_WIDTH – 1:0]
This input indicates the address for the request currently being submitted to the user
interface. The user interface aggregates all the address fields of the external SDRAM and
presents a flat address space.
app_cmd[2:0]
This input specifies the command for the request currently being submitted to the user
interface. The available commands are shown in Table 4-22.
app_en
This input strobes in a request. Apply the desired values to app_addr[], app_cmd[2:0], and
app_hi_pri, and then assert app_en to submit the request to the user interface. This
initiates a handshake that the user interface acknowledges by asserting app_rdy.
app_wdf_data[APP_DATA_WIDTH – 1:0]
This bus provides the data currently being written to the external memory.
app_wdf_end
This input indicates that the data on the app_wdf_data[] bus in the current cycle is the
last data for the current request.
app_wdf_mask[APP_MASK_WIDTH – 1:0]
This bus indicates which bits of app_wdf_data[] are written to the external memory and
which bits remain in their current state.
app_wdf_wren
This input indicates that the data on the app_wdf_data[] bus is valid.
app_rdy
This output indicates whether the request currently being submitted to the user interface is
accepted. If the user interface does not assert this signal after app_en is asserted, the
current request must be retried. The app_rdy output is not asserted if:
° All the bank machines are occupied (can be viewed as the command buffer being
full).
- A read is requested and the read buffer is full.
- A write is requested and no write buffer pointers are available.
app_rd_data[APP_DATA_WIDTH – 1:0]
This output contains the data read from the external memory.
app_rd_data_end
This output indicates that the data on the app_rd_data[] bus in the current cycle is the
last data for the current request.
app_rd_data_valid
This output indicates that the data on the app_rd_data[] bus is valid.
app_wdf_rdy
This output indicates that the write data FIFO is ready to receive data. Write data is accepted
when both app_wdf_rdy and app_wdf_wren are asserted.
ui_clk_sync_rst
This is the reset from the user interface. It is synchronous with ui_clk.
ui_clk
This is the output clock from the user interface. It runs at half or one quarter of the
frequency of the clock going out to the external SDRAM, depending on whether 2:1 or 4:1
mode is selected in the Vivado IDE.
init_calib_complete
PHY asserts init_calib_complete when calibration is finished. The application has no need to
wait for init_calib_complete before sending commands to the Memory Controller.
Command Path
When the user logic app_en signal is asserted and the app_rdy signal is asserted from the
user interface, a command is accepted and written to the FIFO by the user interface. The
command is ignored by the user interface whenever app_rdy is deasserted. The user logic
needs to hold app_en High along with the valid command and address values until
app_rdy is asserted as shown in Figure 4-21.
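The handshake described above can be modeled cycle by cycle. The following is a behavioral sketch, not RTL; the function name and the trace format are assumptions made for illustration.

```python
def issue_commands(commands, rdy_trace):
    """Simulate the app_en/app_rdy handshake one clock cycle at a time.

    commands:  list of (cmd, addr) tuples the user logic wants to send.
    rdy_trace: iterable of booleans, the app_rdy value seen in each cycle.
    Returns a list of (cycle, cmd, addr) for every accepted command.
    """
    accepted = []
    pending = list(commands)
    for cycle, app_rdy in enumerate(rdy_trace):
        if not pending:
            break
        app_en = True  # app_en is held High while a command is waiting
        if app_en and app_rdy:
            cmd, addr = pending.pop(0)
            accepted.append((cycle, cmd, addr))
        # Otherwise the same cmd/addr values stay on the bus for the next cycle.
    return accepted


print(issue_commands([("WRITE", 0x0), ("READ", 0x8)],
                     [False, True, False, False, True]))
```

A command is consumed only in a cycle where both app_en and app_rdy are High, so the user logic never advances to the next command while app_rdy is deasserted.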
Figure 4-21: User Interface Command Timing Diagram with app_rdy Asserted
(Signals shown: clk, app_cmd = WRITE, app_addr, app_en, and app_rdy.)
A non back-to-back write command can be issued as shown in Figure 4-22. This figure
depicts three scenarios for the app_wdf_data, app_wdf_wren, and app_wdf_end
signals, as follows:
1. Write data is presented along with the corresponding write command (second half of
BL8).
2. Write data is presented before the corresponding write command.
3. Write data is presented after the corresponding write command, but should not exceed
the limitation of two clock cycles.
For write data that is output after the write command has been registered, as shown in Note
3 (Figure 4-22), the maximum delay is two clock cycles.
Figure 4-22: 4:1 Mode User Interface Write Timing Diagram (Memory Burst Type = BL8)
(Signals shown: clk, app_cmd = WRITE, app_addr, app_wdf_rdy, and three sets of
app_wdf_data, app_wdf_wren, and app_wdf_end.)
Write Path
The write data is registered in the write FIFO when app_wdf_wren is asserted and
app_wdf_rdy is High (Figure 4-23). If app_wdf_rdy is deasserted, the user logic needs to
hold app_wdf_wren and app_wdf_end High along with the valid app_wdf_data value
until app_wdf_rdy is asserted. The app_wdf_mask signal can be used to mask out the
bytes to write to external memory.
Figure 4-23: 4:1 Mode User Interface Back-to-Back Write Commands Timing Diagram
(Memory Burst Type = BL8)
(Signals shown: clk, app_cmd = seven WRITE commands, app_addr = Addr a through Addr g,
app_en, app_rdy, app_wdf_mask, app_wdf_rdy, app_wdf_wren, and app_wdf_end.)
As shown in Figure 4-22, the maximum delay for a single write between the write
data and the associated write command is two clock cycles. When issuing back-to-back
write commands, there is no maximum delay between the write data and the associated
back-to-back write command, as shown in Figure 4-24.
Figure 4-24: 4:1 Mode User Interface Back-to-Back Write Commands Timing Diagram
(Memory Burst Type = BL8)
(Signals shown: clk, app_en, app_rdy, app_wdf_mask, app_wdf_rdy, app_wdf_data = W a
through W g, app_wdf_wren, and app_wdf_end.)
The app_wdf_end signal must be used to indicate the end of a memory write burst. For
memory burst types of eight in 2:1 mode, the app_wdf_end signal must be asserted on the
second write data word.
The map of the application interface data to the DRAM output data can be explained with
an example.
For a 4:1 Memory Controller to DRAM clock ratio with an 8-bit memory, at the application
interface, if the 64-bit data driven is 0000_0806_0000_0805 (Hex), the data at the DRAM
interface is as shown in Figure 4-25. This is for a BL8 (Burst Length 8) transaction.
The data values at different clock edges are as shown in Table 4-23.
For a 2:1 Memory Controller to DRAM clock ratio, the application data width is 32 bits.
Hence for BL8 transactions, the data at the application interface must be provided in two
clock cycles. The app_wdf_end signal is asserted for the second data as shown in
Figure 4-26. In this case, the application data provided in the first cycle is 0000_0405 (Hex),
and the data provided in the last cycle is 0000_080A (Hex). This is for a BL8 transaction.
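The two examples above can be reproduced with a short sketch that splits each application word into DRAM beats. The byte order is an assumption: the least significant byte of the application word is taken as the first beat on the DRAM bus, because the table and figures that define the order are not reproduced here. The function name is hypothetical.

```python
def split_burst(app_words, word_bits, dq_bits=8):
    """Split application words into DRAM beats, one beat per clock edge.

    app_words: list of ints, one per ui_clk cycle, in the order presented.
    word_bits: application data width (64 in 4:1 mode, 32 in 2:1 mode
               for an 8-bit memory).
    Returns a list of (beats, app_wdf_end) tuples, one per application word.
    """
    beats_per_word = word_bits // dq_bits
    mask = (1 << dq_bits) - 1
    result = []
    for i, word in enumerate(app_words):
        # Assumption: the least significant byte is the first beat.
        beats = [(word >> (dq_bits * n)) & mask for n in range(beats_per_word)]
        app_wdf_end = (i == len(app_words) - 1)  # asserted with the last word only
        result.append((beats, app_wdf_end))
    return result


# 4:1 mode: one 64-bit word carries the whole BL8 burst.
for beats, end in split_burst([0x0000_0806_0000_0805], 64):
    print([f"{b:02X}" for b in beats], end)

# 2:1 mode: two 32-bit words, app_wdf_end asserted with the second one.
for beats, end in split_burst([0x0000_0405, 0x0000_080A], 32):
    print([f"{b:02X}" for b in beats], end)
```

In both modes the burst carries eight beats in total; only the number of ui_clk cycles needed to present them differs.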
Read Path
The read data is returned by the user interface in the requested order and is valid when
app_rd_data_valid is asserted (Figure 4-28 and Figure 4-29). The app_rd_data_end
signal indicates the end of each read command burst and is not needed in user logic.
Figure 4-28: 4:1 Mode User Interface Read Timing Diagram (Memory Burst Type = BL8)
(Signals shown: clk, app_cmd = READ, app_addr, app_en, app_rdy, app_rd_data, and
app_rd_data_valid.)
Figure 4-29: 4:1 Mode User Interface Read Timing Diagram (Memory Burst Type = BL4 or BL8)
(Signals shown: clk, app_cmd = READ, app_en, app_rdy, app_rd_data, and app_rd_data_valid.)
In Figure 4-29, the read data returned is always in the same order as the requests made on
the address/control bus.
Native Interface
The native interface connects to an FPGA user design to allow access to an external memory
device.
The bank, row, and column comprise a target address on the memory device for read and
write operations. Commands are specified using the cmd[2:0] input to the core. The
available read and write commands are shown in Table 4-25.
accept
This signal indicates to the user design whether or not a request is accepted by the core.
When the accept signal is asserted, the request submitted on the last cycle is accepted, and
the user design can either continue to submit more requests or go idle. When the accept
signal is deasserted, the request submitted on the last cycle was not accepted and must be
retried.
use_addr
The user design asserts the use_addr signal to strobe the request that was submitted to
the native interface on the previous cycle.
data_buf_addr
The user design must contain a buffer for data used during read and write commands.
When a request is submitted to the native interface, the user design must designate a
location in the buffer for when the request is processed. For write commands,
data_buf_addr is an address in the buffer containing the source data to be written to the
external memory. For read commands, data_buf_addr is an address in the buffer that
receives read data from the external memory. The core echoes this address back when the
requests are processed.
Precharge
The precharge signal provides a transaction by transaction control over the auto-precharge
feature of the Memory Controller.
• precharge = 0 – The controller behaves in “keep open” mode. Pages are kept open
until a different page is opened in the same bank.
• precharge = 1 – The controller behaves in “keep closed” mode. This mode is useful for
certain special access patterns.
• precharge = By Transaction – A page is closed after the transaction if precharge = 1
and it is kept open if precharge = 0. The user of the native interface has transaction by
transaction control.
wr_data
This bus is the data that needs to be written to the external memory. This bus can be
connected to the data output of a buffer in the user design.
wr_data_addr
This bus is an echo of data_buf_addr when the current write request is submitted. The
wr_data_addr bus can be combined with the wr_data_offset signal and applied to
the address input of a buffer in the user design.
wr_data_mask
This bus is the byte enable (data mask) for the data currently being written to the external
memory. The byte to the memory is written when the corresponding wr_data_mask signal
is deasserted.
wr_data_en
When asserted, this signal indicates that the core is reading data from the user design for a
write command. This signal can be tied to the chip select of a buffer in the user design.
wr_data_offset
This bus is used to step through the data buffer when the burst length requires more than
a single cycle to complete. This bus, in combination with wr_data_addr, can be applied to
the address input of a buffer in the user design.
rd_data
This bus is the data that was read from the external memory. It can be connected to the data
input of a buffer in the user design.
rd_data_addr
This bus is an echo of data_buf_addr when the current read request is submitted. This
bus can be combined with the rd_data_offset signal and applied to the address input of
a buffer in the user design.
rd_data_en
This signal indicates when valid read data is available on rd_data for a read request. It can
be tied to the chip select and write enable of a buffer in the user design.
rd_data_offset
This bus is used to step through the data buffer when the burst length requires more than
a single cycle to complete. This bus can be combined with rd_data_addr and applied to
the address input of a buffer in the user design.
app_ref_req
app_ref_ack
app_zq_req
app_zq_ack
(Timing diagram. Signals shown: clk; rank, bank, row, column; cmd, hi_priority; accept;
use_addr; data_buf_addr; wr_data_en; wr_data_addr; wr_data_mask; rd_data_en;
rd_data_addr; and rd_data.)
Requests are presented to the native interface as an address and a command. The address
is composed of the bank, row, and column inputs. The command is encoded on the cmd
input.
The address and command are presented to the native interface one clock cycle before they are
validated with the use_addr signal. The memory interface indicates that it can accept the
request by asserting the accept signal. Requests are confirmed as accepted when
use_addr and accept are both asserted in the same clock cycle. If use_addr is asserted
but accept is not, the request is not accepted and must be repeated. This behavior is shown
in Figure 4-31.
Figure 4-31 (timing diagram. Signals shown: clk; rank, bank, row, column; cmd, hi_priority;
accept; use_addr; and data_buf_addr.)
In Figure 4-31, requests 1 and 2 are accepted normally. The first time request 3 is presented,
accept is driven Low, and the request is not accepted. The user design retries request 3,
which is accepted on the next attempt. Request 4 is subsequently accepted on the first
attempt.
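The retry behavior can be sketched as a simple behavioral model. The function name and the representation of accept as a list of booleans are assumptions for illustration; each entry is the accept value sampled in a cycle where use_addr is asserted.

```python
def submit_requests(requests, accept_trace):
    """Model the use_addr/accept handshake of the native interface.

    requests:     list of request identifiers, submitted one at a time.
    accept_trace: iterable of booleans, the accept value in each cycle
                  where use_addr is asserted.
    Returns a list of (request, attempts) pairs.
    """
    log = []
    accepts = iter(accept_trace)
    for req in requests:
        attempts = 0
        for accept in accepts:
            attempts += 1
            if accept:
                log.append((req, attempts))
                break
            # accept was Low: the same request must be presented again.
        else:
            raise RuntimeError(f"request {req!r} was never accepted")
    return log


# The scenario described for Figure 4-31: request 3 is rejected once.
print(submit_requests([1, 2, 3, 4], [True, True, False, True, True]))
```

The model submits requests strictly one at a time, which matches the serial submission requirement of the native interface.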
The data_buf_addr bus must be supplied with requests. This bus is an address pointer into
a buffer that exists in the user design. It tells the core where to locate data when processing
write commands and where to place data when processing read commands. When the core
processes a command, the core echoes data_buf_addr back to the user design by
wr_data_addr for write commands and rd_data_addr for read commands. This behavior is
shown in Figure 4-32. Write data must be supplied in the same clock cycle that
wr_data_en is asserted.
Figure 4-32 (timing diagram of two back-to-back data bursts. Signals shown: clk, wr_data_en,
wr_data_addr, wr_data_offset, wr_data, wr_data_mask, rd_data_en, rd_data_addr, and rd_data.)
Transfers can be isolated with gaps of non-activity, or there can be long bursts with no gaps.
The user design can identify when a request is being processed and when it finishes by
monitoring the rd_data_en and wr_data_en signals. When the rd_data_en signal is
asserted, the Memory Controller has completed processing a read command request.
Similarly, when the wr_data_en signal is asserted, the Memory Controller is processing a
write command request.
When NORM ordering mode is enabled, the Memory Controller reorders received requests
to optimize throughput between the FPGA and memory device. The data is returned to the
user design in the order processed, not the order received. The user design can identify the
specific request being processed by monitoring rd_data_addr and wr_data_addr.
These fields correspond to the data_buf_addr supplied when the user design submits
the request to the native interface. Both of these scenarios are depicted in Figure 4-32.
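One way a user design might use the echoed address to place reordered read data is sketched below. The class is hypothetical, and combining the address and offset as addr × words_per_request + offset is an assumption; the text only states that the two buses are combined to form the buffer address.

```python
class ReadBuffer:
    """Hypothetical user-design buffer indexed by the echoed data_buf_addr."""

    def __init__(self, slots, words_per_request):
        self.words_per_request = words_per_request
        self.mem = [None] * (slots * words_per_request)

    def index(self, data_addr, data_offset):
        # Assumed combination of address and offset into one buffer address.
        return data_addr * self.words_per_request + data_offset

    def on_rd_data(self, rd_data_addr, rd_data_offset, rd_data):
        """Store one word in a cycle where rd_data_en is asserted."""
        self.mem[self.index(rd_data_addr, rd_data_offset)] = rd_data

    def request_data(self, data_buf_addr):
        """Return all words stored for the request tagged data_buf_addr."""
        start = self.index(data_buf_addr, 0)
        return self.mem[start:start + self.words_per_request]


buf = ReadBuffer(slots=4, words_per_request=2)
# Request 2 completes before request 0, as can happen with reordering enabled.
for addr, offset, data in [(2, 0, "C0"), (2, 1, "C1"), (0, 0, "A0"), (0, 1, "A1")]:
    buf.on_rd_data(addr, offset, data)
print(buf.request_data(0), buf.request_data(2))
```

Because every returned word carries its own tag, the user design does not need to track the order in which the controller services requests.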
The native interface is implemented such that the user design must submit one request at
a time and, thus, multiple requests must be submitted in a serial fashion. Similarly, the core
must execute multiple commands to the memory device one at a time. However, due to
pipelining in the core implementation, read and write requests can be processed in parallel
at the native interface.
If you are customizing and generating the core in the Vivado IP integrator, see the Vivado
Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994) [Ref 4] for
detailed information. IP integrator might auto-compute certain configuration values when
validating or generating the design. To check whether the values do change, see the
description of the parameter in this chapter. To view the parameter value you can run the
validate_bd_design command in the Tcl Console.
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) [Ref 5] and
the Vivado Design Suite User Guide: Getting Started (UG910) [Ref 6].
Note: Figures in this chapter are illustrations of the Vivado Integrated Design Environment (IDE).
This layout might vary from the current version.
Controller Options
Only one controller instance can be created and only two kinds of controllers are available
for instantiation:
• DDR3
• DDR4
1. After a controller is added with the Add button, the Common and I/O Planning tabs
are enabled.
2. Copy and Delete are not available in the release.
Figure 5-2: Vivado Customize IP Dialog Box – Common and I/O Planning
3. After adding a controller, select the controller row in the table and click the Edit button
to edit controller options. Controller options are divided into two tabs as shown in
Figure 5-3 and Figure 5-4.
IMPORTANT: All parameters shown in the controller options dialog are limited selection options in this
release.
• Bank Assignment
• Individual Pin Assignment
The Bank Assignment dialog provides a method to assign bank bytes to signal groups. At the
first level, it shows the list of I/O banks available for MIG. All available byte groups are
shown inside each bank. Warnings are issued regarding rule violations by a change in color
of the selected options and the log window at the bottom of the dialog.
Pin swapping or completely new pin assignment is possible through the various methods
provided by the pin planner.
Limitations
There are certain limitations for this release. These are relaxed through the production
release in a staged fashion.
1. System clock selection is available through Pin Planner Lite with additional DRC for
2013.4.
2. Only one controller instance can be created. Multi-controller is not supported.
3. Only DDR3, DDR4, and RLDRAM 3 controllers are available.
4. AXI interface is not available.
5. Simulation model is not generated.
6. User selection of data width is supported up to 80-bit for DDR3 or DDR4. For RLDRAM 3,
36-bit data width is supported.
7. Memory interfaces can be assigned to High Performance (HP) or High Range (HR) banks
depending on design requirements.
8. Memory port allocation can spread across five banks for DDR3 or DDR4 interfaces,
whereas for RLDRAM 3 the span is limited to two banks.
9. Example design with Logic Debug cores (ILA, VIO) is not available.
10. Memory clock to FPGA logic clock ratio is restricted to 4:1.
11. DCI cascade is always enabled for DDR4 interfaces.
12. Internal VREF is always required for DDR4 interfaces. User selection in the Vivado IDE
does not have any effect.
13. Calibration block is included but calibration routines are disabled. Fixed delays are
provided for proper write and read operations.
14. The dialog box for I/O planning at a pin-level takes about eight seconds to open up.
15. While the I/O planning view allows for changes to I/O Standard for pin locations, these
are enabled for selection with DRCs.
16. There is no provision for reading an existing constraints file for importing I/O locations.
Output Generation
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) [Ref 5].
Required Constraints
The MIG Vivado IDE generates the required constraints. A location constraint and an I/O
standard constraint are added for each external pin in the design. The location is chosen by
the Vivado IDE according to the banks and byte lanes chosen for the design.
The I/O standard is chosen by the memory type selection and options in the Vivado IDE and
by the pin type. A sample for dq[0] is shown here.
Internal VREF is always used for DDR4. Internal VREF is optional for DDR3. A sample for DDR4
is shown here.
IMPORTANT: Do not alter these constraints. If the pin locations need to be altered, rerun the MIG
Vivado IDE to generate a new XDC file.
Simulation
For comprehensive information about Vivado® simulation components, as well as
information about using supported third party tools, see the Vivado Design Suite User
Guide: Logic Simulation (UG900) [Ref 7].
Example Design
This chapter contains information about the provided example design in the Vivado®
Design Suite environment.
Vivado supports the Open IP Example Design flow. To create the example design using this flow,
right-click the IP in the Source Window, as shown in Figure 9-1, and select Open IP
Example Design.
This option creates a new Vivado project. Upon selecting the menu, a dialog box to enter
the directory information for the new design project opens.
Select a directory, or use the defaults, and click OK. This launches a new Vivado session with all of
the example design files and a copy of the IP. This project has example_top as the
Implementation top directory and sim_tb_top as the Simulation top directory, as shown
in Figure 9-2.
IMPORTANT: The Xilinx UNISIMS_VER and SECUREIP libraries must be mapped into the simulator.
<project_dir>/example_project/<Component_Name>_example/
<Component_Name>_example.srcs/sim_1/imports/<Component_Name>/tb
If the MIG design is generated with the Component Name entered in the Vivado IDE as
mig_0, the simulation directory path is the following:
<project_dir>/example_project/mig_0_example/mig_0_example.srcs/
sim_1/imports/mig_0/tb
Copy the memory models in the above directory and see the readme.txt file located in
the folder for running simulations.
The Questa® SIM simulation tool is used for verification of MIG IP at each software release.
A script file to run simulations with Questa SIM is included in the MIG-generated output. MIG
designs are not verified with Vivado Simulator. Other simulation tools can be used for MIG
IP simulation but are not specifically verified by Xilinx.
Test Bench
This chapter contains information about the provided test bench in the Vivado® Design
Suite environment.
The intent of the performance test bench is for you to obtain an estimate on the efficiency
for a given traffic pattern with the MIG controller. The test bench passes your supplied
commands and address to the Memory Controller and measures the efficiency for the given
pattern. The efficiency is measured by the occupancy of the dq bus. The primary use of the
test bench is for efficiency measurements so no data integrity checks are performed. Static
data is written into the memory during write transactions and the same data is always read
back.
Stimulus Pattern
Each stimulus pattern is 48 bits and the format is described in Table 10-2 and Table 10-3.
For example, in an eight-bank configuration, only bank Bits[2:0] are sent to the Memory
Controller and the remaining bits are ignored. The extra bits for an address field are
provided for you to enter the address in a hexadecimal format. You must confirm the value
entered corresponds to the width of a given configuration.
The address is assembled based on the top-level MEM_ADDR_ORDER parameter and sent to
the user interface.
Bus Utilization
The bus utilization is calculated at the user interface taking total number of Reads and
Writes into consideration and the following equation is used:
bw_cumulative = ((rd_command_cnt + wr_command_cnt) × (BURST_LEN / 2) × 100) / ((end_of_stimulus – calib_done) / tCK)
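As a rough sketch, bus utilization can be computed as the share of memory clock cycles in which the data bus carries read or write bursts. This assumes each command occupies the bus for half the burst length in clock cycles (two beats per clock) and that elapsed time is measured from the end of calibration to the end of the stimulus; the function and parameter names are illustrative, not the test bench's own.

```python
def bus_utilization(rd_command_cnt, wr_command_cnt, burst_len, elapsed_time, tck):
    """Return the data bus utilization in percent.

    rd_command_cnt, wr_command_cnt: number of read and write commands issued.
    burst_len:    memory burst length in beats (8 for BL8).
    elapsed_time: measurement window, in the same time unit as tck.
    tck:          memory clock period.
    """
    # Two beats are transferred per memory clock, so BL8 occupies four cycles.
    busy_cycles = (rd_command_cnt + wr_command_cnt) * (burst_len / 2)
    total_cycles = elapsed_time / tck
    return busy_cycles * 100 / total_cycles


# 100 BL8 commands in a window of 800 memory clock cycles.
print(bus_utilization(50, 50, 8, elapsed_time=1000.0, tck=1.25))
```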
Example Patterns
These examples are based on the MEM_ADDR_ORDER set to BANK_ROW_COLUMN.
00_0_2_000F_00A_0 – This pattern is a single write to the 10th column, 15th row, and second bank.
00_0_2_000F_00A_1 – This pattern is a single read from the 10th column, 15th row, and second bank.
0A_0_0_0010_000_0 – This corresponds to 10 writes with address starting from 0 to 80 which can be seen in the column.
0A_0_0_0010_000_1 – This corresponds to 10 reads with address starting from 0 to 80 which can be seen in the column.
0A_0_2_000F_3F8_0 – This corresponds to 10 writes with the column address wrapping to the start of
the page after one write.
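The example patterns can be decoded with a small parser. The field layout used here (repeat count, rank, bank, row, column, command, from left to right) is inferred from the examples above and from the 48-bit pattern width; the field names, in particular "rank" for the second field, are assumptions because the format tables are not reproduced here.

```python
from collections import namedtuple

# Field names are assumptions inferred from the example patterns.
Stimulus = namedtuple("Stimulus", "repeat rank bank row column command")

# Hex digits per field: 2 + 1 + 1 + 4 + 3 + 1 = 12 digits = 48 bits.
FIELD_DIGITS = [2, 1, 1, 4, 3, 1]


def parse_stimulus(pattern):
    """Parse a pattern such as '00_0_2_000F_00A_0' into its fields."""
    fields = pattern.split("_")
    if [len(f) for f in fields] != FIELD_DIGITS:
        raise ValueError(f"unexpected pattern layout: {pattern!r}")
    return Stimulus(*(int(f, 16) for f in fields))


single_write = parse_stimulus("00_0_2_000F_00A_0")
print(single_write)  # bank 2, row 15, column 10, command 0 (write in the examples)

wrapped = parse_stimulus("0A_0_2_000F_3F8_0")
print(wrapped.repeat, hex(wrapped.column))
```

In the examples, a command field of 0 denotes a write and 1 denotes a read, and a first field of 0A corresponds to 10 accesses.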
Product Specification
Standards
For more information on UltraScale™ architecture documents, see References, page 129.
Performance
Maximum Frequencies
For more information on the maximum frequencies, see Kintex UltraScale Architecture Data
Sheet, DC and AC Switching Characteristics (DS892) [Ref 2].
Resource Utilization
Kintex UltraScale Devices
Table 11-1 provides approximate resource counts on Kintex® UltraScale™ devices.
Resources required for the UltraScale architecture-based FPGAs MIS core have been
estimated for the Kintex UltraScale devices (Table 11-1). These values were generated using
Vivado® IP catalog. They are derived from post-synthesis reports, and might change during
implementation.
Port Descriptions
There are three port categories at the top-level of the memory interface core called the
“user design.”
• The first category is the memory interface signals that directly interface with the
RLDRAM 3 device. These are defined by the RLDRAM 3 specification.
• The second category is the application interface signals, which form the “user interface.”
These are described in the Protocol Description in Chapter 13.
• The third category includes other signals necessary for proper operation of the core.
These include the clocks, reset, and status signals from the core. The clocking and reset
signals are described in their respective sections.
Core Architecture
This chapter describes the UltraScale™ architecture-based FPGAs Memory Interface
Solutions core with an overview of the modules and interfaces.
Overview
Figure 12-1 shows a high-level block diagram of the RLDRAM 3 memory interface solution.
This figure shows both the internal FPGA connections to the user interface for initiating
read and write commands, and the external interface to the memory device.
Figure 12-1 (block diagram). Ports on the user interface side: clk, sys_rst, rst_clk, rst,
mem_refclk, freq_refclk, clk_ref, pll_lock, sync_pulse, user_cmd_en, user_cmd, user_addr,
user_ba, user_wr_en, user_wr_data, user_wr_dm, user_afifo_empty, user_afifo_full,
user_afifo_aempty, user_afifo_afull, user_wdfifo_empty, user_wdfifo_full,
user_wdfifo_aempty, user_wdfifo_afull, user_rd_valid, user_rd_data, and
init_calib_complete. Physical interface ports and the RLDRAM pins they connect to:
rld_ck_p (ck), rld_ck_n (ck_n), rld_dk_p (dk), rld_dk_n (dk_n), rld_cs_n (cs_n),
rld_we_n (we_n), rld_ref_n (ref_n), rld_a (a), rld_ba (ba), rld_dq (dq), rld_dm (dm),
rld_qk_p (qk), rld_qk_n (qk_n), rld_qvld (qvld), and rld_reset_n (reset_n).
Figure 12-2 shows the UltraScale architecture-based FPGAs Memory Interface Solutions
diagram.
Figure 12-2 (block diagram). Blocks shown: User's FPGA Logic, User Interface, Memory
Controller, Initialization/Calibration, Physical Layer, and RLDRAM. Labeled signals:
Cal Done and Read Data.
The user interface uses a simple protocol based entirely on SDR signals to make read and
write requests. See User Interface in Chapter 13 for more details describing this protocol.
The Memory Controller takes commands from the user interface and adheres to the
protocol requirements of the RLDRAM 3 device. See Memory Controller for more details.
The physical interface generates the proper timing relationships and DDR signaling to
communicate with the external memory device, while conforming to the RLDRAM 3
protocol and timing requirements. See Physical Interface in Chapter 13 for more details.
Memory Controller
The Memory Controller (MC) enforces the RLDRAM 3 access requirements and interfaces
with the PHY. The controller processes commands in order, so the order of commands
presented to the controller is the order in which they are presented to the memory device.
The MC first receives commands from the user interface and determines if the command
can be processed immediately or needs to wait. When all requirements are met, the
command is placed on the PHY interface. For a write command, the controller generates a
signal for the user interface to provide the write data to the PHY. This signal is generated
based on the memory configuration to ensure the proper command-to-data relationship.
Auto-refresh commands are inserted into the command flow by the controller to meet the
memory device refresh requirements.
For CIO devices, read and write data share the same data bus. Switching from read
commands to write commands, or vice versa, introduces gaps in the command stream while
the bus turns around. For better throughput, minimize switches between read and write
commands where possible.
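The cost of interleaving reads and writes on a shared bus can be illustrated with a small model. The sketch below is not part of the core: the helper name count_turnarounds and the "R"/"W" command encoding are assumptions made for illustration, and each direction change simply stands for one turnaround gap.

```python
def count_turnarounds(commands):
    """Count read<->write direction changes in a command sequence.

    `commands` is an iterable of "R" (read) and "W" (write). Each change of
    direction stands for one bus turnaround gap on a CIO device.
    """
    turnarounds = 0
    previous = None
    for cmd in commands:
        if cmd not in ("R", "W"):
            raise ValueError(f"unknown command: {cmd!r}")
        if previous is not None and cmd != previous:
            turnarounds += 1
        previous = cmd
    return turnarounds


interleaved = "RWRWRWRW"  # alternating reads and writes
grouped = "RRRRWWWW"      # the same commands, grouped by direction

print(count_turnarounds(interleaved))  # 7
print(count_turnarounds(grouped))      # 1
```

Because the controller processes commands in order, any such grouping has to be done by the logic that issues the commands.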
The controller remains in the CTL_IDLE state until calibration completes. When the
calibration done signal is asserted and a command request is received, the state machine
transitions to CTL_LOAD_CMD2 through the CTL_LOAD_CMD1 state (essentially a pipeline
state).
For a single command request the controller state machine transitions from
CTL_LOAD_CMD2 to CTL_PROC_LAST_CMD and back to CTL_IDLE.
For multiple commands to the same RLDRAM 3 bank and CMD_PER_CLK = 1, the state
machine transitions from CTL_LOAD_CMD2 → CTL_PROC_CMD → CTL_PROC_CMD1 →
CTL_PROC_LAST_CMD → CTL_IDLE.
For multiple commands to the same RLDRAM 3 bank and CMD_PER_CLK > 1, the state
machine transitions from CTL_LOAD_CMD2 → CTL_PROC_CMD → CTL_PROC_CMD1 →
CTL_PROC_LAST_CMD → CTL_PROC_LAST_CMD1 → CTL_IDLE.
For multiple commands to different RLDRAM 3 banks and CMD_PER_CLK = 1, the state
machine transitions from CTL_LOAD_CMD2 → CTL_PROC_CMD → CTL_PROC_LAST_CMD →
CTL_IDLE.
For multiple commands to different RLDRAM 3 banks and CMD_PER_CLK > 1, the state
machine transitions from CTL_LOAD_CMD2 → CTL_PROC_CMD → CTL_PROC_LAST_CMD →
CTL_PROC_LAST_CMD1 → CTL_IDLE.
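The transition cases above can be summarized in a short model. This is a sketch for reading the description, not the RTL: the function name controller_state_path and its arguments are hypothetical, each processing state is listed once regardless of how many commands are queued, and the single-command case follows the text without distinguishing CMD_PER_CLK.

```python
def controller_state_path(num_commands, same_bank, cmd_per_clk):
    """Return the documented state sequence once calibration is done.

    num_commands: number of queued command requests (>= 1)
    same_bank:    True if the commands target the same RLDRAM 3 bank
    cmd_per_clk:  value of CMD_PER_CLK
    """
    if num_commands < 1:
        raise ValueError("at least one command is required")

    # CTL_LOAD_CMD1 is the pipeline state on the way to CTL_LOAD_CMD2.
    path = ["CTL_IDLE", "CTL_LOAD_CMD1", "CTL_LOAD_CMD2"]

    if num_commands == 1:
        path.append("CTL_PROC_LAST_CMD")
    else:
        path.append("CTL_PROC_CMD")
        if same_bank:
            path.append("CTL_PROC_CMD1")
        path.append("CTL_PROC_LAST_CMD")
        if cmd_per_clk > 1:
            path.append("CTL_PROC_LAST_CMD1")

    path.append("CTL_IDLE")
    return path


print(" -> ".join(controller_state_path(4, same_bank=True, cmd_per_clk=2)))
```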
Figure 12-3 shows the state machine logic for the controller.
[Figure 12-3: Controller state machine. Transition conditions shown in the diagram: cal_done, cmd_empty, refr_req, refr_done, rd_grant, wr_grant, nop_req, banks_match_r, wr_to_rd_r, and CMD_PER_CLK.]
PHY
The PHY is the low-level physical interface to an external RLDRAM 3 device, together with
all of the calibration logic that ensures reliable operation of the physical interface itself.
The PHY generates the signal timing and sequencing required to interface to the memory
device. It contains the following:
• Clock/address/control-generation logic
• Write and read datapaths
• Logic for initializing the SDRAM after power-up
In addition, PHY contains calibration logic to perform timing training of the read and write
datapaths to account for system static and dynamic delays.
The Memory Controller and calibration logic communicate with this dedicated PHY in the
slow frequency clock domain, which is either divided by 4 or divided by 2. This depends on
the RLDRAM 3 memory clock. A more detailed block diagram of the PHY design is shown in
Figure 12-4.
[Figure 12-4: PHY block diagram of the UltraScale Architecture-Based FPGAs Memory Interface Solution. Blocks: User Interface, Memory Controller, MicroBlaze, pll.v, rld_phy.v, rld_xiphy.v, rld_iob.v, rld_cal.v, rld_cal_adr_decode.v, config_rom.v. Labeled signals: pllclks, refclks, pllGate, DDR Address/Control, Write Data, and Mask, CMD/Write Data, Read Data, Status, Cal Done.]
The MC is designed to separate out the command processing from the low-level PHY
requirements to ensure a clean separation between the controller and physical layer. The
command processing can be replaced with custom logic if desired, while the logic for
interacting with the PHY stays the same and can still be used by the calibration logic.
The PHY architecture encompasses all of the logic contained in rld_xiphy.v. The PHY
contains wrappers around dedicated hard blocks to build up the memory interface from
smaller components. A byte lane contains all of the clocks, resets, and datapaths for a given
subset of I/O. Multiple byte lanes are grouped together, along with dedicated clocking
resources, to make up a single bank memory interface. For more information on the hard
silicon physical layer architecture, see the UltraScale™ Architecture-Based FPGAs SelectIO™
Resources User Guide (UG571) [Ref 3].
The address unit connects the MCS to the local register set and the PHY by performing
address decode and control translation on the I/O module bus from spaces in the memory
map and MUXing return data (rld_cal_adr_decode.v). In addition, it provides address
translation (also known as “mapping”) from a logical conceptualization of the DRAM
interface to the appropriate pinout-dependent location of the delay control in the PHY
address space.
Although the calibration architecture presents a simple and organized address map for
manipulating the delay elements for individual data, control and command bits, there is
flexibility in how those I/O pins are placed. For a given I/O placement, the path to the FPGA
logic is locked to a given pin. To enable a single binary software file to work with any
memory interface pinout, a translation block converts the simplified RIU addressing into
the pinout-specific RIU address for the target design. The specific address translation is
written by MIG after a pinout is selected. The code shows an example of the RTL structure
that supports this.
In this example, DQ0 is pinned out on Bit[0] of nibble 0 (nibble 0 according to instantiation
order). The RIU address for the ODELAY for Bit[0] is 0x0D (for more details on the RIU
address map, see the RIU specification). When DQ0 is addressed (indicated by address
0x000_4100), this snippet of code is active. It enables nibble 0 (decoded to one-hot
downstream) and forwards the address 0x0D to the RIU address bus.
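The translation step can be modeled as a table lookup followed by a one-hot decode. The following is only a behavioral sketch in Python, not the generated RTL: the names ADDRESS_MAP, translate, and one_hot are hypothetical, and the single table entry is the DQ0 case described above. A real table depends on the selected pinout.

```python
# Hypothetical translation table: logical calibration address ->
# (nibble index, RIU register address). Only the DQ0 entry is taken from
# the text; all other entries would be pinout-specific.
ADDRESS_MAP = {
    0x000_4100: (0, 0x0D),  # DQ0 -> nibble 0, ODELAY for Bit[0]
}


def translate(logical_addr):
    """Map a logical delay-control address to (nibble index, RIU address)."""
    try:
        return ADDRESS_MAP[logical_addr]
    except KeyError:
        raise KeyError(f"no translation for address {logical_addr:#x}") from None


def one_hot(nibble_index):
    """Model the downstream decode of the nibble index to a one-hot select."""
    return 1 << nibble_index


nibble, riu_addr = translate(0x000_4100)
print(nibble, hex(riu_addr), bin(one_hot(nibble)))  # 0 0xd 0b1
```

Keeping the pinout dependence in a table is what lets one calibration binary serve any pinout: only the table contents change.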
The MicroBlaze I/O module interface updates at a maximum rate of once every three clock
cycles, which is not always fast enough for implementing all of the functions required in
calibration. A helper circuit implemented in rld_cal_adr_decode.v is required to
obtain commands from the registers and translate at least a portion into single-cycle
accuracy for submission to the PHY. In addition, it supports command repetition to enable
back-to-back read transactions and read data comparison.
Figure 12-5 shows the overall flow of memory initialization and the different stages of
calibration.
[Figure 12-5: Memory initialization and calibration flow. Stages, in order:]
1. System Reset
2. BISC
3. RLDRAM 3 Initialization
4. Read Clock Gate Calibration
5. Read Leveling using step function (a full burst of one logic value followed by a full burst of the other)
6. Read per-bit deskew
7. Center Read Clock in Read DQ window
8. QVALID calibration
9. Read Bit Slip
10. Read Byte Align (aligns QVALIDs)
11. PRBS or Complex pattern Read Leveling (Read Clock centering in read DQ window with PRBS pattern to account for ISI effects, and per-bit deskew)
12. Write Clock centering and per-bit deskew with complex pattern
13. Sanity Check (ensure writes and reads are correct before asserting cal_done)
14. Enable VT Tracking
15. Calibration Complete
Clocking
The memory interface requires one PLL in each bank that is occupied by the interface. There
are two PLLs per bank. If a bank is shared by two interfaces, both PLLs in that bank are used.
The memory interface requires a high quality reference clock. This clock should come from
a differential pair on the same column of the FPGA that the memory interface occupies. The
memory interface PLLs generate the appropriate frequencies and phases necessary for
proper operation. An output clock is provided for the FPGA logic which includes the
controller and the interface logic. This output clock is 1/4 of the memory interface clock in
the 4:1 memory clock to FPGA logic clock mode.
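As a quick arithmetic check of the clock ratio, the sketch below computes the FPGA logic clock from the memory interface clock. The function name fabric_clock_mhz and the 800 MHz figure are illustrative assumptions, not values from this guide. The 2:1 option reflects the divide-by-2 domain mentioned in the PHY description.

```python
def fabric_clock_mhz(memory_clock_mhz, ratio=4):
    """Return the FPGA logic clock for a given memory clock and ratio.

    ratio=4 models the 4:1 memory-clock-to-FPGA-logic-clock mode;
    ratio=2 models the divide-by-2 domain.
    """
    if ratio not in (2, 4):
        raise ValueError("ratio must be 2 or 4")
    return memory_clock_mhz / ratio


# Illustrative value only: an 800 MHz memory clock in 4:1 mode.
print(fabric_clock_mhz(800))  # 200.0
```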
Resets
An asynchronous reset input is provided. This active-High reset must assert for a minimum
of 20 cycles of the controller clock.
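The 20-cycle minimum translates into a time that depends on the controller clock. A minimal sketch, assuming a hypothetical helper min_reset_ns and an illustrative 200 MHz controller clock:

```python
def min_reset_ns(controller_clock_mhz, min_cycles=20):
    """Return the minimum reset assertion time in nanoseconds."""
    if controller_clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    period_ns = 1000.0 / controller_clock_mhz
    return min_cycles * period_ns


# Illustrative value only: a 200 MHz controller clock has a 5 ns period.
print(min_reset_ns(200))  # 100.0
```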
Note: Pin 12 is not part of a pin pair and must not be used for differential clocks.
d. The qvld signals must be placed on pins N2 to N12, but dq and dm allocation takes
priority. The qvld0 signal must be placed on pin N11 (if available, that is, when
dm is disabled) or pin N12 of the qk0/qk0_n data byte lane, and the qvld1 signal
must be placed on pin N11 (if available) or pin N12 of the qk2/qk2_n data byte
lane.
2. Byte lanes are configured as either data or address/control.
a. Pin N12 can be used for address/control in a data byte lane.
b. No data signals (qvld, dq, dm) can be placed in an address/control byte lane.
3. Address/control can be on any of the 13 pins in the address/control byte lanes. Address/
control must be contained within the same bank. Address/control must be in the
centermost bank.
4. There is one vr pin per bank and DCI is required. DCI cascade is not permitted. All rules
for the DCI in the UltraScale™ Architecture-Based FPGAs SelectIO™ Resources User Guide
(UG571) [Ref 3] must be followed.
5. ck must be on the PN pair in the Address/Control byte lane.
6. reset_n can be on any pin as long as FPGA logic timing is met and I/O standard can be
accommodated for the chosen bank (LVCMOS12).
7. Banks can be shared between two controllers.
a. Each byte lane is dedicated to a specific controller (except for reset_n).
b. Byte lanes from one controller cannot be placed inside the other. For example, with
controllers A and B, “AABB” is allowed, while “ABAB” is not.
8. All I/O banks used by the memory interface must be in the same column.
9. Maximum height of interface is three contiguous banks for 72-bit wide interface.
10. Bank skipping is not allowed.
11. The input clock for the master PLL in the interface must come from a clock-capable
pair in the I/O column used for the memory interface.
12. There are dedicated VREF pins (not included in the rules above). If an external VREF is not
used, the VREF pins should be pulled to ground by a 500Ω resistor. For more information,
see the UltraScale™ Architecture-Based FPGAs SelectIO™ Resources User Guide (UG571)
[Ref 3]. These pins must be connected appropriately for the standard in use.
13. The interface must be contained within the same I/O bank type (High Range or High
Performance). Mixing bank types is not permitted with the exceptions of the reset_n
in step 6 above and the input clock mentioned in step 11 above.
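The bank-sharing rule in item 7b ("AABB" allowed, "ABAB" not) amounts to requiring that each controller's byte lanes form one contiguous run. The check below is a sketch under that reading. The function name lanes_are_contiguous is hypothetical, and the input is assumed to list only the byte lanes in use, in physical order.

```python
def lanes_are_contiguous(assignment):
    """Check that each controller's byte lanes form a single run.

    `assignment` lists the owning controller of each byte lane in physical
    order, for example "AABB". A controller that reappears after another
    controller's lanes violates the rule.
    """
    seen = set()
    previous = None
    for owner in assignment:
        if owner != previous:
            if owner in seen:
                return False  # this controller's lanes were interrupted
            seen.add(owner)
            previous = owner
    return True


print(lanes_are_contiguous("AABB"))  # True
print(lanes_are_contiguous("ABAB"))  # False
```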
Bank  Signal Name  Byte Group  I/O Type  Special Designation
1 qvld0 T3U_12 – –
1 dq8 T3U_11 N –
1 dq7 T3U_10 P –
1 dq6 T3U_9 N –
1 dq5 T3U_8 P –
1 dq4 T3U_7 N DBC-N
1 dq3 T3U_6 P DBC-P
1 dq2 T3L_5 N –
1 dq1 T3L_4 P –
1 dq0 T3L_3 N –
1 dm0 T3L_2 P –
1 qk0_n T3L_1 N DBC-N
1 qk0_p T3L_0 P DBC-P
1 reset_n T2U_12 – –
1 we_n T2U_11 N –
1 a18 T2U_10 P –
1 a17 T2U_9 N –
1 a14 T2U_8 P –
1 a13 T2U_7 N QBC-N
1 a10 T2U_6 P QBC-P
1 a9 T2L_5 N –
1 a8 T2L_4 P –
1 a5 T2L_3 N –
1 a4 T2L_2 P –
1 a3 T2L_1 N QBC-N
1 a0 T2L_0 P QBC-P
1 – T1U_12 – –
1 ba3 T1U_11 N –
1 ba2 T1U_10 P –
1 ba1 T1U_9 N –
1 ba0 T1U_8 P –
1 dk1_n T1U_7 N QBC-N
1 dk1_p T1U_6 P QBC-P
1 dk0_n T1L_5 N –
1 dk0_p T1L_4 P –
1 ck_n T1L_3 N –
1 ck_p T1L_2 P –
1 ref_n T1L_1 N QBC-N
1 cs_n T1L_0 P QBC-P
1 vr T0U_12 – –
1 dq17 T0U_11 N –
1 dq16 T0U_10 P –
1 dq15 T0U_9 N –
1 dq14 T0U_8 P –
1 dq13 T0U_7 N DBC-N
1 dq12 T0U_6 P DBC-P
1 dq11 T0L_5 N –
1 dq10 T0L_4 P –
1 dq9 T0L_3 N –
1 dm1 T0L_2 P –
1 qk1_n T0L_1 N DBC-N
1 qk1_p T0L_0 P DBC-P
Protocol Description
This core has the following interfaces:
• Memory Interface
• User Interface
• Physical Interface
Memory Interface
The RLDRAM 3 memory interface solution is customizable to support several
configurations. The specific configuration is defined by Verilog parameters in the top-level
of the core.
User Interface
The user interface connects an FPGA user design to the RLDRAM 3 memory solutions
core to simplify interactions between the user design and the external memory device.
Figure 13-1 shows the user_cmd signal and how it is made up of multiple commands
depending on the configuration.
[Figure 13-1: Multiple Commands for user_cmd Signal. The diagram shows the FPGA logic clock, the RLDRAM 3 burst length, and the 1st through 4th command slots that make up user_cmd.]
The user interface protocol for the RLDRAM 3 four-word burst architecture is shown in
Figure 13-2.
[Figure 13-2: User interface protocol timing for the four-word burst architecture. Recoverable signals: CLK, user_cmd_en, user_wr_en, user_wr_dm, user_afifo_full, user_wdfifo_full. Command and data buses carry grouped entries of NOP, rise, and fall words.]
Before any requests can be accepted, the ui_clk_sync_rst signal must be deasserted
Low. After the ui_clk_sync_rst signal is deasserted, the user interface FIFOs can accept
commands and data for storage. The init_calib_complete signal is asserted after the
memory initialization procedure and PHY calibration are complete, and the core can begin
to service client requests.
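The conditions for presenting a request can be collected into one predicate. This is a behavioral sketch rather than a description of the core: the UiStatus class and can_issue function are hypothetical, and treating user_afifo_full and user_wdfifo_full (the flags named in the timing diagrams) as "do not issue" conditions is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UiStatus:
    """Snapshot of user-interface status signals for one clock cycle."""
    ui_clk_sync_rst: bool      # True while reset is asserted
    init_calib_complete: bool  # True once initialization and calibration finish
    user_afifo_full: bool      # address/command FIFO full (assumed meaning)
    user_wdfifo_full: bool     # write-data FIFO full (assumed meaning)


def can_issue(status, is_write):
    """Return True if user logic may present a command this cycle."""
    if status.ui_clk_sync_rst:
        return False  # nothing is accepted while reset is asserted
    if status.user_afifo_full:
        return False  # no room for another command
    if is_write and status.user_wdfifo_full:
        return False  # no room for the write data
    # Commands can be stored before calibration completes; they are only
    # serviced once init_calib_complete is asserted.
    return True


print(can_issue(UiStatus(False, False, False, False), is_write=True))  # True
print(can_issue(UiStatus(False, True, False, True), is_write=True))    # False
```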
[Timing diagram. Recoverable signals: CLK, user_cmd_en, user_addr, user_wr_en, user_afifo_full, user_wdfifo_full.]
After a read command is issued, the data returns some time later (based on the configuration
and latency of the system). The user_rd_valid[0] signal is asserted to indicate that its
portion of user_rd_data is valid, and user_rd_valid[1] is asserted to indicate that its
portion of user_rd_data is valid, as shown in Figure 13-4. Sample the read data on the
same cycle that user_rd_valid[0] and user_rd_valid[1] are asserted, because the core
does not buffer returning data. You can add this buffering in your own logic, if desired.
The Memory Controller only puts commands on certain slots to the PHY such that the
user_rd_valid signals are all asserted together and return the full width of data, but the
extra user_rd_valid signals are provided in case of controller modifications.
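Because returning data is not buffered, user logic has to capture it in the cycle the valid signals are asserted. The sketch below models that behavior cycle by cycle. The function name capture_read_data and the tuple layout are assumptions for illustration, and it relies on the unmodified controller behavior in which both valid signals assert together.

```python
def capture_read_data(cycles):
    """Collect read data on cycles where both valid signals are asserted.

    `cycles` is an iterable of (user_rd_valid0, user_rd_valid1, user_rd_data)
    tuples, one per clock cycle. Data seen on other cycles is ignored, and
    data not captured in its valid cycle is lost.
    """
    captured = []
    for valid0, valid1, data in cycles:
        if valid0 and valid1:
            captured.append(data)
    return captured


trace = [
    (False, False, 0x0000),
    (True, True, 0xCAFE),
    (True, True, 0xBEEF),
    (False, False, 0x0000),
]
print([hex(word) for word in capture_read_data(trace)])  # ['0xcafe', '0xbeef']
```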
[Figure 13-4: Read data timing. Signals: CLK, user_rd_valid[0], user_rd_valid[1], user_rd_data. Each user_rd_data entry is a group of four fall/rise words; words without valid data are marked DNC.]
Physical Interface
The physical interface is the connection from the FPGA memory interface solution to an
external RLDRAM 3 device. The I/O signals for this interface are defined in Table 13-3. These
signals can be directly connected to the corresponding signals on the RLDRAM 3 device.
If you are customizing and generating the core in the Vivado IP integrator, see the Vivado
Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994) [Ref 4] for
detailed information. IP integrator might auto-compute certain configuration values when
validating or generating the design. To check whether the values do change, see the
description of the parameter in this chapter. To view the parameter value you can run the
validate_bd_design command in the Tcl Console.
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) [Ref 5] and
the Vivado Design Suite User Guide: Getting Started (UG910) [Ref 6].
Limitations
For details on limitations, see Limitations, page 79.
Output Generation
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) [Ref 5].
Simulation
For comprehensive information about Vivado® simulation components, as well as
information about using supported third party tools, see the Vivado Design Suite User
Guide: Logic Simulation (UG900) [Ref 7].
Example Design
There is no example design for this IP core release.
Test Bench
There is no test bench for this IP core release.
Debugging
This appendix includes details about resources available on the Xilinx Support website and
debugging tools.
TIP: If the IP generation halts with an error, there might be a license issue. See License Checkers in
Chapter 1 for more details.
Documentation
This product guide is the main document associated with the MIS. This guide, along with
documentation related to all products that aid in the design process, can be found on the
Xilinx Support web page (www.xilinx.com/support) or by using the Xilinx Documentation
Navigator.
Download the Xilinx Documentation Navigator from the Design Tools tab on the Downloads
page (www.xilinx.com/download). For more information about this tool and the features
available, open the online help after installation.
Solution Centers
See the Xilinx Solution Centers for support on devices, software tools, and intellectual
property at all stages of the design cycle. Topics include design assistance, advisories, and
troubleshooting tips.
The Solution Center specific to the MIS core is located at Xilinx MIG Solution Center.
Answer Records
Answer Records include information about commonly encountered problems, helpful
information on how to resolve these problems, and any known issues with a Xilinx product.
Answer Records are created and maintained daily ensuring that users have access to the
most accurate information available.
Answer Records for this core can also be located by using the Search Support box on the
main Xilinx support web page. To maximize your search results, use proper keywords such
as:
• Product name
• Tool message(s)
• Summary of the issue encountered
A filter search is available after results are returned to further target the results.
AR: 58435
1. Navigate to www.xilinx.com/support.
2. Open a WebCase by selecting the WebCase link located under Additional Resources.
Debug Tools
There are many tools available to address MIS design issues. It is important to know which
tools are useful for debugging various situations.
The Vivado lab tools logic analyzer is used to interact with the logic debug LogiCORE IP
cores.
See Vivado Design Suite User Guide: Programming and Debugging (UG908) [Ref 9].
Additional Resources
Xilinx Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see the
Xilinx Support website at:
www.xilinx.com/support.
References
These documents provide supplemental material useful with this product guide:
1. JESD79-3F, DDR3 SDRAM Standard and JESD79-4, DDR4 SDRAM Standard, JEDEC® Solid
State Technology Association
2. Kintex® UltraScale™ Architecture Data Sheet: DC and AC Switching Characteristics
(DS892)
3. UltraScale Architecture SelectIO™ Resources User Guide (UG571)
4. Vivado® Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
5. Vivado Design Suite User Guide: Designing with IP (UG896)
6. Vivado Design Suite User Guide: Getting Started (UG910)
7. Vivado Design Suite User Guide: Logic Simulation (UG900)
8. Vivado Design Suite User Guide: Implementation (UG904)
9. Vivado Design Suite User Guide: Programming and Debugging (UG908)
10. ISE® to Vivado Design Suite Migration Guide (UG911)
Revision History
The following table shows the revision history for this document.
Notice of Disclaimer
The information disclosed to you hereunder (the “Materials”) is provided solely for the selection and use of Xilinx products. To the
maximum extent permitted by applicable law: (1) Materials are made available “AS IS” and with all faults, Xilinx hereby DISCLAIMS
ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2) Xilinx shall not be liable (whether
in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related
to, arising under, or in connection with, the Materials (including your use of the Materials), including for any direct, indirect,
special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage
suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or Xilinx had
been advised of the possibility of the same. Xilinx assumes no obligation to correct any errors contained in the Materials or to
notify you of updates to the Materials or to product specifications. You may not reproduce, modify, distribute, or publicly display
the Materials without prior written consent. Certain products are subject to the terms and conditions of the Limited Warranties
which can be viewed at http://www.xilinx.com/warranty.htm; IP cores may be subject to warranty and support terms contained in
a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safe or for use in any application requiring
fail-safe performance; you assume sole risk and liability for use of Xilinx products in Critical Applications:
http://www.xilinx.com/warranty.htm#critapps.
© Copyright 2013 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, UltraScale, Virtex, Vivado, Zynq, and other
designated brands included herein are trademarks of Xilinx in the United States and other countries. All other trademarks are the
property of their respective owners.