
Practical introduction to PCI Express with FPGAs

Michal HUSEJKO, John EVANS


[email protected]
IT-PES-ES

v 1.0
Agenda
• What is PCIe ?
o System Level View
o PCIe data transfer protocol

• PCIe system architecture


• PCIe with FPGAs
o Hard IP with Altera/Xilinx FPGAs
o Soft IP (PLDA)
o External PCIe PHY (Gennum)

v 1.0
System Level View
• Interconnection
• Top-down tree hierarchy
• PCI/PCIe configuration space
• Protocol

v 1.0
Interconnection
• Serial interconnection
• Dual uni-directional signal pairs (separate TX and RX per lane)
• Lane, Link, Port
• Scalable
o Gen1 2.5 / Gen2 5.0 / Gen3 8.0 GT/s
o Number of lanes in FPGAs: x1, x2, x4, x8

• Gen1/2: 8b/10b encoding
• Gen3: 128b/130b encoding

v 1.0 Image taken from “Introduction to PCI Express”
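The raw transfer rates above translate into usable bandwidth only after line encoding is taken into account. As a rough illustration (protocol overhead such as TLP headers, DLLPs and flow control is ignored, so real throughput is lower), a short C sketch of the per-lane and per-link numbers:

```c
#include <stdio.h>

/* Rough per-lane and per-link throughput estimate (illustrative only):
 * Gen1/Gen2 use 8b/10b encoding (80% efficiency),
 * Gen3 uses 128b/130b (~98.5% efficiency).
 * TLP/DLLP and flow-control overhead is ignored here. */
int main(void) {
    struct { const char *gen; double gts; double eff; } g[] = {
        {"Gen1", 2.5, 8.0 / 10.0},
        {"Gen2", 5.0, 8.0 / 10.0},
        {"Gen3", 8.0, 128.0 / 130.0},
    };
    int lanes = 4; /* e.g. an x4 link */
    for (int i = 0; i < 3; i++) {
        double mb_s_lane = g[i].gts * 1e9 * g[i].eff / 8.0 / 1e6; /* MB/s per lane */
        printf("%s x%d: %.0f MB/s per lane, %.0f MB/s per link\n",
               g[i].gen, lanes, mb_s_lane, mb_s_lane * lanes);
    }
    return 0;
}
```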


Tree hierarchy
• Top-down tree hierarchy with single host
• 3 types of devices: Root Complex, Endpoint, Switch
• Point-to-point connection between devices without sideband signalling
• 2 types of ports: downstream/upstream
• Configuration space

Image taken from “Introduction to PCI Express”

v 1.0
PCIe Configuration space
• Similar to PCI config space – binary compatible for the first 256 bytes
• Defines device (system) capabilities
• Uniquely identifies the device in the system
o Device ID
o Vendor ID
o Function ID
o The combination of all of the above

• Defines the memory space allocated to the device (Base Address Registers)
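As a small illustration of how software uses this information, the sketch below reads the Vendor ID, Device ID and first BAR of a function at a given bus/device/function address. The cfg_read32() helper is hypothetical; on Linux the same registers are visible through /sys/bus/pci/.../config or lspci.

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal sketch, assuming a hypothetical cfg_read32() helper that performs
 * a Configuration Read for a given bus/device/function (BDF) and register
 * offset within the device's configuration space. */
extern uint32_t cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t offset);

void identify_device(uint8_t bus, uint8_t dev, uint8_t fn) {
    uint32_t dw0  = cfg_read32(bus, dev, fn, 0x00);  /* Device ID / Vendor ID */
    uint16_t vendor_id = dw0 & 0xFFFF;
    uint16_t device_id = dw0 >> 16;
    uint32_t bar0 = cfg_read32(bus, dev, fn, 0x10);  /* first Base Address Register */

    printf("%02x:%02x.%x  vendor=0x%04x device=0x%04x BAR0=0x%08x\n",
           bus, dev, fn, vendor_id, device_id, bar0);
}
```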

v 1.0
PCIe transfer protocol
• Transaction categories
• Protocol
• Implementation of the protocol

v 1.0
Transaction categories
• Configuration – moves downstream (initiated by the Root Complex)
• Memory – address based routing
• IO – address based routing
• Message – ID based routing

v 1.0
Transaction Types

v 1.0 Table taken from “PCI Express System Architecture”


Non-posted read transactions

v 1.0 Image taken from “PCI Express System Architecture”


Non-posted write transactions

v 1.0 Image taken from “PCI Express System Architecture”


Posted Memory Write transactions

v 1.0 Image taken from “PCI Express System Architecture”


Posted Message transactions

v 1.0 Image taken from “PCI Express System Architecture”


PCIe Device Layers
• 3-layer protocol (Transaction, Data Link, Physical)
• Each layer split into TX and RX parts
• Ensures reliable data transmission between devices

v 1.0 Image taken from “PCI Express System Architecture”


Physical Layer
• Contains all the necessary digital and analog circuits
• Link initialization and training
o Link width
o Link data rate
o Lane reversal
o Polarity inversion
o Bit lock per lane
o Symbol lock per lane
o Lane-to-lane deskew

v 1.0
Data Link layer
• Reliable transport of TLPs from one device to another across the link
• Implemented with Data Link Layer Packets (DLLPs):
o TLP acknowledgement (Ack/Nak)
o Flow control
o Power management
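To make the TLP-acknowledgement mechanism more concrete, here is a deliberately simplified C sketch of the replay-buffer behaviour (the buffer depth and data sizes are assumptions, not values from any vendor IP): transmitted TLPs stay buffered until an Ack DLLP retires their sequence numbers, and a Nak DLLP triggers retransmission.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified sketch of the Data Link Layer Ack/Nak retry mechanism.
 * Transmitted TLPs are kept in a replay buffer with 12-bit sequence
 * numbers until the link partner acknowledges them. */
#define REPLAY_DEPTH 32

typedef struct { uint16_t seq; uint8_t data[256]; uint16_t len; bool valid; } tlp_t;

static tlp_t replay_buf[REPLAY_DEPTH];

/* Ack DLLP received: retire every buffered TLP with seq <= acked_seq
 * (modulo-4096 comparison on the 12-bit sequence number). */
void on_ack(uint16_t acked_seq) {
    for (int i = 0; i < REPLAY_DEPTH; i++)
        if (replay_buf[i].valid &&
            (uint16_t)((acked_seq - replay_buf[i].seq) & 0xFFF) < 0x800)
            replay_buf[i].valid = false;
}

/* Nak DLLP received: retransmit everything still outstanding
 * (a real design replays strictly in sequence-number order). */
void on_nak(void (*retransmit)(const tlp_t *)) {
    for (int i = 0; i < REPLAY_DEPTH; i++)
        if (replay_buf[i].valid)
            retransmit(&replay_buf[i]);
}
```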

v 1.0
Transaction layer
• Turns user application data or completion data into a PCIe transaction – a TLP
• Header + Payload + optional ECRC
• This is the layer exposed by the FPGA PCIe IP cores: user logic sends and receives TLPs

v 1.0 Image taken from “PCI Express System Architecture”
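A rough conceptual model of what the transaction layer assembles and hands to the data link layer (names and layout are illustrative only, not the interface of any FPGA IP core):

```c
#include <stdint.h>

/* Illustrative model of a TLP: a 3DW or 4DW header, an optional data
 * payload, and an optional end-to-end CRC (present when the TD bit is set).
 * This is a conceptual sketch, not a vendor data structure. */
typedef struct {
    uint32_t header[4];      /* 3 or 4 DW, depending on Fmt              */
    uint8_t  header_dw;      /* 3 or 4                                   */
    uint32_t payload[1024];  /* up to 1024 DW (4 kB) of data             */
    uint16_t payload_dw;     /* 0 if the TLP carries no data             */
    uint32_t ecrc;           /* optional ECRC                            */
    uint8_t  has_ecrc;
} tlp_t;

/* Total size of the TLP in DW, as it will appear on the link. */
uint32_t tlp_size_dw(const tlp_t *t) {
    return t->header_dw + t->payload_dw + (t->has_ecrc ? 1u : 0u);
}
```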


Flow control

v 1.0 Image taken from “PCI Express System Architecture”
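PCIe flow control is credit based: the receiver advertises header and data credits per virtual channel and credit type, and the transmitter only sends a TLP when enough credits are available. A minimal sketch under those assumptions (one credit type, data credits counted in 16-byte / 4 DW units, names illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal sketch of credit-based flow control for one credit type
 * (e.g. posted requests). Data credits are counted in 16-byte units. */
typedef struct {
    uint32_t hdr_credits;   /* header credits currently available        */
    uint32_t data_credits;  /* data credits (16-byte units) available    */
} fc_state_t;

/* May this TLP be transmitted now? */
bool fc_can_send(const fc_state_t *fc, uint32_t payload_bytes) {
    uint32_t data_needed = (payload_bytes + 15) / 16;  /* round up */
    return fc->hdr_credits >= 1 && fc->data_credits >= data_needed;
}

/* Consume credits when the TLP is actually sent. */
void fc_consume(fc_state_t *fc, uint32_t payload_bytes) {
    fc->hdr_credits  -= 1;
    fc->data_credits -= (payload_bytes + 15) / 16;
}

/* Credits returned by the receiver in an UpdateFC DLLP. */
void fc_update(fc_state_t *fc, uint32_t hdr, uint32_t data) {
    fc->hdr_credits  += hdr;
    fc->data_credits += data;
}
```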


Flow control – posted transaction

v 1.0 Image taken from “PCI Express System Architecture”


Flow control – non-posted transaction

v 1.0 Image taken from “PCI Express System Architecture”


Building transaction

v 1.0 Image taken from “PCI Express System Architecture”


v 1.0 Image taken from “PCI Express System Architecture”
Example

v 1.0
CPU MRd targeting an Endpoint

v 1.0 Image taken from “PCI Express System Architecture”


CPU MWr targeting Endpoint

• Transaction routing

v 1.0 Image taken from “PCI Express System Architecture”


Endpoint MRd targeting system memory

• Transaction routing

v 1.0 Image taken from “PCI Express System Architecture”


Packet constraints
• Maximum Payload Size (MPS)
o Default 128 bytes
o Lowest common denominator of all devices in the tree

• Maximum Read Request Size (MRRS)
o Defined by the Root Complex

• Maximum payload / read request size is 4 kB
o Defined by the specification
o No 4 kB address boundary crossing allowed

• Example: Intel x58 : MPS=256B, MRRS=512B
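These limits shape how a DMA engine in the FPGA must chop up a large transfer. A hedged sketch (the 256-byte MPS matches the Intel X58 example above; function names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: split a DMA write into Memory Write TLPs that
 * respect the negotiated Maximum Payload Size and never cross a
 * 4 kB address boundary. MPS = 256 B is just for demonstration. */
#define MPS 256u

static void send_mwr(uint64_t addr, uint32_t len) {
    printf("MWr addr=0x%010llx len=%u\n", (unsigned long long)addr, len);
}

void dma_write(uint64_t addr, uint32_t len) {
    while (len) {
        uint32_t chunk = MPS;
        uint32_t to_4k = 4096u - (uint32_t)(addr & 0xFFFu); /* bytes to next 4 kB boundary */
        if (chunk > to_4k) chunk = to_4k;
        if (chunk > len)   chunk = len;
        send_mwr(addr, chunk);
        addr += chunk;
        len  -= chunk;
    }
}
```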

v 1.0
HEADER description
• Little endian
• 3DW or 4DW (DW = Double Word = 4 bytes)

v 1.0 Image taken from “Introduction to PCI Express”


HEADER – base part
• Fmt – header size (3DW or 4DW), with or without payload
• Length – payload length in DW
• EP – poisoned TLP
• TC – Traffic class
• TD – TLP digest present (ECRC appended)
• Attr – attributes (relaxed ordering, no snoop)

v 1.0 Image taken from “Introduction to PCI Express”
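The sketch below packs these fields into the first header DW. Bit positions follow the common 3DW/4DW header layout with the original 2-bit Fmt encoding; treat it as an illustration rather than a drop-in header generator.

```c
#include <stdint.h>

/* Illustrative packing of the first DW of a TLP header
 * (Fmt, Type, TC, TD, EP, Attr, Length). */
typedef struct {
    uint8_t  fmt;     /* 0: 3DW no data, 1: 4DW no data, 2: 3DW w/ data, 3: 4DW w/ data */
    uint8_t  type;    /* e.g. 0x00 for a Memory Request */
    uint8_t  tc;      /* traffic class, 0-7 */
    uint8_t  td;      /* 1 if an ECRC digest is appended */
    uint8_t  ep;      /* 1 if the TLP is poisoned */
    uint8_t  attr;    /* bit1: relaxed ordering, bit0: no snoop */
    uint16_t length;  /* payload length in DW (0 encodes 1024) */
} tlp_dw0_t;

uint32_t pack_tlp_dw0(const tlp_dw0_t *h) {
    return ((uint32_t)(h->fmt    & 0x3)   << 29) |
           ((uint32_t)(h->type   & 0x1F)  << 24) |
           ((uint32_t)(h->tc     & 0x7)   << 20) |
           ((uint32_t)(h->td     & 0x1)   << 15) |
           ((uint32_t)(h->ep     & 0x1)   << 14) |
           ((uint32_t)(h->attr   & 0x3)   << 12) |
           ((uint32_t)(h->length & 0x3FF) <<  0);
}
```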


HEADER Memory Request
• Tag – identifies each outstanding non-posted request
• Requester ID – bus/device/function of the requester

v 1.0 Image taken from “Introduction to PCI Express”


HEADER Completion
• Tag – copied from the request; matches the completion to its outstanding request
• Requester ID – used to route the completion back to the requester

v 1.0 Image taken from “Introduction to PCI Express”
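A small sketch of how a requester typically uses the Tag: allocate a unique tag per outstanding read, then match the returning completion by that tag. The 32-entry table assumes the default 5-bit tag space, and retiring the tag on the first CplD is a simplification (real designs must handle completions split across several CplDs).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative tag bookkeeping for outstanding non-posted requests. */
#define MAX_TAGS 32

typedef struct { bool in_flight; uint64_t addr; uint32_t len_dw; } request_t;
static request_t outstanding[MAX_TAGS];

/* Allocate a free tag for a new read request; returns -1 if none is free. */
int alloc_tag(uint64_t addr, uint32_t len_dw) {
    for (int tag = 0; tag < MAX_TAGS; tag++) {
        if (!outstanding[tag].in_flight) {
            outstanding[tag] = (request_t){ true, addr, len_dw };
            return tag;
        }
    }
    return -1; /* all tags in use: the requester must wait */
}

/* A CplD arrived: look up the request it belongs to and retire the tag. */
void on_completion(uint8_t tag) {
    if (tag < MAX_TAGS && outstanding[tag].in_flight)
        outstanding[tag].in_flight = false;
}
```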


PCIe System Architecture
• Switches
o Extend interconnection possibilities
o DMA
o Performance improvement functions
o Non Transparent Bridging

• Extending distance
o Bus re-drivers
o Copper and optical cables

v 1.0
PCIe switches
• Non Transparent Bridging (NTB)
• Virtual Partitioning
• Multicasting
• DMA
• Failover

v 1.0 Image taken from IDT documentation


NTB + Virtual Partitioning

v 1.0
Cabling
• Copper cables
• Optical cables
• Cable re-drivers (repeaters)

http://www.alpenio.com/products/pciex4.html
Image taken from www.ioxos.ch

v 1.0
www.idt.com
PCIe with FPGAs
• Technology overview:
o Hard IP – Altera and Xilinx
o Soft IP – PLDA
o External PHY – Gennum PCIe to local bus bridge
• Vendor documents – app notes, ref designs,
Linux/Win device drivers
• Simulation – Endpoint/Root port

v 1.0
Xilinx Hard IP solution
• User backend protocol same for all devices
o Spartan-6
o Virtex-5
o Virtex-6
o Virtex-7

• Xilinx Local Link (LL) Protocol and ARM AXI


• For new designs: use AXI
• Most of the Xilinx PCIe app notes use LL

v 1.0
Xilinx Hard IP interface
• External world: gt, clk, rst – (example: x1 needs 7 wires)
• CLK/RST/Monitoring
• TLP TX if
• TLP RX if
• CFG if
• MSG/INT if

v 1.0
PCIe LL protocol
• TLP packets are mapped onto 32/64/128-bit TRN buses

v 1.0
v 1.0
Xilinx simulation: RP <-> EP
• Gen1, x8, Scrambling disabled in CORE Gen

v 1.0
How to design with the Xilinx PCIe Hard IP
• Application notes
• Reference designs
• CORE Gen Programmable IO (PIO) hardware/simulation examples

v 1.0
XAPP 1052
• Block DMA in Streaming mode
• No CplD transaction re-ordering

v 1.0
XAPP 1052
• GUI for Windows (Visual Basic)
• GUI for Linux (Glade)
• Drivers for Windows/Linux

v 1.0
v 1.0
XAPP1052

v 1.0
XAPP1052 – performance
• Intel Nehalem 5540 platform
• Fedora 14, 2.6.35 PAE kernel
• Gen1, x4, PCIe LeCroy analyser
• DMA config
o Host configures (MWr) DMA engine – around 370 ns between 1DW writes
o Host checks DMA status: MRd (1DW) to CplD (1DW) response time – around 40 ns

• DMA operation:
o DMA MRd(1st) -> CplD response time around 2.76 µs
o DMA MRd(8th) -> CplD response time around 3.82 µs
o DMA MWr -> around 750-800 MB/s (Gen1, x4)
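For context, a rough back-of-the-envelope check of that MWr figure against the raw capacity of a Gen1 x4 link (TLP/DLLP overhead ignored):

```c
#include <stdio.h>

/* Quick sanity check of the XAPP1052 MWr number: a Gen1 x4 link carries
 * 2.5 GT/s * 4 lanes * 8/10 (8b/10b) = 8 Gb/s = 1000 MB/s of raw
 * bandwidth in each direction, before TLP/DLLP header overhead. */
int main(void) {
    double raw_mb_s = 2.5e9 * 4 * 0.8 / 8.0 / 1e6;  /* = 1000 MB/s */
    double measured = 775.0;                         /* midpoint of 750-800 MB/s */
    printf("theoretical: %.0f MB/s, measured: %.0f MB/s, efficiency: %.0f%%\n",
           raw_mb_s, measured, 100.0 * measured / raw_mb_s);
    return 0;
}
```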

v 1.0
XAPP 859
• Block DMA: Host <-> DDR2
• Jungo Win device driver
• C# GUI

v 1.0
XAPP 859

v 1.0
Xilinx V6 Connectivity Kit
• PCIe to XAUI
• PCIe to parallel loopback
• VirtualFIFO based on DDR3 (MIG, SODIMM)
• Northwest Logic User Backend IP – Packet (SG) DMA

v 1.0
v 1.0
Xilinx S6 Connectivity Kit
• PCIe to 1 Gb Eth
• PCIe to parallel loopback
• VirtualFIFO based on DDR3 (MIG, Component)
• Northwest Logic User Backend – Packet (SG) DMA

v 1.0
TODO – put picture.

v 1.0
Altera Hard IP solution
• Target devices:
o Cyclone IV GX
o Arria GX / Arria II GX
o Stratix II GX / Stratix IV GX

• Similar to Xilinx in terms of user interface – TLP over Avalon-ST, or user application with Avalon-MM
o ST – streaming mode, for high-performance designs
o MM – memory mapped, for SOPC Builder, lower performance

• CvP (Configuration via PCIe) – FPGA reconfiguration over PCIe

o I/O and the PCIe block are programmed first, faster than the rest of the core

v 1.0
Altera MegaCore Reference Designs
• Endpoint Reference Design
o PCIe High Performance Reference Design (AN456) – Chained DMA, uses internal RAM, binary win driver
o PCIe to External Memory Reference Design (AN431) – Chained DMA, uses DDR2/DDR3, binary win driver

• Root Port Reference Design


• SOPC PIO

• Chained DMA documentation
o Linux device driver also available

• BFM documentation
o Extensive simulation with Bus Functional Models

v 1.0
v 1.0
SOPC Based Design
• SOPC Builder Based
• Gen 1, x4
• DMA
• Sim and HW

v 1.0
AN431 – PCIe to DDR3

v 1.0
PLDA PCIe IPs
• XpressLite
o currently available at CERN
o Soft IP, Gen1 Endpoint only, x1/x2/x4
o Stratix GX, Stratix II GX, and Arria GX support
o No S4GX, C4GX and A2GX Hard IP support

• EZDMA2 Altera/Xilinx
o Supports the Hard IP inside Altera devices: Cyclone IV GX, Arria II GX, and Stratix IV GX
o Hard IP inside Xilinx: Virtex-5/6, Spartan-6
o Same user/DMA interface as XpressLite

• XpressRich – rich version


o Are you rich ?

• Northwest Logic ?

v 1.0
PLDA XpressLite
• Stratix GX, Stratix II GX, and Arria GX support only
o No S4GX, C4GX and A2GX Hard IP support

• Generated with JAVA GUI: Windows/Linux


• Synthesis: single VHDL/Verilog encrypted file
• ModelSim: pre-compiled lib (Win/Linux)
• Ncsim: protected lib (Linux)
• Testbench: RP emulation

• Device drivers, API, tools (C++ source available)

v 1.0
PLDA XpressLite
• Maximum 8 DMA channels with Scatter Gather
• Reference design:
o PCIe Lite – Endpoint only
o Single DMA engine – C2S(WR) + S2C(RD)
o Single target module – accepts WR/RD into SRAM/registers

v 1.0
External PCIe chips – Gennum
• TLP interface with simple framing signalling
• FPGA serial programming
o FPGA can be reprogrammed without affecting the PCIe link

• GPIO interface/Interrupts
• IP (with DMA) provided for Altera and Xilinx
• Device drivers and software development kit provided
• Already used at CERN:
o Open source IP for Xilinx device developed by CERN group
o Wishbone
o SG DMA
o device driver
o More info www.ohwr.org

v 1.0
Gennum PHY + Spartan6
• http://www.ohwr.org/projects/spec/wiki
• Open source IP, SG DMA, device driver

v 1.0
More information
• Books:
o Introduction to PCI Express – CERN Library (hardcopy)
o PCI Express standards – CERN Library – CDS.CERN.CH
o PCI Express System Architecture – mindshare.com (ebook+ hardcopy)

v 1.0
[email protected]
• PCIe demos available on request
• IDT PCIe Switch dev. kit. coming soon
• Evaluating EZDMA2 for Xilinx.

v 1.0
Extras

v 1.0
XAPP1052 DMA Config WR
• Host configures (MWr) DMA engine – around 370 ns between 1DW writes

v 1.0
XAPP 1052 DMA Config RD
• MRd (1DW) to CplD (1DW) – around 40 ns

v 1.0
MRd to System Memory
• Intel Nehalem 5540 platform
• MRd(1st) -> CplD response time around 2.76 µs
• MRd(8th) -> CplD response time around 3.82 µs

v 1.0
v 1.0
XAPP 859 – Write

v 1.0
XAPP 859 – Read

v 1.0
Endpoint TB

v 1.0
Root Port TB

v 1.0
AN456 – Chained DMA

v 1.0
v 1.0
Endianness
• 0x12345678
• Big-endian stores the MSB at the lowest memory address; little-endian stores the LSB at the lowest memory address. The lowest memory address of multi-byte data is considered its starting address. For the 32-bit value 0x12345678, a big-endian machine stores the bytes as 12 34 56 78 and a little-endian machine as 78 56 34 12, lowest address first.
• http://en.wikipedia.org/wiki/Endianness
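A tiny C check of the host's byte order, printing the bytes of 0x12345678 from the lowest address upward:

```c
#include <stdio.h>
#include <stdint.h>

/* Print the bytes of 0x12345678 as they appear in memory, lowest address
 * first. On a little-endian host (e.g. an x86 machine talking to a PCIe
 * device) this prints 78 56 34 12; on a big-endian host, 12 34 56 78. */
int main(void) {
    uint32_t value = 0x12345678;
    const unsigned char *bytes = (const unsigned char *)&value;

    for (size_t i = 0; i < sizeof value; i++)
        printf("byte %zu: 0x%02x\n", i, bytes[i]);

    return 0;
}
```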

v 1.0
