Peripheral Component Interconnect (PCI)
Presented by
SHIJU N.P
Youzentech Technologies
Today's topics
• Introduction
• Differences between PCI and PCI-X
• Introduction to PCIe
• Overview of PCIe topology
* Root complex
* Switches
* Bridges
Introduction
Data rates for each version
PCI BUS
TRANSFER
• The round arrow symbol shown on the AD
bus indicates that the tri‐stated bus is
undergoing a “turn‐around cycle” as
ownership of the signals changes.
• The same AD signals are used for both address and data (the bus is multiplexed).
PCI Bus Arbitration
• Almost all PCI devices today are capable of acting as bus masters; they can perform both DMA and peer-to-peer transfers. In a shared-bus architecture like PCI they must take turns on the bus, so a device that wants to initiate transactions must first request ownership of the bus from the bus arbiter.
PCI Transaction
Models
1. Programmed I/O
2. DMA
3. Peer-to-peer
Retry protocol
Error
Handling
• No hardware mechanism to recover from errors
Signal Timing Problems
with the Parallel PCI Bus
Model beyond 66 MHz
Introducing PCI-X
• PCI-X is backward compatible with PCI in both hardware and software, but provides better performance and higher efficiency. It uses the same connectors and signals as conventional PCI, running them at higher clock rates.
Transaction
• The bus does not enter a wait state after the data phase, because the target device knows exactly what the initiator needs.
PCI-X Features
• Split-Transaction Model
• Transaction attributes
• Message Signaled Interrupts
Split
transaction
• To help keep track of what each device is
doing, the device initiating the read is now
called the Requester, and the device fulfilling
the read request is called the Completer
• Avoids the wait states and retry transactions used in PCI
• Better bus utilization is achieved
Transaction Attributes
• PCI‐X also added another phase to the beginning of each transaction called
the Attribute Phase
• In this time slot the requester delivers information that can be used to help
improve the efficiency of transactions on the bus, such as the byte count for
this request and who the requester is (Bus, Device, Function number).
• In addition to those items, two new bits were added to help characterize
this transaction: the "No Snoop" bit and the "Relaxed Ordering" bit.
Problems with the
Common Clock Approach
of PCI and PCI-X
• Signal skew
• Clock skew
• Relationship between clock period and flight time
PCI-X 2.0 Source-Synchronous Model
• The term “source synchronous” means that
the device transmitting the data also
provides another signal that travels the
same basic path as the data. That signal in
PCI‐X 2.0 is called a “strobe” and is used by
the receiver for latching the incoming data
bits.
• This avoids the flight-time and clock-skew limitations.
Introduction to PCIe
Terms and
Acronyms
For the Root Complex (RC):
1. Downstream port (egress port)
For Switch 1:
2. Upstream port (ingress port)
3. Downstream port (egress port)
Dual simplex link
• PCIe uses a bidirectional connection and is capable of
sending and receiving information at the same time
• The term for this path between the devices is a Link, and is
made up of one or more transmit and receive pairs. One
such pair is called a Lane, and the spec allows a Link to be
made up of 1, 2, 4, 8, 12, 16, or 32 Lanes.
• The number of lanes is called the Link Width and is
represented as x1, x2, x4, x8, x16, and x32
• Serial communication avoids the clock-skew and flight-time issues even at very small bit periods, because the clock is embedded in the data stream.
• Signal skew is also eliminated, because only one bit at a time is transferred per lane.
Differential Signals
• A PLL at the receiver recovers the clock used on the transmitting side, starting from an internally generated reference clock.
• The PLL continuously adjusts this clock using phase comparison.
• If no transactions are arriving, the recovered clock may drift to the wrong frequency, because the clock is only available while a data stream is present.
PCIe Topology
• The Link is a point-to-point connection rather than the shared bus used in PCI and PCI-X.
• Flexible topology options are achieved by using bridges and switches.
• Forward bridges and reverse bridges are the two types of bridges; with them, older cards can plug into a PCIe system and new cards can plug into PCI or PCI-X systems.
• Newer PCIe devices support MIMO.
Root complex
and switch
• This bus is internal to the Root Complex; its actual logical design does not have to conform to any standard and can be vendor specific.
• To software it looks just like PCI, so compatibility is maintained.
• Internal Bus 0 is what software sees inside the Root Complex.
• The process by which configuration software discovers the system topology and assigns bus numbers and system resources works the same way, too.
Low-cost
PCIe System
• A consumer desktop machine with an Intel processor.
• This part of the processor is often called the "Uncore".
• Here the memory controller and some routing logic have been integrated into the CPU package.
• The Root Complex must also be inside the CPU package.
Server PCIe System
Introduction to
Device Layers
Why do we
need layer
partitioning?
To Achieve
PCIe Layers
• The Device Core acts as the Endpoint, Switch, or Root Complex core, with different functionality in each case
• Transaction layer is responsible for Transaction
Layer Packet (TLP) creation on the transmit side
and TLP decoding on the receive side. Some other
duties of this layer are Quality of Service
functionality, Flow Control functionality and
Transaction Ordering functionality.
• The Data Link Layer is responsible for Data Link Layer
Packet (DLLP) creation on the transmit side and
decoding on the receive side. This layer is also
responsible for Link error detection and correction
• The Physical Layer is responsible for Ordered‐Set packet
creation on the transmit side and Ordered‐Set
packet decoding on the receive side. This layer
processes all three types of packets (TLPs, DLLPs
and Ordered‐Sets) to be transmitted on the Link
and processes all types of packets received from
the Link.
Switch Port
Layer
• Each port of the switch needs to implement the PCIe layer architecture, because the action the component must perform is determined by the packet contents
• The routing information is inside the packet, and routing is handled by the Transaction Layer
Detailed Block Diagram of PCI Express
Device’s Layers
Non-posted
read
• The requester includes its own ID (the Requester ID) in the request packet so that the completer knows where to return the completion
• The Tag is different for each outstanding request.
Locked
reads
• Only the processor is permitted to perform the locked read
• Avoids race conditions on the bus through the use of a semaphore
Flow Control
• The completer needs to continuously update its advertised buffer space
Ack/Nak Protocol
• Data Link Layer Replay
Mechanism
• The replay buffer retransmits the stored packets when an error is reported.
Configuration
Overview
• This includes the space in which a Function’s
configuration registers are implemented, how a Function
is discovered, how configuration transactions are
generated and routed, the difference between PCI‐
compatible configuration space and PCIe extended
configuration space, and how software differentiates
between an Endpoint and a Bridge
WHAT IS
BUS, DEVICE AND
FUNCTION
BDF
• Every PCIe Function is uniquely identified by the Device it resides within and the Bus to
which the Device connects. This unique identifier is commonly referred to as a ‘BDF’.
Configuration software is responsible for detecting every Bus, Device and Function (BDF)
within a given topology.
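To make the BDF concrete, here is a minimal sketch in C (with hypothetical helper names) of packing and unpacking the 16-bit Bus/Device/Function identifier: 8 bits of Bus, 5 bits of Device, 3 bits of Function.

#include <stdint.h>
#include <stdio.h>

/* Pack Bus (8 bits), Device (5 bits), Function (3 bits) into a 16-bit ID. */
static uint16_t bdf_pack(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1F) << 3) | (fn & 0x07));
}

/* Unpack the fields again. */
static void bdf_unpack(uint16_t bdf, uint8_t *bus, uint8_t *dev, uint8_t *fn)
{
    *bus = (uint8_t)(bdf >> 8);
    *dev = (uint8_t)((bdf >> 3) & 0x1F);
    *fn  = (uint8_t)(bdf & 0x07);
}

int main(void)
{
    uint16_t id = bdf_pack(0x02, 0x1F, 0x7);   /* e.g. Bus 2, Device 31, Function 7 */
    uint8_t bus, dev, fn;
    bdf_unpack(id, &bus, &dev, &fn);
    printf("BDF %04X -> %02X:%02X.%X\n", id, bus, dev, fn);
    return 0;
}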
The System
of BDF
Specialties of BDF
Configuration Address Space
• The configuration registers ship with the card itself, enabling plug and play
• Standardized configuration registers that permit generic shrink‐wrapped OSs to manage
virtually all system resources.
• PCI defines 256 bytes of configuration space per Function.
• When PCIe was introduced, there was not enough room in the original 256‐byte
configuration region to contain all the new capability structures needed. So the size of
configuration space was expanded from 256 bytes per function to 4KB, called the
Extended Configuration Space.
PCI configuration register space
4KB Configuration
Space per PCI Express
Function
• The extended space was only accessible
through the enhanced configuration
mechanism
• 4 KB for each function
Generating Configuration Transactions
• Processors are generally unable to perform configuration read and write requests directly
because they can only generate memory and IO requests. That means the Root Complex
will need to translate certain of those accesses into configuration requests in support of
this process. Configuration space can be accessed using either of two mechanisms
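As a rough illustration of the two mechanisms, the sketch below (C, hypothetical function names; the ECAM base address is assumed to be supplied by firmware) shows how the legacy CF8h/CFCh I/O-port address and the memory-mapped (enhanced/ECAM) address are formed for a given BDF and register offset.

#include <stdint.h>

/* Legacy mechanism: this value is written to I/O port 0xCF8; the data is then
   read or written at I/O port 0xCFC. Only the 256-byte space is reachable. */
static uint32_t cfg_addr_legacy(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    return 0x80000000u                         /* enable bit                */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1F) << 11)
         | ((uint32_t)(fn  & 0x07) << 8)
         | (reg & 0xFC);                       /* dword-aligned offset      */
}

/* Enhanced mechanism: each Function gets a 4KB memory-mapped window, so the
   full extended configuration space is reachable. */
static uint64_t cfg_addr_ecam(uint64_t ecam_base, uint8_t bus, uint8_t dev,
                              uint8_t fn, uint16_t offset)
{
    return ecam_base
         + ((uint64_t)bus << 20)
         + ((uint64_t)(dev & 0x1F) << 15)
         + ((uint64_t)(fn  & 0x07) << 12)
         + (offset & 0xFFF);
}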
Single Host
System or
single Root
system
Multi Root
System
• One bridge is active at enumeration time while the other is passive
• Multi-threading problems are resolved through memory mapping.
Configuration Requests
Two request types, Type 0 or Type 1, may be generated by bridges in response to a configuration access:
• Type 0: the target bus matches the bridge's secondary bus number
• Type 1: the target bus does not match the secondary bus number
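A minimal sketch (C, with invented struct and field names) of how a bridge might apply this rule: compare the target bus against its Secondary and Subordinate Bus Number registers and decide whether to convert to Type 0, forward the Type 1 request, or ignore it.

#include <stdint.h>

struct bridge_cfg {
    uint8_t secondary_bus;    /* bus number directly below this bridge */
    uint8_t subordinate_bus;  /* highest bus number below this bridge  */
};

enum cfg_action { CFG_IGNORE, CFG_CONVERT_TO_TYPE0, CFG_FORWARD_TYPE1 };

/* Decide what to do with a Type 1 configuration request seen by the bridge. */
static enum cfg_action route_cfg_request(const struct bridge_cfg *br, uint8_t target_bus)
{
    if (target_bus == br->secondary_bus)
        return CFG_CONVERT_TO_TYPE0;   /* target lives on the bus just below us */
    if (target_bus > br->secondary_bus && target_bus <= br->subordinate_bus)
        return CFG_FORWARD_TYPE1;      /* target is further downstream          */
    return CFG_IGNORE;                 /* not within our bus range              */
}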
Field indication
Configuration Read access
Enumeration - Discovering the Topology
Single‐Root System
Multi root enumeration
Address Space and Transaction Routing
• All devices have internal registers and storage locations
• These must be accessible from the outside world
• Access is possible only if those locations have been assigned addresses
• In order to make this work, these internal locations need to be assigned addresses from
one of the address spaces supported in the system.
• PCI Express supports the exact same three address spaces that were supported in PCI
1) Configuration
2) Memory
3) IO
Memory and IO
maps
• Not exclusive to Endpoints
• Switches and the Root Complex also have device-specific registers accessed via MMIO and IO addresses
Prefetchable vs Non-prefetchable Memory
Space
• In prefetchable space, caching may take place: the responder can return additional data beyond what was requested, which can help with accesses in the near future, and no data is changed (reads have no side effects) while this happens
Base Address
Registers
(BARs)
• The address space a device needs is requested and assigned through the BAR registers in the configuration header
• The lower bits are fixed by the designer (indicating the type and size of the request) and the upper bits are programmed through software (assigning the base address)
• The BARs are set up sequentially, and each can request a different size
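The classic way software discovers how much space a BAR wants is to write all 1s to it, read it back, mask off the low attribute bits, and take the two's complement. A minimal sketch for a 32-bit memory BAR (C; cfg_read32/cfg_write32 are hypothetical configuration-space accessors, not a real API):

#include <stdint.h>

/* Hypothetical accessors for a Function's config space at a given byte offset. */
uint32_t cfg_read32(uint16_t bdf, uint16_t offset);
void     cfg_write32(uint16_t bdf, uint16_t offset, uint32_t value);

/* Return the size in bytes requested by a 32-bit memory BAR (0 if unimplemented). */
static uint32_t bar_size32(uint16_t bdf, uint16_t bar_offset)
{
    uint32_t original = cfg_read32(bdf, bar_offset);

    cfg_write32(bdf, bar_offset, 0xFFFFFFFFu);   /* probe: write all 1s        */
    uint32_t probed = cfg_read32(bdf, bar_offset);
    cfg_write32(bdf, bar_offset, original);      /* restore the original value */

    if (probed == 0)
        return 0;                                /* BAR not implemented        */

    probed &= 0xFFFFFFF0u;                       /* drop memory attribute bits */
    return ~probed + 1u;                         /* two's complement -> size   */
}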
BAR Location in Devices
Base and Limit Registers
• Registers in the Type 1 header used to identify the range of addresses that live on the downstream side of the bridge.
• Three sets of registers are needed because there can be three
separate address ranges living below a bridge.
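As an illustration, a rough sketch (C, invented names, limited to the non-prefetchable memory pair) of how a base/limit register pair decodes into an address window and how a bridge might test whether an address belongs below it; the 1MB granularity with the upper register bits supplying address bits [31:20] follows the conventional layout and is an assumption of this sketch.

#include <stdint.h>
#include <stdbool.h>

struct mem_window { uint32_t base; uint32_t limit; };

/* Decode the non-prefetchable Memory Base/Limit pair (1MB granularity). */
static struct mem_window decode_mem_window(uint16_t mem_base_reg, uint16_t mem_limit_reg)
{
    struct mem_window w;
    w.base  = ((uint32_t)(mem_base_reg  & 0xFFF0)) << 16;             /* A[31:20], rest 0 */
    w.limit = (((uint32_t)(mem_limit_reg & 0xFFF0)) << 16) | 0xFFFFF; /* inclusive top    */
    return w;
}

/* Would a memory-routed TLP with this address be forwarded downstream? */
static bool claims_address(const struct mem_window *w, uint32_t addr)
{
    return addr >= w->base && addr <= w->limit;
}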
Setting the base
and limit value
• The base and limit registers are configured for each of the three address spaces
Prefetchable
Range (P-
MMIO) in the
BAR register
TLP
Routing
• Because PCIe Links are point-to-point, routing is needed to deliver transactions between devices. Multi-port PCIe devices take on the duty of routing packets. If an ingress packet has no errors, the device will either:
1) Accept the traffic and use it internally
2) Forward the traffic to the appropriate outbound
(egress) port
3) Reject the traffic because it is neither the intended
target, nor an interface to it (Note that there are
other reasons why traffic may be rejected)
Three methods of TLP Routing
1) TLPs can be routed based on address (either memory or IO)
2) Based on ID (meaning Bus, Device, Function number)
3) Routed implicitly
Messages are the only TLP type that support more than one routing
method. Most of the message TLPs defined in the PCI Express spec use
implicit routing, however, the vendor‐defined messages could use
address routing or ID routing if desired. Messages were introduced with
PCIe; they are carried as in-band packets.
Header Fields
• The Fmt and Type fields define the packet format and type
• These fields determine the rest of the header format as well as the routing method used for that packet
• Each TLP type has a different set of values in the Format and Type fields
• Encoding and decoding take place according to these values
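For orientation, a small sketch (C) that pulls Fmt, Type, TC and Length out of the first DWord of a TLP header; the bit positions used here follow the commonly documented layout and should be treated as an assumption rather than a full decoder.

#include <stdint.h>
#include <stdio.h>

struct tlp_dw0 {
    uint8_t  fmt;     /* header size / data present        */
    uint8_t  type;    /* together with fmt selects routing */
    uint8_t  tc;      /* traffic class                     */
    uint16_t length;  /* payload length in DWords          */
};

static struct tlp_dw0 parse_dw0(uint32_t dw0)
{
    struct tlp_dw0 h;
    h.fmt    = (dw0 >> 29) & 0x7;
    h.type   = (dw0 >> 24) & 0x1F;
    h.tc     = (dw0 >> 20) & 0x7;
    h.length = dw0 & 0x3FF;
    return h;
}

int main(void)
{
    /* Example: a 3DW memory-read header DWord requesting one DWord of data. */
    struct tlp_dw0 h = parse_dw0(0x00000001u);
    printf("fmt=%u type=%u tc=%u length=%u\n", h.fmt, h.type, h.tc, h.length);
    return 0;
}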
Applying Routing Mechanisms
• Once the system addresses have been configured and transactions
are enabled, devices examine incoming TLPs and use the
corresponding configuration fields to route the packet.
1) ID routing
2) Address Routing
3) Implicit Routing
ID Routing
• Routing is done as a function of the BDF
• Eight bits are assigned to the Bus number
• Five bits to the Device number
• Three bits to the Function number
• Both 3DW and 4DW headers are possible
Check by endpoint and switches
Address
Routing
• When the Type field indicates that address routing is to be used for a TLP, the Address fields in the header are used to perform the routing check. These can be 32-bit addresses or 64-bit addresses.
• An Endpoint checks the address against its BARs.
• An Endpoint has only one bus behind it, so only its Type 0 configuration header (and its BARs) is considered.
Implicit Routing
TRANSACTION LAYER
In PCI Express, high-level transactions originate in the device core of the transmitting device and terminate at the core of the receiving device. The Transaction Layer acts on these requests to assemble outbound TLPs in the Transmitter and interpret them at the Receiver.
Packet Based
protocol
• Unlike the parallel buses of PCI, there are no separate control signals on the Link
• So we cannot tell from sideband signals what is happening on the Link at a given time
• Instead, a bit stream with an expected size and a recognizable format is used so the receiver can interpret it
• The packet format differs for the different types of action performed
Advantages
TLP Assembly
And Disassembly
• Based on the request, a header is generated and prepended to the packet; an optional Digest (ECRC) covering both may then be appended
• The receiver's ability to accept the packet is verified before it goes to the Data Link Layer
• A sequence number is added in the Data Link Layer, and an LCRC is added that covers the sequence number and the TLP
• The entire decoding is done on the receiver side
Header format
Field summary
Transaction Descriptor Fields
• As transactions move between requester and completer, it’s
necessary to uniquely identify a transaction, since many split
transactions may be queued up from the same Requester at any
instant
• The key transaction attributes are not collected in a single field; they are:
1) Transaction ID
2) Traffic class
3) Transaction Attributes
There are also some additional rules regarding the data payload
Specific TLP Formats:
1) IO Request
2) Memory Request
3) Configuration Request
4) Message Request
5) Completion
Memory
Request
• The format differs between 3DW and 4DW headers (32-bit vs. 64-bit addressing)
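A tiny sketch (C) of the usual rule for choosing between the two header sizes, stated here as an assumption: addresses that fit in 32 bits use the 3DW header, larger addresses need the 4DW header with a 64-bit address.

#include <stdint.h>

static unsigned header_dwords_for(uint64_t addr)
{
    return (addr <= 0xFFFFFFFFull) ? 3u : 4u;   /* 3DW for 32-bit, 4DW for 64-bit */
}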
Completion TLP
• Completions exist both with data and without data.
• The Lower Address field gives the alignment of the first byte of the returned data relative to the original request
• The Completion TLP also differs for different kinds of requests.
Message request
• Messages replace many of the sideband signals of earlier buses
• Most of the fields are not used, because the routing is different from other requests
• A 4DW header is used
• Depending on the signal being replaced, a different Message Code field value is used
• Each of the messages is associated with some rules
• Eg: Power Management Messages don’t
have a data payload, so the Length field is
reserved.
FLOW CONTROL
Flow control overview
• Improves transmission efficiency with multiple VCs
• PCIe supports up to 8 VCs
• A credit-based mechanism is used
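A rough sketch of the credit bookkeeping on the transmit side (C, invented names; real designs keep separate header and data credit counters per request type and per VC, and the field widths below are an assumption). Counters use modulo arithmetic so they keep working when they wrap.

#include <stdint.h>
#include <stdbool.h>

#define CREDIT_FIELD_BITS 8                   /* assumed width of the wrapping counter field */
#define CREDIT_MOD (1u << CREDIT_FIELD_BITS)

struct fc_state {
    uint32_t credit_limit;      /* latest limit advertised by the receiver via FC DLLPs */
    uint32_t credits_consumed;  /* credits this transmitter has used so far             */
};

/* May we transmit a packet that needs `needed` credits right now? */
static bool fc_check(const struct fc_state *fc, uint32_t needed)
{
    uint32_t available = (fc->credit_limit - fc->credits_consumed) % CREDIT_MOD;
    return available >= needed;               /* modulo subtraction copes with wrap-around */
}

static void fc_consume(struct fc_state *fc, uint32_t used)
{
    fc->credits_consumed = (fc->credits_consumed + used) % CREDIT_MOD;
}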
Buffer
organization
• The flow-control logic is a combination of two layers
• Different request types are handled differently
• Header credits and data credits are reported separately
Flow control
initialization
• This is done by the DLCMSM as part of link training
• When the link is ready, with the assertion of the LinkUp signal, the FSM goes through its two initialization states
• Violations are also treated as errors
Flow Control Elements
Quality of services
• Mechanisms that support Quality of Service and describes the means
of controlling the timing and bandwidth of different packets traversing
the fabric. These mechanisms include application‐specific software
that assigns a priority value to every packet, and optional hardware
that must be built into each device to enable managing transaction
priority. That level of support is called "isochronous" service (equal time).
Traffic class and virtual channels
• Traffic Class (TC) gives a priority to each packet. It differentiates packets and is carried in a 3-bit field in the header
• A Virtual Channel (VC) acts as a buffer that stores the outgoing packets; VC0 through VC7 may be implemented
• Each TC is mapped to a VC to achieve this operation (TC/VC mapping).
• The VC IDs and the number of VCs to be used by the device have to be configured before use.
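A minimal sketch of TC-to-VC mapping (C, invented table): software programs a table mapping each of the eight Traffic Classes onto one of the enabled Virtual Channels, and the transmit logic looks up the VC for every outgoing packet. The only mapping assumed fixed here is TC0 onto VC0.

#include <stdint.h>

#define NUM_TCS 8

/* tc_to_vc[tc] holds the VC ID assigned to that traffic class (programmed by software). */
struct vc_map {
    uint8_t tc_to_vc[NUM_TCS];
};

/* Look up which VC buffer an outgoing packet with the given TC should use. */
static uint8_t vc_for_packet(const struct vc_map *map, uint8_t tc)
{
    return map->tc_to_vc[tc & (NUM_TCS - 1)];
}

/* Example: every TC mapped to VC0, which is the default configuration. */
static const struct vc_map default_map = { .tc_to_vc = {0, 0, 0, 0, 0, 0, 0, 0} };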
VC Arbitration
• If a device has more than one VC and they all have a
packet ready to send, VC arbitration determines the
order of packet transmission.
• There are three types of VC arbitration:
1) Strict arbitration
2) Group arbitration
3) Hardware arbitration
Strict priority
It comes into play when a device has multiple VCs and all of them have packets ready to send at the same time
Group Arbitration
• Register 1 for setting the lowest priority
and Register 2 for the highest priority
• Register 2 works in the strict-priority manner.
• Register 1 works according to one of two methods:
1) Hardware Based Fixed Arbitration
2) Weighted Round Robin Arbitration
(WRR)
Port Arbitration
The port has to decide which packets go into a given VC when the port has multiple VCs feeding it
1) Asynchronous (the default)
2) Isochronous
The vendor-specific Root Complex Register Block serves the duty of arbitration for traffic targeting memory
Port Arbitration
Buffering
• Buffers are implemented between the ingress and the egress ports; routing takes place according to the TC/VC mapping
• Port arbitration is similar to VC arbitration; software takes the role here, selecting among the different schemes, which are
• hardware-based fixed arbitration and weighted round robin (WRR) arbitration.
Switch arbitration
• If the indicated traffic class is not supported, it is treated as an error
• The VC arbiter selects a packet only when sufficient credits are available at the receiver
Transaction Ordering
• This is needed because different packets with the same TC may arrive at the same VC
• Transactions with different TCs do not need this ordering mechanism
• The ordering mechanism is applied based on certain attributes of the transactions
Types of ordering
1) Strong ordering
2) Weak ordering
3) Relaxed ordering
Ordering Rules
Relaxed Ordering
• Switches and the Root Complex apply relaxed ordering when they see the RO bit set in the packet. This bit is controlled by software
ID Based
Ordering
(IDO)
• A TLP stream (packets from the same requester) has no ordering relationship with other streams
DATA LINK LAYER
Roles
• Data Link Layer Packets (DLLPs) are used to support the Ack/Nak protocol, power management, the flow-control mechanism, and can even be used for vendor-defined purposes.
Comparison with TLPs
• Immediately processed at the receiver
• There is no acknowledgement protocol for DLLPs; instead, a recovery mechanism is used:
• DLLPs arrive periodically, so a missing one is simply replaced when the next one arrives
• DLLPs carry no routing information; they only travel between Link neighbors
Generic
format
• There are four types of DLLPs:
1) ACK/NAK
2) Power management
3) Flow control
4) Vendor specific
Ack / Nak DLLPs format
Elements of the Ack/Nak Protocol
Transmitter elements
1) NEXT_TRANSMIT_SEQ Counter: a 12-bit counter that generates the sequence number assigned to each TLP arriving from the Transaction Layer.
2) LCRC Generator: generates the CRC computed over the sequence number and the TLP (header and data).
3) Replay Buffer: stores the transmitted TLPs, with their sequence numbers and LCRCs, for retransmission if any error occurs at the receiver end.
4) REPLAY_TIMER Count: a watchdog timer that ensures an Ack or Nak is received for each transmitted TLP.
5) REPLAY_NUM Count: tracks the number of replay attempts.
Transmitter elements
6) ACKD_SEQ Register: stores the most recently acknowledged sequence number, which is compared against newly received acknowledgements to confirm forward progress.
7) DLLP CRC Check: error checking and reporting of received DLLPs takes place here; this block cannot itself take recovery action.
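To illustrate the sequence-number bookkeeping, a toy sketch (C, invented names) of what happens on the transmit side when an Ack DLLP arrives: the acknowledged 12-bit sequence number is compared against ACKD_SEQ in modulo-4096 order, and a real design would then purge the replay buffer up to that point.

#include <stdint.h>
#include <stdbool.h>

#define SEQ_MOD 4096u   /* sequence numbers are 12 bits wide */

/* True if sequence number `a` is at or before `b` in modulo-4096 order. */
static bool seq_at_or_before(uint16_t a, uint16_t b)
{
    return ((uint16_t)((b - a) & (SEQ_MOD - 1))) < (SEQ_MOD / 2);
}

struct dll_tx_state {
    uint16_t next_transmit_seq;  /* NEXT_TRANSMIT_SEQ: tag for the next new TLP */
    uint16_t ackd_seq;           /* ACKD_SEQ: last acknowledged sequence number */
};

/* Process an incoming Ack DLLP carrying `ack_seq`. Returns true if it was accepted. */
static bool handle_ack(struct dll_tx_state *st, uint16_t ack_seq)
{
    if (!seq_at_or_before(st->ackd_seq, ack_seq))
        return false;                    /* stale or out-of-range acknowledgement */
    st->ackd_seq = ack_seq & (SEQ_MOD - 1);
    /* ...a real design would also purge replay-buffer entries up to ack_seq here... */
    return true;
}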
Receiver
Elements
• NEXT_RCV_SEQ (NRS) holds the expected sequence number of the next TLP, and a latency timer gives some control over the timing of the Ack or Nak
Ack/Nak
Examples
Ack with
Sequence
Number
Rollover
Transmitter’s
Response to a Nak
Physical Layer
Subdivision
• Both sub-blocks have separate transmitters as well as receivers
• The contents of the layers are conceptual and don't define precise logic blocks
• Designers can implement them according to their needs
• Compatible with older versions
• Gen1 ‐ the first generation of PCIe (rev 1.x) operating at 2.5 GT/s
• Gen2 ‐ the second generation (rev 2.x) operating at 5.0 GT/s
• Gen3 ‐ the third generation (rev 3.x) operating at 8.0 GT/s
Transmit Logic Overview
• A buffer is used to hold the data while control characters are added to the stream
• Byte striping divides the characters across the lanes
• The scrambler is used to avoid electrical resonance on the link
• At the bottom, the bits are converted from parallel to serial for transmission
Receiver
Mux and Control Logic
• The Tx buffer holds the incoming data from the layer above
• Control tokens (characters) are used to identify the boundaries of the packets
• Ordered Sets are used for maintaining link operation
• Logical Idle symbols are used to keep the receiver's PLL locked to the transmitter frequency
Byte Striping
• Striping means that each consecutive outbound character in a
character stream is routed onto consecutive Lanes. The number of
Lanes used is configured during the Link training process based on
what is supported by both devices that share the Link.
• There are packet-format rules that depend on the link width being used
• The width is established at link-training time
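A toy sketch of byte striping (C): consecutive bytes of the outgoing symbol stream are dealt round-robin onto consecutive lanes, with the lane count fixed at link-training time. The buffer handling here is simplified and assumed, not taken from any implementation.

#include <stdint.h>
#include <stddef.h>

/* Distribute a byte stream across `num_lanes` lanes: byte i goes to lane i % num_lanes.
   `lanes` is an array of per-lane output buffers, each large enough for its share.
   num_lanes is assumed to be at most 32, the widest link the spec allows. */
static void byte_stripe(const uint8_t *stream, size_t len,
                        uint8_t **lanes, unsigned num_lanes)
{
    size_t per_lane_index[32] = {0};

    for (size_t i = 0; i < len; i++) {
        unsigned lane = (unsigned)(i % num_lanes);
        lanes[lane][per_lane_index[lane]++] = stream[i];
    }
}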
x4 Format Rules
• STP and SDP characters are always sent on Lane 0.
• END and EDB characters are always sent on Lane 3.
• When an ordered set such as the SKIP is sent, it must appear on all
lanes simultaneously.
• When Logical Idles are transmitted, they must be sent on all lanes
simultaneously.
• Any violation of these rules may be reported as a Receiver Error to
the Data Link Layer.
x4 Packet
Format
Scrambler
• Avoids repetitive bit patterns by spreading the energy over a wide range of frequencies
• This also reduces crosstalk between adjacent lanes
• It is done by using a defined scrambling algorithm
• The scrambler can be disabled for test purposes
• The scrambler also needs to meet some rules and criteria in its implementation
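For a feel of how scrambling works, a small sketch of a 16-bit LFSR scrambler of the kind used by the Gen1/Gen2 logical sub-block (C). The polynomial x^16 + x^5 + x^4 + x^3 + 1 and the 0xFFFF seed follow the commonly documented definition, but the bit ordering and the spec's rules about which symbols are skipped are omitted; treat this as an assumption-laden illustration, not the spec's exact implementation.

#include <stdint.h>

struct scrambler { uint16_t lfsr; };

static void scrambler_reset(struct scrambler *s) { s->lfsr = 0xFFFF; }

/* Scramble one data byte, LSB first: each data bit is XORed with the LFSR output,
   and the LFSR advances once per bit (Galois form of x^16 + x^5 + x^4 + x^3 + 1).
   Descrambling at the receiver is the same operation. */
static uint8_t scramble_byte(struct scrambler *s, uint8_t in)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t lfsr_out = (uint8_t)((s->lfsr >> 15) & 1);     /* LFSR output bit   */
        out |= (uint8_t)((((in >> i) & 1) ^ lfsr_out) << i);   /* scramble data bit */
        s->lfsr = (uint16_t)(s->lfsr << 1);
        if (lfsr_out)
            s->lfsr ^= 0x0039;                                 /* taps x^5,x^4,x^3,1 */
    }
    return out;
}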
8b/10b Encoding
Encoding Procedure
Control Characters
Receive Logic
Details
• The receiver performs the reverse of the work done in the transmitter
• Clock recovery and de-skew are taken care of by the receiver
• Code-violation and disparity checking, and byte un-striping, are all taken care of on the receiver side
New Encoding Model in Gen3
Ordered Set Blocks
Data Block Frame
Construction
• SDP
• DLLP
• IDLA
These three are added immediately after the 2-bit sync header
• EDS
• EDB
These two mark the end of the data block
Framing
tokens
• TLPs and DLLPs have framing requirements on both the transmitter side and the receiver side
• If these are not met by either side, it leads to a framing error
• In Gen3 there is also a recovery option for framing errors
Gen3 Physical
Layer Transmit
Logic
• The blocks remain the same as in Gen1 and Gen2, but the packet format has some changes; to achieve this, the blocks make some changes in how they work
e.g.) the 2-bit sync header is added by the mux logic
Byte Striping x8
Example
Ordered
set
• The sync header changes to a different value so the receiver can identify the kind of block
• The SOS (Skip Ordered Set) needs to follow some rules on the transmitter and receiver side
Gen3 Physical
Layer Receive
Logic
• The receiver logic needs to take care of the extra things done by the transmitter side
• It needs to treat the sync header as a separate item (it is not part of the data)
Link Initialization & Training
• Link initialization and training is a hardware-based (not software) process controlled by the Physical Layer. The process configures and initializes a device's link and port so that normal packet traffic proceeds on the link.
Some of the things accomplished during link initialization and training are:
1. Bit lock
2. Symbol lock
3. Block lock
Processes at link initialization time
1. Link width
2. Lane reversal
3. Polarity inversion
4. Link data rate
5. Lane-to-lane de-skew
TS1 and TS2 Ordered Sets
• Link training involves exchanging Ordered Sets; these include Training Sequence 1 (TS1) and Training Sequence 2 (TS2)
• Their symbol formats are different for each version of the spec
• Each symbol is assigned a specific function
• The symbol functions are also different between TS1 and TS2
Link Training and Status State Machine
(LTSSM)
• The LTSSM consists of 11 top‐level states: Detect, Polling,
Configuration, Recovery, L0, L0s, L1, L2, Hot Reset, Loopback, and
Disable. These can be grouped into five categories:
1. Link Training states
2. Re‐Training (Recovery) state
3. Software driven Power Management states
4. Active‐State Power Management (ASPM) states
5. Other states
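As a quick reference, the eleven states could be written down as a simple C enum; the state names follow the list above, and the one-line comments summarize the descriptions given on the following slides.

/* The 11 top-level LTSSM states. */
enum ltssm_state {
    LTSSM_DETECT,        /* electrically detect a receiver           */
    LTSSM_POLLING,       /* achieve bit lock and symbol lock         */
    LTSSM_CONFIGURATION, /* determine link width, lane numbers, etc. */
    LTSSM_RECOVERY,      /* re-train, e.g. on a speed change         */
    LTSSM_L0,            /* normal operation: TLPs and DLLPs flow    */
    LTSSM_L0S,           /* ASPM low-power idle                      */
    LTSSM_L1,            /* deeper power saving                      */
    LTSSM_L2,            /* main power removed                       */
    LTSSM_HOT_RESET,     /* software-initiated reset of the link     */
    LTSSM_LOOPBACK,      /* test and validation                      */
    LTSSM_DISABLED,      /* link disabled                            */
};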
LTSSM FSM
• Receiver detection happens in the Detect state
• Re-training is a kind of Recovery state; it takes place when the link changes its mode of operation in any part of the link
• Here the receiver accomplishes some of the work that is mandatory for communication, e.g. bit lock
• Some of the states are assigned to control the power states of the link
State and specification
• Detect: electrically detect the receiver
• Polling: the receiver achieves bit lock and symbol lock using TS1 and TS2
• Configuration: determines the parameters associated with the link
• L0: the normal state of the link, in which TLPs and DLLPs are exchanged
• Recovery: the link goes through a restoring (re-training) state
• L1: provides greater power savings as a trade-off (at the cost of exit latency)
• L2: main power is turned off to achieve power savings
State and specification
• Loopback: used for testing and validation; not needed for normal operation
• Disable: disables the configured link
• Hot Reset: software resets the link by setting a bit
Detect
Polling
Configuration State
Designing Devices with
Links that can be Merged
Link configuration
example
Confirming Link and
Lane Numbers
Lane reversal
• Here the Lane numbers of the Link
on the left match between the
Downstream and Upstream Port.
Failed lane
Failed lane problem solving
• The downstream port decides to work with an x2 link; lanes 2 and 3 are treated as belonging to a second link, and the negotiation process is repeated for that link
• At the same time, Link 1 completes the process and is ready to jump to the L0 state
Detailed Configuration Substates
L0 State
• This is the normal, fully‐operational Link state, during which Logical
Idle, TLPs and DLLPs are exchanged between Link neighbors
• The next state after L0 is Recovery, if the link wants to change speed
• The link partner can also initiate entry into Recovery, as well as into Electrical Idle
Recovery State
Link Equalization Overview
Link equalization is employed to mitigate these signal impairments and ensure reliable and accurate data transfer. It involves applying specific algorithms and techniques to the received signal to compensate for the effects of attenuation and ISI. By adjusting the signal characteristics, such as its voltage levels and timing, link equalization helps to restore the integrity of the transmitted data.