PCIe Notes
From Legacy Systems to High-Speed Innovation: A Deep Dive into PCIe Architecture
Anoushka Tripathi
Background:
What is PCI?
• PCI (Peripheral Component Interconnect) was created in the early 1990s to fix problems
with older bus systems like ISA (Industry Standard Architecture).
• Back in the day, ISA worked well for older computers (like 286 machines), but it couldn’t
keep up with faster 32-bit computers.
• ISA had problems: it was slow, didn’t have modern features like plug-and-play, and
used big connectors with lots of pins.
• PCI was developed as an open standard by an industry group called the PCI-SIG (PCI
Special Interest Group).
• PCI’s advantages: much higher bandwidth than ISA (32 bits at 33 MHz, about 133 MB/s), plug-and-play configuration instead of jumpers, and processor independence as an open standard.
• PCI-X (PCI-eXtended) was developed a few years later to improve PCI’s performance.
• The main goal was to keep PCI-X compatible with PCI, so old devices would still work.
• PCI-X was still a parallel bus, meaning all data was sent at the same time on multiple
wires, which eventually hit a speed ceiling.
• Parallel buses have a lot of limitations: high pin count, slower speeds, and issues when
trying to go faster.
• To fix this, the industry eventually moved away from parallel buses like PCI-X to serial
buses (like PCI Express).
• PCI buses allow multiple devices to be connected, but as clock speeds increase, the
number of devices that can share the bus decreases.
• With PCI-X 2.0, the bus effectively became a point-to-point system: at the higher clock
speeds, each bus segment could support only one device.
• A typical PCI system included a North Bridge (which connected the processor to the
PCI bus) and a South Bridge (connecting PCI to other peripherals like USB or audio
devices).
A PCI bus transaction takes place in three steps:
1. Request: The device wanting to send data requests use of the bus (the ability to initiate
transfers is called Bus Mastering).
2. Arbitration: The system decides which device gets to use the bus (handled by an
Arbiter).
3. Data Transfer: The device sends or receives data. Devices can insert a Wait State
(pausing the transaction) if they aren’t ready to transfer.
Reflected-Wave Signaling:
• PCI uses a trick called reflected-wave signaling to save power. Instead of fully driving a
signal, the device sends a weaker signal that bounces back and strengthens when it
reaches the other end. This helps reduce power use but also limits the number of
devices and the length of the bus.
o Parallel protocol: a protocol in which address, data, and other control information are sent on the
same clock edge, i.e., many bits in parallel across many wires.
o Serial protocol: a protocol in which address, data, and other control information are sent on
consecutive clock edges, a few bits at a time.
- At the same clock frequency, a parallel protocol with a 32-bit interface moves 32 times
more data per cycle than a serial protocol with a 1-bit data pin.
- In practice, though, parallel protocols (AXI, AHB, APB) are limited by their achievable
frequency of operation, which is why fast serial links can overtake them.
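To put rough numbers on this, here is a small sketch (an illustration added to these notes, with assumed example frequencies) showing why one fast serial lane can rival a whole 32-bit parallel bus once the serial clock is high enough:

#include <stdio.h>

int main(void) {
    /* Illustrative rates: a PCI-style 32-bit parallel bus at 66 MHz
       versus one serial lane at 2.5 GT/s with 8b/10b encoding. */
    double parallel_bps = 66e6 * 32;            /* 32 bits per clock     */
    double serial_bps   = 2.5e9 * (8.0 / 10.0); /* 8 payload bits per 10 */

    printf("parallel: %.0f MB/s\n", parallel_bps / 8.0 / 1e6); /* ~264 */
    printf("serial:   %.0f MB/s\n", serial_bps / 8.0 / 1e6);   /* ~250 */
    return 0;
}

A 32-bit bus at 66 MHz moves about 264 MB/s, while a single 2.5 GT/s lane delivers about 250 MB/s after encoding overhead — and the serial lane can keep scaling in frequency where the parallel bus cannot.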
The PCI (Peripheral Component Interconnect) bus has three main data transfer models:
Programmed I/O (PIO), Direct Memory Access (DMA), and Peer-to-Peer. Here's a
simplified explanation of each, along with some related protocols:
1. Programmed I/O (PIO):
• In the early days, PIO was common because it was easy to implement.
• How it works: The processor (CPU) directly manages the data transfer. For example,
if a PCI device wants to send data to memory, the CPU:
o Reads the data from the PCI device into its internal registers.
o Writes that data from its registers to memory.
• Drawbacks:
o It generates two bus cycles (one for reading, one for writing), making it slow.
o The CPU is busy managing data instead of performing other tasks, making it
inefficient in modern systems.
• Why it's still used: Although inefficient, PIO is necessary for software to interact
with devices, but it’s rarely used for actual data transfers today.
2. Direct Memory Access (DMA):
• How it works: A separate device, called a DMA engine, handles data transfers,
freeing up the CPU. The CPU just sets the starting address and size of the data, and
the DMA engine does the rest.
• Benefits:
o The CPU can focus on other tasks while the DMA engine transfers data
directly between the device and memory.
o It only takes one bus cycle to move a block of data, which is much more
efficient than PIO.
• Over time, devices integrated DMA functionality, allowing them to perform Bus
Mastering, meaning they can control data transfers without needing external DMA
engines.
3. Peer-to-Peer:
• How it works: One PCI device (acting as a Bus Master) can transfer data directly to
another PCI device without involving the CPU.
• Benefits: This keeps the PCI bus busy without disturbing the rest of the system,
allowing for more efficient transfers.
• Drawbacks: It’s rarely used because devices often don’t use the same data format, so
the CPU usually needs to intervene to reformat the data.
• Shared bus: Since many devices share the PCI bus, only one device can use it at a
time. Devices request access from a bus arbiter, which decides who gets the bus
next.
• Hidden arbitration: This arbitration happens in the background, without wasting
clock cycles.
• Retry: When a PCI Bus Master requests data from a device that isn’t ready yet, the
device can ask for a retry. This prevents the bus from being held up by a device that
can’t send data right away. The master has to wait and try again later.
• Disconnect: If a device can transfer some data but not all of it, it can disconnect the
transaction. This frees up the bus for other transfers until the device is ready to
continue the operation.
In short, PCI uses different methods to manage data transfers depending on the complexity
and needs of the system. DMA is preferred for efficiency, PIO is basic but inefficient, and
Peer-to-Peer is rarely used. The PCI bus has mechanisms like arbitration, retries, and
disconnects to keep data flowing smoothly without locking up resources.
PCI Inefficiencies
The PCI Retry Protocol comes into play when a PCI master (such as a processor or North
Bridge) initiates a transaction with a target device (e.g., an Ethernet device), but the target is
not ready to complete the data transfer. In this case, the target signals a retry using the
STOP# signal.
• Mechanism:
o The PCI master begins the transaction by asserting control of the PCI bus.
o If the target device cannot provide the data immediately, it can insert wait-
states (brief pauses). If it requires more than 16 clock cycles, it asserts STOP#
to signal a retry.
o The PCI master then aborts the transaction and waits for at least two clock
cycles before re-arbitrating for control of the bus.
o During the retry, other devices can use the bus, improving efficiency by
preventing long wait periods.
• Efficiency Gains:
Retrying helps avoid holding the bus in an idle state, especially when the target needs
a significant amount of time to prepare the requested data. The master keeps retrying
the transaction until the target is ready and successfully transfers the data.
The PCI Disconnect Protocol is used when the target device can transfer some, but not all,
of the requested data during a transaction.
• Mechanism:
o A PCI master initiates a transaction (e.g., a burst read from Ethernet).
o The target device transfers a portion of the data but then runs out of data to
send.
o If the target cannot provide more data within 8 clock cycles, it asserts the
STOP# signal to disconnect the transaction.
o The PCI master waits for two clock cycles and then re-arbitrates for the bus to
continue the transaction.
o The disconnect protocol allows some data to be transferred before the bus
cycle ends, unlike the retry protocol, which ends the transaction without any
data transfer.
• Efficiency Gains:
The disconnect protocol allows more efficient bus utilization since the bus can be
granted to other devices when the original master is waiting for additional data from
the target device.
Summary of Inefficiencies
Both the retry and disconnect protocols are essential to maintain PCI bus efficiency, but they
still introduce delays due to the need to repeatedly re-arbitrate and re-initiate transactions.
The inefficiency stems from:
1. Bus arbitration overhead: The master must continually re-compete for the bus,
adding delay.
2. Wait-state insertion: Wait-states can briefly stall transactions, though the
retry/disconnect mechanisms attempt to limit this.
3. Limited data transfer per cycle: In cases of retries or disconnects, the PCI master
may not transfer data immediately, requiring multiple attempts, which reduces overall
bandwidth efficiency.
In PCI (Peripheral Component Interconnect) systems, interrupt handling is achieved via four
sideband signals: INTA#, INTB#, INTC#, and INTD#. These signals are used by PCI
devices to notify the system of an interrupt request. Here's how the interrupt handling process
works:
1. Interrupt Assertion: When a PCI device needs to request an interrupt, it asserts one
of these signals. In a single-CPU system, this causes the system's interrupt controller
to assert the INTR (Interrupt Request) pin to notify the CPU of the interrupt.
2. Interrupt Processing in Single-CPU Systems:
o The interrupt controller sends the signal to the CPU via the INTR pin.
o The CPU, upon receiving this signal, must identify the source of the interrupt
by querying the devices or the controller, which takes several bus cycles. This
method is slower and less efficient, especially in systems with multiple
devices.
3. Multi-CPU Systems and APIC:
o In multi-CPU systems, handling interrupts became more complex as a single
INTR pin would not suffice. To manage this, the APIC (Advanced
Programmable Interrupt Controller) was introduced.
o The APIC model improves the interrupt handling by using a messaging
mechanism to communicate with multiple CPUs. Instead of relying on a single
INTR pin, the interrupt controller sends messages to the relevant CPUs,
allowing for more efficient interrupt handling in multi-CPU environments.
4. Legacy Interrupt Handling:
o In legacy PCI systems, identifying the source of an interrupt required multiple
bus cycles, making it inefficient.
o The APIC model significantly reduces the overhead of interrupt handling by
streamlining the process and avoiding the delays associated with querying
devices directly via the bus.
PCI error detection mechanisms involve monitoring transactions for parity errors during
address and data phases. These errors are identified through the use of the PAR (parity)
signal, which maintains even parity across the AD[31:0] and C/BE[3:0] lines. Data parity
errors are reported on PERR#, while more serious problems, such as address parity errors,
are signaled on SERR#.
In summary, PCI interrupt and error handling mechanisms are foundational to system
reliability, providing ways to detect, report, and address issues through both hardware signals
and software intervention.
PCI Express (PCIe) is a significant upgrade from the older PCI architecture, offering
improvements in performance and efficiency. Unlike its predecessor, PCI, which was a parallel
bus system, PCIe operates as a serial bus, more like InfiniBand or Fibre Channel. Although it’s
based on a serial model, PCIe is fully backward compatible with PCI software, making it easy
to integrate into existing systems.
PCIe adopts a dual-simplex connection model, which means that it has separate paths for
sending (transmit) and receiving (receive) data, allowing communication in both directions
simultaneously. This setup is technically full-duplex, but PCIe uses the term dual-simplex to
emphasize that each path is one-way.
One of the main advantages of PCIe is its software backward compatibility with older PCI
systems. Even though the hardware architecture has changed, the address spaces for memory,
IO, and configuration in PCIe remain the same. This means that software written for PCI, like
BIOS code and device drivers, can still work seamlessly with PCIe.
In PCIe, serial communication is used instead of parallel. Serial communication may seem
slower because it transmits one bit at a time, but it achieves much higher speeds, enabling it to
meet or exceed the bandwidth of parallel buses like traditional PCI. PCIe can reach speeds like
2.5 GT/s (Gigatransfers per second), 5.0 GT/s, and 8.0 GT/s, bypassing the limitations faced by
parallel designs.
Parallel bus designs face three timing problems:
1. Flight Time: This is the delay caused by the time it takes a signal to travel from the
transmitter to the receiver. In parallel buses, signals must arrive before the next clock
cycle, but as the clock speeds increase, it becomes impractical to shorten the physical
length of the traces or reduce load.
2. Clock Skew: This happens when the clock signal arrives at the sender and receiver at
different times, complicating data synchronization.
3. Signal Skew: In parallel buses, all data bits should arrive together at the receiver.
However, signal skew causes the bits to arrive at different times, requiring the system to
wait for the slowest bit, which affects overall performance.
PCIe overcomes these problems by embedding the clock within the data stream, eliminating
the need for external clock signals. This solves:
• Flight Time: It no longer matters how long the signal takes to reach the receiver because
the clock arrives with the data.
• Signal Skew: In serial communication, only one bit is transmitted at a time, eliminating
intra-lane skew. In multi-lane setups, any inter-lane skew can be corrected
automatically by the receiver.
Bandwidth Improvements
PCIe offers impressive bandwidth due to its high speed and ability to use multiple Lanes. For
example:
• Generation 1 (Gen1) has a bit rate of 2.5 GT/s, translating to 0.5 GB/s per Lane
(counting both directions).
• Generation 3 (Gen3) runs at 8.0 GT/s and reaches 2.0 GB/s per Lane by using a more
efficient encoding method called 128b/130b encoding, which improves bandwidth
without increasing the clock speed as much.
When multiple Lanes are used (like x4 or x16 Links), the bandwidth multiplies accordingly,
making PCIe highly scalable.
Differential Signals
PCIe uses differential signaling, meaning each lane sends both a positive (D+) and negative (D−)
version of the same signal. This technique doubles the pin count but offers two key advantages:
1. Improved Noise Immunity: Since both signals (D+ and D−) travel closely together, any
noise affecting one affects the other equally. The receiver, which measures the
difference between the two signals, cancels out the noise, improving signal integrity.
2. Reduced Signal Voltage: Differential signaling operates with lower voltages, reducing
power consumption and allowing faster transmission speeds.
In PCIe (Peripheral Component Interconnect Express), a common clock is not needed because
PCIe operates using a source-synchronous model. In this model, the transmitter supplies the
clock to the receiver indirectly by embedding the clock signal into the data stream. This
eliminates the need for an external or forwarded clock.
How it Works:
• Embedded Clock: Instead of sending a separate clock signal, the clock is embedded
into the data stream using 8b/10b encoding (or other encoding mechanisms like
128b/130b for later PCIe generations). This encoding ensures regular transitions in the
data stream to help the receiver recover the clock.
• Clock Recovery: The Phase-Locked Loop (PLL) in the receiver recovers the clock from
the incoming data. The PLL takes the incoming bitstream as a reference and generates a
clock signal that matches the frequency of the transmitted data. It continually
compares the incoming data’s timing (phase) with its own generated clock and adjusts
until they match (this process is called locking).
• PLL Adjustments: Since factors like temperature or voltage fluctuations can affect the
transmitter’s clock, the PLL continuously fine-tunes the recovered clock to maintain
synchronization with the transmitter.
• In parallel systems, a common clock is needed for synchronizing data transfer, but in
high-speed serial links like PCIe, clock skew and transmission delays pose significant
problems. By embedding the clock in the data, these issues are avoided, and the need
for a common clock is eliminated.
• Transition Density: For the PLL to function correctly, it needs regular transitions
(changes from 1 to 0 or 0 to 1) in the incoming bitstream to maintain phase comparison.
Without transitions, the PLL can lose synchronization. The 8b/10b encoding ensures no
more than 5 consecutive ones or zeroes appear in the data stream, preventing
synchronization loss (see the run-length sketch at the end of this list).
• After recovering the clock, the receiver uses it to latch (capture) the incoming data and
deserialize it (convert the serial data stream into parallel data).
• PCIe links also support low power states where data transmission stops. In these
states, the receiver can no longer rely on the incoming reference clock. Therefore, the
receiver needs its own internal clock to manage operations when the data stream is
inactive.
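As promised above, here is a minimal sketch (an illustration added to these notes, not part of the original) of the run-length property that 8b/10b guarantees — a correctly encoded stream never produces a run longer than 5 identical bits:

#include <stdio.h>
#include <stddef.h>

/* Longest run of identical bits (stored one bit per byte, purely for
   clarity). 8b/10b-encoded data keeps this at 5 or less, so the
   receiver's PLL always sees transitions often enough to stay locked. */
static size_t max_run(const unsigned char *bits, size_t n) {
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        run = (i > 0 && bits[i] == bits[i - 1]) ? run + 1 : 1;
        if (run > best) best = run;
    }
    return best;
}

int main(void) {
    unsigned char stream[] = {1,0,1,1,0,0,0,0,0,1,0,1};
    printf("longest run: %zu\n", max_run(stream, sizeof stream)); /* 5 */
    return 0;
}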
Packet-Based Protocol:
• PCIe uses a packet-based protocol to transfer data. Instead of using side-band control
signals to manage data types, as seen in parallel buses, PCIe sends data in structured
packets. The receiver identifies the packet boundaries and interprets the data based on
predefined structures.
In a PCIe system, devices are connected in a simple tree structure. This means there are no
loops or complicated connections, which makes things easier for the system to manage. The
diagram mentioned shows a typical PCIe setup, with a CPU at the top, connected to various
other devices like memory, endpoints (e.g., graphics cards), and legacy devices.
1. CPU and Root Complex:
• CPU: The brain of the computer. In the PCIe world, the CPU is at the top of the hierarchy,
meaning it controls everything beneath it.
• Root Complex (RC): This is a collection of components that connect the CPU to the
PCIe devices. Think of it as a bridge between the CPU and the PCIe world. It’s the main
hub where the CPU communicates with PCIe devices. The Root Complex usually
includes interfaces for the processor, memory (like DRAM), and the PCIe bus itself.
2. Switches:
• Switch: This is like a traffic controller for PCIe. If you have multiple devices trying to
connect to the CPU through a single PCIe Port, a switch helps manage this. It decides
where to send the data based on the destination address and routes packets to the
correct device.
3. Bridges:
• Bridge: A bridge connects different types of buses. For example, if you have an older PCI
or PCI-X device, a bridge allows it to communicate with newer PCIe systems. There are
two types:
o Forward Bridge: Lets an older PCI or PCI-X bus connect into a newer PCIe system.
o Reverse Bridge: Allows newer PCIe devices to work with older PCI systems.
4. Endpoints:
• Endpoints: These are the actual devices at the “end” of the PCIe tree, such as a
graphics card, network card, or SSD. Endpoints are the devices that send or receive
data, and they only have one port facing upward toward the Root Complex. Endpoints
are categorized into two types:
o Native PCIe Endpoints: Devices that were designed specifically for PCIe, like
modern SSDs or graphics cards. They communicate directly with the PCIe
system and use memory-mapped IO (MMIO).
o Legacy PCIe Endpoints: Older devices that originally worked with older buses
(like PCI-X) but were modified to work with PCIe. These devices may still use
older features that aren't used in newer PCIe designs, such as IO space or
locked requests.
• The system must be compatible with older PCI software and configuration schemes, so
even though PCIe is newer and faster, the topology and configuration remain somewhat
similar to older PCI systems.
• The tree structure helps keep things simple and ensures that devices can be easily
tracked and managed without complex loops.
- Upstream lane: a lane that carries traffic (packets) toward the Root Complex.
- Downstream lane: a lane that carries traffic (packets) away from the Root Complex.
- Upstream port: a port that points toward the Root Complex.
- Downstream port: a port that points away from the Root Complex.
➔ A Link consists of lanes in both the transmit and receive directions; if the transmit
direction is upstream, the receive direction is downstream.
PCIe Day 4
Configuration
- Per lane in 1 direction: 2.5 GT/s × 1 lane = 2.5 Gb/s; with 8b/10b encoding, 2.5 Gb/s × 1 byte/10 bits = 250 MB/s
- Bidirectional: 250 MB/s × 2 = 500 MB/s = 0.5 GB/s
- For 32 lanes: cumulative BW = 0.5 GB/s × 32 = 16 GB/s
- Gen2 numbers: the rate doubles to 5.0 GT/s (still 8b/10b), giving 500 MB/s per lane per direction.
- Gen3 numbers: 8.0 GT/s with 128b/130b encoding gives roughly 1 GB/s per lane per direction.
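These per-lane numbers generalize to any generation and link width. The sketch below (an added illustration, not from the original notes) computes usable one-direction bandwidth from the transfer rate and encoding overhead:

#include <stdio.h>

/* Usable bandwidth in MB/s for one direction of a link.
   rate_gt : transfer rate in GT/s (2.5, 5.0, 8.0, ...)
   payload : payload bits per symbol (8 for 8b/10b, 128 for 128b/130b)
   symbol  : total bits per symbol (10 for 8b/10b, 130 for 128b/130b)
   lanes   : link width (x1, x4, x16, ...) */
static double link_mbps(double rate_gt, int payload, int symbol, int lanes) {
    return rate_gt * 1000.0 * payload / symbol / 8.0 * lanes;
}

int main(void) {
    printf("Gen1 x1:  %6.0f MB/s per direction\n", link_mbps(2.5, 8, 10, 1));
    printf("Gen1 x32: %6.0f MB/s per direction\n", link_mbps(2.5, 8, 10, 32));
    printf("Gen3 x1:  %6.0f MB/s per direction\n", link_mbps(8.0, 128, 130, 1));
    return 0;
}

Doubling the x1 result for both directions reproduces the 0.5 GB/s figure above, and the x32 result doubles to the 16 GB/s cumulative number.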
o The Root Complex should know everything (what type of device it is, who the manufacturer is,
etc.) about each device that is connected to it.
o IO space: all transactions done to these devices are done using IO reads and writes,
➢ inefficient: each address must be accessed individually.
o Memory space: all transactions done to these devices are done using memory reads and memory writes.
➢ Configuration Space
- Same as in PCI
Configuration space registers are mapped to memory locations. Device drivers and
diagnostic software must have access to the configuration space, and operating
systems typically use APIs to allow access to device configuration space.
What is Configuration Address Space?
In the early days of computers, when you installed a new device (like a sound card or
network card), you had to manually set switches and jumpers to tell the computer how to
use it. This was like putting together a puzzle without clear instructions, and it often led to
conflicts where two devices tried to use the same memory, I/O ports, or interrupts (signals
that tell the CPU to do something). It was pretty complicated!
Later, systems got smarter with plug-and-play technology, which made it easier for the
computer to figure out how to use new devices automatically. But things really improved
with the PCI system. PCI introduced a new way to automatically manage all the resources
(like memory and I/O) that each device needed, without conflicts, thanks to Configuration
Address Space.
• When PCI was first designed, every function got 256 bytes of configuration space. These
256 bytes were enough back then because devices didn’t need a lot of extra features.
• The first part of this space, called the configuration header (the first 64 bytes), is used
to set up the basic functionality of the device. There are two types of headers:
o Type 0 header: For most devices (endpoints).
o Type 1 header: For bridge devices (which connect buses).
• The remaining part of the 256 bytes is used for optional features (like adding new
capabilities to the device, such as power management or hot-plugging support).
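For concreteness, here is a simplified C view of that 64-byte Type 0 header (field offsets follow the PCI specification; this sketch is added for illustration and omits the packing pragmas a real driver would need):

#include <stdint.h>

/* Simplified PCI Type 0 configuration header (first 64 bytes). */
typedef struct {
    uint16_t vendor_id;        /* 0x00: identifies the manufacturer      */
    uint16_t device_id;        /* 0x02: identifies the device            */
    uint16_t command;          /* 0x04: enables memory/IO decoding, etc. */
    uint16_t status;           /* 0x06: capability and error bits        */
    uint8_t  revision_id;      /* 0x08 */
    uint8_t  prog_if;          /* 0x09 */
    uint8_t  subclass;         /* 0x0A */
    uint8_t  class_code;       /* 0x0B */
    uint8_t  cache_line_size;  /* 0x0C */
    uint8_t  latency_timer;    /* 0x0D */
    uint8_t  header_type;      /* 0x0E: bit 7 = multifunction,
                                        bits 6:0 = 0 (device) / 1 (bridge) */
    uint8_t  bist;             /* 0x0F */
    uint32_t bar[6];           /* 0x10-0x27: Base Address Registers      */
    /* ... the remaining fields (subsystem IDs, expansion ROM address,
       capabilities pointer, interrupt line/pin) fill out the 64 bytes. */
} pci_type0_header;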
Extended Configuration Space (When 256 Bytes Isn't Enough):
• As PCIe evolved, new devices needed more capabilities, and the original 256 bytes
wasn’t enough anymore. So, PCIe introduced the Extended Configuration Space,
which is 4KB (or 4096 bytes) per function.
• This new space allows PCIe devices to include extra registers that give them more
powerful and flexible features, like advanced error reporting or extra power management
options. However, these extended features can only be accessed by newer software
that knows how to use them. Older systems won’t be able to see or use the extended
space, but they can still use the basic 256 bytes.
In Summary:
• Configuration Address Space is a special area of memory where each PCIe function
gets its own set of settings (called registers) that help the computer detect, configure,
and manage the device.
• Originally, each function had 256 bytes of space, but modern devices needed more
room, so PCIe expanded this to 4KB per function in the form of Extended Configuration
Space.
• This space helps avoid conflicts and allows modern computers to automatically
manage complex devices without needing manual setup.
So, PCIe's Configuration Address Space is like a "control panel" that tells the system
how to manage all the devices connected to it, making sure everything works smoothly
and without conflicts.
1. Bus:
• PCIe supports up to 256 Bus Numbers (0-255), which are assigned by configuration
software. The first bus, typically Bus 0, is associated with the Root Complex, and it
includes Virtual PCI buses and bridges (P2P bridges). These bridges can extend the bus
hierarchy, allowing additional PCIe devices to connect.
• Bus numbers are assigned through a process called depth-first search, where
configuration software starts at Bus 0 and assigns unique bus numbers to each bus it
finds.
2. Device:
• Each PCIe bus can support up to 32 devices (Device 0 to Device 31). However, due to
the point-to-point nature of PCIe, only one device is directly connected to a PCIe link,
and this device will typically have the Device Number 0.
• Devices may reside on virtual PCI buses, like those in the Root Complex or PCIe
switches, which can support multiple attached devices. Each device is expected to
implement Function 0 and may support up to eight functions (Function 0 to Function 7).
3. Function:
• A function is a logical subcomponent of a device. Each function gets its own
configuration space, and a device may implement up to eight of them (Function 0 to
Function 7).
The same three concepts, restated with an analogy:
1. Bus:
• Think of the Bus as a road that devices use to communicate with the computer. In PCIe,
up to 256 buses can be created (numbered from 0 to 255).
• The first bus, Bus 0, is usually connected to the computer's Root Complex, which is like
a central hub where communication starts.
• Sometimes, there are devices called bridges that connect one bus to another, like an
overpass connecting two roads. Each new bus created by a bridge gets its own unique
bus number. The computer assigns these numbers one by one as it explores the
system, looking for new buses to connect.
2. Device:
• Now imagine that along these buses, there are parking spots for devices, like your
graphics card or sound card. Each bus can have up to 32 devices parked on it (Device 0
to Device 31).
• However, because PCIe is point-to-point, only one device can be directly connected to
a single PCIe link (the path from the motherboard to the device). This device will usually
have the number Device 0.
• In some special cases, like when the device is connected through a virtual PCI bus (for
example, in a PCIe switch or Root Complex), you can have multiple devices attached
to the same bus.
3. Function:
• Every device on the bus can perform certain functions. Think of it like having a multi-
tool—one tool (the device) can perform multiple tasks (functions).
• Each device can have up to 8 functions (numbered from Function 0 to Function 7). For
example, a single device might handle your USB ports, network connections, and
display output all at once.
• Devices don't always use all their function slots. A device might only use Function 0 and
Function 2, skipping others. Each function has its own space in the computer’s memory,
where the computer can configure and manage it.
• The BDF (Bus, Device, Function) system is used to give every function of every device its
own address. This is kind of like giving every apartment in a building its own unique
number so that the mail carrier knows exactly where to deliver the mail.
• For example, a network card could have an address like Bus 2, Device 0, Function 1.
This tells the computer: the card sits on Bus 2, occupies Device slot 0 on that bus, and
the request targets its Function 1.
The BDF system is important because it helps the computer’s configuration software (the part
of the system that sets up your hardware) to locate every function, assign it resources, and
route requests to exactly the right place.
In a Nutshell:
• Bus: The highway that devices sit on (numbered 0 to 255).
• Device: The device parked on that highway (like your sound card).
• Function: The specific task that device performs (like handling audio).
• BDF: The address (Bus, Device, and Function) that makes sure the computer knows
exactly where to send data and instructions.
So, in PCIe, the BDF system is like a well-organized map that helps your computer keep track of
all the devices connected to it, making sure each one can do its job without stepping on another
device’s toes.
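A compact way to see BDF in code: the sketch below (illustrative only) packs bus/device/function into the 16-bit form commonly used by configuration software and prints it in the bus:device.function notation familiar from tools like lspci:

#include <stdio.h>
#include <stdint.h>

/* Pack Bus (8 bits), Device (5 bits), Function (3 bits) into 16 bits. */
static uint16_t bdf(uint8_t bus, uint8_t dev, uint8_t fn) {
    return (uint16_t)((bus << 8) | ((dev & 0x1F) << 3) | (fn & 0x07));
}

int main(void) {
    /* The network-card example: Bus 2, Device 0, Function 1. */
    uint16_t id = bdf(2, 0, 1);
    printf("BDF = 0x%04X -> %02x:%02x.%x\n",
           id, id >> 8, (id >> 3) & 0x1F, id & 0x7);  /* 02:00.1 */
    return 0;
}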
The Host-to-PCI Bridge is like the main connection point between the computer's processor
and the PCI devices. It's responsible for making sure the processor can communicate properly
with PCI devices, like a "translator" between the two.
The configuration registers for the Host-to-PCI Bridge don’t have to follow the typical
configuration mechanisms (the methods used in older PCI devices). Instead, these registers are
often placed in the memory address space, which the computer’s firmware (the underlying
software that controls hardware) knows about. So, when the processor wants to interact with
these registers, it knows where to look.
However, even though the placement is different, the layout and how the registers are used still
need to follow the standard Type 0 template. This standard comes from the PCI 2.3
specification, which means it still needs to behave like a typical PCI device in many ways.
PCIe (the updated version of PCI) expands the configuration space for each function on a
device. Let's talk about the space and what it looks like:
o The PCI-Compatible Space: This is the original 256-byte configuration area that
older PCI software can access. It's where important things like PCI Express
capability are stored.
o The Extended Configuration Space: This is where more advanced features (like
error reporting, power budgeting, or virtual channels) are stored. This space is
only accessible by modern PCIe systems using an enhanced method.
So, think of the 256 bytes as the "basic" settings area and the extra 4KB as the "advanced"
settings for new PCIe features.
In a PCIe system, the Root Complex (which is connected to the processor) is the only part of
the system allowed to make configuration requests. In other words, the Root Complex acts like
the "manager" of all configuration activities. This manager sends configuration requests (like
instructions) to the PCI devices and makes sure everything is set up properly.
Why only the Root Complex? Because allowing other devices to change the configuration could
create chaos. Imagine if every device could start changing things without permission — it would
lead to conflicts. So, only the Root Complex has this special permission.
Since only the Root Complex can send configuration requests, the requests can only flow
downstream. This means configuration requests start at the processor (through the Root
Complex) and travel to the devices below it (like PCIe devices on different buses). Devices on
the same level (peer-to-peer) cannot send configuration requests to each other.
These configuration requests are routed based on something called the BDF (Bus number,
Device number, and Function number), which tells the system exactly where the device is
located in the PCIe topology.
Most processors can’t directly make configuration read and write requests. They’re good at
handling memory and I/O requests, but they need help for configuration tasks. That’s where
the Root Complex comes in: it translates the memory and I/O requests from the processor into
configuration requests.
There are two ways to access the configuration space for a PCI or PCIe device: the legacy
I/O-indirect mechanism described here, and the enhanced memory-mapped mechanism
covered later. The legacy mechanism exists because of an old limitation:
o The problem with older systems was that there wasn’t enough I/O address space
(only 64KB available for I/O). By the time PCI came along, this space was
cluttered with many devices.
o To solve this, PCI used a technique called indirect address mapping. This
means that instead of assigning each device a separate I/O address, the system
uses one register for the target address and another register for the data being
sent to or read from that address.
In the Legacy PCI mechanism, the PCI configuration registers are accessed indirectly through
the Configuration Address Port and the Configuration Data Port. Here’s how it works:
• The processor writes the target device’s address (Bus, Device, and Function) to the
Configuration Address Port (at a specific I/O address: 0CF8h).
• Then, the processor writes or reads data from the Configuration Data Port (I/O address:
0CFC–0CFFh).
• The Root Complex checks if the target bus is within its range and, if so, initiates the
configuration read or write request.
The Configuration Address Port is an important mechanism used by the processor to access
the configuration space of PCI devices. When a processor needs to interact with the PCI
configuration space, it writes to the Configuration Address Port to specify which PCI device and
register to target. The address is written in a structured 32-bit format, which is detailed below.
1. Bits [1:0]:
o Always 00: configuration registers are accessed as aligned doublewords, so the
two lowest bits carry no address information.
2. Bits [7:2]: Register Number
o Purpose: Specifies the target dword (register number) in the configuration
space of the device.
o Value: This defines which doubleword (dword) in the PCI device’s configuration
space you want to access. There are 64 dwords in the first section of the
configuration space, meaning this field can address any of these first 64
locations.
o Limitations: This mechanism can only target the first 64 doublewords (64 DWs)
of a device’s configuration space.
3. Bits [10:8]: Function Number
o Value: Selects which function (0 to 7) within the target device is being
addressed.
4. Bits [15:11]: Device Number
o Value: This field specifies which device (0 to 31) is being targeted on the PCI bus.
A PCI bus can support up to 32 devices, each with its own unique device
number.
5. Bits [23:16]: Bus Number
o Value: Specifies the bus number (0 to 255) on which the target device resides.
In a PCI system, there can be up to 256 buses, and this field indicates the bus
where the device is located.
6. Bits [30:24]: Reserved
o Value: They must always be set to 0. They are not used in the current
specification.
7. Bit [31]: Enable
o Value: Must be set to 1 for the access to be treated as a configuration
transaction; the Host Bridge then translates subsequent Configuration Data
Port accesses into configuration requests.
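Putting the address format and the two-port mechanism together, here is a hedged sketch of a legacy configuration read on x86 (it assumes ring-0 I/O privilege and GCC-style inline assembly, as in firmware or OS development; shown for illustration, not as portable user-space code):

#include <stdint.h>

/* Port-I/O helpers wrapping the x86 OUT/IN instructions. */
static inline void outl(uint16_t port, uint32_t val) {
    __asm__ volatile ("outl %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint32_t inl(uint16_t port) {
    uint32_t val;
    __asm__ volatile ("inl %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

#define CONFIG_ADDRESS 0x0CF8
#define CONFIG_DATA    0x0CFC

/* Read one dword of configuration space for bus/dev/fn at byte offset. */
static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off) {
    uint32_t addr = (1u << 31)                       /* bit 31: Enable     */
                  | ((uint32_t)bus << 16)            /* bits [23:16]: bus  */
                  | ((uint32_t)(dev & 0x1F) << 11)   /* bits [15:11]: dev  */
                  | ((uint32_t)(fn  & 0x07) << 8)    /* bits [10:8]: fn    */
                  | (off & 0xFC);                    /* bits [7:2]: dword  */
    outl(CONFIG_ADDRESS, addr);
    return inl(CONFIG_DATA);
}

/* pci_cfg_read32(4, 0, 0, 0) builds 0x80040000 -- the same value used
   in the example read sequence later in these notes. */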
Summary
• The Host Bridge in the Root Complex is responsible for connecting the CPU to different
PCI buses and devices. It contains two important registers:
o Secondary Bus Number: This represents the number of the bus directly
connected to the Host Bridge.
o Subordinate Bus Number: This defines the maximum bus number that can be
accessed downstream (below) from the Host Bridge.
• These two registers help the Host Bridge determine which bus the device you want to
communicate with is located on.
• To access any PCI device, the CPU writes a 32-bit value to the Configuration Address
Port (0CF8h). This value includes the Bus Number, Device Number, Function
Number, and the Register Number within the device that you want to access.
• When a configuration request is made, the Host Bridge checks if the Bus Number in the
request matches the range of buses it can access, which is determined by the
Secondary and Subordinate Bus Numbers.
o If the target bus is equal to the Secondary Bus Number, the Host Bridge
recognizes that the request is for a device directly connected to that bus. It
sends a Type 0 configuration request, which is a request to configure a device
directly on that bus.
o If the target bus falls between the Secondary Bus Number and Subordinate
Bus Number (but is not exactly the Secondary Bus), the Host Bridge forwards
the request as a Type 1 configuration request. This means the request is being
passed to another bus downstream, where another bridge will handle it and
possibly forward it further.
• Once the target bus and device are determined, the CPU can send read or write
requests to the Configuration Data Port (0CFCh). This is where the actual configuration
data is transferred.
o If the Configuration Address Port had bit 31 set to 1, and the Bus Number
matched the range handled by the Host Bridge, then the data in the
Configuration Data Port will be interpreted as a PCI configuration transaction.
Depending on whether it's a read or write request, the Host Bridge will either
retrieve or update the configuration of the device.
• Type 0 Configuration Request: This is used when the target device is on the same bus
as the Host Bridge's Secondary Bus Number. It means the device is directly accessible
on that bus.
• Type 1 Configuration Request: This is used when the target device is on a different bus
(downstream), and the request has to be forwarded through one or more bridges to
reach the target device.
• If the Bus Number in the request is not within the range of buses managed by the Host
Bridge (i.e., it’s greater than the Subordinate Bus Number), the request will not be
forwarded, and the Host Bridge won’t perform any configuration transaction.
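The host bridge’s routing decision can be summarized in a few lines (a simplified sketch of the rule described above; a real bridge also decodes the device, function, and register fields):

typedef enum { IGNORE, TYPE0, TYPE1 } cfg_action;

/* Decide how a bridge handles a configuration request for target_bus,
   given its Secondary and Subordinate Bus Number registers. */
static cfg_action route_cfg(unsigned target_bus,
                            unsigned secondary, unsigned subordinate) {
    if (target_bus == secondary)
        return TYPE0;   /* device lives directly on the secondary bus    */
    if (target_bus > secondary && target_bus <= subordinate)
        return TYPE1;   /* forward downstream; a lower bridge converts it */
    return IGNORE;      /* out of range: not in this bridge's subtree    */
}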
In systems with multiple Root Complexes (components that connect the CPU to the PCI
Express bus), there is a challenge when it comes to accessing the configuration space. The
Configuration Address and Data ports (used for configuring PCI devices) can be duplicated
across different Root Complexes, but there must be a system to prevent conflicts. Let’s break
this down step by step:
1. Preventing Contention:
When multiple Root Complexes are present, they share the same IO addresses for
configuration. However, to avoid contention (conflicts) between them, only one Root
Complex’s bridge is active at a time during configuration. Here's how it works:
• When the processor writes to the Configuration Address Port (the port that specifies
which PCI device and function is being accessed), only one of the Root Complexes will
respond and participate in the transaction.
2. Enumeration Process:
During enumeration (a process where the system detects and assigns addresses to PCI
devices):
• Software discovers all the buses and devices under the active Root Complex and
assigns them bus numbers.
• After the first Root Complex finishes, the second Root Complex is enabled. The software
assigns it a bus number range that does not overlap with the first Root Complex’s bus
numbers.
By ensuring the two Root Complexes have non-overlapping bus numbers, both can operate
without conflict, even though they see the same configuration requests.
• Any access to the Configuration Address Port is seen by both Root Complexes, but
only the one responsible for the target bus will process the request.
• The selected Root Complex acts as a gateway to the appropriate PCI bus:
o If the request is for a device on the Secondary Bus, it converts the request to a
Type 0 configuration access (for devices directly on its bus).
o If the request is for a bus further down the PCI hierarchy, it converts it to a Type 1
configuration access (to pass through to other buses).
With modern multi-core, multi-threaded CPUs, the old model for accessing configuration
space no longer works well. Here's why and what the spec writers did to solve this:
The old model required two steps:
1. The CPU would first write the target address (bus, device, function, and register) to the
Configuration Address Port.
2. The CPU would then perform a corresponding access to the Configuration Data
Port.
• This was fine when there was only one CPU running a single thread. But in modern
systems with multiple cores and threads, different threads might try to access
configuration space simultaneously, which could cause conflicts. For example, Thread
A might write to the Configuration Address Port, but before it can complete the
operation, Thread B could overwrite the address with a new one.
• Instead of using the old two-step IO port model, they mapped the entire configuration
space into a block of memory addresses.
• Now, accessing configuration space is done with one memory request, which directly
generates a Configuration Request on the PCI Express bus.
• Each PCI Function now gets a 4KB block of configuration space (up from the previous
256 bytes).
• Mapping configuration space for all possible PCI functions requires 256MB of address
space.
However, this is a minor issue because modern CPUs support large memory address spaces (36
to 48 bits of physical address space). In such large address spaces, 256MB is insignificant.
• The Root Complex is not required to support certain advanced behaviors, such as
configuration accesses that cross a dword boundary or locked transaction semantics.
supports them.
The issue being addressed is how to access and configure PCI devices efficiently, given the
limitations of legacy methods which used a restricted portion of the address space for
configuration. To solve this, the writers of the PCI Express specification decided to map the
entire PCI configuration space directly into memory addresses. This allows a simpler and faster
way of accessing configuration space with a single memory access command.
• Instead of using limited IO ports to access configuration registers, all PCI configuration
space is mapped into a dedicated 256MB of memory address space.
• Each PCI Function (a function is a logical subcomponent of a device) gets its own 4KB
of this address space.
• Mapping the entire configuration space into memory allows the system to send a single
memory request, which generates a Configuration Request on the PCI bus.
• The trade-off for this new approach is that it consumes 256MB of memory address
space. However, this is insignificant in modern systems where the CPU can address 36
to 48 bits of memory (a very large addressable space, far beyond 256MB).
• Thus, the new method allows for much more straightforward configuration access
without worrying about running out of IO space, but at the cost of using some of the
available memory address range.
Each PCI Function’s 4KB configuration space is mapped starting at a 4KB-aligned address
within this 256MB of memory. This means that the memory address itself now carries
information about which PCI device and function are being targeted.
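Because the address itself encodes the BDF, computing a function’s configuration address is simple arithmetic. A sketch (the base address 0xE0000000 is an assumed, platform-specific example, chosen to be consistent with the enhanced-access example later in these notes):

#include <stdint.h>

#define ECAM_BASE 0xE0000000u  /* assumed platform-specific base address */

/* Address of a configuration register under enhanced (memory-mapped)
   configuration access: each function gets a 4KB-aligned block. */
static uintptr_t ecam_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off) {
    return ECAM_BASE
         | ((uintptr_t)bus << 20)           /* 256 buses                */
         | ((uintptr_t)(dev & 0x1F) << 15)  /* 32 devices per bus       */
         | ((uintptr_t)(fn  & 0x07) << 12)  /* 8 functions per device   */
         | (off & 0xFFF);                   /* 4KB of space per function */
}

/* ecam_addr(4, 0, 0, 0) == 0xE0400000: Bus 4, Device 0, Function 0,
   register 0 -- matching the enhanced-access example below. */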
When configuring devices on a PCI Express (PCIe) bus, configuration requests are used to
access a device's configuration space. There are two types of configuration requests: Type 0
and Type 1, depending on whether the target device is on the current bus or a bus further
downstream. Here's how these two types of requests work:
• A Type 0 configuration request is used when the target bus number matches the
Secondary Bus Number of the bridge. This indicates that the target device is located
directly on the bus connected to the secondary side of the bridge.
Steps:
o Devices on the secondary bus check the Device Number field to determine
which device is being targeted. In PCIe, Endpoints on an external link are always
assigned as Device 0.
o The selected device then checks the Function Number to identify which
function within the device is being accessed. Devices can have multiple
functions (for example, a network card could have a control function and a data
transfer function).
o The selected function uses the Register Number field in the request to
determine which dword (32-bit block) in the configuration space is being
accessed.
o The First Dword Byte Enable (BE) field specifies which bytes within the selected
dword are to be read or written.
• The Format (Fmt) field specifies whether the request is a read or a write.
• A Type 1 configuration request is used when the target bus number does not match the
Secondary Bus Number of the bridge, but the target bus number is within the range
specified by the bridge's Secondary Bus Number and Subordinate Bus Number. In
this case, the packet is forwarded to the bridge's secondary bus as a Type 1 request.
Steps:
o When the request reaches the bridge, the bridge compares the target bus
number with its Secondary and Subordinate Bus Numbers.
o If the target bus falls within this range, the bridge forwards the request to the
secondary bus as a Type 1 request.
o If the target bus matches the bridge’s secondary bus, the request is converted
from Type 1 to Type 0 and is forwarded to the secondary bus for devices on that
bus to process.
o If the target bus does not match the bridge's secondary bus but is within its
range, the request is passed further downstream as a Type 1 request.
• The Format (Fmt) field indicates whether the request is a read or a write.
• Type 0:
o Used when the device is on the local bus (i.e., the secondary bus of the bridge).
o Devices on the bus directly process the request by checking the device,
function, and register fields.
• Type 1:
o Bridges process these requests and forward them to the appropriate bus until
they reach the target bus.
1. Setting the Target Address:
o The code mov dx, 0CF8h sets up DX with the configuration address port (0xCF8).
o Then, mov eax, 80040000h sets the EAX register to point to Bus 4, Device 0,
Function 0, and the first DWORD (register 0, containing the Vendor ID). This
value sets the Enable bit (bit 31) along with Bus Number 4, Device Number 0,
Function Number 0, and Register Number 0.
o The out dx, eax instruction writes this value to the Configuration Address Port
(0CF8h), indicating a request to read from Bus 4, Device 0, Function 0. Since
the bus number is 4 (non-zero), this triggers a Type 1 Configuration Read
starting from Bus 0.
2. Root Complex Handling:
o The Root Complex (often referred to as the Host/PCI Bridge) receives the
configuration request from the processor. It knows the requested bus is
downstream of Bus 0, so it begins a Type 1 Configuration Read targeting Bus 4.
3. Forwarding Through Bridges:
o Device 1 on Bus 0 is a PCI-to-PCI (P2P) bridge; it checks its bus range.
Since Bus 4 is in its range (Bus 1 to Bus 4), it forwards the request downstream.
o The request then travels through Bus 1 and Bus 2, being forwarded by the
respective PCI bridges until it reaches Bus 4.
4. Reaching the Target Bus:
o Once the request reaches the bridge whose secondary bus is Bus 4, that bridge
converts the Type 1 request to a Type 0 Configuration Read, because it is now
addressing a device on its local bus (Bus 4).
o The target device (Device 0) and function (Function 0) are identified from the
configuration request.
o The Type 0 request specifies the first DWORD (register 0), which holds the
Vendor ID of the device.
o The PCI device responds with the Vendor ID (the first two bytes of that register).
5. Return of Data:
o The response packet, containing the Vendor ID, is sent back to the Root
Complex and then forwarded to the processor via the configuration data port
(0xCFC), which is read using the IN instruction.
Enhanced Configuration Access:
• Instead of using the I/O ports (0xCF8 and 0xCFC), Enhanced Configuration Access
uses memory-mapped I/O.
• In the example, the address E0400000h refers to the configuration space in memory,
allowing the processor to directly perform memory read operations.
• The processor reads from this memory location, and the Root Complex generates a
Configuration Read request, similar to the legacy access method, but triggered by a
memory read operation.
After a system reset or power-up, the configuration software scans the PCI Express (PCIe) fabric
to discover the topology of devices connected to the system. This process is called
enumeration. Here’s how it works step-by-step:
At the start, the only known device in the system is the Host/PCI bridge, which serves as the
entry point for configuration and connects the processor and the PCIe fabric. This bridge
assigns Bus 0 to its downstream side, known as the secondary bus. The rest of the topology,
including devices on other buses, is yet to be discovered.
• The processor and the Root Complex (part of the bridge) know that Bus 0 exists.
• Other buses and devices are marked as unknown ("? ?") and are yet to be identified by
the configuration software.
Enumeration involves the configuration software searching for devices on each bus by sending
Configuration Read Requests to each potential bus, device, and function combination. Here's
how this process is carried out:
o The software attempts to read the Vendor ID from each device's configuration
space.
o By reading the Vendor ID register for all possible Bus, Device, and Function
numbers, the software determines whether a device is present.
o If no device exists at the targeted Bus/Device/Function, nothing responds to the
request. The PCIe fabric handles this situation by returning a Completion
with Unsupported Request (UR) status:
▪ The upstream PCIe bridge (above the target) generates this completion.
▪ The Root Complex translates the result to all ones (FFFFh), a reserved
value indicating the absence of a device.
o The software interprets a Vendor ID of FFFFh as the device not being present.
This prevents the system from falsely reporting errors.
o Although a Master Abort error (from legacy PCI) or a UR response could be seen
as an error during normal runtime, it is expected during enumeration and not
treated as a critical issue.
o The system uses a special Unsupported Request Status bit to note these
conditions without halting the enumeration process. This is important to prevent
unnecessary error handling during this stage, as the system might not yet have
all error-handling capabilities active.
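A minimal sketch of this presence test (assuming a configuration-read helper like the ones sketched earlier; illustrative only):

#include <stdint.h>
#include <stdbool.h>

/* Assumed helper: reads one config dword for bus/dev/fn at byte offset
   (see the legacy and enhanced access sketches earlier in these notes). */
uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off);

/* A function is present iff its Vendor ID is not the reserved FFFFh
   value the Root Complex substitutes for an Unsupported Request. */
static bool pci_function_present(uint8_t bus, uint8_t dev, uint8_t fn) {
    uint16_t vendor = (uint16_t)(pci_cfg_read32(bus, dev, fn, 0x00) & 0xFFFF);
    return vendor != 0xFFFF;
}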
Another issue the software might encounter is when a device exists but is not yet ready to
respond to configuration requests. This typically happens after a system reset, during which the
device needs time to initialize.
1. Initialization Delay:
o After reset, PCIe devices require some time to initialize before they can respond
to configuration accesses. If the data rate is 5.0 GT/s or less, software must wait
100 ms before sending a configuration request.
o For higher speeds (e.g., Gen3), the wait time extends beyond 100 ms, due to the
time required for Link training and Equalization.
2. Configuration Request Retry Status (CRS):
o A device that is still initializing can answer a configuration request with CRS.
This status signals that the device is not yet ready but will be shortly.
o The Root Complex handles CRS differently depending on the system settings.
During enumeration, it may retry the request transparently, or, if CRS Software
Visibility is enabled, return a special Vendor ID value of 0001h so software
knows to poll again later.
Enumeration is crucial because it allows the system to discover the full topology of PCIe
devices and allocate resources like bus numbers, memory addresses, and I/O space. Without
enumeration, the system would not know how many devices are connected or how to
communicate with them.
To determine whether a PCIe function is an endpoint or a bridge, we use information from the
Header Type register (offset 0Eh in the PCI configuration space header). Here's how it works:
Key Points:
1. Check the Function Type:
o The lower 7 bits of the Header Type register (bits 6:0) identify the function type.
o Values:
▪ 0 = Endpoint (Type 0 header).
▪ 1 = Bridge (Type 1 header).
2. Check Multifunctionality:
o Bit 7 of the Header Type register indicates whether the device has multiple
functions: 0 = single-function device, 1 = multifunction device.
3. Enumeration Process: During enumeration, the software probes devices and functions
starting from bus 0, device 0. For each device and function found, the Vendor ID and
Header Type registers are checked to determine the nature of the function (endpoint or
bridge). If a function is found to be a bridge, its bus number registers (Primary,
Secondary, and Subordinate) are updated, and the enumeration continues downstream
from that bridge.
o If the Header Type is 1, set bus number registers for the bridge and continue
downstream.
4. Continue scanning for functions in a depth-first manner until all devices and functions
have been discovered and appropriately configured.
Once enumeration completes, each bridge’s Subordinate Bus Number register is updated
with the actual largest bus number downstream, ensuring that the PCIe hierarchy is correctly
mapped for further transactions.
This process is essential for configuring a PCIe system, as it helps organize the hierarchy of
devices and allows the host system to address each device accurately.
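To tie the pieces together, here is a condensed sketch of the depth-first scan (an added illustration, assuming the config read/write helpers sketched earlier; it omits multifunction scanning, resource allocation, and bus-number exhaustion checks, and it clears the latency-timer byte for brevity):

#include <stdint.h>

uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off);
void     pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off,
                         uint32_t val);  /* assumed helper */

/* Depth-first scan starting at 'bus'; returns the highest bus number
   found in this subtree (used to program Subordinate Bus Numbers). */
static uint8_t scan_bus(uint8_t bus, uint8_t next_bus) {
    uint8_t max_bus = bus;
    for (uint8_t dev = 0; dev < 32; dev++) {
        if ((pci_cfg_read32(bus, dev, 0, 0x00) & 0xFFFF) == 0xFFFF)
            continue;                          /* no function here        */
        uint8_t hdr = (pci_cfg_read32(bus, dev, 0, 0x0C) >> 16) & 0x7F;
        if (hdr == 1) {                        /* Type 1 header: a bridge */
            uint8_t secondary = ++next_bus;    /* assign the next number  */
            /* Program Primary/Secondary; Subordinate temporarily 0xFF so
               Type 1 requests can flow downstream during the scan. */
            pci_cfg_write32(bus, dev, 0, 0x18,
                            (0xFFu << 16) | ((uint32_t)secondary << 8) | bus);
            max_bus = scan_bus(secondary, secondary);
            next_bus = max_bus;
            /* Now set the real Subordinate Bus Number. */
            pci_cfg_write32(bus, dev, 0, 0x18,
                            ((uint32_t)max_bus << 16) |
                            ((uint32_t)secondary << 8) | bus);
        }
    }
    return max_bus;
}

/* Typical entry point after reset: scan_bus(0, 0). */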
PCIe Day 5
Architecture Overview
Even though PCIe is an advanced technology compared to the older PCI standard, it keeps a
level of compatibility with older systems. One way this is achieved is by keeping the
configuration headers for both Endpoints (like devices) and Bridges (which connect buses)
similar to those in PCI. Think of the configuration headers as the “identity card” for a device.
When older software interacts with PCIe devices, it doesn’t really see the difference between
PCI and PCIe because the basic layout of these headers remains unchanged.
Bridges in PCIe
In PCIe systems, instead of having individual bridges, you often have Switches and Root
Complexes (the Root is the interface between the CPU and PCIe). While older software might
still think it’s dealing with regular PCI bridges, in reality, it’s working with a more complex
internal setup. This setup is hidden from the software, meaning the software only sees what it
expects, even if the internal design of the Root or Switches has changed and become more
advanced.
In PCIe, each function keeps a 256-byte PCI-compatible Configuration Space (the first
portion of its 4KB space). This space is divided into sections:
• 64 bytes are reserved for the old PCI configuration (so older software can still function).
• The remaining 192 bytes are used for PCIe-specific or function-specific configurations.
This means that newer PCIe features are supported while still keeping the older system
functionality intact.
Topology Example
Consider how your computer’s PCIe system looks. At the top is the Root Complex (which
connects to the CPU), and beneath that are several PCIe devices and ports. In older systems,
the software would see this as a series of bridges. With PCIe, while the system structure is more
complex, it is made to look like the old system to the software. The Root Complex has internal
connections that act like PCI buses, even though they aren’t actually physical PCI buses.
When the system powers on, it goes through a process called enumeration, where it discovers
all the connected devices (like graphics cards or network cards) and assigns them bus numbers
and system resources. This process in PCIe works just like it did in older PCI systems. Once
enumeration is done, the software has a clear map of the devices in the system, making it easier
to manage them.
In traditional PCI systems, bridges were used to connect different buses. Each bridge would
help route data between different parts of the system. In PCIe, while the technology has
evolved, the way things look to the software hasn't changed much. The Switch in a PCIe system
works internally in a more sophisticated way but still appears to software as a collection of
bridges. These bridges are all connected by a shared bus.
By organizing PCIe Switches to look like PCI bridges to the software, it simplifies compatibility.
This setup means the software doesn't need to be rewritten or heavily modified when moving
from older PCI systems to newer PCIe systems. The software still thinks it's dealing with regular
PCI bridges, but in reality, the Switch is doing the work behind the scenes. This allows
transaction routing—which is the process of sending data between different parts of the
system—to function just like it did in the older PCI systems.
Enumeration Process
Enumeration is a key process that happens when the system starts up. During enumeration,
the configuration software scans the system, identifies all the connected devices, and assigns
them bus numbers and system resources like memory and I/O space.
In a PCIe system, even though the internal setup might be more complex with Switches and
multiple buses, the enumeration process works the same way as it did with PCI. Once the
system is done with enumeration, each device and bridge gets a bus number and can now
communicate with the rest of the system. This is important because enumeration ensures that
every device knows its place in the system and how to route its transactions.
Figure below provides a visual example of how enumeration works. Let’s walk through it in a
simplified way:
• The Root Complex (connected to the CPU) starts by having Internal Bus 0, which is a
virtual bus that connects everything below it.
• The PCI-PCI Bridges appear as different buses (Bus 1, Bus 2, etc.), and each device (like
a PCIe Endpoint) gets a bus number.
• For example, Bus 3 might have a PCIe Endpoint, which is some device like a network
card or a graphics card. Similarly, other endpoints and bridges will be on other buses,
like Bus 5, Bus 7, etc.
Once enumeration finishes, the system knows where every device is located, and
communication can happen smoothly.
To make this concept even clearer, let’s compare two different types of systems:
1. Low-Cost Consumer Desktop: In a simple desktop machine, you might have a few
PCIe ports and slots for adding things like a graphics card or a sound card. The internal
structure of this system looks very similar to how older PCI systems were organized. You
still have a Root Complex, a few PCIe Ports, and some slots for add-in cards. The
simplicity of the design makes it easier for software to manage.
2. High-End Server: A server, on the other hand, has a much more complex structure. It
might have multiple networking interfaces and many PCIe slots for connecting storage
devices or other peripherals. Even though the system design is more sophisticated, PCIe
allows these high-end systems to be managed similarly to simpler ones. In the early
days of PCIe, some even thought that PCIe could replace other networking protocols
due to its flexible architecture, but it hasn’t fully replaced them because external
networks typically use different technologies.
Another key concept is the Root Complex. The Root Complex is the starting point of the PCIe
system, where the CPU interfaces with the PCIe devices. In modern processors, especially from
companies like Intel, the Root Complex is often integrated into the CPU package itself.
For example, modern CPUs may have integrated memory controllers (for DRAM), a PCIe x16
port (for graphics cards), and routing logic. All of this logic outside the actual CPU cores is often
referred to as Uncore logic, meaning it’s part of the CPU package but not directly involved in the
processing power of the CPU cores. The Root Complex here handles all the traffic between the
PCIe devices and the CPU, and since part of it resides inside the CPU package, this integration
makes communication faster.
In Summary
• Switches in PCIe systems act like a group of bridges to the software, which helps
maintain compatibility with older software.
• Transaction routing (sending data between devices) and enumeration (discovering and
organizing devices) work the same way in PCIe as they did in PCI.
• In both low-cost and high-end systems, PCIe can be organized in a way that the system
appears simpler to the software.
• The Root Complex connects the CPU to the PCIe devices, and in modern systems, it’s
often integrated inside the CPU package, speeding up communication.
PCIe defines a layered architecture where each layer has a distinct role, and these layers
operate independently for transmit (TX) and receive (RX) traffic. This separation into layers
provides flexibility for hardware designers because it allows upgrades or modifications to one
layer without necessarily affecting the others. However, while the layers are defined for clarity, there is no requirement that a design partition its hardware in exactly this way to meet PCIe compliance.
The layers, as depicted in figure below, are crucial to understanding how PCIe transfers data
and how the different parts of a device interact with each other.
At the heart of the PCIe system is the device core, which implements the primary functionality
of the PCIe device. Depending on the type of device, the core can vary:
• If it’s an endpoint (a device like a graphics card or network card), it may contain multiple
functions (up to 8). Each of these functions has its own configuration space.
• If it’s a switch, the core contains packet routing logic and an internal bus for routing
packets to their destinations.
• If it’s a root complex, which interfaces the CPU to the PCIe devices, the root core
typically includes a virtual PCI bus 0, where embedded devices and virtual bridges
reside.
1. Transaction Layer
• It creates Transaction Layer Packets (TLPs) for transmission and decodes them upon receipt.
• It is also responsible for:
o Flow Control: Ensuring that data is sent at a rate the receiver can handle.
o Transaction Ordering: Keeping track of the correct order in which data should be processed.
Each of these functions ensures that the data flow between devices is efficient and properly
managed. The creation and processing of TLPs occur on both the transmit and receive sides of
the system.
2. Data Link Layer
The Data Link Layer ensures reliable communication between two directly connected devices:
• It handles Data Link Layer Packets (DLLPs) for error management, including the
creation and decoding of these packets.
• One of its primary responsibilities is error detection and correction using the Ack/Nak
protocol.
The Data Link Layer thus ensures that any errors in transmission are caught and corrected to
maintain data integrity.
3. Physical Layer
The Physical Layer is responsible for the actual transmission of data across the PCIe link. It
handles the conversion of data into a form that can be transmitted over the physical
connection, as well as the reception and conversion of incoming data into a usable format.
• Ordered-Set Packet Creation: On the transmit side, it creates Ordered-Sets, which are
groupings of bytes used for synchronization and control during transmission.
On the transmit side, the Physical Layer processes these packets through various steps:
• Byte Striping: Distributes the data across different lanes for parallel transmission.
• Scrambling: Randomizes the data to avoid patterns that could cause interference.
• 8b/10b Encoding (for Gen1/Gen2) or 128b/130b Encoding (for Gen3): Converts the
data into a format suitable for transmission.
• Serialization: Converts the data from parallel to serial form for transmission across the
link.
On the receive side, the Physical Layer performs the reverse operations:
• Clock and Data Recovery (CDR): Uses the incoming data stream to recover the clock
signal and ensure accurate data timing.
• Elastic Buffers: Buffer incoming data to handle any clock or data rate mismatches.
The Physical Layer also includes the Link Training and Status State Machine (LTSSM), which is
responsible for initializing and training the link to ensure it operates correctly. This process
adjusts settings like link speed and the number of active lanes to maximize data transfer
efficiency.
Imagine two PCIe devices (Device A and Device B) communicating. The layered architecture
ensures that:
• On the transmit side (TX) of Device A, the Transaction Layer creates TLPs, which then
pass through the Data Link Layer for error management, and finally, the Physical Layer
converts them into a form suitable for transmission over the PCIe link.
• On the receive side (RX) of Device B, the Physical Layer receives the serialized data,
decodes it, and passes it through the Data Link Layer for error checking, before finally
reaching the Transaction Layer, where the original data is reconstructed.
A common question is whether Switch Ports need to implement all these layers since they
primarily route packets between devices. The answer is yes—Switch Ports must implement the
full stack of layers, especially the Transaction Layer, because they need to inspect the packet
contents to determine how to route them. The Transaction Layer logic is essential for reading
packet headers, managing flow control, and maintaining transaction ordering across the PCIe
topology.
Each layer in the PCIe interface has a specific role, and it communicates with its corresponding
layer in the device at the other end of the PCIe Link. For example:
• The Transaction Layer in the transmitting device communicates with the Transaction
Layer in the receiving device.
• Similarly, the Data Link Layer and Physical Layer in the transmitter communicate with
their counterparts in the receiver.
This communication occurs through a process of packetization. The upper two layers
(Transaction Layer and Data Link Layer) organize data into packets, with each layer adding
specific information necessary for that layer’s function.
Here’s a simplified flow of how packets move through the layers in a PCIe interface:
o This packet contains details like the command type, the target address, and
other attributes (e.g., read/write operations).
o The packet is stored in a buffer, called a Virtual Channel, until it’s ready to be
passed down to the next layer.
o The Data Link Layer adds additional information for error checking.
o In the Physical Layer, the packet is encoded and transmitted across the PCIe
Link using all available lanes. Transmission is typically done using differential
signaling for robustness.
o On the receiving device, the Physical Layer decodes the incoming data from the
PCIe Link and checks for any errors.
o If no errors are found, the packet is forwarded up to the Data Link Layer.
o The Data Link Layer performs further error checks to ensure the packet is
intact. If no errors are detected, the packet is passed up to the Transaction
Layer.
o The Transaction Layer on the receiving side buffers the packet, checks for any
additional errors, and then disassembles the packet to retrieve the original
information (such as the command type, target address, etc.).
o Finally, the contents are delivered to the device core of the receiving device,
allowing the core logic to process the request (e.g., executing a command or
transferring data).
This layered communication structure ensures that each layer handles a specific function, and
by doing so, the process of transmitting and receiving data becomes modular and manageable.
Each layer performs its role (whether that’s adding/removing error checking, organizing data into
packets, or physically transmitting bits), and in doing so, the PCIe system can reliably transfer
data across devices.
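As a mental model only, the wrap-and-unwrap behavior of the layers can be sketched in a few lines of Python. The field names, framing strings, and the use of hash() as a stand-in CRC are all simplifications, not the real bit-level formats.

    # A toy model (strings and dicts, not real bit formats) of how each layer
    # wraps a packet on transmit and unwraps it on receive.
    def transaction_layer_tx(header, data):
        return {"header": header, "data": data}        # core TLP (+ optional ECRC)

    def data_link_layer_tx(tlp, seq):
        return {"seq": seq, "tlp": tlp, "lcrc": hash(str(tlp))}   # seq number + LCRC

    def physical_layer_tx(link_packet):
        return ["STP", link_packet, "END"]       # framing, then (conceptually) serialize

    def physical_layer_rx(frame):
        assert frame[0] == "STP" and frame[-1] == "END"   # find packet boundaries
        return frame[1]                                   # strip framing

    def data_link_layer_rx(pkt):
        assert pkt["lcrc"] == hash(str(pkt["tlp"]))       # link-level error check
        return pkt["tlp"]                                 # strip seq + LCRC

    tlp = transaction_layer_tx({"type": "MWr", "addr": 0x1000}, b"\x01\x02")
    wire = physical_layer_tx(data_link_layer_tx(tlp, seq=7))
    assert data_link_layer_rx(physical_layer_rx(wire)) == tlp   # original TLP recovered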
The Transaction Layer in PCIe is responsible for managing how devices communicate with each other by sending and receiving data. It handles two main tasks:
1. Sending Requests: A device initiates transactions by issuing requests (like "read" or "write" commands) to other devices connected to the PCIe system.
2. Receiving Responses: For some requests, the device that sent the request expects a
response. For example, if a device asks to read data, the device that has the data will
send a completion packet back, confirming the operation and returning the requested
data.
The transactions at this layer are communicated using Transaction Layer Packets (TLPs). TLPs handle various types of requests, which can be categorized into four groups: Memory, IO, Configuration, and Messages.
• The first three types (Memory, IO, and Configuration) were carried over from PCI and PCI-X, but Messages are specific to PCIe.
Table below lists the types of requests along with whether they are Posted or Non-Posted transactions:

Request Type                    Posted or Non-Posted
Memory Read                     Non-Posted
Memory Write                    Posted
IO Read / IO Write              Non-Posted
Configuration Read / Write      Non-Posted
Message                         Posted
• Non-Posted transactions require a completion:
o I/O Writes and Configuration Writes: Although they send data to the target, they still require a completion to confirm that the write was successful. This follows the split transaction protocol from PCI-X, where the request and completion occur in separate packets.
• Posted transactions do not:
o Memory Writes: The requester sends the write command and assumes it will be completed successfully without any explicit confirmation from the target device.
o Messages: These are control or event signaling packets that also do not require a completion packet.
Although posted transactions don’t receive a completion packet, they still use the Ack/Nak
protocol at the Data Link Layer to ensure reliable delivery. For example, even in a posted write,
the Data Link Layer will acknowledge that the packet was successfully transmitted. This is
important because it ensures data integrity without the performance overhead of completion
packets.
TLP Basics
TLPs originate in the Transaction Layer of the transmitter and terminate in the Transaction
Layer of the receiver. As the packet moves through the PCIe layers, the Data Link Layer and
Physical Layer add information to the TLP, ensuring reliable transmission and error-checking.
• The Data Link Layer adds error-checking data to ensure the packet is transmitted
correctly across the link.
• The Physical Layer handles the actual transmission of the packet over the physical
connection between devices.
Upon receiving the packet, the same layers at the receiving end verify that the data was
transmitted correctly, and the Transaction Layer at the receiver processes the packet based on
its content (e.g., memory operations or configuration commands).
1. Transaction Layer Builds the Core of the TLP
• The Transaction Layer starts the process by creating the core part of the TLP. This part includes important information, such as a header, and in some cases, data (like when writing to memory).
writing to memory).
• Header: Every TLP has a header, which acts like an address label that tells the system
where the packet is going and what type of request it is (for example, a memory read or
write).
• Data: Some packets, like a write request, will have data that needs to be transferred.
But other packets, like a read request, won’t include data—they just contain
instructions.
• Optional ECRC (End-to-End Cyclic Redundancy Check): The Transaction Layer can
also add an ECRC at the end of the packet. This is a type of error detection code that
ensures no errors occurred during transmission between the sender and receiver.
• The packet, along with the ECRC (if used), is then passed down to the next layer, the
Data Link Layer.
2. Data Link Layer Adds Sequence Number and CRC (Link-Level Error Detection)
• The Data Link Layer is responsible for ensuring the packet is transmitted error-free
across the immediate PCIe link between two devices.
• Sequence Number: The Data Link Layer adds a sequence number to the packet, which
helps the receiving device keep track of the order of packets.
• LCRC (Link Cyclic Redundancy Check): The Data Link Layer also adds another error-
checking code called the LCRC. This code allows the receiving device to check if any
errors occurred during transmission over that specific link. If there’s an error, the
receiver notifies the sender, and the packet is retransmitted.
You might wonder, if we already have LCRC for error checking, why do we need ECRC?
• LCRC only checks for errors on the link between two neighboring devices.
• ECRC, on the other hand, checks for errors across the entire path from the sender to
the final destination. This is useful because errors could happen inside the devices (like
switches or ports) that route the packet, and LCRC won’t catch those.
By having both, we make sure errors are caught at different stages, providing better protection.
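A small sketch makes the division of labor concrete. Here zlib.crc32 is only a stand-in for the actual CRC polynomials PCIe defines; the point is that the LCRC is stripped and recomputed at every hop, while the ECRC rides unchanged to the final destination.

    import zlib   # zlib.crc32 stands in for PCIe's actual CRC polynomials

    def add_ecrc(tlp):
        # Computed once by the sender; carried unchanged end to end.
        return tlp + zlib.crc32(tlp).to_bytes(4, "little")

    def add_lcrc(payload):
        # Recomputed for every hop (requester->switch, switch->completer, ...).
        return payload + zlib.crc32(payload).to_bytes(4, "little")

    def check_and_strip_lcrc(frame):
        payload, lcrc = frame[:-4], frame[-4:]
        assert zlib.crc32(payload).to_bytes(4, "little") == lcrc
        return payload          # LCRC removed at each hop; ECRC stays attached

    tlp = add_ecrc(b"MRd header + data")
    hop1 = check_and_strip_lcrc(add_lcrc(tlp))    # requester -> switch
    hop2 = check_and_strip_lcrc(add_lcrc(hop1))   # switch -> completer
    assert hop2 == tlp    # ECRC still intact for the final target to verify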
3. Physical Layer Prepares the Packet for Transmission
• The Physical Layer is responsible for converting the packet into a format that can be transmitted over the physical connection (the wires) between devices.
• Control Characters (in PCIe Gen 1 and 2): In earlier versions of PCIe (Generations 1
and 2), the Physical Layer added special control characters at the beginning and end of
the packet. These characters told the receiver how to handle the packet.
• New Encoding for PCIe Gen 3 and Beyond: In PCIe Generation 3, the control
characters were replaced with a different encoding method that packs more information
into each transmission, improving efficiency.
• The Physical Layer also encodes the packet and sends it over the PCIe link using
differential transmission (which is a way of sending signals to reduce noise and errors).
Receiving a TLP
• When the receiving device sees the incoming TLP packet, the Physical Layer is the first to handle it.
• The Physical Layer checks for special characters (like Start and End characters) that
were added during transmission. These characters tell the receiver where the packet
begins and ends.
• After verifying these control characters, the Physical Layer removes them and forwards
the rest of the packet to the next layer, the Data Link Layer.
• The Data Link Layer is responsible for ensuring the packet arrived without errors. It checks two things:
1. LCRC (Link Cyclic Redundancy Check): This checks for transmission errors between the neighboring devices (i.e., on the PCIe link).
2. Sequence Number: This confirms that no TLPs were lost or received out of order.
• If there are no errors, the Data Link Layer removes the LCRC and sequence number
fields (which were added for error detection) and passes the remaining packet to the
Transaction Layer.
• The Transaction Layer now works with the core part of the packet (the header, data,
and possibly the ECRC field). If the receiving device is a Switch, it will check the header
to see where the packet should go.
• Switches: A switch in the PCIe system is like a traffic controller. It looks at the header of
the TLP to see where the packet is headed (which device or port). The switch doesn't
modify the ECRC (End-to-End Cyclic Redundancy Check), but it can check it for errors
and report them if needed.
• If the receiving device is the target (the packet's final destination), it checks the ECRC
(if it's enabled to do so) to ensure there were no errors in the packet's journey across all
the links and devices.
• Once the ECRC (if present) is checked and there are no errors, the Transaction Layer
removes the ECRC field.
• What’s left is the header and the data (if any), which are then passed up to the Software
Layer of the device. The Software Layer is where the device processes the request or
data that was originally sent.
A non-posted transaction means that when a request is sent, the sender waits for a response.
This is commonly used in read operations, where a device requests data from memory and
expects a response with that data.
1. Requesting Data:
o An Endpoint (a device like a GPU or network card) wants to read data from the system memory.
o The request travels through the PCIe system, passing through Switches that help route the packet to the correct destination.
2. Completing the Request:
o In this example, the Root Complex (which is part of the CPU) recognizes that the request is for system memory.
o The Root Complex reads the memory at the requested address and gathers the data to be sent back to the Endpoint.
3. Returning the Data:
o The Root Complex sends the data back to the Endpoint in Completion Data (CplD) packets.
When the Endpoint made the request, it included its return address in the packet. This address
is a combination of three numbers:
• Bus number
• Device number
• Function number
These three together form the BDF (Bus, Device, Function), which tells the system where to
send the completion packets.
Additionally, the request also had a Tag. Each request is given a unique Tag to help the Endpoint
match the incoming completion packets with the correct request, especially when multiple
requests are being handled simultaneously.
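A rough sketch of the bookkeeping on the Requester side might look like this; the tag values, addresses, and the completer's ID string are all invented for illustration.

    outstanding = {}   # tag -> address of the read still in flight

    def send_read(tag, addr):
        """Remember the request; the tag travels in the request TLP."""
        outstanding[tag] = addr

    def on_completion(completer_id, tag, data):
        # The completer echoed our tag (and BDF) in the CplD header, so we
        # can pair this completion with the right outstanding request.
        addr = outstanding.pop(tag)
        print(f"read of {hex(addr)} (tag {tag}) completed by {completer_id}: {data!r}")

    send_read(tag=1, addr=0x9000_0000)
    send_read(tag=2, addr=0x9000_1000)     # several reads in flight at once
    on_completion("bus 0, dev 0, fn 0", tag=2, data=b"\xca\xfe")
    on_completion("bus 0, dev 0, fn 0", tag=1, data=b"\xbe\xef")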
Handling Errors
If an error occurs, the Completer (the device responding to the request) can indicate this by
setting specific bits in the completion status field of the packet. This lets the Endpoint know
something went wrong, though how to handle such errors depends on the software, not the
PCIe specification.
Locked Memory Reads are used in special cases where a processor needs to ensure that no
other device can access or modify a specific piece of memory while it is performing a critical
operation. This is mainly used for operations like Atomic Read-Modify-Write, which are
essential in tasks such as managing a semaphore (a variable that controls access to a shared
resource).
1. Atomic Operations:
o Imagine you have a semaphore (a variable that prevents multiple devices from
using the same resource at the same time).
o When a processor checks the semaphore (like testing if a resource is free) and
decides to change its value (like marking the resource as "in use"), no other
processor should be able to modify the semaphore during this process.
o To ensure this, the processor "locks" the memory location containing the
semaphore. This prevents other devices from accessing or modifying it until the
lock is released.
2. Race Conditions:
o Without locking, two processors could try to modify the semaphore at the same time, leading to a race condition, which could cause unpredictable results (the sketch at the end of this section illustrates the problem).
o The lock ensures that one processor completes its operation before another one
can access the same memory.
3. How a Locked Read Travels:
o The processor issues a locked memory read request (MRdLk) for the target location.
o This request travels through the PCIe system, moving through Switches and other routing devices, eventually reaching the memory or device where the data is stored.
o As the request passes through each routing device (like switches), the egress
port (the port where packets leave the switch) gets locked.
o This means no other packets can pass through that port until the locked
transaction is completed.
o When the memory device receives the locked request, it fetches the data and
sends it back in a Locked Completion (CplDLk) packet.
o The completion packet travels back to the original requester (the CPU), and the
ports unlock as the packet passes through them.
4. Handling Errors:
o If something goes wrong, like if the data cannot be fetched, the device sends a
Locked Completion without data, indicating an error.
o The status field in this completion packet will tell the requester (CPU) what
went wrong, and the lock will be cancelled. After that, the software will need to
decide how to handle the error (for example, by retrying the request or taking
some other action).
• Locked Reads are mainly a legacy feature carried over from older systems (like PCI)
because they were once used for processor buses.
• New PCIe devices are not required to support locked requests unless they are Legacy
Devices that self-identify as such. So, only certain devices like CPUs or root ports can
initiate locked transactions in modern PCIe systems.
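To see why atomicity matters, here is a deliberately unsafe test-then-set in plain Python. This is ordinary software standing in for bus-level behavior, not PCIe traffic.

    # A deliberately unsafe test-then-set: between the read and the write,
    # another processor could perform the same read and also see "free".
    memory = {"semaphore": 0}     # 0 = resource free, 1 = in use

    def try_acquire():
        old = memory["semaphore"]          # read
        # <-- without a lock, a second CPU can run here and also read 0
        if old == 0:
            memory["semaphore"] = 1        # modify-write
            return True                    # both CPUs could "succeed"
        return False

A locked read-modify-write closes that window: the location is held exclusively for the whole read-and-write, so only one processor can win.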
In computer systems, I/O (Input/Output) write transactions are a way for a processor to send data to a specific device (called an Endpoint). These transactions follow specific rules to ensure the data is delivered and confirmed:
1. Sending the Data: The processor issues an I/O Write request carrying the data and the target address.
2. Routing the Request: The packet is forwarded through the fabric until it reaches the target Endpoint.
3. Acknowledgement of Data:
Once the target device (called the Completer) receives the data, it sends back a
confirmation message (called a completion packet) to the processor.
o This confirmation does not contain any data—just a status field indicating
whether everything went okay or if there was an error.
4. Error Handling:
If an error occurred, the processor's software is responsible for fixing it.
o This is the key difference between a non-posted write and other types of writes
(like memory writes):
▪ Memory write: The processor doesn’t wait—it assumes the data will
eventually get there.
6. Why Wait?
Waiting ensures that the next operation (depending on the successful delivery of the
data) doesn’t happen too soon, avoiding errors.
7. Processor-Exclusive Writes:
Only the processor can initiate non-posted writes because it’s closely involved in
coordinating these critical steps.
In computer systems, posted writes are a fast and efficient way to send data, typically used for
memory operations. Let’s break it down:
1. No Waiting for Confirmation:
o When a device (called the Requester) sends data to memory or another device, it doesn't wait for a confirmation message (unlike non-posted writes).
o This saves time and bandwidth, making the system faster and more efficient.
2. How It Works:
o The Requester sends the data along with the memory address (this is called the
Memory Write Request or MWr).
o The data packet travels through the system, being forwarded by Switches until it
reaches the destination (called the Completer).
o Once a Switch successfully sends the data to the next step, it considers its job
done and is ready to handle the next transaction.
3. Completion:
o The transaction is considered finished for the Requester as soon as the data is sent.
o The Completer eventually receives the data and stores it in memory, officially completing the process.
4. Error Handling:
o Since the Requester doesn't wait for a response, it won't know if something goes wrong during the transaction.
o If there's an error, the Completer might log it and send a Message to the Root Complex (a central controller in PCIe) to alert the system software.
• Advantage: The links between devices are freed up sooner for other transactions.
• Disadvantage: Errors aren’t directly reported to the Requester, which could make
troubleshooting more complex.
Think of a posted write like mailing a letter:
• How it works:
o You hand the letter to the mail carrier (the Requester sends the write).
o Once the mail carrier takes it, you assume it will reach its destination (Requester doesn't wait for feedback).
o The recipient eventually receives the letter (Completer finishes the transaction).
• Pros: It’s fast and efficient because you don’t wait for confirmation.
• Cons: If the letter gets lost, you won’t know unless someone informs you later.
In addition to memory writes, messages are special transactions in PCIe used to communicate system events like errors or power management. These messages are flexible:
• Purpose: Messages replace the older "side-band signals" (extra wires for communication) by using the regular data paths, simplifying the design and reducing hardware complexity.
Quality of Service (QoS) in PCIe ensures that time-sensitive data, like video or audio streams,
is delivered on time while still supporting less urgent data, like file transfers. Here's how it works
in simple terms:
The Problem:
Imagine a video camera and a file transfer device (like a hard drive) both need to send data to
your computer's memory (DRAM).
• Video Camera Data: Needs to arrive on time, or the video will become choppy or lose
frames.
• File Transfer Data: Doesn’t care much about timing—it just needs to arrive without
errors.
QoS ensures that the video data gets priority so it arrives on time, even when the system is busy.
How QoS Works:
1. Traffic Classes (TC):
o Each packet of data is given a priority level by the software using a 3-bit field called the Traffic Class (TC). (A sketch after this list shows the basic idea.)
2. Virtual Channel Buffers:
o Each port in the system has multiple "lanes" or buffers to handle different priority packets.
o Packets are placed into the right buffer based on their Traffic Class.
3. Arbitration Logic:
o When there are multiple packets ready to be sent, the system uses rules
(arbitration logic) to decide which packet to send first based on priority.
4. Port Arbitration:
o In addition to managing buffers, the system also decides which input port gets
access to an output port when multiple ports compete for resources.
5. Guaranteed Service:
o With all these mechanisms in place, the system can guarantee that high-priority
data (like video) gets enough bandwidth and low latency, ensuring smooth
performance.
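One simple way to picture the arbitration is a strict-priority queue keyed on Traffic Class. Real PCIe arbitration policies are configurable and more nuanced; this Python sketch only shows the basic idea that a higher TC can be drained first.

    import heapq

    queue = []   # entries: (-traffic_class, arrival_order, packet)
    order = 0

    def enqueue(tc, packet):
        """Queue a packet with its Traffic Class; higher TC drains first."""
        global order
        heapq.heappush(queue, (-tc, order, packet))
        order += 1

    enqueue(0, "SCSI backup block")     # TC0: best-effort bulk data
    enqueue(7, "video frame")           # TC7: time-sensitive
    enqueue(0, "SCSI backup block 2")
    while queue:
        _, _, packet = heapq.heappop(queue)
        print("send:", packet)          # the video frame goes out first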
• Scenario: A video camera streams into memory while a SCSI disk runs a backup over the same links.
o If the video doesn't get enough bandwidth, frames are dropped, and the video looks choppy.
• Solution:
o The video packets are given a higher priority (a higher Traffic Class).
o The system ensures the camera gets enough bandwidth and fast delivery, while the SCSI data waits if needed.
o The video stays smooth, and the backup still completes—just a bit slower.
Benefits of QoS:
• Efficiency: Allows less urgent data to be handled without interfering with time-sensitive
tasks.
Transaction Ordering:
1. Within the same Traffic Class:
o Packets within the same VC (lane for data) always follow the order they arrived in unless specific "relaxed ordering" rules apply.
2. Across different Traffic Classes:
o Packets with different TCs may not follow the same rules because they don't share an ordering relationship (they're treated independently).
3. Why It Matters:
o For example, if a video packet (high-priority) is sent after a regular data packet,
the video packet might still be processed first due to its priority, but the system
ensures both packets reach their destinations without confusion.
A traffic analogy:
• Same Lane: Cars in the same lane follow the "first come, first serve" rule (strict ordering).
• Different Lanes: Cars in separate lanes don’t affect each other’s order—they can move
independently (no ordering relationship).
Flow Control
How It Works:
1. Receiver Buffers:
o The receiver has buffers (temporary storage) to hold incoming data packets (Transaction Layer Packets, or TLPs).
o These buffers can fill up if too much data arrives too quickly.
2. Credit Updates:
o The receiver constantly updates the transmitter about how much buffer space is available using Data Link Layer Packets (DLLPs).
o The transmitter keeps track of this and only sends data when there's enough space (the sketch after the analogy below shows this credit loop).
3. Guaranteed Updates:
o Unlike normal data packets (TLPs), DLLPs are small and can always be sent, even if the receiver's buffers are full. This ensures updates about available space are never delayed.
4. Automatic Management:
o All of this bookkeeping happens in hardware, without software needing to intervene.
A restaurant analogy:
• Transmitter: The waiter only sends orders the kitchen can accept.
• Receiver: The kitchen has a limited number of plates (buffers) to prepare orders.
• Flow Control: The kitchen updates the waiter when it has room for more orders (DLLPs).
• Result: Orders are sent only when there's space, avoiding chaos in the kitchen.
• Transaction Ordering: Prevents data from arriving out of order, maintaining system
reliability.
• Flow Control: Ensures smooth data flow without overloading parts of the system.
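As promised above, here is a minimal sketch of credit-based flow control, assuming a single buffer pool and made-up credit units (real PCIe tracks separate header and data credit types per virtual channel).

    # Made-up credit units; real PCIe tracks several credit types per VC.
    class Transmitter:
        def __init__(self, credits):
            self.credits = credits           # buffer space the receiver advertised

        def send(self, size):
            if size > self.credits:
                return False                 # would overflow the receiver: wait
            self.credits -= size             # spend credits on transmission
            return True

        def on_update_fc(self, freed):
            self.credits += freed            # receiver drained data, returns credits

    tx = Transmitter(credits=4)
    assert tx.send(3)                        # fits in the advertised space
    assert not tx.send(2)                    # no room left: transmission stalls
    tx.on_update_fc(3)                       # an update DLLP returns credits
    assert tx.send(2)                        # now it fits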
The Data Link Layer in PCI Express (PCIe) serves as the middle manager between the physical
layer (hardware-level signals) and the transaction layer (high-level data handling). Its primary
responsibilities include error correction, flow control, and power management.
o What’s a TLP?
Transaction Layer Packets (TLPs) carry the actual data for operations like
memory reads or writes.
▪ The sender replays the TLP from its Replay Buffer until it gets an Ack
(Acknowledgement) confirming successful reception.
2. Flow Control:
o Flow control ensures the sender does not overwhelm the receiver with too much
data.
o The receiver communicates its buffer availability through small DLLPs (Data
Link Layer Packets).
o These DLLPs act like “tickets” that tell the sender, “You’re allowed to send more
data now.”
3. Power Management:
o The Data Link Layer also manages power efficiency by communicating power
state changes through DLLPs.
• Purpose: DLLPs are special, small packets used for control purposes like
acknowledgements (Ack/Nak) and buffer space updates (flow control).
• Characteristics:
o Travel Scope: They only move between neighboring devices (e.g., from a switch
to an endpoint) and don’t propagate through the entire network.
o Assembly:
▪ The Data Link Layer builds the DLLP content and appends a CRC; the Physical Layer then adds framing and transmits it.
o Disassembly:
▪ At the receiver, the physical layer removes framing info and sends the
DLLP to the Data Link Layer.
▪ The Data Link Layer verifies the CRC for errors and takes appropriate
action (e.g., update flow control or retry TLPs).
This protocol ensures reliable communication by allowing the sender to retry in case of errors.
1. Sending TLPs:
o The sender keeps a copy of each outgoing TLP in its Replay Buffer until it
receives an Ack DLLP confirming successful delivery.
2. Receiving TLPs:
o If the receiver detects no errors, it sends an Ack DLLP back to the sender, which
then deletes the TLP from its Replay Buffer.
o If an error is detected, the receiver sends a Nak DLLP, prompting the sender to
resend the TLPs from the Replay Buffer.
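The replay mechanism can be sketched like this. The sequence numbering and the "Ack covers everything up to this number" behavior follow the description above; the packet names and the simplified Nak handling (replaying everything still unacknowledged) are illustrative.

    replay_buffer = {}    # seq -> copy of the TLP, kept until acknowledged
    next_seq = 0

    def send_tlp(tlp):
        """Keep a copy in the Replay Buffer before transmitting."""
        global next_seq
        replay_buffer[next_seq] = tlp
        next_seq += 1

    def on_ack(seq):
        # Delivery confirmed up to this sequence number: discard those copies.
        for s in [s for s in replay_buffer if s <= seq]:
            del replay_buffer[s]

    def on_nak(seq):
        # Receiver reported an error: resend what is still unacknowledged.
        for s in sorted(replay_buffer):
            print("replaying", replay_buffer[s])

    send_tlp("MRd #0")
    send_tlp("MWr #1")
    on_ack(0)      # "MRd #0" leaves the Replay Buffer
    on_nak(1)      # "MWr #1" is replayed from the Replay Buffer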
A DLLP (Data Link Layer Packet) is a small packet of data used for communication within the
Data Link Layer of the PCIe architecture. It performs important management tasks for the link
between two neighboring devices.
Basic Structure:
• 4-Byte Core: The first byte identifies the type of DLLP, and the remaining bytes carry additional information that depends on that type.
• 2-Byte CRC: This is a "Cyclic Redundancy Check" value added to the packet for error
detection.
Let’s go step-by-step through the process of how a memory read (MRd) request flows through a
system with a Requester, a Switch, and a Completer (the device holding the requested data).
Step 1a:
o The Requester's Data Link Layer appends a sequence number and LCRC to the outgoing Memory Read (MRd) TLP.
o It saves a copy of this request in its Replay Buffer (a memory area for retransmissions if needed).
o The request is sent to the Switch, which checks the packet for errors using the LCRC (Link Cyclic Redundancy Check) and the sequence number.
Step 1b:
o Finding no errors, the Switch returns an Ack DLLP to the Requester.
o On receiving this Ack, the Requester deletes the saved copy of the request from its Replay Buffer.
Step 2a:
o The Switch uses the memory address in the request to route the TLP to the
correct output port (Egress Port).
o It saves a copy of the TLP in the Egress Port’s Replay Buffer for possible
retransmission.
Step 2b:
o If no errors are found, the Completer sends an Ack DLLP back to the Switch.
o The Switch then removes the TLP copy from its Replay Buffer.
Step 3a:
o The Completer reads the requested data from its memory.
o It creates a Completion with Data TLP (CplD) containing the data and saves a copy in its Replay Buffer.
Step 3b:
o If no errors are found, the Switch sends an Ack DLLP to the Completer, which
then deletes the saved copy of the CplD from its Replay Buffer.
Step 4a:
o The Switch uses the Requester ID in the CplD to route the packet to the correct
Egress Port.
Step 4b:
o If no errors are found, the Requester sends an Ack DLLP back to the Switch.
o The Switch deletes its saved copy of the CplD from the Replay Buffer.
o The Requester checks the optional ECRC (End-to-End CRC) for additional error
detection. If no issues are found, the data is passed to the core logic for further
processing.
• Ack/Nak Protocol:
o An Ack DLLP confirms successful receipt, and the sender deletes the saved copy of the packet.
• Replay Mechanism:
o Errors are often caused by transient issues and can usually be corrected through
retransmissions.
1. Flow Control:
o The Data Link Layer manages how data flows between devices, ensuring smooth
communication.
2. Power Management:
o DLLPs are used to manage link and system power states, helping conserve
energy.
o For example, the devices might negotiate to enter low-power states during
periods of inactivity.
The Physical Layer is the foundation of the PCIe architecture. Its job is to handle the physical
transmission of data over the link between devices. It consists of two main parts:
1. Logical Physical Layer: Deals with digital logic to prepare packets for transmission and
to process incoming packets.
2. Electrical Physical Layer: Handles the actual analog signaling between devices over
the PCIe lanes.
Data from higher layers (TLPs and DLLPs) is passed to the Physical Layer for actual
transmission. Here’s how it works:
• TLPs (Transaction Layer Packets) and DLLPs (Data Link Layer Packets) are first placed
into a buffer in the Physical Layer.
• Framing characters (Start and End characters) are added to each packet to mark its
boundaries. These help the receiver detect the beginning and end of the packet during
transmission.
PCIe uses multiple lanes to transmit data simultaneously, like a multi-lane highway. Each byte
of data is:
• Striped: Split into chunks and sent across the available lanes. Each lane operates as an
independent serial path.
• At the receiver end, the bytes are reassembled into their original order.
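Byte striping is easy to picture in code. This Python sketch round-robins bytes across lanes and reassembles them at the other end; it ignores real-world details like link training, framing, and lane de-skew.

    def stripe(data, lanes):
        """Fan consecutive bytes out across the lanes, round-robin."""
        return [data[i::lanes] for i in range(lanes)]

    def unstripe(per_lane):
        """Interleave one byte from each lane back into the original order."""
        out = bytearray()
        for group in zip(*per_lane):
            out.extend(group)
        return bytes(out)

    payload = bytes(range(8))               # 8 bytes sent over a x4 link
    lanes = stripe(payload, 4)              # lane 0: 0,4  lane 1: 1,5  ...
    assert unstripe(lanes) == payload       # receiver restores the order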
Encoding is the process of converting data into a format suitable for transmission.
• 8b/10b Encoding (Gen1 and Gen2):
o Each 8-bit byte is converted into a 10-bit symbol.
o This adds some overhead but provides benefits like error detection and ensuring enough transitions for clock synchronization.
• 128b/130b Encoding (Gen3 and later):
o Uses a more efficient encoding scheme where 128 bits of data are packed into 130 bits.
o This reduces overhead compared to 8b/10b and increases efficiency for higher speeds.
• After encoding, the data is serialized: converted into a continuous stream of bits. Each lane then transmits at the generation's signaling rate:
o Gen1: 2.5 GT/s
o Gen2: 5 GT/s
o Gen3: 8 GT/s
On the receive side, the Physical Layer reverses each step:
1. Deserialization:
o The serial bit stream is converted back into a parallel stream (bytes) using a deserializer.
2. Elastic Buffer:
o The data passes through an elastic buffer, which compensates for small timing
differences between the sender and receiver clocks.
3. Decoding:
o For Gen1 and Gen2, the 10-bit symbols are decoded back into 8-bit characters
using an 8b/10b decoder.
4. Descrambling:
o The scrambling applied at the transmitter is reversed to recover the original data.
5. Byte Unstriping:
o Bytes received across multiple lanes are reassembled into a single, ordered data
stream.
6. Delivering Data:
o The final reconstructed data is sent up to the Data Link Layer for further
processing.
• Differential Signaling:
o Each lane uses a pair of wires (one positive, one negative) to transmit data. This
helps reduce noise and improves signal integrity.
• Analog Components:
o The transmit and receive circuits that drive and sense these differential signals on each lane.
Key concepts recap:
1. Framing Characters
2. Byte Striping
3. 8b/10b Encoding
4. 128b/130b Encoding
5. Elastic Buffer
The Link Training and Initialization process is like setting up a handshake between two PCIe
devices before they start exchanging data. It ensures that both devices agree on how they will
communicate, such as the number of lanes, speed, and how the physical connection is
configured. This process is fully automatic and involves several key steps.
1. Link Width Negotiation
o PCIe links can consist of 1 to 32 lanes, where each lane can transmit data independently.
o During initialization, the devices negotiate how many lanes will be used for the connection (e.g., x1, x4, x8, x16).
2. Link Speed Negotiation
o PCIe supports multiple speeds depending on the generation (e.g., Gen1: 2.5 GT/s, Gen2: 5 GT/s, Gen3: 8 GT/s).
o The training process determines the fastest speed both devices can support reliably (a small sketch after this list shows the negotiated outcome).
3. Lane Reversal
• If lanes are connected in reverse order (e.g., Lane 1 is connected to Lane 4), the training process detects and corrects it automatically.
4. Polarity Inversion
• If the positive and negative signals of a lane are swapped during connection, the training
process adjusts for it.
5. Bit Lock
• The receiver synchronizes with the transmitter's clock to accurately recover the data being sent.
6. Symbol Lock
• The receiver identifies patterns in the data stream (symbols) to determine how data is
organized.
7. Lane-to-Lane De-skew
• For multi-lane links, data traveling through different lanes may arrive slightly out of sync
due to variations in distance or signal timing. De-skewing realigns this data to ensure
everything is synchronized.
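As mentioned above, the end result of the width and speed negotiation can be summarized in a tiny sketch: both sides settle on the widest lane count and fastest rate they have in common. This models only the outcome; the real LTSSM reaches it through a sequence of training states.

    def train(dev_a, dev_b):
        """Settle on the widest common width and fastest common rate."""
        return {
            "lanes": min(dev_a["lanes"], dev_b["lanes"]),
            "gt_per_s": min(dev_a["max_gt_per_s"], dev_b["max_gt_per_s"]),
        }

    root_port = {"lanes": 16, "max_gt_per_s": 8.0}   # Gen3-capable x16 port
    endpoint = {"lanes": 4, "max_gt_per_s": 5.0}     # Gen2-capable x4 card
    print(train(root_port, endpoint))                # {'lanes': 4, 'gt_per_s': 5.0}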
AC-Coupled Connections
• What is AC-Coupling?
o The connection between the transmitter and receiver uses a capacitor in the
signal path.
o This blocks low-frequency (DC) signals and allows high-frequency (AC) signals
to pass through.
• Why AC-Coupling?
o It allows the transmitter and receiver to have different reference voltages. For
example, this is useful when the devices are far apart or in different
environments.
Impedance Matching
• Signal traces are designed to a controlled impedance so that reflections are minimized:
o 100 Ohms: Differential impedance (between the positive and negative signal lines).
Ordered Sets are special patterns of characters used by the Physical Layer for specific purposes. They are not traditional data packets and do not have Start or End characters. They are used for:
1. Link Training: training sequences exchanged while the link is brought up.
2. Clock Compensation: inserted periodically so the receiver can adjust for small clock-rate differences.
3. Power Management: signaling link power-state transitions.
Format:
• In Gen1 and Gen2, an Ordered Set begins with a special COM character followed by three or more additional characters.
• In Gen3 and later, the format of Ordered Sets changes to accommodate faster speeds and new requirements.
The Link Training process is crucial to ensure the PCIe connection works efficiently and reliably:
• It automatically negotiates the best lane width and speed for the link.
• It handles signal alignment issues, such as polarity inversion, lane reversal, and de-
skewing.
• Ordered Sets play a vital role in this process, ensuring synchronization, clock
adjustments, and power management.
Without proper Link Training and Initialization, communication between PCIe devices would be
error-prone or might not work at all. This process ensures that the physical link is stable,
optimized, and ready for data transmission.
Memory Read Request Phase
This is the first phase where a device (the Requester) sends a request to another device (the
Completer) to read some data from memory.
Step-by-step Process:
• Requester’s Device Core or Software Layer: This part prepares the request, including
the address of the memory to be read, the transaction type (what kind of request it is),
the data size (how much data to read), and additional information like traffic class and
byte enables (to define which bytes of the data are important).
• Transaction Layer: The Transaction Layer then builds a Memory Read Request (MRd) Transaction Layer Packet (TLP) using this information. This packet can be 3 or 4 Double Words (DW) long, depending on whether the address is 32-bit or 64-bit (a small sketch after this phase shows how that choice is made). It also includes the Requester ID (a unique identifier for the device making the request).
• Flow Control: Before the packet is sent, the Flow Control Logic ensures that there’s
enough space at the destination device to receive the packet. Only when there’s enough
room, the packet proceeds to the next layer.
• Data Link Layer: Here, the TLP gets a Sequence Number and a LCRC (a kind of
checksum to check for errors). A copy of the TLP with these added is stored in the
Replay Buffer.
• Physical Layer: In the Physical Layer, the TLP is prepared for transmission. It’s
converted into serial data (so it can be sent over the physical link), scrambled to prevent
interference, and encoded (using 8b/10b encoding) before being sent across the link.
• Receiver Side (Completer): The Completer receives the data, de-serializes it (turns the
serial data back into parallel form), and passes it through an elastic buffer (to deal with
timing differences between the devices). After decoding the data and removing the
start/end markers, it sends the TLP to its own Data Link Layer.
• Data Link Layer (Completer): This layer checks if there are any errors in the received
packet. If everything is okay, the Data Link Layer creates an Acknowledgment (Ack)
with the same Sequence Number and sends it back to the Requester, confirming the
packet was received.
• Requester’s Data Link Layer: Upon receiving the Ack, the Requester’s Data Link Layer
checks that the CRC (error check) is valid. If it is, the TLP is removed from the Replay
Buffer, meaning the request has been successfully acknowledged.
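As promised above, here is a tiny sketch of the 3 DW vs. 4 DW header choice. It is a simplification: real header formation sets many more fields than just the address width.

    def mrd_header_dwords(addr):
        """A 32-bit address fits the 3 DW header; 64-bit needs the 4 DW form."""
        return 3 if addr < 2**32 else 4

    print(mrd_header_dwords(0x8000_0000))       # 3 (32-bit address)
    print(mrd_header_dwords(0x1_0000_0000))     # 4 (64-bit address)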
Completion with Data Phase
This phase happens after the memory read request has been processed by the Completer and it's time to send the requested data back to the Requester.
Step-by-step Process:
• Transaction Layer (Completer): The Completer builds a Completion with Data (CplD) TLP containing the requested data, the original Requester ID (so the fabric knows where to send the data back to), and additional information like the transaction type and status of the request. This TLP is then sent to the Data Link Layer.
• Flow Control: Just like in the request phase, the Flow Control logic ensures there is
space to send the completion packet before it’s transmitted.
• Data Link Layer (Completer): The Data Link Layer adds a Sequence Number and
LCRC to the TLP, stores a copy in the Replay Buffer, and then sends it to the Physical
Layer.
• Physical Layer (Completer): The packet goes through the same process as in the
request phase—adding Start and End characters, scrambling the data, encoding it, and
then serializing it for transmission.
• Requester Side: The Requester receives the CplD TLP on the Physical Layer, de-
serializes it, decodes it, and removes the Start/End characters. It then sends the data to
the Data Link Layer.
• Data Link Layer (Requester): The Data Link Layer checks for errors in the received
packet. If everything is okay, it sends an Ack DLLP back to the Completer to confirm the
packet was received.
• Completion Confirmation: If the Ack is valid, the Requester processes the completion
data and forwards it to its Software Layer, completing the process. If there were errors,
the Requester may ask for the data to be resent.
Key Points:
• Transaction Layer: This layer is responsible for creating the memory read request and
the completion response, using TLPs to send data between devices.
• Data Link Layer: Adds extra reliability to the process by adding Sequence Numbers
and CRC checks to the packets, and ensuring proper flow control.
• Physical Layer: Handles the actual transmission of data over the physical link,
including converting data to a form suitable for transmission, such as serializing and
encoding it.
This process, from Memory Read Request to Completion with Data, ensures that devices in a
PCIe system can communicate efficiently and reliably, even over high-speed links.