HyperTransport Technology: The I/O Bandwidth Problem
INTRODUCTION
HyperTransport technology is a very fast, low-latency, point-to-point link used for interconnecting integrated circuits on a board. HyperTransport, previously codenamed Lightning Data Transport (LDT), provides the bandwidth and flexibility critical for today's networking and computing platforms while retaining the fundamental programming model of PCI. HyperTransport was invented by AMD and refined with the help of several partners throughout the industry.
While microprocessor performance continues to double roughly every eighteen months, the performance of the I/O bus architecture has lagged, doubling approximately every three years. This I/O bottleneck constrains system performance, so that delivered performance falls short of the processor's potential. Over the past 20 years, a number of legacy buses, such as ISA, VL-Bus, AGP, LPC, PCI-32/33, and PCI-X, have emerged and must be bridged together to support a varying array of devices. Servers and workstations require multiple high-speed buses, including PCI-64/66, AGP Pro, and SNA buses like InfiniBand. This hodge-podge of buses increases system complexity, adds many transistors devoted to bus arbitration and bridge logic, and still delivers less than optimal performance.
A number of new technologies are responsible for the increasing demand for additional bandwidth.
High-resolution, texture-mapped 3D graphics and high-definition streaming video are escalating
bandwidth needs between CPUs and graphics processors.
Technologies like high-speed networking (Gigabit Ethernet, InfiniBand, etc.) and wireless
communications (Bluetooth) are allowing more devices to exchange growing amounts of data at
rapidly increasing speeds.
Software technologies are evolving, resulting in breakthrough methods of utilizing multiple
system processors. As processor speeds rise, so will the need for very fast, high-volume inter-
processor data traffic.
While these new technologies quickly exceed the capabilities of today's PCI bus, existing interface functions like MP3 audio, V.90 modems, USB, IEEE 1394, and 10/100 Ethernet are left to compete for the remaining bandwidth. These functions are now commonly integrated into core logic products.
Higher integration is increasing the number of pins needed to bring these multiple buses into and out of the chip packages. Nearly all of these existing buses are single-ended, requiring additional power and ground pins to provide sufficient current return paths. Reducing pin count helps system designers to reduce power consumption and meet thermal requirements.
In response to these problems, AMD began developing the HyperTransport™ I/O link architecture
in 1997. Hyper Transport technology has been designed to provide system architects with significantly
more bandwidth, low-latency responses, lower pin counts, compatibility with legacy PC buses, extensibility
to new SNA buses, and transparency to operating system software, with little impact on peripheral drivers.
HyperTransport technology, formerly codenamed Lightning Data Transport (LDT), was developed at AMD with the help of industry partners to provide a high-speed, high-performance, point-to-point link for interconnecting integrated circuits on a board. With a top signaling rate of 1.6 GHz on each wire pair, a HyperTransport technology link can support a peak aggregate bandwidth of 12.8 Gbytes/s. The HyperTransport specification provides both link- and system-level power management capabilities optimized for processors and other system devices. HyperTransport technology is targeted at networking, telecommunications, computer, and high-performance embedded applications, and at any other application in which high speed, low latency, and scalability are necessary.
In developing HyperTransport technology, its architects considered the design goals presented in this section. They wanted to develop a new I/O protocol for “in-the-box” I/O connectivity that would:
Improve system performance
Provide increased I/O bandwidth
Reduce data bottlenecks by moving slower devices out of critical information paths
Reduce the number of buses within the system
Ensure low latency responses
Reduce power consumption
Simplify system design
Use a common protocol for “in-chassis” connections to I/O and processors
Use as few pins as possible to allow smaller packages and to reduce cost
Increase I/O flexibility
Provide a modular bridge architecture
Allow for differing upstream and downstream bandwidth requirements
Maintain compatibility with legacy systems
Complement standard external buses
Have little or no impact on existing operating systems and drivers
Ensure extensibility to new system network architecture (SNA) buses
Provide highly scalable multiprocessing systems
Flexible I/O Architecture
The resulting protocol defines a high-performance and scalable interconnect between CPU, memory, and I/O devices. Conceptually, the architecture of the HyperTransport I/O link can be mapped into five different layers, a structure similar to the Open System Interconnection (OSI) reference model.
In HyperTransport technology:
The physical layer defines the physical and electrical characteristics of the protocol. This layer interfaces to the physical world and includes data, control, and clock lines.
The data link layer includes the initialization and configuration sequence, periodic cyclic
redundancy check (CRC), disconnect or reconnect sequence, information packets for flow control
and error management, and doubleword framing for other packets.
The protocol layer includes the commands, the virtual channels in which they run, and the
ordering rules that govern their flow.
The transaction layer uses the elements provided by the protocol layer to perform actions, such
as reads and writes.
The session layer includes rules for negotiating power management state changes, as well as
interrupt and system management activities.
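The five-layer mapping above can be condensed into a short sketch, shown below purely as a reading aid; the layer names come from the text, while the one-line summaries are paraphrases (Python is used here only for illustration).

```python
# Condensed view of the five HyperTransport layers described above.
HYPERTRANSPORT_LAYERS = {
    "physical":    "electrical characteristics; data, control, and clock lines",
    "data link":   "init/config sequence, periodic CRC, disconnect/reconnect, "
                   "flow-control and error-management packets, doubleword framing",
    "protocol":    "commands, virtual channels, ordering rules",
    "transaction": "actions built from protocol elements, such as reads and writes",
    "session":     "power-management negotiation, interrupts, system management",
}

for layer, role in HYPERTRANSPORT_LAYERS.items():
    print(f"{layer:>11}: {role}")
```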
Device Configurations
Technical Overview
Physical Layer
Each HyperTransport link consists of two point-to-point unidirectional data paths, as illustrated in Figure.
Data path widths of 2, 4, 8, 16, or 32 bits can be implemented either upstream or downstream,
depending on the device-specific bandwidth requirements.
Commands, addresses, and data (CAD) all use the same set of wires for signaling, dramatically
reducing pin requirements.
All HyperTransport technology commands, addresses, and data travel in packets. All packets are
multiples of four bytes (32 bits) in length. If the link uses data paths narrower than 32 bits, successive
bit-times are used to complete the packet transfers. The Hyper Transport link was specifically designed to
deliver a high-performance and scalable interconnect between CPU, memory, and I/O devices, while using
as few pins as possible.
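Because every packet is a multiple of four bytes, the number of bit-times a transfer occupies follows directly from the link width. A minimal sketch of that arithmetic (the helper name is ours, not the specification's):

```python
# Bit-times needed to move one packet across a link of a given width.
# Packets are always a multiple of 4 bytes (32 bits); narrower links use
# successive bit-times to complete the transfer.
def bit_times(packet_bytes: int, link_width_bits: int) -> int:
    assert packet_bytes % 4 == 0, "packets are multiples of four bytes"
    return (packet_bytes * 8) // link_width_bits

print(bit_times(4, 32))   # 1 bit-time on a 32-bit link
print(bit_times(4, 8))    # 4 bit-times on an 8-bit link
print(bit_times(64, 2))   # 256 bit-times on a 2-bit link
```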
To achieve very high data rates, the Hyper Transport link uses low-swing differential signaling
with on-die differential termination.
To achieve scalable bandwidth, the Hyper Transport link permits seamless scalability of both
frequency and data width.
The designers of HyperTransport technology wanted to use as few pins as possible to enable
smaller packages, reduced power consumption, and better thermal characteristics, while reducing total
system cost. This goal is accomplished by using separate unidirectional data paths and very low-voltage
differential signaling.
The signals used in HyperTransport technology are summarized below.
Commands, addresses, and data (CAD) all share the same bits.
Each data path includes a Control (CTL) signal and one or more Clock (CLK) signals.
The CTL signal differentiates commands and addresses from data packets.
For every grouping of eight bits or less within the data path, there is a forwarded CLK signal.
Clock forwarding reduces clock skew between the reference clock signal and the signals
traveling on the link. Multiple forwarded clocks limit the number of signals that must be
routed closely in wider Hyper Transport links.
For most signals, there are two pins per bit.
In addition to CAD, Clock, Control, VLDT power, and ground pins, each Hyper Transport device
has Power OK (PWROK) and Reset (RESET#) pins. These pins are single-ended because of their
low-frequency use.
Devices that implement Hyper Transport technology for use in lower power applications such as
notebook computers should also implement Stop (LDTSTOP#) and Request (LDTREQ#). These
power management signals are used to enter and exit low-power states.
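Putting the signal list together gives a rough feel for the pin budget of a single link. The sketch below is a back-of-the-envelope estimate based only on the rules above; it deliberately omits VLDT power and ground pins, whose count depends on the package.

```python
import math

# Rough signal-pin estimate for one HyperTransport link (both directions).
# CAD, CTL, and CLK are differential (two pins per bit); PWROK and RESET#
# are single-ended. One forwarded clock covers each group of eight or
# fewer CAD bits.
def link_signal_pins(width_bits: int, low_power: bool = False) -> int:
    clocks = math.ceil(width_bits / 8)
    pairs_per_direction = width_bits + 1 + clocks      # CAD + CTL + CLK
    pins = 2 * (2 * pairs_per_direction) + 2           # both directions + PWROK, RESET#
    if low_power:
        pins += 2                                       # LDTSTOP#, LDTREQ#
    return pins

print(link_signal_pins(16))   # 78 signal pins for a 16-bit link, before power/ground
```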
The signaling technology used in HyperTransport technology is a type of low-voltage differential signaling (LVDS). However, it is not the conventional IEEE LVDS standard; it is an enhanced LVDS technique developed to evolve with the performance of future process technologies, which is designed to help ensure that the HyperTransport technology standard has a long lifespan. LVDS has been widely used in these types of applications because it requires fewer pins and wires. It also reduces cost and power requirements because the transceivers are built into the controller chips.
Hyper Transport technology uses low-voltage differential signaling with differential impedance
(ZOD) of 100 ohms for CAD, Clock, and Control signals, as illustrated in Figure. Characteristic line
impedance is 60 ohms. The driver supply voltage is 1.2 volts, instead of the conventional 2.5 volts for
standard LVDS. Differential signaling and the chosen impedance provide a robust signaling system for use
on low-cost printed circuit boards. Common four-layer PCB materials with specified dielectric, trace, and
space dimensions and tolerances or controlled impedance boards are sufficient to implement a Hyper
Transport I/O link. The differential signaling permits trace lengths up to 24 inches for 800 Mbit/s
operation.
At first glance, the signaling used to implement a Hyper Transport I/O link would seem to increase pin
counts because it requires two pins per bit and uses separate upstream and downstream data paths.
However, the increase in signal pins is offset by two factors:
By using separate data paths, Hyper Transport I/O links are designed to operate at much higher
frequencies than existing bus architectures. This means that buses delivering equivalent or better
bandwidth can be implemented using fewer signals.
Differential signaling provides a return current path for each signal, greatly reducing the number
of power and ground pins required in each package.
Commands, addresses, and data traveling on a HyperTransport link are double pumped: transfers take place on both the rising and falling edges of the clock signal. For example, if the link clock is 800 MHz, the effective data rate is 1600 megatransfers per second.
An implementation of HyperTransport links with 16 CAD bits in each direction with a 1.6-GHz
data rate provides bandwidth of 3.2 Gigabytes per second in each direction, for an aggregate
peak bandwidth of 6.4 Gbytes/s, or 48 times the peak bandwidth of a 33-MHz PCI bus.
A low-cost, low-power HyperTransport link using two CAD bits in each direction and clocked at 400 MHz provides 200 Mbytes/s of bandwidth in each direction, an aggregate of 400 Mbytes/s, or roughly three times the peak bandwidth of PCI 32/33.
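The figures above follow from a simple calculation: link width in bytes, times two transfers per clock (double pumping), times the clock frequency. A small helper illustrating the arithmetic (our own sketch):

```python
# Peak HyperTransport bandwidth per direction; data is double pumped, so the
# transfer rate is twice the link clock.
def link_bandwidth_mbytes(width_bits: int, clock_mhz: float) -> float:
    transfers_per_second = clock_mhz * 1e6 * 2
    return transfers_per_second * (width_bits / 8) / 1e6

print(link_bandwidth_mbytes(16, 800))   # 3200 MB/s per direction, 6.4 GB/s aggregate
print(link_bandwidth_mbytes(32, 800))   # 6400 MB/s per direction, 12.8 GB/s aggregate
print(link_bandwidth_mbytes(2, 400))    # 200 MB/s per direction
```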
Data Link Layer
The data link layer includes the initialization and configuration sequence, periodic cyclic redundancy check (CRC), disconnect/reconnect sequence, information packets for flow control and error management, and doubleword framing for other packets.
Initialization
HyperTransport technology-enabled devices with transmitter and receiver links of equal width can
be easily and directly connected. Devices with asymmetric data paths can also be linked together easily.
Extra receiver pins are tied to logic 0, while extra transmitter pins are left open. During power-up, when
RESET# is asserted and the Control signal is at logic 0, each device transmits a bit pattern indicating the
width of its receiver. Logic within each device determines the maximum safe width for its transmitter.
While this may be narrower than the optimal width, it provides reliable communication between devices until configuration software can optimize the link to the widest common width.
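The negotiation itself reduces to a simple rule: each transmitter comes up no wider than the far end's receiver. A minimal sketch of that rule (function name and arguments are our own illustration):

```python
# At reset each device learns the far end's receiver width and clamps its own
# transmitter accordingly; software can later reprogram the link up to the
# widest width both ends support.
def safe_transmitter_width(local_tx_width_bits: int, remote_rx_width_bits: int) -> int:
    return min(local_tx_width_bits, remote_rx_width_bits)

print(safe_transmitter_width(16, 8))   # a 16-bit transmitter facing an 8-bit receiver runs 8 bits wide
```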
For applications that typically send the bulk of the data in one direction, component vendors can
save costs by implementing a wide path for the majority of the traffic and a narrow path in the lesser used
direction. Devices are not required to implement equal width upstream and downstream links.
Protocol and Transaction Layers
The protocol layer includes the commands, the virtual channels in which they run, and the ordering rules that govern their flow. The transaction layer uses the elements provided by the protocol layer to perform actions, such as read requests and responses.
Commands
All HyperTransport technology commands are either four or eight bytes long and begin with a 6-bit
command type field. The most commonly used commands are Read Request, Read Response, and Write.
A virtual channel contains requests or responses with the same ordering priority.
When the command requires an address, the last byte of the command is concatenated with an additional
four bytes to create a 40-bit address.
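The address construction can be sketched as follows: one command byte plus four additional bytes yields 40 bits. The byte ordering used here is an assumption made only for the illustration; the actual wire format is defined by the specification.

```python
# Illustrative only: concatenate the last command byte with four more bytes
# to form a 40-bit address. The ordering below is assumed for the sketch.
def make_40bit_address(cmd_last_byte: int, extra_bytes: bytes) -> int:
    assert 0 <= cmd_last_byte <= 0xFF and len(extra_bytes) == 4
    address = cmd_last_byte
    for b in extra_bytes:
        address = (address << 8) | b
    return address

print(hex(make_40bit_address(0xAB, bytes([0x12, 0x34, 0x56, 0x78]))))   # 0xab12345678
```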
Data Packets
A Write command or a Read Response command is followed by data packets. Data packets are
four to 64 bytes long in four-byte increments. Transfers of less than four bytes are padded to the four-
byte minimum. Byte granularity reads and writes are supported with a four-byte mask field preceding the
data. This is useful when transferring data to or from graphics frame buffers where the application should
only affect certain bytes that may correspond to one primary color or other characteristics of the displayed
pixels. A control bit in the command indicates whether the writes are byte or doubleword granularity.
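Byte-granularity writes can be pictured as a mask applied to the data: the four-byte (32-bit) mask carries one enable bit per data byte, and only the flagged bytes are updated. The bit-to-byte correspondence below is an assumed convention for the sketch.

```python
# Apply a byte mask to a write: only bytes whose mask bit is set are updated.
# Bit i of the mask is assumed to cover byte i of the data (sketch only).
def masked_write(dest: bytearray, data: bytes, mask: int) -> None:
    for i, value in enumerate(data):
        if mask & (1 << i):
            dest[i] = value

frame = bytearray(4)
masked_write(frame, bytes([0xFF, 0x11, 0x22, 0x33]), mask=0b0101)
print(frame.hex())   # 'ff002200' -- only bytes 0 and 2 were written
```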
Address Mapping
Reads and writes to PCI I/O space are mapped into a separate address range, eliminating the
need for separate memory and I/O control lines or control bits in read and write commands.
Additional address ranges are used for in-band signaling of interrupts and system management
messages. A device signaling an interrupt performs a byte-granularity write command targeted at the
reserved address space. The host bridge is responsible for delivery of the interrupt to the internal target.
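Conceptually, then, an interrupt is nothing more than a write that lands in a reserved address range and is recognized by the host bridge. The toy sketch below makes that idea concrete; the range value is a placeholder, not the address defined in the specification.

```python
# Toy model of in-band interrupt signaling: a device raises an interrupt by
# issuing a byte-granularity write into a reserved address range, and the host
# bridge classifies the write by its address. Placeholder range, not the real one.
RESERVED_INTERRUPT_RANGE = range(0xF0_0000_0000, 0xF1_0000_0000)

def classify_write(address: int) -> str:
    return "interrupt message" if address in RESERVED_INTERRUPT_RANGE else "ordinary write"

print(classify_write(0xF0_0000_0010))   # interrupt message
print(classify_write(0x00_1000_0000))   # ordinary write
```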
Communications between the HyperTransport host bridge and other HyperTransport technology-
enabled devices use the concept of streams. A HyperTransport link can handle multiple streams between
devices simultaneously. HyperTransport technology devices are daisy-chained, so that some streams
may be passed through one node to the next.
Packets are identified as belonging to a stream by the Unit ID field in the packet header. There
can be up to 32 unique IDs within a Hyper Transport chain. Nodes within a HyperTransport chain may
contain multiple units. It is the responsibility of each node to determine if information sent to it is targeted
at a device within it. If not, the information is passed through to the next node. If a device is located at
the end of the chain and it is not the target device, an error response is passed back to the host bridge.
Commands and responses sent from the host bridge have a Unit ID of zero. Commands and
responses sent from other HyperTransport technology devices on the chain have their own unique ID.
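The forwarding decision each node makes can be sketched as below; the class and method names are our own illustration of the Unit ID rules, not an interface from the specification.

```python
# A node owns one or more Unit IDs (up to 32 per chain, with 0 reserved for
# the host bridge). It accepts packets addressed to one of its own units and
# forwards everything else; a non-matching packet at the end of the chain
# produces an error response back to the host bridge.
class Node:
    def __init__(self, name: str, unit_ids: set):
        self.name = name
        self.unit_ids = unit_ids

    def handle(self, packet_unit_id: int, end_of_chain: bool) -> str:
        if packet_unit_id in self.unit_ids:
            return f"{self.name}: accepted"
        if end_of_chain:
            return f"{self.name}: not mine, end of chain -> error response to host bridge"
        return f"{self.name}: not mine -> forwarded to next node"

tunnel = Node("tunnel", {1, 2})
print(tunnel.handle(2, end_of_chain=False))   # accepted
print(tunnel.handle(5, end_of_chain=False))   # forwarded
```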
Ordering Rules
Within streams, the HyperTransport I/O link protocol implements the same basic ordering rules
as PCI. Additionally, there are features that allow these ordering rules to be relaxed. A Fence command
aligns posted cycles in all streams, and a Flush command flushes the posted write channel in one stream.
These features are helpful in handling protocols for bridges to other buses such as PCI, InfiniBand, and AGP.
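As a toy illustration of the difference between the two commands (a deliberately simplified model; the precise ordering semantics are defined by the specification):

```python
# Simplified model: posted writes queue per stream (keyed by Unit ID).
# Flush drains the posted-write channel of one stream; Fence is a barrier
# across all streams, so later posted writes cannot pass earlier ones.
posted_writes = {1: ["w0", "w1"], 2: ["w2"]}

def flush(stream_id: int) -> None:
    posted_writes[stream_id].clear()          # one stream's posted channel drained

def fence() -> list:
    # Everything posted so far, in every stream, is ordered ahead of
    # anything posted after this point.
    return [w for writes in posted_writes.values() for w in writes]

print(fence())         # ['w0', 'w1', 'w2'] must complete before later posted writes
flush(1)
print(posted_writes)   # {1: [], 2: ['w2']}
```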
Session Layer
The session layer includes link width optimization and link frequency optimization along with
interrupt and power state capabilities.
Devices enabled with HyperTransport technology use standard “Plug ‘n Play” conventions for
exposing the control registers that enable configuration routines to optimize the width of each data path.
AMD registered the HyperTransport Specific Capabilities Block with the PCI SIG. This Capabilities Block,
illustrated in Figure, permits devices enabled with HyperTransport technology to be configured by any
operating system that supports a PCI architecture.
Drivers for devices enabled with HyperTransport technology are unique to the devices just as
they are to PCI I/O devices, but the similarities are great. Companies that build a PCI I/O device and then
create an equivalent device enabled with Hyper Transport technology should have no problems porting the
driver. To make porting easier, the chain from a host bridge is enumerated like a PCI bus, and devices and
functions within a device enabled with HyperTransport technology are enumerated like PCI devices and
functions, as shown in Figure.
The initial link-width negotiation sequence may result in links that do not operate at their
maximum width potential. All 16-bit, 32-bit, and asymmetrically-sized configurations must be enabled by
a software initialization step. At cold reset, all links power-up and synchronize according to the protocol.
Firmware (or BIOS) then interrogates all the links in the system, reprograms them to the desired width,
and takes the system through a warm reset to change the link widths. Devices that implement the
LDTSTOP# signal can disconnect and reconnect rather than enter warm reset to invoke link width
changes.
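The width-optimization flow just described can be summarized in a short sketch; the Link record and its fields are hypothetical stand-ins for the real control registers.

```python
from dataclasses import dataclass

# Hypothetical model of one link's width settings (real firmware works through
# the device's HyperTransport capability registers).
@dataclass
class Link:
    tx_capability_bits: int          # widest transmitter this end supports
    rx_capability_bits: int          # widest receiver the far end supports
    supports_ldtstop: bool
    programmed_width_bits: int = 8   # conservative width after cold reset (placeholder)

def optimize_link_widths(links) -> str:
    for link in links:
        # Reprogram each link to the widest width both ends can support.
        link.programmed_width_bits = min(link.tx_capability_bits, link.rx_capability_bits)
    # Widths take effect after a warm reset, or after an LDTSTOP# disconnect/
    # reconnect on links that implement that signal.
    if all(link.supports_ldtstop for link in links):
        return "LDTSTOP# disconnect/reconnect"
    return "warm reset"

links = [Link(16, 16, True), Link(16, 8, False)]
print(optimize_link_widths(links), [l.programmed_width_bits for l in links])   # warm reset [16, 8]
```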
At cold reset, all links power-up with 200-MHz clocks. For each link, firmware reads a specific register of
each device to determine the supported clock frequencies. The reported frequency capability, combined
with system-specific information about the board layout and power requirements, is used to determine the
frequency to be used for each link. Firmware then writes the two frequency registers to set the frequency
for each link. Once all devices have been configured, firmware initiates an LDTSTOP# disconnect
or RESET# of the affected chain to cause the new frequency to take effect.
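The frequency setup follows the same pattern and can be sketched similarly; the 800-MHz board limit and the helper name are placeholders, not values from the specification.

```python
# Every link powers up at 200 MHz. Firmware reads each device's supported
# frequencies, picks the highest one the board layout and power budget allow,
# writes the frequency registers at both ends, and then triggers an LDTSTOP#
# disconnect or RESET# of the chain so the new frequency takes effect.
def pick_link_frequency(supported_mhz, board_limit_mhz=800) -> int:
    candidates = [f for f in supported_mhz if f <= board_limit_mhz]
    return max(candidates) if candidates else 200

print(pick_link_frequency([200, 300, 400, 600, 800]))               # 800
print(pick_link_frequency([200, 300, 400], board_limit_mhz=350))    # 300
```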
Implementation Examples
Daisy Chain
HyperTransport technology has a daisy-chain topology, giving the opportunity to connect multiple
HyperTransport input/output bridges to a single channel. HyperTransport technology is designed to
support up to 32 devices per channel and can mix and match components with different link widths and
speeds. This capability makes it possible to create Hyper Transport technology devices that are building
blocks capable of spanning a range of platforms and market segments. For example, a low-cost entry in a
mainstream PC product line might be designed with an AMD Duron™ processor. With very little redesign
work, as shown in Figure, this PC design could be upgraded to a high-end workstation by substituting high-
end AMD Athlon™ processors and bridges with HyperTransport technology to
expand the platform’s I/O capabilities. Figure 10 also illustrates the concept of tunnels, in which multiple
HyperTransport tunnels can be daisy-chained onto a single I/O link. A tunnel can be viewed as a basic
building block for complex system designs.
Switched Environment
Conclusion