UCIe Physical Layer
A cluster is the set of signals that two UCIe interfaces use to talk to each other. If higher
bandwidth is needed between the two chiplets, the clusters are replicated multiple times.
Additional lanes for valid, tracking, differential forwarded clock, and sideband
(clock+data) signals
Electrical Layer (analog) performs high-speed data transmission. It can operate at different
data rates:
• All data rates below the highest supported rate must also be supported
(e.g. 4, 8, 12, and 16 GT/s for a 16 GT/s device), so that the link can fall back if it
fails to operate at the maximum data rate.
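The fallback rule can be sketched as follows. This is a minimal illustration, not a definitive implementation: the rate ladder `UCIE_RATES_GTPS` (in particular the 24 GT/s entry, which these notes do not mention) and the `link_ok` callback are assumptions made for the example.

```python
# Assumed rate ladder in GT/s; these notes only name 4, 8, 12, 16 and 32.
UCIE_RATES_GTPS = [4, 8, 12, 16, 24, 32]

def required_rates(max_rate):
    """Return every rate a device must support, given its maximum rate."""
    return [r for r in UCIE_RATES_GTPS if r <= max_rate]

def negotiate_rate(max_rate, link_ok):
    """Try the highest rate first and fall back until link_ok(rate) passes."""
    for rate in reversed(required_rates(max_rate)):
        if link_ok(rate):
            return rate
    return None  # the link does not work at any supported rate

print(required_rates(16))                    # [4, 8, 12, 16]
print(negotiate_rate(16, lambda r: r <= 8))  # 8
```

Here `link_ok` stands in for whatever pass/fail check the training sequence performs at a given rate.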
Logical Layer (digital) performs calibration and training. Its function is to adjust the link
speed and reference voltage, and to calibrate the clock phase and the skew between different
data lanes.
In the electrical layer, the Mainband Transmitter (MB TX) consists of:
The Phase-Locked Loop multiplies the input clock frequency by a multiplication
factor determined by the data rate.
Voltage-Controlled Delay Line (VCDL) to a half clock period and tapping the true and
complementary outputs of the stages.
• Phase-Only Detector (PD): an RS-latch PD that produces equal UP and DN pulses
when the two input clocks have a 180° phase difference. The rising edge of each
input
• DLL Loop Filter adjusts the control voltage by the amount proportional to the
UP/DN pulse width differences
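The locking behaviour of this loop can be illustrated with a small numerical model. It is only a sketch under stated assumptions: the UP and DN pulse widths are modelled as `period - delay` and `delay`, so that they are equal exactly when the delay is half a clock period, and the names `dll_lock` and `gain` are hypothetical.

```python
def dll_lock(period, delay0, gain=0.25, steps=50):
    """Iterate a first-order DLL loop and return the settled delay."""
    delay = delay0
    for _ in range(steps):
        up = period - delay   # UP pulse width (assumed PD model)
        dn = delay            # DN pulse width (assumed PD model)
        # Loop filter: move the delay in proportion to the UP/DN width difference.
        delay += gain * (up - dn) / 2
    return delay

# Starting from 20 time units, the delay settles close to half of a 100-unit period.
print(round(dll_lock(100.0, 20.0), 3))  # 50.0
```

With this model the distance from the half-period target shrinks by a factor of `1 - gain` per iteration, so any gain between 0 and 1 converges.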
A phase mux with enable is therefore required. When the enable is zero the clock is off,
but the clock must not be turned off while it is high, to avoid altering its pulse
width. When the clock is re-enabled, this is done immediately (it does not depend on
the output clock, which is disabled at that point).
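The gating rule described above can be modelled behaviourally on sampled waveforms. This is a sketch, not the circuit itself: the function name `gated_clock` and the list-of-samples representation are assumptions for illustration.

```python
def gated_clock(clk, enable):
    """Gate a sampled clock: turn off only while clk is low, turn on at once."""
    out = []
    active = False
    for c, en in zip(clk, enable):
        if en:
            active = True    # enabling takes effect immediately
        elif c == 0:
            active = False   # disabling waits until the clock is low
        out.append(c if active else 0)
    return out

clk    = [0, 1, 1, 0, 0, 1, 1, 0]
enable = [1, 1, 0, 0, 0, 0, 0, 0]   # enable drops in the middle of a high pulse
print(gated_clock(clk, enable))      # [0, 1, 1, 0, 0, 0, 0, 0]
```

In the example the enable falls while the clock is high, yet the output pulse keeps its full two-sample width and only the following pulse is suppressed.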
Note: this part is unsynthesizable, and the clock domain crossing is tricky to handle in
typical synthesis tools.
Check the functionality of the 16:1 serializer by measuring the bit error rate (BER) of the
serialized data stream.
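A BER check of this kind might look like the sketch below. The bit ordering (LSB first) and the helper names `serialize_16to1` and `bit_error_rate` are assumptions; the real check would compare against the output of the analog serializer model.

```python
import random

def serialize_16to1(words):
    """Flatten 16-bit parallel words into a serial bit list, LSB first (assumed order)."""
    return [(w >> i) & 1 for w in words for i in range(16)]

def bit_error_rate(sent, received):
    """Fraction of bit positions where the received stream differs from the sent one."""
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)

random.seed(0)
words = [random.getrandbits(16) for _ in range(1000)]
tx_bits = serialize_16to1(words)

rx_bits = list(tx_bits)
rx_bits[5] ^= 1                           # inject a single bit error
print(bit_error_rate(tx_bits, tx_bits))   # 0.0
print(bit_error_rate(tx_bits, rx_bits))   # 6.25e-05 (1 error in 16000 bits)
```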
The valid signal goes high when a frame starts, indicating that the frame is valid, and
the clock starts to toggle at the same time. The clock keeps toggling until valid has been
low for 16 UIs, so that data still in intermediate stages such as the deserializer is
flushed out.
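The clock-gating rule tied to valid can be sketched per UI as follows. This is one reading of the rule, assumed for illustration: the clock runs whenever valid is high and for a further 16 UIs after valid drops. The name `clock_active` is hypothetical.

```python
def clock_active(valid, hold_ui=16):
    """Per-UI flag: True while the forwarded clock should keep toggling."""
    active = []
    remaining = 0   # UIs of clock still owed after valid goes low
    for v in valid:
        if v:
            remaining = hold_ui
            active.append(True)
        else:
            active.append(remaining > 0)
            remaining = max(remaining - 1, 0)
    return active

valid = [0, 1, 1] + [0] * 20
flags = clock_active(valid)
print(sum(flags))   # 18: two valid UIs plus 16 flush UIs
```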
The channels of the different lanes are not well matched, so the skew must be adjusted
before data transmission starts. The skew may span more than one UI, especially at
the highest data rate (32 GT/s). The phase interpolator can only adjust skew within
one UI, so to correct skew greater than one UI the FIFO shifts the bits to be
transmitted, adding delay in units of one UI.
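The split between FIFO shifts and phase-interpolator adjustment amounts to a division with remainder. The sketch below assumes the skew is known in picoseconds; `split_skew` is a hypothetical helper, not part of any UCIe model.

```python
def split_skew(skew_ps, data_rate_gtps):
    """Split a lane skew into whole-UI FIFO shifts and a sub-UI PI adjustment."""
    ui_ps = 1000.0 / data_rate_gtps            # one UI in picoseconds
    whole_ui, frac_ps = divmod(skew_ps, ui_ps)
    return int(whole_ui), frac_ps

# At 32 GT/s one UI is 31.25 ps, so 80 ps of skew is 2 UIs plus 17.5 ps.
print(split_skew(80.0, 32))   # (2, 17.5)
```

The integer part is what the FIFO would absorb by shifting bits; the remainder is left for the phase interpolator.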
The FIFO transfers data between two clock domains (core and UCIe interface).
The TX_syncgen block produces the clock and valid patterns (10101010… for the clock and
11110000… for the valid). It also supports bitwise shifts of the patterns.
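A pattern generator with bitwise shifts can be sketched as below. The function name `sync_pattern` and the shift direction (a positive shift delays the pattern) are assumptions for illustration.

```python
def sync_pattern(base, length, shift=0):
    """Repeat `base` to `length` bits, delayed cyclically by `shift` bit positions."""
    n = len(base)
    return "".join(base[(i - shift) % n] for i in range(length))

print(sync_pattern("10", 16))            # 1010101010101010  (clock)
print(sync_pattern("11110000", 16))      # 1111000011110000  (valid)
print(sync_pattern("11110000", 16, 2))   # 0011110000111100  (valid, shifted by 2 bits)
```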
The receiver also consists of a clock path, which recovers the TX clock and generates
quarter-rate clocks from it, and a data path, which samples the data and deserializes it.
The RX data path is built around a quarter-rate sampling receiver, which needs a reference
voltage level to decide whether the input is a one or a zero. The data, now deserialized
into four streams, passes through additional deserializing stages to produce 16 parallel
streams at the output. Each stage needs a different clock, which the clock divider produces.
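The two-level deserialization can be illustrated on a list of bits. This is a behavioural sketch only: it assumes the additional stages amount to one further 1:4 split per quarter-rate stream, and the function names are hypothetical.

```python
def deserialize(bits, ways):
    """Split one stream into `ways` interleaved streams (stream k gets bits k, k+ways, ...)."""
    return [bits[k::ways] for k in range(ways)]

def rx_deserialize_1_to_16(bits):
    """Quarter-rate sampling (1:4) followed by a further 1:4 stage per stream."""
    quarter = deserialize(bits, 4)             # four quarter-rate streams
    streams = [None] * 16
    for phase, stream in enumerate(quarter):
        for j, sub in enumerate(deserialize(stream, 4)):
            streams[phase + 4 * j] = sub       # output k carries bits k, k+16, ...
    return streams

streams = rx_deserialize_1_to_16(list(range(32)))
print(streams[0], streams[5], streams[15])     # [0, 16] [5, 21] [15, 31]
```

Using the bit index as the bit value makes it easy to see that output `k` carries every sixteenth bit starting at position `k`.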
Each comparator samples its two inputs at the rising edge of the clock and outputs one when
the difference between them is positive and zero when it is negative. After this stage,
latches realign the comparator outputs so that they fit in the same clock domain.
• UCIe logical layer performs the following functions:
During initialization, the link's settings such as lane width, speed, and voltage levels
are adjusted. The training phase ensures that the link can maintain a stable
connection by calibrating the transmitters and receivers, which is critical for
achieving reliable high-speed data transfers, especially at data rates of up to
32 GT/s in UCIe.
Sideband messages handle control information and management signals that are
sent independently from the main data stream.
These messages carry crucial non-payload data such as power management signals,
error reports, and status updates. By separating control signals from the primary
data flow, sideband messaging ensures efficient communication without
interrupting the primary data bandwidth. This separation is essential for reducing
latency and improving communication efficiency.
Scrambling randomizes the transmitted data patterns to reduce the possibility
of EMI, which can affect signal integrity. Meanwhile, training patterns are
predefined data sequences used to align the receiver’s clock and signal sampling points,
ensuring error-free communication at high speeds.
5. Lane Reversal
Lane reversal allows flexible physical connections by enabling the transmitter and
receiver to function regardless of the physical ordering of the lanes.
In UCIe, physical lanes might be connected in reverse order due to layout
constraints. The lane reversal feature automatically adjusts the logical mapping of
the lanes, ensuring that data is transmitted correctly even if the physical lane
ordering doesn't match the expected configuration. This reduces design complexity
and increases flexibility in chiplet packaging.
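The remapping can be sketched as follows. The detection scheme shown here, where each lane carries its own lane number during initialization, is an assumption for illustration, as are the function names.

```python
def map_lanes(data, reversed_lanes):
    """Reorder per-lane data: index i swaps with index n-1-i when lanes are reversed."""
    return data[::-1] if reversed_lanes else list(data)

def detect_reversal(received_ids):
    """Decide from per-lane ID values seen at the receiver whether lanes are reversed."""
    n = len(received_ids)
    if received_ids == list(range(n)):
        return False
    if received_ids == list(range(n - 1, -1, -1)):
        return True
    raise ValueError("lane order is neither straight nor reversed")

tx_ids = list(range(16))
rx_ids = tx_ids[::-1]                   # package wiring connects the lanes in reverse
print(detect_reversal(rx_ids))          # True
print(map_lanes(rx_ids, True)[:4])      # [0, 1, 2, 3]
```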
6. Width Degradation
This mechanism ensures that a link remains operational, even when some lanes
encounter issues like physical damage or signal loss. When this occurs, the system
reduces the overall bandwidth by utilizing the functional lanes to continue data
transmission. While the data throughput may drop, the critical benefit is that the
system avoids a complete link failure, allowing communication to persist at a
reduced rate until repairs or adjustments can be made.
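One possible degradation policy is sketched below. It assumes the link may only fall back to a fault-free lower or upper half of the lanes; that policy and the name `degrade_width` are assumptions for the example, not a statement of the exact UCIe rules.

```python
def degrade_width(lane_ok):
    """Pick the lanes to keep using, given a per-lane pass/fail list."""
    n = len(lane_ok)
    if all(lane_ok):
        return list(range(n))          # full width
    half = n // 2
    if all(lane_ok[:half]):
        return list(range(half))       # lower half only
    if all(lane_ok[half:]):
        return list(range(half, n))    # upper half only
    return []                          # no usable configuration

status = [True] * 16
status[10] = False                     # lane 10 has failed
print(degrade_width(status))           # [0, 1, 2, 3, 4, 5, 6, 7]
```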
The LFSR is a shift register used for generating PRBS (Pseudo-Random Binary
Sequence) patterns, which are crucial for scrambling data and reducing the likelihood of
repeating patterns during transmission.
The generated sequence is deterministic, meaning both the transmitter and receiver
use the same LFSR settings to scramble and later unscramble the data, ensuring
accurate communication.
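A minimal scrambler of this kind is sketched below. The PRBS7 polynomial (x^7 + x^6 + 1) and the seed are chosen only to keep the example short and are not claimed to be the UCIe scrambler polynomial.

```python
def prbs7(seed=0x7F):
    """Yield bits of a PRBS7 sequence (x^7 + x^6 + 1); illustrative polynomial only."""
    state = seed & 0x7F
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield bit

def scramble(bits, seed=0x7F):
    """XOR the data with the LFSR output; calling it again with the same seed descrambles."""
    lfsr = prbs7(seed)
    return [b ^ next(lfsr) for b in bits]

data = [1, 0, 1, 1, 0, 0, 1, 0] * 4
scrambled = scramble(data)
print(scramble(scrambled) == data)   # True: same seed on both sides recovers the data
```

Because the sequence is deterministic, the receiver only needs the same polynomial and seed to undo the scrambling.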
The LTFSM is responsible for managing the link initialization and training phases of
the UCIe communication process.
Link training involves testing and optimizing the communication link between two
chiplets. The LTFSM moves through various states during this process, adjusting
parameters like signal timing and voltage to ensure a stable connection.
o LFSRs help generate CRC codes used for verifying the integrity of transmitted
data. If any corruption occurs during transmission, the mismatch between
the transmitted and received CRC values will trigger error correction
processes.
o The PRBS sequence validation, typically the final stage in link training,
ensures that the transmitter (TX) and receiver (RX) are in perfect alignment,
and that the link is ready for error-free, high-speed data transmission.
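The CRC check in the first point can be illustrated with a bitwise CRC computed the way a shift register with feedback would. The CRC-8 polynomial 0x07 is an assumption picked for brevity, not the polynomial UCIe uses.

```python
def crc8(data, poly=0x07):
    """Bitwise CRC-8, MSB first, mirroring an LFSR with feedback taps given by `poly`."""
    reg = 0
    for byte in data:
        reg ^= byte
        for _ in range(8):
            if reg & 0x80:
                reg = ((reg << 1) ^ poly) & 0xFF
            else:
                reg = (reg << 1) & 0xFF
    return reg

payload = b"123456789"
sent_crc = crc8(payload)
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]   # flip one bit in transit

print(hex(sent_crc))                 # 0xf4
print(crc8(corrupted) == sent_crc)   # False: the mismatch flags the corruption
```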
The LTFSM (Link Training Finite State Machine) RTL (Register-Transfer Level) model
is designed as a hierarchical finite-state machine. This structure allows for a
modular and organized approach to managing complex link training processes,
making it more efficient and easier to implement in hardware.
There is one top-level FSM, and each of its states can enable a sub-FSM. When the sub-FSM
is done, control returns to the top-level FSM, which moves to the next state; that state
may in turn enable another sub-FSM.
Example: The LTFSM might contain sub-FSMs for link detection, clock
synchronization, and error handling. Each of these FSMs operates independently
but is part of the larger hierarchical structure
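The hierarchy can be sketched in software as a top-level loop that hands control to one sub-FSM at a time. The class and state names below are hypothetical and serve only to show the control flow; the actual model is RTL.

```python
class SubFsm:
    """A sub-FSM that walks through its own states and then reports done."""
    def __init__(self, name, states):
        self.name = name
        self.states = states
        self.index = 0

    def step(self):
        state = self.states[self.index]
        self.index += 1
        done = self.index == len(self.states)
        return f"{self.name}.{state}", done

def run_top_fsm(sub_fsms):
    """Top-level FSM: enable one sub-FSM per state, advance when it reports done."""
    trace = []
    for sub in sub_fsms:
        done = False
        while not done:
            visited, done = sub.step()
            trace.append(visited)
    return trace

trace = run_top_fsm([
    SubFsm("LINK_DETECT", ["PROBE", "ACK"]),
    SubFsm("CLOCK_SYNC", ["LOCK"]),
])
print(trace)   # ['LINK_DETECT.PROBE', 'LINK_DETECT.ACK', 'CLOCK_SYNC.LOCK']
```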
TX-Initiated Point Training is a critical process in the UCIe protocol where the
transmitter (TX) of the main module initiates a sequence of data transmission tests
to validate and optimize the link with the receiver (RX). This process ensures proper
alignment and signal integrity between the TX and RX during high-speed data
transmission. Here's a detailed breakdown of the steps involved:
Sideband (SB) messages are exchanged between the transmitter and receiver to
negotiate the start of the point training process.
The LFSR (Linear Feedback Shift Register) is reset to prepare for generating the
PRBS (Pseudo-Random Binary Sequence) patterns used in the training.
The TX sends the PRBS patterns through the Main Band (MB), which is the primary
data transmission channel.
These patterns are transmitted to test the quality and integrity of the
communication link. The patterns simulate real data transmission and help detect
any signal integrity issues, such as jitter or bit errors, that could occur at high speeds.
The TX requests the comparison results from the RX, which logs any mismatches or
errors detected during the transmission.
The RX uses its comparator to check the received PRBS patterns against the
expected sequence. The log contains information about any errors, which helps the
TX adjust parameters (like signal amplitude or lane configuration) to optimize the
link. This feedback is crucial for fine-tuning the connection and ensuring reliable
high-speed communication.
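The sequence above can be mimicked in a few lines. This is a simplified sketch: the sideband handshake is not modelled, the channel is a list with optional bit flips, and PRBS7 stands in for whatever pattern generator the link actually uses. All names are hypothetical.

```python
def prbs7_bits(n, seed=0x7F):
    """Generate n bits of a PRBS7 sequence (x^7 + x^6 + 1); illustrative polynomial."""
    state, out = seed & 0x7F, []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(bit)
    return out

def tx_initiated_point_test(n_bits, flip_positions=()):
    """Run one point test and return the RX error log (positions of mismatches)."""
    seed = 0x7F                                  # both sides reset their LFSR to this seed
    tx_pattern = prbs7_bits(n_bits, seed)        # TX sends the pattern on the mainband
    received = list(tx_pattern)
    for pos in flip_positions:                   # the channel corrupts selected bits
        received[pos] ^= 1
    expected = prbs7_bits(n_bits, seed)          # RX regenerates the expected pattern
    return [i for i, (r, e) in enumerate(zip(received, expected)) if r != e]

print(tx_initiated_point_test(256))              # []
print(tx_initiated_point_test(256, (3, 100)))    # [3, 100]
```

The returned list plays the role of the comparison results that the TX requests from the RX.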
VALVREF, DATAVREF: These steps determine the optimal voltage reference (Vref)
settings for the valid and data signals. They are done at the lowest data rate, 4 GT/s.
SPEEDIDLE: This step initializes or adjusts the data transmission rate to higher data
rates if possible.
VALTRAIN_CENTER: This calibration step finds the optimal phase interpolator (PI)
code for the valid signal, so that it is sampled at the centre of its eye.
VALTRAIN_VREF: This step recalibrates the Vref control code for valid signals,
ensuring it remains optimal at higher data rates.
DATATRAIN_VREF: This step recalibrates the Vref control code for data signals,
ensuring consistent signal quality.
RXDESKEW: This step adjusts the data-to-clock skew at the receiver (RX), crucial for
synchronized signal interpretation.
LINKSPEED: This final step checks the system's operation at the current data rate,
validating that the previous calibrations allow for efficient communication.
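Several of these steps search for an optimal control code (a PI code or a Vref code). One common way to do that, assumed here for illustration rather than taken from the specification, is to sweep the code, record pass/fail at each setting, and pick the middle of the widest passing window. The helper `best_center` is hypothetical.

```python
def best_center(passing):
    """Return the middle index of the longest run of True values, or None if none pass."""
    best_start, best_len = None, 0
    start = None
    for i, ok in enumerate(list(passing) + [False]):   # sentinel closes the last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2

# Pass/fail result of a point test at each PI code 0..9 (made-up values).
pi_sweep = [False, False, True, True, True, True, True, False, True, False]
print(best_center(pi_sweep))   # 4
```

Each entry of the sweep would come from one point test at that code; the chosen code sits in the centre of the passing region, which gives the most margin on both sides.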