
UCIe Physical Layer

Uploaded by

ramishokeir99
Copyright
© © All Rights Reserved

The UCIe interface is built from units called clusters (or modules). Each cluster defines the set of signals that two UCIe interfaces use to talk to each other. If higher bandwidth is needed between the two chiplets, the clusters are replicated multiple times.

Each cluster contains:

 N single-ended, unidirectional, full-duplex data lanes

 Additional lanes for valid, tracking, differential forwarded clock, and sideband
(clock+data) signals

The UCIe Physical Layer

The UCIe physical layer is made of two sublayers:

Electrical Layer (analog): performs high-speed data transmission. It can operate at different data rates:

 Mainband data rates per lane: 4, 8, 12, 16, 24, 32 GT/s

• A device must support every rate below its highest supported data rate (e.g. 4, 8, 12, and 16 GT/s for a 16 GT/s device), so that the link can fall back if communication fails at the maximum data rate.

• The actual data rate is determined during link training (MBTRAIN).

 Sideband data rate: fixed, with an 800 MHz clock
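
The fallback rule above can be sketched in a few lines of Python. This is only an illustration: the rate list comes from the text, while `supported_rates` and `fallback_order` are hypothetical helper names, and the real rate selection happens inside MBTRAIN rather than in a lookup like this.

```python
# Mainband per-lane data rates listed above, in GT/s.
MAINBAND_RATES = [4, 8, 12, 16, 24, 32]


def supported_rates(max_rate):
    """All rates a device must support, given its highest supported rate."""
    if max_rate not in MAINBAND_RATES:
        raise ValueError(f"unknown mainband rate: {max_rate}")
    return [r for r in MAINBAND_RATES if r <= max_rate]


def fallback_order(max_a, max_b):
    """Rates shared by two devices, highest first (hypothetical helper)."""
    common = set(supported_rates(max_a)) & set(supported_rates(max_b))
    return sorted(common, reverse=True)


print(supported_rates(16))     # [4, 8, 12, 16]
print(fallback_order(32, 16))  # [16, 12, 8, 4]
```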

Logical Layer (digital): performs calibration and training. Its function is to adjust the link speed and reference voltage, and to calibrate the clock phase and the skew between data lanes.

In the electrical layer, the Mainband Transmitter (MB TX) consists of:

• TX datapath: serializes data and drives it through the channels.

• TX clockpath: drives the forwarded clocks and can adjust their frequency, phase, and skew. It generates a set of clocks for the data, valid, and clock drivers from the same source, clk_ref (2 GHz).

• Phase-Locked Loop: multiplies the input clock frequency by a multiplication factor determined by the data rate.

• Duty-Cycle Corrector: maintains a 50% duty cycle on the clock.

• Delay-Locked Loop: generates a set of 8 multi-phase clocks by locking a 4-stage Voltage-Controlled Delay Line (VCDL) to half a clock period and tapping the true and complementary outputs of each stage.

• Phase-Only Detector (PD): an RS-latch PD that produces equal UP and DN pulses when the two input clocks have a 180° phase difference. The rising edge of each input

• DLL Loop Filter: adjusts the control voltage by an amount proportional to the difference between the UP and DN pulse widths.

• Phase Interpolator (PI) stage: produces a clock with a digitally adjustable phase by selecting two of the 8 multi-phase clocks with two multiplexers and interpolating between the two adjacent clock phases to produce an intermediate phase. A digital encoder controls the selection and the interpolation weights based on ctrl[5:0] (64 steps spanning the entire 360 degrees).
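
As a rough illustration of how a 6-bit code could map onto the 8 DLL phases, the sketch below assumes that the upper three bits of ctrl[5:0] pick the pair of adjacent phases and the lower three bits set the interpolation weight. That bit split, and the names `pi_decode` and `pi_phase_degrees`, are assumptions made for this example; the actual encoder may be organized differently.

```python
NUM_PHASES = 8                            # 4 VCDL stages, true and complementary outputs
CTRL_BITS = 6                             # ctrl[5:0]
STEPS = 1 << CTRL_BITS                    # 64 steps over 360 degrees
STEPS_PER_SEGMENT = STEPS // NUM_PHASES   # 8 fine steps between adjacent phases


def pi_decode(ctrl):
    """Return (phase_a, phase_b, weight_b) for a 6-bit PI code.

    Assumed split: upper 3 bits select the segment, lower 3 bits the weight.
    """
    if not 0 <= ctrl < STEPS:
        raise ValueError("ctrl must fit in 6 bits")
    segment = ctrl // STEPS_PER_SEGMENT
    fine = ctrl % STEPS_PER_SEGMENT
    phase_a = segment
    phase_b = (segment + 1) % NUM_PHASES   # wraps from phase 7 back to phase 0
    weight_b = fine / STEPS_PER_SEGMENT
    return phase_a, phase_b, weight_b


def pi_phase_degrees(ctrl):
    """Ideal output phase in degrees for a given code."""
    phase_a, _, weight_b = pi_decode(ctrl)
    spacing = 360.0 / NUM_PHASES           # 45 degrees between DLL phases
    return (phase_a + weight_b) * spacing


print(pi_phase_degrees(1))    # 5.625
print(pi_decode(63))          # (7, 0, 0.875)
```

Under these assumptions each code step moves the clock by 360/64 = 5.625 degrees.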
Sideband communication, such as training, is slow (800 MHz). Meanwhile the mainband clock would keep toggling at 32 GHz, which is very fast, while it waits for the sideband to finish, so the clock must be turned off to save power.

This is done with a phase mux that has an enable input: when enable is zero, the clock is off. The clock must not be turned off while it is high, because that would alter its pulse width. Turning the clock back on can be done immediately, because it does not depend on the output clock (which is disabled).
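
The gating rule can be checked with a small behavioral model. This is a sketch on sampled waveforms, not a circuit description: `gate_clock` and `pulse_widths` are hypothetical names, and the model simply defers a turn-off request until the clock is low while letting a turn-on request take effect at once, as described above.

```python
def gate_clock(clk, enable):
    """Gate a sampled clock waveform with an enable signal.

    Turning off is deferred until the clock is low, so no high pulse is cut
    short. Turning on takes effect immediately.
    """
    out = []
    active = False
    for c, en in zip(clk, enable):
        if en:
            active = True        # enable request: immediate
        elif c == 0:
            active = False       # disable request: only honored while clock is low
        out.append(c if active else 0)
    return out


def pulse_widths(wave):
    """Lengths (in samples) of each run of 1s in a waveform."""
    widths, run = [], 0
    for s in wave:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Two samples per half period; enable drops while the clock is high (index 7).
clk = [0, 0, 1, 1] * 4
enable = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
gated = gate_clock(clk, enable)
print(gated)
print(pulse_widths(gated))   # every surviving pulse keeps its full width
```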

 TX clockpath produces clocks for the TX drivers

• Adjustable-phase clocks for TX data and valid drivers

• Fixed-phase clocks for TX clock/track drivers


In UCIe, clocks are not continuous: they stop when a frame is done and start when a new frame is transmitted. When the clocks start, the three clocks (data, valid, clock) start simultaneously but with arbitrary phases, so we do not know which one will toggle first. The buffer therefore makes sure that PI=0 has the earliest phase and PI=63 has the latest phase. This can be achieved by enabling the clock for the clock driver first, then the clocks for the data and valid drivers.
The serializer takes low-rate parallel input data and converts it to a high-rate serial output. It is built by cascading 2:1 serializer stages. In each stage, a mux selects one input when the clock is high and the other input when the clock is low, so two inputs are serialized in one clock period and the output has twice the data rate. The process is repeated in every stage to increase the output data rate. Each serializer stage requires a clock at half the frequency of the clock used in the next stage. These clocks can be produced with clock frequency dividers that divide the highest frequency, which drives the last stage (32 GHz), down to the lower frequencies for the earlier stages.
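
A behavioral sketch of the cascaded 2:1 structure is given here. It models only the bit ordering, not timing: `mux2`, `serialize`, and `stage_clocks` are names invented for this example, and the pairing of lanes in each stage (even lanes against odd lanes) is one assumed arrangement that yields lane 0 first.

```python
def mux2(a, b):
    """One 2:1 stage: emit a's bit in one clock phase and b's bit in the other."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def serialize(lanes):
    """Serialize N parallel bit streams (N a power of two) with a tree of 2:1 stages."""
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("number of lanes must be a power of two")
    if n == 1:
        return list(lanes[0])
    return mux2(serialize(lanes[0::2]), serialize(lanes[1::2]))


def stage_clocks(last_stage_clock, stages):
    """Clock for each stage, first to last; each is half the next stage's clock."""
    return [last_stage_clock / 2 ** (stages - 1 - i) for i in range(stages)]


# Two 16-bit words presented in parallel: lane i carries bit i of each word.
word0 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
word1 = [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
lanes = list(zip(word0, word1))

print(serialize(lanes) == word0 + word1)   # True
print(stage_clocks(32.0, 4))               # [4.0, 8.0, 16.0, 32.0]
```

A 16:1 serializer needs four such stages, which is why `stage_clocks` is called with `stages=4` for the figure quoted in the text.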

 Note: this part is unsynthesizable, and the clock domain crossing is tricky to handle in typical synthesis tools.

 Check the functionality of the 16:1 serializer by measuring the BER of the serialized data stream.

 The valid signal goes high when a frame starts, to indicate that the frame is valid, and the clock starts to toggle. The clock keeps toggling until valid has been low for 16 UIs, to make sure that the data in intermediate stages such as the deserializer has been flushed out.
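
The rule for when the forwarded clock keeps running can be written as a small per-UI model. This is a simplified sketch: `clock_active` is a hypothetical function, valid is treated as a plain per-UI flag, and the 16-UI hold comes from the note above.

```python
def clock_active(valid, hold_ui=16):
    """Per-UI flag telling whether the forwarded clock should toggle.

    The clock runs while valid is high and for hold_ui more UIs after valid
    goes low, so data still in the pipeline can drain.
    """
    active = []
    low_run = hold_ui + 1                  # idle before the first frame
    for v in valid:
        low_run = 0 if v else min(low_run + 1, hold_ui + 1)
        active.append(low_run <= hold_ui)
    return active


# 2 idle UIs, 8 UIs of valid, then 20 UIs of idle.
valid = [0] * 2 + [1] * 8 + [0] * 20
active = clock_active(valid)
print(sum(active))          # 24: 8 valid UIs plus 16 hold UIs
```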
The channels for the different lanes are not well matched, so lane-to-lane skew must be adjusted before data transmission starts. The skew may span more than one UI, especially at the highest data rate (32 GT/s). The phase interpolator can adjust skew within one UI; to adjust skew greater than one UI, the FIFO shifts the bits to be transmitted, creating additional delay in units of UI.
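
The split between coarse (FIFO) and fine (PI) skew adjustment amounts to separating the integer and fractional parts of the skew. The sketch below assumes that the 64 PI steps cover exactly one UI; that scaling and the name `split_skew` are assumptions for illustration only.

```python
import math

PI_STEPS_PER_UI = 64   # assumption: the 64 PI codes span one UI


def split_skew(skew_ui):
    """Split a non-negative skew (in UI) into (fifo_shift, pi_code)."""
    if skew_ui < 0:
        raise ValueError("skew must be non-negative")
    fifo_shift = math.floor(skew_ui)               # whole UIs: shifted in the FIFO
    pi_code = round((skew_ui - fifo_shift) * PI_STEPS_PER_UI)
    if pi_code == PI_STEPS_PER_UI:                 # rounding carried into the next UI
        fifo_shift += 1
        pi_code = 0
    return fifo_shift, pi_code


print(split_skew(2.25))    # (2, 16)
print(split_skew(0.999))   # (1, 0)
```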

 The FIFO transfers data between two clock domains (core and UCIe interface).

 The TX_syncgen block produces the clock and valid patterns (10101010… for the clock and 11110000… for the valid). It also supports bitwise shifts of the patterns.
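
The two patterns and a bitwise shift can be modeled as follows. The function name `repeat_pattern` and the choice of a left rotation for the shift are assumptions; the text does not say in which direction TX_syncgen shifts.

```python
def repeat_pattern(unit, length, shift=0):
    """Repeat the bit string `unit` for `length` bits, rotated left by `shift` bits."""
    n = len(unit)
    return [int(unit[(i + shift) % n]) for i in range(length)]


clock = repeat_pattern("10", 16)
valid = repeat_pattern("11110000", 16)
valid_shifted = repeat_pattern("11110000", 16, shift=1)

print(clock)           # 1, 0, 1, 0, ...
print(valid)           # 1, 1, 1, 1, 0, 0, 0, 0, ...
print(valid_shifted)   # same pattern, one bit earlier
```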
The receiver also consists of a clock path, which recovers the TX clock and generates quarter-rate clocks from it, and a data path, which samples the data and deserializes it.

The RX data path is made of a quarter-rate sampling receiver, which needs a reference voltage level to decide whether the input is one or zero. The data, now deserialized into four streams, then passes through additional deserializing stages to produce 16 parallel streams at the output. Each stage needs a different clock, which is produced by the clock divider.


Each comparator samples its two inputs at the rising edge of the clock: when the polarity of the difference is positive it produces a one, and when it is negative it produces a zero. After this stage, latches realign the timing of the comparator outputs so that they fit in the same clock domain.
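
The RX data path ordering can be sketched behaviorally: four quarter-rate samplers each take every fourth bit, and further stages split each of those streams until there are 16. The example uses a single 1:4 split for the second step to stay short, which is an assumption; `demux` and `deserialize_16` are hypothetical names.

```python
def demux(stream, ways):
    """Split a serial stream into `ways` interleaved sub-streams."""
    return [stream[i::ways] for i in range(ways)]


def deserialize_16(bits):
    """Quarter-rate sampling (1:4) followed by a further 1:4 split per stream."""
    lanes = [None] * 16
    for q, quarter_stream in enumerate(demux(bits, 4)):        # 4 sampler outputs
        for k, sub in enumerate(demux(quarter_stream, 4)):     # later stages
            lanes[q + 4 * k] = sub                             # lane index = bit position mod 16
    return lanes


# Use bit positions as data so the ordering is easy to see.
bits = list(range(32))
lanes = deserialize_16(bits)
print(lanes[5])                              # [5, 21]
print([list(w) for w in zip(*lanes)][0])     # first 16-bit word: 0..15
```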
• UCIe logical layer performs the following functions:

1. Link Initialization and Training

 This process establishes communication between two chiplets by configuring the physical link to ensure proper data transmission.

 During initialization, the link's settings such as lane width, speed, and voltage levels
are adjusted. The training phase ensures that the link can maintain a stable
connection by calibrating the transmitters and receivers, which is critical for
achieving reliable high-speed data transfers, especially at bandwidths reaching up to
32 GT/s in UCIe.

2. Transmitting and Receiving Sideband Messages

 Sideband messages handle control information and management signals that are
sent independently from the main data stream.

 These messages carry crucial non-payload data such as power management signals,
error reports, and status updates. By separating control signals from the primary
data flow, sideband messaging ensures efficient communication without
interrupting the primary data bandwidth. This separation is essential for reducing
latency and improving communication efficiency.

3. Scrambling and Training Pattern Generation

 Scrambling prevents the transmission of repetitive patterns that could cause electromagnetic interference (EMI), while training patterns are used to test and adjust the link during initialization.

 Scrambling randomizes the data patterns being transmitted to reduce the possibility of EMI, which can affect signal integrity. Meanwhile, training patterns are predefined data sequences used to align the receiver’s clock and signal sampling points, ensuring error-free communication at high speeds.

4. Interconnect Redundancy Mapping

 This function improves reliability by dynamically re-routing data in case of failure or underutilization of certain lanes.

 In scenarios where a physical lane fails or becomes unreliable, UCIe uses redundancy mapping to shift data to functioning lanes. This ensures that data transmission can continue without loss of bandwidth or significant delays. It’s particularly important for high-reliability systems where minimizing downtime and maintaining data integrity is essential.

5. Lane Reversal

 Lane reversal allows flexible physical connections by enabling the transmitter and
receiver to function regardless of the physical ordering of the lanes.
 In UCIe, physical lanes might be connected in reverse order due to layout
constraints. The lane reversal feature automatically adjusts the logical mapping of
the lanes, ensuring that data is transmitted correctly even if the physical lane
ordering doesn't match the expected configuration. This reduces design complexity
and increases flexibility in chiplet packaging

6. Width Degradation

 This mechanism ensures that a link remains operational, even when some lanes
encounter issues like physical damage or signal loss. When this occurs, the system
reduces the overall bandwidth by utilizing the functional lanes to continue data
transmission. While the data throughput may drop, the critical benefit is that the
system avoids a complete link failure, allowing communication to persist at a
reduced rate until repairs or adjustments can be made.
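
Lane reversal and width degradation can both be pictured as changes to the logical-to-physical lane map. The sketch below is illustrative only: `physical_lane` and `degrade_width` are hypothetical helpers, and the policy of falling back to a fully working lower or upper half is an assumption of this example, not a statement of the exact rule UCIe applies.

```python
def physical_lane(logical, width, reversed_link):
    """Map a logical lane index to a physical lane, optionally reversed."""
    if not 0 <= logical < width:
        raise ValueError("lane index out of range")
    return width - 1 - logical if reversed_link else logical


def degrade_width(lane_ok):
    """Choose which physical lanes to use, given per-lane health flags.

    Assumed policy: use all lanes if healthy, otherwise fall back to a fully
    working lower or upper half. Returns an empty list if neither half works.
    """
    n = len(lane_ok)
    if all(lane_ok):
        return list(range(n))
    lower = list(range(n // 2))
    upper = list(range(n // 2, n))
    if all(lane_ok[i] for i in lower):
        return lower
    if all(lane_ok[i] for i in upper):
        return upper
    return []


health = [True] * 16
health[12] = False                       # one lane in the upper half is bad
print(physical_lane(0, 16, True))        # 15
print(degrade_width(health))             # lanes 0..7
```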

 The logical layer consists of:

1. LFSR (Linear Feedback Shift Register)

 The LFSR is a shift register used for generating a PRBS (Pseudo-Random Binary Sequence), which is crucial for scrambling data and reducing the likelihood of repeating patterns during transmission.

 The generated sequence is deterministic, meaning both the transmitter and receiver
use the same LFSR settings to scramble and later unscramble the data, ensuring
accurate communication.

2. LTFSM (Link Training Finite State Machine)

 The LTFSM is responsible for managing the link initialization and training phases of
the UCIe communication process.
 Link training involves testing and optimizing the communication link between two
chiplets. The LTFSM moves through various states during this process, adjusting
parameters like signal timing and voltage to ensure a stable connection.

o It manages the transition through multiple operational states, such as detecting the physical link, synchronizing the clocks, and verifying data transmission readiness.

o The FSM ensures that each step is completed successfully before progressing to the next, preventing transmission errors and ensuring a reliable connection at the desired bandwidth.

3. LFSR in Error Detection and Last Stage Validation:

o LFSRs help generate CRC codes used for verifying the integrity of transmitted
data. If any corruption occurs during transmission, the mismatch between
the transmitted and received CRC values will trigger error correction
processes.

o The PRBS sequence validation, typically the final stage in link training,
ensures that the transmitter (TX) and receiver (RX) are in perfect alignment,
and that the link is ready for error-free, high-speed data transmission.
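
To show why a shared, deterministic LFSR lets the receiver undo the transmitter's scrambling, here is a minimal sketch using a 7-bit PRBS7 generator (x^7 + x^6 + 1). The polynomial, seed, and class name `Lfsr` are chosen for illustration and are not claimed to be the ones UCIe specifies.

```python
class Lfsr:
    """7-bit Fibonacci LFSR for PRBS7 (x^7 + x^6 + 1); illustrative only."""

    def __init__(self, seed=0x7F):
        if not 0 < seed < 0x80:
            raise ValueError("seed must be a non-zero 7-bit value")
        self.seed = seed
        self.state = seed

    def reset(self):
        self.state = self.seed

    def next_bit(self):
        bit = ((self.state >> 6) ^ (self.state >> 5)) & 1
        self.state = ((self.state << 1) | bit) & 0x7F
        return bit


def scramble(bits, lfsr):
    """XOR data with the LFSR output; applying it twice restores the data."""
    return [b ^ lfsr.next_bit() for b in bits]


data = [0] * 8 + [1] * 8                 # a highly repetitive payload
tx, rx = Lfsr(), Lfsr()                  # both ends start from the same seed
on_wire = scramble(data, tx)
recovered = scramble(on_wire, rx)
print(on_wire)
print(recovered == data)                 # True
```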

The LTFSM (Link Training Finite State Machine) RTL (Register-Transfer Level) model
is designed as a hierarchical finite-state machine. This structure allows for a
modular and organized approach to managing complex link training processes,
making it more efficient and easier to implement in hardware.

 There is one top-level FSM. Each state can enable a sub-FSM; when the sub-FSM is done, control returns to the top-level FSM, which moves to the next state, which may in turn enable another sub-FSM.

 Example: The LTFSM might contain sub-FSMs for link detection, clock synchronization, and error handling. Each of these FSMs operates independently but is part of the larger hierarchical structure.
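
The hierarchical arrangement can be sketched in software terms as a top-level loop that hands control to one sub-FSM at a time. This is a conceptual model, not RTL: the class `SubFsm`, the function `run_top_fsm`, and the state and step names are all hypothetical.

```python
class SubFsm:
    """A sub-FSM that walks through its own steps and reports completion."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps          # list of (step_name, callable returning bool)

    def run(self, trace):
        for step_name, action in self.steps:
            trace.append(f"{self.name}.{step_name}")
            if not action():
                return False        # step failed: do not continue
        return True                 # done: hand control back to the top level


def run_top_fsm(sub_fsms):
    """Top-level FSM: enable each sub-FSM in order, advancing only when it is done."""
    trace = []
    for sub in sub_fsms:
        if not sub.run(trace):
            trace.append(f"{sub.name}: failed")
            return False, trace
    trace.append("link ready")
    return True, trace


ok = lambda: True
fsms = [
    SubFsm("detect", [("probe", ok)]),
    SubFsm("clock_sync", [("lock", ok), ("verify", ok)]),
]
done, trace = run_top_fsm(fsms)
print(done, trace)
```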
TX-Initiated Point Training is a critical process in the UCIe protocol where the
transmitter (TX) of the main module initiates a sequence of data transmission tests
to validate and optimize the link with the receiver (RX). This process ensures proper
alignment and signal integrity between the TX and RX during high-speed data
transmission. Here's a detailed breakdown of the steps involved:

Steps in TX-Initiated Point Training:

1. Exchange Sideband (SB) Messages to Start the Point Training:

 Sideband (SB) messages are exchanged between the transmitter and receiver to
negotiate the start of the point training process.

 SB messages act as control signals to initiate communication and synchronization between the TX and RX. The point training process starts once both ends agree on the parameters of the training, such as the training mode, data rate, and lane configuration.

2. Reset LFSR Pattern Generator and Comparator:

 The LFSR (Linear Feedback Shift Register) is reset to prepare for generating the
PRBS (Pseudo-Random Binary Sequence) patterns used in the training.

 The LFSR generates a pseudo-random sequence of data that will be transmitted to the receiver. Both the TX and RX reset their LFSR and comparator circuits to synchronize the generation and comparison of these patterns. The comparator at the receiver side checks for mismatches or errors in the received data, ensuring proper transmission.

3. Send Patterns via Main Band (MB):

 The TX sends the PRBS patterns through the Main Band (MB), which is the primary
data transmission channel.

 These patterns are transmitted to test the quality and integrity of the
communication link. The patterns simulate real data transmission and help detect
any signal integrity issues, such as jitter or bit errors, that could occur at high speeds.

4. Request the Partner's Log of Comparison Results:

 The TX requests the comparison results from the RX, which logs any mismatches or
errors detected during the transmission.

 The RX uses its comparator to check the received PRBS patterns against the
expected sequence. The log contains information about any errors, which helps the
TX adjust parameters (like signal amplitude or lane configuration) to optimize the
link. This feedback is crucial for fine-tuning the connection and ensuring reliable
high-speed communication.
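
The four steps can be strung together in a toy model. Everything here is a stand-in: the sideband message strings are placeholders rather than real UCIe message names, the pattern generator is a PRBS7 chosen for brevity, and `channel` is any function that maps transmitted bits to received bits.

```python
def prbs7(n, seed=0x7F):
    """Generate n bits of PRBS7 (x^7 + x^6 + 1); illustrative pattern source."""
    state, out = seed, []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(bit)
    return out


def tx_point_test(channel, n_bits=256):
    """Run one TX-initiated point test over `channel` and return the RX log."""
    sideband = ["start request", "start response"]          # step 1 (placeholder names)
    expected = prbs7(n_bits)                                 # step 2: RX side, same seed
    sent = prbs7(n_bits)                                     # step 2: TX side, same seed
    received = channel(sent)                                 # step 3: mainband transfer
    errors = sum(e != r for e, r in zip(expected, received))  # RX comparator
    sideband += ["result request", "result response"]       # step 4 (placeholder names)
    return {"errors": errors, "ber": errors / n_bits, "sideband": sideband}


def ideal(bits):
    return list(bits)


def one_flip(bits):
    out = list(bits)
    out[10] ^= 1                     # inject a single bit error
    return out


print(tx_point_test(ideal)["errors"])      # 0
print(tx_point_test(one_flip)["errors"])   # 1
```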

Note: point training can also be RX-initiated.

Mainband (MB) training performs a set of calibrations using a set of TX- or RX-initiated point trainings:

 VALVREF, DATAVREF: These steps determine the optimal voltage reference (Vref) settings for the valid and data signals. They are done at the lowest data rate, 4 GT/s.

 SPEEDIDLE: This step initializes the data transmission rate or raises it to a higher data rate if possible.

 VALTRAIN_CENTER: This calibration step finds the optimal phase interpolator (PI) code for valid transmission.

 VALTRAIN_VREF: This step recalibrates the Vref control code for valid signals, ensuring it remains optimal at higher data rates.

 DATATRAIN_CENTER1: Similar to VALTRAIN_CENTER, this step finds the optimal PI code for data transmission.

 DATATRAIN_VREF: This step recalibrates the Vref control code for data signals,
ensuring consistent signal quality.

 RXDESKEW: This step adjusts the data-to-clock skew at the receiver (RX), crucial for
synchronized signal interpretation.

 DATATRAIN_CENTER2: This further calibrates the PI code, ensuring precise control over the data signals.

 LINKSPEED: This final step checks the system's operation at the current data rate,
validating that the previous calibrations allow for efficient communication.
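
One plausible way to turn point-test results into a PI code for the *_CENTER steps is to sweep all codes, record pass or fail for each, and pick the middle of the widest passing window. The sketch below shows that idea only; it is an assumption about how centering might be done, it ignores windows that wrap around from code 63 to code 0, and `center_of_widest_pass` is a hypothetical name. The step list simply repeats the order given above.

```python
MBTRAIN_STEPS = [
    "VALVREF", "DATAVREF", "SPEEDIDLE", "VALTRAIN_CENTER", "VALTRAIN_VREF",
    "DATATRAIN_CENTER1", "DATATRAIN_VREF", "RXDESKEW", "DATATRAIN_CENTER2",
    "LINKSPEED",
]


def center_of_widest_pass(passes):
    """Return the middle code of the longest run of passing codes, or None."""
    best_start, best_len = None, 0
    start = None
    for i, ok in enumerate(list(passes) + [False]):   # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


# Hypothetical sweep: point tests pass for PI codes 20 through 44.
sweep = [20 <= code <= 44 for code in range(64)]
print(center_of_widest_pass(sweep))   # 32
```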
