Week 6 Lecture Material_watermark

The document discusses timing closure in chip design, focusing on the integration of placement and routing solutions to meet geometric and timing constraints. It covers components such as timing-driven placement, routing, and physical synthesis, and emphasizes the importance of static timing analysis (STA) for ensuring that setup and hold time constraints are met. Additionally, it introduces the Zero-Slack Algorithm for optimizing gate and wire delays while establishing timing budgets during the physical design process.

Uploaded by

R INI BHANDARI

Lecture 32: TIMING CLOSURE (PART 1)

PROF. INDRANIL SENGUPTA


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Introduction
• The layout of a chip must satisfy:
– Geometric constraints (e.g. non-overlapping cells and routability)
– Timing constraints of the design (e.g. setup and hold constraints)
• The optimization process that meets the above requirements
and constraints is often called timing closure.
• Integrates placement and routing solutions with specialized
methods to improve circuit performance.

2
Components of Timing Closure
1. Timing-driven placement
 Minimizes signal delays when assigning locations to circuit elements.
2. Timing-driven routing
 Minimizes signal delays when selecting routing topologies and specific routes.
3. Physical synthesis
 Sizing transistors or gates to decrease the delay or increase the drive strength of a
gate.
 Inserting buffers into nets to decrease propagation delays.
 Restructuring the circuit along its critical paths.

3
Background
• For many years, signal propagation delay in logic gates was
the main contributor to circuit delay, while wire delay was
negligible.
– Cell placement and wire routing did not affect circuit performance.
• Technology scaling post-1990 significantly increased the
relative impact of wire-induced delays.
– High-quality placement and routing have become critical for timing
closure.

4
Background
• Mid-1980s scenario: most of the input-to-output delay of the logic is due to gate delay (about 85% gate delay, 15% wire delay).
• Mid-1990s scenario: half of the input-to-output delay of the logic is due to wire delay (50% gate delay, 50% wire delay).
• Today's scenario: most of the input-to-output delay of the logic is due to wire delay (about 80% wire delay, 20% gate delay).

5
Quick Recap of Setup and Hold Times
• Timing optimization tools adjust propagation delays through
circuit components, with the primary goal of satisfying timing
constraints. Two types of constraints apply:
– Setup (long-path) constraints: Amount of time a data input signal
should be stable before the clock edge for each storage element.
– Hold (short-path) constraints: Amount of time a data input signal
should be stable after the clock edge at each storage element.

6
(a) Setup Constraints
• Ensure that no signal transition occurs too late.
• Initial phases of timing closure focus on these types of
constraints:
t_cycle ≥ t_combDelay + t_setup + t_skew
• Checking whether a circuit meets setup constraints requires
estimating how long signal transitions will take to propagate
from one storage element to the next.
– Typically uses Static Timing Analysis.

7
• What is Static Timing Analysis?
– Propagates actual arrival times (AAT) and required arrival times (RAT)
to the terminals of every gate or cell.
– Can quickly identify timing violations, and diagnose them by tracing
out critical paths in the circuit that are responsible for these timing
failures.
– Models propagation of signal transitions with the worst possible delay.
– Typically excludes false paths from the analysis.

8
– For every timing point x in the circuit netlist, the timing slack is
computed as:
SLACK(x) = RAT(x) – AAT(x)
– Positive slack means timing has been met; negative means violation.
– Guided by slack values, physical synthesis restructures the netlist to
make it more suitable for high-performance layout implementation.
• Gates lying on critical paths can be upsized to propagate signals faster.
• Buffers may be inserted into long critical wires.
• The netlist tree can be restructured to decrease the overall depth.

9
Hold-time Constraints
• Ensure that signal transitions do not occur too early.
– Hold violations can occur when a signal path is too short, allowing a
receiving flip-flop to capture the signal at the same cycle instead of
the next cycle.
• Hold-time constraint is given by:
t_combDelay ≥ t_hold + t_skew
– Clock skew affects hold-time constraints significantly more than setup
constraints. So, hold-time constraints are typically enforced after
synthesizing the clock network.
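
The setup and hold inequalities can be checked numerically. The following is a minimal sketch, assuming all delays are given in the same time unit; the function names and the example numbers are hypothetical and only illustrate the two inequalities stated in these slides.

```python
def setup_slack(t_cycle, t_comb_delay, t_setup, t_skew):
    """Margin in the setup inequality t_cycle >= t_combDelay + t_setup + t_skew."""
    return t_cycle - (t_comb_delay + t_setup + t_skew)


def hold_slack(t_comb_delay, t_hold, t_skew):
    """Margin in the hold inequality t_combDelay >= t_hold + t_skew."""
    return t_comb_delay - (t_hold + t_skew)


# Hypothetical numbers (e.g. in nanoseconds), chosen only for illustration.
long_path = setup_slack(t_cycle=2.0, t_comb_delay=1.5, t_setup=0.25, t_skew=0.125)
short_path = hold_slack(t_comb_delay=0.125, t_hold=0.25, t_skew=0.125)

print("setup met:", long_path >= 0)   # a non-negative margin means the constraint holds
print("hold met:", short_path >= 0)   # a negative margin means the path is too short
```

A negative hold margin is the case where the short path would typically need extra delay, which is why hold fixing is usually left until the clock skew is known.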

10
END OF LECTURE 32

11
Lecture 33: TIMING CLOSURE (PART 2)

Timing Analysis and Performance Constraints
• Almost all digital ICs are synchronous Finite State Machines (FSM).
– Transitions occur at a set clock frequency.
– A sequential circuit, unrolled in time:

Combinational Combinational Combinational


Logic FF Logic FF Logic FF
Copy 1 Copy 2 Copy 3

Clock

2
• The maximum clock frequency for a given design depends upon:
– Gate delays, which are the signal delays due to gate transitions.
– Wire delays, which are the delays associated with signal propagation
along wires.
– Clock skew.
• Need to quickly estimate sequential circuit timing:
– Perform static timing analysis (STA).
– Assume clock skew is negligible, postpone until after clock network
synthesis.

3
Static Timing Analysis
• We represent a combinational logic netlist as a directed acyclic graph (DAG).
• The inputs are annotated with times 0, 0 and 0.6 time units respectively, at
which signal transitions occur relative to the start of the clock cycle.
• The gate and wire delays are also shown.

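[Figure: example circuit with inputs a <0>, b <0> and c <0.6>, gates x (1), y (2), z (2) and w (2), and output f. Wire delays: a→y (0.15), b→x (0.1), x→y (0.1), x→z (0.3), c→z (0.1), y→w (0.2), z→w (0.25), w→f (0.2).]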

4
DAG Representation
• The graph has one vertex for each input and output, as well as one vertex
for each logic gate.
• A source node s is introduced with a directed edge to each input.
• Vertices corresponding to logic gates are labeled with the respective gate
delays.
• Directed edges from the source to the inputs are labeled with transition
times, and directed edges between gate vertices are labeled with wire
delays.

5
[Figure: the example circuit (top) and its DAG (bottom).]
DAG vertices (delay): s (0), a (0), b (0), c (0), x (1), y (2), z (2), w (2), f (0)
DAG edges (delay): s→a (0), s→b (0), s→c (0.6), a→y (0.15), b→x (0.1), x→y (0.1), x→z (0.3), c→z (0.1), y→w (0.2), z→w (0.25), w→f (0.2)

6
Actual Arrival Time (AAT)
• The AAT of a given node v ∈ V, denoted as AAT(v), is defined as the latest
transition time at v measured from the beginning of the clock cycle.
– By convention, AAT(v) records the arrival time at the output side of node v.
– In the previous example, AAT(x) = 0.1 + 1 = 1.1, AAT(y) = 1.1 + 0.1 + 2 = 3.2
• Formal definition:
$$\mathrm{AAT}(v) = \max_{u \in FI(v)} \big( \mathrm{AAT}(u) + t(u,v) \big)$$
where FI(v) is the set of all nodes from which there exists a directed edge
to v, and t(u,v) is the delay corresponding to the (u,v) edge.
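
The forward propagation of AATs can be sketched in a few lines. This is a minimal illustration, assuming the example DAG of the previous slides and assuming that t(u,v) is split into the edge delay plus the gate delay of v (the output-side convention noted above); the dictionary and function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Gate delay of every vertex (source s, inputs and output f have delay 0).
gate_delay = {"s": 0, "a": 0, "b": 0, "c": 0, "x": 1, "y": 2, "z": 2, "w": 2, "f": 0}

# Edge delays: transition times on edges from s, wire delays elsewhere.
edge_delay = {
    ("s", "a"): 0, ("s", "b"): 0, ("s", "c"): 0.6,
    ("a", "y"): 0.15, ("b", "x"): 0.1, ("x", "y"): 0.1,
    ("x", "z"): 0.3, ("c", "z"): 0.1,
    ("y", "w"): 0.2, ("z", "w"): 0.25, ("w", "f"): 0.2,
}


def compute_aat(gate_delay, edge_delay):
    """Latest arrival time at the output side of every vertex."""
    fanin = {v: [] for v in gate_delay}
    for (u, v) in edge_delay:
        fanin[v].append(u)
    aat = {}
    # static_order() yields every vertex after all of its fan-in vertices.
    for v in TopologicalSorter(fanin).static_order():
        latest_input = max((aat[u] + edge_delay[(u, v)] for u in fanin[v]), default=0)
        aat[v] = latest_input + gate_delay[v]
    return aat


aat = compute_aat(gate_delay, edge_delay)
for v in "abcxyzwf":
    print(v, round(aat[v], 2))
```

Each vertex and edge is visited once, which is the source of the linear runtime.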

7
• All AAT values in the DAG can be computed in O(|V| + |E|) time.
– Linear in the number of gates and edges.
• This linear scaling of runtime makes STA applicable to modern designs
with hundreds of millions of gates.
[Figure: the DAG annotated with the AAT (A) of each node.]
Node:  s    a    b    c     x     y     z     w      f
AAT:   0    0    0    0.6   1.1   3.2   3.4   5.65   5.85

8
Required Arrival Time (RAT)
• The RAT of a given node v ∈ V, denoted as RAT(v), is defined as the time by
which the latest transition at a given node v must occur in order for the
circuit to operate correctly within a given clock cycle.
– Unlike AATs, which are determined from multiple paths from upstream inputs
and flip-flop outputs, RATs are determined from multiple paths to downstream
outputs and flip-flop inputs.
• Formal definition:
$$\mathrm{RAT}(v) = \min_{u \in FO(v)} \big( \mathrm{RAT}(u) - t(u,v) \big)$$
where FO(v) is the set of all vertices with a directed edge from v.

9
• It is assumed that the RAT values for the outputs are given.
• For the example, suppose that RAT(f) = 5.5 .

[Figure: the DAG annotated with the RAT (R) of each node.]
Node:  s       a      b       c      x      y     z      w     f
RAT:   -0.35   0.95   -0.35   0.95   0.75   3.1   3.05   5.3   5.5

10 10
Slack Computation
• The correct operation of the chip with respect to setup constraints (e.g.
maximum path delay), requires that AAT at each node does not exceed RAT.
– That is, for all vertices v ∈ V, we must have AAT(v) ≤ RAT(v).
• The slack of a node v is computed as:
slack (v ) = RAT (v ) − AAT (v )
– Critical paths or critical nets are signals that have negative slack.
– Non-critical paths or non-critical nets have positive slack.
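
The backward RAT pass and the slack computation can be sketched on the same example. This is an illustrative sketch, assuming the example DAG with RAT(f) = 5.5 and the output-side convention for arrival times; the names `timing_table`, `gate_delay` and `edge_delay` are hypothetical.

```python
from graphlib import TopologicalSorter

gate_delay = {"s": 0, "a": 0, "b": 0, "c": 0, "x": 1, "y": 2, "z": 2, "w": 2, "f": 0}
edge_delay = {
    ("s", "a"): 0, ("s", "b"): 0, ("s", "c"): 0.6,
    ("a", "y"): 0.15, ("b", "x"): 0.1, ("x", "y"): 0.1,
    ("x", "z"): 0.3, ("c", "z"): 0.1,
    ("y", "w"): 0.2, ("z", "w"): 0.25, ("w", "f"): 0.2,
}


def timing_table(gate_delay, edge_delay, rat_at_outputs):
    """Return {vertex: (AAT, RAT, slack)}; every vertex without fan-out needs a given RAT."""
    fanin = {v: [] for v in gate_delay}
    fanout = {v: [] for v in gate_delay}
    for (u, v) in edge_delay:
        fanin[v].append(u)
        fanout[u].append(v)
    order = list(TopologicalSorter(fanin).static_order())

    aat, rat = {}, dict(rat_at_outputs)
    for v in order:                      # forward pass: latest arrival
        latest = max((aat[u] + edge_delay[(u, v)] for u in fanin[v]), default=0)
        aat[v] = latest + gate_delay[v]
    for v in reversed(order):            # backward pass: earliest requirement
        if v not in rat:
            rat[v] = min(rat[u] - gate_delay[u] - edge_delay[(v, u)] for u in fanout[v])
    return {v: (aat[v], rat[v], rat[v] - aat[v]) for v in order}


timing = timing_table(gate_delay, edge_delay, {"f": 5.5})
for v in "sabcxyzwf":
    a, r, s = timing[v]
    print(v, round(a, 2), round(r, 2), round(s, 2))
```

Vertices whose slack is negative here (s, b, x, y, z, w, f) lie on paths that violate the setup requirement.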

11
Final Result with Slacks Computed
(A: AAT, R: RAT, S: Slack)
Node    A      R       S
s       0     -0.35   -0.35
a       0      0.95    0.95
b       0     -0.35   -0.35
c       0.6    0.95    0.35
x       1.1    0.75   -0.35
y       3.2    3.1    -0.1
z       3.4    3.05   -0.35
w       5.65   5.3    -0.35
f       5.85   5.5    -0.35

12 12
Current Practice
• In modern designs, separate timing analyses are performed for the cases of
rise delay (rising transitions) and fall delay (falling transitions).
• Signal integrity extensions to STA consider changes in delay due to switching
activity on neighboring wires of the path under analysis.
– For signal integrity analysis, the STA tool keeps track of windows (intervals) of
AATs and RATs.
– Typically executes multiple timing analysis iterations before these timing
windows stabilize.
• Statistical STA is a generalization of STA where gate and wire delays are
modeled by random variables and represented by probability distributions.

13
Drawbacks of STA
1. Assumption of a clock.
– Not applicable to asynchronous subsystems.
2. Assumption that all paths are sensitizable.
– Optimization tools waste considerable runtime and chip resources (e.g.
power, area, speed) satisfying phantom constraints.
– False paths, which are never activated.
– Multi-cycle paths, where signal transitions do not need to finish within one
clock cycle.

14
END OF LECTURE 33

15
Lecture 34: TIMING CLOSURE (PART 3)

Delay Budgeting with the Zero-Slack Algorithm
• In timing-driven physical design, both gate and wire delays must be
optimized to obtain a timing-correct layout.
• There exists a dilemma:
– Timing optimization requires knowledge of capacitive loads, and hence the
actual wire length.
– Wire lengths are unknown until placement and routing are completed.
• Timing budgets are used to establish delay and wire length constraints for
each net, for guiding placement and routing to a timing-correct result.
– Best-known approach to timing budgeting is the Zero Slack Algorithm.

2
Basic Idea
• Some notations:
– Consider a netlist consisting of logic gates v1, v2, …, vn
– Consider a set of nets e1, e2, …, em, where ei is the output net of gate vi.
– Let t(v) and t(e) denote gate delay and wire delay, respectively.

3
• The ZSA takes the netlist as input, and tries to decrease positive slacks of all
nodes to zero by increasing t(v) and t(e) values.
• These increased delay values together constitute the timing budget TB(v) of
node v, which should not be exceeded during placement and routing.
TB(v) = t(v) + t(e)
• If TB(v) is exceeded, then the place-and-route tool typically
(i) decreases the wirelength of e, or (ii) changes the size of gate v.
– The delay impact of a wire or gate size change can be estimated using the Elmore
delay model.

4
• If most arcs (branches) of a timing path are within budget, then the path
may meet its timing constraints even if some arcs exceed their budgets.
– Thus, another approach to satisfying the timing budget is rebudgeting.

• The zero slack algorithm shall be explained with the help of an illustrative
example.

5
Basic Steps in ZSA
1. Determine the initial slacks of all the nodes, and select a node vmin with
minimum positive slack slackmin.
2. Find a path of vertices that dominates slackmin, i.e. any change in the
delays in vertices along the path will cause slackmin to change.
3. Evenly distribute the slack by increasing TB(v) for each vertex v in the
path. Each budget increment will decrement the slack value of a vertex.
By repeating the process, the slack of each node in V will end up at zero.
The resulting timing budgets at all nodes are the final output of ZSA.
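
The three steps above can be sketched as follows. This is a simplified illustration rather than the full algorithm: it assumes that each node carries a single delay (the wire delay of its output net is folded into the node), that the input arrival times and output RATs are given, and that exact rational arithmetic is acceptable. The function name `zero_slack` and its arguments are hypothetical.

```python
from fractions import Fraction
from graphlib import TopologicalSorter


def zero_slack(delay, edges, arrival, required):
    """Return node -> budget increment that drives every positive slack to zero."""
    delay = {v: Fraction(d) for v, d in delay.items()}
    fanin = {v: [] for v in delay}
    fanout = {v: [] for v in delay}
    for u, v in edges:
        fanin[v].append(u)
        fanout[u].append(v)
    order = list(TopologicalSorter(fanin).static_order())
    budget = {v: Fraction(0) for v in delay}

    def analyze():
        aat, rat = {}, {}
        for v in order:
            start = max(aat[u] for u in fanin[v]) if fanin[v] else Fraction(arrival[v])
            aat[v] = start + delay[v]
        for v in reversed(order):
            if fanout[v]:
                rat[v] = min(rat[w] - delay[w] for w in fanout[v])
            else:
                rat[v] = Fraction(required[v])
        return aat, rat

    while True:
        aat, rat = analyze()
        slack = {v: rat[v] - aat[v] for v in order}
        positive = [v for v in order if slack[v] > 0]
        if not positive:
            return budget
        v_min = min(positive, key=lambda v: slack[v])       # step 1
        s_min = slack[v_min]
        path = [v_min]                                      # step 2
        while True:   # extend backward along the arc that sets the AAT
            head = path[0]
            prev = [u for u in fanin[head]
                    if slack[u] == s_min and aat[u] + delay[head] == aat[head]]
            if not prev:
                break
            path.insert(0, prev[0])
        while True:   # extend forward along the arc that sets the RAT
            tail = path[-1]
            nxt = [w for w in fanout[tail]
                   if slack[w] == s_min and rat[w] - delay[w] == rat[tail]]
            if not nxt:
                break
            path.append(nxt[0])
        share = s_min / len(path)                           # step 3
        for v in path:
            delay[v] += share
            budget[v] += share


# Hypothetical circuit: chain a -> b -> c plus a side input d -> c, all delays 1.
budget = zero_slack(
    delay={"a": 1, "b": 1, "c": 1, "d": 1},
    edges=[("a", "b"), ("b", "c"), ("d", "c")],
    arrival={"a": 0, "d": 0},
    required={"c": 6},
)
print({v: str(b) for v, b in budget.items()})
```

In this small case the chain a, b, c shares its slack of 3 evenly, and the side input d then absorbs its remaining slack on its own.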

6
Example
• Use the zero-slack algorithm to distribute slack
• Format: <AAT, Slack, RAT>, [timing budget]
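[Figure: example circuit with inputs I1 to I4, five gates with delays 2, 4, 6, 3 and 0, and outputs O1 and O2. Annotations:]
I1: <1,4,5> [0]
I2: <0,5,5> [0]
I3: <1,6,7> [0]
I4: <3,5,8> [0]
Output of gate (2): <3,4,7> [0]
Output of gate (4): <7,4,11> [0]
Output of gate (3): <6,5,11> [0]
Output of gate (6): <13,4,17> [0], driving O1: <13,4,17>
Output of gate (0): <6,8,14> [0], driving O2: <6,8,14>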

7 7
Example
• Find the path with the minimum non-zero slack (MARKED IN RED).

[Figure: the initial circuit with unchanged annotations; the nodes with the minimum slack of 4 (I1, the outputs of gates (2), (4) and (6), and O1) form the path marked in red.]

8 8
Example
• Find the path with the minimum non-zero slack.
• Distribute the slacks and update the timing budgets.
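[Figure: the circuit after distributing the slack of 4 along the path from I1 to O1. Annotations:]
I1: <1,0,1> [1]
I2: <0,2,2> [0]
I3: <1,4,5> [0]
I4: <3,4,7> [0]
Output of gate (2): <4,0,4> [1]
Output of gate (4): <9,0,9> [1]
Output of gate (3): <6,4,10> [0]
Output of gate (6): <16,0,16> [1], driving O1: <17,0,17>
Output of gate (0): <6,8,14> [0], driving O2: <6,8,14>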

9 9
Example
• Find the path with the minimum non-zero slack.
• Distribute the slacks and update the timing budgets.
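[Figure: the circuit after the slack of 2 at I2 is converted into timing budget. Annotations:]
I1: <1,0,1> [1]
I2: <0,0,0> [2]
I3: <1,4,5> [0]
I4: <3,4,7> [0]
Output of gate (2): <4,0,4> [1]
Output of gate (4): <9,0,9> [1]
Output of gate (3): <6,4,10> [0]
Output of gate (6): <16,0,16> [1], driving O1: <17,0,17>
Output of gate (0): <6,8,14> [0], driving O2: <6,8,14>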

11 11
Example
• Find the path with the minimum non-zero slack.
• Distribute the slacks and update the timing budgets.
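[Figure: the circuit while the slack at I3 and at the output of gate (3) is being distributed. Annotations:]
I1: <1,0,1> [1]
I2: <0,0,0> [2]
I3: <1,2,3> [2]
I4: <3,2,5> [0]
Output of gate (2): <4,0,4> [1]
Output of gate (4): <9,0,9> [1]
Output of gate (3): <6,2,8> [2]
Output of gate (6): <16,0,16> [1], driving O1: <17,0,17>
Output of gate (0): <6,8,14> [0], driving O2: <6,8,14>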

13 13
Example
• Find the path with the minimum non-zero slack.
• Distribute the slacks and update the timing budgets.
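[Figure: the circuit after the slack at I3 and at the output of gate (3) has been fully distributed. Annotations:]
I1: <1,0,1> [1]
I2: <0,0,0> [2]
I3: <1,0,1> [3]
I4: <3,1,4> [0]
Output of gate (2): <4,0,4> [1]
Output of gate (4): <9,0,9> [1]
Output of gate (3): <7,0,7> [3]
Output of gate (6): <16,0,16> [1], driving O1: <17,0,17>
Output of gate (0): <10,4,14> [0], driving O2: <10,4,14>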

15 15
Example
• Find the path with the minimum non-zero slack.
• Distribute the slacks and update the timing budgets.
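[Figure: the final circuit, in which every slack is zero. Annotations:]
I1: <1,0,1> [1]
I2: <0,0,0> [2]
I3: <1,0,1> [3]
I4: <3,0,3> [1]
Output of gate (2): <4,0,4> [1]
Output of gate (4): <9,0,9> [1]
Output of gate (3): <7,0,7> [3]
Output of gate (6): <16,0,16> [1], driving O1: <17,0,17>
Output of gate (0): <10,0,10> [4], driving O2: <14,0,14>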

16 16
A Modification: Early Mode Analysis
• ZSA uses late-mode analysis with respect to setup constraints, i.e. the
latest times by which signal transitions can occur for the circuit to operate
correctly.
• Correct operation also depends on satisfying hold-time constraints on the
earliest signal transition times.
• Early-mode analysis considers these constraints.

17
How It Works
• To correctly analyze this timing constraint, the earliest actual arrival time
of signal transitions at each node must be determined.
• The required arrival time of a sequential element in early mode is the time
at which the earliest signal can arrive and still satisfy the library-cell hold-
time requirement.
• For each gate v, AATEM(v) ≥ RATEM(v) must be satisfied.
– AATEM(v) is the earliest actual arrival time of a signal transition at gate v
– RATEM(v) is the required arrival time in early mode at gate v

18
• The early-mode slack can be defined as:
slackEM(v) = AATEM(v) – RATEM(v)

• When adapted to early-mode analysis, ZSA is also called the near zero-
slack algorithm.
– The modified algorithm seeks to decrease TB(v) by decreasing t(v) or t(e), so
that all nodes have minimum early-mode timing slacks.
– Since t(v) and t(e) cannot be negative, node slacks may not necessarily all
become zero.
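
Early-mode analysis differs from late-mode analysis mainly in taking the earliest instead of the latest arriving input. The following is a minimal sketch under assumed data: the two-path circuit, the required arrival time `rat_em` and all names are hypothetical.

```python
from graphlib import TopologicalSorter


def early_mode_aat(gate_delay, edge_delay):
    """Earliest arrival time at the output side of every vertex (min instead of max)."""
    fanin = {v: [] for v in gate_delay}
    for (u, v) in edge_delay:
        fanin[v].append(u)
    aat_em = {}
    for v in TopologicalSorter(fanin).static_order():
        earliest = min((aat_em[u] + edge_delay[(u, v)] for u in fanin[v]), default=0)
        aat_em[v] = earliest + gate_delay[v]
    return aat_em


# Hypothetical circuit: input i reaches flip-flop input ff over a fast and a slow branch.
gate_delay = {"i": 0, "fast": 1, "slow": 3, "ff": 0}
edge_delay = {("i", "fast"): 0, ("i", "slow"): 0, ("fast", "ff"): 0, ("slow", "ff"): 0}

aat_em = early_mode_aat(gate_delay, edge_delay)
rat_em = 2                                  # assumed early-mode requirement at ff
slack_em = aat_em["ff"] - rat_em            # slackEM = AATEM - RATEM
print(aat_em["ff"], slack_em)
```

Here the fast branch delivers the signal before the assumed requirement, so the early-mode slack is negative and the fast branch would need padding.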

19
To Summarize
• In practice, if the delay of a node does not satisfy its early-mode timing
budget, the delay constraint can be satisfied by adding additional delay
(padding) to appropriate components.
– The additional delay can violate late-mode timing constraints.
• Thus, a circuit should be first designed with ZSA and late-mode analysis.
Early-mode analysis may then be used to confirm that early-mode
constraints are satisfied, or to guide circuit modifications to satisfy such
constraints.

20
END OF LECTURE 34

21
Lecture 35: TIMING CLOSURE (PART 4)

False Paths
 Paths that physically exist in a design but are not logic
/functional paths.
 These paths never get sensitized under any input conditions.
 An example is shown on the next slide.

2
An example:
The path of length 400 is never exercised.
[Figure: two multiplexers in series, the first producing fi. Each has a 1-input with delay 200 (u, x) and a 0-input with delay 100 (v, y).]

3
Multi-cycle Paths
• Data paths that require more than one clock period for
execution.

2 clock period delay

4
Timing Analysis Problems
• We want to determine the true critical paths of a circuit in
order to:
– Determine the minimum cycle time for which the circuit will function.
– Identify critical paths for performance optimization, so that the wrong
(non-critical) paths are not optimized.
• Implications:
– Do not want false paths (produced by static delay analysis).
– Delay model is worst case model.

5
Functional Timing Analysis
• Estimate when the output of a given circuit gets stable.

[Figure: a combinational block observed between clock edges at t = 0 and t = T.]

6
Why Timing Analysis?
• Timing verification
– Verifies whether a design meets a given timing constraint.
• Example: cycle-time constraint
• Timing optimization
– Needs to identify critical portion of a design for further optimization.
• Critical path identification
• In both cases, the higher the accuracy, the better.

7
Timing Analysis - Basics
• Naïve approach - Simulate all input vectors with SPICE
– Accurate, but too expensive.
• Gate-level timing analysis
– Less accurate than SPICE due to the level of abstraction, but much more
efficient.
– Scenario:
• Gate/wire delays are pre-characterized (accuracy loss).
• Perform timing analysis of a gate-level circuit assuming the gate/wire
delays.

8
Gate-level Timing Analysis
• A naive approach is topological analysis.
– Easy longest-path problem
– Linear in the size of a network
• Not all paths can propagate signal events.
– False paths
– If all longest paths are false, topological analysis gives a delay overestimate.
• Functional timing analysis = false-path-aware timing analysis
– Compute false-path-aware arrival time
[Figure: example network with inputs x1 and x2 (arr(x1) = 0, arr(x2) = 0), unit gate delays, and output z; the question is the false-path-aware value of arr(z).]

9
Example: 2-bit Carry-skip Adder
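[Figure: 2-bit carry-skip adder with inputs c_in, a0, b0, a1, b1 and outputs s0, s1, c_out; a multiplexer with inputs 1 and 0 selects the carry output. Two paths are marked, one of length 5 and one of length 1.]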

10
False Path Analysis - Basics
• Is a path responsible for delay?
– If the answer is no, can ignore the path for delay computation.
• Check the falsity of long paths until we find the longest true path.
– How can we determine whether a path is false?

• Delay underestimation is unacceptable.
– Can lead to overlooking a timing violation.
• Delay overestimation is not desirable, but acceptable.
– Topological analysis can give an overestimate, but never an underestimate.

11
Possible Approach :: Boolean Difference
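[Figure: three consecutive nodes fi-1, fi, fi+1 on a path.]
• Path P = {f0, f1, f2, … , fn}
• The Boolean difference
$$\frac{\partial f_i}{\partial f_{i-1}}$$
gives conditions under which node fi is "sensitive" to node fi-1.
• So the output of P is sensitive to f0 if the product of the Boolean differences along the path is not identically zero:
$$\prod_{i=1}^{n} \frac{\partial f_i}{\partial f_{i-1}} \neq 0$$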

12
Example :: Static False Path
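[Figure: two multiplexers in series producing fi and fj. Each has a 1-input with delay 200 (u, x) and a 0-input with delay 100 (v, y).]
The path is not sensitizable and hence is false. Hence,
$$\frac{\partial f_i}{\partial u} \cdot \frac{\partial f_j}{\partial x} = 0$$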

13
Definitions
• Given a simple gate (i.e. AND, OR, NAND, NOR), a controlling value on an
input determines the output of the gate independent of the other inputs.
• Given a simple gate (i.e. AND, OR, NAND, NOR), a non-controlling value on
an input cannot determine the output of the gate independent of the
other inputs.
– 0 is a controlling value for AND gate; 1 is non-controlling value for AND gate.
• Controlling / non-controlling value is merely a specialization of the
Boolean difference to simple gates.
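
The link between the Boolean difference and non-controlling values can be checked by brute force on small gates. This is an illustrative sketch; the helper `boolean_difference` and the gate lambdas are hypothetical names, and the enumeration is only practical for a handful of variables.

```python
from itertools import product


def boolean_difference(f, var, variables):
    """Assignments of the other variables for which f changes when var toggles."""
    others = [v for v in variables if v != var]
    sensitive = []
    for values in product([0, 1], repeat=len(others)):
        env = dict(zip(others, values))
        # The Boolean difference is f(var=0) XOR f(var=1).
        if f(**env, **{var: 0}) != f(**env, **{var: 1}):
            sensitive.append(env)
    return sensitive


and2 = lambda a, b: a & b
or2 = lambda a, b: a | b

and_sensitive = boolean_difference(and2, "a", ["a", "b"])
or_sensitive = boolean_difference(or2, "a", ["a", "b"])
print(and_sensitive)  # AND is sensitive to a only when b holds its non-controlling value
print(or_sensitive)   # OR is sensitive to a only when b holds its non-controlling value
```

For the AND gate the only sensitizing assignment is b = 1, and for the OR gate it is b = 0, which are exactly the non-controlling values.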

14
[Figure: two two-input gates, f and g, each with inputs a and b.]

15
Controlling/Non-Controlling Values
• AND gate: controlling value 0, controlled (output) value 0, non-controlling value 1.
• OR gate: controlling value 1, controlled (output) value 1, non-controlling value 0.

16
END OF LECTURE 35

17
Lecture 36: TIMING CLOSURE (PART 5)

Static Sensitization
• A path is statically sensitizable if there exists an input vector such that
all the side inputs to the path are set to non-controlling values.
– This is independent of gate delays.
[Figure: example circuit with all inputs arriving at t = 0. A side input carries a controlling value, so the marked paths are not statically sensitizable. Is the longest true path of length 2?]

2
Static Sensitization (contd.)
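• The (dashed) path is responsible for delay!
• Static sensitization underestimates the delay (delay = 2 when the true
delay = 3).
– The sensitization condition is incorrect.
[Figure: example circuit with signal values and event times annotated along the dashed path.]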

3
What is Wrong with Static Sensitization?
• The idea of forcing non-controlling values to side inputs is
okay, but timing was ignored.
– The same signal can have a controlling value at one time and a non-
controlling value at another time.

• How about timing simulation as a correct method?

4
Timing Simulation
[Figure: timing simulation of an example circuit for one input vector, with gate delays and signal values annotated.]
Implies that delay = 0 for these inputs.
BUT!

5
[Figure: the same timing simulation after one gate delay is reduced from 4 to 2.]
Implies that delay = 4 with the same set of inputs.

6
What is Wrong with Timing Simulation?
• If gate delays are reduced, delay estimates can increase.
• Not acceptable since
– Gate delays are just upper-bounds, actual delay is in [0,d].
• Delay uncertainty due to manufacturing.
– We are implicitly analyzing a family of circuits where gate delays are
within the upper-bounds.

7
Monotone Speedup Property
• Definition: A delay estimate has the monotone speedup property if, for any
circuit C and any circuit C’ obtained from C by reducing some gate delays,
delay_estimate(C’) ≤ delay_estimate(C).

Timing simulation does not have this property.

8
Timing Simulation Revisited
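[Figure: the example circuit simulated with signals whose transition times are only bounded from above.]
A waveform annotated with 4 means that the rising signal occurs anywhere
between t = -∞ and t = 4.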

9
What we just saw …
• Timed 3-valued (0,1,X) simulation
– called X-valued simulation.
• Monotone speedup property is satisfied.

10
SAT Based False Path Analysis
• Satisfiability (SAT) solvers are used for solving a wide range of
problems.
• Modern SAT solvers run very fast and can handle a large
number of variables.
• Basically, given a Boolean function F in product-of-sum form, a
SAT solver tries to find some assignment of the variables for
which F = 1.

11
The SAT Formulation
Decision problem:
Is there an input vector under which the output gets stable only after t = T?
Idea:
1. Characterize the set of all input vectors S(T) that make the output stable
no later than t = T.
2. Check if S(T) contains S, the set of all possible input vectors.
This check is solved as a SAT problem:
• Is S \ S(T) empty? (set difference + emptiness check)
• Let F and F(T) be the characteristic functions of S and S(T).
• Is F · !F(T) satisfiable? If it is, some input vector makes the output stable only after t = T.

12
Example
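[Figure: example circuit with primary inputs a, b, c, internal signals d, e, f, and output g.]
Assume all the PIs arrive at t = 0, all gate delays = 1.
Is there an input vector for which the output becomes stable only after t = 2?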

13
g(1,t=2): the set of input vectors under which
g gets stable to value = 1 no later than t = 2
[Figure: the example circuit with inputs a, b, c, internal signals d, e, f, and output g.]
g(1,t=2) = d(1,t=1) ∩ f(1,t=1)
= (a(0,t=0) ∩ b(0,t=0)) ∩ (c(1,t=0) ∪ e(1,t=0))
= !a!b(c ∪ ∅) = !a!bc = S1(t=2)
g(1,t=∞) = on-set = !a!bc = g(1,t=2) = S1
On-set: stabilized by t = 2? Yes, since g(1,t=∞) = g(1,t=2).

14
g(0,t=2): the set of input vectors under which
g gets stable to value = 0 no later than t = 2
[Figure: the example circuit with inputs a, b, c, internal signals d, e, f, and output g.]
g(0,t=2) = d(0,t=1) ∪ f(0,t=1)
= (a(1,t=0) ∪ b(1,t=0)) ∪ (c(0,t=0) ∩ e(0,t=0))
= (a+b) + (!c ∩ ∅) = a+b = S0(t=2)
g(0,t=∞) = off-set = a+b+!c = S0

15
g(0,t=2): the set of input vectors under which
g gets stable to 0 no later than t = 2
[Figure: the example circuit with inputs a, b, c, internal signals d, e, f, and output g.]
Off-set: NOT stabilized by t = 2 under abc = 000.
g(0,t=2) = a+b
g(0,t=∞) = off-set = a+b+!c
g(0,t=∞) \ g(0,t=2) = (a+b+!c) · !(a+b) = !a !b !c = satisfiable
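
The final satisfiability check is small enough to verify by enumeration. The sketch below uses brute force over the three inputs in place of a SAT solver, which is an assumption made only to keep the example self-contained; the two characteristic functions are taken from the expressions above, and the function names are hypothetical.

```python
from itertools import product


def off_set(a, b, c):
    """Characteristic function of g(0, t=inf) = a + b + !c."""
    return bool(a or b or not c)


def stable_zero_by_2(a, b, c):
    """Characteristic function of g(0, t=2) = a + b."""
    return bool(a or b)


# Input vectors in the off-set that are NOT yet stable at t = 2.
witnesses = [
    abc for abc in product([0, 1], repeat=3)
    if off_set(*abc) and not stable_zero_by_2(*abc)
]
print(witnesses)  # a non-empty list means F · !F(T) is satisfiable
```

A production flow would hand the same formula to a SAT solver in product-of-sums form instead of enumerating all input vectors.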

16
Summary
• False-path-aware arrival time analysis is well-understood.
– Practical algorithms exist.
• Can handle industrial circuits easily.
• Remaining problems:
– Incremental analysis (make it so that a small change in the circuit
does not make the analysis start all over).
– Integration with logic optimization.
– DSM issues such as cross-talk-aware false path analysis.

17
END OF LECTURE 36

18
Lecture 37: TIMING DRIVEN PLACEMENT

Timing Driven Placement (TDP)
• TDP optimizes circuit delay, either to satisfy all timing constraints, or to
achieve the greatest possible clock frequency.
• It uses the results of STA to identify critical nets and attempts to improve
signal propagation delay through those nets.
• TDP minimizes one or both of the following:
a) Worst negative slack (WNS)
$$\mathrm{WNS} = \min_{\tau \in T} \mathrm{slack}(\tau)$$
b) Total negative slack (TNS)
$$\mathrm{TNS} = \sum_{\tau \in T,\ \mathrm{slack}(\tau) < 0} \mathrm{slack}(\tau)$$
where T is the set of timing endpoints (i.e. primary outputs, or inputs to
flip-flops).
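
Both metrics are simple reductions over the endpoint slacks. A minimal sketch, assuming the slacks are already available from STA; the endpoint names and values are hypothetical.

```python
def worst_negative_slack(endpoint_slack):
    """WNS: the minimum slack over all timing endpoints."""
    return min(endpoint_slack.values())


def total_negative_slack(endpoint_slack):
    """TNS: the sum of the slacks of all endpoints whose slack is negative."""
    return sum(s for s in endpoint_slack.values() if s < 0)


# Hypothetical endpoint slacks, as reported by a timing analysis run.
endpoint_slack = {"ff1/D": -0.5, "ff2/D": 0.25, "out1": -0.25}

wns = worst_negative_slack(endpoint_slack)
tns = total_negative_slack(endpoint_slack)
print(wns, tns)
```

WNS captures the single worst endpoint, while TNS also reflects how many endpoints fail and by how much.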

2
Techniques for Timing-Driven Placement
• Algorithmic techniques for TDP can be categorized as net-based,
path-based, or integrated.
• Two types of net-based techniques:
1. Delay budgeting, which assigns upper bounds to the timing or length of
individual nets.
2. Net weighting, which assigns higher priorities to critical nets during placement.
• Path-based techniques seek to shorten or speed up all timing-critical paths
rather than individual nets.
– More accurate, but they do not scale to large designs because the number of paths
can grow exponentially with the number of gates (e.g. in a multiplier).

3
• Both path-based and net-based approaches rely on support within the
placement algorithm, and require a dedicated infrastructure for
incremental calculation of timing statistics and parameters.
• Integrated techniques typically use constraint-driven mathematical
formulation in which STA results are incorporated as constraints and
possibly in the objective function.
• In practice, some industrial flows do not incorporate timing-driven
methods during initial placement because timing information can be quite
inaccurate until locations are available.
– Instead, subsequent placement iterations, especially during detailed
placement, perform timing optimizations.

4
Net Based Techniques
• These approaches impose either quantitative priorities that
reflect timing criticality (net weights), or upper bounds on the
timing of nets in the form of net constraints (delay budgets).
• Net weights are more effective at the early design stages,
while delay budgets are more meaningful if timing analysis is
more accurate.

5
(a) Net Weighting
• A traditional placer optimizes total wirelength and routability.
• To account for timing, a placer can minimize the total weighted wirelength,
where each net is assigned a net weight.
– The higher the net weight is, the more timing-critical the net is.
• Net weights can be assigned either statically or dynamically to improve the
timing.

6
Static Net Weights
• They are computed before placement and do not change.
• They are usually based on slack: the more critical the net (i.e. the smaller
its slack), the greater its weight.
• Static net weights can be either discrete:
$$w = \begin{cases} \omega_1 & \text{if } \mathrm{slack} > 0 \\ \omega_2 & \text{if } \mathrm{slack} \le 0 \end{cases} \qquad \text{where } \omega_1 > 0,\ \omega_2 > 0,\ \text{and } \omega_2 > \omega_1$$
• Or they can be continuous:
$$w = \left(1 - \frac{\mathrm{slack}}{t}\right)^{\alpha}$$
where t is the longest path delay and α is a criticality exponent.
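
The two static weighting schemes can be sketched directly from their formulas. The default values for ω1, ω2 and α below are hypothetical and only chosen to make the example concrete.

```python
def discrete_weight(slack, w1=1.0, w2=2.0):
    """Return w1 for nets with positive slack and the larger w2 otherwise."""
    assert 0 < w1 < w2
    return w1 if slack > 0 else w2


def continuous_weight(slack, longest_path_delay, alpha=2.0):
    """Return (1 - slack / t) ** alpha, which grows as the slack shrinks."""
    return (1.0 - slack / longest_path_delay) ** alpha


# Hypothetical nets on a design whose longest path delay is 8 time units.
relaxed = continuous_weight(slack=2.0, longest_path_delay=8.0)
violating = continuous_weight(slack=-2.0, longest_path_delay=8.0)
print(discrete_weight(2.0), discrete_weight(-2.0))
print(relaxed, violating)
```

A net with negative slack gets a weight above 1 in the continuous scheme, and a larger α widens the gap between critical and non-critical nets.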

7
• Alternatively, net weights can be assigned based on sensitivity, as:
$$w = w_0 + \alpha\,(\mathrm{slack}_{target} - \mathrm{slack}) \cdot s_w^{SLACK} + \beta \cdot s_w^{TNS}$$
where w_0 is the original net weight
slack is the computed slack value of the net
slack_target is the target slack of the design
s_w^SLACK is the slack sensitivity to the weight of the net
s_w^TNS is the TNS sensitivity to the net weight
α and β are constant bounds on the net weight change that control the
tradeoff between WNS and TNS.
(TNS: Total Negative Slack; WNS: Worst Negative Slack)

8
Dynamic Net Weights
• They are computed during placement iterations and keep an updated timing
profile.
• This can be more effective than static net weights, which are computed
before placement and can become outdated when net lengths change.
• Estimated slack of a net at iteration k can be computed as:
$$\mathrm{slack}_k = \mathrm{slack}_{k-1} - s_L^{DELAY} \cdot \Delta L$$
where ΔL is the change in wirelength between iterations (k-1) and k
slack_k is the slack at iteration k
s_L^DELAY is the delay sensitivity to the wirelength

9
• After the timing information has been updated, the net weights should be
adjusted accordingly.
– This incremental method of weight modification is based on previous iterations.
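• The net criticality at iteration k is computed as:
$$\upsilon_k = \begin{cases} \tfrac{1}{2}(\upsilon_{k-1} + 1) & \text{if among the top 3\% of critical nets} \\ \tfrac{1}{2}\,\upsilon_{k-1} & \text{otherwise} \end{cases}$$
• And then the net weight is updated as:
$$w_k = w_{k-1} \cdot (1 + \upsilon_k)$$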
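
The incremental update can be traced over a few iterations. This is a minimal sketch, assuming the placer supplies a flag that says whether the net is currently among the top 3% of critical nets; the function name and the starting values are hypothetical.

```python
def update_net_weight(weight, criticality, is_top_critical):
    """One iteration of the criticality and weight update for a single net."""
    if is_top_critical:
        criticality = 0.5 * (criticality + 1.0)   # moves toward 1 while critical
    else:
        criticality = 0.5 * criticality           # decays toward 0 otherwise
    return weight * (1.0 + criticality), criticality


# Hypothetical net: critical in iterations 1 and 2, non-critical in iteration 3.
weight, criticality = 1.0, 0.0
history = []
for is_top_critical in (True, True, False):
    weight, criticality = update_net_weight(weight, criticality, is_top_critical)
    history.append((weight, criticality))
print(history)
```

Because the criticality is halved rather than reset, a net that was recently critical keeps part of its priority for a few more iterations.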

10
Integrated Technique using Linear Programs
• Unlike net-based methods, where the timing requirements are mapped to
net weights or net constraints, path-based methods directly optimize the
design’s timing.
– As the number of paths can grow quickly, this method is much slower than
net-based approaches.
• To improve scalability, timing analysis may be captured by a set of
constraints and an optimization objective.
– For example, in a linear programming framework.

11
• In the context of timing-driven placement, a linear program (LP) minimizes
a function of slack (e.g. TNS), subject to two main types of constraints:
1. Physical constraints, which define the locations of the cells.
2. Timing constraints, which define the slack requirements.

• In addition, some electrical constraints may also be incorporated.

12
Physical Constraints:
• Given a set of cells V and the set of nets E, we define the notations:
– x_v and y_v denote the center of cell v ∈ V
– V_e denotes the set of cells connected to net e ∈ E
– left(e), right(e), bottom(e), and top(e) respectively denote the coordinates
of the left, right, bottom, and top boundaries of e’s bounding box
– δ_x(v,e) and δ_y(v,e) denote pin offsets from x_v and y_v for v’s pin connected to e

13
• Then, for all v ∈ V_e, every pin of a given net e must be contained within
e’s bounding box:
left(e) ≤ x_v + δ_x(v,e)
right(e) ≥ x_v + δ_x(v,e)
bottom(e) ≤ y_v + δ_y(v,e)
top(e) ≥ y_v + δ_y(v,e)
• Then, e’s half-perimeter wirelength (HPWL) is defined as
L(e) = right(e) − left(e) + top(e) − bottom(e)
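
Outside of the LP, where the bounding box is a set of variables, the HPWL of a placed net follows directly from its pin positions. A minimal sketch, assuming each pin position already includes the cell center plus the pin offset; the coordinates are hypothetical.

```python
def hpwl(pins):
    """Half-perimeter wirelength of the tightest bounding box around the pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    left, right = min(xs), max(xs)
    bottom, top = min(ys), max(ys)
    return (right - left) + (top - bottom)


# Hypothetical three-pin net; each tuple is (x_v + delta_x, y_v + delta_y).
net_pins = [(1, 2), (4, 6), (3, 1)]
length = hpwl(net_pins)
print(length)
```

In the LP the four inequalities above force the box to contain every pin, and minimizing L(e) shrinks the box until it is tight, which matches this direct computation.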

14
Timing Constraints:
• For timing constraints, let
– tGATE(vi,vo) be the gate delay from an input pin vi to the output pin vo for cell v
– tNET(e,uo,vi) be net e’s delay from cell u’s output pin uo to cell v’s input pin vi
– AAT(vj) be the arrival time on pin j of cell v

• For every input pin vi of cell v, the arrival time at vi is the arrival time at
the previous output pin uo of cell u plus the delay of the net e that connects them:
AAT(vi) = AAT(uo) + tNET(e, uo, vi)

15
• For every output pin vo of cell v, the arrival time at vo should be greater
than or equal to the arrival time plus gate delay of each input vi. That is,
for each input vi of cell v,
AAT(vo) ≥ AAT(vi) + tGATE(vi, vo)

• For every pin τp in a sequential cell τ, the slack is computed as the
difference between the required arrival time RAT(τp) and actual arrival
time AAT(τp),
slack(τp) ≤ RAT(τp) − AAT(τp)
• Upper bound all pin slacks by zero (or a small positive value),
slack(τp) ≤ 0

16
Objective Functions:
a) Optimize the total negative slack (TNS)
$$\max: \sum_{\tau_p \in Pins(\tau),\ \tau \in T} \mathrm{slack}(\tau_p)$$
where Pins(τ) is the set of pins of cell τ, and
T is the set of all sequential elements or endpoints.
b) Optimize the worst negative slack (WNS)
$$\max: \mathrm{WNS}$$
where WNS ≤ slack(τp) for all pins.
c) Optimize some combination of wirelength and slack
$$\min: \sum_{e \in E} L(e) - \alpha \cdot \mathrm{WNS}$$
where E is the set of all nets, α is a constant between 0 and 1 that trades off
WNS and wirelength, and L(e) is the HPWL of net e.

17
END OF LECTURE 37

18
