
Tempus STA and Tempus DSTA: An Overview

Copyright © 2017 Cadence Design Systems, Inc. All rights reserved.


Agenda

• Tempus STA vs DSTA
• Four steps to convert STA to DSTA
• Performance recommendations
• Working with Tempus DSTA logs and machines
• Tempus Block Scope
• How to use Block Scope
• Training Resources



Tempus STA vs DSTA
Which Tempus to Choose
• Tempus STA:
– When maximum threading on one machine gives acceptable runtime
– When your designs fit in the memory of one machine
– When your designs are below 50 million instances

• Tempus DSTA:
– If Tempus STA runtime is longer than overnight
– If Tempus STA is running out of memory
– When your design is larger than 50 million instances
– When you want 50% less runtime and can allow 5X the CPU count

• Tempus Block Scope:
– When your full-chip run is too expensive to run every time
– If a few block-level netlists are changing often
– If your block constraints are not showing the true top-level violations

• Tempus Block Scope with DSTA clients:
– If your ECO blocks are each over 30 million instances



Running the Tempus Binary

• To run Tempus STA:
tempus -nowin -overwrite -log tempus_sta.log

• To run Tempus DSTA:
tempus -distributed -nowin -overwrite -log tempus_dsta.log



Tempus STA vs DSTA

• Tempus Static Timing Analysis (STA)
– Utilizes a single machine with multi-threading
– Performance is improved over ETS (Encounter Timing System)
– Maintains familiar TCL syntax & reports
– Maintains signoff accuracy
– Ideal for designs of 20M instances or smaller
• Tempus Distributed Static Timing Analysis (DSTA)
– Utilizes multiple machines in parallel, each with multi-threading
– Reduces overall runtime on large designs
– Reduces per-process memory footprint on large designs
– Maintains familiar TCL syntax & reports
– Maintains signoff accuracy
– Requires some minor script changes
– Can leverage many less powerful machines
– Ideal for designs of 20M instances and larger
Two methods for Multiple Timing Views

• D-MMMC (distributed multi-mode multi-corner)
– A view is a combination of library PVT, constraints, and RC extraction
– Tempus STA can launch each view as a separate parallel run
– A 10-view design would take 10 machines with D-MMMC
– Multiple scripts, multiple machines, multiple sessions

• C-MMMC (concurrent multi-mode multi-corner)
– One script can time multiple views at once
– One script, one set of licenses
– This works with Tempus STA and DSTA
– This is the preferred mode for Tempus DSTA
– No need for multiple directories and multiple sets of reports
– Define the full job and let Tempus DSTA run with it



DSTA and D-MMMC are different things

• Tempus STA with C-MMMC
– Goal: Use one machine to run Static Timing Analysis
– Uses one machine and one set of Tempus licenses

• Tempus STA with D-MMMC
– Goal: Split up views across many Tempus STA sessions
– Splits up the work by dividing up the views over parallel sessions
– Uses multiple machines and multiple Tempus licenses
– Each machine runs a standalone Tempus STA script

• Tempus DSTA with C-MMMC
– Goal: Use a lot of CPUs to speed up timing analysis
– Splits up the work by dividing up the netlist
– One script, one set of licenses
– No need for multiple directories and multiple sets of reports
Four Steps to Convert STA to DSTA



Step 1: Run an STA Script
# Non-distributed script
# (File names and other command arguments are omitted for brevity.)
Puts "Starting script"
set_multi_cpu_usage -localCpu 4

read_verilog
read_lib
set_top_module
read_spef

read_sdc
update_timing
report_timing

• Create a working STA script
– Run it on a single machine



Step 2: Add DSTA Commands
# Non-distributed script
Puts "Starting script"
set_multi_cpu_usage -localCpu 4

read_verilog
read_lib
set_top_module
read_spef

read_sdc
update_timing
report_timing

# Distributed script
Puts "Starting script"
set_multi_cpu_usage …
set_distribute_host …
distribute_start_clients

read_verilog
read_lib
set_top_module
read_spef
distribute_partition

read_sdc
update_timing
report_timing

• Basic scripting rules
– You must have at least 2 clients
– You must use set_multi_cpu_usage before starting clients
– You must load parasitics before distribute_partition
– You must load constraints after distribute_partition
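The ordering rules above can be checked before a job is submitted. Below is a minimal sketch in pure Tcl. dsta_check_order is a hypothetical helper, not a Tempus command. It assumes that parasitics are read with read_spef and constraints with read_sdc. It cannot check the client count, which comes from set_multi_cpu_usage.

```tcl
# Hypothetical pre-flight check (pure Tcl, not a Tempus command).
# ASSUMPTION: parasitics are read with read_spef, constraints with read_sdc.
proc dsta_check_order {scriptText} {
    # Remember the position of the first occurrence of every command.
    set first [dict create]
    set idx 0
    foreach line [split $scriptText "\n"] {
        set line [string trim $line]
        # Skip blank lines and comment lines.
        if {$line eq "" || [string index $line 0] eq "#"} { continue }
        set cmd [lindex [split $line] 0]
        if {![dict exists $first $cmd]} { dict set first $cmd $idx }
        incr idx
    }
    set problems {}
    # Each pair means: the first command must appear before the second.
    foreach {earlier later} {
        set_multi_cpu_usage  distribute_start_clients
        read_spef            distribute_partition
        distribute_partition read_sdc
    } {
        foreach c [list $earlier $later] {
            if {![dict exists $first $c]} { lappend problems "missing $c" }
        }
        if {[dict exists $first $earlier] && [dict exists $first $later]
                && [dict get $first $earlier] > [dict get $first $later]} {
            lappend problems "$earlier must come before $later"
        }
    }
    # Remove duplicate "missing" messages reported by more than one rule.
    return [lsort -unique $problems]
}
```

Suppose a script reads constraints before distribute_partition. The helper then returns the single message "distribute_partition must come before read_sdc". An empty result means the three ordering rules are satisfied.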



Step 3: Configure Multi-threading Options

• set_multi_cpu_usage
-cpuPerRemoteHost # Number of threads per client
-localCpu # Number of threads in master
-remoteHost # Number of clients

• Example:
set clientCnt 2
set threadCnt 4
set_multi_cpu_usage -cpuPerRemoteHost $threadCnt \
-localCpu $threadCnt -remoteHost $clientCnt
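The CPU arithmetic works like this: every client runs -cpuPerRemoteHost threads, and the master adds -localCpu threads on top. The example above therefore occupies 2 × 4 + 4 = 12 CPUs. The sketch below uses a hypothetical pure-Tcl helper, dsta_total_cpus. It assumes the master and each client get their own machine, as in the LSF blaunch setup. The real set_multi_cpu_usage call is guarded, so the file also runs in plain tclsh.

```tcl
# Hypothetical helper (pure Tcl, not a Tempus command): total CPUs a DSTA
# run occupies = clients x threads per client + threads in the master.
proc dsta_total_cpus {remoteHost cpuPerRemoteHost localCpu} {
    return [expr {$remoteHost * $cpuPerRemoteHost + $localCpu}]
}

set clientCnt 2
set threadCnt 4
# Assumes one machine for the master plus one per client.
puts "Machines: [expr {$clientCnt + 1}], CPUs: [dsta_total_cpus $clientCnt $threadCnt $threadCnt]"

# Call the real command only inside a Tempus session; plain tclsh skips it.
if {[info commands set_multi_cpu_usage] ne ""} {
    set_multi_cpu_usage -cpuPerRemoteHost $threadCnt \
        -localCpu $threadCnt -remoteHost $clientCnt
}
```

In plain tclsh this prints "Machines: 3, CPUs: 12".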



Step 4: Configure how to get client machines
• You need to tell Tempus how to get access to clients
– The command is set_distribute_host
– There are several options to consider

• set_distribute_host
-timeout 300 # Safe default for LSF
-shellTimeout 300 # Only for SSH and RSH
-lsf, -sge, -rsh, -ssh # Use LSF, SGE (Sun Grid Engine), or direct login via RSH/SSH
-queue ssv # Choose your own LSF queue name
-args {} # Choose your LSF options
-use_lsf_reservation # For use with LSF blaunch
-local # Use the one machine you have



Step 4: Methods to get client machines
• Blaunch method
– Assumes that LSF granted all required machines when the master started
– Allows many Tempus DSTA runs to be batched up
– set_distribute_host -use_lsf_reservation

• Double LSF method
– Original Tempus method; no benefit over blaunch
– set_distribute_host -timeout 300 -lsf -queue lnx64 -args \
{-n 4 -R "CPUS>=4 span[hosts=1] rusage[mem=30000:tmp=15000]"}

• Passwordless SSH method
– Requires IT to enable passwordless SSH or RSH on all machines in the list
– Good for customers without LSF or SGE
– set_distribute_host -ssh -shellTimeout 35 -add \
[list sjfsb417 sjfsb415]

• Local machine method
– Good if you only have access to one machine
– Typically used for training on small testcases; very hard on memory
– set_distribute_host -local



LSF Blaunch Linux syntax
#! /bin/csh -f
setenv TEMPUS "tempus -distributed -log tempus_dsta.log -init run_dsta.tcl"
setenv BSUB /grid/sfi/hpc/lsf/v9.1.1/linux2.6-glibc2.3-x86_64/bin/bsub

# Design-dependent choices
set clientCnt = 4
set threadCnt = 8
setenv maxMem 90000
setenv tmpDisk 10000

# One machine for the master plus one for each client
set machineCnt = `expr $clientCnt + 1`
set procCnt = `expr $machineCnt \* $threadCnt`

# -P may be optional, -q needs to be changed to your queue, -m may be optional.
# For a batch job, remove -Ip.
# The -R string has one line for the master process and one additional line for each client.
$BSUB -Ip -J batch${machineCnt}m${threadCnt}t -P TEMPUS:14.1:PE:build \
-q ssv -m "ssvgv" -n $procCnt -R " \
${threadCnt}*{rusage[mem=${maxMem}:tmp=${tmpDisk}] span[ptile=${threadCnt}]} + \
${threadCnt}*{rusage[mem=${maxMem}:tmp=${tmpDisk}] span[ptile=${threadCnt}]} + \
${threadCnt}*{rusage[mem=${maxMem}:tmp=${tmpDisk}] span[ptile=${threadCnt}]} + \
${threadCnt}*{rusage[mem=${maxMem}:tmp=${tmpDisk}] span[ptile=${threadCnt}]} + \
${threadCnt}*{rusage[mem=${maxMem}:tmp=${tmpDisk}] span[ptile=${threadCnt}]}" \
$TEMPUS
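The -R string above needs exactly clientCnt + 1 terms, and hand-editing the line count is easy to get wrong. It can help to generate the string instead. The sketch below uses blaunch_resource_string, a hypothetical helper that is not part of Tempus or LSF. It builds only the resource string and the -n value; the other bsub options (-P, -q, -m, -J) stay as in the script above.

```tcl
# Hypothetical helper (not part of Tempus or LSF): generate the bsub -R
# string with one term for the master plus one term per client.
proc blaunch_resource_string {clientCnt threadCnt maxMem tmpDisk} {
    set terms {}
    for {set i 0} {$i <= $clientCnt} {incr i} {
        lappend terms [format {%d*{rusage[mem=%d:tmp=%d] span[ptile=%d]}} \
            $threadCnt $maxMem $tmpDisk $threadCnt]
    }
    return [join $terms " + "]
}

set clientCnt 4
set threadCnt 8
# Total slots requested = (master + clients) x threads per machine.
set procCnt [expr {($clientCnt + 1) * $threadCnt}]
puts "bsub -n $procCnt -R \"[blaunch_resource_string $clientCnt $threadCnt 90000 10000]\""
```

With 4 clients and 8 threads, this requests 40 slots across 5 terms, matching the five resource lines in the csh script.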



LSF Blaunch Tempus syntax
# Tempus script
set_distribute_host -use_lsf_reservation
Puts "Master threading is: [get_multi_cpu_usage -localCpu]"
distribute_start_clients

When using blaunch:
– Tempus automatically knows the client count.
– Tempus automatically knows the thread count.
– The thread count can be reduced with set_multi_cpu_usage if you like.

# Logfile
<CMD> set_distribute_host -use_lsf_reservation
LSF has reserved the following hosts(cpus): sjfsb086 ( 8 cpus)
sjfsb091 ( 8 cpus)
sjfsb094 ( 8 cpus)
Submit command for task runs will be: {{blaunch -n sjfsb091}1}{{blaunch -n sjfsb094}1}
<CMD> distribute_start_clients
Starting 2 client processes.
Connected to sjfsb087 50433 1 ( PID=31035 )
Connected to sjfsb095 49589 0 ( PID=22190 )

(The first reserved host, sjfsb086, runs the Master; the two hosts in the blaunch submit command run Client0 and Client1.)



Performance Recommendations



DSTA Hardware Recommendations

• All clients should have similar performance
– The slowest client can dominate overall runtime
– Roughly similar machines will balance the workload well

• Memory Expectations
– 1.5 GB peak memory per million instances (master)
– Clients may use less

• Local Disk Space Expectations
– Each master and client machine needs local /tmp
– 1-2 GB of /tmp for each million instances
– Many chips can consume 10-50 GB of /tmp on each machine
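These rules of thumb can be turned into a quick first estimate before requesting machines. The sketch below applies only the two ratios above: 1.5 GB of master peak memory per million instances and 1-2 GB of /tmp per million instances on each machine. dsta_resource_estimate is a hypothetical helper, and the results are rough starting points, not measurements.

```tcl
# Hypothetical sizing helper based only on the rules of thumb above:
#   master peak memory ~ 1.5 GB per million instances
#   local /tmp         ~ 1-2 GB per million instances, on every machine
proc dsta_resource_estimate {instancesMillions} {
    return [dict create \
        masterMemGB [expr {1.5 * $instancesMillions}] \
        tmpMinGB    [expr {1.0 * $instancesMillions}] \
        tmpMaxGB    [expr {2.0 * $instancesMillions}]]
}

set est [dsta_resource_estimate 50]
puts [format "50M instances: master ~%.0f GB, /tmp %.0f-%.0f GB per machine" \
    [dict get $est masterMemGB] [dict get $est tmpMinGB] [dict get $est tmpMaxGB]]
```

This prints "50M instances: master ~75 GB, /tmp 50-100 GB per machine". As noted above, clients may use less memory than the master.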



Machine configurations

• Choosing a DSTA configuration by design size

Design instance count | Client count | Client threads | Master threads | Resulting CPU count | Comment
A few million         | 2            | 2              | 2              | 6                   | Tempus RAK training only
20 million            | 4            | 4              | 4              | 20                  | Small design
50 million            | 4            | 8              | 8              | 40                  | Medium design
100 million           | 8            | 8              | 8              | 72                  | Larger design
200 million           | 12           | 8              | 12             | 108                 | Master needs more threads; increased client count saves memory
400 million           | 12           | 12             | 16             | 160                 | Very large designs can benefit from a few more master process threads and client threads
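The table can also be encoded as a small lookup. In the sketch below, dsta_config_for is a hypothetical helper, not a Tempus command. It makes two assumptions of its own: "a few million" means up to 5 million instances, and each row applies to designs up to that row's size, with larger designs falling back to the last row. It recomputes the CPU count as clients × client threads + master threads, which reproduces the table's last numeric column.

```tcl
# Hypothetical lookup of the configuration table above (not a Tempus command).
# ASSUMPTIONS: "A few million" is treated as <= 5 million instances, and a
# row applies to designs up to its listed size; larger designs use the last row.
proc dsta_config_for {instancesMillions} {
    # Each row: {max size (millions), client count, client threads, master threads}
    set rows {
        {5   2  2  2}
        {20  4  4  4}
        {50  4  8  8}
        {100 8  8  8}
        {200 12 8  12}
        {400 12 12 16}
    }
    foreach row $rows {
        lassign $row limit clients clientThreads masterThreads
        # Stop at the first row large enough for the design.
        if {$instancesMillions <= $limit} { break }
    }
    # Total CPUs = clients x threads per client + master threads.
    set cpus [expr {$clients * $clientThreads + $masterThreads}]
    return [dict create clients $clients clientThreads $clientThreads \
                masterThreads $masterThreads cpus $cpus]
}

set cfg [dsta_config_for 57]
puts "57M instances: $cfg"
```

For example, a 57-million-instance design selects the 100-million row: 8 clients, 8 client threads, 8 master threads, and 72 CPUs. Those values map onto set_multi_cpu_usage -remoteHost, -cpuPerRemoteHost, and -localCpu.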



Tempus STA
Diminishing Returns with High Thread Counts

[Figure: Tempus STA runtime (minutes, 0-600) vs. CPU count (0-30)]

At some point there can be diminishing returns.
This is where Tempus DSTA might start to help.
DSTA takes over where STA leaves off

[Figure: Runtime (minutes, 0-600) vs. CPU count (0-80) for STA and DSTA; 42 million instances, 4 unique timing views, C-MMMC]

STA clearly hit a limit on this design at around 20 CPUs.
DSTA is just getting started at around 20 CPUs.
DSTA Memory Footprint
With increased client count (57 million instance design; memory in GB)

Client count | Client threads | Runtime (min) | Master mem | Client 0 | Client 1 | Client 2 | Client 3 | Client 4 | Client 5 | Client 6 | Client 7
4            | 8              | 165.4         | 59.2       | 124.2    | 80.4     | 75.9     | 76.3     | -        | -        | -        | -
5            | 8              | 137.4         | 60.2       | 118.3    | 67.8     | 70.0     | 61.7     | 66.1     | -        | -        | -
7            | 8              | 104.4         | 59.2       | 95.0     | 58.4     | 57.5     | 58.3     | 56.8     | 57.7     | 53.6     | -
8            | 4              | 162.5         | 58.4       | 107.2    | 52.9     | 55.4     | 57.2     | 56.2     | 54.5     | 47.6     | 41.0

• Each additional client reduces the per-client memory footprint
• By design, Client0 uses more memory than the other clients
– Users can designate a special machine for Client0
• With 5 clients, 64 GB machines can be used (with some swap)
• A single-machine solution requires a 256 GB machine



General Performance Tips (STA & DSTA)

• Use LDB for libraries
– Using LDB for libraries can save a few minutes
– You can create this yourself
• Use RCDB for parasitics (mostly for STA)
– Using RCDB for parasitics can save an hour
– You can create this yourself
• Use precompiled constraints
– Read in TCL constraints in STA mode
– Run write_sdc, then use that SDC file for all future runs
• SPEF reading will use up to 8 threads



More DSTA runtime gains after update_timing

• Reports are very fast in Tempus
– A major source of the runtime improvements
– Reports are threaded according to set_multi_cpu_usage in the master script
– Reports are distributed among the clients automatically

• PBA reports are very fast in Tempus
# set_global report_timing_format { .. retime_slew retime_delay …}
set_global report_timing_format { hpin arc cell slew load delay arrival \
retime_slew retime_delay }
set_global timing_report_group_based_mode true
set_global timing_report_group_based_worst_path_required true
report_timing -retime path_slew_propagation -max_paths 500000 \
-nworst 1 -path_type full_clock > pba_200K.rpt.gz

• Regular timing reports are fast
report_timing -late -max_paths 200000 -nworst 1 -net > setup_200K.rpt
report_timing -early -max_paths 200000 -nworst 1 -net > hold_200K.rpt



Working with Tempus DSTA



Where are the DSTA log files?

• Master log
– The default log file name is ./tempus.log
– On the next run, the name increments to tempus.log1

• Tempus options
– "tempus -log" lets you choose the log file name
– "tempus -overwrite" ensures the name does not increment

• The Master log contents
– The Master starts the clients and loads the design
– The clients do most of the work with constraints, timing, and reports



DSTA Client logs

• Client logs
– Each client is running a copy of Tempus
– Each client has its own directory and log file
– partOutput_0/tempus.log
– partOutput_1/tempus.log
– partOutput_2/tempus.log

• Other client files
– tempus.logv: like tempus.log, but with timestamps
– tempus.cmd: shows the commands that have run
– cmd.stdout: shows the last command outputs
– cmd.stderr: shows the last command errors
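Every client writes its own partOutput_N/tempus.log, so scanning all of them at once shows which client reported problems. The sketch below uses dsta_client_log_summary, a hypothetical pure-Tcl helper. It assumes the partOutput_N directory layout described above. It counts lines containing the string ERROR, which is an assumption about message text rather than a documented format, so adjust the pattern for your release.

```tcl
# Hypothetical helper (pure Tcl, not a Tempus command): count lines that
# mention ERROR in each client log under a DSTA run directory.
# ASSUMPTION: client logs live in <runDir>/partOutput_N/tempus.log.
proc dsta_client_log_summary {runDir} {
    set summary [dict create]
    set logs [glob -nocomplain -directory $runDir partOutput_*/tempus.log]
    # Dictionary sort keeps partOutput_10 after partOutput_2.
    foreach log [lsort -dictionary $logs] {
        set fh [open $log r]
        set errors 0
        while {[gets $fh line] >= 0} {
            if {[string match *ERROR* $line]} { incr errors }
        }
        close $fh
        # Key the result by client directory name, e.g. partOutput_0.
        dict set summary [file tail [file dirname $log]] $errors
    }
    return $summary
}

# Run from the directory where Tempus DSTA was started.
puts [dsta_client_log_summary .]
```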



DSTA Master log example
• Master logs are unique to DSTA
--- Running on sjfsb419 (x86_64 w/Linux 2.6.18-308.el5) ---
This version was compiled on Sun Sep 1 14:53:30 PDT 2013.
INFO (DSTA-1025): Sourcing /icd/flow/ETS/ETS132/13.20-b059_1/lnx86/share/tcltools/icd8.5.9/lib/tcl8.5/history.tcl
INFO (DSTA-1025): Sourcing ../blank/run_dsta.tcl

INFO (DSTA-1017): Starting 2 client processes.

INFO (DSTA-1013): Reading verilog file: ./zondammav10gs80.final.v
INFO (DSTA-1600): Loading library file: ./GS80_W_-40_0.92_0.92_CORE.lib.gz
INFO (DSTA-1025): Sourcing ./tim_ets_settings.tcl

INFO (DSTA-1193): Reading spef file: ./zondammav10gs80.spef.maxc_maxvia_-40.gz

INFO (DSTA-1421): After generating partitions: Cpu=00:01:43 Real=00:01:40 Peak_mem=4257meg Cur_mem=4257meg

INFO (DSTA-1020): Running command: update_timing -full

INFO (DSTA-1389): Cond arc const client 1 starts at 1378131471.
INFO (DSTA-1389): Cond arc const client 0 starts at 1378131522.
INFO (DSTA-1389): Cond arc const client 0 all done at 1378131523.
INFO (DSTA-1389): Cond arc const client 1 all done at 1378131523.



DSTA Client log examples
• Client logs are just like a Tempus STA log
--- Starting "Tempus Timing Signoff Solution v13.20-b059_1" on Mon Sep 2 12:38:07 2013 (mem=70.9M) ---
Server is up on sjfsb419:44529 ( PID=830 )
Multi-CPU acceleration using 4 CPU(s).

Scheduling LEF file(s) ./uc_CS402LND.lef to be loaded when the set_top_module command is issued.
Scheduling LEF file(s) ./uc_CS402LZD.lef to be loaded when the set_top_module command is issued.
Scheduling timing library file(s) ./LIB/scIMux_f_worst.lib to be loaded when the set_top_module command is issued.
Scheduling timing library file(s) ./LIB/scxOpt_f_worst.lib to be loaded when the set_top_module command is issued.

Set top cell to scTop.

** info: there are 406704 modules.
** info: there are 13891666 stdCell insts.
** info: there are 53 Pad insts.
** info: there are 293926 macros.

<CMD> report_timing
Analyzing view default_emulate_view with delay corner[0] default_emulate_delay_corner, rc corner[0] ...
All-RC-Corners-Per-Net-In-Memory is turned ON...
Analyzing view default_emulate_view with delay corner[0] default_emulate_delay_corner, rc corner[0] …



Tracking Master Runtime and Memory

• A new DSTA command tracks master process performance

• dist_print_usage
– Will print the resource usage for the master

• Example output
Cpu=03:02:02 Real=02:31:31 Peak_mem=70502meg Cur_mem=70494meg



Tracking Client Runtime and Memory

• A new DSTA command tracks client performance

• dist_print_client_usage
– Will print the resource usage for each of the clients
– Will not interrupt or wait for whatever the client is currently doing

• Example output
INFO (DSTA-1039): Client resource usage:
Client 0: Cpu=00:44:25 Real=01:25:57 Peak_mem=11095meg Cur_mem=9816meg
Client 1: Cpu=00:44:29 Real=01:25:58 Peak_mem=11338meg Cur_mem=10072meg
Client 2: Cpu=00:44:25 Real=01:25:57 Peak_mem=11095meg Cur_mem=9816meg
Client 3: Cpu=00:44:29 Real=01:25:58 Peak_mem=11338meg Cur_mem=10072meg
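The slowest client can dominate overall runtime, and Client0 normally uses the most memory. Pulling the extremes out of this output is therefore a quick health check. The sketch below defines two hypothetical helpers, hms_to_seconds and dsta_client_hotspots. They parse text in the format shown above, for example copied from the master log; they do not call dist_print_client_usage themselves.

```tcl
# Hypothetical helpers (pure Tcl, not Tempus commands).
# Convert HH:MM:SS to seconds; scan %d avoids Tcl's octal reading of "08".
proc hms_to_seconds {hms} {
    lassign [split $hms :] h m s
    return [expr {[scan $h %d] * 3600 + [scan $m %d] * 60 + [scan $s %d]}]
}

# Parse dist_print_client_usage-style text and return the client with the
# highest Peak_mem and the client with the longest Real (wall-clock) time.
proc dsta_client_hotspots {usageText} {
    set maxMem -1;  set memClient ""
    set maxReal -1; set realClient ""
    foreach line [split $usageText "\n"] {
        if {![regexp {Client (\d+): Cpu=\S+ Real=(\S+) Peak_mem=(\d+)meg} \
                $line -> id real peak]} { continue }
        set secs [hms_to_seconds $real]
        if {$peak > $maxMem}  { set maxMem $peak;  set memClient $id }
        if {$secs > $maxReal} { set maxReal $secs; set realClient $id }
    }
    return [dict create peakMemClient $memClient peakMemMeg $maxMem \
                slowestClient $realClient slowestRealSec $maxReal]
}

set usage {Client 0: Cpu=00:44:25 Real=01:25:57 Peak_mem=11095meg Cur_mem=9816meg
Client 1: Cpu=00:44:29 Real=01:25:58 Peak_mem=11338meg Cur_mem=10072meg}
puts [dsta_client_hotspots $usage]
```

For these two lines, Client 1 has both the highest peak memory (11338 MB) and the longest real time (5158 seconds).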



How to Access Specific Machines (by name)
• Using LSF to avoid a bad machine
bsub -q ssv -m ssvpe -n 4 \
-R "(hname != sjfsb416) span[hosts=1] rusage[mem=3000]"

• Using LSF to target a specific machine
bsub -q ssv -m ssvpe -n 4 \
-R "hname = sjfsb416 span[hosts=1] rusage[mem=3000]"

• Using LSF to target a group of machines
bsub -q ssv -m ssvpe -n 4 \
-R "(hname = sjfsb415 || hname = sjfsb416 || \
hname = sjfsb417 || hname = sjfsb418) \
span[hosts=1] rusage[mem=30000]"

• Requesting more /tmp space
bsub -q ssv -m ssvpe -n 4 \
-R "span[hosts=1] rusage[mem=3000:tmp=10000]"
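Writing long hname expressions by hand is error-prone when the host list changes. The sketch below uses lsf_host_select, a hypothetical helper that is not part of LSF. It builds the host-selection part of the -R string from a host list. Targeting joins the hosts with ||, matching the group example above. Excluding several hosts joins != tests with &&, which is the helper's own extension of the single-host example.

```tcl
# Hypothetical helper (not part of LSF): build the host-selection part of
# a bsub -R string. "include" targets any of the hosts; "exclude" avoids all.
proc lsf_host_select {hosts {mode include}} {
    set terms {}
    foreach h $hosts {
        if {$mode eq "include"} {
            lappend terms "hname = $h"
        } else {
            lappend terms "hname != $h"
        }
    }
    # Targeting joins the hosts with ||; excluding requires every test (&&).
    set op [expr {$mode eq "include" ? " || " : " && "}]
    return "([join $terms $op])"
}

set select [lsf_host_select {sjfsb415 sjfsb416 sjfsb417 sjfsb418}]
puts "bsub -q ssv -m ssvpe -n 4 -R \"$select span\[hosts=1\] rusage\[mem=30000\]\""
```

The printed command matches the "target a group of machines" example above, written on a single line.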



STA and DSTA can coexist in one run.tcl

• One run.tcl can be used for both STA and DSTA
• Avoids mistakes when demonstrating the DSTA advantages
• Wrap the few DSTA commands in an if statement
• STA skips the commands and DSTA uses them
• How Tempus is started (with or without -distributed) automatically selects the branch, because distribute_partition exists only in a DSTA session

if {[info command distribute_partition] != ""} {
    Puts "You are using Tempus DSTA"
    set_multi_cpu_usage -localCpu 4 -cpuPerRemoteHost 4 -remoteHost 2
} else {
    Puts "You are using Tempus STA"
    set_multi_cpu_usage -localCpu 4
}



Tempus DSTA with LSF blaunch

Benefits of LSF blaunch:
• All machines are granted before the master or clients are started
• Tempus DSTA uses only the CPUs that LSF granted
• The script will not exit because of a client timeout
• The LSF job will PEND until all your CPUs are available
• You can batch up many scripts
• Simple master syntax for set_distribute_host



report_disk adds messages to the Master log

report_disk -dir /tmp
DISK_INFO (Master) : 150.81 Mb/S 190G total 16G used 9G free /tmp
DISK_INFO (Client 0) : 194.31 Mb/S 190G total 9.7G used 170G free /tmp
DISK_INFO (Client 1) : 293.83 Mb/S 190G total 19G used 161G free /tmp

report_disk -dir ./
DISK_INFO (Master) : 96.02 Mb/S 2.7T total 3.4G used 383G free ./
DISK_INFO (Client 0) : 22.91 Mb/S 2.7T total 609K used 383G free ./
DISK_INFO (Client 1) : 76.82 Mb/S 2.7T total 177K used 11G free ./

Slide callouts:
• Master /tmp (9G free): disk full; /tmp may need 10-100 GB for the Tempus RCDB and scratch space
• Client 0 ./ (22.91 Mb/S): very slow disk; 70-700 Mb/S is a good range
• Client 1 ./ (11G free): disk full; may need about 10 GB for timing reports
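The same checks can be applied to captured report_disk output. The sketch below defines two hypothetical helpers, disk_size_gb and report_disk_warnings. It parses DISK_INFO lines in the format above and flags slow or nearly full disks. The 70 Mb/S speed threshold comes from the "good range" callout. The 20 GB free-space threshold and the decimal unit conversion are assumptions, so tune both for your site.

```tcl
# Hypothetical helper: convert sizes such as 609K, 9G or 2.7T to GB
# (decimal units, an approximation).
proc disk_size_gb {size} {
    set factor [dict create K 1e-6 M 1e-3 G 1.0 T 1e3]
    set suffix [string index $size end]
    if {![dict exists $factor $suffix]} {
        # No unit suffix: assume the value is in bytes.
        return [expr {$size / 1e9}]
    }
    return [expr {[string range $size 0 end-1] * [dict get $factor $suffix]}]
}

# Flag DISK_INFO lines whose speed is below minSpeed (Mb/S) or whose free
# space is below minFreeGB. ASSUMPTION: the 20 GB default is illustrative.
proc report_disk_warnings {text {minSpeed 70} {minFreeGB 20}} {
    set warnings {}
    foreach line [split $text "\n"] {
        if {![regexp {DISK_INFO \(([^)]+)\) : ([0-9.]+) Mb/S \S+ total \S+ used (\S+) free (\S+)} \
                $line -> who speed free dir]} { continue }
        if {$speed < $minSpeed} {
            lappend warnings "$who ${dir}: slow disk ($speed Mb/S)"
        }
        if {[disk_size_gb $free] < $minFreeGB} {
            lappend warnings "$who ${dir}: only $free free"
        }
    }
    return $warnings
}
```

Applied to the six DISK_INFO lines above, the helper flags the same three problems as the slide callouts: Master /tmp, Client 0 ./, and Client 1 ./.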



Tempus DSTA monitor_host.log

============ ======== ================================ ===================
Host         CPU      Memory (GB)                      TMPDIR (GB)
name(id)     util %   total   progs   cache   %progs   used     avail
============ ======== ================================ ===================
10:24:06
sjfib232(0)  24 E     377     372     122     98 B     57.73    157.97 E
sjfib232(1)  25 E     377     373     122     98 B     57.73    157.97 E
sjfib232(2)  25 E     377     373     122     98 B     57.73    157.97 E
10:38:09
sjfib232(0)  23 E     377     377     104     100 T    57.73    157.97 E
sjfib232(1)  24 E     377     377     104     100 T    57.73    157.97 E
sjfib232(2)  24 E     377     377     104     100 T    57.73    157.97 E
10:40:57
sjfib232(0)
sjfib232(1)  29 E     377     377     113     100 T    0.00     0.00 T
sjfib232(2)

Slide callouts:
• 10:24:06: All clients are on the same machine; this can consume a lot of memory
• 10:38:09: Host memory is at 100% used
• 10:40:57: Two hosts failed to report stats; memory is likely the cause of the crash



Training Resources
Latest Tempus User Guides and References
• All the latest online documentation is available via
https://support.cadence.com > Product Pages > Product Manuals > SSV 17.1
− Tempus Error Message Guide
− Tempus Foundation Flow User Guide
− Tempus Known Problems and Solutions
− Tempus Menu Reference
− Tempus Text Command Reference
− Tempus User Guide
− Tempus What's New

Application Notes and Rapid Adoption Kits
• Application notes
− Log in to https://support.cadence.com
− Navigate: Resources > Application Notes
− Filter using product as Tempus

• Tempus Rapid Adoption Kits (RAKs)
− Log in to https://support.cadence.com
− Navigate: Resources > Rapid Adoption Kits
− Filter using product as Tempus

Cadence Customer Training
• Go to http://www.cadence.com/training/na/Pages/default.aspx
− Expand Digital IC Design – Encounter.
− Click Online Courses to see a list of courses that are offered in that format.
− Click Register for the Tempus class or any other class.
