
GAN-CTS: A Generative Adversarial Framework for Clock Tree Prediction and Optimization


Yi-Chen Lu¹, Jeehyun Lee¹, Anthony Agnesina¹, Kambiz Samadi², and Sung Kyu Lim¹
¹School of ECE, Georgia Institute of Technology, Atlanta, GA
²Qualcomm Technologies, Inc., San Diego, CA
[email protected]

Abstract—In this paper, we propose a complete framework named GAN-CTS which utilizes conditional generative adversarial network (GAN) and reinforcement learning to predict and optimize clock tree synthesis (CTS) outcomes. To precisely characterize different netlists, we leverage transfer learning to extract design features directly from placement images. Based on the proposed framework, we further quantitatively interpret the importance of each CTS input parameter subject to various design objectives. Finally, to prove the generality of our framework, we conduct experiments on unseen netlists which are not utilized in the training process. Experimental results performed on industrial designs demonstrate that our framework (1) achieves an average prediction error of 3%, (2) improves the commercial tool's auto-generated clock tree by 51.5% in clock power, 18.5% in clock wirelength, and 5.3% in the maximum skew, and (3) reaches an F1-score of 0.952 in the classification task of determining successful and failed CTS processes.

I. INTRODUCTION

Clock tree synthesis (CTS) is a critical stage of physical design, since the clock network often constitutes a high percentage of the overall power in the final full-chip design. An optimized clock tree helps to avoid serious design issues such as excessive power consumption, high routing congestion, and elongated timing closure [3]. However, due to the high complexity and run time of electronic design automation (EDA) tools, designers struggle to synthesize high quality clock trees that optimize key desired metrics such as clock power, skew, and clock wirelength. To find the input parameters that achieve desired CTS outcomes, designers have to search a wide range of candidate parameters, which is usually fulfilled in a manual and highly time-consuming calibration fashion.

To automate this task and alleviate the burden for designers, several machine learning (ML) techniques have been proposed to predict clock network metrics. Previous work [10] utilizes data mining tools to estimate the achieved skew and insertion delay. The authors of [8] employ statistical learning and meta-modeling methods to predict more essential metrics such as clock power and clock wirelength. The authors of [9] consider the effect of non-uniform sinks and different aspect ratios. A recent work [12] utilizes artificial neural networks (ANNs) to predict the transient clock power consumption based on the estimation of clock tree components. However, all the previous works only focus on enhancing the performance of prediction. There are four highly crucial aspects that all of them neglect, namely:

• Generality of Model: No previous work conducts the prediction experiments on unseen netlists. They utilize the same netlists during training and testing stages.
• Interpretability of Parameters: No previous work interprets the importance of each CTS input parameter subject to distinct CTS outcomes.
• Optimization of Metrics: No previous work demonstrates the capability to optimize the CTS outcomes with respect to various design objectives.
• Suggestion of Inputs: Given a design, no previous work exhibits the ability to suggest the CTS input parameter sets that achieve optimized clock trees. They merely adopt discriminative approaches to model the CTS stage.

These aspects, which previous works leave under-addressed, are the most crucial factors that determine whether the proposed modeling methods are practical or not. In this paper, we solve all the issues presented above. Furthermore, a unique aspect of our work is that we directly extract design features from placement layout images to perform practical CTS outcome predictions in ultra-high precision. We demonstrate that the features extracted from the layouts are indeed highly useful to improve the accuracy of our models.

The goal of this work is to construct a general and practical CTS framework which owns the capability to predict CTS outcomes in high precision as well as the facility to recommend to designers the input parameter sets leading to optimized clock trees, even for unseen netlists. Our proposed framework consists of a regression model and a variation of the generative adversarial network (GAN) [5] named conditional GAN [14]. Apart from the conventional GAN training process, in this work, a reinforcement learning is introduced to the generator, and a multi-task learning is presented to the discriminator. The reinforcement learning is conducted by the regression model, which is trained prior to the conditional GAN. It serves as a supervisor to guide the generator to produce the CTS input parameter sets that lead to optimized clock trees. As for the discriminator, it not only predicts whether the input is generated or real as in the conventional approach, but also predicts if the input results in a successful CTS run or not. In this paper, a successful CTS run indicates that the achieved clock tree outperforms the one auto-generated by the commercial tool.

The rest of the paper is organized as follows. Section II presents an overview of our framework. Section III describes the problem formulations and database generation, and identifies the difficulties of the problems. Section IV illustrates the algorithms and the detailed structures of the models. Experimental results are demonstrated in Section V, and finally we conclude our work in Section VI.



Fig. 1: A high-level view of this work and the three objectives we have achieved. The first objective is to predict the CTS outcomes in high precision. The second objective is to recommend designers good CTS input settings. The third objective is to determine whether the input settings outperform the commercial tool's auto-setting.

II. OVERVIEW

It has been widely acknowledged that GAN is a promising model that learns complicated real-world data through a generative approach [17]. A vanilla GAN structure contains a generator and a discriminator, which are both neural networks. The goal of the generator is to generate meaningful data from a given distribution (e.g., random noise), while the objective of the discriminator is to distinguish the generated samples from the real samples in the database. Building upon the vanilla GAN structure, conditional GAN further introduces an external conditional input to both the generator and the discriminator. This conditional input enables the model to direct the data generation process, which allows us to generate suitable CTS input parameter sets with respect to different benchmarks, even ones that are not utilized in the training process.

A high-level view of our framework, named GAN-CTS, is shown in Figure 1. The framework is comprised of three training stages. The first training stage is to extract key design features from placement images which represent the flip flop distribution, clock net distribution, and trial routing result. Note that trial routing is a process performed at the end of the placement stage, which provides an estimation of the routing congestion in a given placement. To extract features from images, we introduce transfer learning to a pre-trained convolutional neural network (CNN) named ResNet-50 [6], which is a 50-layer residual network.

Following the feature extraction process, we utilize the extracted features as well as the clock trees in the database to train the regression model and to predict CTS outcomes. The regression model is a multi-output deep neural network which predicts multiple CTS metrics simultaneously. In this work, we select three essential CTS metrics that characterize a clock tree to predict and to further optimize in later processes. The three selected metrics are clock power, clock wirelength, and the achieved maximum skew. As mentioned in Section I, we do not consider our model as a black box as previous works do. To further interpret the predictions made by the regression model, we leverage a gradient-based attribution method [1] to quantitatively interpret the importance of each CTS input parameter subject to different target outcomes.

The last training stage of the framework involves a reinforcement learning and a generative adversarial learning. We leverage a conditional GAN to perform the CTS optimization and classification tasks. The regression model trained prior to this stage now acts as a supervisor to guide the generator to generate CTS input parameter sets which lead to optimized clock trees. The guidance is conducted by a reinforcement learning technique named policy gradient [16].

In conditional GAN, we take the extracted placement features as the conditional input, where the original inputs of the vanilla GAN model are considered as regular inputs. The great advantage of having the conditional input is that it allows the model to control the modes of the generated data. In this work, we consider different benchmarks as different modes. Therefore, with the aid of the conditional approach, our framework has the facility to optimize unseen benchmarks which are not utilized during the training process.

Finally, a highlight of our framework is that a multi-task learning is conducted in the discriminator. In addition to the conventional task of distinguishing between generated and real samples, we introduce a new task of classifying successful and failed CTS runs. In this paper, we strictly define a CTS run as successful if two out of the three achieved target CTS metrics mentioned earlier are better than the ones achieved automatically by the commercial tool.

At the end of the training process, we acquire four models as follows:
• A placement feature extractor E which precisely characterizes different designs from placement images.
• A regression model R which performs high precision predictions of target CTS outcomes.
• A generator G which generates CTS input parameter sets that lead to optimized clock trees.
• A discriminator D which predicts the success and failure of CTS runs.

III. DESIGNING EXPERIMENTS

We formally define the clock tree synthesis (CTS) prediction and optimization problems as follows:

Problem 1 (CTS Outcomes Prediction). Given a pre-CTS placement P and a CTS input parameter set X, predict the post-CTS outcomes Y without performing any actual CTS.

Problem 2 (CTS Outcomes Optimization). Given a pre-CTS placement P, generate a CTS input parameter set X̂ that leads to optimized CTS outcomes Ŷ.
TABLE I: Our benchmarks and their attributes.

Netlist Name     # Nets     # Flip Flops   # Total Cells
AES-128          90,905     10,688         113,168
Arm-Cortex-M0    13,267     1,334          12,942
NOVA             138,171    29,122         136,537
ECG              85,058     14,018         84,127
JPEG             231,934    37,642         219,064
LDPC             42,018     2,048          39,377
TATE             185,379    31,409         184,601

TABLE II: Modeling parameters we use and their values.

type        parameters                 values or ranges
placement   aspect ratio               {0.5, 0.75, 1.0, 1.25, 1.5}
placement   utilization                {0.4, 0.45, 0.5, ..., 0.7}
CTS         max skew (ns)              [0.01, 0.2]
CTS         max fanout                 [50, 250]
CTS         max cap trunk (pF)         [0.05, 0.3]
CTS         max cap leaf (pF)          [0.05, 0.3]
CTS         max slew trunk (ns)        [0.03, 0.3]
CTS         max slew leaf (ns)         [0.03, 0.3]
CTS         max latency (ns)           [0, 1]
CTS         max early routing layer    {2, 3, 4, 5, 6}
CTS         min early routing layer    {1, 2, 3, 4, 5}
CTS         max buffer density         [0.3, 0.8]

Fig. 2: Total power (mW) distribution among 24.5k full-chip designs of all 7 benchmarks in our database.

TABLE III: Clock trees with different input slew constraints for the NOVA benchmark.

                         tree A    tree B    tree C
max trunk slew (ns)      0.1       0.1       0.05
max leaf slew (ns)       0.1       0.05      0.05
# inserted buffers       2,556     600       1,124
total power (mW)         164.7     154.5     158.8
clock power (mW)         27.9      22.6      24.9
clock wirelength (mm)    118.7     97.3      101.6
maximum skew (ns)        0.06      0.1       0.07
A. Database Construction

To build the database, we utilize Synopsys Design Compiler 2015 to synthesize the netlists and leverage Cadence Innovus Implementation System v18.1 to perform the placement and CTS processes. The database is constructed based on the TSMC 28nm technology node. Table I shows the netlists utilized in our work and their attributes after being synthesized at 1125 MHz. Table II defines the modeling features and the ranges of values that we utilize in our work. The ranges of the CTS-related parameters are determined by reasonably widening the commercial tool's auto-setting values. Among the modeling features, aspect ratio and utilization rate represent physical structures. The min and max early routing layers (min ≤ max) indicate the metal layers utilized in early global routing (EGR), which is a procedure to reserve space for future detailed signal routing during the CTS stage.

The combinations of the two placement-related parameters give us 35 different placements per netlist. By running CTS with randomly sampled CTS input parameters, we generate 100 clock trees per placement. Therefore, in total, we have 24.5k data points across 7 different netlists in our database. To substantiate the generality of our framework, in the experiments, we only utilize 5 netlists (ARM, ECG, JPEG, LDPC, TATE) in the training process and perform the validations on the remaining 2 unseen netlists (AES, NOVA).
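To make the sampling step concrete, the following is a minimal Python sketch of how such a random sweep could be scripted, assuming uniform sampling over the ranges in Table II. The dictionary keys are hypothetical names; the actual runs are driven by Innovus rather than this snippet.

```python
import random

# Hedged sketch: draw one CTS parameter set uniformly from the Table II ranges.
# Key names are hypothetical; real runs pass these values to Innovus CTS.
CTS_RANGES = {
    "max_skew_ns":        (0.01, 0.2),
    "max_fanout":         (50, 250),
    "max_cap_trunk_pf":   (0.05, 0.3),
    "max_cap_leaf_pf":    (0.05, 0.3),
    "max_slew_trunk_ns":  (0.03, 0.3),
    "max_slew_leaf_ns":   (0.03, 0.3),
    "max_latency_ns":     (0.0, 1.0),
    "max_buffer_density": (0.3, 0.8),
}

def sample_cts_parameters(rng):
    params = {key: rng.uniform(lo, hi) for key, (lo, hi) in CTS_RANGES.items()}
    # Discrete early-routing-layer choices, keeping min <= max as in Table II.
    params["max_early_routing_layer"] = rng.choice([2, 3, 4, 5, 6])
    params["min_early_routing_layer"] = rng.choice(
        [l for l in [1, 2, 3, 4, 5] if l <= params["max_early_routing_layer"]])
    return params

rng = random.Random(42)
sweep = [sample_cts_parameters(rng) for _ in range(100)]  # 100 trees per placement
```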
B. Database Analysis

Figure 2 shows the total power distribution of all 24.5k clock trees in the database. It demonstrates the variety of our database as well as the difficulty of modeling the CTS process across different benchmarks. Table III demonstrates the complicated impact of different input slew constraints on essential CTS metrics. We observe that if a tighter slew constraint is merely given on leaf cells (from tree A to tree B), the total number of inserted buffers drastically decreases. However, if tighter slew constraints are given on both trunk cells and leaf cells (from tree A to tree C), more buffers are inserted compared with the previous approach. In summary, the above analyses show that the behavior of CTS is very sophisticated, which implies great benefits to employing ML methods in modeling the CTS process.

In this paper, we consider the clock tree auto-generated by the commercial tool as a baseline to evaluate the trees generated by our framework. As mentioned in Section II, we define a CTS run as successful if two out of the three achieved target metrics, which are clock power, clock wirelength, and clock skew, are better than the ones of the auto-generated tree. In Section V, we demonstrate the CTS metric and layout comparisons between the optimized clock trees generated by our framework and the one auto-generated by the commercial tool.

IV. GAN-CTS ALGORITHMS

In this section, we first describe the process of feature extraction. Then, we present our methodologies to solve the CTS outcomes prediction and optimization problems. In the meantime, we illustrate the detailed structures of the models. In the end, we summarize the overall training process.

A. Placement Image Feature Extraction
One of the innovations of this work is that we directly utilize placement images as inputs to predict and optimize CTS outcomes. The key rationale is that placement images contain important information about designs. Previous work [19] has demonstrated the efficiency of using placement images to predict routability and the number of design rule violations (DRVs). In this work, we leverage the features extracted from placement images to solve the CTS outcomes prediction and optimization problems. Our approach is built upon convolutional neural networks (CNNs) and transfer learning, where we devise our own fully connected (FC) layers upon the pre-trained model named ResNet-50 [6], which is a CNN-based model pre-trained on the well-known ImageNet [4] dataset with millions of images.

Fig. 3: Visualization of a trial routing image in convolutional filters: (a) original image, (b) image in four different filters.

Figure 3 shows the visualization of a trial routing image passing through four convolutional filters inside the pre-trained ResNet-50 model. In the figure, we observe that important information such as the usage of metal layers is well captured. Although ResNet-50 is powerful on many image datasets, it is not devised specifically for physical design problems. Therefore, to extract key design features from the high dimensional vectors, we devise a feature extraction flow with transfer learning as shown in Figure 4.

Fig. 4: Our image feature extraction flow. The extracted features are colored in red.

Four self-designed FC layers are appended on top of the ResNet-50 model to predict the commercial tool's estimation of total power right after the placement stage. As shown in the figure, since placement images cannot precisely reflect the actual size of a design, we introduce an auxiliary input with one dimension to our FC layers which represents the total number of flip flops. In the training process, we fix the parameters of the pre-trained ResNet-50 model and only update the parameters of the FC layers. After training, for each pre-CTS placement, we take the output vector of the first FC layer, with 512 dimensions and denoted in red in the figure, as the final extracted feature vector.

In the implementation, all the images are of dimension 700 × 717 × 3. The achieved mean absolute percentage error on the unseen validation netlists when predicting the estimated power is only 1%. However, since a low prediction error does not guarantee a good feature representation, we leverage a dimension reduction technique named t-distributed stochastic neighbor embedding (t-SNE) [13] to visualize the extracted features (∈ R^512) in R^2 as shown in Figure 5.

Fig. 5: t-SNE visualizations of the original placement image vectors and the extracted feature vectors.

In the figure, we observe that different placements belonging to the same netlist are clustered together and those belonging to different netlists are well separated. Therefore, we conclude that the extracted features capture the characteristics of a design well.
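Under the stated flow, a minimal Keras sketch of the extractor could look as follows. The three ResNet-50 branches of Figure 4 are approximated with a single frozen backbone shared across the three placement images; the activation functions and the Adam/MSE training setup are assumptions, while the FC widths (512/256/64/1), the frozen backbone, and the auxiliary flip-flop input follow the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

# Frozen ImageNet-pretrained backbone; only the FC layers are trained.
backbone = ResNet50(include_top=False, weights="imagenet", pooling="avg",
                    input_shape=(700, 717, 3))
backbone.trainable = False

# Three placement images (trial routing, flip flops, clock net) + auxiliary input.
imgs = [layers.Input(shape=(700, 717, 3), name=n)
        for n in ("trial_routing", "flip_flops", "clock_net")]
num_ff = layers.Input(shape=(1,), name="num_flip_flops")

h = layers.Concatenate()([backbone(img) for img in imgs] + [num_ff])
features = layers.Dense(512, activation="relu", name="fc1_features")(h)  # FC 1
h = layers.Dense(256, activation="relu")(features)                       # FC 2
h = layers.Dense(64, activation="relu")(h)                               # FC 3
power = layers.Dense(1, name="estimated_power")(h)                       # FC 4

trainer = Model(imgs + [num_ff], power)       # supervised on post-placement power
extractor = Model(imgs + [num_ff], features)  # reused 512-dim feature vector
trainer.compile(optimizer="adam", loss="mse")
```

After `trainer` is fit, `extractor` yields the 512-dimensional vectors that are visualized with t-SNE and fed to the downstream models.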
B. CTS Outcomes Prediction

Constructing a precise regression model is the key step to solving Problem 1. Following the feature extraction process, we train the regression model with the extracted features to predict the target CTS outcomes. As mentioned in Section II, in this paper, we target predicting and optimizing three CTS outcomes: clock power, clock wirelength, and the maximum skew. The regression model is a multi-output deep neural network as shown in Figure 6.

Fig. 6: Our multi-output regression model.

The model takes the extracted features along with the CTS parameters as inputs and outputs three predictions simultaneously. Note that a multi-task learning is introduced here. The reason we leverage a single multi-output model rather than multiple single-output models as in previous work [12] is that different CTS outcomes are not independent of each other. Therefore, shared layers enable different objectives to share mutual information, which reduces the training time as well as enhances the performance of each prediction. In the training process, we utilize mean squared error as the loss function, and leverage dropout layers inside the model to prevent it from overfitting. The validation results of this experiment are shown in Section V.
results of this experiment are shown in Section V.
In this work, we do not treat the regression model as a black Fig. 7: Detailed structure of our GAN generator.
box as previous works. Instead, we leverage a gradient-based
attribution algorithm named DeepLIFT [15] to interpret the
predictions. The algorithm aims to determine the attribution of CTS modeling parameters, and a hyperbolic tangent layer
(relevance) value of each input neuron subject to different is chosen as the activation function to match the domain of
outputs. Assume our regression model R takes an input vector the normalized samples drawn from the database. Since when
x = [x1 , ..., xN ] ∈ RN and produces an output vector training the discriminator, we normalize the real samples x
S = [S1 , ..., Sk ]. DeepLIFT proceeds ali , the attribution of from the database to x̂ ∈ [−1, 1] as
neuron i at layer l, in a backward fashion by calculating the x
activation difference of the neuron between the target input x x̂ = × 2 − 1. (3)
maxx∈supp(x) (x)
and reference input x̂. The procedure is derived as
( In GAN-CTS, the generator has two important objectives.
L Si (x) − Si (x̂) if neuron i is the output of interest
ai = The first is to generate realistic samples that deceive the
0 otherwise, discriminator, where the corresponding objective function is
(1)
l
X zji − ẑji l+1 LGD = Ez,f [log(D(G(z, f ))]. (4)
ai = P P · aj , (2)
j i zji − i ẑji
The second objective is to generate the samples that lead
where L denotes the output layer, and zji is the weighted to optimized clock trees by maximizing reward r from the
activation of neuron i onto neuron j in the next layer. In the pre-trained regression model R. The reward r is defined as
implementation, we take the reference input x̂ as the input N
parameters of the auto-generated clock tree.
Y Ri (G(z, f ))
r := R(G(z, f )) = − , (5)
i=1
auto-setting result of target i
C. CTS Optimization
To solve Problem 2, we introduce a reinforcement learning where N denotes the number of target CTS outcomes and Ri
to the conditional GAN by taking the pre-trained regression denotes the corresponding prediction of the regression model.
model as a supervisor of the generator. Before illustrating the To optimize the reward defined in Equation 5, we leverage
objectives of the optimization process, we first describe the a policy gradient algorithm named REINFORCE [18]. The
structure of the generator. The generator G is a neural network algorithm considers the generator as an agent that strives to
parameterized by θg which samples a regular input z with maximize the accumulative reward by producing wise actions
100 dimensions from a N (0, 1) Gaussian distribution pz and G(z, f ) from a given environment created by the discriminator.
samples the extracted placement features f from the database The corresponding objective is derived as
pd as the conditional input.
LGP = Ez,f [r · ln(µ(G(z, f )))], (6)
Figure 7 shows the detailed structure of the generator. The
leaky ReLU [20] layers are employed as activation functions where µ is a softmax function that maps G(z, f ) to [0, 1].
of the input and hidden layers to project latent variables With the above objective function, the generator is expected to
onto a wider domain, which eliminates the bearing of van- generate the CTS input parameters sets G(z, f ) that maximize
ishing gradients. Batch normalization [7] layers are utilized the rewards r, which results in optimized clock trees.
to normalize the inputs of each hidden layer to zero-mean Finally, by combining the two objective functions illus-
and unit-variance, which accelerates the training process since trated, we formulate the training process of the generator as
the oscillation of gradient descent is reduced. Finally, for the
output layer, the number of neurons D denotes the number max E z∼pz [log(D(G(z, f ))) + r · ln(µ(G(z, f )))]. (7)
G f ∼pd
Fig. 8: Detailed structure of our GAN discriminator.

D. Success vs. Failure Classification

The classification of successful and failed CTS runs is performed in the discriminator. To describe how the classification task works, we first illustrate the structure of the discriminator as shown in Figure 8. The discriminator takes a regular input, which is either the generated samples G(z, f; θ_g) or the real samples x from the database p_d, and a conditional input, which denotes the features extracted from placement images. Note that when the regular input represents the real samples x, the conditional input f must be aligned to x. The reason we introduce a conditional input to the discriminator is that it helps the discriminator to distinguish better between the generated and the real samples under different modes (benchmarks). As shown in Equation 3, since the unit of each CTS input parameter is different, we normalize real samples x from the database to x̂ ∈ [−1, 1] to improve the stability of the GAN training process.

In GAN-CTS, the discriminator has two objectives. One is to distinguish the generated samples from the real samples, which is derived as

\[ L_{D_g} = \mathbb{E}_{(x,f) \sim p_d}[\log(D_g(x, f))] + \mathbb{E}_{z \sim p_z,\, f \sim p_d}[\log(1 - D_g(G(z, f)))]. \qquad (8) \]

The other is to classify whether a CTS run is successful or not, which is formulated as

\[ L_{D_s} = \mathbb{E}_{x \sim p_d}[\log(D_s(x, f))]. \qquad (9) \]

Note that successful and failed CTS runs are defined in Section II. The reason we introduce a multi-task learning to the discriminator is that the attributes of the two objectives are similar, where both of them perform binary classification. Therefore, some latent features can be shared in the early network as shown in Figure 8. Finally, the training process of the discriminator is summarized as

\[ \max_D \; \mathbb{E}_{(x,f) \sim p_d}[\log(D_g(x, f)) + \log(D_s(x, f))] + \mathbb{E}_{z \sim p_z,\, f \sim p_d}[\log(1 - D_g(G(z, f)))]. \qquad (10) \]
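A minimal Keras sketch of this two-headed discriminator follows. The shared trunk and the 128/64-neuron widths follow Figure 8; the sigmoid outputs and the use of two binary cross-entropy terms to realize Equations (8)-(10) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

FEATURE_DIM, D = 512, 10  # conditional feature size; D CTS parameters (assumed)

def build_discriminator():
    x = layers.Input(shape=(D,), name="cts_inputs")  # real or generated samples
    f = layers.Input(shape=(FEATURE_DIM,), name="placement_features")
    h = layers.Concatenate()([x, f])
    h = layers.Dense(128)(h)                         # shared trunk
    h = layers.LeakyReLU(0.2)(h)
    h = layers.Dense(64)(h)
    h = layers.LeakyReLU(0.2)(h)
    heads = []
    for name in ("real_or_generated", "success_or_failure"):  # D_g and D_s heads
        t = layers.Dense(64)(h)
        t = layers.LeakyReLU(0.2)(t)
        t = layers.Dense(64)(t)
        t = layers.LeakyReLU(0.2)(t)
        heads.append(layers.Dense(1, activation="sigmoid", name=name)(t))
    return Model([x, f], heads, name="discriminator")

discriminator = build_discriminator()
# Eq. (10) then reduces to one binary cross-entropy term per head:
discriminator.compile(optimizer="adam",
                      loss={"real_or_generated": "binary_crossentropy",
                            "success_or_failure": "binary_crossentropy"})
```

The shared trunk holds the overlapping parameters of the two tasks, while the heads hold the task-specific ones, matching the multi-task split of θ_d described in the training methodology below.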
E. Training Methodology

Based on the structures and the objective functions presented, we now illustrate the training process of our framework in Algorithm 1, where the gradient descent optimizer Adam [11] is utilized across the different training stages. First, we train the regression model (lines 2-9), which serves as a supervisor in the training process of the conditional GAN. Then, we train the generator and the discriminator alternately (lines 10-25), since the two networks have antagonistic objectives. The parameters of the discriminator are split into θ_d1 and θ_d2 (θ_d1 ∩ θ_d2 ≠ ∅) to represent the different tasks, since a multi-task learning is conducted. The overall training process is done when the losses of the generator and the discriminator reach an equilibrium, which takes about 6 hours on a machine with a 2.40 GHz CPU and an NVIDIA RTX 2070 graphics card.

Algorithm 1: GAN-CTS training methodology. We use default values of α_r = 0.0001, α_GAN = 0.0001, β_1 = 0.9, β_2 = 0.999, m = 128.

Input: {f}: extracted placement features, {x}: training data, {Y}: target CTS metrics for prediction and optimization.
Input: α_r: learning rate of the regression model, α_GAN: learning rate of the GAN, m: batch size, {θ_r}_0: initial parameters of the regression model, θ_g0: initial parameters of the generator, {θ_d}_0: initial parameters of the discriminator, {β_1, β_2}: Adam parameters.
Output: R: regression model, G: generator, D: discriminator.

1:  N ← length(Y)
2:  while {θ_r} do not converge do
3:      Sample a batch of training data {x^(i)}_{i=1..m} ∼ p_d
4:      Take features {f^(i)}_{i=1..m} corresponding to {x^(i)}_{i=1..m}
5:      for k ← 1 to N do
6:          g_{r_k} ← ∇_{θ_{r_k}} [(1/m) Σ_{i=1..m} (R_k(f^(i), x^(i)) − Y_k^(i))²]
7:          θ_{r_k} ← Adam(α_r, θ_{r_k}, g_{r_k}, β_1, β_2)
8:      end for
9:  end while
10: while θ_g and θ_d do not converge do
11:     Sample a batch of training data {x^(i)}_{i=1..m} ∼ p_d
12:     Take features {f^(i)}_{i=1..m} corresponding to {x^(i)}_{i=1..m}
13:     Sample a batch of random vectors {z^(i)}_{i=1..m} ∼ p_z
14:     g_{d_1} ← ∇_{θ_{d_1}} [(1/m) Σ_{i=1..m} log(D_{θ_{d_1}}(x^(i), f^(i))) + (1/m) Σ_{i=1..m} log(1 − D_{θ_{d_1}}(G_{θ_g}(z^(i), f^(i))))]
15:     θ_{d_1} ← Adam(α_GAN, θ_{d_1}, g_{d_1}, β_1, β_2)
16:     g_{d_2} ← ∇_{θ_{d_2}} [(1/m) Σ_{i=1..m} log(D_{θ_{d_2}}(x^(i), f^(i)))]
17:     θ_{d_2} ← Adam(α_GAN, θ_{d_2}, g_{d_2}, β_1, β_2)
18:     Sample a batch of random vectors {z^(i)}_{i=1..m} ∼ p_z
19:     Sample a batch of features {f^(i)}_{i=1..m}
20:     g_gen ← −∇_{θ_g} (1/m) Σ_{i=1..m} log(D_{θ_{d_1}}(G_{θ_g}(z^(i), f^(i))))
21:     θ_g ← Adam(α_GAN, θ_g, g_gen, β_1, β_2)
22:     r ← Π_{k=1..N} R_k(G({z^(i)}_{i=1..m}, {f^(i)}_{i=1..m})) / (auto-setting result of outcome k)
23:     g_θ ← ∇_{θ_g} (1/m) Σ_{i=1..m} r · ln(µ(G(z^(i), f^(i))))
24:     θ_g ← Adam(α_GAN, θ_g, g_θ, β_1, β_2)
25: end while
V. EXPERIMENTAL RESULTS

In this section, we describe several experiments that demonstrate the achievements of the GAN-CTS framework. The framework is implemented in Python3 with the Keras [2] library. As mentioned in Section III, we utilize Cadence Innovus to generate a database containing 24.5k clock trees with 245 different placements across 7 netlists under the TSMC 28nm technology node. All the netlists utilized are from OpenCores.org. To prove the generality of our framework, we only use 5 netlists during the training process, and leverage the remaining two (AES, NOVA) to perform the validation.

A. CTS Prediction and Interpretation Results

In this experiment, we evaluate the regression model on three target CTS outcomes with two evaluation metrics: the mean absolute percentage error (MAPE) and the correlation coefficient (CC). Table IV shows the evaluation results. It is shown that the regression model achieves very low evaluation errors among all target metrics on the unseen netlists, which substantiates that the regression model predicts in high precision and earns a good generality.

TABLE IV: Mean absolute percentage error (MAPE, %) and correlation coefficient (CC) of our predictor tested using unseen netlists.

unseen     clock power       clock wirelength    achieved skew
netlist    MAPE     CC       MAPE     CC         MAPE     CC
AES        1.26     0.978    1.64     0.986      3.57     0.965
NOVA       1.19     0.982    1.24     0.981      2.66     0.979

However, accurate prediction results are not sufficient for designers to trust a model. Understanding the reasons behind the predictions is crucial. As shown in Figure 9, we evaluate the importance of each CTS input parameter based on the predictions presented in Table IV. As mentioned in Section IV-B, we define the importance through a gradient-based attribution method named DeepLIFT [15]. The algorithm quantifies the relevance of input parameters with respect to different outputs. Since the input of the regression model contains the CTS input parameters and the extracted features, in this interpretation experiment, we focus on determining the relative importance among CTS input parameters by further normalizing the relevance scores to [0, 1]. Note that the normalization is performed within the CTS input parameters.

Fig. 9: Relative importance of CTS input parameters on skew, power, and wirelength for the AES benchmark.

Below, we explain two important phenomena observed from Figure 9.
1) The slew constraint for leaf cells has great impacts on clock power and clock wirelength. Indeed, with a tight slew constraint, more buffers need to be inserted to meet the timing target, which ultimately results in higher clock power and clock wirelength.
2) The max EGR layer has high impacts on maximum skew and clock wirelength. The reason is that signal nets are often routed in top metal layers (e.g., M5, M6). If signal nets are forced to route in low metal layers (e.g., M1, M2) that are reserved to route clock nets, there will be many detours in clock routing, which results in long clock paths and hence a large maximum skew.

B. CTS Optimization Results

In this experiment, we demonstrate the optimization results achieved by our GAN-CTS framework, where a joint optimization is performed on three target CTS metrics: clock wirelength, clock power, and the maximum skew. Figure 10 shows the optimization result of the AES benchmark, where the blue dots denote the original clock trees in the database, and the red stars represent the clock trees generated by GAN-CTS.

Fig. 10: Distributions of random generated vs. GAN-CTS generated clock trees for the AES benchmark.

To plot the figure, we first take the extracted features of the pre-CTS placements as conditional inputs, and then utilize the generator to suggest 200 sets of CTS input parameters. With these suggested parameter sets, we further leverage the commercial tool to perform actual CTS processes. Finally, according to the target optimization metrics, we plot the scatter distributions of the clock trees suggested by GAN-CTS together with the ones originally generated in the database. Note that the input parameters of the clock trees in the database are randomly sampled from the ranges shown in Table II.

Table V further shows the detailed comparisons between the commercial tool's auto-generated clock tree and the selected GAN-CTS generated clock trees, where the selection is conducted by taking the tree with the minimum clock power. It is shown that compared with the auto-generated clock tree, the clock trees generated by GAN-CTS significantly improve many crucial CTS metrics. Figure 11 further exhibits the layout comparison of the AES benchmark, where the clock wirelength of the GAN-CTS optimized tree is much shorter than the one auto-generated by the commercial engine.

C. Success vs. Failure Classification Results

In the final experiment, we demonstrate the classification results achieved by the discriminator in determining successful and failed CTS runs. As mentioned in Section II, success and failure are defined by comparing the CTS metrics of the clock trees generated by GAN-CTS to the one auto-generated by the commercial tool. Table VI summarizes the classification results in a confusion matrix with the NOVA benchmark. The accuracy and the F1-score are 0.947 and 0.952, respectively.
TABLE V: Commercial auto-setting vs. GAN-CTS comparison.

netlist  CTS Metrics              Auto-Setting   GAN-CTS
aes      # inserted buffers       695            83 (-88.1%)
         clock power (mW)         20.34          9.86 (-51.5%)
         clock wirelength (mm)    34.09          27.78 (-18.5%)
         maximum skew (ns)        0.019          0.018 (-5.3%)
nova     # inserted buffers       2617           524 (-79.9%)
         clock power (mW)         43.59          19.86 (-54.4%)
         clock wirelength (mm)    118.24         97.92 (-17.2%)
         maximum skew (ns)        0.031          0.033 (+6.4%)

Fig. 11: Clock tree layout comparison with the AES benchmark: (a) GAN-CTS optimized, (b) commercial auto-setting.

TABLE VI: Confusion matrix of success vs. failure classification in the NOVA benchmark. Failure means worse than auto-setting.

                          Predictions
                  Success   Failure   Total
Ground   Success  1848      111       1959
Truths   Failure  74        1453      1527
         Total    1922      1564      3486
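As a quick arithmetic check of the reported numbers, the accuracy and F1-score follow directly from Table VI when success is taken as the positive class:

```python
# Entries of Table VI (NOVA benchmark); success is the positive class.
tp, fn = 1848, 111   # true successes predicted as success / failure
fp, tn = 74, 1453    # true failures predicted as success / failure

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 3301 / 3486 ≈ 0.947
precision = tp / (tp + fp)                          # 1848 / 1922 ≈ 0.961
recall = tp / (tp + fn)                             # 1848 / 1959 ≈ 0.943
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.952
print(f"accuracy={accuracy:.3f}, F1={f1:.3f}")
```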

VI. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a complete framework named GAN-CTS to solve the complicated CTS outcomes optimization and input generation problems which all previous works leave under-addressed. The fact that the proposed framework demonstrates promising results on the unseen benchmarks proves that it is general and practical.

In spite of the superior performance achieved, the proposed framework is currently dependent on the technology node. In the future, we aim to overcome this dependence.

VII. ACKNOWLEDGMENT

This work is supported by the National Science Foundation under Grant No. CNS 16-24731 and the industry members of the Center for Advanced Electronics in Machine Learning.

REFERENCES

[1] M. Ancona et al. Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, 2018.
[2] F. Chollet et al. Keras, 2015.
[3] A. Datli, U. Eksi, and G. Isik. A clock tree synthesis flow tailored for low power. https://www.design-reuse.com/articles/33873/clock-tree-synthesis-flow-tailored-for-low-power.html, 2013.
[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[7] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
[8] A. B. Kahng, B. Lin, and S. Nath. Enhanced metamodeling techniques for high-dimensional IC design estimation problems. In Proceedings of the Conference on Design, Automation and Test in Europe, 2013.
[9] A. B. Kahng, B. Lin, and S. Nath. High-dimensional metamodeling for prediction of clock tree synthesis outcomes. In System Level Interconnect Prediction (SLIP), ACM/IEEE International Workshop on. IEEE, 2013.
[10] A. B. Kahng and S. Mantik. A system for automatic recording and prediction of design quality metrics. In ISQED. IEEE, 2001.
[11] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[12] Y. Kwon, J. Jung, I. Han, and Y. Shin. Transient clock power estimation of pre-CTS netlist. In Circuits and Systems (ISCAS), 2018 IEEE International Symposium on. IEEE, 2018.
[13] L. v. d. Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2008.
[14] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv:1411.1784, 2014.
[15] A. Shrikumar, P. Greenside, A. Shcherbina, and A. Kundaje. Not just a black box: Learning important features through propagating activation differences. arXiv:1605.01713, 2016.
[16] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 2000.
[17] K. Wang, C. Gou, Y. Duan, Y. Lin, X. Zheng, and F.-Y. Wang. Generative adversarial networks: introduction and outlook. IEEE/CAA Journal of Automatica Sinica, 4(4):588-598, 2017.
[18] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 1992.
[19] Z. Xie, Y.-H. Huang, G.-Q. Fang, H. Ren, S.-Y. Fang, Y. Chen, and J. Hu. RouteNet: Routability prediction for mixed-size designs using convolutional neural network. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2018.
[20] B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853, 2015.
