Abstract—In this paper, we propose a complete framework named GAN-CTS which utilizes a conditional generative adversarial network (GAN) and reinforcement learning to predict and optimize clock tree synthesis (CTS) outcomes. To precisely characterize different netlists, we leverage transfer learning to extract design features directly from placement images. Based on the proposed framework, we further quantitatively interpret the importance of each CTS input parameter subject to various design objectives. Finally, to prove the generality of our framework, we conduct experiments on unseen netlists which are not utilized in the training process. Experimental results on industrial designs demonstrate that our framework (1) achieves an average prediction error of 3%, (2) improves the commercial tool's auto-generated clock tree by 51.5% in clock power, 18.5% in clock wirelength, and 5.3% in maximum skew, and (3) reaches an F1-score of 0.952 in the classification task of determining successful and failed CTS processes.

I. INTRODUCTION

Clock tree synthesis (CTS) is a critical stage of physical design, since the clock network often constitutes a high percentage of the overall power of the final full-chip design. An optimized clock tree helps to avoid serious design issues such as excessive power consumption, high routing congestion, and elongated timing closure [3]. However, due to the high complexity and run time of electronic design automation (EDA) tools, designers struggle to synthesize high-quality clock trees that optimize key metrics such as clock power, skew, and clock wirelength. To find the input parameters that achieve the desired CTS outcomes, designers have to search a wide range of candidate parameters, which is usually done through manual and highly time-consuming calibration.

To automate this task and alleviate the burden on designers, several machine learning (ML) techniques have been proposed to predict clock network metrics. Previous work [10] utilizes data mining tools to estimate the achieved skew and insertion delay. The authors of [8] employ statistical learning and meta-modeling methods to predict more essential metrics such as clock power and clock wirelength. The authors of [9] consider the effect of non-uniform sinks and different aspect ratios. A recent work [12] utilizes artificial neural networks (ANNs) to predict the transient clock power consumption based on an estimation of the clock tree components. However, all the previous works focus solely on improving prediction performance. There are four highly crucial aspects that all of them neglect, namely:

• Generality of Model: No previous work conducts prediction experiments on unseen netlists; they use the same netlists during the training and testing stages.
• Interpretability of Parameters: No previous work interprets the importance of each CTS input parameter subject to distinct CTS outcomes.
• Optimization of Metrics: No previous work demonstrates the capability to optimize CTS outcomes with respect to various design objectives.
• Suggestion of Inputs: Given a design, no previous work exhibits the ability to suggest CTS input parameter sets that achieve optimized clock trees; they merely adopt discriminative approaches to model the CTS stage.

These aspects, which previous works leave under-addressed, are the most crucial factors in determining whether the proposed modeling methods are practical. In this paper, we solve all the issues presented above. Furthermore, a unique aspect of our work is that we directly extract design features from placement layout images to perform practical, highly precise CTS outcome predictions. We demonstrate that the features extracted from the layouts are indeed highly useful for improving the accuracy of our models.

The goal of this work is to construct a general and practical CTS framework that can predict CTS outcomes with high precision and recommend to designers the input parameter sets leading to optimized clock trees, even for unseen netlists. Our proposed framework consists of a regression model and a variation of the generative adversarial network (GAN) [5] named conditional GAN [14]. Beyond the conventional GAN training process, in this work, reinforcement learning is introduced to the generator, and multi-task learning is introduced to the discriminator. The reinforcement learning is conducted by the regression model, which is trained prior to the conditional GAN and serves as a supervisor that guides the generator to produce CTS input parameter sets leading to optimized clock trees. As for the discriminator, it not only predicts whether the input is generated or real, as in the conventional approach, but also predicts whether the input results in a successful CTS run. In this paper, a successful CTS run indicates that the achieved clock tree outperforms the one auto-generated by the commercial tool.

The rest of the paper is organized as follows. Section II presents an overview of our framework. Section III describes the problem formulations and database generation, and identifies the difficulties of the problems. Section IV illustrates the algorithms and the detailed structures of the models. Experimental results are demonstrated in Section V, and we conclude our work in Section VI.
Fig. 1: A high-level view of this work and the three objectives we achieve (pipeline: flip flop, clock net, and trial routing placement images → feature extraction → regression model → CTS outcome prediction; generated and real CTS inputs → discriminator → fake/real and success/fail decisions). The first objective is to predict CTS outcomes with high precision. The second objective is to recommend good CTS input settings to designers. The third objective is to determine whether the input settings outperform the commercial tool's auto-setting.

II. OVERVIEW

It has been widely acknowledged that the GAN is a promising model that learns complicated real-world data through a generative approach [17]. A vanilla GAN structure contains a generator and a discriminator, which are both neural networks. The goal of the generator is to generate meaningful data from a given distribution (e.g., random noise), while the objective of the discriminator is to distinguish the generated samples from the real samples in the database. On top of the vanilla GAN structure, the conditional GAN further introduces an external conditional input to both the generator and the discriminator. This conditional input enables the model to direct the data generation process, which allows us to generate suitable CTS input parameter sets for different benchmarks, even ones that are not utilized in the training process.

A high-level view of our framework, named GAN-CTS, is shown in Figure 1. The framework comprises three training stages. The first training stage extracts key design features from placement images that represent the flip flop distribution, the clock net distribution, and the trial routing result. Note that trial routing is a process performed at the end of the placement stage, which provides an estimation of the routing congestion in a given placement. To extract features from images, we apply transfer learning to a pre-trained convolutional neural network (CNN) named ResNet-50 [6], which is a 50-layer residual network.

Following the feature extraction process, we utilize the extracted features as well as the clock trees in the database to train the regression model and to predict CTS outcomes. The regression model is a multi-output deep neural network which predicts multiple CTS metrics simultaneously. In this work, we target three CTS outcomes: clock power, clock wirelength, and the achieved maximum skew. As mentioned in Section I, we do not consider our model a black box as previous works do. To further interpret the predictions made by the regression model, we leverage a gradient-based attribution method [1] to quantitatively interpret the importance of each CTS input parameter subject to different target outcomes.

The last training stage of the framework involves reinforcement learning and generative adversarial learning. We leverage a conditional GAN to perform the CTS optimization and classification tasks. The regression model trained in the prior stage now acts as a supervisor that guides the generator to generate CTS input parameter sets which lead to optimized clock trees. The guidance is conducted by a reinforcement learning technique named policy gradient [16].

In the conditional GAN, we take the extracted placement features as the conditional input, while the original inputs of the vanilla GAN model are considered regular inputs. The great advantage of having the conditional input is that it allows the model to control the modes of the generated data. In this work, we consider different benchmarks to be different modes. Therefore, with the aid of the conditional approach, our framework is able to optimize unseen benchmarks which are not utilized during the training process.

Finally, a highlight of our framework is that multi-task learning is conducted in the discriminator. In addition to the conventional task of distinguishing between generated and real samples, we introduce a new task of classifying successful and failed CTS runs. In this paper, we strictly define a CTS run as successful if two out of the three achieved target CTS metrics mentioned earlier are better than the ones achieved automatically by the commercial tool.

At the end of the training process, we acquire four models as follows:

• A placement feature extractor E which precisely characterizes different designs from placement images.
• A regression model R which performs high-precision predictions of target CTS outcomes.
• A generator G which generates CTS input parameter sets that lead to optimized clock trees.
• A discriminator D which predicts the success and failure of CTS runs.

III. DESIGNING EXPERIMENTS

We formally define the clock tree synthesis (CTS) prediction and optimization problems as follows:

Problem 1 (CTS Outcomes Prediction). Given a pre-CTS placement P and a CTS input parameter set X, predict the post-CTS outcomes Y without performing any actual CTS.

Problem 2 (CTS Outcomes Optimization). Given a pre-CTS placement P, generate a CTS input parameter set X̂ that leads to optimized CTS outcomes Ŷ.
TABLE I: Our benchmarks and their attributes.

  Netlist Name     # Nets     # Flip Flops   # Total Cells
  AES-128           90,905       10,688         113,168
  Arm-Cortex-M0     13,267        1,334          12,942
  NOVA             138,171       29,122         136,537
  ECG               85,058       14,018          84,127
  JPEG             231,934       37,642         219,064
  LDPC              42,018        2,048          39,377
  TATE             185,379       31,409         184,601

Fig. 3: Visualization of trial routing image in convolutional filters.

Fig. 4: Our image feature extraction flow (auxiliary input (# flip flops) → concatenate layer → FC 1 (512 neurons) → ... → FC 4 (1 neuron)). The extracted features are colored in red.

Fig. 5: t-SNE visualizations of the original placement image vectors and the extracted feature vectors (axes: embedding dimension 1 vs. embedding dimension 2; points labeled by netlist, e.g., NOVA).

Fig. 6: Our multi-output regression model (extracted features and CTS parameters → shared FC layers → per-target power/WL/skew FC layers 2 and 3 → power/WL/skew predictions).
A. Placement Feature Extraction

We extract key design features from placement images to solve the CTS outcomes prediction and optimization problems. Our approach is built upon a convolutional neural network (CNN) and transfer learning, where we devise our own fully connected (FC) layers on top of the pre-trained ResNet-50 [6] model, a CNN pre-trained on the well-known ImageNet [4] dataset of millions of images.

Figure 3 shows the visualization of a trial routing image passing through four convolutional filters inside the pre-trained ResNet-50 model. In the figure, we observe that important information, such as the usage of metal layers, is well captured.

Although ResNet-50 is powerful on many image datasets, it is not devised specifically for physical design problems. Therefore, to extract key design features from the high-dimensional vectors, we devise a feature extraction flow with transfer learning, as shown in Figure 4. Four self-designed FC layers are appended on top of the ResNet-50 model to predict the commercial tool's estimation of total power right after the placement stage. As shown in the figure, since placement images cannot precisely reflect the actual size of a design, we introduce a one-dimensional auxiliary input to our FC layers which represents the total number of flip flops. In the training process, we fix the parameters of the pre-trained ResNet-50 model and only update the parameters of the FC layers. After training, for each pre-CTS placement, we take the 512-dimensional output vector of the first FC layer, denoted in red in the figure, as the final extracted feature vector.
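For illustration, a minimal Keras [2] sketch of this flow is given below. Since Figure 4 only fixes FC 1 (512 neurons) and FC 4 (1 neuron), the widths of FC 2 and FC 3, the optimizer, and all identifiers are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Frozen ResNet-50 backbone (ImageNet weights); only the FC layers train.
backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False

img_in = layers.Input(shape=(700, 717, 3), name="placement_image")
aux_in = layers.Input(shape=(1,), name="num_flip_flops")  # auxiliary input

x = backbone(img_in, training=False)          # ImageNet preprocessing omitted
x = layers.Concatenate()([x, aux_in])
feat = layers.Dense(512, activation="relu", name="fc1")(x)  # extracted features
x = layers.Dense(256, activation="relu", name="fc2")(feat)  # width assumed
x = layers.Dense(64, activation="relu", name="fc3")(x)      # width assumed
power = layers.Dense(1, name="fc4")(x)  # commercial tool's total-power estimate

power_model = Model([img_in, aux_in], power)
power_model.compile(optimizer="adam", loss="mse")

# After training, the 512-d fc1 output is the placement feature vector.
extractor = Model([img_in, aux_in], feat)
```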
In the implementation, all the images have a dimension of 700 × 717 × 3. The achieved mean absolute percentage error on the unseen validation netlists when predicting the estimated power is only 1%. However, since a low prediction error does not guarantee a good feature representation, we leverage a dimensionality reduction technique named t-distributed stochastic neighbor embedding (t-SNE) [13] to visualize the extracted features ($\in \mathbb{R}^{512}$) in $\mathbb{R}^2$, as shown in Figure 5. In the figure, we observe that different placements belonging to the same netlist are clustered together and those belonging to different netlists are well separated. Therefore, we conclude that the extracted features capture the characteristics of a design well.
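This check can be reproduced with an off-the-shelf t-SNE implementation; the sketch below uses scikit-learn with stand-in data, since the paper does not name its t-SNE implementation.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data: 70 placements x 512-d extracted features, 7 netlists.
features = np.random.rand(70, 512)
netlist_ids = np.repeat(np.arange(7), 10)

# Project the 512-d extracted feature vectors to 2-D, as in Figure 5.
emb = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=netlist_ids, cmap="tab10", s=10)
plt.xlabel("embedding dimension 1")
plt.ylabel("embedding dimension 2")
plt.show()
```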
B. CTS Outcomes Prediction

Constructing a precise regression model is the key step in solving Problem 1. Following the feature extraction process, we train the regression model with the extracted features to predict the target CTS outcomes. As mentioned in Section II, in this paper we target predicting and optimizing three
target CTS outcomes: clock power, clock wirelength, and maximum skew. The regression model is a multi-output deep neural network, as shown in Figure 6. The model takes the extracted features along with the CTS parameters as inputs and produces the three predictions simultaneously. Note that multi-task learning is introduced here. The reason we leverage a single multi-output model rather than multiple single-output models, as in previous work [12], is that different CTS outcomes are not independent of each other. Shared layers therefore let the different objectives exchange mutual information, which reduces the training time and enhances the performance of each prediction. In the training process, we utilize mean squared error as the loss function and leverage dropout layers inside the model to prevent overfitting. The validation results of this experiment are shown in Section V.
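A sketch of such a multi-output network is given below. Figure 6 names the layers but not their widths, so the trunk and branch sizes, the dropout rate, and the number of CTS parameters are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CTS_PARAMS = 12  # assumed; the parameter count is not listed here

feat_in = layers.Input(shape=(512,), name="extracted_features")
cts_in = layers.Input(shape=(NUM_CTS_PARAMS,), name="cts_parameters")

# Shared trunk: lets the three correlated targets exchange information.
x = layers.Concatenate()([feat_in, cts_in])
x = layers.Dense(256, activation="relu")(x)
x = layers.Dropout(0.2)(x)  # dropout against overfitting; rate assumed
x = layers.Dense(128, activation="relu")(x)

# Per-target branches ("FC layer 2" / "FC layer 3" in Figure 6) and heads.
outputs = []
for target in ("power", "wirelength", "skew"):
    h = layers.Dense(64, activation="relu", name=f"{target}_fc2")(x)
    h = layers.Dense(32, activation="relu", name=f"{target}_fc3")(h)
    outputs.append(layers.Dense(1, name=f"{target}_prediction")(h))

regression_model = Model([feat_in, cts_in], outputs)
regression_model.compile(optimizer="adam", loss="mse")  # MSE on each output
```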
In this work, we do not treat the regression model as a black box, as previous works do. Instead, we leverage a gradient-based attribution algorithm named DeepLIFT [15] to interpret the predictions. The algorithm determines the attribution (relevance) value of each input neuron subject to different outputs. Assume our regression model $R$ takes an input vector $x = [x_1, \dots, x_N] \in \mathbb{R}^N$ and produces an output vector $S = [S_1, \dots, S_k]$. DeepLIFT computes $a_i^l$, the attribution of neuron $i$ at layer $l$, in a backward fashion by calculating the activation difference of the neuron between the target input $x$ and a reference input $\hat{x}$. The procedure is derived as

$$
a_i^L =
\begin{cases}
S_i(x) - S_i(\hat{x}) & \text{if neuron } i \text{ is the output of interest} \\
0 & \text{otherwise,}
\end{cases}
\tag{1}
$$

$$
a_i^l = \sum_j \frac{z_{ji} - \hat{z}_{ji}}{\sum_i z_{ji} - \sum_i \hat{z}_{ji}} \cdot a_j^{l+1},
\tag{2}
$$

where $L$ denotes the output layer and $z_{ji}$ is the weighted activation of neuron $i$ onto neuron $j$ in the next layer. In the implementation, we take the reference input $\hat{x}$ to be the input parameters of the auto-generated clock tree.
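To make Equations (1) and (2) concrete, the following toy NumPy sketch runs the backward pass for a small linear MLP (biases and nonlinearities omitted). It illustrates the rescale rule above and is not the reference implementation of [15].

```python
import numpy as np

def deeplift_attributions(weights, acts, ref_acts, output_index):
    """Backward pass of Eqs. (1)-(2) for a linear MLP.

    weights[l]: (n_{l+1}, n_l) matrix mapping layer l to layer l+1.
    acts[l] / ref_acts[l]: activations of layer l for the target input x
    and the reference input x_hat (e.g. the auto-setting parameters).
    """
    L = len(acts) - 1
    a = np.zeros_like(acts[L])
    a[output_index] = acts[L][output_index] - ref_acts[L][output_index]  # Eq. (1)
    for l in range(L - 1, -1, -1):
        z = weights[l] * acts[l]            # z[j, i] = w_ji * a_i
        z_hat = weights[l] * ref_acts[l]
        ratio = (z - z_hat) / (z - z_hat).sum(axis=1, keepdims=True)
        a = (ratio * a[:, None]).sum(axis=0)                             # Eq. (2)
    return a  # attribution of each input neuron for the chosen output

# Tiny usage example with random weights and inputs.
rng = np.random.default_rng(0)
W = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
x, x_hat = rng.normal(size=4), np.zeros(4)
acts = [x, W[0] @ x, W[1] @ (W[0] @ x)]
ref_acts = [x_hat, W[0] @ x_hat, W[1] @ (W[0] @ x_hat)]
print(deeplift_attributions(W, acts, ref_acts, output_index=0))
```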
C. CTS Optimization

To solve Problem 2, we introduce reinforcement learning to the conditional GAN by taking the pre-trained regression model as a supervisor of the generator. Before illustrating the objectives of the optimization process, we first describe the structure of the generator. The generator $G$ is a neural network parameterized by $\theta_g$ which samples a regular input $z$ with 100 dimensions from an $N(0, 1)$ Gaussian distribution $p_z$ and samples the extracted placement features $f$ from the database $p_d$ as the conditional input.

Fig. 7: Detailed structure of our GAN generator (random noise → input layer (64 neurons); extracted features → input layer (256 neurons); each followed by leaky ReLU and batch normalization; a hidden layer (128 neurons); and a tanh output layer (D neurons) feeding the regression model and the discriminator).

Figure 7 shows the detailed structure of the generator. Leaky ReLU [20] layers are employed as the activation functions of the input and hidden layers to project the latent variables onto a wider domain, which mitigates vanishing gradients. Batch normalization [7] layers are utilized to normalize the inputs of each hidden layer to zero mean and unit variance, which accelerates the training process by reducing the oscillation of gradient descent. Finally, for the output layer, the number of neurons $D$ equals the number of CTS modeling parameters, and a hyperbolic tangent layer is chosen as the activation function to match the domain of the normalized samples drawn from the database: when training the discriminator, we normalize each real sample $x$ from the database to $\hat{x} \in [-1, 1]$ as

$$
\hat{x} = \frac{x}{\max_{x \in \mathrm{supp}(x)}(x)} \times 2 - 1.
\tag{3}
$$
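A hedged Keras sketch of this generator is given below. Figure 7 fixes the layer widths (64, 256, 128, and D) and the activation pattern, but the exact wiring of the noise and feature branches is our reading of the figure, and the value of D is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

D = 12  # number of CTS modeling parameters (assumed value)

z_in = layers.Input(shape=(100,), name="noise")               # z ~ N(0, 1)
f_in = layers.Input(shape=(512,), name="placement_features")  # conditional input

# Each input branch: dense layer, leaky ReLU, batch normalization.
z_b = layers.BatchNormalization()(layers.LeakyReLU()(layers.Dense(64)(z_in)))
f_b = layers.BatchNormalization()(layers.LeakyReLU()(layers.Dense(256)(f_in)))

h = layers.Concatenate()([z_b, f_b])
h = layers.BatchNormalization()(layers.LeakyReLU()(layers.Dense(128)(h)))

# tanh keeps the generated parameters in [-1, 1], matching Eq. (3).
x_gen = layers.Dense(D, activation="tanh")(h)
generator = Model([z_in, f_in], x_gen)
```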
In GAN-CTS, the generator has two important objectives. The first is to generate realistic samples that deceive the discriminator, where the corresponding objective function is

$$
L_{G_D} = \mathbb{E}_{z,f}\big[\log(D(G(z, f)))\big].
\tag{4}
$$

The second objective is to generate samples that lead to optimized clock trees by maximizing the reward $r$ from the pre-trained regression model $R$. The reward $r$ is defined as

$$
r := R(G(z, f)) = -\prod_{i=1}^{N} \frac{R_i(G(z, f))}{\text{auto-setting result of target } i},
\tag{5}
$$

where $N$ denotes the number of target CTS outcomes and $R_i$ denotes the corresponding prediction of the regression model. To optimize the reward defined in Equation (5), we leverage a policy gradient algorithm named REINFORCE [18]. The algorithm considers the generator as an agent that strives to maximize the accumulated reward by producing wise actions $G(z, f)$ in the environment created by the discriminator. The corresponding objective is derived as

$$
L_{G_P} = \mathbb{E}_{z,f}\big[r \cdot \ln(\mu(G(z, f)))\big],
\tag{6}
$$

where $\mu$ is a softmax function that maps $G(z, f)$ to $[0, 1]$. With the above objective function, the generator is expected to generate the CTS input parameter sets $G(z, f)$ that maximize the reward $r$, which results in optimized clock trees.

Finally, by combining the two objective functions illustrated above, we formulate the training process of the generator as

$$
\max_G \; \mathbb{E}_{z \sim p_z,\, f \sim p_d}\big[\log(D(G(z, f))) + r \cdot \ln(\mu(G(z, f)))\big].
\tag{7}
$$
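One possible realization of this combined update with TensorFlow's GradientTape is sketched below, reusing the generator, regression model, and discriminator sketches nearby. The auto-setting baseline vector, the discriminator's real/fake head `d_realfake` (see the discriminator sketch below), and the reduction of $\ln(\mu(\cdot))$ to a scalar are assumptions the paper does not pin down.

```python
import tensorflow as tf

# Learning rate and betas follow the Algorithm 1 defaults.
g_opt = tf.keras.optimizers.Adam(1e-4, beta_1=0.9, beta_2=0.999)

@tf.function
def generator_step(z, f, auto_baseline):
    """One ascent step on Eq. (7), implemented as descent on its negation."""
    with tf.GradientTape() as tape:
        x_gen = generator([z, f], training=True)
        preds = tf.concat(regression_model([f, x_gen]), axis=1)  # power, WL, skew
        # Eq. (5): REINFORCE treats the reward as a constant of the policy.
        r = tf.stop_gradient(-tf.reduce_prod(preds / auto_baseline, axis=1))
        p_real = d_realfake([f, x_gen])                          # Eq. (4) term
        adv_term = tf.math.log(p_real + 1e-8)
        # Eq. (6): ln(mu(G(z, f))), reduced over the parameter axis (assumed).
        log_mu = tf.math.log(tf.nn.softmax(x_gen, axis=1) + 1e-8)
        pg_term = r[:, None] * tf.reduce_sum(log_mu, axis=1, keepdims=True)
        loss = -tf.reduce_mean(adv_term + pg_term)               # Eq. (7)
    grads = tape.gradient(loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return loss
```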
[Figure: detailed structure of our GAN discriminator — generated CTS inputs (generator), extracted placement features, and real CTS inputs (database) feed an input layer (128 neurons) followed by hidden layers (64 neurons) with leaky ReLU activations.]
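From this structure, a hedged Keras sketch of the multi-task discriminator described in Section II follows; the 128- and 64-neuron layers come from the figure, while the assignment of the two heads (real/fake and success/fail) to the two branches is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

D = 12  # number of CTS parameters, as in the generator sketch (assumed)

x_in = layers.Input(shape=(D,), name="cts_inputs")            # generated or real
f_in = layers.Input(shape=(512,), name="placement_features")  # conditional input

h = layers.Concatenate()([x_in, f_in])
h = layers.LeakyReLU()(layers.Dense(128)(h))
h = layers.LeakyReLU()(layers.Dense(64)(h))

# Two task-specific branches (64 neurons each in the figure).
h_rf = layers.LeakyReLU()(layers.Dense(64)(h))
h_sf = layers.LeakyReLU()(layers.Dense(64)(h))

real_fake = layers.Dense(1, activation="sigmoid", name="real_fake")(h_rf)
success_fail = layers.Dense(1, activation="sigmoid", name="success_fail")(h_sf)

discriminator = Model([x_in, f_in], [real_fake, success_fail])
discriminator.compile(optimizer="adam",
                      loss={"real_fake": "binary_crossentropy",
                            "success_fail": "binary_crossentropy"})

# Convenience view used in the generator update sketch above.
d_realfake = Model([x_in, f_in], real_fake)
```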
Algorithm 1 GAN-CTS training methodology. We use default values of α_r = 0.0001, α_GAN = 0.0001, β_1 = 0.9, β_2 = 0.999, m = 128.

Input: {f}: extracted placement features, {x}: training data, {Y}: target CTS metrics for prediction and optimization.
Input: α_r: learning rate of the regression model, α_GAN: learning rate of the GAN, m: batch size, {θ_r}_0: initial parameters of the regression model, {θ_g}_0: initial parameters of the generator, {θ_d}_0: initial parameters of the discriminator, {β_1, β_2}: Adam parameters.
Output: R: regression model, G: generator, D: discriminator.
1: N ← length({Y})
2: while {θ_r} do not converge do
3:   Sample a batch of training data {x^(i)}_{i=1}^m ∼ p_d
4:   Take the features {f^(i)}_{i=1}^m corresponding to {x^(i)}_{i=1}^m
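The regression pre-training phase of Algorithm 1 (steps 1-4) might look as follows, reusing the regression model sketched earlier with stand-in data; the step count is a stand-in for the convergence loop, and all data shapes are assumptions.

```python
import numpy as np
import tensorflow as tf

# Stand-in database: CTS parameter sets, matching placement features, and
# the three target metrics (power, wirelength, skew); all shapes assumed.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(1024, 12)).astype("float32")
f_train = rng.normal(size=(1024, 512)).astype("float32")
y_train = rng.normal(size=(1024, 3)).astype("float32")

# Algorithm 1 defaults: alpha_r = 0.0001, beta_1 = 0.9, beta_2 = 0.999, m = 128.
regression_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4, beta_1=0.9, beta_2=0.999),
    loss="mse")

m = 128
for step in range(1000):  # stand-in for "while {theta_r} do not converge"
    idx = rng.integers(0, len(x_train), size=m)       # step 3: sample a batch
    x_b, f_b = x_train[idx], f_train[idx]             # step 4: matching features
    y_b = [y_train[idx, i:i + 1] for i in range(3)]   # one target per output head
    regression_model.train_on_batch([f_b, x_b], y_b)
```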
Success and failure are defined by comparing the CTS metrics of the clock trees generated by GAN-CTS to the one auto-generated by the commercial tool. Table VI summarizes the classification results in a confusion matrix for the NOVA benchmark. The accuracy and the F1-score are 0.947 and 0.952, respectively.

Fig. 10: Distributions of randomly generated vs. GAN-CTS generated clock trees for the AES benchmark.

VI. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a complete framework named GAN-CTS to solve the complicated CTS outcome optimization and input generation problems which all previous works leave under-addressed. The fact that the proposed framework demonstrates promising results on unseen benchmarks shows that it is general and practical.

In spite of the superior performance achieved, the proposed framework is currently dependent on the technology node. In the future, we aim to overcome this dependence.

VII. ACKNOWLEDGMENT

This work is supported by the National Science Foundation under Grant No. CNS 16-24731 and the industry members of the Center for Advanced Electronics in Machine Learning.

REFERENCES

[1] M. Ancona et al. Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, 2018.
[2] F. Chollet et al. Keras, 2015.
[3] A. Datli, U. Eksi, and G. Isik. A clock tree synthesis flow tailored for low power. https://www.design-reuse.com/articles/33873/clock-tree-synthesis-flow-tailored-for-low-power.html, 2013.
[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[7] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[8] A. B. Kahng, B. Lin, and S. Nath. Enhanced metamodeling techniques for high-dimensional IC design estimation problems. In Proceedings of the Conference on Design, Automation and Test in Europe, 2013.
[9] A. B. Kahng, B. Lin, and S. Nath. High-dimensional metamodeling for prediction of clock tree synthesis outcomes. In ACM/IEEE International Workshop on System Level Interconnect Prediction (SLIP). IEEE, 2013.
[10] A. B. Kahng and S. Mantik. A system for automatic recording and prediction of design quality metrics. In International Symposium on Quality Electronic Design (ISQED). IEEE, 2001.
[11] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[12] Y. Kwon, J. Jung, I. Han, and Y. Shin. Transient clock power estimation of pre-CTS netlist. In IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2018.
[13] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2008.
[14] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[15] A. Shrikumar, P. Greenside, A. Shcherbina, and A. Kundaje. Not just a black box: Learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713, 2016.
[16] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 2000.
[17] K. Wang, C. Gou, Y. Duan, Y. Lin, X. Zheng, and F.-Y. Wang. Generative adversarial networks: Introduction and outlook. IEEE/CAA Journal of Automatica Sinica, 4(4):588–598, 2017.
[18] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 1992.
[19] Z. Xie, Y.-H. Huang, G.-Q. Fang, H. Ren, S.-Y. Fang, Y. Chen, and J. Hu. RouteNet: Routability prediction for mixed-size designs using convolutional neural network. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2018.
[20] B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.