VLSI AND HARDWARE
IMPLEMENTATIONS
USING MODERN
MACHINE LEARNING
METHODS
Edited by
Sandeep Saini, Kusum Lata, and
G.R. Sinha
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2022 selection and editorial matter, Sandeep Saini, Kusum Lata and G.R. Sinha;
individual chapters, the contributors
First edition published by CRC Press 2022
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author
and publisher cannot assume responsibility for the validity of all materials or the
consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not
been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted,
reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying, microfilming, and recording, or in any
information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please
contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks
and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Names: Saini, Sandeep, editor. | Lata, Kusum (Electronics engineer), editor. | Sinha, G. R., 1975-
editor.
Title: VLSI and hardware implementations using modern machine learning methods / edited by
Sandeep Saini, Kusum Lata and G.R. Sinha.
Description: First edition. | Boca Raton, FL : CRC Press, 2022. | Includes bibliographical
references and index. |
Summary: “Machine learning is a potential solution to resolve bottleneck issues in VLSI via
optimizing tasks in the design process. This book aims to provide the latest machine learning
based methods, algorithms, architectures, and frameworks designed for VLSI design. Focus is on
digital, analog, and mixed-signal design techniques, device modeling, physical design, hardware
implementation, testability, reconfigurable design, synthesis and verification, and related areas. It
contains chapters on case studies as well as novel research ideas in the given field. Overall, the
book provides practical implementations of VLSI design, IC design and hardware realization
using machine learning techniques”‐‐ Provided by publisher.
Identifiers: LCCN 2021037483 (print) | LCCN 2021037484 (ebook) | ISBN 9781032061719
(hbk) | ISBN 9781032061726 (pbk) | ISBN 9781003201038 (ebk)
Subjects: LCSH: Integrated circuits‐‐Very large scale integration‐‐Design and construction‐‐Data
processing. | Machine learning.
Classification: LCC TK7874.75 .V564 2022 (print) | LCC TK7874.75 (ebook) | DDC 006.3/
1‐‐dc23/eng/20211014
LC record available at https://lccn.loc.gov/2021037483
LC ebook record available at https://lccn.loc.gov/2021037484
Typeset in Times
by MPS Limited, Dehradun
Contents
Preface......................................................................................................................vii
About the Editors .....................................................................................................ix
Contributors ............................................................................................................xiii
Chapter 6 Online Test Derived from Binary Neural Network for Critical
Autonomous Automotive Hardware .................................................97
Dr. Philemon Daniel
Index......................................................................................................................305
Preface
VLSI is a well-established field of research that ignited the modern computing revolution.
Moore’s law set the direction of the field for several decades, steadily shrinking feature
sizes and increasing the speed of each new generation of circuits, so that computing
hardware keeps becoming more compact and faster. Designing and manufacturing chips
at and below the 10 nm and 7 nm nodes is very challenging; thus, alternative routes to
higher-performance IC design are being explored. Machine learning is emerging as one
of the potential solutions to resolve some of the bottleneck issues in VLSI as well.
Machine-learning–based architectures and models help optimize or decide tasks in the
design process, at various stages of system design, to achieve the design goals. In this
book, we provide a compilation of the latest machine-learning–based methods for the
VLSI design domain.
The computing era is making a transition from conventional computing to cognitive
computing. Artificial intelligence and its subset, machine-learning–based approaches,
are providing solutions for all future technologies and filling the gap to move towards
the cognitive era. In verification and fabrication as well, the time taken to process a
whole system at 10 nm and 7 nm is very large. Machine-learning–based methods help
reduce characterization time by weeks and also reduce the required resources.
Thus, machine-learning–based approaches will be extensively used in the future in this
field, and this book provides future paths for such developments.
This book aims to provide the latest machine-learning–based methods, algorithms,
architectures, and frameworks designed for VLSI design and implementations of
hardware. The scope of the book includes machine-learning–based methods and models
for digital, analog, and mixed-signal design techniques; device modeling; physical
design; hardware implementation; IC testing; the manufacturing process; reconfigurable
design; FPGA-based systems; machine-learning–based IoT; VLSI implementation of
ANN; machine-learning–based image processing systems; synthesis and verification;
and related areas. The book contains chapters on case studies as well as novel research
ideas in the given field.
About the Editors
Sandeep Saini received his B.Tech. degree in electronics and
communication engineering from the International Institute
of Information Technology, Hyderabad, India, in 2008. He
completed his M.S. from the same institute in 2010. He earned
his Ph.D. from Malaviya National Institute of Technology,
Jaipur, in 2020.
He has been working at the LNM Institute of Information Technology,
Jaipur, as an Assistant Professor since 2011. He worked as adjunct
faculty at the International Institute of Information Technology (IIIT),
Bangalore (on deputation at the Myanmar Institute of Information
Technology, Mandalay, Myanmar) for two years, and as a Lecturer at Jaypee University of
Engineering and Technology, Guna, for three semesters. His research interests are in deep
learning, machine learning, natural language processing, cognitive modeling of language
learning models, and biomedical and agricultural applications of deep learning. He has
been a member of IEEE since 2009 and is an active member of ACM as well.
Niketa Sharma
Swami Keshwanand Institute of Technology
Jaipur, India
1 VLSI and Hardware
Implementation Using
Machine Learning
Methods: A Systematic
Literature Review
Kusum Lata¹, Sandeep Saini¹, and G. R. Sinha²
¹Department of Electronics and Communication Engineering, The LNM Institute of Information Technology, Jaipur, India
²Department of Electronics and Communication Engineering, Myanmar Institute of Information Technology, Mandalay, Myanmar
CONTENTS
1.1 Introduction....................................................................................................... 2
1.2 Motivation......................................................................................................... 2
1.3 Contributions .................................................................................................... 3
1.4 Literature Review ............................................................................................. 3
1.5 Methods ............................................................................................................ 5
1.5.1 Search Strategy..................................................................................... 5
1.5.2 Inclusion and Exclusion Rules............................................................. 6
1.5.3 Data Extraction Strategy ...................................................................... 6
1.5.4 Synthesis of Extracted Data................................................................. 7
1.5.5 Results and Discussions ....................................................................... 7
1.5.6 Study Overview.................................................................................... 7
1.6 Hardware Implementation of ML/AI Algorithms ........................................... 8
1.6.1 FPGA-Based Implementation .............................................................. 9
1.6.2 GPU-Based Implementation...............................................................10
1.6.3 ASICs-Based Implementations ..........................................................10
1.6.4 Other Implementations .......................................................................11
1.6.5 SLR Discussions and Recommendations ..........................................12
1.7 Conclusions..................................................................................................... 13
References................................................................................................................14
DOI: 10.1201/9781003201038-1
1.1 INTRODUCTION
Machine learning (ML) technologies have gained a lot of traction over the last decade
as a result of advances in computer-system design technology with respect to design
constraints such as functionality, power consumption, and area. To obtain better
outcomes than conventional methods, a wide range of applications began to use ML
algorithms. There are many applications, including image processing in health systems
such as cancer detection [1], image classification [2], banking and risk management [3],
healthcare and clinical note analysis [4], managing the energy efficiency of the public
sector towards smart cities [5], and automatic database management systems [6].
Integrated vision systems provide hassle-free integration for various industries, such as
full/semi-autonomous vehicles [7,8], and many security [9] applications, including
cyber security [10].
In application domains such as autonomous and semi-autonomous vehicles, ML is
completely transforming the field [11]. ML is also used in location-based services
(LBS), such as global positioning system (GPS)-based vehicle navigation. Individual
users can also utilize LBS to get information on nearby live entertainment and for
intelligent path navigation [12]. There are numerous applications where accurate,
precisely calculated values that improve efficiency can also make the drastic difference
between life and death. ML is at the heart of many such industrial applications. This
pattern demonstrates that ML technologies continue to pique people's interest and hold
a lot of promise. Every electrical or embedded system is becoming more and more
reliant on these technologies. The showstopper for this new paradigm is that ML
algorithms are power hungry when it comes to their hardware implementation. Because
of this, many improvements can be seen in the development of hardware accelerators
that provide the essential computing power to such applications.
By reducing the power consumed for boosted performance and lowering the required
design resources, intelligently tailored and optimized hardware implementations can
significantly lower overall system design costs [13]. Field-programmable gate arrays
(FPGAs), graphics processing units (GPUs), and application-specific integrated circuits
(ASICs) are being considered for hardware implementation, each with its own set of
advantages and disadvantages. In comparison to regular central processing units
(CPUs), these hardware accelerators take advantage of parallelism to boost throughput
and deliver substantially greater performance [14].
We give a complete survey that includes a systematic literature review (SLR) to
aid academics working on ML hardware acceleration. The survey includes hardware
implementation research for ML algorithms from 2011 through 2020. We collected a
total of 150 different research papers, out of which 113 papers addressed hardware
implementation of ML algorithms. We adopted narrow exclusion criteria for this work
to consider as many papers as possible in order to cover a wide perspective on the
issue at hand.
1.2 MOTIVATION
ML algorithms have been implemented using a variety of design strategies for a variety
of purposes. General-purpose processors (GPPs), ASICs, and scalable/reconfigurable
1.3 CONTRIBUTIONS
The SLR focuses mainly on the different types of hardware implementations of widely
used ML methods for various applications. Kitchenham and Charters' [15] methodology
is followed for the SLR, as illustrated in Figure 1.1.
The main objective of this review article is to provide answers to a number of
research questions involving ML hardware accelerators, which include the following:
Q1. What are the most common applications for which ML methods are used in
hardware implementation?
Q2. Which ML algorithms and techniques produced between 2011 and 2020 are most
frequently used for hardware implementation?
Q3. What are the advantages and disadvantages of employing multiple hardware
platforms for ML acceleration?
We also provide a comprehensive review and comparative study that analyzes the existing
work that has been done in the last decade from GPU, FPGA, and ASIC perspectives.
This work complements the existing SLR surveys reported in the literature and
contributes towards providing the complete background of hardware implementations
of ML algorithms.
1.5 METHODS
A systematic literature study requires that the technique outlined below be strictly
followed. This section describes the main components of the review process, starting
with the data sources considered and the search strategy, followed by the
inclusion/exclusion criteria and the search method. Data screening methodologies and
data synthesis are described afterwards.
• Google Scholar
• ACM Digital Library
• IEEE Xplore Digital Library
• Springer Digital Library
• Elsevier Digital Library
To perform the related searches, we used various keywords, mostly related to ML
algorithms and hardware implementation approaches for them:
All searches were carried out over title, abstract, and keywords in conjunction. If this
combination was not possible, we used the title alone. The searches covered the period
from January 2011 to December 2020; hence, articles published on or after January 1,
2021, were out of scope. Any article with a later publication date was included
manually and used for reference purposes only. Our search was limited to VLSI and
hardware implementations of various ML algorithms, and it was further limited to
journal and conference papers only.
Step 2. To avoid any irrelevant documents, follow the inclusion and exclusion
criteria.
Step 4. Using the reference lists of the collected papers, look for any new similar
publications and repeat the process.
The following principles were used for the inclusion and exclusion of research work
available in the literature (a small filtering sketch follows the list):
• Relevant Work: Include only reported literature works that are relevant to
our designed queries (Q1–Q3). Exclude all studies that do not satisfy any of
these predefined categories.
• Most Recent (2011–2020): To make sure only recent studies are in our
chapter, choose studies reported from 2011 to 2020. Exclude all other studies
that are published before 2011.
• Publishing Organizations: Select only reported works that are present in any
of these five scientific databases, i.e. IEEE, SPRINGER, ELSEVIER, ACM,
and Google Scholar.
• VLSI and Hardware Implementation-based Results: Select only works that
provide VLSI and hardware implementation details of ML algorithms for any
application domain. Hardware implementation could be based on GPUs, FPGAs,
and ASIC or complementary metal-oxide-semiconductor (CMOS)-based designs.
• Publications Format: Select only journal and conference papers that discuss
the VLSI or hardware implementation of ML algorithms using FPGAs,
GPUs, CPUs, or ASICs.
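As a small illustration of these rules, the sketch below applies the year, database, publication-type, and hardware-detail filters to a list of candidate records. The record fields and the sample list are hypothetical; the chapter does not describe any scripted filtering.

    # Hypothetical record fields; illustration of the inclusion rules only.
    ALLOWED_SOURCES = {"IEEE", "SPRINGER", "ELSEVIER", "ACM", "Google Scholar"}
    ALLOWED_TYPES = {"journal", "conference"}

    def include(paper):
        """Return True if a candidate paper satisfies the SLR inclusion rules."""
        return (
            2011 <= paper["year"] <= 2020           # most recent (2011-2020)
            and paper["source"] in ALLOWED_SOURCES  # publishing organizations
            and paper["type"] in ALLOWED_TYPES      # journals/conferences only
            and paper["hardware_details"]           # reports hardware implementation
        )

    papers = [
        {"year": 2017, "source": "IEEE", "type": "conference", "hardware_details": True},
        {"year": 2009, "source": "ACM", "type": "journal", "hardware_details": True},
    ]
    selected = [p for p in papers if include(p)]    # keeps only the 2017 paper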
difficult. For example, some studies [27–29] discussed compiling software into a
hardware description language that could be implemented on FPGAs, and other studies
presented automated design methodologies for implementing these algorithms on
FPGAs. It is also evident that not all of the papers address all three research questions
set for this systematic review.
computer vision are evolving rapidly. Therefore, various tools and methodologies are
being developed to broaden the applicability of ML/AI, specifically deep learning,
across multiple fields [36–39]. GPU-based accelerators also speed up the execution of
ML/AI algorithms by factors in the hundreds. GPUs are intended for high-performance
scalar and parallel computation, which enhances their applicability in a more
meaningful manner [40].
Another platform used to implement ML/AI algorithms is the ASIC. Since these chips
are designed for a specific purpose, many companies have started designing their own
AI chips for various applications [41], and a lot of research is in progress in this
direction. It is also claimed that specialized AI chips will perform better than CPUs and
GPUs [42].
Recently, researchers have started implementing ML/AI algorithms on small
single-board computers, e.g. the Raspberry Pi, for many applications [43–45]. Because
of their tiny size, low cost, and low power needs, single-board computers are a popular
choice for AI applications.
• FPGA-based implementation
• GPU-based implementation
• ASIC-based implementation
• Other implementation
Some accelerators combine FPGAs with CPUs to take advantage of both architectures.
Moss et al. [80] proposed an accelerator designed so that the most computationally
intensive part of a binarized neural network (BNN) is implemented on the FPGA,
whereas the rest of the implementation runs on a Xeon CPU. It is also claimed that the
designed FPGA accelerator, together with the Xeon, gives better performance and
energy efficiency than a high-end GPU. Chi Zhang et al. [63] proposed a
CPU-FPGA–based CNN accelerator, in which the frequency-domain algorithms were
implemented on the FPGA and a data layout in shared memory provided the essential
communication of data between the FPGA and the CPU. On the other hand, to design a
hardware accelerator for GCNs based on CPU-FPGA, Zeng and Prasanna [81] proposed
a heterogeneous platform in which pre-processing is performed on the CPU and the rest
of the computation on the FPGA. They also claimed an order-of-magnitude training
speedup, with almost no accuracy loss, compared with existing implementations on
multi-core platforms.
Other types of heterogeneous architecture that combine ASICs with FPGAs were
proposed by Nurvitadhi et al. [82] for persistent recurrent neural networks (RNNs).
Nurvitadhi et al. also discussed integrating an FPGA with the TensorRAM ASIC chiplet
to boost memory capacity and bandwidth and to provide the throughput needed to match
that bandwidth. Nurvitadhi et al. [83] focused on the deep learning (DL) domain with
efficient tensor matrix/vector operations by proposing TensorTile ASICs for Stratix 10.
A few researchers have presented hardware accelerators for ML/AI algorithms on
FPGA-embedded processors, such as NIOS-II by Santos et al. [84], where the
accelerator for a pre-trained feedforward artificial neural network is built around the
NIOS II processor. Jin Hee Kim et al. [85] presented a synthesized accelerator that uses
the Intel Arria 10 SoC FPGA with an embedded ARM processor. Heekyung Kim et al.
presented an FPGA-SoC design for CNN-based hardware implementation with respect
to power consumption; the authors applied the proposed low-power scheme to the
Processing System (PS) and Programmable Logic (PL) architecture of the design to
reduce power consumption effectively. Angelos Kyriakos et al. [86] focused on
designing CNN accelerators based on the Myriad2 and FPGA architectures.
FPGAs allow designers to choose any precision appropriate for the intended CNN
implementation. FPGA mappings, on the other hand, often rely on DSP blocks for
maximum efficiency, and only fixed-precision DSP blocks are available. As a result,
employing fixed-point quantization can boost speed while also conserving hardware
resources [66,87].
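To make the idea concrete, the sketch below quantizes floating-point weights to a signed fixed-point representation; the bit widths and sample values are illustrative assumptions, not taken from the cited designs.

    import numpy as np

    def to_fixed_point(x, total_bits=8, frac_bits=6):
        """Quantize floats to signed fixed-point with the given bit widths."""
        scale = 2 ** frac_bits
        lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
        q = np.clip(np.round(x * scale), lo, hi)    # integer code words
        return q / scale                            # values the hardware computes with

    w = np.array([0.731, -0.402, 0.055])
    print(to_fixed_point(w))            # 8-bit: [ 0.734375 -0.40625   0.0625  ]
    print(to_fixed_point(w, 16, 14))    # 16-bit values track the originals closely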
A binarized CNN (BCNN), with weights and activations constrained to two values
(+1/−1), has recently been presented. Even against well-optimized software on CPUs
and GPUs, BCNNs can deliver large gains in performance and power efficiency.
Several research publications have focused on using FPGAs to accelerate these BCNNs
[53,54,57,60,88]. Dong Nguyen et al. [66] showed that CNNs are resilient down to
8-bit precision and proposed a double MAC that can significantly increase the
computational throughput of a CNN layer. As a result, some implementations used
8-bit fixed point, which outperformed the 16-bit fixed-point implementation by more
than 50% [89].
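Part of why binarization maps so well to FPGAs is that a +1/−1 dot product reduces to XNOR and popcount operations. The sketch below shows this reduction on bit-packed vectors; it is a generic illustration of the trick, not code from any of the cited accelerators.

    def bin_dot(a_bits, w_bits, n):
        """Dot product of two +1/-1 vectors packed as n-bit integers (1 -> +1, 0 -> -1)."""
        xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # bit is 1 where the signs agree
        matches = bin(xnor).count("1")              # popcount
        return 2 * matches - n                      # agreements minus disagreements

    # a = [+1, -1, +1], w = [+1, +1, -1]
    print(bin_dot(0b101, 0b110, 3))  # -1 == (+1)(+1) + (-1)(+1) + (+1)(-1)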
Some researchers have presented ML algorithm implementations on FPGAs [90–93].
Jokic [93] presented an FPGA-based streaming camera that classifies regions of interest
(ROIs) with a BNN in real time; the results show energy savings of three times
compared with external image processing.
TABLE 1.1
Summary of Work Done on GPU-Based Implementations of ML/AI Algorithms
References Algorithms Implementations
and 840.2 GSOP/s with an energy efficiency of 781.52 pJ/pixel and 0.35 pJ/SOP.
Chuang et al. [109] implemented a low-power, low-cost BW-SNN ASIC in 90-nm
CMOS technology for image classification. The designed ASIC was demonstrated in
real time for bottled-drink recognition and provided very good accuracy and efficiency.
detection up to a camera height of 5 meters from the ground. Lage et al. [113]
presented a low-cost IoT surveillance system designed and tested on Raspberry Pi and
Up Squared Board devices. In addition to person detection with an average precision of
0.48, the designed system was also tested for the benefits of hardware acceleration with
an Intel Movidius Neural Compute Stick (NCS). Zhang et al. [112] presented a
hardware accelerator for YOLO that uses the open-source RISC-V core ROCKET as its
controller; the hardware, designed for a real-time object detection system, was verified
on a Xilinx Virtex-7 FPGA VC709.
TABLE 1.2
Comparisons of ML/AI Algorithms’ Implementation Approaches
ASIC FPGA GPU
TABLE 1.3
Research Question Findings from the Survey

Q1. What are the most common applications for which ML/AI methods are used in hardware implementation?
Findings: Major applications are related to image processing realized through hardware implementations of ML/AI algorithms. Robotics, autonomous/semi-autonomous vehicles, navigation systems, and cyber security are other applications where ML/AI are implemented at the hardware level and used.

Q2. Which ML/AI algorithms and techniques produced between 2011 and 2020 are most frequently used for hardware implementation?
Findings: Algorithms and techniques include the Haar classifier, SURF, SVM, AdaBoost, computer vision algorithms, ANN, CNN, DNN, spiking neural networks, ICNN, YOLO, SLAM, BCNN, LSTM, YOLOv3, FP-DNN, etc.

Q3. What are the advantages and disadvantages of employing multiple hardware platforms for ML/AI acceleration?
Findings: Each type of hardware platform used has its own advantages and disadvantages; these are given in detail in Table 1.2.
Hardware accelerators, whether based on FPGAs, ASICs, or GPUs, are becoming
increasingly popular, as is noticeable in the deployment of AI and ML algorithms. The
superiority of these algorithms demands extremely high computing power and memory
use, which hardware accelerators can provide. AI and ML are critical technologies that
support daily social life, economic operations, and even medical applications. As these
algorithms continue to develop, new applications with larger resource demands will
emerge. Implementing these algorithms in hardware will result in faster, more efficient,
and more accurate AI processing, which will benefit all industries. Finally, Table 1.3
summarizes the research study's main findings.
1.7 CONCLUSIONS
In this chapter, we presented an SLR study of VLSI and hardware implementations of
ML/AI algorithms over the period between 2011 and 2020. The main objective of this
SLR was to answer three research questions that address the survey from application
and implementation points of view. Only journal and conference papers from the
relevant search databases were considered in this chapter, and only research papers that
focused on the VLSI and hardware implementations of ML/AI algorithms were
selected. The hardware implementations selected for this study were based on FPGAs,
ASICs, and GPUs. All papers were selected in such a manner that they help answer our
research questions. The work presented in this chapter gives a thorough analysis of the
different types of implementation platforms and approaches used in the last decade.
This chapter should help readers understand the various approaches reported for
implementing ML/AI algorithms using ASICs, FPGAs, and GPUs, and it also gives a
good understanding of hardware selection in terms of development time, performance
results, and cost.
REFERENCES
[1] G. Meenalochini and S. Ramkumar, “Survey of machine learning algorithms for
breast cancer detection using mammogram images,” Mater. Today Proc., vol. 37,
pp. 2738–2743, 2021, doi: 10.1016/j.matpr.2020.08.543.
[2] P. Wang, E. Fan, and P. Wang, “Comparative analysis of image classification
algorithms based on traditional machine learning and deep learning,” Pattern
Recognit. Lett., vol. 141, pp. 61–67, 2021, doi: 10.1016/j.patrec.2020.07.042.
[3] M. Leo, S. Sharma, and K. Maddulety, “Machine learning in banking risk
management: A literature review,” Risks, vol. 7, no. 1, 2019, doi: 10.3390/risks7010029.
[4] A. Mustafa and M. Rahimi Azghadi, “Automated machine learning for healthcare and
clinical notes analysis,” Computers, vol. 10, no. 2, 2021, doi: 10.3390/computers10020024.
[5] M. Zekić-Sušac, S. Mitrović, and A. Has, “Machine learning based system for
managing energy efficiency of public sector as an approach towards smart cities,” Int.
J. Inf. Manage., vol. 58, p. 102074, 2021, doi: 10.1016/j.ijinfomgt.2020.102074.
[6] D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang, “Automatic database
management system tuning through large-scale machine learning,” in Proceedings of
the 2017 ACM International Conference on Management of Data, 2017,
pp. 1009–1024, doi: 10.1145/3035918.3064029.
[7] S. Baee, E. Pakdamanian, V. Ordonez, I. Kim, L. Feng, and L. Barnes, “EyeCar:
Modeling the visual attention allocation of drivers in semi-autonomous vehicles,”
arXiv Prepr. arXiv1912.07773, 2019.
[8] A. P. Sligar, “Machine learning-based radar perception for autonomous vehicles
using full physics simulation,” IEEE Access, vol. 8, pp. 51470–51476, 2020, doi:
10.1109/ACCESS.2020.2977922.
[9] A. L. Buczak and E. Guven, “A survey of data mining and machine learning
methods for cyber security intrusion detection,” IEEE Commun. Surv. Tutorials,
vol. 18, no. 2, pp. 1153–1176, 2016, doi: 10.1109/COMST.2015.2494502.
[10] K. Shaukat, S. Luo, V. Varadharajan, I. A. Hameed, and M. Xu, “A survey on
machine learning techniques for cyber security in the last decade,” IEEE Access,
vol. 8, pp. 222310–222354, 2020.
[11] J. Stilgoe, “Machine learning, social learning and the governance of self-driving
cars,” Soc. Stud. Sci., vol. 48, no. 1, pp. 25–56, 2018.
[12] Z. Li, K. Xu, H. Wang, Y. Zhao, X. Wang, and M. Shen, “Machine-learning-based
positioning: A survey and future directions,” IEEE Netw., vol. 33, no. 3,
pp. 96–101, 2019, doi: 10.1109/MNET.2019.1800366.
[13] J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two
decades of progress,” Neurocomputing, vol. 74, no. 1–3, pp. 239–255, 2010.
[14] T. Baji, “Evolution of the GPU device widely used in AI and massive parallel
processing,” in Proceedings of the 2018 IEEE 2nd Electron Devices Technology
and Manufacturing Conference (EDTM), 2018, pp. 7–9.
[32] E. Nurvitadhi et al., “Can fpgas beat gpus in accelerating next-generation deep
neural networks?,” in Proceedings of the 2017 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, 2017, pp. 5–14.
[33] T. Ben-Nun and T. Hoefler, “Demystifying parallel and distributed deep learning: An
in-depth concurrency analysis,” ACM Comput. Surv., vol. 52, no. 4, pp. 1–43, 2019.
[34] Y. LeCun, “1.1 Deep learning hardware: Past, present, and future,” in Proceedings
of the 2019 IEEE International Solid- State Circuits Conference - (ISSCC), 2019,
pp. 12–19, doi: 10.1109/ISSCC.2019.8662396.
[35] Y. Chen, B. Zheng, Z. Zhang, Q. Wang, C. Shen, and Q. Zhang, “Deep learning on
mobile and embedded devices: State-of-the-art, challenges, and future directions,”
ACM Comput. Surv., vol. 53, no. 4, pp. 1–37, 2020.
[36] J. Lemley, S. Bazrafkan, and P. Corcoran, “Deep learning for consumer devices and
services: Pushing the limits for machine learning, artificial intelligence, and computer
vision,” IEEE Consum. Electron. Mag., vol. 6, no. 2, pp. 48–56, 2017, doi:
10.1109/MCE.2016.2640698.
[37] T. Gong, T. Fan, J. Guo, and Z. Cai, “GPU-based parallel optimization of immune
convolutional neural network and embedded system,” Eng. Appl. Artif. Intell., vol.
62, pp. 384–395, 2017.
[38] L. N. Huynh, Y. Lee, and R. K. Balan, “Deepmon: Mobile gpu-based deep learning
framework for continuous vision applications,” in Proceedings of the 15th Annual
International Conference on Mobile Systems, Applications, and Services, 2017,
pp. 82–95.
[39] M. A. Raihan, N. Goli, and T. M. Aamodt, “Modeling deep learning accelerator
enabled GPUs,” in Proceedings of the 2019 IEEE International Symposium on
Performance Analysis of Systems and Software (ISPASS), 2019, pp. 79–92, doi:
10.1109/ISPASS.2019.00016.
[40] N. Singh and S. P. Panda, “Enhancing the proficiency of artificial neural network on
prediction with GPU,” in Proceedings of the 2019 International Conference on
Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), 2019,
pp. 67–71, doi: 10.1109/COMITCon.2019.8862440.
[41] D. Monroe, “Chips for artificial intelligence,” Commun. ACM, vol. 61, no. 4,
pp. 15–17, 2018.
[42] S. Greengard, “Making chips smarter,” Commun. ACM, vol. 60, no. 5, pp. 13–15, 2017.
[43] D. K. Dewangan and S. P. Sahu, “Deep learning-based speed bump detection model
for intelligent vehicle system using raspberry Pi,” IEEE Sens. J., vol. 21, no. 3,
pp. 3570–3578, 2020.
[44] A. A. S. Zen, B. Duman, and B. Sen, “Benchmark analysis of Jetson TX2, Jetson
Nano and Raspberry PI using deep-CNN,” in Proceedings of the 2020 International
Congress on Human-Computer Interaction, Optimization and Robotic Applications
(HORA), 2020, pp. 1–5, doi: 10.1109/HORA49412.2020.9152915.
[45] H. A. Shiddieqy, F. I. Hariadi, and T. Adiono, “Implementation of deep-learning
based image classification on single board computer,” in Proceedings of the 2017
International Symposium on Electronics and Smart Devices (ISESD), 2017,
pp. 133–137, doi: 10.1109/ISESD.2017.8253319.
[46] M. Motamedi, P. Gysel, V. Akella, and S. Ghiasi, “Design space exploration of FPGA-
based deep convolutional neural networks,” in Proceedings of the 2016 21st Asia and
South Pacific Design Automation Conference (ASP-DAC), 2016, pp. 575–580.
[47] A. Rahman, J. Lee, and K. Choi, “Efficient FPGA acceleration of convolutional
neural networks using logical-3D compute array,” in Proceedings of the 2016
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016,
pp. 1393–1398.
[48] Q. Xiao, Y. Liang, L. Lu, S. Yan, and Y.-W. Tai, “Exploring heterogeneous
algorithms for accelerating deep convolutional neural networks on FPGAs,” in
Proceedings of the 54th Annual Design Automation Conference 2017, 2017, pp. 1–6.
[49] Y. Ma, Y. Cao, S. Vrudhula, and J. Seo, “Optimizing loop operation and dataflow in
FPGA acceleration of deep convolutional neural networks,” in Proceedings of the
2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays,
2017, pp. 45–54.
[50] S. I. Venieris and C.-S. Bouganis, “Latency-driven design for FPGA-based
convolutional neural networks,” in Proceedings of the 2017 27th International
Conference on Field Programmable Logic and Applications (FPL), 2017, pp. 1–8,
doi: 10.23919/FPL.2017.8056828.
[51] F. Sun et al., “A high-performance accelerator for large-scale convolutional neural
networks,” in Proceedings of the 2017 IEEE International Symposium on Parallel
and Distributed Processing with Applications and 2017 IEEE International
Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017,
pp. 622–629.
[52] J. Zhang and J. Li, “Improving the performance of OpenCL-based FPGA
accelerator for convolutional neural network,” in Proceedings of the 2017 ACM/SIGDA
International Symposium on Field-Programmable Gate Arrays, 2017, pp. 25–34.
[53] H. Yonekawa and H. Nakahara, “On-chip memory based binarized convolutional
deep neural network applying batch normalization free technique on an FPGA,” in
Proceedings of the 2017 IEEE International Parallel and Distributed Processing
Symposium Workshops (IPDPSW), 2017, pp. 98–105.
[54] H. Nakahara, H. Yonekawa, and S. Sato, “An object detector based on multiscale
sliding window search using a fully pipelined binarized CNN on an FPGA,” in
Proceedings of the 2017 International Conference on Field Programmable
Technology (ICFPT), 2017, pp. 168–175.
[55] H. Nakahara, T. Fujii, and S. Sato, “A fully connected layer elimination for a
binarized convolutional neural network on an FPGA,” in Proceedings of the 2017
27th International Conference on Field Programmable Logic and Applications
(FPL), 2017, pp. 1–4.
[56] L.-W. Kim, “DeepX: Deep learning accelerator for restricted boltzmann machine
artificial neural networks,” IEEE Trans. Neural Networks Learn. Syst., vol. 29, no.
5, pp. 1441–1453, 2017.
[57] R. Zhao et al., “Accelerating binarized convolutional neural networks with software-
programmable fpgas,” in Proceedings of the 2017 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, 2017, pp. 15–24.
[58] U. Aydonat, S. O’Connell, D. Capalija, A. C. Ling, and G. R. Chiu, “An OpenCL
deep learning accelerator on Arria 10,” in Proceedings of the 2017 ACM/SIGDA
International Symposium on Field-Programmable Gate Arrays, 2017, pp. 55–64.
[59] X. Wei et al., “Automated systolic array architecture synthesis for high throughput
CNN inference on FPGAs,” in Proceedings of the 54th Annual Design Automation
Conference 2017, 2017, pp. 1–6.
[60] M. Shimoda, S. Sato, and H. Nakahara, “All binarized convolutional neural network
and its implementation on an FPGA,” in Proceedings of the 2017 International
Conference on Field Programmable Technology (ICFPT), 2017, pp. 291–294.
[61] A. X. M. Chang and E. Culurciello, “Hardware accelerators for recurrent neural
networks on FPGA,” in Proceedings of the 2017 IEEE International Symposium on
Circuits and Systems (ISCAS), 2017, pp. 1–4.
[62] J. Guo, S. Yin, P. Ouyang, L. Liu, and S. Wei, “Bit-width based resource
partitioning for CNN acceleration on FPGA,” in Proceedings of the 2017 IEEE 25th
[78] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “Performance modeling for CNN
inference accelerators on FPGA,” IEEE Trans. Comput. Des. Integr. Circuits Syst.,
vol. 39, no. 4, pp. 843–856, 2019.
[79] M. P. Véstias, R. P. Duarte, J. T. de Sousa, and H. C. Neto, “A fast and scalable
architecture to run convolutional neural networks in low density FPGAs,”
Microprocess. Microsyst., vol. 77, p. 103136, 2020.
[80] D. J. M. Moss et al., “High performance binary neural networks on the Xeon
+FPGA platform,” in Proceedings of the 2017 27th International Conference on
Field Programmable Logic and Applications (FPL), 2017, pp. 1–4, doi: 10.23919/
FPL.2017.8056823.
[81] H. Zeng and V. Prasanna, “GraphACT: Accelerating GCN training on CPU-FPGA
heterogeneous platforms,” in Proceedings of the 2020 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays, 2020, pp. 255–265.
[82] E. Nurvitadhi et al., “Why compete when you can work together: FPGA-ASIC
integration for persistent RNNs,” in Proceedings of the 2019 IEEE 27th Annual
International Symposium on Field-Programmable Custom Computing Machines
(FCCM), 2019, pp. 199–207, doi: 10.1109/FCCM.2019.00035.
[83] E. Nurvitadhi et al., “In-package domain-specific ASICs for Intel Stratix 10 FPGAs:
A case study of accelerating deep learning using TensorTile ASIC,” in Proceedings of the
2018 28th International Conference on Field Programmable Logic and
Applications (FPL), 2018, pp. 106–1064.
[84] P. Santos, D. Ouellet-Poulin, D. Shapiro, and M. Bolic, “Artificial neural network
acceleration on FPGA using custom instruction,” in Proceedings of the 2011 24th
Canadian Conference on Electrical and Computer Engineering(CCECE), 2011,
pp. 450–455, doi: 10.1109/CCECE.2011.6030491.
[85] J. H. Kim, B. Grady, R. Lian, J. Brothers, and J. H. Anderson, “FPGA-based CNN
inference accelerator synthesized from multi-threaded C software,” in Proceedings
of the 2017 30th IEEE International System-on-Chip Conference (SOCC), 2017,
pp. 268–273, doi: 10.1109/SOCC.2017.8226056.
[86] A. Kyriakos, E.-A. Papatheofanous, B. Charalampos, E. Petrongonas, D. Soudris, and
D. Reisis, “Design and performance comparison of CNN accelerators based on the Intel
Movidius Myriad2 SoC and FPGA embedded prototype,” in Proceedings of the 2019
International Conference on Control, Artificial Intelligence, Robotics Optimization
(ICCAIRO), 2019, pp. 142–147, doi: 10.1109/ICCAIRO47923.2019.00030.
[87] G. Feng, Z. Hu, S. Chen, and F. Wu, “Energy-efficient and high-throughput FPGA-
based accelerator for Convolutional Neural Networks,” in Proceedings of the 2016
13th IEEE International Conference on Solid-State and Integrated Circuit
Technology (ICSICT), 2016, pp. 624–626.
[88] Y. Yoshimoto, D. Shuto, and H. Tamukoh, “FPGA-enabled binarized convolutional
neural networks toward real-time embedded object recognition system for service
robots,” in Proceedings of the 2019 IEEE International Circuits and Systems
Symposium (ICSyS), 2019, pp. 1–5, doi: 10.1109/ICSyS47076.2019.8982469.
[89] K. Guo et al., “Angel-eye: A complete design flow for mapping cnn onto embedded
fpga,” IEEE Trans. Comput. Des. Integr. circuits Syst., vol. 37, no. 1, pp. 35–47, 2017.
[90] N. Paulino, J. C. Ferreira, and J. M. P. Cardoso, “Optimizing OpenCL code for
performance on FPGA: k-Means case study with integer data sets,” IEEE Access,
vol. 8, pp. 152286–152304, 2020, doi: 10.1109/ACCESS.2020.3017552.
[91] L. A. Dias, J. C. Ferreira, and M. A. C. Fernandes, “Parallel implementation of K-
means algorithm on FPGA,” IEEE Access, vol. 8, pp. 41071–41084, 2020, doi:
10.1109/ACCESS.2020.2976900.
[92] T. Aoki, E. Hosoya, T. Otsuka, and A. Onozawa, “A novel hardware algorithm for
real-time image recognition based on real AdaBoost classification,” in Proceedings
of the 2012 IEEE International Symposium on Circuits and Systems (ISCAS), 2012,
pp. 1119–1122, doi: 10.1109/ISCAS.2012.6271427.
[93] P. Jokic, S. Emery, and L. Benini, “BinaryEye: A 20 kfps streaming camera system
on FPGA with real-time on-device image recognition using binary neural networks,”
in Proceedings of the 2018 IEEE 13th International Symposium on Industrial
Embedded Systems (SIES), 2018, pp. 1–7, doi: 10.1109/SIES.2018.8442108.
[94] B. Ramesh, E. Shea, and A. D. George, “Investigation of multicore SoCs for on-
board feature detection and segmentation of images,” in Proceedings of the
NAECON 2018 - IEEE National Aerospace and Electronics Conference, 2018,
pp. 375–381, doi: 10.1109/NAECON.2018.8556637.
[95] S. Prabhu, V. Khopkar, S. Nivendkar, O. Satpute, and V. Jyotinagar, “Object detection
and classification using GPU acceleration,” in Proceedings of the International
Conference On Computational Vision and Bio Inspired Computing, 2019, pp. 161–170.
[96] C. Oh, S. Yi, and Y. Yi, “Real-time face detection in Full HD images exploiting both
embedded CPU and GPU,” in Proceedings of the 2015 IEEE International Conference
on Multimedia and Expo (ICME), 2015, pp. 1–6, doi: 10.1109/ICME.2015.7177522.
[97] V. Mutneja and S. Singh, “GPU accelerated face detection from low resolution
surveillance videos using motion and skin color segmentation,” Optik (Stuttg)., vol.
157, pp. 1155–1165, 2018.
[98] M. R. Ikbal, M. Fayez, M. M. Fouad, and I. Katib, “Fast implementation of face
detection using LPB classifier on GPGPUs,” in Proceedings of the Intelligent
Computing-Proceedings of the Computing Conference, 2019, pp. 1036–1047.
[99] Y. Lee, C. Jang, and H. Kim, “Accelerating a computer vision algorithm on a
mobile SoC using CPU-GPU co-processing: A case study on face detection,” in
Proceedings of the International Conference on Mobile Software Engineering and
Systems, 2016, pp. 70–76.
[100] A. Yazdanbakhsh, J. Park, H. Sharma, P. Lotfi-Kamran, and H. Esmaeilzadeh,
“Neural acceleration for gpu throughput processors,” in Proceedings of the 48th
international symposium on microarchitecture, 2015, pp. 482–493.
[101] X. Zhang, N. Gu, and H. Ye, “Multi-gpu based recurrent neural network language
model training,” in Proceedings of the International Conference of Pioneering
Computer Scientists, Engineers and Educators, 2016, pp. 484–493.
[102] A. Prakash, N. Ramakrishnan, K. Garg, and T. Srikanthan, “Accelerating computer
vision algorithms on heterogeneous edge computing platforms,” in Proceedings of
the 2020 IEEE Workshop on Signal Processing Systems (SiPS), 2020, pp. 1–6, doi:
10.1109/SiPS50750.2020.9195221.
[103] F. Conti, D. Rossi, A. Pullini, I. Loi, and L. Benini, “PULP: A ultra-low power
parallel accelerator for energy-efficient and flexible embedded vision,” J. Signal
Process. Syst., vol. 84, no. 3, pp. 339–354, 2016.
[104] D. Jeon et al., “A 23-mW face recognition processor with mostly-read 5T memory
in 40-nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, no. 6, pp. 1628–1642,
2017, doi: 10.1109/JSSC.2017.2661838.
[105] N. Zheng and P. Mazumder, “A low-power hardware architecture for on-line
supervised learning in multi-layer spiking neural networks,” in Proceedings of the
2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018,
pp. 1–5, doi: 10.1109/ISCAS.2018.8351516.
[106] G. K. Chen, R. Kumar, H. E. Sumbul, P. C. Knag, and R. K. Krishnamurthy, “A
4096-Neuron 1M-Synapse 3.8-pJ/SOP spiking neural network with on-chip STDP
learning and sparse weights in 10-nm FinFET CMOS,” IEEE J. Solid-State
Circuits, vol. 54, no. 4, pp. 992–1002, 2019, doi: 10.1109/JSSC.2018.2884901.
[107] C. Frenkel, J.-D. Legat, and D. Bol, “MorphIC: A 65-nm 738k-synapse/mm²
quad-core binary-weight digital neuromorphic processor with stochastic spike-driven
online learning,” IEEE Trans. Biomed. Circuits Syst., vol. 13, no. 5,
pp. 999–1010, 2019, doi: 10.1109/TBCAS.2019.2928793.
[108] H. Kim, H. Tang, W. Choi, and J. Park, “An energy-quality scalable stdp based sparse
coding processor with on-chip learning capability,” IEEE Trans. Biomed. Circuits
Syst., vol. 14, no. 1, pp. 125–137, 2020, doi: 10.1109/TBCAS.2019.2963676.
[109] P.-Y. Chuang, P.-Y. Tan, C.-W. Wu, and J.-M. Lu, “A 90nm 103.14 TOPS/W
binary-weight spiking neural network CMOS ASIC for real-time object
classification,” in Proceedings of the 2020 57th ACM/IEEE Design Automation
Conference (DAC), 2020, pp. 1–6, doi: 10.1109/DAC18072.2020.9218714.
[110] G. S. Nagpal, G. Singh, J. Singh, and N. Yadav, “Facial detection and recognition
using OpenCV on Raspberry Pi Zero,” in Proceedings of the 2018 International
Conference on Advances in Computing, Communication Control and Networking
(ICACCCN), 2018, pp. 945–950, doi: 10.1109/ICACCCN.2018.8748389.
[111] H. Daryanavard and A. Harifi, “Implementing face detection system on UAV using
Raspberry Pi platform,” in Proceedings of the Iranian Conference on Electrical
Engineering (ICEE), 2018, pp. 1720–1723, doi: 10.1109/ICEE.2018.8472476.
[112] G. Zhang, K. Zhao, B. Wu, Y. Sun, L. Sun, and F. Liang, “A RISC-V based
hardware accelerator designed for Yolo object detection system,” in Proceedings of
the 2019 IEEE International Conference of Intelligent Applied Systems on
Engineering (ICIASE), 2019, pp. 9–11, doi: 10.1109/ICIASE45644.2019.9074051.
[113] E. S. Lage, R. L. Santos, S. M. T. Junior, and F. Andreotti, “Low-cost IoT
surveillance system using hardware-acceleration and convolutional neural networks,”
in Proceedings of the 2019 IEEE 5th World Forum on Internet of Things (WF-IoT),
2019, pp. 931–936, doi: 10.1109/WF-IoT.2019.8767325.
2 Machine Learning for
Testing of VLSI Circuit
Abhishek Choubey and Shruti Bhargava Choubey
Sreenidhi Institute of Science and Technology, Hyderabad,
India
CONTENTS
2.1 Introduction..................................................................................................... 23
2.2 Machine Learning Overview .........................................................................25
2.3 Machine Learning Applications in IC Testing..............................................28
2.4 ML in Digital Testing ....................................................................................29
2.5 ML in Analog Circuit Testing .......................................................................30
2.6 ML in Mask Synthesis and Physical Placement ...........................................35
2.7 Conclusion ......................................................................................................36
Acknowledgment .....................................................................................................37
References................................................................................................................37
2.1 INTRODUCTION
Digital and analog devices and circuits are the basic electronic components in the widest
range of electronic devices. In addition to the consumer electronics market, the IC
industry is pressed more than ever by the enormous demand for medical, healthcare,
automotive, and security electronics [1]. Analog/radio-frequency (RF) components are
already present in more than 50% of total yearly IC shipments; thus, their design, test,
and validation are fundamental tasks for meeting stringent time-to-market constraints
and production costs [2].
The complexity of components and the scaling of devices have driven much effort in
testing Very Large Scale Integration (VLSI) circuits. To solve the increasingly
prominent test-cost issue of analog and digital devices, the machine learning (ML)
approach has received a lot of consideration [3]. ML algorithms are presently
incorporated in solutions to numerous VLSI testing issues. ML stands at the height of
current technological development, boosted by the influence of cloud computing and
the availability of vast data and storage. In modern manufacturing, ML-based
approaches are now commonly used, as they can provide effective results for complex
issues that were thought insoluble a decade earlier. Their implementations span nearly
every domain and hold promise restricted only by human imagination. In recent years,
there have been various developments in ML and artificial intelligence (AI), including
the emergence of deep learning
DOI: 10.1201/9781003201038-2
FIGURE 2.1 Conventional design flow: specification, analysis, design implementation, and simulation, iterating until the expected output is reached, followed by development.
structured data such as videos and images that are more complex, to unstructured
data such as graphs. The basic procedure for implementing ML algorithms is as
follows (a minimal sketch follows the list):
i. In the preprocessing step, the appropriate features are first chosen, and the
data with these characteristics are extracted from the raw data so that they can
be used to discriminate between the various values of the target outputs. After
that, data cleansing and feature engineering, such as the selection of key
features, scaling, and generation of the sample dataset for learning, as well as
dimensionality reduction, are performed.
ii. In the learning process, the required learning algorithms are chosen and
executed to extract models from the training dataset. These models are
associated with some underlying principle of the data. Cross-validation,
performance assessment, and hyper-parameter tuning are performed to arrive
at the final versions.
iii. In the assessment phase, the final models are evaluated on the test dataset to
measure their performance. In practice, evaluation parameters can be picked
and configured according to different scenarios.
iv. In the simulation process, the final models are used to infer the predicted
target output values for new input data.
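A minimal sketch of steps (i)-(iv) is given below using scikit-learn, which is an assumed library choice; the chapter does not prescribe any particular toolchain, dataset, or model.

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # (i) preprocessing: feature extraction and dimensionality reduction
    pca = PCA(n_components=20).fit(X_train)
    X_train_p, X_test_p = pca.transform(X_train), pca.transform(X_test)

    # (ii) learning: cross-validation over hyper-parameters selects the final model
    model = GridSearchCV(SVC(), {"C": [1, 10]}, cv=5).fit(X_train_p, y_train)

    # (iii) assessment: measure performance on the held-out test dataset
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test_p)))

    # (iv) simulation/deployment: infer target outputs for new inputs
    new_predictions = model.predict(X_test_p[:5])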
In line with the nature of the data that ML processes, two main categories of learning
are commonly explored in both the digital and analog areas: supervised learning and
unsupervised learning [10]. The supervised learning approach utilizes labeled data to
carry out model training; the final models are then chosen to estimate the target classes
for new input data. Unsupervised learning techniques, on the other hand, concentrate
mainly on identifying possible relations and characterizing the data when the labels for
all classes are unavailable. The availability of data labels is, in fact, the main distinction
between supervised learning and unsupervised learning.
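The distinction can be seen by training both kinds of model on the same data, as in the short sketch below (synthetic data and arbitrary model choices, for illustration only): the supervised classifier consumes the labels y, while k-means never sees them.

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC
    from sklearn.cluster import KMeans

    X, y = make_blobs(n_samples=200, centers=3, random_state=0)
    clf = SVC().fit(X, y)                        # supervised: labels y are required
    km = KMeans(n_clusters=3, n_init=10).fit(X)  # unsupervised: labels never used
    print(clf.predict(X[:3]), km.labels_[:3])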
Figure 2.2 gives an overview of the most popular ML algorithms in digital and
analog/RF testing. The prominent approaches used for unsupervised ML are Bayesian
induction, k-means clustering, and spectral clustering. Artificial neural networks
(ANNs), decision trees, support vector machines (SVMs), RFs, and Bayesian networks
are widespread methods used for supervised learning. Owing to the availability of
standard tools, particularly for ANNs and SVMs, supervised learning is more popular.
ML-based design produces responsive solutions [11]; the caveat is how many effective
models can be obtained. The application of ML principles and methods to digital and
analog devices should come as no surprise, especially in the semiconductor industry,
which produces billions of devices each year. The ML-based design flow is shown in
Figure 2.3.
Many problems that can be solved using ML methods are common in semiconductor
manufacturing, such as feature learning (to determine the optimal circuit that produces a
certain output given a certain input), prediction (to determine chip efficiency indirectly),
and classification (to classify a device as functional or defective, given a set of test results
and data from previous training). These issues appear in numerous processes spanning
the entire chip design and production period, including physical design, routing, design
optimization, performance characterization, testing for fault detection, validation, health
estimation, and security, to name only the most representative [12].

FIGURE 2.2 Popular ML algorithms for digital and analog/RF testing (e.g., decision trees, k-nearest neighbors).

FIGURE 2.3 ML-based design flow: specification, ML algorithm selection, training and simulation, model, and development.
FIGURE: Two-tier test flow for a new device. A strict defect filter passes clearly good devices and fails gross defects; suspicious devices (e.g., under process variations) go through a lenient defect filter and learned regression functions for performance prediction, yielding the final pass, fail, or marginal-fail decision.
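A hedged sketch of this two-tier idea is given below; the thresholds and the defect score are hypothetical, and the regressor stands in for learned regression functions trained elsewhere.

    def test_device(measurements, defect_score, regressor, spec_lo, spec_hi):
        """Two-tier pass/fail decision for one device (illustrative thresholds)."""
        if defect_score > 0.9:      # strict defect filter: gross defects fail here
            return "fail (gross defect)"
        if defect_score > 0.5:      # suspicious: lenient filter + performance prediction
            perf = regressor.predict([measurements])[0]  # learned regression function
            return "pass" if spec_lo <= perf <= spec_hi else "fail (marginal)"
        return "pass"               # clearly defect-free devices pass directly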
TABLE 2.1
Summary of ML in IC Testing
S.No. Applications of ML in Test Contribution Reference
FIGURE: Conventional test architecture. Input patterns (e.g., 11…00, 01, 10) are applied to the combinational circuit under test (CUT) and its flip-flops; a comparator checks the CUT outputs (e.g., 10…10, 00, 11) against stored exact responses to produce the test result.
particularly SVMs and ANNs [24], supervised learning is more recent. While
supervised learning is often chosen over unsupervised methods, labels are often absent
or difficult to obtain, so the method must be chosen according to the type of data
available. There are many opportunities for applying supervised learning in the field of
digital logic testing. Table 2.2 shows several data sources that have been used, or may
be used, for ML applications in the field of digital electronic testing.
The technique of defect localization uses a feedback mechanism called diagnosis. As
IC feature sizes decrease and the level of integration increases, the number and diversity
of defects escalate. Conventionally, defects are found using a physical-level approach
called physical failure analysis (PFA). Defects can affect both memory and combinational
components. In scan-chain diagnosis, the flip-flops are tested first and diagnosed for
defects during the manufacturing test [11,31–35]; thereafter, the faults in the rest of the
circuit are diagnosed. Diagnosis is performed hierarchically. ML methods that have been
used for diagnosis at different levels of the circuit hierarchy are shown in Table 2.3.
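As an illustrative instance of such supervised diagnosis, the sketch below trains a decision tree to label dies as functional or defective from a handful of test measurements; the data and the labeling rule are synthetic stand-ins for the failure datasets cited in the tables below.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))    # hypothetical measurements: IDDQ, delay, Vmin, leakage
    y = (X[:, 0] + 0.5 * X[:, 2] > 1.2).astype(int)   # 1 = defective (synthetic rule)

    clf = DecisionTreeClassifier(max_depth=4).fit(X, y)
    print("predicted label for a new die:", clf.predict(X[:1])[0])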
TABLE 2.2
Summary of Digital Electronic Testing and ATPG with ML Techniques
S.No. Contribution Reference
TABLE 2.3
Summary of ML Techniques for Different Levels of Circuits

1. Wafer-level diagnosis [13,36,37]: (i) applying an SVM approach to historical datasets for automated die inking; (ii) identifying defect clusters from failure datasets using a clustering approach; (iii) applying a statistical correlation method to correlate failures and parameters using the failure dataset.
2. Scan-chain diagnosis [10,27]: using a Bayesian approach with failure datasets to target hard-to-model faults.
3. Fault diagnosis, pre-processing [31,32]: (i) using a classification technique with failure datasets to regulate test data volume; (ii) using a random forest method with failure datasets to infer diagnostic efficiency.
4. Fault diagnosis, postprocessing:
   Defect identification [24,33]: (i) defect classification using an ANN approach with simulated datasets; (ii) using a decision tree approach with simulated and failure datasets to identify bridging defects.
   Improving diagnostic resolution [25,26]: (i) transient and intermittent fault identification using a Bayesian network with simulated datasets; (ii) improving diagnostic resolution for fault detection using simulated datasets.
5. Volume diagnosis [11,25]: (i) volume diagnosis of unmodeled faults by an SVM approach with simulated datasets; (ii) root-cause identification by MLE and Bayesian network algorithms with simulated datasets; (iii) identification of systematic defects from failure datasets using a clustering technique.
6. Board-level diagnosis [28,29]: (i) fault isolation from historical data using ANN/SVM/decision tree techniques; (ii) computation of missing syndromes using the Bayes technique.
7. Test compression [5]: test cost optimization with simulated datasets using an SVR algorithm.
8. Circuit testability [12,30]: (i) prediction of X-sensitivity from structural features and simulated datasets using an SVR approach; (ii) test point insertion from structural features and simulated datasets using a GCN technique.
9. Timing analysis [34]: based on PSN with simulated datasets using multiple tools.
ML in Circuit Modeling
Neural networks have been used for RF and microwave modeling and design, where ANN-based passive/active component and circuit models are then used at higher levels of design. In contrast to costly full-system simulation, a reliable response of the whole system can therefore be obtained in a much shorter time. Modeling and design were discussed from theory to implementation. The authors noted that neural networks are desirable alternatives to traditional approaches, such as computationally costly numerical simulation methods or analytical methods that may be hard to derive for new devices [41]. They included examples from the literature where neural networks are used to model printed circuit boards (PCBs), coplanar waveguide (CPW) discontinuities, and MESFETs, and to model signal propagation delays of a VLSI interconnect network.
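To make this surrogate-modeling idea concrete, the minimal sketch below trains a small neural network on samples from a stand-in response function; the parameter ranges, dataset size, and response function are hypothetical placeholders for real simulator output, not taken from the works cited above.

```python
# A minimal sketch of neural-network-based device/circuit surrogate modeling.
# The training data here are hypothetical (x = design/bias parameters,
# y = simulated response); in practice they would come from an EM or circuit
# simulator.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical dataset: 500 samples of 3 design parameters -> 1 response
X = rng.uniform(low=[0.1, 1.0, 0.5], high=[1.0, 10.0, 5.0], size=(500, 3))
y = np.sin(X[:, 0] * 3) + 0.2 * X[:, 1] - 0.1 * X[:, 2] ** 2  # stand-in "simulator"

# Scale inputs, then fit a small multilayer perceptron as the surrogate
surrogate = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
)
surrogate.fit(X, y)

# The trained model now replaces the costly simulator for fast evaluation
x_new = np.array([[0.5, 4.0, 2.0]])
print("predicted response:", surrogate.predict(x_new))
```

Once fitted, such a model can be evaluated thousands of times per second, which is what makes it attractive at higher levels of the design hierarchy.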
To address the problem of time efficiency, ML-based synthesis approaches have become popular. The concept behind the use of ML in circuit synthesis is to replace the simulations with functional model(s) produced by ML techniques, so that an excessive number of simulations can be avoided during the synthesis process. Optimization-based circuit synthesis, which uses an optimization approach to explore the design space, is the best-known method for automating circuit synthesis. In analog/RF circuit optimization tools, many nature-inspired algorithms (evolutionary, particle swarm, reinforcement learning, etc.) scan the design space for an optimal solution to a given circuit problem, and ML-based models can substantially accelerate this design time. A further use of ML-based optimization techniques is to leverage optimization tools for dataset creation [40,42,43].
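The control-flow sketch below illustrates this idea in the simplest possible form: a cheap stand-in for a trained model's prediction function is queried inside a random-search loop, and only the final candidates would be re-verified by the real simulator. The objective, bounds, and iteration budget are our own illustrative assumptions, not a specific tool's API.

```python
# A minimal sketch of ML-assisted circuit synthesis: a surrogate stands in
# for the simulator inside a random-search optimization loop.
import numpy as np

def surrogate_predict(x):
    # stand-in for a trained ML model's prediction; in practice this would be
    # a regressor fitted on simulator data (e.g., the sketch above)
    return np.sin(3 * x[0]) + 0.2 * x[1] - 0.1 * x[2] ** 2

rng = np.random.default_rng(1)
lo = np.array([0.1, 1.0, 0.5])
hi = np.array([1.0, 10.0, 5.0])

best_x, best_y = None, -np.inf
for _ in range(10_000):                 # 10k surrogate calls cost milliseconds,
    x = rng.uniform(lo, hi)             # versus hours of full circuit simulation
    y = surrogate_predict(x)
    if y > best_y:
        best_x, best_y = x, y

# Only the winning candidate(s) would be re-verified with the real simulator.
print("best candidate:", best_x, "surrogate score:", best_y)
```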
The widespread application of ML to numerous fields, including the automation of analog/RF IC layout, opens new perspectives for push-button technologies that combine legacy data and expert design insights in a way that was not feasible in previous generations of EDA software. These recent ML layout-automation technologies range from placement software to routing drafters, as well as pre- and post-placement analysis. The ML approaches for analog and RF circuits are summarized in Table 2.4.
TABLE 2.4
Summary of ML Techniques for Analog and RF Circuits
Article Main Research Contribution
TABLE 2.5
Summary of ML Methods for Mask Synthesis

Topic | Article | Main Research Contribution
Sub-Resolution Assist Features (SRAFs) | [55] | Supervised learning can efficiently classify approximate model-based SRAFs and predict whether SRAFs can cover pixels.
Optical Proximity Correction (OPC) | [56] | A hierarchical Bayes model (HBM) and a generalized linear mixed model (GLMM) can explore the OPC problem with CCAS features such as convex edges, line-end edges, and other edge types.
Clock optimization | [57] | A learning-based classification model for latch optimization is proposed.
Lithography hotspot detection | [58] | A deep convolutional neural network (CNN) can classify lithography hotspot features.
2.7 CONCLUSION
We have also looked at numerous issues that emerge in the testing and diagnosis of VLSI circuits where ML has been applied. In managing the complexity of these problems, ML methods have outperformed conventional heuristic-based methods and delivered practical solutions much faster. ML-based approaches have recently been used successfully in many applications, where their increased learning capacity makes them uniquely suited to solving complex, nonlinear problems. IC design has also benefited from ML techniques at various stages, from device simulation to fabrication. The attempts in device/circuit/system modeling have been aimed at generating precise models at various abstraction levels and replacing the simulator with these models, especially in RF applications; thus, it is possible to minimize human effort and design time.
In the future, solutions to the problems of automatic feature and data generation will set off further application of ML methods to other chip-testing issues. Needless to mention, ample scope remains for data generation and for the development of digital circuit representation procedures that will enrich both industrial and academic study in the field of ML-guided testing.
ACKNOWLEDGMENT
The authors would like to thank the Sreenidhi Institute of Science and Technology, Hyderabad, for providing the infrastructure to conduct this research.
REFERENCES
[1] Wang, F. et al., “Bayesian model fusion: Large-scale performance modeling of
analog and mixed-signal circuits by reusing early-stage data,” IEEE Trans.
Comput.-Aided Design Integr. Circuits Syst., vol. 35, no. 8, pp. 1255–1268, 2016.
[2] Afacan, E., Lourenço, N., Martins, R., & Dündar, G., “Review: Machine learning
techniques in analog/RF integrated circuit design, synthesis, layout, and test,”
Integration, vol. 77, pp. 113–130, 2020.
[3] Gusmao, A. et al., “Semi-supervised artificial neural networks towards analog IC
placement recommender,” in Proc. IEEE International Symposium on Circuits and
Systems (ISCAS), 2020, pp. 1–5.
[4] Pradhan, M., & Bhattacharya, B. B., “A survey of digital circuit testing in the light of machine learning,” WIREs Data Mining Knowl. Discov., vol. 11, 2020.
[5] Li, Z., Colburn, J. E., Pagalone, V., Narayanun, K., & Chakrabarty, K., “Test-cost
optimization in a scan-compression architecture using support-vector regression,” in
Proc. VTS, 2017, pp. 1–6.
[6] Pan, P.-C., Huang, C.-C., & Chen, H.-M., “Late breaking results: An efficient
learning-based approach for performance exploration on analog and RF circuit
synthesis,” in Proc. 56th ACM/IEEE Design Automation Conference (DAC),
2019, pp. 1–2.
[7] Xiao, Y., & He, Y., “A novel approach for analog fault diagnosis based on neural
networks and improved kernel PCA,” Neurocomputing, vol. 74, no. 7,
pp. 1102–1115, 2011.
[8] Islamoglu, G. et al., “Artificial neural network assisted analog IC sizing tool,” in
Proc. 16th International Conference on Synthesis, Modeling, Analysis and
Simulation Methods and Applications to Circuit Design (SMACD), IEEE, 2019,
pp. 9–12.
[9] Murphy, K. P., Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series), MIT Press, 2012.
[10] Huang, Y., Benware, B., Klingenberg, R., Tang, H., Dsouza, J., & Cheng, W., “Scan
chain diagnosis based on unsupervised machine learning,” in Proc. ATS, 2017,
pp. 225–230.
[11] Cheng, W., Tian, Y., & Reddy, S. M., “Volume diagnosis data mining,” in Proc. ETS, 2017, pp. 1–10.
[12] Pradhan, M., Bhattacharya, B. B., Chakrabarty, K., & Bhattacharya, B. B.,
“Predicting X-sensitivity of circuit-inputs on test coverage: A machine-learning approach,” IEEE Trans. Comput.-Aided Design, vol. 38, no. 12, pp. 2343–2356, December 2019.
[13] Tikkanen, J., Siatkowski, S., Sumikawa, N., Wang, L., & Abadir, M. S., “Yield
optimization using advanced statistical correlation methods,” in Proc. ITC, 2014,
pp. 1–10.
[14] Hsiao, S.-W. et al., “Analog sensor-based testing of phase-locked loop dynamic
performance parameters,” in Proc. IEEE Asian Test Symp., 2013, pp. 50–55.
[15] Sumikawa, N. et al., “An experiment of burn-in time reduction based on parametric
test analysis,” in Proc. IEEE Int. Test Conf., 2012.
[16] Vasan, A. S. S. et al., “Diagnostics and prognostics method for analog electronic
circuits,” IEEE Trans. Ind. Electron., vol. 60, no. 11, pp. 5277–5291, 2013.
[17] Andraud, M. et al., “One-shot non-intrusive calibration against process variations
for analog/RF circuits,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 11,
pp. 2022–2035, 2016.
[18] Lin, F. et al., “AdaTest: An efficient statistical test framework for test escape
screening,” in Proc. IEEE Int. Test Conf., 2015.
[19] Stratigopoulos, H.-G., & Streitwieser, C., “Adaptive test with test escape estimation
for mixed-signal ICs,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 10, pp. 2125–2138, 2018.
[20] Huang, K. et al., “Low-cost analog/RF IC testing through combined intra- and inter-die correlation models,” IEEE Des. Test. Comput., vol. 32, no. 1, pp. 53–60, 2015.
[21] Stratigopoulos, H.-G., “Test metrics model for analog test development,” IEEE
Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 7, pp. 1116–
1128, 2012.
[22] Banerjee, D. et al., “Real-time use-aware adaptive RF transceiver systems for energy efficiency under BER constraints,” IEEE Trans. Comput.-Aided Design Integr.
Circuits Syst., vol. 34, no. 8, pp. 1209–1222, 2015.
[23] Wang, L.-C., “Experience of data analytics in EDA and test principles, promises,
and challenges,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 36,
no. 6, pp. 885–898, 2017.
[24] Nelson, J. E., Tam, W. C., & Blanton, R. D., “Automatic classification of bridge
defects,” in Proc. ITC, 2010, pp. 1–10.
[25] Huisman, L. M., Kassab, M., & Pastel, L., “Data mining integrated circuit fails with
fail commonalities,” in Proc. ITC, 2004.
[26] Xue, Y., Poku, O., Li, X., & Blanton, R. D., “Padre: Physically-aware diagnostic
resolution enhancement,” in Proc. ITC, 2013, pp. 1–10.
[27] Chern, M., Lee, S.-W., Huang, S.-Y., Huang, Y., Veda, G., Tsai, K.-H. H., & Cheng,
W.-T., “Improving scan chain diagnostic accuracy using multi-stage artificial neural
networks,” in Proc. ASPDAC, 2019, pp. 341–346. New York, NY: ACM.
[28] Sun, Z., Jiang, L., Xu, Q., Zhang, Z., Wang, Z., & Gu, X., “Agentdiag: An agent-
assisted diagnostic framework for boardlevel functional failures,” in Proc. ITC,
2013, pp. 1–8.
[29] Ye, F., Chakrabarty, K., Zhang, Z., & Gu, X., “Self-learning and adaptive board-
level functional fault diagnosis,” in Proc. ASPDAC, 2015, pp. 294–301.
[30] Ma, Y., Ren, H., Khailany, B., Sikka, H., Luo, L., Natarajan, K., & Yu, B., “High
performance graph convolutional networks with applications in testability analysis,”
in Proc. DAC, 2019, pp. 1–6.
[31] Wang, H., Poku, O., Yu, X., Liu, S., Komara, I., & Blanton, R. D., “Test-data
volume optimization for diagnosis,” in Proc. DAC, 2012, pp. 567–572.
[32] Huang, Q., Fang, C., Mittal, S., & Blanton, R. D. S., “Improving diagnosis efficiency via machine learning,” in Proc. ITC, 2018, pp. 1–10.
[33] Gómez, L. R., & Wunderlich, H., “A neural-network-based fault classifier,” in Proc.
ATS, 2016, pp. 144–149.
[34] Ye, F., Firouzi, F., Yang, Y., Chakrabarty, K., & Tahoori, M. B., “On-chip droop-
induced circuit delay prediction based on support-vector machines,” IEEE Trans.
Comput.-Aided Design, vol. 35, no. 4, pp. 665–678, April 2016.
[35] Zhang, Q.-J., Gupta, K. C., & Devabhaktuni, V. K., “Artificial neural networks for
RF and microwave design-from theory to practice,” IEEE Trans. Microw. Theory
Tech., vol. 51, no. 4, pp. 1339–1350, 2003.
[36] Sumikawa, N., Nero, M., & Wang, L., “Kernel based clustering for quality improvement and excursion detection,” in Proc. ITC, 2017, pp. 1–10.
[37] Xanthopoulos, C., Sarson, P., Reiter, H., & Makris, Y., “Automated die inking: A
pattern recognition-based approach,” in Proc. ITC, 2017, pp. 1–6.
[38] Kaya, E., Afacan, E., & Dundar, G., “An analog/RF circuit synthesis and design
assistant tool for analog IP: DATA-IP,” in Proc. 15th International Conference on
Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit
Design (SMACD), IEEE, 2018, pp. 1–9.
[39] Stratigopoulos, H.-G., “Machine learning applications in IC testing,” in Proc. 2018
23rd IEEE ETS, 2018.
[40] Grabmann, M., Feldhoff, F., & Gläser, G., “Power to the model: Generating energy-
aware mixed-signal models using machine learning,” in Proc. 16th International
Conference on Synthesis, Modeling, Analysis and Simulation Methods and
Applications to Circuit Design (SMACD), IEEE, 2019, pp. 5–8.
[41] Liu, H. et al., “Remembrance of circuits past: Macro modeling by data mining in
large analog design spaces,” in Proc. 39th Annual Design Automation Conference,
2002, pp. 437–442.
[42] Watson, P. M., & Gupta, K. C., “Design and optimization of CPW circuits using
EM-ANN models for CPW components,” IEEE Trans. Microw. Theory Tech., vol.
45, no. 12, pp. 2515–2523, 1997.
[43] Ceperic, V., & Baric, A., “Modeling of analog circuits by using support vector regression machines,” in Proc. 2004 11th IEEE International Conference on
Electronics, Circuits and Systems (ICECS), IEEE, 2004, pp. 391–394.
[44] Harkouss, Y. et al., “The use of artificial neural networks in nonlinear microwave
devices and circuits modeling: An application to telecommunication system design
(invited article),” Int. J. RF and Microw. Comput.-Aided Eng., vol. 9, no. 3,
pp. 198–215, 1999.
[45] Vural, R. et al., “Process independent automated sizing methodology for current
steering dac,” Int. J. Electron., vol. 102, no. 10, pp. 1713–1734, 2015.
[46] Bhatia, V., Pandey, N., & Bhattacharyya, A., “Modelling and design of inverter
threshold quantization based current comparator using artificial neural networks,”
Int. J. Electr. Comput. Eng., vol. 6, no. 1, pp. 2088–8708, 2016.
[47] Wolfe, G., & Vemuri, R., “Extraction and use of neural network models in auto
mated synthesis of operational amplifiers,” IEEE Trans. Comput.-Aided Design
Integr. Circuits Syst., vol. 22, no. 2, pp. 198–212, 2003.
[48] Zhu, K., et al., “Geniusroute: A new analog routing paradigm using generative
neural network guidance,” in Proc. International Conference on Computer Aided
Design (ICCAD), 2019.
[49] Kunal, K. et al., “Align: Open-source analog layout automation from the ground
up,” in Proc. 56th Annual Design Automation Conference (DAC), 2019, pp. 1–4.
[50] Xu, B. et al., “Wellgan: Generative-adversarial-network-guided well generation for
analog/mixed-signal circuit layout,” in Proc. 56th ACM/IEEE Design Automation
Conference (DAC), IEEE, 2019, pp. 1–6.
[51] Andraud, M., Stratigopoulos, H., & Simeu, E., “One-shot non-intrusive calibration
against process variations for analog/RF circuits,” IEEE Trans. Circuits Syst. I
Regul. Pap., vol. 63, no. 11, pp. 2022–2035, 2016.
[52] Stratigopoulos, H. et al., “RF specification test compaction using learning ma
chines,” IEEE Trans. Very Large Scale Integr. VLSI Syst., vol. 18, no. 6,
pp. 998–1002, 2010.
[53] Binu, D., & Kariyappa, B. S., “RideNN: A new rider optimization algorithm-based
neural network for fault diagnosis in analog circuits,” IEEE Trans. Instrum. Meas.,
vol. 68, no. 1, pp. 2–26, 2019.
[54] Zhang, C. et al., “A multiple heterogeneous kernel RVM approach for analog circuit
fault prognostic,” Cluster Comput., vol. 22, no. 2, pp. 3849–3861, 2019.
[55] Xu, X., Matsunawa, T., Nojima, S., Kodama, C., Kotani, T., & Pan, D. Z., “A machine
learning based framework for sub-resolution assist feature generation,” in Proc. ACM
International Symposium on Physical Design (ISPD), 2016, pp. 161–168.
[56] Matsunawa, T., Yu, B., & Pan, D. Z., “Optical proximity correction with hierarchical Bayes model,” J. Micro Nanolithogr. MEMS MOEMS, vol. 15, no. 2, p. 021009, 2016.
[57] Ward, S. I., Viswanathan, N., Zhou, N. Y., Sze, C. C., Li, Z., Alpert, C. J., & Pan, D.
Z., “Clock power minimization using structured latch templates and decision tree
induction,” in Proc. IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), 2013, pp. 599–606.
[58] Yu, Y. T., Chan, Y. C., Sinha, S., Jiang, I. H. R., & Chiang, C., “Accurate process-
hotspot detection using critical design rule extraction,” in Proc. ACM/IEEE Design
Automation Conference (DAC), 2012.
3 Online Checkers to
Detect Hardware Trojans
in AES Hardware
Accelerators
Sree Ranjani Rajendran¹ and Rajat Subhra Chakraborty²
¹RISE Lab, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
²Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
CONTENTS
3.1 Introduction: Background and Driving Forces..............................................41
3.1.1 Threat Model ......................................................................................43
3.2 Proposed Methodology: Online Monitoring for HT Detection ....................44
3.2.1 Reliability-Based Node Selection to Insert Checker.........................44
3.3 Results and Discussion...................................................................................46
3.3.1 Results of Benchmark Circuits ..........................................................46
3.3.2 Results for AES Encryption Unit ......................................................46
3.4 Conclusion ......................................................................................................51
References................................................................................................................51
DOI: 10.1201/9781003201038-3
The design stages of the entire IC design flow are insecure due to various vulnerabilities. Adversaries in the design house have access to the design and might sabotage it to serve other interests. The mitigation of these security threats, with minimal overhead on the design, fabrication, and testing of ICs, is a primary motivation of widespread research. The security threats in the various stages of the IC design flow and their corresponding countermeasures are described in Figure 3.1.
Strategies either to simplify the detection of HTs or to prevent HT insertion have become a prominent research direction. HT prevention schemes, at significant design overhead, modify the circuit at various levels of design abstraction by adding locking circuitry [5–10]. Reverse engineering by an adversary at the foundry is thus difficult for the obfuscated circuit. These design-for-trust (DFTr) techniques can be broadly categorized as shown in Figure 3.2.
Sametinger et al. [11] discussed security challenges for medical devices and addressed the security risk posed by HTs; an HT implanted in a pacemaker installed in the human heart is a real-life example.

FIGURE 3.1 Modern IC life cycle and its security threats with countermeasures.

Dong et al. [12] developed RG-Secure, a framework to protect Internet of Things chips from multi-layer HTs, which combines a third-party intellectual-property trusted-design strategy with scan-chain netlist feature analysis. RG-Secure detects HTs by analyzing multiple layers rather than directly analyzing netlist functionality. However, the design overhead of the RG-Secure framework is very high. With this motivation and background, we propose an online monitoring system for HT detection in smart healthcare devices during device runtime. The main objective of the proposed work is to establish secured hardware by adopting a detection scheme with low hardware and computational overhead that can counteract security threats at various stages of the modern IC life cycle. EDA tools are used to implement the proposed schemes with an automated design methodology. The main goal of this research is to provide a comprehensive solution for all stages of the design flow. This is achieved by the promising online checking technique [13], which detects hard-to-detect HTs with acceptable and controlled hardware overhead. Detection and prevention of HTs at the circuit netlist level have thus emerged as a significant research area, along with the investigation of newer threats. This work detects HTs at the design phase with minimal hardware overhead and high accuracy due to node selection by reliability analysis.
• HT instances are inserted within a radius of two logic gates of the inserted checker.
• HT instances are inserted at a radius of ten or more logic gates away from the inserted checker.
TABLE 3.1
Threat Model
Scenario | IP Vendor | Design House | Foundry | End-user
i | ☻ | ☻ | ☻ | ☺
• For a given netlist, the internal nodes at which HTs are likely to be inserted are identified.
• Low-overhead checkers are inserted at a selected subset of internal nodes.
• Logic malfunctions are identified and reported by the online checkers.
• The logic errors are propagated to the primary outputs and are independent of the test vectors generated.
• The number of checkers inserted is restricted to keep the hardware overhead low while maximizing HT detection accuracy.
The proposed technique is based on the following main observation: there exists a logical correlation between an HT-infected node and its nearby nodes. By the neighborhood of a node N that is an input to c different gates G1, G2, …, Gc, we mean the input and output nodes of these c logic gates. Hence, the checkers are inserted at well-chosen specific nodes to monitor and report logic malfunctions in their proximity. Figure 3.3 shows an example; both single-rail and double-rail checkers are included in this work. The checker does not add any functionality to the circuit; that is, neither the circuit functionality nor the triggering condition of the HT is altered by the proposed checker.
FIGURE 3.3 Single-rail and double-rail checkers at the probable HT insertion sites, checking the logical correlation of an important node with its neighborhood nodes [15].
FIGURE 3.4 An automated design flow of the proposed HT detection methodology [15].
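The neighborhood definition above can be computed directly from a gate-level netlist. The minimal sketch below uses a toy netlist representation of our own (gates as input-list/output pairs over net names); it illustrates the definition only, not the authors' tool flow.

```python
# A minimal sketch of neighborhood extraction: node N feeds c gates G1..Gc,
# and its neighborhood is the set of all input and output nets of those gates.
netlist = {
    "G1": (["N", "a"], "x"),
    "G2": (["N", "b"], "y"),
    "G3": (["x", "y"], "z"),   # not driven by N directly, so it is excluded
}

def neighborhood(net, gates):
    """Return the input/output nets of every gate that takes `net` as an input."""
    neigh = set()
    for ins, out in gates.values():
        if net in ins:
            neigh.update(ins)
            neigh.add(out)
    neigh.discard(net)          # the node itself is not part of its neighborhood
    return neigh

print(neighborhood("N", netlist))   # -> {'a', 'b', 'x', 'y'}
```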
The important internal nodes were first identified by the SPRA algorithm [15]. Then, for a given threshold (upper bound) Ath on the area overhead, a trial number Nc of checkers with the appropriate logic were inserted into the netlist, the checker outputs were combined in a final checker, and the modified netlist was resynthesized.
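The loop below sketches this trial-and-resynthesize flow at a high level; the node ranking, netlist, and area model are random stand-ins of our own, purely to illustrate budgeted checker insertion under an area threshold Ath.

```python
# A minimal, self-contained sketch of budgeted checker insertion: grow the
# checker set along a (stand-in) importance ranking until the resynthesized
# area overhead would exceed the budget.
import random

random.seed(0)
ranked_nodes = [f"n{i}" for i in range(20)]          # stand-in SPRA ranking
base_area = 1000.0

def resynthesized_area(num_checkers):
    # stand-in for synthesis: each checker adds ~0.8% area, plus some noise
    return base_area * (1 + 0.008 * num_checkers + random.uniform(0, 0.002))

A_TH = 0.05                                          # 5% area-overhead budget
chosen = 0
for n_c in range(1, len(ranked_nodes) + 1):
    overhead = resynthesized_area(n_c) / base_area - 1
    if overhead > A_TH:
        break                                        # budget exceeded: stop
    chosen = n_c                                     # largest set within budget

print(f"insert checkers at: {ranked_nodes[:chosen]} (overhead <= {A_TH:.0%})")
```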
FIGURE 3.6 Improvement in HT detection coverage when HT is inserted away from checker.
FIGURE 3.7 Design overhead results of proposed online monitoring technique for enhanced HT detection.
TABLE 3.2
Detection Coverage Improvement When Checkers Are at Important Nodes

Circuit | No. of Checkers | Coverage without Checker (%) | Coverage with Checker (%) | Improvement (%)
c17 | 1 | 50 | 90 | 40
c432 | 1 | 40 | 100 | 60
c499 | 8 | 53 | 93 | 40
c1355 | 11 | 66 | 99 | 33
c1908 | 10 | 35 | 98 | 63
c2670 | 12 | 24 | 89 | 65
c3540 | 18 | 32 | 98 | 66
c5315 | 33 | 30 | 96 | 66
c6288 | 4 | 26 | 90 | 64
s27 | 1 | 58 | 95 | 37
s208 | 2 | 54 | 92 | 38
s298 | 2 | 81 | 90 | 9
s382 | 7 | 79 | 98 | 19
s400 | 12 | 68 | 88 | 20
s444 | 12 | 75 | 97 | 22
s526 | 15 | 28 | 89 | 61
s15850 | 45 | 31 | 99 | 68
s38417 | 65 | 33 | 91 | 58
b17 | 80 | 20 | 93 | 73
Avg. | 17.84 | 46.47 | 93.94 | 47.47
overhead below 5%. Table 3.3 shows the performance and design overheads for the three AES core designs integrated with the checker module. The actual area overhead was less than 5%, and the power and timing overheads were within acceptable limits. The overall design overhead of the proposed scheme is shown in Table 3.4.
TABLE 3.3
Checker Efficiency and Design Overheads of the AES Modules
Design Overhead (%)
TABLE 3.4
Overall Design Overheads of the AES Core (%)
Parameter AES AES+Checker Overhead (%)
Again, the area and power overheads of the checker-integrated AES core were less than 5%, with 96.2% detection coverage (Table 3.4).
3.4 CONCLUSION
Physical devices used in the IoMT to monitor and guide patients remotely are not trustworthy, due to the possible insertion of HTs in the ICs embedded in them. Detection of HTs is important, since an undetected HT in a deployed electronic system can be a serious threat to privacy, security, and safety. In this context, we have developed an automated, low-overhead online monitoring technique to detect HTs at the gate-level netlist. The proposed technique was applied to develop a secured system architecture directed towards design-for-trust. The effectiveness of the proposed scheme was experimentally validated, with detection coverage close to 100% at a controlled design overhead of 10%.
The proposed technique is resistant to cell-replacement attacks, since reverse engineering is not possible on the modified design, and it can easily be extended to secure other designed systems. The proposed online monitoring scheme is validated on ISCAS'85, ISCAS'89, and ITC'99 benchmarks and AES encryption modules, detecting HTs with high coverage. Thus, the proposed online checkers could provide secured hardware for smart healthcare devices at the design level.
An extension of the current work is to develop online checkers for all smart healthcare devices that also cover data and software security along with the proposed hardware security.
REFERENCES
[1] Rajendran, Sree Ranjani, Rijoy Mukherjee, and Rajat Subhra Chakraborty. “SoK:
Physical and logic testing techniques for hardware Trojan detection.” In
Proceedings of the 4th ACM Workshop on Attacks and Solutions in Hardware
Security, pp. 103–116, 2020.
[2] Ranjani, Rajendran Sree. “Machine learning applications for a real-time monitoring
of arrhythmia patients using IoT.” In Internet of Things for Healthcare
Technologies, pp. 93–107, Springer, Singapore, 2020.
[3] Ranjani, R. Sree, and M. Nirmala Devi. “Malicious hardware detection and design
for trust: an analysis.” Elektrotehniski Vestnik 84, no. 1/2 (2017): 7.
[4] Nixon, Patrick, Waleed Wagealla, Colin English, and Sotirios Terzis. “Security,
privacy, and trust issues in smart environments.” In Smart Environments: Technology,
Protocols and Applications, pp. 220–240, Wiley-Blackwell, USA, 2005.
[5] Kanuparthi, Arun, Ramesh Karri, and Sateesh Addepalli. “Hardware and embedded
security in the context of internet of things.” In Proceedings of the 2013 ACM
workshop on Security, privacy & dependability for cyber vehicles, pp. 61–64, 2013.
[6] Colins, D. “Trust in integrated circuits (tic).” DARPA Solicitation BAA07-24 (2007).
[7] Tehranipoor, Mohammad, and Farinaz Koushanfar. “A survey of hardware trojan
taxonomy and detection.” IEEE Annals of the History of Computing 01 (1900): 10–25.
[8] Rajendran, Sree Ranjani. “KARNA for a trustable hardware.” In Proceedings of the
2020 24th International Symposium on VLSI Design and Test (VDAT), pp. 1–4,
IEEE, 2020.
[9] Ranjani, R. Sree, and M. Nirmala Devi. “Enhanced logical locking for a secured
hardware IP against key-guessing attacks.” In Proceedings of the International
Symposium on VLSI Design and Test, pp. 186–197, Springer, Singapore, 2018.
[10] Ranjani, R. Sree, and M. Nirmala Devi. “Secured hardware design with locker-box
against a key-guessing attacks.” Journal of Low Power Electronics 15, no. 2 (2019):
246–255.
[11] Sametinger, Johannes, Jerzy Rozenblit, Roman Lysecky, and Peter Ott. “Security
challenges for medical devices.” Communications of the ACM 58, no. 4 (2015): 74–82.
[12] Dong, Chen, Guorong He, Ximeng Liu, Yang Yang, and Wenzhong Guo. “A multi-
layer hardware trojan protection framework for IoT chips.” IEEE Access 7 (2019):
23628–23639.
[13] Nicolaidis, Michael, and Yervant Zorian. “On-line testing for VLSI—a compendium of approaches.” Journal of Electronic Testing 12, no. 1–2 (1998): 7–20.
[14] Pagliarini, Samuel Nascimento. “Reliability analysis methods and improvement
techniques applicable to digital circuits.” PhD diss., 2013.
[15] Chakraborty, Rajat Subhra, Samuel Pagliarini, Jimson Mathew, Sree Ranjani
Rajendran, and M. Nirmala Devi. “A flexible online checking technique to enhance
hardware trojan horse detectability by reliability analysis.” IEEE Transactions on
Emerging Topics in Computing 5, no. 2 (2017): 260–270.
[16] Benchmark circuits. “The ISCAS85, ISCAS89 and ITC99 benchmark circuits.” http://pld.ttu.ee/maksim/benchmarks/.
[17] Opencores. “The 128-bit advanced encryption standard IP core.” 2009. www.opencores.org/projects.cgi/web/aescore.
4 Machine Learning
Methods for Hardware
Security
Soma Saha¹ and Bodhisatwa Mazumdar²
¹Department of Computer Engineering, SGSITS Indore, Indore, Madhya Pradesh, India
²Department of Computer Science and Engineering, IIT Indore, Indore, Madhya Pradesh, India
CONTENTS
4.1 Introduction..................................................................................................... 54
4.2 Preliminaries ...................................................................................................54
4.2.1 Machine Learning Models Used in Hardware Security ...................55
4.2.1.1 Supervised Learning ............................................................56
4.2.2 Unsupervised Learning....................................................................... 62
4.2.2.1 Clustering Algorithms .........................................................62
4.2.2.2 K-means Clustering Algorithm ...........................................62
4.2.2.3 Partitioning Around Medoids (PAM) .................................63
4.2.2.4 Density-Based Spatial Clustering (DBSCAN) and
Ordering Points to Identify the Clustering Structure (OPTICS)...... 63
4.2.3 Feature Selection and Dimensionality Reduction .............................63
4.2.3.1 Genetic Algorithms..............................................................63
4.2.3.2 Pearson's Correlation Coefficient ........................................63
4.2.3.3 Minimum Redundancy Maximum Relevance (mRMR) ....64
4.2.3.4 Principal Component Analysis............................................64
4.2.3.5 Two-Dimensional Principal Component Analysis .............64
4.2.3.6 Self-Organizing Maps (SOMs) ...........................................64
4.3 Hardware Security Challenges Addressed by Machine Learning ................64
4.3.1 Hardware Trojans ...............................................................................65
4.3.2 Reverse Engineering........................................................................... 66
4.3.3 Side-Channel Analysis .......................................................................67
4.3.4 IC Counterfeiting................................................................................67
4.3.5 IC Overproduction..............................................................................68
4.4 Present Protection Mechanisms in Hardware Security .................................68
4.4.1 Hardware Trojan Detection................................................................70
4.4.2 IC Counterfeiting Countermeasures................................................... 71
4.4.3 Reverse Engineering Approach .........................................................73
DOI: 10.1201/9781003201038-4
4.1 INTRODUCTION
The areas of artificial intelligence (AI) and the real-life applications of machine
learning (ML) are jointly thriving, and have permeated in almost every aspect of
human lives in the present day. With the advent of both AI and ML-induced
computing systems, society is struggling to absorb and understand this faster
growing technology with pace. Therefore, instead of making the world safer and
affordable, this technology exploits security challenges that may endanger public
and private life. In another aspect, the presence of an enormous number of off-shore
vendors in integrated circuit (IC) design, fabrication and manufacturing process for
cost-effective production of ICs, the insertion of malfunctioning hardware by
malicious agents in the entire IC supply chain and design for even the smallest
computing systems has turned out to be one of severe personal as well as national
threat. Additionally, with the increasing count and power of hardware-based at
tacks, the urge to address and resolve hardware security challenges has become
evident in the context of VLSI design as well as in the context of data con
fidentiality, availability, and integrity.
At the same time, the full potential of this thriving technology is yet to be realized in various fields, such as hardware security and network security. The exceptional success of ML in a variety of research domains has encouraged researchers from the hardware security domain to explore its potential to address a variety of security challenges. In recent years, ML methods have been used for a variety of hardware-related security challenges with the aim of providing powerful defense mechanisms by (i) understanding and mitigating vulnerabilities in the IC supply chain, counterfeiting, and overbuilding, (ii) constructing defence mechanisms against hardware Trojans (HTs), (iii) detecting HTs through reverse engineering (RE) efforts, (iv) inserting logic locking mechanisms, and so on. Research works in hardware security in terms of attack mechanisms have also incorporated ML algorithms in (i) side-channel analysis (SCA) and (ii) physical unclonable function (PUF) attacks.
4.2 PRELIMINARIES
In this section, we provide an ensemble of ML models that have been used extensively in hardware security research. The ML models comprise different types of supervised and unsupervised learning algorithms.

FIGURE 4.1 The general flow of ML, primarily the supervised learning framework.
According to the nature of the data available and processed, learning tasks in ML can be categorized into supervised learning and unsupervised learning; the steps mentioned above for applying ML algorithms are aligned with supervised learning. In the hardware security domain, both supervised and unsupervised learning are widely explored, depending on the nature of the problem and the available data types. When input data are available together with their correct target outputs, the data are labeled. Supervised learning techniques use labeled data to train models and select the best-performing model to predict the target output classes for newly obtained input data. In contrast, unsupervised learning techniques deal with unlabeled data and focus on learning the underlying structure of the available data. In the following paragraphs, we present an overview of the most widely explored ML techniques, feature selection methods, dimensionality reduction methods, and several optimization and model enhancement techniques in hardware security.
If the data are linearly non-separable, a kernel trick is used with specific functions to map the data samples to a higher-dimensional space, where they can be linearly separated. A variety of kernels is used in SVMs, for example, the polynomial kernel, the wavelet kernel, and the Gaussian radial basis function (RBF) kernel. SVMs are efficient in high-dimensional spaces. SVMs have kernel-dependent hyperparameters and a soft-margin parameter (slack variable) that must be fine-tuned, along with the selection of an appropriate kernel, to enhance prediction accuracy. In an SVM, the trade-off between maximizing the margin and minimizing the training error is controlled by tuning the slack-variable value. Increasing its value may cause overfitting, i.e. poor generalization to new data, even though it yields better accuracy on the training data. Extended versions of the basic binary SVM are used to support multi-class classification.
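A minimal scikit-learn sketch of these points, assuming synthetic two-class data: an RBF-kernel SVM is fitted, with the soft-margin parameter C and the kernel hyperparameter gamma tuned by cross-validated grid search.

```python
# A minimal sketch: RBF-kernel SVM on linearly non-separable data, with the
# slack trade-off (C) and kernel hyperparameter (gamma) tuned by grid search.
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Larger C penalizes training errors more (risking overfitting); smaller C
# widens the soft margin at the cost of training accuracy.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X_tr, y_tr)

print("best params:", grid.best_params_)
print("test accuracy:", grid.best_estimator_.score(X_te, y_te))
```

Evaluating on a held-out test set, as done here, is what reveals the overfitting behavior described above when C is set too aggressively.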
FIGURE 4.3 Example of ML techniques. Classification using (a) decision tree on a two-
dimensional dataset, and (b) random forest of three decision trees.
FIGURE 4.4 Examples of ML techniques. Classification using (a) artificial neural networks
(ANNs) and (b) convolutional neural networks (CNN).
4.2.1.1.12 AutoEncoder
AutoEncoder is a neural network with layers that work as encoders and decoders.
The encoder aims at learning a representation (or encoding) for a set of data for
dimensionality reduction by training the underlying network to ignore “noise” [10].
The decoder aims at mapping the encoded output obtained from the encoder to the
original input through reconstruction. Figure 4.5 depicts the general structure of an autoencoder.

FIGURE 4.5 Taxonomy of different machine learning algorithms used in attacks and defense approaches in hardware security.
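A minimal sketch of the encoder/decoder structure just described, as a PyTorch implementation of our own choosing on random stand-in data: training minimizes the reconstruction error between the decoder output and the original input.

```python
# A minimal autoencoder sketch: the encoder compresses the input to a
# low-dimensional code, the decoder reconstructs it, and training minimizes
# the reconstruction (MSE) loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(256, 32)               # stand-in dataset: 256 samples, 32 features

model = nn.Sequential(
    nn.Linear(32, 8), nn.ReLU(),      # encoder: 32 -> 8 dimensional code
    nn.Linear(8, 32),                 # decoder: 8 -> 32 reconstruction
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), x)       # reconstruction loss: output vs. input
    loss.backward()
    opt.step()

print("final reconstruction MSE:", loss.item())
```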
LSTM recurrent networks are well suited to such predictions, as they deal with data streams and time series with arbitrary time lags between instances [13].
Clustering algorithms (CAs) can work without a priori knowledge about the input data. Therefore, CAs can be used in the HT detection and protection field, where features of ICs/golden designs are not accessible.
K-means clustering partitions data points into clusters such that:

i. the distance between data points within each cluster (intra-cluster distance) is minimal, and
ii. the distance between clusters (inter-cluster distance) is as large as possible.

Initially, cluster centroids are chosen randomly. Then, data points are allocated to their nearest clusters. Subsequently, the centroid of each cluster is recomputed to reduce the intra-cluster distance. These steps are repeated until the centroids no longer change or another explicit stopping criterion is reached.
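The steps above translate directly into code; the sketch below is a minimal NumPy rendering on synthetic two-cluster data, with centroid convergence as the stopping criterion.

```python
# A minimal K-means sketch: random initial centroids, assign each point to the
# nearest centroid, recompute centroids, and repeat until they stop moving.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
k = 2
centroids = X[rng.choice(len(X), k, replace=False)]   # random initialization

while True:
    # assignment step: nearest centroid per point (minimize intra-cluster distance)
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
    # update step: centroid = mean of its assigned points
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centroids, centroids):          # stopping criterion
        break
    centroids = new_centroids

print("final centroids:\n", centroids)
```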
Given this condition, a small input dataset can result in large variance of estimates and overfitting. In this section, we present the algorithms that are associated with feature selection and dimensionality reduction.
FIGURE 4.6 Examples of hardware Trojans: (a) a combinational Trojan, which can be triggered by the corner-case condition a = 0, b = 0, c = 0; (b) a sequential Trojan, which is triggered when the corner-case condition a = 0, b = 1 occurs $2^n$ times, where n is the length of the counter.
Motivations for RE can be honest, where intellectual-property rights are not violated, or dishonest. Reasons to perform RE with honest motivation comprise functional verification, fault analysis, and understanding the working of deployed products. Dishonest intentions behind RE comprise cloning, piracy, design counterfeiting, and insertion of HTs. RE of electronic devices ranges from the chip level to the system level. A broad classification of RE applications is shown in Figure 4.7.
An IC comprises multiple electronic devices fabricated using semiconductor materials. ICs constitute package material, bond wires, die, and lead frame. The die is composed of multiple metal layers, vias, and metal-layer interconnections, as shown in Figure 4.8. X-ray tomography is a non-destructive RE method that provides layer-wise images of ICs for analysis of internal wire connections, vias, wire bonding, capacitors, etc. Destructive RE methods comprise etching and grinding each layer for analysis; in this process, images are captured with either a scanning electron microscope (SEM) or a transmission electron microscope (TEM).
In addition, printed circuit boards (PCBs) are reverse engineered first by identifying the ICs and other components mounted on them, along with the corresponding traces on the visible layers. Subsequently, X-ray imaging or delayering is used to further identify traces, connections, and vias of the internal layers of the PCB. Furthermore, system-level RE targets the system's firmware, which contains information about the system's operations and their sequence of events. System firmware is embedded in nonvolatile memories (NVMs), such as ROM, EEPROM, and flash memories. RE can provide insight into system functionality by analyzing the contents of such memories.
4.3.4 IC COUNTERFEITING
IC counterfeiting is a longstanding problem that has grown in scope and magnitude over the past decade. With shrinking VLSI technology and the growing complexity of ICs, chips are assembled and fabricated globally across different geographical regions. This trend has led to a thriving illicit market that undercuts competition with counterfeit ICs and electronic systems. The most counterfeited semiconductors include analog ICs, microprocessor ICs, memory ICs, programmable logic ICs, transistors, and others. In the recent past, several ML-based countermeasures have been employed for detecting counterfeit ICs; automation of inspection procedures has recently used image processing algorithms and ANNs [29].
4.3.5 IC OVERPRODUCTION
IC overproduction involves the foundry producing more ICs than that required by
an IC design house. Subsequently, the foundry sells the ICs in the market without
authorization of the IC design house.
TABLE 4.1
Role of Machine Learning in Hardware Security Challenges
(Columns: Challenges in Hardware Security | ML Opportunities in Hardware Security | Security Threats from ML | Challenges to ML Implementations)
Of the most commonly used profiled analyses, template attacks have emerged as the most powerful variant, shown to be effective even when only a few traces or measurements are available [52]. However, there are many worst-case side-channel security threats wherein ML-based algorithms outperform template-based attacks. Attackers are often provided with a sufficiently large number of power traces that aid in building precise leakage models. With a properly chosen algorithm and a parameter-tuning phase, such ML-based attacks turn out to be more efficient; a properly tuned algorithm requires an even smaller number of features to achieve a high attack success rate. A measure called the data confusion factor is proposed in [53] to differentiate between various ML methods. In the supervised ML approach, SVMs, RF, rotation forest (RTF) [54], and the multiboost algorithm [55] comprise classifiers that exhibited high accuracy on multiple datasets [56].
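As a concrete illustration of profiled, ML-based SCA, the sketch below trains a random forest on synthetic labeled traces and predicts the sensitive bit on fresh traces; the trace model and leakage positions are synthetic stand-ins of our own, not measurements from any cited work.

```python
# A minimal sketch of ML-based profiled side-channel analysis: a classifier is
# trained on labeled traces (profiling device), then predicts the sensitive
# intermediate value from new traces.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_traces, n_samples = 2000, 50
labels = rng.integers(0, 2, n_traces)            # e.g. one bit of a key byte

# Synthetic leakage: the label shifts a few trace samples, buried in noise
traces = rng.normal(0, 1, (n_traces, n_samples))
traces[:, 20:25] += labels[:, None] * 0.8

X_tr, X_te, y_tr, y_te = train_test_split(traces, labels, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("attack accuracy on fresh traces:", clf.score(X_te, y_te))
```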
4.5.2 IC OVERBUILDING
IC overbuilding can be thwarted using techniques such as hardware metering, logic obfuscation, watermarking, and fingerprinting. For IC fingerprinting, PUFs are used, as the enforced challenge-response-pair (CRP) behavior of the system relies on certain physical traits of the system. In the recent past, ML models have been used to learn the CRP behavior of PUFs; ring-oscillator PUFs and arbiter PUFs have been efficiently modeled using ML algorithms.
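A minimal sketch of such CRP model learning, assuming the standard additive-delay model of an arbiter PUF: challenges are mapped to the usual parity features, responses are simulated from hidden stage delays, and logistic regression learns to predict unseen CRPs.

```python
# A minimal sketch of ML modeling of an arbiter PUF's CRP behavior using the
# linear additive-delay model and parity-transformed challenge features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_stages, n_crps = 64, 20000
w = rng.normal(0, 1, n_stages + 1)                      # hidden stage delays

def parity_features(ch):
    # phi_i = product of (1 - 2*c_j) for all j >= i, plus a constant term
    signs = 1 - 2 * ch                                  # 0/1 -> +1/-1
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([phi, np.ones((len(ch), 1))])

C = rng.integers(0, 2, (n_crps, n_stages))              # random challenges
Phi = parity_features(C)
r = (Phi @ w > 0).astype(int)                           # simulated responses

clf = LogisticRegression(max_iter=1000).fit(Phi[:15000], r[:15000])
print("prediction accuracy on unseen CRPs:", clf.score(Phi[15000:], r[15000:]))
```

Real attacks must additionally cope with measurement noise and with nonlinear PUF compositions (e.g., XOR arbiter PUFs), which this idealized sketch omits.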
REFERENCES
[1] Tom M. Mitchell. Machine Learning, International Edition. McGraw-Hill Series in
Computer Science. McGraw-Hill, Noida, India, 1997.
[2] Rana Elnaggar and Krishnendu Chakrabarty. Machine learning for hardware security: Opportunities and risks. Journal of Electronic Testing, 34(2):183–201, 2018.
[3] Z. Huang, Q. Wang, Y. Chen, and X. Jiang. A survey on machine learning against
hardware trojan attacks: Recent advances and challenges. IEEE Access, 8:10796–
10826, 2020.
[4] M.A. Hearst, S.T. Dumais, E. Osuna, J. Platt, and B. Scholkopf. Support vector
machines. Intelligent Systems and Their Applications, IEEE, 13(4):18–28, 1998.
[5] Gabriel Hospodar, Benedikt Gierlichs, Elke De Mulder, Ingrid Verbauwhede, and
Joos Vandewalle. The investigation of neural networks performance in side-channel
attacks. Journal of Cryptographic Engineering, 1(293):2190–8516, 2011.
[6] Jerome H. Friedman. Multivariate adaptive regression splines. The Annals of
Statistics, 19(1):1–67, 1991.
[7] Y. Liu, G. Volanis, K. Huang, and Y. Makris. Concurrent hardware trojan detection
in wireless cryptographic ics. In 2015 IEEE International Test Conference (ITC),
pages 1–8, 2015.
[8] Yinan Kong and Ehsan Saeedi. The investigation of neural networks performance in
side-channel attacks. Artificial Intelligence Review, 52(1):607–623, 2019.
[9] Yann LeCun and Yoshua Bengio. Convolutional Networks for Images, Speech, and
Time Series, pages 255–258. MIT Press, Cambridge, MA, USA, 1998.
[10] Pierre Baldi. Autoencoders, unsupervised learning and deep architectures. In 2011
International Conference on Unsupervised and Transfer Learning Workshop -
Volume 27, UTLW’11, pages 37–50. JMLR.org, 2011.
[11] Danilo P. Mandic and Jonathon Chambers. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. John Wiley & Sons, Inc., USA, 2001.
[12] Sixiang Wang, Xiuze Dong, Kewang Sun, Qi Cui, Dongxu Li, and Chunxiao He.
Hardware trojan detection based on ELM neural network. In 2016 First IEEE
International Conference on Computer Communication and the Internet (ICCCI),
pages 400–403. IEEE, 2016.
[13] Felix A. Gers, Nicol N. Schraudolph, and Jürgen Schmidhuber. Learning precise
timing with LSTM recurrent networks. The Journal of Machine Learning Research,
3:115–143, 2002.
[14] Swee Chuan Tan, Kai Ming Ting, and Tony Fei Liu. Fast anomaly detection for
streaming data. In Twenty-Second International Joint Conference on Artificial
Intelligence, pages 1511–1516, 2011.
[15] N. Karimian, F. Tehranipoor, M. T. Rahman, S. Kelly, and D. Forte. Genetic algorithm for hardware trojan detection with ring oscillator network (RON). In 2015
IEEE International Symposium on Technologies for Homeland Security (HST),
pages 1–6, 2015.
[16] Chongxi Bao, Domenic Forte, and Ankur Srivastava. On reverse engineering-based
hardware trojan detection. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, 35(1):49–57, 2016.
[17] L. Kaufman and P. J. Rousseeuw. Partitioning Around Medoids (Program PAM), chapter 2, pages 68–125. John Wiley & Sons, Ltd., Hoboken, 1990.
[18] A.N. Nowroz, K. Hu, F. Koushanfar, and S. Reda. Novel techniques for high-
sensitivity hardware trojan detection using thermal and power maps. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems,
33(12):1792–1805, 2014.
[19] B. Cakır and S. Malik. Hardware trojan detection for gate-level ics using signal
correlation based clustering. In 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 471–476, 2015.
[20] Hanchuan Peng, Fuhui Long, and Chris Ding. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226–1238, 2005.
[21] Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis.
Chemometrics and Intelligent Laboratory Systems, 2(1):37–52, 1987. Proceedings
of the Multivariate Statistical Workshop for Geologists and Geochemists.
[22] Y. Liu, Y. Jin, A. Nosratinia, and Y. Makris. Silicon demonstration of hardware
trojan design and detection in wireless cryptographic ICs. IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, 25(4):1506–1519, 2017.
[23] T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–
1480, 1990.
[24] Mohammad Tehranipoor and Farinaz Koushanfar. A survey of hardware trojan
taxonomy and detection. IEEE Design & Test of Computers, 27(1):10–25, 2010.
[25] Swarup Bhunia, Michael S. Hsiao, Mainak Banga, and Seetharam Narasimhan.
Hardware trojan attacks: Threat analysis and countermeasures. Proceedings of the
IEEE, 102(8):1229–1247, 2014.
[26] Robert J. Abella, James M. Daschbach, and Roger J. McNichols. Reverse engineering
industrial applications. Computers & Industrial Engineering, 26(2):381–385, 1994.
[27] Randy Torrance and Dick James. The state-of-the-art in ic reverse engineering. In
International Workshop on Cryptographic Hardware and Embedded Systems, pages
363–381. Springer, 2009.
[28] Ian McLoughlin. Secure embedded systems: The threat of reverse engineering. In
2008 14th IEEE International Conference on Parallel and Distributed Systems,
pages 729–736. IEEE, 2008.
[29] Navid Asadizanjani, Mark Tehranipoor, and Domenic Forte. Counterfeit electronics
detection using image processing and machine learning. In Journal of Physics:
Conference Series, volume 787, page 012023. IOP Publishing, 2017.
[30] Nisarg Patel, Avesta Sasan, and Houman Homayoun. Analyzing hardware based
malware detectors. In 2017 54th ACM/EDAC/IEEE Design Automation Conference
(DAC), pages 1–6. IEEE, 2017.
[45] Er-Rui Zhou, Shao-Qing Li, Ji-Hua Chen, Lin Ni, Zhi-Xun Zhao, and Jun Li. A
novel detection method for hardware trojan in third party ip cores. In 2016
International Conference on Information System and Artificial Intelligence (ISAI),
pages 528–532. IEEE, 2016.
[46] Kento Hasegawa, Masao Yanagisawa, and Nozomu Togawa. A hardware-trojan
classification method using machine learning at gate-level netlists based on trojan
features. IEICE Transactions on Fundamentals of Electronics, Communications and
Computer Sciences, 100(7):1427–1438, 2017.
[47] Kento Hasegawa, Masao Yanagisawa, and Nozomu Togawa. Trojan-feature extraction at gate-level netlists and its application to hardware-trojan detection using
random forest classifier. In 2017 IEEE International Symposium on Circuits and
Systems (ISCAS), pages 1–4. IEEE, 2017.
[48] Tamzidul Hoque, Jonathan Cruz, Prabuddha Chakraborty, and Swarup Bhunia.
Hardware IP trust validation: Learn (the untrustworthy), and verify. In 2018 IEEE
International Test Conference (ITC), pages 1–10. IEEE, 2018.
[49] Faiq Khalid Lodhi, I. Abbasi, Faiq Khalid, Osman Hasan, F. Awwad, and Syed
Rafay Hasan. A self-learning framework to detect the intruded integrated circuits. In
2016 IEEE International Symposium on Circuits and Systems (ISCAS), pages
1702–1705. IEEE, 2016.
[50] Faiq Khalid Lodhi, Syed Rafay Hasan, Osman Hasan, and Falah Awwad. Power
profiling of microcontroller's instruction set for runtime hardware trojans detection
without golden circuit models. In Design, Automation & Test in Europe Conference
& Exhibition (DATE), 2017, pages 294–297. IEEE, 2017.
[51] Halit Dogan, Domenic Forte, and Mark Mohammad Tehranipoor. Aging analysis
for recycled fpga detection. In 2014 IEEE International Symposium on Defect and
Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pages 171–176.
IEEE, 2014.
[52] Annelie Heuser and Michael Zohner. Intelligent machine homicide. In International
Workshop on Constructive Side-Channel Analysis and Secure Design, pages
249–264. Springer, 2012.
[53] Stjepan Picek, Annelie Heuser, Alan Jovic, Simone A Ludwig, Sylvain Guilley,
Domagoj Jakobovic, and Nele Mentens. Side-channel analysis and machine
learning: A practical perspective. In 2017 International Joint Conference on Neural
Networks (IJCNN), pages 4095–4102. IEEE, 2017.
[54] Juan Jose Rodriguez, Ludmila I. Kuncheva, and Carlos J. Alonso. Rotation forest: A
new classifier ensemble method. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 28(10):1619–1630, 2006.
[55] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line
learning and an application to boosting. Journal of Computer and System Sciences,
55(1):119–139, 1997.
[56] Manuel Fernandez-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. Do
we need hundreds of classifiers to solve real world classification problems? The
Journal of Machine Learning Research, 15(1):3133–3181, 2014.
5 Application-Driven Fault
Identification in NoC
Designs
Ankur Gogoi and Bibhas Ghoshal
Department of IT, IIIT Allahabad, Allahabad, Uttar Pradesh,
India
CONTENTS
5.1 Introduction..................................................................................................... 79
5.2 Related Work..................................................................................................81
5.3 Identification of Vulnerable Routers .............................................................81
5.3.1 Proposed Mathematical Model for Router Reliability......................82
5.3.2 Determination of the Vulnerable Routers Using Simulation............83
5.3.3 Look-up-Table (LuT) Generation from Experimental Data .............84
5.4 The Proposed Methodology for the Identification of Vulnerable Routers...86
5.4.1 Classification of Application Traffic Using Machine Learning .......88
5.4.1.1 Dataset Generation ..............................................................88
5.4.1.2 Feature Vector Extraction ...................................................90
5.4.1.3 Training of the ML Model..................................................90
5.4.1.4 Working of the Trained Model...........................................91
5.4.2 Validation of the ML Model for Traffic Classification ....................91
5.4.3 Identification of Vulnerable Routers Using Look-up-Table (LuT) ..91
5.5 Future Work and Scope .................................................................................94
5.5.1 Pooling of Unused Routers: A Structural Redundancy Approach ...95
5.6 Conclusion ......................................................................................................95
References................................................................................................................95
5.1 INTRODUCTION
To cope with rapidly growing requirements of computation and bandwidth, Network-
on-Chip (NoC) has been accepted as a promising communication infrastructure for
multicore architectures. NoC communication infrastructure facilitates fast and reliable
transmission of data packets among the cores employing various routing algorithms.
These algorithms ensure that the router- and link-based architecture of the NoC, as shown in Figure 5.1, delivers packets while satisfying design requirements such as low latency and high throughput. The traffic fed to the NoC is transferred as flits (small chunks of a packet) through these routers and links [1].
DOI: 10.1201/9781003201038-5
Fault-tolerant mechanisms can then be applied to the NoC routers. We propose the use of traffic classification to identify probable fault sites in the NoC. A supervised machine learning (ML) approach is used to classify traffic and identify the fault sites, i.e. the routers vulnerable with respect to that traffic.
$R(t) = e^{-\int_0^{t} \lambda(t)\, dt}$   (5.1)
$\lambda_{EM}(J, T) = J\, e^{-E_{a_{EM}}/kT}$   (5.2)

$\lambda_{TDDB}(J, T) \propto J\, e^{-(X + Y/T + ZT)/kT}$   (5.3)
where X, Y, and Z are fitting parameters and kT is the thermal energy.

For both EM and TDDB, the current density J is

$J = \frac{C\, V_{dd}}{W\, H}\, \alpha f$

where $\alpha$ is the switching activity (bit transitions in one clock cycle), $V_{dd}$ is the voltage, and f is the frequency. If the voltage and frequency are assumed to be constant, then the current density depends on the switching activity, which in turn depends on the incoming flit rate [15]. The failure rate can then be approximated as follows:
For EM,

$\lambda_{EM}(J, T) \propto (n_i / \Delta t_s)\, \alpha_{avg}\, e^{-E_{a_{EM}}/kT}$   (5.4)

For TDDB,

$\lambda_{TDDB}(J, T) \propto (n_i / \Delta t_s)\, \alpha_{avg}\, e^{-(X + Y/T + ZT)/kT}$   (5.5)
Here, $\alpha_{avg}$ is the average number of bit transitions per flit in a router, and $n_i$ is the total number of incoming flits. The incoming flit rate is the number of flits passing through the router per unit time. The total incoming flits are the flits crossing the router while processing the traffic over the interval $\Delta t_s$, and this count also represents the workload of the router [12].
For EM,

$\lambda_{EM}(J, T) \propto n_i\, e^{-E_{a_{EM}}/kT}$   (5.6)

For TDDB,

$\lambda_{TDDB}(J, T) \propto n_i\, e^{-(X + Y/T + ZT)/kT}$   (5.7)
From equations (5.6) and (5.7), it can be concluded that reliability is a function of the number of bit transitions taking place in an NoC router and, consequently, of the total number of flits crossing the router, i.e. its individual traffic load. A higher router workload contributes to increased temperature, which in turn affects reliability adversely and makes the router more prone to failures.
$N_{th} = \frac{F}{ActiveRouters}$
where F is the total number of flits crossing the routers of the NoC and ActiveRouters is the number of (participating) routers that have been processing flits. We have considered a 4 × 4 2D mesh NoC architecture in our experiment to detect the vulnerable routers. Simulations were performed to estimate the workload of every node in the network under the influence of a particular traffic profile and the XY routing algorithm. The simulations were run using the Noxim simulator, and the results are shown below; a sketch of the threshold rule follows.
Figure 5.2 shows the router load in terms of total incoming flits for encoder/de
coder traffic in a 4 × 4 architecture with threshold Nth = 1883 flits. The traffic load in
routers 0, 2, 4, and 8 are seen to exceed the NTh and fall under the category of
vulnerable routers. The routing algorithm affects the total incoming flits by adding
flits that are in transit, thereby causing a change in the traffic load of each router [15].
As seen in Figure 5.3, routers 4, 5, 6, and 8 exceed their limit of 2696 flits/node
(Nth). As a result, the mesh-based NoC architecture when used for MPEG appli
cation the routers 4, 5, 6, and 8 are identified as vulnerable routers.
Similarly, for consumer application traffic, shown in Figure 5.4, routers 2, 3, 4, 8, 9, and 10 crossed the threshold limit and need fault-tolerant support to work properly. For MWD traffic, shown in Figure 5.5, locations 1, 2, 3, and 4 become vulnerable to faults. In the case of networking traffic, shown in Figure 5.6, router positions 1, 2, 3, 4, and 8 are identified as faulty. From Figure 5.7, it can be observed that for office application traffic our experiment identified router positions 1 and 2 as vulnerable routers.
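The thresholding used to flag these routers is straightforward to express in code. The sketch below is a minimal illustration, assuming per-router flit counts have already been extracted from a Noxim run; the function name, variable names, and load values are invented for demonstration.

```python
def vulnerable_routers(router_loads):
    """Return indices of routers whose workload exceeds N_th = F / ActiveRouters."""
    total_flits = sum(router_loads)                       # F
    active = sum(1 for load in router_loads if load > 0)  # participating routers
    n_th = total_flits / active                           # threshold N_th
    return [i for i, load in enumerate(router_loads) if load > n_th]

# Example: 4 x 4 mesh (16 routers), loads in flits (synthetic values)
loads = [2100, 900, 2500, 400, 1950, 700, 600, 300,
         2200, 500, 450, 350, 800, 650, 550, 420]
print(vulnerable_routers(loads))
```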
Each LuT entry maps a processed traffic class to its list of vulnerable routers:

t_i \rightarrow \{l_1, l_2, l_3, \ldots, l_N\}

Here, t_i is the processed traffic, and l_i is the list of vulnerable routers determined against the traffic t_i. The size of the LuT can be defined as M \times N, where M is the number of traffic classes considered for classification and N is the number of routers.
The above algorithm takes application traffic as input and provides vulnerable
router sets as output. The algorithm can be divided into two phases, as mentioned
earlier. The first phase comprises Noxim setup and traffic classification, and the
second phase takes care of the LuT-based identification of vulnerable routers. The
setup parameters are made available and ready before warming up of the simulator
and can be seen in lines 1 and 2 of the algorithm. From line 3 to line 5 the Noxim
simulator is warmed up with the defined setup present in line 1 and line 2. After
successful warm-up, the given application traffic is simulated for 10,000 cycles
(from lines 6 to 8). The simulation outcome is stored in noximResult (line 7). The
features are extracted from the stored result of Noxim using the extractFeatures
function (line 9) to construct the feature vector (featureVector). Next, the feature
vector is fed to the ML traffic classification model (classificationML) to predict the
class of the given traffic (appTraffic) (line 10). The first phase of the algorithm comprises lines 1 through 10; after the successful classification of traffic, the predicted class is passed to the second phase of the algorithm to get the probable
vulnerable set. First, the predicted class is validated or matched with the available
traffic class present in the LuT (lines 11 and 12). If the predicted class is present in
the LuT list (line 12), then its corresponding vulnerable router set is extracted from
the LuT list using the function LuT_class (line 14).
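A compact sketch of this two-phase flow is given below. It assumes the classifier has already been trained and the Noxim feature extraction has happened elsewhere, so the function takes an already-built feature vector; all names, the stub classifier, and the LuT contents are illustrative, not the chapter's actual code.

```python
def identify_vulnerable_set(feature_vector, classifier, lut):
    # Phase 1: classify the traffic from its simulated features
    # (delays, throughputs, energies, as in Table 5.2).
    predicted_class = classifier.predict([feature_vector])[0]
    # Phase 2: validate the class against the LuT and return its router set.
    return lut.get(predicted_class, [])  # empty list if the class is unknown

# Toy usage with a trivial stand-in classifier:
class StubClassifier:
    def predict(self, X):
        return ["MPEG" for _ in X]

lut = {"MPEG": [4, 5, 6, 8], "consumer": [2, 3, 4, 8, 9, 10]}
print(identify_vulnerable_set([0.7, 0.2, 0.9], StubClassifier(), lut))  # [4, 5, 6, 8]
```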
without any anomalies, flit-level traffic routing information has been considered.
Twelve different types of application traffic are considered for the experiment, and
the total number of application traffic can be further extended according to the
interest of the individuals. Along with throughput and delay, the power consumption of application traffic has also been considered for a more accurate and specific classification.
TABLE 5.1
Simulation Setup

Traffic: 263decmp3dec, 263encmp3dec, auto indust, consumer, mp3encmp3dec, MPEG, networking, office, pip, telecom, VOPD
Mesh Dimension: 64 nodes
Warm-Up Cycles: 1,000
Simulation Cycles: 10,000
Flit Size: 16 bit, 32 bit, 64 bit, 128 bit
Routing Algorithm: XY
TABLE 5.2
Feature Vector of MPEG Traffic with Respect to Different PIR

PIR | Max Delay | Global Average Delay | Network Throughput | Average IP Throughput | Dynamic Energy | Static Energy
5.6 CONCLUSION
We have proposed a supervised ML-based approach to classify application traffic and identify the fault locations in an NoC. With the help of traffic classification, fault-tolerant mechanisms can then be applied to make the NoC fault tolerant. We performed the experiment on real traffic and achieved a classification accuracy of 89.23%. Further, our proposed algorithm also successfully identifies the fault locations or vulnerable routers for application traffic prior to the injection of that traffic into the NoC system. We presented a mathematical model for identifying faulty or vulnerable routers based on workload and supported the presented mathematical model with experimental results. Based on this work, interested individuals may further improve and implement the fault-tolerant approach mentioned in the previous section.
REFERENCES
[1] Sleeba, S. Z., Jose, J., Palesi, M., James, R. K., and Mini, M. G., 2018. Traffic aware
deflection rerouting mechanism for mesh network on chip, in: 2018 IFIP/IEEE
International Conference on Very Large Scale Integration (VLSI-SoC), Verona,
Italy, pp. 25–30. doi: 10.1109/VLSI-SoC.2018.8645011.
[2] John, M. R., James, R., Jose, J., Isaac, E., and Antony, J. K., 2014. A novel energy
efficient source routing for mesh NoCs, in: 2014 Fourth International Conference on
Advances in Computing and Communications, Cochin, pp. 125–129. doi: 10.1109/
ICACC.2014.36.
[3] Francis, Rosemary M., 2013. Exploring networks-on-chip for FPGAs. PhD dissertation.
[4] Lin, S., Su, L., Su, H., Zhou, G., Jin, D., and Zeng, L., 2009. Design networks-on-chip with latency/bandwidth guarantees. IET Computers & Digital Techniques 3, 184–194. doi: 10.1049/iet-cdt:20080036.
[5] Bolotin, E., Cidon, I., Ginosar, R., and Kolodny, A., 2004. QNoC: QoS architecture
and design process for network on chip. Journal of Systems Architecture 50, 2–3,
105–128. doi: 10.1016/j.sysarc.2003.07.004.
[6] Liu, J., Harkin, J., Li, Y., and Maguire, L., 2014. Online traffic-aware fault detection
for networks-on-chip. Journal of Parallel and Distributed Computing 74, 1,
1984–1993. doi: 10.1016/j.jpdc.2013.09.001.
[7] Sahu, Pradip, Manna, Kanchan, Shah, Nisarg, and Chattopadhyay, Santanu, 2014.
Extending Kernighan-Lin partitioning heuristic for application mapping onto network-
on-chip. Journal of Systems Architecture 60. doi: 10.1016/j.sysarc.2014.04.004.
[8] Sajjadi-Kia, H., and Ababei, C., 2013. A new reliability evaluation methodology
with application to lifetime oriented circuit design. IEEE Transactions on Device
and Materials Reliability 13, 192–202. doi: 10.1109/TDMR.2012.2228862.
[9] Catania, V., Mineo, A., Monteleone, S., Palesi, M., and Patti, D., 2015. Noxim: An
open, extensible and cycle-accurate network on chip simulator, in: 2015 IEEE 26th
International Conference on Application-specific Systems, Architectures and
Processors (ASAP), pp. 162–163. doi: 10.1109/ASAP.2015.7245728.
[10] Rosing, T. S., Mihic, K., and De Micheli, G., 2007. Power and reliability management of SoCs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 15, 391–403. doi: 10.1109/TVLSI.2007.895245.
[11] Chang, Y., Chiu, C., Lin, S., and Liu, C., 2011. On the design and analysis of fault-tolerant NoC architecture using spare routers, in: 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011), pp. 431–436. doi: 10.1109/ASPDAC.2011.5722228.
[12] Chatterjee, N., Chattopadhyay, S., and Manna, K., 2014a. A spare router based
reliable network-on-chip design, in: 2014 IEEE International Symposium on
Circuits and Systems (ISCAS), pp. 1957–1960. doi: 10.1109/ISCAS.2014.6865545.
[13] Khalil, K., Eldash, O., and Bayoumi, M., 2017. Self-healing router architecture for reliable network-on-chips, in: 2017 24th IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp. 330–333. doi: 10.1109/ICECS.2017.8292030.
[14] Xiang, Y., Chantem, T., Dick, R. P., Hu, X. S., and Shang, L., 2010. System-level reliability modeling for MPSoCs, in: 2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp. 297–306.
[15] Lu, Z., Huang, W., Stan, M. R., Skadron, K., and Lach, J., 2007. Interconnect
lifetime prediction for reliability-aware systems. IEEE Transactions on Very Large
Scale Integration (VLSI) Systems 15, 159–172. doi: 10.1109/TVLSI.2007.893578.
[16] Yamamoto, A. Y., and Ababei, C., 2014. Unified reliability estimation and management of NoC-based chip multiprocessors. Microprocessors and Microsystems 38, 53–63. URL: http://www.sciencedirect.com/science/article/pii/S0141933113001956, doi: 10.1016/j.micpro.2013.11.009.
6 Online Test Derived from Binary Neural Network for Critical Autonomous Automotive Hardware

Dr. Philemon Daniel
Assistant Professor, NIT Hamirpur, Hamirpur, Himachal Pradesh, India
CONTENTS
6.1 Autonomous Vehicles
  6.1.1 Levels of Autonomy
  6.1.2 Safety Concerns
6.2 Traditional VLSI Testing
6.3 Functional Safety
  6.3.1 Fault Detection Time Interval
6.4 Discussion 1: Binary Convolutional Neural Network
  6.4.1 One Layer of the Convolutional Network
  6.4.2 Forward Propagation
  6.4.3 Binary Neural Autoencoder Model with Convolutional 1D
  6.4.4 Binary Neural Network Model with Convolutional 2D
  6.4.5 Backward Propagation
6.5 Discussion 2: On-Chip Compaction
  6.5.1 Binary Recurrent Neural Networks
  6.5.2 Forward Propagation
  6.5.3 Backpropagation
  6.5.4 Advantages and Limitations
6.6 Discussion 3: Binary Deep Neural Network for Controller Variance Detection
6.7 Conclusion
Acknowledgment
References
There is time before we fully realize the potential of autonomous vehicles (Hancock 2019), but already driver assistance systems and partial autonomy have improved our experience in a big way. These systems use sophisticated hardware along with a combination of sensors, radar, lidar, cameras, and software to take decisions or make predictions based on the present and past data stream.
The following are some of the features already in place, as shown in Figure 6.1.
These features can be categorized into four major sub-sections (Manoharan and
Daniel 2018).
e. Lane change assist: Multiple checks are made before a lane change can be
initiated. Sensors can assist greatly in making a decision in this critical
movement.
2. Warning: This set warns the driver if there is a deviation from the expected
threshold.
a. Lane departure: Unintentional departure from the current lane is a good time
to receive a warning. Single or multiple cameras are employed for this
purpose.
b. Road departure: On freeways, when the vehicle drifts off the road a
warning is generated by the edge detection algorithms.
c. Collision: Radar or lidar is employed for detecting a possible collision; the system issues a warning or possibly slows down the vehicle.
d. Pedestrian detection: When a person or an animal suddenly crosses the road, radar can pick them up almost instantly and alert the driver.
e. Drowsy driver: It is possible to doze off unknowingly, either fully or partially, and a camera facing the driver can recognize this condition and warn the driver.
3. Temporary control: Although we are yet to reach full autonomy, there are
times when vehicles can complete single tasks fully autonomously.
4. Performance enhancement: There are other assists that are more for comfort
than for safety.
a. Panic brake assist: At times the driver may brake unnecessarily and release the brake shortly after. A hardware system can watch for this and smooth the brake pulse.
b. Side impact detection: Although lane-assist systems should take care of
drifting off, there could be occasions when another vehicle is too close to
the side. Then, a gentle alert to stay clear could help.
today. Level 3 (conditional automation): This is a big step, as the driver is more passive in his role, and the car is expected to assume control in situations when the vehicle cannot decide or manage after a short warning. There are very few cars at this level. Level 4 (high automation): The car is able to manage most of the functions if the conditions are good, and the driver is not expected to stay attentive. This is a much higher level of automation and requires a mountain of safety features, both at the hardware level and at the software level. This level and the next lay the premise for, and are the major focus of, the discussions further in this chapter. Level 5 (full automation): This level would be complete autonomy in all conditions, which includes cars without a driver or even a steering wheel. No cars to date have achieved this level, and it remains a distant goal, largely because of the lack of the sophisticated safety mechanisms required.
Prevention of error and risk management are important parts of any good VLSI implementation. Some errors creep in during the design and manufacturing process and can go undetected during the testing process. These ICs then reach the autonomous cars, and the errors can manifest anytime during the run time of the cars, which could be catastrophic.
Autonomous systems are becoming very complex, and the integration of varied systems compounds the complexity of error detection and management. Unfortunately, safety and its management have become a specialized expertise of a few, learned on the job, and are not taught in schools.
According to Thorn, Kimmel, and Chaka (2018), the safety of autonomous vehicles should be addressed in four segments. The first one is the sensing part, which includes sensors like LiDAR, radar, camera, GPS, vehicle-to-vehicle, and vehicle-to-everything. There have to be hardware mechanisms in place for when these sensors fail. It is usual to keep redundant sensors and either take the majority opinion or switch over when a fault is detected.
The second one is perception, where the data streamed from each of the sensors is assembled together to make sense of the environment and the position of the vehicle with respect to the surroundings. This is where the detection and understanding of both static and moving objects and the interpretation of each of them is accomplished. This is a key step where sophisticated hardware is used for the
purpose of correct perception. Any error here could be disastrous. This is a key place where the controllers and hardware are expected to be perfect; there is not much room for deviation, and even an additional delay could be problematic. In real-time systems, correct decisions arriving late are wrong answers.
The next phase is the planning phase, where the vehicle's immediate future is planned and a route is prepared for the vehicle to maneuver. Perception and planning are the key phases where VLSI hardware faults are hard to detect and manage, and they shall be the focus of further discussion. These two phases are the brain of the vehicle at Level 4 and Level 5 autonomy.
The last phase is the control phase, where the execution of the plan takes place
and the car is steered in the desired path and speed. These are mostly mechanical
components.
There have been a number of cases where partially autonomous vehicles were involved in accidents, and multiple times these were fatal because of multiple error factors, as in Figure 6.4. In one incident, a Tesla on autopilot crashed fatally in May 2016 when it collided with a truck against a sunlit background (BBC News 2019). This time, the sensors failed to pick up the truck and hence failed to apply the brake. In the same year, a Google car crashed into a bus when shifting lanes (Associated Press 2016); this time, the perception and planning went wrong. More recently, in March 2018, according to Channel News Asia (2018), a self-driving Uber car killed a woman in Arizona while she was crossing a street.
systems have to be of very high quality and of long-term reliability. The test has to be performed continually during the entire lifetime of the vehicle. Traditional techniques are usually employed only during key-off, key-on, and periodic online tests, so ensuring functional safety using traditional test techniques alone is impossible.
Real-time faults in a system can be classified on the basis of their existence and effect as follows:
a. Permanent faults: These faults exist indefinitely or for the entire lifetime of the system unless corrective action is taken. Mostly these faults are caused by manufacturing or design errors and shorts and opens in VLSI circuits; a few are caused by catastrophic environmental disturbances or physical damage to the chip. These defects may remain for the entire lifetime and can cause long-term malfunctioning of components. Although these faults are the easiest to detect compared with other faults, they could go undetected because of a gap in fault coverage.
b. Intermittent faults: These faults may appear regularly for a small time period, then disappear and reappear after a relatively longer time period, and this entire process repeats again and again. The cause of these faults is difficult to predict because there are no fixed parameters that cause them, but the effects are highly correlated. The system works well most of the time but fails under unusual environmental conditions. These faults are caused mainly by marginal and unstable hardware and are much harder to detect (Guilhemsang et al. 2011).
c. Transient faults: These faults appear for a much shorter time period, then disappear quickly, and are uncorrelated with each other. They are mainly caused by random environmental disturbances, are the hardest to detect because of their short duration, and need high-quality live online test hardware to detect and manage them.
The goal of online testing is to detect faults in the system during normal operation without disturbing that operation, and to take suitable corrective actions to mitigate their effect. For example, in some critical applications, after detecting a malfunction the system has to repair itself on the fly, configure different hardware to take over, or shut down its operation. Online testing can be categorized into two types: concurrent and non-concurrent testing.
The automotive safety integrity level (ASIL) is a key segment of the ISO 26262 functional safety standard. It is established during the development stage and is determined through hazard analysis and risk assessment (HARA) of the potential hazards, based on the feasibility and influence of the damage. The HARA process is performed to identify the malfunctions that could contribute to system risks and to evaluate the compromise associated with them. Khastgir et al. (2017) discussed the automotive HARA process and presented a different approach to overcome the safety issues. They formed a ruleset for conducting automotive HARA to determine the ASIL by parameterizing the individual automotive segment based on severity, exposure, and controllability of the risk.
There are four standard ASILs identified by the automotive safety standard ISO 26262: ASIL A, ASIL B, ASIL C, and ASIL D. ASIL D represents the highest level of automotive risk and enforces strict steps to reduce it, whereas ASIL A represents the lowest level of automotive risk, which can be warded off to some degree. ASILs are assigned by examining the safety objectives of the system. For each electronic segment of the automotive, the safety level is set on the basis of three parameters: severity, exposure, and controllability of the hazard, and each of these parameters is further broken down into sub-classes. Severity measures the magnitude of damage to the driver, commuters, and other people on the road. Exposure measures the possibility of the occurrence of the hazardous conditions. Controllability measures the degree to which the vehicle can be dealt with by the driver when the hazardous condition takes place owing to the malfunctioning of a system component.
In a vehicle, key systems like electric steering, airbag deployment, and antilock braking are considered ASIL D, because the risks related to malfunctioning of these systems are greatest. On the other end of the
safety spectrum, systems that require smaller safety objectives are considered ASIL A, because the risk of an accident caused by failures of such systems is negligible and can be tolerated to some degree. Therefore, deciding the ASIL category is a very demanding process for automotive functions that require significant functional safety and reliability.
With the infusion of safety techniques to identify or remedy functional deficiencies in an automotive system, the undesirable risks can be reduced to a considerable degree. A safety mechanism is a technological solution, realized by electronic functions, to identify faults or control deficiencies, work out a safe state and, if possible, repair the deficiencies.
Fault Handling Time Interval (FHTI): This is the time span from the occurrence of the defect to the successful execution of the safety mechanism in the automotive system to mitigate the risk and come to a safe state. In this time period, both detection and reaction processes are executed. A safety mechanism with a low FHTI is needed. It is further broken down into two parts:
I. Fault Detection Time Interval (FDTI): the time period from the occurrence of a fault in the system to its detection; it should be as small as possible.
II. Fault Reaction Time Interval (FRTI): the time period between the detection of a fault in the system and the achievement of a safe state with the help of the safety mechanism.
feedforward neural network with binary filters and activations that reconstructs the output the same as the input.
A deep neural network is stacked using multiple layers of binary convolutional networks that compress the input bits into lower-dimension bits and then use an activation layer to skew the output decision to either a pass or a fail. Binary convolutional networks are trained to learn binary filters that are capable of extracting features present in the input bits and then combining them into multiple useful patterns that are rearranged to regenerate an output identical to the input.
Two different designs of CNNs are shown, one using Convolution 1D and the other Convolution 2D. The first set of models uses 1D binary filters, while the latter employs 2D filters. Each includes an encoder section that compresses the input bits into lower dimensions and a decision-making activation section. The neural network is trained for minimal loss. To make the network synthesizable, instead of high-precision values, binarized activations, inputs, outputs, and filters are used that take on only two possible values, +1 or –1.
The second set of models comprises the 2D CNNs, where the filters are 2D. They are also trained for binary filters and activations. By binarizing the weights and activations, complex matrix multiplications are replaced by a sequence of addition and subtraction operations. Further, the values are quantized to keep the bits from exploding. Binary filters give an efficient way of implementing convolutional operations. If all the parameters of a convolutional network are binary, then the convolutional operation can be approximated by a simple combinational circuit. Only binarization is used here, but if it is combined with additional optimization techniques like Gordon et al. (2018), the resultant hardware will be small while maintaining similar levels of accuracy.
y^{[1]} = f^{[1]} * x^{[1]} + b^{[1]}
a^{[1]} = A(y^{[1]})

where y^{[1]} is the output of the first layer after convolving with the filter and adding the bias term, f^{[1]} is the filter matrix, x^{[1]} is the input, b^{[1]} is the bias matrix, A is the activation function, * indicates the convolution operation, and a^{[1]} is the final output after applying the activation function.
convolutional network, we convert the weights into binary values only during the forward pass. The weights are clipped during the backward pass. This is required for stochastic gradient descent to be able to calculate the direction of minimum loss and update the weights. For binarizing the floating-point weights and activations in binary convolutional networks, we used a deterministic binarization function, which is easy to implement in hardware, as described in the following equation and as shown in Figure 6.7.
a_b = \mathrm{sign}(a) = \begin{cases} +1 & \text{if } a \ge 0 \\ -1 & \text{otherwise} \end{cases}
If the value is greater than or equal to 0, the sign function represents it with +1, otherwise with –1. Binary tanh is used as the activation function, as shown in the equation below and in Figure 6.7.

y = 2\,\sigma(a) - 1, \qquad \sigma(a) = \mathrm{clip}(0.5a + 0.5,\, 0,\, 1) = \max(0,\, \min(0.5a + 0.5,\, 1))
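These two functions are simple to state in code. The NumPy sketch below is only an illustration of the equations above; it is not taken from the chapter's implementation.

```python
import numpy as np

def binarize(a):
    """Deterministic binarization: +1 if a >= 0, else -1."""
    return np.where(a >= 0.0, 1.0, -1.0)

def hard_sigmoid(a):
    """sigma(a) = clip(0.5a + 0.5, 0, 1)."""
    return np.clip(0.5 * a + 0.5, 0.0, 1.0)

def binary_tanh(a):
    """y = 2*sigma(a) - 1."""
    return 2.0 * hard_sigmoid(a) - 1.0

x = np.array([-2.0, -0.3, 0.0, 0.4, 3.0])
print(binarize(x))     # [-1. -1.  1.  1.  1.]
print(binary_tanh(x))  # [-1.  -0.3  0.   0.4  1. ]
```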
1. Output of the first layer: we used four filters, so we get four feature maps.

a^{[1]} = A(y^{[1]})

2. Output of the encoder: we used one filter and get an encoded output of 1 bit.

a^{[2]} = A(y^{[2]})

a^{[3]} = A(y^{[3]})
L = \frac{1}{n} \sum_{i=1}^{n} \left(\max(0,\, 1 - y_i\,\bar{y}_i)\right)^2
where y is the actual output and ȳ is equal to w·x, the predicted output. This indicates that when y and ȳ have the same sign the loss is zero; otherwise we get some loss. Squared hinge loss is a modified version of hinge loss, as in Figure 6.9. It solves the problem of the discontinuity that occurs in hinge loss at y·ȳ = 1.
The gradient of the squared hinge loss is calculated with the help of the chain rule:

\frac{\partial L}{\partial w} = \frac{\partial L(\bar{y})}{\partial w} = \frac{\partial L(w \cdot x)}{\partial w} = -2xy\,(1 - y\bar{y})
W_{new} = W_{old} - \eta\, \frac{\partial L}{\partial W}
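Putting the forward binarization, the squared hinge gradient, and the clipped update together gives a training step like the hedged sketch below; the shapes, data, and learning rate are invented for illustration.

```python
import numpy as np

def train_step(w_real, x, y, lr=0.01):
    w_bin = np.where(w_real >= 0.0, 1.0, -1.0)  # forward pass uses binary weights
    y_hat = float(np.dot(w_bin, x))             # predicted output
    margin = 1.0 - y * y_hat
    if margin > 0.0:                            # squared hinge gradient
        grad = -2.0 * y * margin * x            # dL/dw = -2xy(1 - y*y_hat)
        w_real = w_real - lr * grad             # SGD update on real-valued weights
    return np.clip(w_real, -1.0, 1.0)           # clip during the backward pass

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=4)
w = train_step(w, x=np.array([1.0, -1.0, 1.0, 1.0]), y=1.0)
print(w)
```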
\begin{bmatrix} X_0(t+1) \\ X_1(t+1) \\ X_2(t+1) \end{bmatrix} =
\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_0(t) \\ X_1(t) \\ X_2(t) \end{bmatrix} \oplus
\begin{bmatrix} d_0(t) \\ d_1(t) \\ d_2(t) \end{bmatrix}
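A toy simulation of this state equation over GF(2) is shown below, in the spirit of a multiple-input signature register (MISR); the feedback matrix follows the reconstructed equation, and the input stream d(t) is invented.

```python
import numpy as np

A = np.array([[0, 0, 1],
              [1, 0, 1],
              [0, 1, 0]], dtype=np.uint8)

def misr_step(state, d):
    """X(t+1) = A @ X(t) XOR d(t), all arithmetic modulo 2."""
    return (A @ state + d) % 2

state = np.zeros(3, dtype=np.uint8)
for d in ([1, 0, 1], [0, 1, 1], [1, 1, 0]):  # compacted output stream (synthetic)
    state = misr_step(state, np.array(d, dtype=np.uint8))
print(state)  # final signature
```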
Current state:
In an RNN, the activation function is tanh, the weight at the recurrent neuron is W_hh, and at the input neuron is W_xh. So, the current state can be given as

h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)

and the output at time t as

y_t = W_{hy} h_t
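A minimal sketch of this recurrent forward step is shown below; the weight shapes and inputs are invented for illustration.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # current hidden state
    y_t = W_hy @ h_t                           # output at time t
    return h_t, y_t

rng = np.random.default_rng(1)
W_xh = rng.normal(size=(3, 2))
W_hh = rng.normal(size=(3, 3))
W_hy = rng.normal(size=(1, 3))
h = np.zeros(3)
for x in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy)
print(y)
```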
6.5.3 BACKPROPAGATION
In an RNN, we go back in time to change the weights, so this algorithm is known as backpropagation through time (BPTT). We adopt the Logcosh loss function to measure the inconsistency between the actual output and the predicted output. The Logcosh loss, represented by L, is defined as in the equation below:

L(y, \hat{y}) = \sum_{i=1}^{n} \log(\cosh(\hat{y}_i - y_i))
here, \delta_2 = dL(y, \hat{y})/d\hat{y}_2, with e_2 = \hat{y}_2 - y_2. The weight updates at the first time step are

\Delta W_{xh,1} = \mu\, \delta_1\, x(0), \qquad \Delta W_{hy,1} = \mu\, \delta_1\, y(0)

here, \delta_1 = dL(y, \hat{y})/d\hat{y}_1 = (e_1 + \delta_2 W_{hy}), with e_1 = \hat{y}_1 - y_1.
For all time steps, the gradients can be combined, as the weights are the same for each time step.
A 16-bit ALU with 36-bit input and 16-bit output performing 16 operations is chosen for this experiment. Using ATPG, 117 patterns were generated to get 100% stuck-at fault coverage for 5786 faults. These test patterns were compressed and decompressed using the convolutional autoencoder network. Table 6.1 shows the performance of the CNN architectures for decompression, including training statistics. The first column gives the number of layers in the network, followed by the input shape during training. The third column shows the network structure, where the line separates the encoder from the decoder; the structure below the line is the decoder, which does the decompression. The next two columns display the training accuracy with float weights and binarized weights. Two further columns present the efficiency of the B-CNN decompressor with float weights and binarized weights in terms of fault coverage when trained with 2^N patterns, where N is the input bit length of the decompressor. The next column gives the actual compression ratio obtained, and the last column confirms the small area overhead in terms of gates.
The hardware overhead can be calculated for each decompressor layer by this expression:

h[i] = \sum_{m=0}^{k-1} w[m]\, x[m + i]
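This expression is a 1D convolution of a binary filter over the input bits, as the short sketch below makes explicit; the filter and input vectors are invented.

```python
def conv1d_binary(w, x):
    """h[i] = sum over m of w[m] * x[m + i], for a length-k filter w."""
    k = len(w)
    return [sum(w[m] * x[m + i] for m in range(k))
            for i in range(len(x) - k + 1)]

w = [1, -1, 1]             # binary filter (+1/-1)
x = [1, 1, -1, -1, 1, -1]  # binarized input bits
print(conv1d_binary(w, x))
```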
Table 6.2 shows the results of compaction using the B-RNN. Each RNN can generate a stable and unique signature. The table also shows the number of unique values each of them can generate. Several improvements to the RNN structure are possible, so they can be improved greatly.
TABLE 6.1
Performance of CNN Architectures for Decompression

Layers | Input shape | Network structure | Training accuracy (float weights) | Training accuracy (binary weights) | Fault coverage (float weights) | Fault coverage (binary weights) | Compression ratio N/36 | Gate count
— | 1 × 36 / 9 × 4 / 4 × 9 | 8 × 128, 7 × 64, 6 × 32, 5 × 4 | 66.67 | 42.09 | 99.05 | 99.42 | 5/6 | 108
6 | 12 × 3 | 11 × 128, 10 × 64, 9 × 2, 8 × 64, 7 × 128, 6 × 6 | 67.81 | 44.59 | 98.32 | 97.63 | 1/2 | 4,332
— | 12 × 3 | 11 × 128, 10 × 64, 9 × 1, 8 × 64, 7 × 128, 6 × 6 | 58.98 | 42.92 | 86.95 | 88.54 | 1/4 | 4,332
TABLE 6.2
RNN as MISR

No. of Input Bits | Structure | Trained With | No. of Unique Values
• There is no deterministic way to know which network might perform well and which would not for achieving a certain fault coverage, except to try various structures.
• Since there is no seed value, there is no pseudo-randomness to the test patterns; they are always deterministic.
TABLE 6.3
Area for Controller Variance

Area Overhead | Binary Conv1D Layer | Binary Dense Layer
states in hardware. A special kind of neural network called the binary neural network is used for the purpose, so there is no additional requirement of quantization.
For demonstration purposes, an SDRAM controller is utilized. This 256 Mb (16 Mb × 16 data bit bus) SDRAM is a high-speed DRAM designed in CMOS to work at a 3.3 V supply voltage with synchronous data transfer. The SDRAM controller employs a pipelined design and therefore has a high data transfer rate. All the input and output signals are fully synchronized and registered. The busy signal guides the interaction with the SDRAM. It is a quad-bank SDRAM, and each bank has 8192 rows by 512 columns by 16 bits.
A dataset is prepared by exciting all possible controller states in the SDRAM and thereby capturing all the valid states. The valid states are then subtracted from the total possible states to obtain the invalid states. The controller has the following control signals: read enable (rd_enable), write enable (wr_enable), reset (rst_n), clock enable (clk_enable), chip select (cs_n), column address strobe (cas_n), row address strobe (ras_n), busy, we_n, data mask low (data_mask_low), and data mask high (data_mask_high). A set of 2048 states is captured for these 11 signals, out of which 547 states are valid. After training, a binary neural network coverage of 98.2% is achieved. The hardware overhead required for the online controller test is approximately equivalent to 120 gates (Table 6.3). A sketch of the dataset construction is shown below.
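The sketch below illustrates one way the dataset just described could be assembled: enumerate all 2^11 combinations of the 11 control signals and label the captured valid states as 1 and the rest as 0. The valid-state set here is a placeholder; in practice it would come from exciting the SDRAM controller in simulation.

```python
import numpy as np

N_SIGNALS = 11  # rd_enable, wr_enable, rst_n, clk_enable, cs_n, cas_n,
                # ras_n, busy, we_n, data_mask_low, data_mask_high

# All 2048 possible states, one bit per control signal.
all_states = np.array([[(s >> b) & 1 for b in range(N_SIGNALS)]
                       for s in range(2 ** N_SIGNALS)], dtype=np.uint8)

valid_states = {0b00000000001, 0b00000000111, 0b10000000001}  # placeholder set
labels = np.array([1 if s in valid_states else 0
                   for s in range(2 ** N_SIGNALS)], dtype=np.uint8)

print(all_states.shape, int(labels.sum()))  # (2048, 11) and the valid-state count
```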
6.7 CONCLUSION
A new and efficient DFT technique for on-chip test decompression, compaction, and controller variance checking is presented. This method, although new, does not require any modifications to the existing test architectures or the design, and it is non-intrusive. Hardware overhead and FDTI are key factors in achieving safety in Level 4 and Level 5 autonomous vehicles. Binary convolutional autoencoders, neural networks, and recurrent neural networks are promising for the future of several on-chip test strategies. From these experiments, I am convinced that these structures would outperform the existing on-chip test techniques. Also, the results of using a binary RNN/LSTM as an output data compactor for signature generation are very good. The controller variance detection hardware is small enough for watching control lines in real time and can perform quick detection of faults. The steps can be automated, as they follow a standard process.
ACKNOWLEDGMENT
We gratefully acknowledge the support of NVIDIA Corporation with the NVIDIA
GPU Grant of Titan X Pascal GPU used for this research.
REFERENCES
Associated Press. 2016. “Google Self-Driving Car Caught on Video Colliding with Bus.” The
Guardian. https://www.theguardian.com/technology/2016/mar/09/google-self-driving-car-
crash-video-accident-bus.
BBC News. 2019. “Tesla Model 3: Autopilot Engaged during Fatal Crash.” BBC News. https://www.bbc.com/news/technology-48308852.
Channel News Asia. 2018. “Self-Driving Uber Car Kills Arizona Woman Crossing Street.”
Channel News Asia, 20 March 2018. https://www.reuters.com/article/us-autos-
selfdriving-uber-idUSKBN1GV296.
Daniel, Philemon, Shaily Singh, Garima Gill, Anshu Gangwar, Bargaje Ganesh, and Kaushik
Chakrabarti. 2019. “Demonstration of On-Chip Test Decompression for EDT Using
Binary Encoded Neural Autoencoders.” In 2019 IEEE International Test Conference
India, ITC India 2019. doi: 10.1109/ITCIndia46717.2019.8979710.
Demmel, Sébastien, Dominique Gruyer, Jean Marie Burkhardt, Sébastien Glaser, Grégoire
Larue, Olivier Orfila, and Andry Rakotonirainy. 2019. “Global Risk Assessment in an
Autonomous Driving Context: Impact on Both the Car and the Driver.” IFAC-
PapersOnLine. doi: 10.1016/j.ifacol.2019.01.009.
Denomme, Daniel, Sam Hooson, and James Winkelman. 2019. “A Fault Tolerant Time
Interval Process for Functional Safety Development.” In SAE Technical Papers. Vol.
2019-April. SAE International. doi: 10.4271/2019-01-0110.
Gordon, Ariel, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien Ju Yang, and Edward Choi.
2018. “MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep
Networks.” In IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, no. 1: 1586–1595. doi: 10.1109/CVPR.2018.00171.
Guilhemsang, Julien, Olivier Heron, Nicolas Ventroux, Olivier Goncalves, and Alain
Giulieri. 2011. “Impact of the Application Activity on Intermittent Faults in Embedded
Systems.” In IEEE VLSI Test Symposium. doi: 10.1109/VTS.2011.5783782.
Hancock, P. A. 2019. “Some Pitfalls in the Promises of Automated and Autonomous
Vehicles.” Ergonomics 62 (4). doi: 10.1080/00140139.2018.1498136.
Haq, Fitash U. L., Donghwan Shin, Shiva Nejati, and Lionel C. Briand. 2020. “Comparing
Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case
Study.” In 2020 IEEE 13th International Conference on Software Testing, Verification
and Validation, ICST 2020. doi: 10.1109/ICST46399.2020.00019.
Hulse, Lynn M., Hui Xie, and Edwin R. Galea. 2018. “Perceptions of Autonomous Vehicles: Relationships with Road Users, Risk, Gender and Age.” Safety Science 102. doi: 10.1016/j.ssci.2017.10.001.
Kapser, Sebastian, and Mahmoud Abdelrahman. 2020. “Acceptance of Autonomous Delivery Vehicles for Last-Mile Delivery in Germany – Extending UTAUT2 with Risk Perceptions.” Transportation Research Part C: Emerging Technologies 111. doi: 10.1016/j.trc.2019.12.016.
Khastgir, Siddartha, Stewart Birrell, Gunwant Dhadyalla, Håkan Sivencrona, and Paul Jennings. 2017. “Towards Increased Reliability by Objectification of Hazard Analysis and Risk Assessment (HARA) of Automated Automotive Systems.” Safety Science. doi: 10.1016/j.ssci.2017.03.024.
124 VLSI and Hardware Implementations
Manoharan, K., and P. Daniel. 2018. “Survey on Various Lane and Driver Detection
Techniques Based on Image Processing for Hilly Terrain.” IET Image Processing 12
(9). doi: 10.1049/iet-ipr.2017.0864.
Nardi, Alessandra. 2021. “Automotive Functional Safety Using LBIST and Other Detection Methods.” CADENCE. Accessed January 7. https://www.cadence.com.
Rastegari, Mohammad, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. “XNOR-Net: Imagenet Classification Using Binary Convolutional Neural Networks.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9908 LNCS: 525–542. doi: 10.1007/978-3-319-46493-0_32.
Snyder, Ryan. 2016. “Implications of Autonomous Vehicles: A Planner's Perspective.”
Institute of Transportation Engineers. ITE Journal 86 (25).
The International Organization for Standardization. 2011. “Road Vehicles — Functional
Safety.” ISO 26262.
Thorn, Eric, Shawn Kimmel, and Michelle Chaka. 2018. “A Framework for Automated Driving System Testable Cases and Scenarios.” DOT HS 812 623.
7 Applications of Machine Learning in VLSI Design

Sneh Saurabh, Pranav Jain, Madhvi Agarwal, and OVS Shashank Ram
IIIT Delhi, Delhi, India
CONTENTS
7.1 Introduction
7.2 Machine Learning Preliminaries
7.3 System-Level Design
7.4 Logic Synthesis and Physical Design
7.5 Verification
7.6 Test, Diagnosis, and Validation
7.7 Challenges
7.8 Conclusions
References
7.1 INTRODUCTION
With refinement and decreasing abstraction level, the number of components in a
design increases. At lower levels of abstraction, electronic design automation
(EDA) tools are routinely required to handle billions of entities. Therefore, these
tools need to operate on voluminous data and complex models to accomplish
various tasks that can be categorized as follows:
The quality of results (QoR) in accomplishing the above tasks can often be improved by statistical data analysis, learning from examples, and encapsulating the
designer's intelligence into an EDA tool. These considerations motivate applying
machine learning (ML) in VLSI design.
In recent times, there have been tremendous advancements in ML tools and
technology. These advancements were facilitated by novel mathematical formulations
and powerful computing resources that allow massive data processing. Consequently,
a plethora of freely available tools to implement ML techniques and algorithms is available to us. Therefore, ML techniques are now widely employed in VLSI design. These techniques have improved designers' productivity, tackled complex problems more efficiently, and provided alternative solutions.
VLSI design is a complex process. We decompose the design process into
multiple steps, starting with system-level design and culminating in chip fabrication. We can apply ML and related techniques in the design implementation, verification, testing, diagnosis, and validation stages. In this chapter, we review the
applications of ML in all these stages. We have summarized the applications of ML
in VLSI design covered in this chapter in Figure 7.1. However, note that this
chapter does not exhaustively list relevant work in these areas. Instead, this chapter
demonstrates the application of ML on a few examples taken from literature. It will
help readers in appreciating the link between ML and VLSI design. Subsequently,
readers can apply ML techniques to their specific problems by making appropriate
modifications to the illustrated examples.
We can identify three main steps in the design implementation: a) system-level
design, b) logic synthesis, and c) physical design. There are many opportunities to
apply ML in design implementation. It can help in exploring the design space more efficiently. It can capture the designer's knowledge and reuse it by making recommendations to the designers. It can also make smart transformations based on
past experiences and learning.
The design implementation is supported by the analysis and verification steps in
design flows. The verification process is integral to VLSI design since it ensures
that the design functionality matches the given specification. The verification process involves some analysis of the design data, and it can often be time-consuming. ML techniques can help in inferring complex dependencies and handling voluminous data during the analysis of design data. They can enable more accurate and faster analysis and can ease or supplement the verification process. However, we
should ensure that the errors obtained during ML-based analysis are acceptable for
verification purposes.
Post-design, we carry out fabrication in semiconductor foundries and test the fabricated dies to rule out failures due to defects. If there are failures, we diagnose the root cause of the failures and make a fix to avoid the problem. Additionally, we carry out pre-production validation of chip functionality based on real usage scenarios. In all these steps, we often encounter voluminous data. We can improve the conventional approaches to test and diagnosis by employing statistical ML and data mining techniques. We can also improve the fabrication and manufacturing of chips by ML-driven
pattern matching and correcting the masks efficiently.
The rest of this chapter is organized as follows. We describe the basic concepts
of ML that are relevant for this chapter in Section 7.2. We describe the application
of ML in system-level design in Section 7.3 and logic synthesis and physical design
in Section 7.4. We describe the application of ML in verification in Section 7.5 and
the test, diagnosis, and validation in Section 7.6. We highlight key challenges in
adopting ML-based solutions in VLSI design flow in Section 7.7. We conclude this
chapter by outlining future directions in Section 7.8.
In VLSI design, we often employ a combination of the above models and learning
strategies. The choice of the ML model strongly depends on the available training
data. Moreover, the model complexity and whether it fits in the existing design
flows are critical considerations.
report latency, bandwidth, power consumption, and other attributes for a given
memory configuration. In Sen and Imam [3], neural network, SVM, random forest
(RF), and gradient boosting (GB) are tried for the ML model. It is reported that the
SVM and RF methods yielded better accuracies compared with other models [3].
At the system level, we often use high-level synthesis (HLS) for design space exploration. The exploration goal is to finally settle on a near-optimal solution that
meets the given constraints. We can use ML techniques to determine the Pareto-
optimal designs efficiently. We can build an ML model by training with samples
obtained by synthesizing a fraction of the design space [4]. The ML model can
quickly predict attributes such as area and throughput for a design. We can also
refine the model by smartly selecting the next synthesized sample [4]. We can
employ a regression model based on Gaussian processes, as reported in Zuluaga
et al. [4]. Alternatively, we can use tree-based RF, as reported in Liu et al. [5].
During design space exploration, we often need to make binary decisions; for example, whether to inline a given function. It is easy to model binary decisions in a tree-based RF using two branches [5]. Note that an RF consists of multiple regression trees. It produces the final result by a collective vote of these trees. Therefore, we can minimize both the generalization error and the prediction variance using RF [5].
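As a hedged sketch of this idea, the snippet below trains a random forest on a small synthesized fraction of an HLS design space and predicts area for unsynthesized configurations; the knob encoding and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Each row: [unroll_factor, pipeline(0/1), inline(0/1)] for a synthesized point.
X_train = np.array([[1, 0, 0], [2, 1, 0], [4, 1, 1], [8, 0, 1], [4, 0, 0]])
y_area = np.array([120.0, 180.0, 310.0, 520.0, 260.0])  # e.g. LUT counts

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_area)

# Rank the remaining (unsynthesized) design points by predicted area.
candidates = np.array([[2, 0, 1], [8, 1, 1], [1, 1, 0]])
print(model.predict(candidates))
```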
The design space exploration during HLS can be made more efficient by using
ML techniques. For example, conventional search strategies such as simulated
annealing can generate a training set for implementing a decision tree [6].
Subsequently, the trained decision tree prunes the search space. Thus, design space
exploration becomes more efficient. We can improve local search algorithms such
as simulated annealing by choosing the starting point smartly [7]. For example, first,
we perform a conventional local search. Using these searches, we train an evaluation function to predict the outcome of a given starting point. Subsequently, the
evaluation function guides design space exploration and filters out non-promising
start points. Thus, the local search becomes more efficient [7].
Another challenge in a heterogeneous system is allocating resources and managing power during runtime. Traditionally, we depend on a priori information about the workload and the thermal model of chips while designing. However, a priori information is sometimes unavailable. Moreover, a priori information cannot adequately model the temporal and spatial uncertainties. These variations can be due to variations in the workloads, devices, and environment. Therefore, we need to make appropriate decisions during runtime. We can employ ML techniques, such as reinforcement learning, that adapt to the varying workload and environment [8]. They can handle scenarios for which we have not trained the system. It has been shown that we can employ Q-learning to find an optimal policy for dynamic resource allocation and improve system performance [9].
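A minimal tabular Q-learning sketch in this spirit is shown below. The states could encode workload or temperature bins and the actions core or voltage assignments; the environment here is a stub, so everything except the update rule is invented.

```python
import numpy as np

n_states, n_actions = 8, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration
rng = np.random.default_rng(0)

def env_step(state, action):
    """Stub environment: a real system would measure performance/power
    after applying the allocation decision."""
    return int(rng.integers(n_states)), -float(abs(action - state % n_actions))

state = 0
for _ in range(1000):
    if rng.random() < eps:
        action = int(rng.integers(n_actions))  # explore
    else:
        action = int(np.argmax(Q[state]))      # exploit
    next_state, reward = env_step(state, action)
    # Q-learning update toward the bootstrapped target.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(np.argmax(Q, axis=1))  # learned allocation policy per state
```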
these steps is still computationally difficult. Typically, EDA tools employ several heuristics to obtain a solution. The heuristics are guided by tool options and user-specified settings, on which the QoR strongly depends. Note that these steps are sequential; therefore, the solution produced by one step impacts all subsequent tasks. A designer often adjusts the tool settings and inputs based on experience and intuition to achieve the desired QoR. We can reduce the design effort in these tasks and improve the QoR by employing ML tools and techniques, as explained in the following paragraphs.
One of the earliest attempts at reducing design effort using ML was the Learning Apprentice for VLSI design (LEAP) [10]. LEAP acquires knowledge and learns rules by observing a designer and analyzing the problem-solving steps during their activities. Subsequently, it provides advice to the designer on design refinement and optimization. A designer can accept LEAP's advice or ignore it and manually carry out transformations. When a designer ignores the advice, LEAP treats it as a training sample and updates its rules.
Recently, there has been renewed interest in ML-based design adviser tools. In Beerel and Massoud [11], the authors report developing DesignAdvisor. It monitors and records example design problems and the corresponding actions taken by a designer when using standard EDA tools. For example, the design problem can be multi-level Boolean network optimization using logic restructuring and Boolean simplification. The corresponding designer action can be setting tool options or directives for the necessary logic transformation and optimization. Subsequently, DesignAdvisor learns to produce the best solution for a given problem. It can make recommendations to designers and help tune EDA tools and their optimization algorithms.
An ML-based tool for logic synthesis and physical design, such as DesignAdvisor, needs to implement the following tasks [11]:
1. Developing a training set: A training set consists of data points with a design problem and its corresponding solution. For example, a data point can be an initial netlist, constraints, cost function, optimization settings, and the final netlist. We need to generate these data points for training or can acquire them from designers.
2. Reduced representation of the training set: The training data points typically contain many features. However, for efficient learning, we can reduce the dimensionality of the training set. For example, we can perform PCA and retain the most relevant input features.
3. Learning to produce the optimum output: The training data points that we collect from the existing EDA tools are typically not the mathematical optimum. These tools give the best possible solution that could be acceptable to the designers. Therefore, the training data does not represent the ground truth of the problem. Moreover, the data can be sparse and biased because some specific tools generate those results. We can employ statistical models such as Bayesian neural networks (BNNs) to tackle this problem [11]. BNNs have weights and biases specified as distributions instead of scalar values. Therefore, they can tackle disturbances due to a noisy or incomplete training set.
it contains any DRC violation. Post-detailed-routing samples have been used for training the classifier. The classifier detects DRC violations based on features such as local pin density, local overflow, pin proximity, and connectivity parameters [15]. An SVM was found to be the most suitable model for this classifier in Chan et al. [15]. Furthermore, using this predictor, we can avoid DRC problems by spreading cells appropriately after placement [15].
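A hedged sketch of such a routability classifier is shown below: each placement region is described by features like those above and labeled by whether post-detailed-routing DRC violations occurred. The data here is synthetic, so it only illustrates the workflow, not the results of [15].

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Features per region: [pin_density, overflow, pin_proximity, connectivity]
X = rng.uniform(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)  # synthetic violation labels

clf = SVC(kernel="rbf").fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```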
Another area in which ML can be effective is design for manufacturability (DFM) [16]. We can carry out VLSI mask optimization using ML techniques to reduce cost and computational effort. The capability of ML to efficiently handle big data is useful in these applications. It has been reported that efficient mask layout optimization can be done in the optical proximity correction (OPC) framework [17]. We can train a deterministic classification ML model to identify the best OPC engine for a given design [17].
7.5 VERIFICATION
We can employ ML techniques to improve and augment traditional verification
methodologies in the following ways:
parameters on which SI effects depend. Some of the parameters that impact SI effects are: the nominal (without considering SI) delay and slew, clock period, resistance, coupling capacitance, toggle rate, logical effort of the driver, and temporal alignment of victim and aggressor signals. Using these parameters, we can train an ML model such as an ANN or SVM to predict SI-induced changes in delay and slew [20]. Since ML models can capture dependency in a high-dimensional space, we can utilize them for easy verification. However, we should ensure that the errors produced by ML models are tolerable for our verification purpose.
Another approach to estimating SI effects is by using anomaly detection (AD) techniques [21]. AD techniques are popularly employed to detect anomalies in financial transactions. However, we can train an ML model, such as a contractive autoencoder (CA), with the features of SI-free time-domain waveforms. Subsequently, we use the trained model to identify anomalies due to SI effects. We can use both unsupervised and semi-supervised AD techniques for this [21].
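The sketch below illustrates the AD workflow with a plain autoencoder standing in for the contractive variant of [21]: train only on SI-free waveform features, then flag inputs whose reconstruction error is large. The features and threshold choice are synthetic assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.1, size=(500, 8))  # SI-free waveform features

ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
ae.fit(clean, clean)  # train the network to reconstruct its input

# Set the anomaly threshold from the training reconstruction errors.
train_err = ((ae.predict(clean) - clean) ** 2).mean(axis=1)
threshold = np.quantile(train_err, 0.99)

suspect = rng.normal(0.5, 0.3, size=(5, 8))  # waveforms with SI-like shifts
errors = ((ae.predict(suspect) - suspect) ** 2).mean(axis=1)
print(errors > threshold)  # True marks a suspected SI anomaly
```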
We can employ ML techniques to efficiently fix IR drop problems in an integrated circuit [22]. Traditionally, we carry out dynamic IR drop analysis at the end of design flows. Any IR drop problem is corrected by an Engineering Change Order (ECO) based on the designer's experience. Typically, we cannot identify and fix all the IR drop problems together. Consequently, we need to carry out dynamic IR drop analysis and ECOs iteratively until we have corrected all the IR drop issues. However, IR drop analysis takes significant runtime and designer effort. We can reduce the iterations in IR drop signoff by employing ML to predict all the potential IR drop issues and fix them together [22]. First, ML-based clustering techniques identify high IR drop regions. Subsequently, small regional ML-based models are built on local features. Using these regional models, IR drop problems are identified and fixed. After we have corrected all the violations, a dynamic IR drop check is finally done for signoff. If some violations still exist, we repeat the process till all the IR drop issues are corrected.
We can use ML techniques in physical verification for problems such as lithographic hotspot detection [23]. By defining signatures of hotspots and a hierarchically refined detection flow consisting of ML kernels, ANNs, and SVMs, we can efficiently detect lithographic hotspots. We can also employ a dictionary learning approach with an online learning model to extract features from the layout [24].
Another area in which we can apply ML techniques is technology library models. Technology libraries form the bedrock of digital VLSI design. Traditionally, the timing and other attributes of standard cells are modeled in technology libraries as look-up tables. However, these attributes can be conveniently derived and compactly represented by using ML techniques. The ML models can efficiently exploit the intrinsic degrees of variation in the data.
In Shashank Ram and Saurabh [25], we demonstrate this by modeling multi-input switching (MIS) effects using ML techniques. Traditionally, we ignore MIS effects in timing analysis. We employ a delay model that assumes only a single input switching (SIS) for a gate during a transition. For SIS, the side inputs are held constant at non-controlling values. However, ignoring MIS effects can lead to either an overestimation or an underestimation of a gate delay. We have examined the impact of MIS on the delay of different types of gates under varying conditions. We can model the MIS effect by deriving a corrective quantity called the MIS-SIS difference (MSD) [25]. We obtain the MIS delay by adding the MSD to the conventional SIS delay under varying conditions.
There are several benefits to adopting ML-based techniques for modeling MIS effects. We can represent multi-dimensional data compactly using a learning-based model. It can capture the dependency of MIS effects on multiple input parameters and efficiently exploit them in a compact representation. In contrast, traditional interpolation-based models have large disk size and loading time, especially at advanced process nodes. Moreover, incorporating MIS effects in advanced delay models would require a drastic change in the delay calculator and is challenging. Therefore, we have modeled the MIS effect as an incremental corrective quantity over the SIS delay. It fits easily with the existing design flows and delay calculators. Additionally, the approach proposed in Shashank Ram and Saurabh [25] is generic. Therefore, the ANN-based model can be employed to capture other non-ideal effects at advanced process nodes.
We have employed the ML-based MIS model to carry out MIS-aware timing analysis [25]. It involves reading MIS-aware timing libraries and reconstructing the original ANN. Since the ANNs are compact, the time consumed in the reconstruction of the ANNs is insignificant. Subsequently, using the circuit conditions, we compute the MSD for each relevant timing arc. Using the MSD, we adjust the SIS delay and generate MIS-annotated timing reports. It is demonstrated that ML-based MIS modeling can improve the accuracy of timing analysis [25]. For example, for some benchmark circuits, the traditional SIS-based delay differs from the corresponding SPICE-computed delay by 120%, whereas the ML-based model produces delays with errors of less than 3%. The runtime overhead of MIS-aware timing analysis is also negligible. The methodology of Shashank Ram and Saurabh [25] can be extended to create a single composite MIS model for different process, voltage, temperature (PVT) conditions. In the future, we expect that we can efficiently represent other complicated circuit- and transistor-level empirical models using ML models.
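The incremental nature of the MSD correction keeps the integration simple, as the hedged sketch below shows; the trivial stand-in model and the feature list are placeholders for the trained ANN of [25].

```python
def mis_delay(sis_delay, arc_features, msd_model):
    """MIS delay = conventional SIS delay + predicted MSD correction."""
    msd = msd_model.predict([arc_features])[0]
    return sis_delay + msd

class ConstantMSD:  # trivial stand-in for a trained regression model
    def predict(self, X):
        return [0.8 for _ in X]

# e.g. features: input slews, output load, skew between switching inputs
print(mis_delay(10.0, [0.05, 0.07, 0.002, 0.01], ConstantMSD()))  # 10.8
```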
Makris [27] that produces both the pass/fail labels and the confidence level of its prediction. If the confidence is low for a prediction, traditional and more expensive specification testing is employed to reach a final test decision. Thus, the cost advantage of ML-based analog/RF testing is leveraged. Note that the test quality is not sacrificed in the two-tier test approach [27]. We can employ a similar strategy for other verification problems where ML-induced errors are critical.
We can use ML-based strategies for the diagnosis of manufacturing defects. They can provide alternatives to the traditional techniques of exploring the causal relationship. We can formulate a diagnosis problem as an evaluation of several decision functions [28]. This can reduce the runtime complexity of the traditional diagnosis methods, especially for volume diagnosis. It has been reported in Wang and Wei [28] that, even with highly compressed output responses, we can find defect locations for most defective chips. We can also employ ML techniques in the diagnosis of failures in a scan chain [29]. Note that the scan chain patterns are not sufficient to determine the failing flip-flop in a scan chain. Therefore, we need chain failure diagnosis methodologies to identify the defective scan cell(s) on a faulty scan chain. We can employ unsupervised ML techniques based on the Bayes theorem, which are more tolerant to noise, for this purpose [29].
Another problem that can utilize the capabilities of ML is post-silicon validation. Before production, we carry out post-silicon validation to ensure that the silicon functions as expected under on-field operating conditions. For this purpose, we need to identify a small set of traceable signals for debugging and state restoration. Traditional techniques, such as simulation, take a long runtime to identify traceable signals. Alternatively, we can employ ML-based techniques for efficient signal selection [30]. We can train an ML model with a few simulation runs and subsequently use this model to identify beneficial trace signals instead of employing time-consuming simulations [30].
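A hedged sketch of how such a selector might look follows; the structural features, the stand-in labels, and the 32-signal budget are hypothetical, and [30] uses its own feature set and learner:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical setup: each candidate flip-flop is described by structural
# features (e.g. fan-in, fan-out, logic depth); a small number of costly
# simulations label a subset with its observed restoration benefit.
rng = np.random.default_rng(0)
feats = rng.uniform(size=(300, 3))                     # all candidate signals
sim_idx = rng.choice(300, size=60, replace=False)      # the few simulated ones
benefit = feats[sim_idx] @ np.array([0.5, 0.3, 0.2])   # stand-in restoration ratio

model = RandomForestRegressor(n_estimators=200).fit(feats[sim_idx], benefit)
scores = model.predict(feats)              # cheap scoring replaces more simulation
trace_set = np.argsort(scores)[::-1][:32]  # fill a 32-signal trace buffer budget
```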
7.7 CHALLENGES
In the previous sections, we discussed various applications of ML techniques in
VLSI design. Nevertheless, there are some challenges involved in adopting ML
techniques in conventional design flows. The effectiveness of ML techniques in
VLSI design is dependent on complex design data. Therefore, producing competitive results repeatedly on varying design data is a challenge for many applications. Moreover, training an ML model requires extracting voluminous data from a traditional or a detailed model. Sometimes it is challenging to generate such a training data set. Sometimes these training data are far from the ground truth or contain substantial noise. Handling such a training set is challenging.
ML-based design flows can disrupt traditional design flows and be expensive to deploy. Moreover, applying ML-based EDA tools may not produce the expected results immediately. There is some non-determinism associated with ML-based applications. In the initial stages, there is not enough training data. Consequently, an ML-based EDA tool cannot guarantee accurate results. Therefore, adopting ML-based solutions in design flows is challenging for VLSI designers.
Nevertheless, in the long run, ML-based techniques could deliver rich dividends.
7.8 CONCLUSIONS
In summary, ML offers efficient solutions for many VLSI design problems [31–35]. It is particularly suitable for complex problems for which we have readily available data to learn from and predict. With the advancement of technology, we expect that such design problems will increase. Advances in EDA tools will also help develop more efficient ML-specific hardware, and ML-specific hardware can accelerate the growth of ML technology. Advances in ML technologies can further boost their applications in developing complex EDA tools. Thus, there is a synergistic relationship between these two technologies. In the long run, both technologies together can deliver benefits to many other domains and applications.
REFERENCES
[1] Ozisikyilmaz, Berkin, Gokhan Memik, and Alok Choudhary. “Efficient system
design space exploration using machine learning techniques.” In Proceedings of the
2008 45th ACM/IEEE Design Automation Conference, pp. 966–969. IEEE, 2008.
[2] Greathouse, Joseph L., and Gabriel H. Loh. “Machine learning for performance and power modeling of heterogeneous systems.” In Proceedings of the 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–6. IEEE, 2018.
[3] Sen, Satyabrata, and Neena Imam. “Machine learning based design space exploration for hybrid main-memory design.” In Proceedings of the International Symposium on Memory Systems, pp. 480–489. 2019.
[4] Zuluaga, Marcela, Andreas Krause, Peter Milder, and Markus Püschel. ““Smart”
design space sampling to predict Pareto-optimal solutions.” In Proceedings of the
13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers,
Tools and Theory for Embedded Systems, pp. 119–128. 2012.
[5] Liu, Hung-Yi, and Luca P. Carloni. “On learning-based methods for design-space
exploration with high-level synthesis.” In Proceedings of the 50th Annual Design
Automation Conference, pp. 1–7. 2013.
[6] Mahapatra, Anushree, and Benjamin Carrion Schafer. “Machine-learning based
simulated annealer method for high level synthesis design space exploration.” In
Proceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn),
pp. 1–6. IEEE, 2014.
[7] Kim, Ryan Gary, Janardhan Rao Doppa, and Partha Pratim Pande. “Machine
learning for design space exploration and optimization of manycore systems.” In
Proceedings of the 2018 IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), pp. 1–6. IEEE, 2018.
[8] Pagani, Santiago, P.D. Sai Manoj, Axel Jantsch, and Jörg Henkel. “Machine
learning for power, energy, and thermal management on multicore processors: A
survey.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems 39, no. 1 (2018): 101–116.
[9] Xiao, Yao, Shahin Nazarian, and Paul Bogdan. “Self-optimizing and self-programming computing systems: A combined compiler, complex networks, and
machine learning approach.” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems 27, no. 6 (2019): 1416–1427.
[10] Mitchell, Tom M., Sridhar Mahadevan, and Louis I. Steinberg. “LEAP: A learning
apprentice for VLSI design.” In Machine Learning, pp. 271–289. Morgan
Kaufmann, 1990.
[11] Beerel, Peter A., and Massoud Pedram. “Opportunities for machine learning in
electronic design automation.” In Proceedings of the 2018 IEEE International
Symposium on Circuits and Systems (ISCAS), pp. 1–5. IEEE, 2018.
[12] Chang, Wen-Hsiang, Chien-Hsueh Lin, Szu-Pang Mu, Li-De Chen, Cheng-Hong
Tsai, Yen-Chih Chiu, and Mango C-T. Chao. “Generating routing-driven power
distribution networks with machine-learning technique.” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems 36, no. 8 (2017):
1237–1250.
[13] Kirby, Robert, Saad Godil, Rajarshi Roy, and Bryan Catanzaro. “CongestionNet:
Routing congestion prediction using deep graph neural networks.” In Proceedings
of the 2019 IFIP/IEEE 27th International Conference on Very Large Scale
Integration (VLSI-SoC), pp. 217–222. IEEE, 2019.
[14] Tabrizi, Aysa Fakheri, Logan Rakai, Nima Karimpour Darav, Ismail Bustany, Laleh
Behjat, Shuchang Xu, and Andrew Kennings. “A machine learning framework to
identify detailed routing short violations from a placed netlist.” In Proceedings of
the 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1–6.
IEEE, 2018.
[15] Chan, Wei-Ting J., Pei-Hsin Ho, Andrew B. Kahng, and Prashant Saxena.
“Routability optimization for industrial designs at sub-14nm process nodes using
machine learning.” In Proceedings of the 2017 ACM on International Symposium on
Physical Design, pp. 15–21. 2017.
[16] Baker Alawieh, Mohamed, Yibo Lin, Wei Ye, and David Z Pan. “Generative
learning in VLSI design for manufacturability: Current status and future directions.”
Journal of Microelectronic Manufacturing 2, no. 4 (2019).
[17] Yang, Haoyu, Wei Zhong, Yuzhe Ma, Hao Geng, Ran Chen, Wanli Chen, and Bei
Yu. “VLSI mask optimization: From shallow to deep learning.” In Proceedings of
the 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC),
pp. 434–439. IEEE, 2020.
[18] Ioannides, Charalambos, and Kerstin I. Eder. “Coverage-directed test generation
automated by machine learning--a review.” ACM Transactions on Design
Automation of Electronic Systems (TODAES) 17, no. 1 (2012): 1–21.
[19] Wang, Fanchao, Hanbin Zhu, Pranjay Popli, Yao Xiao, Paul Bogdan, and Shahin Nazarian. “Accelerating coverage directed test generation for functional verification: A neural network-based framework.” In Proceedings of the 2018 on Great Lakes Symposium on VLSI, pp. 207–212. 2018.
[20] Kahng, Andrew B., Mulong Luo, and Siddhartha Nath. “SI for free: Machine
learning of interconnect coupling delay and transition effects.” In Proceedings of the
2015 ACM/IEEE International Workshop on System Level Interconnect Prediction
(SLIP), pp. 1–8. IEEE, 2015.
[21] Medico, Roberto, Domenico Spina, Dries Vande Ginste, Dirk Deschrijver, and Tom Dhaene. “Machine-learning-based error detection and design optimization in signal integrity applications.” IEEE Transactions on Components, Packaging and Manufacturing Technology 9, no. 9 (2019): 1712–1720.
[22] Fang, Yen-Chun, Heng-Yi Lin, Min-Yan Su, Chien-Mo Li, and Eric Jia-Wei Fang. “Machine-learning-based dynamic IR drop prediction for ECO.” In Proceedings of the International Conference on Computer-Aided Design, pp. 1–7. 2018.
[23] Ding, Duo, Andres J. Torres, Fedor G. Pikus, and David Z. Pan. “High performance
lithographic hotspot detection using hierarchically refined machine learning.” In
Proceedings of the 16th Asia and South Pacific Design Automation Conference
(ASP-DAC 2011), pp. 775–780. IEEE, 2011.
[24] Geng, Hao, Haoyu Yang, Bei Yu, Xingquan Li, and Xuan Zeng. “Sparse VLSI
layout feature extraction: A dictionary learning approach.” In Proceedings of the
2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 488–493.
IEEE, 2018.
[25] Shashank Ram, O.V.S., and Sneh Saurabh. “Modeling multiple input switching in
timing analysis using machine learning.” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems 40, no. 4 (2020).
[26] Biswas, Sounil, and Ronald D. Blanton. “Statistical test compaction using binary
decision trees.” IEEE Design & Test of Computers 23, no. 6 (2006): 452–462.
[27] Stratigopoulos, Haralampos-G., and Yiorgos Makris. “Error moderation in low-cost
machine-learning-based analog/RF testing.” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems 27, no. 2 (2008): 339–351.
[28] Wang, Seongmoon, and Wenlong Wei. “Machine learning-based volume diagnosis.” In Proceedings of the 2009 Design, Automation & Test in Europe Conference & Exhibition, pp. 902–905. IEEE, 2009.
[29] Huang, Yu, Brady Benware, Randy Klingenberg, Huaxing Tang, Jayant Dsouza,
and Wu-Tung Cheng. “Scan chain diagnosis based on unsupervised machine
learning.” In Proceedings of the 2017 IEEE 26th Asian Test Symposium (ATS),
pp. 225–230. IEEE, 2017.
[30] Rahmani, Kamran, Sandip Ray, and Prabhat Mishra. “Postsilicon trace signal selection using machine learning techniques.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, no. 2 (2016): 570–580.
[31] Wang, Li-C., and Magdy S. Abadir. “Data mining in EDA - basic principles, promises, and constraints.” In Proceedings of the 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. IEEE, 2014.
[32] Capodieci, Luigi. “Data analytics and machine learning for continued semiconductor scaling.” SPIE News (2016).
[33] Wang, Li-C. “Experience of data analytics in EDA and test—principles, promises,
and challenges.” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems 36, no. 6 (2016): 885–898.
[34] Pandey, Manish. “Machine learning and systems for building the next generation of
EDA tools.” In Proceedings of the 2018 23rd Asia and South Pacific Design
Automation Conference (ASP-DAC), pp. 411–415. IEEE, 2018.
[35] Kahng, Andrew B. “Machine learning applications in physical design: Recent results and directions.” In Proceedings of the 2018 International Symposium on Physical Design, pp. 68–73. 2018.
8 An Overview of High-Performance Computing Techniques Applied to Image Processing

Giulliano Paes Carnielli1, Rangel Arthur1, Ana Carolina Borges Monteiro2, Reinaldo Padilha Franca2, and Yuzo Iano2

1 Faculty of Technology (FT), State University of Campinas (UNICAMP), Limeira, São Paulo, Brazil
2 School of Electrical Engineering and Computing (FEEC), State University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
CONTENTS
8.1 Introduction................................................................................................... 141
8.1.1 Context.............................................................................................. 141
8.1.2 Concepts ...........................................................................................142
8.2 HPC Techniques Applied to Image Treatment ...........................................144
8.2.1 Cloud-Based Distributed Computing...............................................144
8.2.2 GPU-Accelerated Parallelization .....................................................145
8.2.3 Parallelization Using GPU Cluster ..................................................148
8.2.4 Multicore Architecture .....................................................................149
8.3 Neural Networks...........................................................................................150
8.3.1 Convolutional Neural Network (CNN) ...........................................153
8.3.2 Generative Adversarial Network (GAN).........................................153
8.3.3 HPC Techniques Applied to Neural Networks ...............................155
8.4 Machine Learning Applications Hardware Design .....................................155
8.4.1 FPGA ................................................................................................155
8.4.2 SVM..................................................................................................156
8.5 Conclusions................................................................................................... 157
Notes ......................................................................................................................158
References..............................................................................................................158
8.1 INTRODUCTION
8.1.1 CONTEXT
Digital image processing (DIP) is a dynamic and expanding area, comprising technologies widely used in the most diverse areas and applications: for example, the construction of geographic and atmospheric models over time series of satellite images, detection of anomalies in manufactured products, support for diagnostics based on medical images, and security based on biometric recognition [1,2].
Many of the techniques used in image processing were developed back in the 1960s, as shown by the work of Azriel Rosenfeld [3], one of the first to address the issue of computer image processing. It is worth noting that, in the following decade, several studies were developed with the objective of optimizing hardware architectures and algorithms to make parallel image processing feasible. For example, in a work dated 1975 [4], Stamatopoulos shows the efficiency of parallel digital processors in relation to serial digital computers for image processing. In a 1977 paper [5], Meilander discusses the evolution of parallel processor architecture for image processing.
Since its early days, DIP has been a challenge to the processing capacity of hardware and software. Computer vision (CV) problems are computationally intensive [6,7]. The explanation for this is that images concentrate a large amount of data and are usually treated in large sets by applications that demand results in a short time. Because of these characteristics, image processing can be included in the context of Big Data. Traditional general-purpose machines can neither handle the distinctive I/O requirements of most image processing tasks nor take advantage of the parallel computing opportunities present in most vision-related applications [8,9]. Therefore, as Jon A. Webb [10] states, DIP and CV are natural applications for high-performance computing (HPC), which boosts research, for example, in the areas of graphics hardware, standards specification, modeling, efficient algorithms, simulation, animation, and virtual reality [11].
The purpose of this study is to present an overview of four high-performance
processing techniques, used in the area of image processing and CV, as well as to
briefly address the use of artificial neural networks (ANNs) in the same context.
DIP has several applications that, admittedly, demand great computational power
due to either the amount of data usually addressed or, in certain contexts, the need
to generate results in a very short time. Therefore, an overview of these HPC
techniques, commonly applied to such research areas, becomes relevant.
8.1.2 CONCEPTS
1. Digital Image: An analogue image can be defined, according to Monteiro, Gonzales and Woods, and Tyagi [7,12,13], as a two-dimensional function f(x, y), in which x and y are spatial coordinates and the amplitude f is the intensity or gray level of the image at that point. A digital image is obtained from an analogue image by sampling and quantization. In such a process, the continuous coordinates of the original image are converted to discrete values, resulting in an array of values, usually real numbers.
Each sampled point is called a pixel or pel (picture element) and stores information, as a numerical value or a small set of numbers, about the properties recorded at the sampled point, such as color or brightness. The sampling intervals determine the spatial resolution of the image, that is, the number of points that form the matrix (Figure 8.1).
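A minimal sketch of this digitization step follows; the continuous function f, the grid over [0, 1)², and the number of levels are illustrative choices of the sketch:

```python
import numpy as np

def digitize(f, n, levels):
    """Sample a continuous image f(x, y) on an n-by-n grid over [0, 1)^2
    and quantize the amplitudes to `levels` gray values."""
    xs = np.linspace(0.0, 1.0, n, endpoint=False)
    samples = np.array([[f(x, y) for x in xs] for y in xs])  # spatial sampling
    lo, hi = samples.min(), samples.max()
    # amplitude quantization to discrete gray levels
    return np.round((samples - lo) / (hi - lo) * (levels - 1)).astype(np.uint8)

# example: a smooth analogue intensity pattern, sampled at 256x256,
# quantized to 256 gray levels (8 bits per pixel)
img = digitize(lambda x, y: np.sin(6 * x) * np.cos(4 * y), n=256, levels=256)
```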
2. Image Processing and CV: According to Gonzales and Woods, and Nixon and Aguado [12,14], image processing and CV are two areas comprising a wide variety of applications, with no clear and defined border between them. Image processing is often defined as the activity of transforming one image into another image, or into a set of information. On the other hand, CV is the use of digital computers to emulate human vision, comprising the ability to generate inferences and take actions based on visual inputs. Image analysis (the understanding of images) is a discipline that stands between these two definitions.
8.2 HPC TECHNIQUES APPLIED TO IMAGE TREATMENT
8.2.1 CLOUD-BASED DISTRIBUTED COMPUTING
Platforms such as Amazon Web Services (AWS) or Google Cloud Platform (GCP) offer highly scalable, on-demand, accessible, and easily configurable environments without requiring service provider interference. Such configuration features allow multiple processing units to be allocated in the same environment, or an environment to be instantiated multiple times, providing an appropriate platform to perform tasks in parallel. In addition, data can be kept in the virtual environment, avoiding large transfers during processing (Figure 8.2) [19].
The aforementioned characteristics suit application scenarios in which the demand for HPC resources varies both over time and in the required specifications.
1. Case Study: The work presented by Shunxing Bao [21] discusses the processing of large quantities of medical images (Big Data Medical Image), acquired by heterogeneous magnetic resonance imaging (MRI) methods, with analysis in multiple stages (multi-stage analysis). Its characteristics include the frequent use of high-performance clusters, incurring large operational costs, in addition to variability in processing time and sequential (pipeline) execution with errors identified only in later stages, wasting time and computational resources.
8.2.2 GPU-ACCELERATED PARALLELIZATION
The GPU was initially developed for graphics processing and image rendering, but it has become a highly parallel programmable processor used for more generic purposes. This change of application context was promoted by the publication of parallel programming APIs, such as the Open Computing Language (OpenCL, Apple) and the Compute Unified Device Architecture (CUDA, NVIDIA), which harness the computing power of a GPU for a variety of applications [25].
A typical pipeline for texture analysis involves region recognition and segmentation, feature extraction, and classification. The performance of each of these steps depends on the accuracy of the previous step. In this context, Tsai et al. [26] focus on accelerating the extraction of features from MRI images of the brain using a GPU.
The most widely used method for extracting statistical characteristics from textures is the GLCM. However, this approach implies a high computational cost when performing several calculations on a region of interest (ROI) that slides over a high-resolution image. This task is not suitable for processing on a single CPU, given the increase in the quantity and quality of data and the need for a prompt response [27].
The purpose of this work is to use the power of GPU parallelism to generate the GLCM and extract its characteristics simultaneously for many small overlapping ROIs covering the entire image area. The processing in each ROI is independent and very similar, offering great potential for parallelism (Figure 8.4) [28].
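A minimal sketch of this parallelization pattern follows, assuming Numba's CUDA backend: each GPU thread owns one ROI and accumulates a private co-occurrence matrix for a single horizontal offset, so no atomics are needed. The quantization to 8 levels, the 32-pixel window, and the 16-pixel stride are illustrative choices, not parameters from [26–28]:

```python
import numpy as np
from numba import cuda

LEVELS = 8  # quantized gray levels (illustrative)

@cuda.jit
def glcm_per_roi(img, rois, glcms):
    # One thread per ROI: thread i accumulates its own GLCM for the
    # horizontal neighbour offset (dx=1, dy=0), so no atomics are needed.
    i = cuda.grid(1)
    if i < rois.shape[0]:
        y0, x0 = rois[i, 0], rois[i, 1]
        h, w = rois[i, 2], rois[i, 3]
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w - 1):
                glcms[i, img[y, x], img[y, x + 1]] += 1

# quantize a test image and tile it with overlapping 32x32 ROIs, stride 16
rng = np.random.default_rng(0)
img = (rng.integers(0, 256, (2048, 2048)) * LEVELS // 256).astype(np.uint8)
step, win = 16, 32
ys, xs = np.mgrid[0:img.shape[0] - win:step, 0:img.shape[1] - win:step]
rois = np.column_stack([ys.ravel(), xs.ravel(),
                        np.full(ys.size, win), np.full(ys.size, win)]).astype(np.int32)

d_glcms = cuda.to_device(np.zeros((len(rois), LEVELS, LEVELS), np.uint32))
glcm_per_roi[(len(rois) + 255) // 256, 256](cuda.to_device(img),
                                            cuda.to_device(rois), d_glcms)
glcms = d_glcms.copy_to_host()  # one GLCM per ROI, ready for feature extraction
```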
3. Case Study 2: Still related to the use of GPUs to accelerate image processing techniques, Saxena et al. [29] propose a parallel implementation of the morphological algorithm vHGW (van Herk/Gil-Werman), considered one of the fastest for serial processing on a CPU. Morphological operators (e.g. erosion, dilation, opening) are used in the extraction of components from images, employed for the representation and description of regions and shapes, through the application of a structuring element on the image.
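Since the reported speedups hinge on the structure of vHGW itself, a minimal NumPy sketch of its serial 1-D building block may help: the running maximum (grayscale dilation) over a window of size k costs roughly three comparisons per pixel, independent of k, by combining per-block prefix and suffix maxima. The function name and the "valid"-windows output convention are choices of this sketch:

```python
import numpy as np

def vhgw_dilate_1d(x, k):
    """1-D grayscale dilation (running max) with a flat window of size k,
    via the van Herk/Gil-Werman scheme. Returns the n-k+1 'valid' maxima."""
    n = len(x)
    m = -((-(n + k - 1)) // k) * k                  # round n+k-1 up to a multiple of k
    xp = np.concatenate([x, np.full(m - n, -np.inf)])
    b = xp.reshape(-1, k)
    p = np.maximum.accumulate(b, axis=1).ravel()    # prefix max within each block
    s = np.maximum.accumulate(b[:, ::-1], axis=1)[:, ::-1].ravel()  # suffix max
    i = np.arange(n - k + 1)
    # window [i, i+k-1] spans at most two blocks: suffix of the first,
    # prefix of the second
    return np.maximum(s[i], p[i + k - 1])
```

Applying this routine along rows and then columns gives 2-D dilation with a rectangular structuring element; the GPU version of [29] parallelizes the independent 1-D scans.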
This work presents a parallel strategy based on the NVIDIA GTX860M GPU with 640 CUDA cores, implemented with the CUDA 5.0 API. The host system featured a 2.50 GHz Intel i7 CPU and 8 GB of RAM.
The results were obtained from experiments with dilation (expansion) and erosion operators implemented by the vHGW algorithm in CUDA 5.0. There was an effective gain with the use of the GPU, but the speedup varied considerably with the size of the image and with the size and shape of the structuring element. The work reported that the biggest gains were obtained with large images and with an increase in the number of CUDA cores.
8.2.3 PARALLELIZATION USING GPU CLUSTER
In modern times, GPUs have progressed into highly parallel, multithreaded processors with high memory bandwidth [30]. Therefore, it makes sense to integrate multiple GPUs in order to create an even more parallel and powerful architecture. A GPU cluster is a cluster of computers in which each node is provided with a GPU, suitable for the GPGPU (general-purpose graphics processing unit) model, in which devices typically aimed at graphics processing are used for general-purpose computation.
In a GPU cluster, the nodes are connected to each other by some high-performance network technology (e.g. InfiniBand). An Ethernet switch connects the parallel infrastructure to a head node, responsible for the cluster interface, which receives and processes external requests and assigns work to the nodes (Figure 8.5).
2. Case Study: An example of using a GPU cluster in the image processing area can be found in Liu et al. [30]. This work presents a high-performance framework to handle a time-consuming application, the time-series quantitative retrieval of satellite images, on a GPU cluster.
Such quantitative sensing models are used to estimate diverse geophysical parameters such as the Aerosol Optical Depth (AOD) and the Normalized Difference Vegetation Index (NDVI). That project investigates an efficient solution to derive a decade-long AOD dataset from 1 km resolution images over Asia, related to the SRAP-MODIS
8.2.4 MULTICORE ARCHITECTURE
Cores can be tightly or loosely coupled, depending on how they share resources (e.g. cache, memory) and how they communicate (e.g. messages or memory sharing). In addition, multicore systems are classified as homogeneous (Figure 8.7a), when the cores are identical or support the same instruction set, and heterogeneous (Figure 8.7b) otherwise [32].
In general, multicore architectures provide more efficiency in resource management and data transfer due to the proximity of the cores. However, the operating system must be able to exploit the resources offered by such architectures [31,32].
The Am2045 system, an MPPA with distributed memory, had 336 processing cores at 300 MHz and a memory bank with 4 blocks of 2 KB. Each core had two SR units, simple processors supporting basic arithmetic and logic instructions, and two SRD units, similar to a DSP (digital signal processor), supporting more complex instructions (Figure 8.8).
The purpose of this work is to implement two image processing applications on the Am2045 architecture: (1) an algorithm for JPEG encoding, and (2) binary DIP (edge detection and hole filling).
Regarding the JPEG encoding algorithm, the objective was to compare the results with those of six other platforms. In the case of binary image processing, the result was compared with just one other approach, based on dedicated hardware (an FPGA, field-programmable gate array).
The results indicated that, for the JPEG encoding algorithm, the solution offered by Osorio et al. and Gu et al. [33,35] was surpassed by only one platform (Table 8.1), which indicates that hardware-based solutions (FPGAs) still have some advantage over software-based implementations. However, among the FPGA-based approaches in the second half of Table 8.1, all except the Altera Stratix II had lower performance than the MPPA-based technique. In the case of binary image processing, the method was compared with a single FPGA-based work (Table 8.2). The technique proposed by Osorio et al. [33] performed better than the competitor after normalization by clock speed, since the metric was the number of machine cycles required for each processing step.
FIGURE 8.7 Multicore system: (a) homogeneous and (b) heterogeneous [15].
8.3 NEURAL NETWORKS
There are several types of neural networks applied to the treatment of images, but two of these networks stand out for the vast number of applications in which they are used: convolutional neural networks (CNNs) and generative adversarial networks (GANs) [15,35].
TABLE 8.1
JPEG Encoding Results [30]

Platform                        Speed (MHz)    MPixels/s
[16]                            300            31
BlackFin [18]                   750            2
TI C5410 [18]                   160            0.4
Xilinx Virtex II-Pro 20 [18]    50             2.6
Xilinx Spartan 3 200 [19]       49             0.75
Altera Flex 10KE [20]           40             13.2
Altera Stratix II [20]          161            54

TABLE 8.2
Results for Binary Image Processing [33]

[16]    [21]    Normalized [21]
8.3.2 GENERATIVE ADVERSARIAL NETWORK (GAN)
The architecture of such a system comprises two networks, the generator network and the discriminator network (Figure 8.10). The generator takes samples of the input images (training data) and synthesizes new images, randomly changing their characteristics by adding noise. During training, the discriminator receives a random sampling of real and fake images and must determine which images are real and which are fake [42,43].
The learning process takes place based on the discriminator's output, which is validated against the provided data and used to adjust (optimize) both the generator and the discriminator itself. The application potential of these networks is huge, ranging from generating the face of a “virtual” model to be used in advertising, to identifying anomalies in products inside a factory [42–44].
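A self-contained illustration of this adversarial training loop is sketched below in PyTorch; the network sizes, the 64-dimensional latent code, and the 784-pixel flattened images are illustrative assumptions:

```python
import torch
import torch.nn as nn

# generator maps random noise to a synthetic image; discriminator
# outputs the probability that its input image is real
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):  # real: batch of flattened images scaled to [-1, 1]
    b = real.size(0)
    fake = G(torch.randn(b, 64))
    # discriminator update: label real images 1 and fake images 0
    loss_d = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator update: try to make the discriminator label fakes as real
    loss_g = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Each call alternates one discriminator update with one generator update, which is the validation-and-adjustment cycle described above.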
8.4 MACHINE LEARNING APPLICATIONS HARDWARE DESIGN
8.4.1 FPGA
FPGAs can carry out many calculations in parallel within a single clock cycle, delivering their results on the same clock pulse, something that is completely impossible for software alone to perform [52].
Examples of FPGA applications stand out in the electric power sector, for real-time digital signal processing; in the multimedia sector, for real-time, high-performance image processing; and in the telecom sector, in high-performance switches and routers. FPGA chips are used in numerous applications, ranging from video games to areas such as aerospace, prototyping, HPC, and medicine. FPGAs can be found in audio applications, in DAC (digital-to-analog converter) circuits where several aspects are tuned to achieve better sound reproduction, and in alternative consoles that use FPGA chips to reproduce in hardware the same quality as the original versions of video games, or even to give the player the flexibility to play games from different consoles using a single device [52].
Modern technology companies also include FPGAs in data centers to speed up search engines by applying machine learning algorithms such as support vector machines (SVMs). A main objective is the parallel implementation on FPGAs of both the feedforward phase of an SVM and its training phase [53].
In this context, it is possible to employ machine learning algorithms written for FPGAs, making them quite efficient and easily reprogrammable; this specialization converts parallel computing resources into deep learning processing unit (DPU) or deep neural network (DNN) processing units synthesized in FPGAs [54]. In this sense, FPGAs applied to different types of machine learning models bring flexibility, making it easier to accelerate applications with the most suitable numerical precision and memory model. They also open the possibility of parallelizing the pre-training of DNNs on FPGAs to scale a given service horizontally. DNNs can be pre-trained, used as deep feature extractors for transfer learning, or fine-tuned with updated weights [52,54].
8.4.2 SVM
SVMs are a machine learning technique for modeling classifiers and regressors that seeks to minimize the empirical risk while preserving the ability to generalize. They are widely used mainly due to mathematical properties that include good generalization capacity and robustness. SVM training models attain good accuracy with low complexity, but they require defining the kernel and its parameters; the kernel and training-model parameters together are called the hyperparameters of the SVM parameter-selection problem [55,56].
The SVM is a supervised learning algorithm whose objective is to classify a set of data points mapped to a multidimensional feature space employing a kernel function. In this algorithm, the decision boundary in the input space is represented by a hyperplane in a higher-dimensional space. This hyperplane is, in general, obtained from a finite subset of known data and their respective labels (the training set), and it is used to classify data that are not part of the training set [55,56].
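As a sketch of this hyperparameter-selection problem (toy data; the kernel grid and the C/gamma values are illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# toy two-class data; the kernel and its parameters are the SVM
# hyperparameters discussed above
X = np.random.default_rng(1).normal(size=(200, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

grid = GridSearchCV(
    SVC(),
    {"kernel": ["rbf", "poly"], "C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
    cv=5,  # cross-validation estimates the generalization capacity
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```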
SVMs perform the separation of sets of objects belonging to distinct classes, that is, they use the logic of decision planes. They have a wide range of applications in different areas due to advantages such as good generalization performance, mathematical tractability, geometric interpretation, and applicability to the exploration of unlabeled data [56].
The efficiency of a classifying function is measured by both its complexity and its generalization capacity. The generalization capacity is the function's ability to correctly classify data that do not belong to the training set. The complexity refers to the number of elements necessary to compose the function, such as the points of the SVM training set, the centers of radial basis function networks, or the neurons in the hidden layer of ANNs [57].
In this sense, given the importance of FPGAs as computational accelerators in recent years, it is possible to accelerate machine learning algorithms used in search engines through SVMs and the efficient use of FPGA resources. A parallel hardware implementation allows a larger number of kernels to be implemented on a chip of the same size, accelerating both the feedforward (inference) phase, implemented using the SVM polynomial kernel, and the training phase; the maximum possible acceleration is obtained at the cost of greater use of the available FPGA area [58].
Through FPGAs it is also possible to meet the performance demands of artificial intelligence and even Big Data, i.e., a high volume of processing over a huge volume of data. FPGAs tend to enhance processing speed and decrease hardware costs by implementing a massive number of processes concurrently and directing the flow of that data, removing cost and usability barriers and making implementation accessible for any project that requires low-latency processing of vast volumes of data. FPGAs are programmable just like CPUs or GPUs, but they target parallel, low-latency, high-speed workloads, such as inference and DNNs [52,59].
8.5 CONCLUSIONS
FPGA technologies related to machine learning and artificial intelligence offer benefits such as speed. FPGAs are concurrent: instead of executing sequential instruction streams, they arrange an optimal data flow among parallel operations, resulting in a performance gain and running some applications many times faster than traditional CPUs/GPUs on the same code.
FPGA technology can contain millions of configurable logic blocks (CLBs) that can be employed to execute numerous actions at the same time, providing a high degree of parallelism and concurrency; this parallel architecture suits problems that decompose into well-structured, independent processes that can be carried out concurrently. When an image is processed non-simultaneously, a single unit processes the entire image pixel by pixel. When the digital image is processed simultaneously, however, it is divided into pieces processed at the same time by distinct units and then reassembled. This renders the process more complex but much faster, considering that the received data must be optimally divided and efficiently distributed, after which the partial results are collected and reassembled, usually without blocking the work pipeline.
Parallel computing plays an important role in various image processing techniques, such as image segmentation, edge detection, noise removal, histogram equalization, image registration, feature extraction, and distinct optimization techniques, among others.
The present study provided a brief introduction to high-performance processing techniques for DIP, with their respective resources and limitations, presenting case studies in the areas of medical imaging, time-series analysis of satellite images, and hardware architecture. However, the applications of these techniques are not limited to these cases.
Each HPC technique is best applied to different image processing scenarios. In
addition, these techniques can be combined and overlapped, creating more possibilities for solutions.
It is a complex task to try to label and categorize these techniques, but an attempt
in this direction, with the purpose of organizing knowledge, can be useful as a
reference for future applications.
NOTES
1 Gray-Level Co-Occurrence Matrix: a tabulation of occurrences of combined gray levels, used in texture analysis.
2 Moderate Resolution Imaging Spectroradiometer data.
3 A company of parallel processors that developed the Am2045, used primarily in high-performance embedded systems.
REFERENCES
[1] Monteiro, A. C. B., França, R. P., Estrela, V. V., Razmjooy, N., Iano, Y., & Negrete, P.
D. M. (2020). Metaheuristics applied to blood image analysis. In Metaheuristics and
optimization in computer and electrical engineering (pp. 117–135). Springer, Cham.
[2] Monteiro, A. C. B., Iano, Y., França, R. P., & Arthur, R. (2020). Development of a
laboratory medical algorithm for simultaneous detection and counting of erythrocytes and leukocytes in digital images of a blood smear. In Deep learning
techniques for biomedical and health informatics (pp. 165–186). Academic Press,
Cambridge, Massachusetts, United States.
[3] Rosenfeld, A. (1969). Picture processing by computer. ACM Computing Surveys
(CSUR), 1(3), 147–176.
[4] Yang, Z., Zhu, Y., & Pu, Y. (2008, December). Parallel image processing based on
CUDA. In 2008 International Conference on Computer Science and Software
Engineering (Vol. 3, pp. 198–201). IEEE.
[5] Meilander, W. C. (1977, January). The evolution of parallel processor architecture
for image processing. In COMPCON'77 (pp. 52–53). IEEE Computer Society.
[6] Choudhary, A., & Ranka, S. (1992). Guest editor's introduction: parallel processing
for computer vision and image understanding. Computer, 25(2), 7–10.
[25] Kalaiselvi, T., Sriramakrishnan, P., & Somasundaram, K. (2017). Survey of using
GPU CUDA programming model in medical image analysis. Informatics in
Medicine Unlocked, 9, 133–144.
[26] Tsai, H. Y., Zhang, H., Hung, C. L., & Min, G. (2017). GPU-accelerated features
extraction from magnetic resonance images. IEEE Access, 5, 22634–22646.
[27] Xing, Z., & Jia, H. (2019). Multilevel color image segmentation based on GLCM
and improved salp swarm algorithm. IEEE Access, 7, 37672–37690.
[28] Siahaan, R., Pardede, C., & Gurning, W. P. (2020). Another Parallelism Technique
of GLCM Implementation with CUDA Programming. In 2020 4th International
Conference on Advances in Image Processing (pp. 143–151).
[29] Saxena, S., Sharma, S., & Sharma, N. (2017). Study of parallel image processing
with the implementation of vHGW algorithm using CUDA on NVIDIA'S GPU
framework. In World Congress on Engineering (Vol. 1).
[30] Liu, J., Xue, Y., Ren, K., Song, J., Windmill, C., & Merritt, P. (2019). High-
performance time-series quantitative retrieval from satellite images on a GPU
cluster. IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing, 12(8), 2810–2821.
[31] Jain, P. N., & Surve, S. K. (2020). A review on shared resource contention in
multicores and its mitigating techniques. International Journal of High Performance
Systems Architecture, 9(1), 20–48.
[32] Li, Y., & Zhang, Z. (2018, July). Parallel computing: review and perspective. In
2018 5th International Conference on Information Science and Control Engineering
(ICISCE) (pp. 365–369). IEEE.
[33] Osorio, R. R., Diaz-Resco, C., & Bruguera, J. D. (2009, August). High-performance
image processing on a massively parallel processor array. In 2009 12th Euromicro
Conference on Digital System Design, Architectures, Methods, and Tools
(pp. 233–236). IEEE.
[34] Imaging Boards and Software. (2008). Processor targets medical, video applications. Available https://www.vision-systems.com/boards-software/article/16739271/processor-targets-medical-video-applications.
[35] Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., … & Chen, T. (2018).
Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377.
[36] Fukushima, K., & Miyake, S. (1982). Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and
cooperation in neural nets (pp. 267–285). Springer, Berlin, Heidelberg.
[37] Saha, S. (2018). A comprehensive guide to convolutional neural networks — the
ELI5 way. Available https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53.
[38] França, R. P., Peluso, M., Monteiro, A. C. B., Iano, Y., Arthur, R., & Estrela, V. V.
(2018, October). Development of a Kernel: A deeper look at the architecture of an
operating system. In Brazilian technology symposium (pp. 103–114). Springer, Cham.
[39] Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818–833).
Springer, Cham.
[40] Shan, K., Guo, J., You, W., Lu, D., & Bie, R. (2017, June). Automatic facial expression recognition based on a deep convolutional-neural-network structure. In
2017 IEEE 15th International Conference on Software Engineering Research,
Management and Applications (SERA) (pp. 123–128). IEEE.
[41] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
… & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680). NIPS, Montreal, Canada.
[42] Yi, X., Walia, E., & Babyn, P. (2019). Generative adversarial network in medical
imaging: A review. Medical Image Analysis, 58, 101552.
[43] Nazeri, K., Ng, E., & Ebrahimi, M. (2018, July). Image colorization using generative adversarial networks. In International conference on articulated motion and
deformable objects (pp. 85–94). Springer, Cham.
[44] Hakobyan, H. (2018). How GANs can turn AI into a massive force. Available
https://www.techinasia.com/talk/gan-turn-ai-into-massive-force.
[45] Barnell, M., Raymond, C., Capraro, C., Isereau, D., Cicotta, C., & Stokes, N. (2018,
September). High-performance computing (HPC) and machine learning demonstrated in flight using Agile Condor®. In 2018 IEEE High Performance Extreme
Computing Conference (HPEC) (pp. 1–4). IEEE.
[46] Lynn, T., Liang, X., Gourinovitch, A., Morrison, J. P., Fox, G., & Rosati, P. (2018,
January). Understanding the determinants of cloud computing adoption for high
performance computing. In 51st Hawaii International Conference on System
Sciences (HICSS-51) (pp. 3894–3903). University of Hawai'i at Manoa, Hawai'i.
[47] Amano, H. (Ed.). (2018). Principles and structures of FPGAs. Springer, Heidelberg,
Germany.
[48] Dekoulis, G. (Ed.). (2017). Field: Programmable gate array. BoD–Books on
Demand, London, UK.
[49] Dekoulis, G. (2020). Field programmable gate arrays (FPGAs) II. InTechOpen,
London, UK.
[50] Kumar, T. N., Almurib, H. A., & Lombardi, F. (2014, November). A novel design
of a memristor-based look-up table (LUT) for FPGA. In 2014 IEEE Asia Pacific
Conference on Circuits and Systems (APCCAS) (pp. 703–706). IEEE.
[51] Shokrolah-Shirazi, M., & Miremadi, S. G. (2008, July). FPGA-based fault injection
into synthesizable verilog HDL models. In 2008 Second International Conference
on Secure System Integration and Reliability Improvement (pp. 143–149). IEEE.
[52] Romoth, J., Porrmann, M., & Rückert, U. (2017). Survey of FPGA applications in
the period 2000–2015. Technical Report.
[53] Lopes, F. F., Ferreira, J. C., & Fernandes, M. A. (2019). Parallel implementation on
FPGA of support vector machines using stochastic gradient descent. Electronics,
8(6), 631.
[54] Duarte, J., Harris, P., Hauck, S., Holzman, B., Hsu, S. C., Jindariani, S., … &
Lončar, V. (2019). FPGA-accelerated machine learning inference as a service for
particle physics computing. Computing and Software for Big Science, 3(1), 13.
[55] Müller, K. R. Supervised machine learning: Learning SVMs and deep learning. Available http://helper.ipam.ucla.edu/publications/mpstut/mpstut_13971.pdf.
[56] Çöltekin, C., & Rama, T. (2016, December). Discriminating similar languages with
linear SVMs and neural networks. In Third Workshop on NLP for Similar
Languages, Varieties and Dialects (VarDial3) (pp. 15–24).
[57] Pouriyeh, S., Vahid, S., Sannino, G., De Pietro, G., Arabnia, H., & Gutierrez, J.
(2017, July). A comprehensive investigation and comparison of machine learning
techniques in the domain of heart disease. In 2017 IEEE Symposium on Computers
and Communications (ISCC) (pp. 204–207). IEEE.
[58] Padierna, L. C., Carpio, M., Rojas-Domínguez, A., Puga, H., & Fraire, H. (2018). A
novel formulation of orthogonal polynomial kernel functions for SVM classifiers:
The Gegenbauer family. Pattern Recognition, 84, 211–225.
[59] Huanrui, H. (2016). New mixed kernel functions of SVM used in pattern recognition. Cybernetics and Information Technologies, 16(5), 5–14.
9 Machine Learning Algorithms for Semiconductor Device Modeling

Yogendra Gupta1, Niketa Sharma1, Ashish Sharma2, and Harish Sharma3

1 Swami Keshwanand Institute of Technology Jaipur, Jaipur, Rajasthan, India
2 Indian Institute of Information Technology Kota, Kota, Rajasthan, India
3 Rajasthan Technical University Kota, Kota, Rajasthan, India
CONTENTS
9.1 Introduction................................................................................................... 163
9.2 Semiconductor Device Modeling................................................................. 165
9.3 Related Work................................................................................................167
9.4 Challenges..................................................................................................... 167
9.5 Machine Learning Fundamentals................................................................. 168
9.5.1 Supervised Machine Learning Algorithms......................................168
9.5.2 Unsupervised Machine Learning Algorithms.................................. 169
9.5.3 Deep Learning Algorithms...............................................................169
9.6 Case Study: Thermal Modeling of the GaN HEMT Device......................169
9.6.1 Experimental Setup .......................................................................... 171
9.6.2 Results............................................................................................... 173
9.7 Conclusion ....................................................................................................174
Acknowledgments.................................................................................................. 176
References..............................................................................................................176
9.1 INTRODUCTION
This chapter describes machine learning algorithms from the viewpoint of semiconductor device modeling. Machine learning has become one of the most promising modeling techniques for device-, circuit-, and system-level modeling. Electrical characteristics of a device, such as its I-V and C-V characteristics, represent an input-output relationship; temperature, process variation, and power variation can likewise be modeled by machine learning algorithms. Semiconductor device modeling is an area of extensive research. Technology computer-aided design (TCAD) models are used to design and analyze semiconductor devices. They are also used as a first step in compact model development. Compact models are used for circuit design and functional verification. The accuracy of these models directly translates to design accuracy, high yield, and more profit. However, the development of TCAD and compact models is a huge undertaking. It requires a concrete understanding of the underlying physics. It also requires a lot of simulation resources (computation time and memory), making it very complex and expensive.
Additionally, models developed for one device cannot be ported, i.e. used for another device, if the governing physical principles of the two devices are different. This chapter is motivated by the need to accelerate model development for semiconductor devices. Research efforts have been directed at semiconductor device modeling and machine learning separately; however, the use of machine learning in device modeling is yet to be explored by the research community. In this chapter, we describe various machine learning algorithms for semiconductor device modeling and the challenges and trade-offs associated with them. As a case study, we propose electrothermal modeling of GaN-based high electron mobility transistor (HEMT) devices. A data-driven approach has been implemented for a temperature range varying from 300 K to 600 K, based on one of the core machine learning methods, i.e. decision trees (DTs). The performance of the proposed models was validated through simulated test examples. The attained outcomes show that the developed models predict the HEMT device characteristics accurately, as measured by the mean squared error between the actual and predicted characteristics.
This chapter studies machine learning algorithms from the perspective of semiconductor device modeling and manufacturing. It aims to discuss and review major machine learning algorithms for semiconductor device modeling and manufacturing. The hypothesis behind this study is that semiconductor device modeling problems can be treated like data analysis and mining problems. We can model the electrical characteristics of a device, such as the I-V and C-V characteristics that represent an input-output relationship, by mathematical functions [1]. This study is based on the hypothesis that supervised machine learning algorithms can be used for the development of such complex model equations.
Machine learning has also become popular for applications in semiconductor manufacturing, such as etch anomaly analysis [2], lithographic hotspot detection [3], and optical proximity correction [4]. Today's analysis practices rely heavily on extensive device and material characterization. A machine-learning-assisted variation analysis based on device electrical characteristics is highly desired to allow for more efficient material and device experimentation. Semiconductor manufacturing is among the most complex processes ever devised. It is also among the most data-rich, with extensive records of just about anything and everything that can be measured or observed. Much of this data goes unused, at least in part because of its huge volume. Unused data is sometimes called “dark” data. Artificial intelligence's (AI's) ability to mine these troves of dark data for relevant relationships is one area of great promise.
pulses, indicating that the proposed model can include both the thermal and dispersion effects precisely and effectively.
9.4 CHALLENGES
To explain common challenges faced during the process of semiconductor device modeling, the widely used HEMT technology is used as an example. GaN semiconductor devices have momentous properties compared with their silicon (Si) and silicon carbide (SiC) equivalents and have the potential to replace the ubiquitous Si power devices. Wide-bandgap semiconductor-based sensors have a number of advantages over other sensor technologies, such as operation at high temperatures, high chemical stability, and operation under ionizing radiation. However, there are significant technical barriers preventing their active utilization by the biosensor community. This acts as a limitation on the commercialization of GaN-based biosensors [11,12].
Understanding the biosensing operation of the variety of GaN devices (which are based on different structures, designs, and packaging) is the primary step in acquiring insight into the operation of these devices. Nevertheless, since the modeling of these novel devices depends on the conventional semiconductor device physics methodology, there are practical difficulties for the sensor community in coping with them. Due to the complexity of the device structures, the time required, and the analytical procedures involved, the existing models are not suitable for validating all applications and cannot serve as a unified model for all existing GaN biosensing devices [13–17]. To address this issue, GaN simulation models, which are an accurate replica of the actual device, were designed using machine learning techniques.
Additional TCAD simulations are required to capture these effects with accuracy. As stated in the previous section, integrated circuit design and verification rely heavily on the availability of accurate device models. To capture higher-order effects, the scaling of devices requires a complex model that accommodates as many scaling-imposed effects as possible. This requires a fundamental understanding of device physics and the origin of such effects, which reinstates the requirement for physics-based modeling, due to its high accuracy compared to other methods. Similar endeavors are required to model a new semiconductor device, since the physical basis of the new device may differ from that of existing devices. It may be possible to reuse some existing intellectual property, such as model source code and test benches. However, it still takes a significant amount of time to make new models available to the designer community (Figure 9.2).
layer, a 2.7 µm GaN buffer, and an AlN layer with sapphire as a substrate. The gate
length (LG), source-to-gate length (LSG), and gate-to-drain length (LGD) of the
device are 1 µm, 1 µm, and 2 µm, respectively.
The data extracted from the device simulations using the TCAD tool were utilized to train the machine learning techniques and develop the model for the GaN HEMT device (Figure 9.4).
The machine learning technique used to model the device characteristics is the DT algorithm. It is a widely used algorithm for inductive inference, by which it approximates the target function [45]. A DT is a flow-chart-like arrangement: every node signifies a test on a variable, each branch denotes a test outcome, and the values of the class variable are located at the leaves of the tree [46]. This algorithm uses training examples (based on historical data) to construct a classification model, which describes the connection between classes and attributes. Once it has learned, the model can classify new, unseen cases. The benefit of DT models is that they are robust to noisy data and efficient in learning disjunctive expressions [47]. In this work, an exceptionally efficient and extensively utilized classification algorithm named C4.5 [48–50] is employed.
C4.5 employs a top-down, greedy construction of a DT. The approach starts by inspecting which attribute (input variable) ought to be tested at the root of the tree. Each attribute is assessed via a statistical test (information gain) to determine how well it alone classifies the training examples. The best attribute is chosen and employed as the test at the root node of the tree. Afterward, a descendant of the root node is created for every possible value of this attribute (discrete or continuous), and the training examples are sorted to the appropriate descendant nodes. The whole cycle is then repeated using the training examples associated with each descendant node to choose the best attribute to test at that point in the tree. The developed DT can be employed to classify a specific data item by starting at the root and progressing to a leaf node, which indicates the class assigned. Each nonterminal node represents a test or decision to be carried out on a single attribute value (i.e. input variable value) of the considered data item, with one branch and subtree for each possible outcome of the test. A decision is taken when a terminal node is reached. At each non-leaf decision node, the attribute specified by the node is tested, which leads to the root of the subtree corresponding to the test's outcome. By placing the attributes with the highest information gain nearest the root, the algorithm prefers shorter trees over longer ones. The algorithm can likewise retrace and reexamine prior decisions by utilizing a pruning strategy called rule post-pruning [1,49,50]. The rule post-pruning technique is used to overcome the overfitting issue, which regularly arises in learning tasks.
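As a minimal sketch of this setup in Python: the chapter's experiments used MATLAB-style hyperparameters, so scikit-learn's min_samples_leaf and max_leaf_nodes below only roughly correspond to MinParentSize and MaxNumSplits, and the closed-form stand-in for the TCAD training data is purely illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# inputs: Vgs (V), Vds (V), temperature (K); target: Ids (A/mm)
X = np.column_stack([rng.uniform(-6, 2, 5000),     # Vgs sweep
                     rng.uniform(0, 10, 5000),     # Vds sweep
                     rng.uniform(300, 600, 5000)]) # temperature
y = np.maximum(X[:, 0] + 6, 0) * np.tanh(X[:, 1]) * (300 / X[:, 2])  # stand-in Ids

tree = DecisionTreeRegressor(min_samples_leaf=1,   # ~ MinParentSize = 1
                             max_leaf_nodes=2000)  # ~ MaxNumSplits = 2000
tree.fit(X[:4000], y[:4000])
mse = np.mean((tree.predict(X[4000:]) - y[4000:]) ** 2)
print(f"held-out MSE: {mse:.3e}")
```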
9.6.2 RESULTS
We have simulated the I-V characteristics of GaN-based HEMT devices for the temperature range from 300 K to 600 K using Silvaco ATLAS. The machine learning models were trained on a dataset with different gate-to-source voltages (Vgs) and temperatures ranging from 300 K to 600 K. Temperature values higher than 300 K were used to test the model's capability to extrapolate outside the input range. The hyperparameter optimization of the DT function gives three optimized parameters: MinParentSize = 1, MaxNumSplits = 2000, and NumVariablesToSample = 3. The output curves of the GaN HEMT device over a temperature range varying from 300 K to 600 K are presented in Figure 9.5. The applied gate-to-source voltage is 2 V.

FIGURE 9.5 Comparison of drain current at different temperatures (300 K to 600 K) at Vgs = +2 V.

The data extracted from the DT model were compared with the simulated outcomes, and the mean square error between the two was calculated as 5.32 × 10⁻⁸. The temperature dependence of the maximum current and transconductance of different devices was studied further. The transconductance (gm) versus gate-source voltage (Vgs) at drain bias Vds = +10 V of the simulated AlGaN/GaN HEMT is shown in Figure 9.6.
We observed a reduction in Ids (≈52%) with an increase in temperature from 300 K
to 600 K. The gate bias was ramped from –6V to 2 V for each of the drain biases. The
peak current density was reached of Vgs = +2 V, Vds = +10 V @ 300 K, with a current
density of 0.84 A/mm.
The high current density can be attributed to the extremely high charge accumulated in the channel because of the polarization effects and the high peak saturation velocity of electrons in GaN.
The output drain current degrades as the temperature rises from 300 K to 600 K. To understand the reason, we examined different parameters, for example, electron concentration and mobility. Figure 9.7 describes the behavior of electron concentration versus depth (µm) at different temperatures. The peak electron concentration along the channel was about 1.3 × 10¹⁹/cm³ at 300 K, corresponding to the highest Ids = 0.84 A/mm when the device was turned on. This effect can be explained by looking at the differences between the electron mobility values at different temperatures, as displayed in Figure 9.8. The electron mobility in the channel was 1181 cm²/V·s at 300 K. As the temperature increases, electron mobility degrades along the channel due to phonon and impurity scattering. The predicted Ids values at 300–600 K are close to the actual simulations, with a mean square error of 2.17 × 10⁻⁶. The characteristics predicted by the DT algorithm almost overlap with the simulated ones.
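The hyperparameter names reported above correspond to MATLAB-style decision-tree options. As a rough illustration of the same workflow, the minimal sketch below fits a DT regressor to synthetic (Vgs, Vds, T) → Ids data with approximate scikit-learn analogues of those options and reports the held-out mean square error; the synthetic data, the Ids formula, and the parameter mapping are assumptions, not the chapter's actual setup.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Stand-in for the TCAD dataset: inputs are (Vgs, Vds, T), output is Ids.
X = np.column_stack([
    rng.uniform(-6, 2, 2000),     # Vgs sweep (V)
    rng.uniform(0, 10, 2000),     # Vds sweep (V)
    rng.uniform(300, 600, 2000),  # temperature (K)
])
y = np.maximum(0, (X[:, 0] + 6) * X[:, 1] * (300 / X[:, 2]))  # synthetic Ids

tree = DecisionTreeRegressor(
    min_samples_leaf=1,    # analogue of MinParentSize = 1
    max_leaf_nodes=2000,   # rough analogue of MaxNumSplits = 2000
    max_features=3,        # analogue of NumVariablesToSample = 3
).fit(X[:1500], y[:1500])

mse = mean_squared_error(y[1500:], tree.predict(X[1500:]))
print(f"held-out MSE: {mse:.3e}")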
9.7 CONCLUSION
This chapter studies machine learning algorithms from the perspective of semiconductor device modeling. It aims to discuss and review major machine learning algorithms for semiconductor device modeling. We simulated the I-V characteristics of GaN-based HEMT devices over the temperature range from 300 K to 600 K. The proposed data modeling method for the I-V characteristics is the DT machine learning algorithm. The inputs to this model are Vgs, Vds, and temperature. The predicted Ids values at 300–600 K are close to the actual simulations, with a mean square error of 2.17 × 10⁻⁶; the characteristics predicted by the DT algorithm almost overlap with the simulated ones. In future work, we will increase the number of inputs by including transistor size as an input parameter. We also plan to extend the temperature range by training the model over a larger range, beyond 600 K, to improve model performance.
ACKNOWLEDGMENTS
The authors acknowledge the funding provided by the TEQIP III-RTU (ATU) CRS project scheme under sanction no. “TEQUIP-III/RTU (ATU)/CRS/2019-20/19”.
REFERENCES
[1] Gore, Chinmay Chaitanya. Study of machine learning algorithms for their use in
semiconductor device model development. Master’s thesis, 2015.
[2] Susto, Gian Antonio, Terzi, Matteo, and Beghi, Alessandro. Anomaly detection
approaches for semiconductor manufacturing. Procedia Manufacturing 11,
2018–2024 (2017).
[3] Ding, Duo, Wu, Xiang, Ghosh, Joydeep, and Pan, David Z. Machine learning based
lithographic hotspot detection with critical-feature extraction and classification. In
2009 IEEE International Conference on IC Design and Technology, pp. 219–222.
IEEE, 2009.
[4] Luo, Rui. Optical proximity correction using a multilayer perceptron neural network. Journal of Optics 15, (7), 075708 (2013).
[5] Litovski, V. B., Radjenovic, J. I., Mrcarica, Z. M., and Milenkovic, S. L. MOS transistor modelling using neural network. Electronics Letters 28, (18), 1766–1768 (1992).
[6] Murphy, Kevin P. Machine learning: A probabilistic perspective. Cambridge,
Massachusetts, United States: MIT Press, 2012.
[7] Tsividis, Yannis, and McAndrew, Colin. Operation and modeling of the MOS
transistor. Oxford, England: Oxford University Press, 2011.
[8] Khusro, Ahmad, Hashmi, Mohammad S., and Ansari, Abdul Quaiyum. Exploring
support vector regression for modeling of GaN HEMT. In 2018 IEEE MTT-S
International Microwave and RF Conference (IMaRC), pp. 1–3. IEEE, 2018.
[9] Hari, Nikita, Chatterjee, Soham, and Iyer, Archana. Gallium nitride power device
modeling using deep feed forward neural networks. In 2018 1st Workshop on
Wide Bandgap Power Devices and Applications in Asia (WiPDA Asia).
IEEE, 2018.
[10] Cai, Jialin, Yu, Chao, Sun, Lingling, Chen, Shichang, Su, Guodong, Liu, Jun, and Su,
Jiangtao. Machine learning based pulsed IV behavioral model for GaN HEMTs. In
2019 IEEE MTT-S International Wireless Symposium (IWS), pp. 1–3. IEEE, 2019.
[11] Ambacher, O., Foutz, B., Smart, J., Shealy, J. R., Weimann, N. G., Chu, K.,
Murphy, M., Sierakowski, A. J., Schaff, W. J., and Eastman, L. F.: Two-
dimensional electron gases induced by spontaneous and piezoelectric polarization in
undoped and doped AlGaN/GaN heterostructures. Journal of Applied Physics 87,
334–344 (2000).
[12] Ren, F., and Pearton, S. J. Semiconductor device-based sensors for gas, chemical,
and biomedical applications. Boca Raton, FL, USA: CRC Press, 2011.
[13] Sharma, N., Joshi, D., and Chaturvedi, N. An impact of bias and structure dependent
LSD variation on the performance of GaN HEMTs based biosensor. Journal of
Computational Electronics 13, (2), 503–508 (2014).
[14] Sharma, N., Mishra, S., Singh, K., Chaturvedi, N., Chauhan, A., Periasamy, C.,
Kharbanda, D. K., Prajapat, P., Khanna, P. K., and Chaturvedi, N. High resolution
AlGaN/GaN HEMT based electrochemical sensor for biomedical applications.
IEEE Transactions on Electron Devices 66, 1–7 (2019).
[15] Lalinský, T. et al. AlGaN/GaN based SAW-HEMT structures for chemical gas
sensors. Procedia Engineering 5, 152–155 (2010).
[16] Sharma, N., Dhakad, S. K., Periasamy, C., and Chaturvedi, N. Refined isolation
techniques for GaN-based high electron mobility transistors. Materials Science in
Semiconductor Processing 87, 195–201 (2018).
[17] Dhakad, S. K., Sharma, N., Periasamy, C., and Chaturvedi, N. Optimization of
ohmic contacts on thick and thin AlGaN/GaN HEMTs structures. Superlattices
Microstructures 111, 922–926 (2017).
[18] Osvald, J. Polarization effects and energy band diagram in AlGaN/GaN hetero
structure. Applied Physics A—Materials Science and Processing 87, 679 (2007).
[19] Gelmont, B., Kim, K. S., and Shur, M. Monte Carlo simulation of electron transport in gallium nitride. Journal of Applied Physics 74, 1818–1821 (1993).
[20] Pearton, S. J., Zolper, J. C., Shul, R. J., and Ren, F. GaN: Processing, defects, and
devices. Journal of Applied Physics 86, 1–78 (1999).
[21] Levinshtein, M., Rumyantsev, S., and Shur, M. Properties of advanced semiconductor materials. New York: Wiley, 2001.
[22] Vitanov, S., Palankovski, V., Maroldt, S., and Quay, R. High-temperature modeling of AlGaN/GaN HEMTs. Solid-State Electronics 54, 1105–1112 (2010).
[23] Luther, B. P., Wolter, S. D., and Mohney, S. E. High temperature Pt Schottky diode gas sensors on n-type GaN. Sensors and Actuators B 56, 164–168 (1999).
[24] Ryger, I., Vanko, G., Kunzo, P., Lalinsky, T., Vallo, M., Plecenik, A., Satrapinsky, L., and Plecenik, T. AlGaN/GaN HEMT based hydrogen sensors with gate absorption layers formed by high temperature oxidation. Procedia Engineering 47, 518–521 (2012).
[25] Lalinsky, T., Ryger, I., Vanko, G., Tomaska, M., Kostic, I., Hascik, S., and Vallo, M. AlGaN/GaN based SAW-HEMT structures for chemical gas sensors. Procedia Engineering 5, 152–155 (2010).
[26] Albrecht, J. D., Wang, R. P., and Ruden, P. P. Electron transport characteristics of
GaN for high temperature device modelling. Journal of Applied Physics 83,
4777–4781 (1998).
[27] Cordier, Y., Hugues, M., Lorenzini, P., Semond, F., Natali, F., and Massies, J.
Electron mobility and transfer characteristics in AlGaN/GaN HEMTs. Physica
Status Solidi (c) 2, 2720–2723 (2005).
[28] Turin, V. O., and Balandin, A. A. Electrothermal simulation of the self-heating
effects in GaN-based field-effect transistors. Journal of Applied Physics 100,
054501–054508 (2006).
[29] Islam, S. K., and Huq, H. F. Improved temperature model of AlGaN/GaN HEMT and device characteristics at variant temperature. International Journal of Electronics 94, 1099–1108 (2007).
[30] Sharma, Niketa, Periasamy, C., Chaturvedi, N., and Chaturvedi, N. Trapping effects
on leakage and current collapse in AlGaN/GaN HEMTs. Journal of Electronic
Materials, 60, 1–6 (2020).
[31] Sharma, N., and Chaturvedi, N. Design approach of traps affected source–gate regions in GaN HEMTs. IETE Technical Review 33, (1), 34–39 (2016).
[32] Galup-Montoro, C. MOSFET modeling for circuit analysis and design. Singapore:
World Scientific, 2007.
[33] Deng, W., Huang, J., Ma, X., and Liou, J. J. An explicit surface potential calculation
and compact current model for AlGaN/GaN HEMTs. IEEE Electron Device Letters
36, (2), 108–110 (2015).
[34] Sharma, N., Joshi, D., and Chaturvedi, N. An impact of bias and structure dependent
Lsd variation on the performance of GaN HEMTs based biosensor. Journal of
Computational Electronics, 13,(2), 503–508 (2014).
[35] Oishi, T., Otsuka, H., Yamanaka, K., Inoue, A., Hirano, Y., and Angelov, I. Semi-
physical nonlinear model for HEMTs with simple equations. In Integrated Nonlinear
Microwave and Millimeter- Wave Circuits (INMMIC), pp. 20–23. IEEE, 2010.
[36] Sang, L., and Schutt-Aine, J. An improved nonlinear current model for GaN HEMT
high power amplifier with large gate periphery. Journal of Electromagnetic Waves
and Applications. 26, (2–3), 284–293 (2012).
[37] Linsheng, L. An improved nonlinear model of HEMTs with independent transconductance tail-off fitting. Journal of Semiconductors 32, (2), 024004–024006 (2011).
[38] Gunn, S. R. Support vector machines for classification and regression. ISIS
Technical Report 14, 85–86 (1998).
[39] Huque, M., Eliza, S., Rahman, T., Huq, H., and Islam, S.: Temperature dependent
analytical model for current–voltage characteristics of AlGaN/GaN power HEMT.
Solid-State Electronics 53, (3), 341–348 (2009).
[40] Chang, Y., Tong, K., and Surya, C. Numerical simulation of current–voltage
characteristics of AlGaN/GaN HEMTs at high temperatures. Semiconductor Science
and Technology 20, (2), 188–192 (2005).
[41] Breiman, L. Statistical modeling: The two cultures. Statistical Science 16, (3),
199–231 (2001).
[42] Marinković, Z. et al. Neural approach for temperature-dependent modeling of GaN
HEMTs. International Journal of Numerical Modelling: Electronic Networks,
Devices and Fields 28, (4), 359–370 (2015).
[43] Neudeck, P. G., Okojie, R. S., and Chen, L.-Y. High temperature electronics-a role for
wide bandgap semiconductors. Proceedings of the IEEE 90, (6), 1065–1076 (2002).
[44] Braha, D., and Shmilovici, A. On the use of decision tree induction for discovery of
interactions in a photolithographic process. IEEE Transactions on Semiconductor
Manufacturing 16, (4), 644–652 (2003).
[45] Quinlan, J. R. Induction of decision trees. Machine Learning 1, (1), 81–106 (1986).
[46] Witten, I. H., Frank, E., Hall, M. A., and Pal, C. J. Data mining: Practical machine
learning tools and techniques. Burlington, Massachusetts, United States: Morgan
Kaufmann, 2016.
[47] Mitchell, T. M. Machine learning. Burr Ridge, IL: McGraw Hill, 1997.
[48] Braha, D. (Ed.). Data mining for design and manufacturing: Methods and applications. Boston, MA: Kluwer Academic, 2001.
[49] Mitchell, T. M. Machine learning. New York: McGraw-Hill, 1997.
[50] Quinlan, J. R. Induction of decision trees. Machine Learning, 1, 81–106 (1986).
10 Securing IoT-Based
Microservices Using
Artificial Intelligence
Sushant Kumar1 and Saurabh Mukherjee2
1Asst Prof, Research Scholar, Banasthali University, Vanasthali, Rajasthan
2Prof, Banasthali University, Vanasthali, Rajasthan
CONTENTS
10.1 Introduction: Background and Driving Forces..........................................181
10.2 Previous Work............................................................................................182
10.3 Proposed Work ...........................................................................................183
10.4 Results......................................................................................................... 185
10.4.1 Components ..................................................................................186
10.4.2 Deployment and Testing ..............................................................189
10.5 Result and Discussion ................................................................................191
10.6 Conclusions................................................................................................. 191
References..............................................................................................................193
and technologies, perhaps somewhat more traditional, but equally proven, such as TLS [6] and MQTT [7], provided an appropriately structured environment.
The general objective was to provide the world of IoT with a safe architectural alternative adapted to new technological trends, one that can be used generically in a multitude of situations. More specifically, first, special relevance was given to generating an alternative for smart homes. For this reason, the methodology encourages a clear division of functions but takes care of fluid integration and the availability of tools for the creation of new utilities. Second, among the many existing security problems, those related to client authentication when calling multiple services and to confidentiality when transmitting information, especially over a compromised network, were addressed. Finally, we were aware of the hardware restrictions of many devices, especially sensors.
Although in most of the related works SSL/TLS is the preferred security and encryption mechanism, the proposal of Sharma et al. [22] is very interesting when analyzing the complications that may exist at a practical level when using TLS in IoT. Their work analyzes the possibility of using SSH and highlights the advantages provided by the data compression feature included in that protocol, which is especially advantageous when working over HTTP. Zarca et al. [23] contribute to the IoT environment with a model-driven approach and propose an OAuth-oriented model with a strong UML inclination.
This proposal, through transformations, can be adapted to a specific architecture, offering the possibility of customizing it to the required environment. Another interesting architectural and security proposal is the one presented by Yi et al. [24], where, instead of a traditional mechanism such as the SSL certification authorities, local certification authorities would be used, which would authenticate the IoT equipment more frequently but through a lighter process. A middle point between the two proposals just reviewed is that of Sood et al. [25], which uses traditional certificates but with authentication performed close by, at the node level. They emphasize that this mechanism could be complemented with an authorization mechanism, such as OAuth or similar.
Zhou et al. [26] also emphasize OAuth, but above all with the particularity of concentrating the security architecture on the gateway equipment, where the base station or sink node resides, which is in charge of the heavy processing of authenticating, authorizing, and establishing the links between clients and resources. This approach is very relevant when we consider how susceptible the edge devices linked to edge computing [27] are, through which an entire IoT system can be compromised. One of the key security concerns in IoT edge systems is usually related to the use of MQTT (or similar protocols), for which several works, such as that of Amato et al. [28], propose improvements over said protocol.
FIGURE 10.1 Components at the local or edge layer of the IoT system.
from a preprocessor when necessary. They are divided into local ones, located in the smart environment, such as light or temperature controllers. They can also be remote, such as those capable of sending instructions, probably over the network, to a distant computer while being controlled from the home or smart office, as when an SMS, email, or tweet must be sent.
Finally, this layer contemplates the preprocessor equipment, which can also be called edge processors or brokers. These capture raw information from the sensors and forward it to the centralized broker or to an actuator when the sensor cannot do so itself. The information can be sent as received from the sensor, or it can be preprocessed and the result sent. These computers can be Raspberry Pi boards or small computers such as tablets. In the centralized layer, presented in Figure 10.2, the equipment in charge of the general coordination of all the components is considered, and it is considered that three
10.4 RESULTS
The resulting architectural design took into consideration, above all, the need to secure the exchange of information among all the components of the system, while trying to keep computation fast at all times. These requirements must reconcile characteristics that are often incompatible. For example, more robust cryptography systems may require more computing power than many lightweight devices, such as sensors, can provide. The final architecture designed, implemented, and tested is the one outlined in Figure 10.3, which is described in detail below. First, the components involved are specified, and then the security functionality in general is presented.
10.4.1 COMPONENTS
In this section we define, in a general way, the types of components or equipment involved in the architecture proposed in Figure 10.3. These are basically the equipment in charge of the registry service (Registry), the equipment providing authentication services for clients and users (UAA), and then, broadly, all the equipment providing general services as well as all the client equipment (Services and Clients). Although for simplicity the components will be referred to in the singular, the architecture considers that each category or type of component, especially services, can work in clusters.
The main task of the REG is to allow services to register through their IP and aliases (service names) and thus make themselves available to clients, who connect to the REG to request the information with which they will finally connect to the services of interest.
The REG also provides a load-balancing service by detecting when a service is registered as a cluster (multiple computers with the same service). The first point of contact for all other components of the system, whether services or clients, is the REG, which requires a static IP; all the other components of the system, however, can work with dynamic IPs, through a DNS.
The UAA takes its acronym from “user authentication and authorization”. Within our architecture, it basically provides the authentication service, which works under OAuth2. The UAA stores the data of all clients in the system, including their roles. With this information, the services may or may not authorize a given user to use certain elements. Any component can connect to the UAA to request an access token using its client credentials (user and password). In the same way, any component of the system can ask the UAA to validate a token received from a third party.
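As an illustration of these two interactions, a minimal sketch follows, assuming an OAuth2 client-credentials grant and a hypothetical UAA endpoint layout (/oauth/token and /oauth/check_token); the host name, port, and file names are placeholders, not the reference implementation.

import requests

UAA = "https://uaa.local:8443"
CA_BUNDLE = "ca.pem"  # public certificate of the certification authority

def get_token(client_id, client_secret):
    # OAuth2 client-credentials grant against the UAA
    r = requests.post(f"{UAA}/oauth/token",
                      data={"grant_type": "client_credentials"},
                      auth=(client_id, client_secret),
                      verify=CA_BUNDLE)
    r.raise_for_status()
    return r.json()["access_token"]

def validate_token(token):
    # Ask the UAA whether a token presented by a third party is valid
    r = requests.post(f"{UAA}/oauth/check_token",
                      data={"token": token}, verify=CA_BUNDLE)
    return r.status_code == 200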
The last category of components accommodates all other services and all clients. In general, these components interact with each other after having registered/authenticated in the system with the help of the REG and the UAA. The services can be very diverse, and it is up to the system administrator to decide which ones are required. However, in our architecture for IoT, some are fundamental; these have been implemented in the test system and are described below.
To allow interconnectivity and, at the same time, reduce its complexity, a messaging service was implemented, which in the methodology is represented by the central broker (Figures 10.1 and 10.2).
The broker is able to receive and distribute all the messages circulating in the system; it basically allows all services and clients to establish a single connection with the broker, through which they deposit messages and retrieve them from one or more queues. This broker can work with any communication protocol, or a combination of them. However, since the most widespread protocol in the world of IoT, at least today, is the Message Queuing Telemetry Transport (MQTT), it is the one used in the implementation presented here. Another service implemented for the proof of concept of the architecture is the one related to persistence, as a necessary support for subsequently implementing batch processing.
For this, a transit service was implemented that takes the information from the broker and transfers it to a Hadoop cluster [29], where different types of tools from that ecosystem can be used to process the information. One of the cases worked on, given the nature of IoT information, especially that coming from the sensors, was time series. For this, two data series services were built, providing graphing and trend analysis, among other functions. Regarding clients, these comprise all the sensors, which provide information to the system; the actuators, which react with the environment thanks to the information from the system; and all those devices, mobile or desktop, that allow the user to configure the system, collect processed information, or even act as sensors and actuators themselves.
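A minimal client-side sketch of a sensor publishing to the central broker over MQTT with one-way TLS follows, assuming the paho-mqtt 1.x Python API; the broker address, topic, credentials, and file names are placeholder assumptions.

import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="sensor-temp-01")
client.tls_set(ca_certs="ca.pem")          # validate the broker's certificate
client.username_pw_set("sensor-temp-01", "access-token")  # token as credential
client.connect("broker.local", 8883)       # 8883 is the usual MQTT-over-TLS port
client.publish("home/livingroom/temperature", payload="23.5", qos=1)
client.disconnect()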
d. Security Schemes
The system's security architecture comprises three fundamental scenarios: the basic one, to which all elements must adhere in their transactions unless otherwise specified; the lightweight scheme, generally used only when starting a worker process on the system; and the strengthened one, for relationships of trust between services.
e. Basic Scheme
This is the default scheme that the system components use in their transactions. This scheme is represented in Figure 10.3 by the dotted line that encompasses the system, and it uses a combination of one-way TLS plus OAuth2. Every service must provide its public security certificate (PKI) to clients, who can then validate it with the certificate authority (CA). Likewise, every client must provide the services with an OAuth access token so that they can validate it with the authentication service. The use of TLS, in our scheme, is especially necessary to encrypt the content of the information being transmitted. It is used only on the services side to limit as much as possible the overhead it would imply, above all at the administration level (but also in resources and processing), if it were used in all the components. The security gap this leaves is compensated by the use of OAuth2, through which the clients are in turn validated by the services.
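A minimal sketch of a client-side call under this basic scheme follows; the service URL, CA bundle path, and token source are placeholder assumptions.

import requests

def call_service(url, access_token, ca_bundle="ca.pem"):
    # One-way TLS: the client validates the service's certificate via the CA.
    # OAuth2: the client presents a bearer token for the service to validate.
    r = requests.get(url,
                     headers={"Authorization": f"Bearer {access_token}"},
                     verify=ca_bundle)
    r.raise_for_status()
    return r.json()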
f. Lightweight Scheme
g. Strengthened Scheme
Similar to the problem addressed in the lightweight scheme, sometimes two services require interconnection, but at least one of them (the one acting as a client) is unable to obtain its access token. When dealing with services, it is not convenient to open an insecure channel as is done in the lightweight scheme. In order to maintain the security standard, it was therefore decided to implement a two-way TLS scheme, which is possible without incurring greater overhead since, being services, they already have their PKI anyway. Additionally, the services are in general executed on equipment with greater processing capacity. This implementation also requires a dedicated channel to execute this type of validation; the example is given by the communication between the REG and the UAA. The UAA is the one that provides the access tokens and would therefore have to validate itself, which would create a security hole. The REG, then, opens a dedicated channel through which a UAA service can register at all times. When the UAA connects to the REG, they exchange their respective PKIs, mutually validating each other over TLS, without reducing the system's security standard.
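A minimal sketch of this two-way (mutual) TLS exchange between the REG and the UAA, using Python's standard ssl module; the certificate paths, addresses, and payload are assumptions made for illustration.

import socket, ssl

# REG side: a server context that requires the peer (the UAA) to present
# a certificate signed by the common CA.
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.load_cert_chain(certfile="reg.pem", keyfile="reg.key")
server_ctx.load_verify_locations(cafile="ca.pem")
server_ctx.verify_mode = ssl.CERT_REQUIRED   # demand a client certificate

# UAA side: a client context that presents its own certificate while
# validating the REG's certificate against the same CA.
client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
client_ctx.load_cert_chain(certfile="uaa.pem", keyfile="uaa.key")
client_ctx.load_verify_locations(cafile="ca.pem")

with socket.create_connection(("reg.local", 9443)) as sock:
    with client_ctx.wrap_socket(sock, server_hostname="reg.local") as tls:
        tls.sendall(b"REGISTER uaa 10.0.0.5\n")  # illustrative payload only

Because both sides already possess PKI material, this mutual validation adds little overhead relative to the one-way TLS of the basic scheme.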
h. Functionality
Returning to Figure 10.3, the dotted line represents the scope of the basic security scheme, which encompasses the entire system. Internally, the circled numbers 1 (monitoring system), 2 (IoT security controller), and 3 (security audit log) indicate the recommended starting order to guarantee the fluidity of the service. In practice, the services at least have in their base library the functionality to retry the connection when this starting order is not respected; however, this can cause unnecessary delay.
In the first place, the registration server, REG, is started, which provides a central access point for acquiring contact information for the other services: every service registers in the REG its respective IP and its alias (service name), and every client searches here, by alias, for the IP of the service it needs in order to connect to it. The REG offers three access points, each of which handles a different security mode: the first, lightweight, allows any client to obtain the UAA's IP without any additional security; the second mode, strengthened, allows the connection of the UAA using two-way TLS; the last, basic one, requires OAuth2 and allows clients to request information about services and allows services to record their contact information.
Second, an authentication server (UAA) is started, which provides OAuth2 credentials to clients. The strengthened security mechanism, with two-way TLS validation, is used between the UAA and the REG. The UAA connects to the REG, like any other service, to give its IP and alias and thus be available to the entire system. Once these two services, REG and UAA, are online, all other components, services, and clients can start their work.
Finally, then, as point 3, any other component, be it a service or a client, proceeds as follows: first, using the lightweight security scheme, it connects to the REG to request the IP of the UAA. It then establishes the connection with the UAA and requests an access token, using its client credentials. With the access token in hand, the basic security scheme can be used: services register with the REG, delivering IP and alias, and then wait for client requests or act as clients of other services as needed. In the case of a client, the next step is to use a service, where the basic security scheme applies; it connects to the REG, requests by alias the IP of the service of interest, and then connects to said service.
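A minimal sketch of this start-up sequence for a generic service follows; the URLs, endpoint paths, ports, and parameter names are placeholder assumptions, not the reference implementation.

import requests

REG_LIGHT = "http://reg.local:8080"   # lightweight access point (no token)
REG = "https://reg.local:8443"        # basic-scheme access point
CA = "ca.pem"

# 1. Lightweight scheme: any component may ask the REG for the UAA's IP.
uaa_ip = requests.get(f"{REG_LIGHT}/lookup/uaa").json()["ip"]

# 2. Request an OAuth2 access token from the UAA with client credentials.
token = requests.post(f"https://{uaa_ip}/oauth/token",
                      data={"grant_type": "client_credentials"},
                      auth=("svc-timeseries", "secret"),
                      verify=CA).json()["access_token"]

# 3. Basic scheme from here on: a service registers its alias and IP.
requests.post(f"{REG}/register",
              json={"alias": "timeseries", "ip": "10.0.0.9"},
              headers={"Authorization": f"Bearer {token}"},
              verify=CA)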
FIGURE 10.4 ESP8266 monitoring noise level with a KY038; a MKR1010 monitoring
room temperature with a DHT11; an N9005 injecting random messages; and a laptop
monitoring all services.
10.6 CONCLUSIONS
This work was motivated by a pressing current need: to secure IoT systems, especially those linked to a home context, and to do so without impairing the user's freedom to collect information and to modify the configuration of their system. This need, as it turned out, stems among other things from the relative informality of the IoT, especially in smart home environments. The work began with a methodological conception that stratifies the environment into layers: the one most closely related to interaction with the space, collecting information and executing actions to modify the microenvironment, and the layer of centralized processing and analysis of the information. Both layers were interrelated through a centralized connection that unifies them while maintaining their light coupling. This methodology was aimed at allowing an implementation based on microservices, where, in addition to avoiding any type of monolithic structure as much as possible, a group of services and client libraries was provided that greatly facilitates the creation of new utilities and components. The tests carried out with the services, clients, and sensors generated under this infrastructure confirmed both
FIGURE 10.5 Wireshark trace that captures the packets and verifies that, in transit, they are
encrypted.
the robustness and the relative ease of use of the components. The main point is security. Although it was not the easiest to implement, once the three schemes defined for the different types of connections were debugged, the approach proved to be a robust choice that withstood tests of improper access. It should be noted, however, that a field-specific, methodical testing scheme remains to be designed and implemented, which will be the subject of future work. We can, however, affirm that the combination of TLS, OAuth2, and MQTT produced the expected results to a large extent.
As main contributions, we have the implementation of a solid secure architecture
of IoT for the home, the stable and fluid combination of at least three high-level
technologies for security management, and a functional reference implementation
that can be made publicly available for free use.
REFERENCES
[1] W. Iqbal, H. Abbas, M. Daneshmand, B. Rauf and Y. A. Bangash, “An In-Depth
Analysis of IoT Security Requirements, Challenges, and Their Countermeasures via
Software-Defined Security,” in IEEE Internet of Things Journal, vol. 7, no. 10,
pp. 10250–10276, Oct. 2020. doi: 10.1109/JIOT.2020.2997651
[2] N. Neshenko, E. Bou-Harb, J. Crichigno, G. Kaddoum and N. Ghani, “Demystifying
IoT Security: An Exhaustive Survey on IoT Vulnerabilities and a First Empirical Look
on Internet-Scale IoT Exploitations,” in IEEE Communications Surveys & Tutorials,
vol. 21, no. 3, pp. 2702–2733, Thirdquarter 2019. doi: 10.1109/COMST.2019.2910750
[3] D. Shin, K. Yun, J. Kim, P. V. Astillo, J. Kim and I. You, “A Security Protocol for
Route Optimization in DMM-Based Smart Home IoT Networks,” in IEEE Access,
vol. 7, pp. 142531–142550, 2019. doi: 10.1109/ACCESS.2019.2943929
[4] F. Meneghello, M. Calore, D. Zucchetto, M. Polese and A. Zanella, “IoT: Internet of
Threats? A Survey of Practical Security Vulnerabilities in Real IoT Devices,” in
IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8182–8201, Oct. 2019. doi: 10.1109/JIOT.2019.2935189
[5] V. Hassija, V. Chamola, V. Saxena, D. Jain, P. Goyal and B. Sikdar, “A Survey on
IoT Security: Application Areas, Security Threats, and Solution Architectures,” in
IEEE Access, vol. 7, pp. 82721–82743, 2019. doi: 10.1109/ACCESS.2019.2924045
[6] S. Siboni et al., “Security Testbed for Internet-of-Things Devices,” in IEEE
Transactions on Reliability, vol. 68, no. 1, pp. 23–44, March 2019. doi: 10.1109/TR.2018.2864536
[7] M. A. Al-Garadi, A. Mohamed, A. K. Al-Ali, X. Du, I. Ali and M. Guizani, “A
Survey of Machine and Deep Learning Methods for Internet of Things (IoT)
Security,” in IEEE Communications Surveys & Tutorials, vol. 22, no. 3,
pp. 1646–1685, Thirdquarter 2020. doi: 10.1109/COMST.2020.2988293
[8] C. Choi and J. Choi, “Ontology-Based Security Context Reasoning for Power IoT-
Cloud Security Service,” in IEEE Access, vol. 7, pp. 110510–110517, 2019. doi:
10.1109/ACCESS.2019.2933859
[9] M. G. Samaila, J. B. F. Sequeiros, T. Simões, M. M. Freire and P. R. M. Inácio,
“IoT-HarPSecA: A Framework and Roadmap for Secure Design and Development
of Devices and Applications in the IoT Space,” in IEEE Access, vol. 8,
pp. 16462–16494, 2020. doi: 10.1109/ACCESS.2020.2965925
[10] F. Hussain, R. Hussain, S. A. Hassan and E. Hossain, “Machine Learning in IoT
Security: Current Solutions and Future Challenges,” in IEEE Communications
Surveys & Tutorials, vol. 22, no. 3, pp. 1686–1721, Thirdquarter 2020. doi: 10.1109/COMST.2020.2986444
[11] M. Frustaci, P. Pace, G. Aloi and G. Fortino, “Evaluating Critical Security Issues of
the IoT World: Present and Future Challenges,” in IEEE Internet of Things Journal,
vol. 5, no. 4, pp. 2483–2495, Aug. 2018. doi: 10.1109/JIOT.2017.2767291
[12] B. Liao, Y. Ali, S. Nazir, L. He and H. U. Khan, “Security Analysis of IoT Devices
by Using Mobile Computing: A Systematic Literature Review,” in IEEE Access,
vol. 8, pp. 120331–120350, 2020. doi: 10.1109/ACCESS.2020.3006358
[13] D. Wang, B. Bai, K. Lei, W. Zhao, Y. Yang and Z. Han, “Enhancing Information
Security via Physical Layer Approaches in Heterogeneous IoT With Multiple
Access Mobile Edge Computing in Smart City,” in IEEE Access, vol. 7,
pp. 54508–54521, 2019. doi: 10.1109/ACCESS.2019.2913438
[14] K. Lounis and M. Zulkernine, “Attacks and Defenses in Short-Range Wireless
Technologies for IoT,” in IEEE Access, vol. 8, pp. 88892–88932, 2020. doi: 10.1109/ACCESS.2020.2993553
[15] S. N. Swamy and S. R. Kota, “An Empirical Study on System Level Aspects of
Internet of Things (IoT),” in IEEE Access, vol. 8, pp. 188082–188134, 2020. doi:
10.1109/ACCESS.2020.3029847
[16] T. M. Fernández-Caramés, “From Pre-Quantum to Post-Quantum IoT Security: A
Survey on Quantum-Resistant Cryptosystems for the Internet of Things,” in IEEE
Internet of Things Journal, vol. 7, no. 7, pp. 6457–6480, July 2020. doi: 10.1109/
JIOT.2019.2958788
[17] J. Wang et al., “IoT-Praetor: Undesired Behaviors Detection for IoT Devices,” in
IEEE Internet of Things Journal, vol. 8, no. 2, pp. 927–940, Jan. 2021. doi: 10.1109/JIOT.2020.3010023
[18] X. Li, Q. Wang, X. Lan, X. Chen, N. Zhang and D. Chen, “Enhancing Cloud-Based
IoT Security Through Trustworthy Cloud Service: An Integration of Security and
Reputation Approach,” in IEEE Access, vol. 7, pp. 9368–9383, 2019. doi: 10.1109/
ACCESS.2018.2890432
[19] S. Malani, J. Srinivas, A. K. Das, K. Srinathan and M. Jo, “Certificate-Based
Anonymous Device Access Control Scheme for IoT Environment,” in IEEE
Internet of Things Journal, vol. 6, no. 6, pp. 9762–9773, Dec. 2019. doi: 10.1109/
JIOT.2019.2931372
[20] I. Farris, T. Taleb, Y. Khettab and J. Song, “A Survey on Emerging SDN and NFV
Security Mechanisms for IoT Systems,” in IEEE Communications Surveys &
Tutorials, vol. 21, no. 1, pp. 812–837, Firstquarter 2019. doi: 10.1109/COMST.2018.2862350
[21] M. Wazid, A. K. Das, V. Odelu, N. Kumar, M. Conti and M. Jo, “Design of Secure
User Authenticated Key Management Protocol for Generic IoT Networks,” in IEEE
Internet of Things Journal, vol. 5, no. 1, pp. 269–282, Feb. 2018. doi: 10.1109/
JIOT.2017.2780232
[22] V. Sharma, I. You, K. Andersson, F. Palmieri, M. H. Rehmani and J. Lim,
“Security, Privacy and Trust for Smart Mobile- Internet of Things (M-IoT): A
Survey,” in IEEE Access, vol. 8, pp. 167123–167163, 2020. doi: 10.1109/
ACCESS.2020.3022661
[23] A. M. Zarca, J. B. Bernabe, A. Skarmeta and J. M. Alcaraz Calero, “Virtual IoT
HoneyNets to Mitigate Cyberattacks in SDN/NFV-Enabled IoT Networks,” in IEEE
Journal on Selected Areas in Communications, vol. 38, no. 6, pp. 1262–1277, June
2020. doi: 10.1109/JSAC.2020.2986621
[24] M. Yi, X. Xu and L. Xu, “An Intelligent Communication Warning Vulnerability
Detection Algorithm Based on IoT Technology,” in IEEE Access, vol. 7,
pp. 164803–164814, 2019. doi: 10.1109/ACCESS.2019.2953075
[25] K. Sood, K. K. Karmakar, S. Yu, V. Varadharajan, S. R. Pokhrel and Y. Xiang,
“Alleviating Heterogeneity in SDN-IoT Networks to Maintain QoS and Enhance
Security,” in IEEE Internet of Things Journal, vol. 7, no. 7, pp. 5964–5975, July
2020. doi: 10.1109/JIOT.2019.2959025
[26] W. Zhou, Y. Jia, A. Peng, Y. Zhang and P. Liu, “The Effect of IoT New Features on
Security and Privacy: New Threats, Existing Solutions, and Challenges Yet to Be
Solved,” in IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1606–1616, April
2019. doi: 10.1109/JIOT.2018.2847733
[27] C.-S. Park and H.-M. Nam, “Security Architecture and Protocols for Secure MQTT-
SN,” in IEEE Access, vol. 8, pp. 226422–226436, 2020. doi: 10.1109/ACCESS.2020.3045441
[28] F. Amato, V. Casola, G. Cozzolino, A. De Benedictis and F. Moscato, “Exploiting
Workflow Languages and Semantics for Validation of Security Policies in IoT
Composite Services,” in IEEE Internet of Things Journal, vol. 7, no. 5,
pp. 4655–4665, May 2020. doi: 10.1109/JIOT.2019.2960316
[29] S. Sathyadevan, K. Achuthan, R. Doss and L. Pan, “Protean Authentication Scheme –
A Time-Bound Dynamic KeyGen Authentication Technique for IoT Edge Nodes in
Outdoor Deployments,” in IEEE Access, vol. 7, pp. 92419–92435, 2019. doi: 10.1109/
ACCESS.2019.2927818
[30] M. Oh, S. Lee, Y. Kang and D. Choi, “Wireless Transceiver Aided Run-Time Secret
Key Extraction for IoT Device Security,” in IEEE Transactions on Consumer
Electronics, vol. 66, no. 1, pp. 11–21, Feb. 2020. doi: 10.1109/TCE.2019.2959593
[31] S. Pérez, J. L. Hernández-Ramos, S. Raza and A. Skarmeta, “Application Layer Key
Establishment for End-to-End Security in IoT,” in IEEE Internet of Things Journal,
vol. 7, no. 3, pp. 2117–2128, March 2020. doi: 10.1109/JIOT.2019.2959428
[32] S. Mandal, B. Bera, A. K. Sutrala, A. K. Das, K. R. Choo and Y. Park,
“Certificateless-Signcryption-Based Three-Factor User Access Control Scheme for
IoT Environment,” in IEEE Internet of Things Journal, vol. 7, no. 4, pp. 3184–3197,
April 2020. doi: 10.1109/JIOT.2020.2966242
[33] G. George and S. M. Thampi, “A Graph-Based Security Framework for Securing
Industrial IoT Networks From Vulnerability Exploitations,” in IEEE Access, vol. 6,
pp. 43586–43601, 2018. doi: 10.1109/ACCESS.2018.2863244
[34] R. Sairam, S. S. Bhunia, V. Thangavelu and M. Gurusamy, “NETRA: Enhancing
IoT Security Using NFV-Based Edge Traffic Analysis,” in IEEE Sensors Journal,
vol. 19, no. 12, pp. 4660–4671, June 2019. doi: 10.1109/JSEN.2019.2900097
[35] E. Dushku, M. M. Rabbani, M. Conti, L. V. Mancini and S. Ranise, “SARA: Secure
Asynchronous Remote Attestation for IoT Systems,” in IEEE Transactions on
Information Forensics and Security, vol. 15, pp. 3123–3136, 2020. doi: 10.1109/
TIFS.2020.2983282
[36] N. Ghosh, S. Chandra, V. Sachidananda and Y. Elovici, “SoftAuthZ: A Context-
Aware, Behavior-Based Authorization Framework for Home IoT,” in IEEE Internet
of Things Journal, vol. 6, no. 6, pp. 10773–10785, Dec. 2019. doi: 10.1109/JIOT.2019.2941767
[37] Z. Deng, Q. Li, Q. Zhang, L. Yang and J. Qin, “Beamforming Design for Physical
Layer Security in a Two-Way Cognitive Radio IoT Network With SWIPT,” in IEEE
Internet of Things Journal, vol. 6, no. 6, pp. 10786–10798, Dec. 2019. doi: 10.1109/
JIOT.2019.2941873
11 Applications of the
Approximate Computing
on ML Architecture
Kattekola Naresh1 and Shubhankar Majumdar2
1ECE Department, VNR VJIET, Hyderabad, India
2ECE Department, NIT Meghalaya, Shillong, India
CONTENTS
11.1 Approximate Computing ............................................................................198
11.1.1 Introduction...................................................................................198
11.1.2 Approximation ..............................................................................199
11.1.3 Strategies of Approximation Computing .....................................200
11.1.4 What to Approximate ...................................................................201
11.1.5 Error Analysis in Approximate Computing.................................202
11.2 Machine Learning.......................................................................................203
11.2.1 Introduction...................................................................................203
11.2.2 Neural Networks........................................................................... 203
11.2.2.1 Architecture..................................................................204
11.2.2.2 Abilities and Disabilities .............................................204
11.2.3 Machine Learning vs. Neural Network .......................................204
11.2.4 Classifications of Neural Networks in Machine Learning ..........205
11.2.4.1 Artificial Neural Network (ANN) ...............................206
11.2.4.2 Convolution Neural Network (CNN)..........................209
11.2.5 Novel Algorithm in ANN ............................................................209
11.2.5.1 Introduction ..................................................................209
11.2.5.2 Weights of Neurons .....................................................209
11.2.5.3 Weight vs. Bias............................................................210
11.2.5.4 Neuron (Node) .............................................................210
11.3 Approximate Machine Learning Algorithms.............................................211
11.3.1 Introduction...................................................................................211
11.3.2 Approximate Computing Techniques ..........................................212
11.3.3 Approximate Algorithms for Machine Learning.........................213
11.3.4 Results and Analysis ....................................................................213
11.4 Case Study 1: Energy-Efficient ANN Using
Alphabet Set Multiplier..............................................................................214
11.4.1 Introduction...................................................................................214
11.4.2 8-bit 4 Alphabet ASM..................................................................216
11.4.3 Four Alphabet ASMs Using CSHM Architecture.......................217
DOI: 10.1201/9781003201038-11
11.1.2 APPROXIMATION
The major concept of approximate computation is simple: we intentionally decrease accuracy to conserve energy, time, and/or memory. This is a paradigm that treats loss of accuracy as an opportunity, not a loss. In recent years, various approximation methods have emerged that explore possibilities including unreliable hardware, NN accelerators, and numerical approximations. The broad challenge is to convey this important concept to programmers in a simple way while still guaranteeing a specified level of accuracy.
Approximate computation differs from related concepts such as probabilistic computing: approximate computation does not include presuppositions about the probabilistic character of the processes driving the system. Rather, the computation uses statistical characteristics of algorithms and data to trade accuracy for energy and power conservation.
Approximate computation is often used along with NN accelerators, and both are often applied to error-tolerant applications. Several estimation methods have been proposed in the literature. NNs exhibit parallelism and can be accelerated by special hardware [1]. Various quality measures exist, such as pixel differences in an image, categorization of data and clusters, and ranking accuracy, and these can tolerate inexact computations. Other quality measures include the overall image quality index and validation. For many applications, several performance measures can be utilized to quantify the quality reduction; for example, k-means clustering accuracy and average centroid distance are used as performance metrics [2]. Other areas include image processing, face detection, and search engines. Approximate computing also applies to a variety of devices and components, from analytic models, CPUs and GPUs, simulators, and inexact computation techniques down to SRAM cells and cache memory.
In some scenarios, the use of AC is unavoidable: either the opportunity for approximation is inherent to the application, or AC can be applied very aggressively to maximize efficiency [2].
The incentives and opportunities for AC are as follows:
To address the full potential of approximate computing, some of the challenges are
as follows:
There can be many other streams that can use approximate computing, and they can be studied through experiments and research work. For example, if a circuit needs to be optimized, its Boolean expression must be optimized. Here, the K-map of the circuit is folded to cover the maximum number of literals, in such a way that the introduced errors do not unduly affect the functionality of the circuit. Error analysis and the design requirements are the major concerns of approximate computing.
For example, the K-maps for accurate and approximate 2-bit multipliers can be written as shown in Figure 11.1.
FIGURE 11.1 K-Map for 2-bit multiplier for accurate and approximate design [Source: V.
Mrazek 2018].
From Figure 11.1, the implementation of the accurate 2-bit multiplier needs 4 partial products, whereas the approximate design can be optimized to 3 partial products, at the cost of an error in 1 of the 16 input combinations, where the output is replaced by the nearest value. Hence, the gate-level design can be optimized to a major extent with one erroneous position.
For such an inexact design, a parameter is needed to evaluate the inexactness with respect to the exact output; the error distance has been proposed as a figure of merit for inexact computing. For a specified input, the error distance (ED) is defined as the arithmetic difference between the exact result (E) and the inexact result (I) [3–8]:

ED(E, I) = |E − I| = |Σi E[i]·2^i − Σj I[j]·2^j|   (11.1)

Here, i and j are the indices of the bits in E and I, respectively.
The normalized mean error distance (NMED) is the mean error distance (MED), averaged over all inputs, normalized by the maximum output magnitude:

NMED = MED / Smax   (11.5)

where Smax is the maximum magnitude of the output value of the precise adder.
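Putting Eqs. (11.1) and (11.5) together with the 2-bit multiplier of Figure 11.1, a minimal sketch follows, assuming the classic approximate 2×2 design whose only modified truth-table entry is 3 × 3 → 7, and applying the NMED metric to the multiplier rather than an adder.

def approx_mul2(a, b):
    # Approximate 2-bit multiplier: 3 x 3 yields 7 (nearest value to 9
    # expressible with 3 partial products); all other entries are exact.
    return 7 if (a, b) == (3, 3) else a * b

eds = [abs(a * b - approx_mul2(a, b)) for a in range(4) for b in range(4)]
med = sum(eds) / len(eds)          # mean error distance over all 16 inputs
smax = max(a * b for a in range(4) for b in range(4))  # 9 for the 2-bit case
print(med, med / smax)             # MED = 0.125, NMED ~ 0.0139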
11.2.2.1 Architecture
An NN's design is often referred to as its “architecture” or “configuration”. It consists of the basic arrangement of the layers and the count of units per layer, and it comprises the pattern of interconnections whose weights are varied during training. The choice of architecture determines the results that can be obtained; it is the most important aspect of an NN's implementation.
The simplest design is one in which two layers, input and output, are divided into units. Each unit in the input layer holds an input value and an output value equal to that input. Through a combination function and a transfer function, all input-layer units are connected to the inputs of the output unit. There can be more than one output unit. In such a situation, the network performs logistic or linear regression, depending on whether the transfer function is logistic or linear; the regression coefficients are the network weights.
Adding one or more hidden layers between the input and output layers, with nodes at each hidden layer, improves the predictive power of the NN. However, no more hidden layers than necessary should be used. This allows the NN to generalize without storing all the data from the training set, preventing overfitting.
learning might develop from fundamentals, it may require some human intervention during the early stages.
o As the nested layers traverse data along hierarchies of various abstractions, NNs do not really need human involvement, which ultimately enables them to learn from their own failures.
• It is possible to classify machine learning models into two types: supervised and unsupervised learning modules. NNs, though, can be classified into recurrent, feedforward, modular, and co-evolutionary networks.
• In a simple way, an ML model works as follows: it is fed data and learns from it. As it constantly learns from the results, the ML model becomes more sophisticated and better trained. The configuration of an NN, on the other hand, is more complex. In it, the data flows through multiple layers of interconnected nodes, where each node classifies the characteristics and data of the previous layer before transmitting the results to nodes in subsequent layers.
• Machine learning models are adaptive; they continuously evolve by learning from new sample data and interactions. The models can thus identify the trends in the data. In an ML model, data is the only input layer. Even in a basic NN model, however, there are several layers.
o The first layer is the input layer, followed by a hidden layer, and then eventually the output layer. Each layer contains one or more neurons. The analytical and problem-solving ability of an NN model can be improved by raising the number of hidden layers.
• Probability, analytics, and programming, Hadoop and Big Data, and knowledge of ML architectures, algorithms, and data structures are skills needed for machine learning. NNs require skills such as data modeling, geometry and graph theory, linear algebra, programming, statistics, and probability.
• Machine learning is applied in fields such as hospitals, banking, e-commerce (recommendation engines), financial services and insurance, self-driving vehicles, online video streaming, IoT, and transportation and logistics, to name a few. On the other hand, NNs have been used to address various market problems, including, among other things, revenue forecasting, data analysis, consumer studies, risk assessment, voice recognition, and character recognition.
• There are a few key differences between machine learning and NNs. NNs are basically part of deep learning, which is a branch of machine learning. NNs are, however, a more sophisticated implementation of machine learning that is seeing applications in a wide variety of areas of interest.
Here, the classifications of NN considered are CNN and ANN. These forms of NNs are changing the way we communicate with the world; they are at the center of the deep learning revolution and power technologies such as self-driving vehicles, voice recognition, and unmanned aerial vehicles.
• Table format
• Visual format
• Words and text format
ANNs are often arranged in layers. The layers are made up of several interconnected “nodes” containing “activation functions”. An NN can be organized into three layers:
a. Input Layer
Each input-layer node receives the value of one descriptive attribute for every record to be processed. The number of nodes in the input layer is usually equal to the number of independent variables. The input layer interfaces the network with the hidden layers, but its nodes are passive: they do not change the information. Each node receives a single value at its input and duplicates it to its multiple outputs, sending a copy to some or all of the hidden nodes.
b. Hidden Layer
Hidden layers transform the input values of the system. Each hidden node receives incoming arcs from input nodes or from other hidden nodes, and connects its outgoing arcs to output nodes or other hidden nodes. In the hidden layer, the transformation is carried out through a network of weighted “connections”; the network can have more than one hidden layer if required. Values entering a hidden node are multiplied by their respective weights, a set of predetermined numbers stored in the program, and the weighted inputs are then summed to obtain a single value.
c. Output Layer
The output layer receives its connections from the hidden layers or directly from the input layer, and produces the value corresponding to the response variable being predicted. Classification problems usually have only a single output-layer node. The active nodes of the output layer combine and weight the incoming information to produce the output values.
The NN's ability to perform useful data manipulation depends on choosing the right weights, which is quite different from conventional information processing. The ANN is considered a simple mathematical model that improves existing data analysis techniques. Although it does not match the functioning of neurons in the mammalian brain, it remains an important component of artificial intelligence.
weight value, thereby adding a bias before passing the data on to the next layer. The NN's final layer is known as the output layer. To generate outputs in a specified range, the output layer often tunes the inputs received from the hidden layers.
Figure 11.4 shows the ANN process for a single neuron.
Z = x1·w1 + x2·w2 + … + xn·wn + b   (11.7)
FIGURE 11.5 Operations at one neuron of a neural network [Source: Miroslav Kubat 2017].
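A minimal sketch of Eq. (11.7) for one neuron follows: a weighted sum plus bias, followed by an activation function. The sigmoid is one common choice, and the numeric values are purely illustrative.

import math

def neuron(x, w, b):
    # Z = x1*w1 + x2*w2 + ... + xn*wn + b, per Eq. (11.7)
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

print(neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.4], b=0.1))  # ~0.378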
TABLE 11.1
Implementation Results: Approximate Computing Algorithm vs. Classification Problem (Touch, Mobility, Image)
shown in Table 11.1. In the two categorization problems with tolerable precision loss, the proposed approximate SVM classifiers and KNN (K = 3) demonstrated their advantages. For example, recognition of touch modality is sped up by a factor of 1.7 with an accuracy loss of less than 3%, and a speedup of 3.2 is achieved for image classification with an accuracy loss of less than 5% [12] (Table 11.1).
loss is suffered; however, to preserve the quality of the output, the NN is retrained with constraints in place.
The conventional multiplier in the artificial neurons can be replaced by an ASM to reduce energy intake and gain further benefits, namely an increase in processing speed and a reduction of area [15]. Finally, we consider a more aggressive neuron design that does not contain any pre-compute bank: a multiplier-less neuron, yielding large improvements in energy usage with little accuracy degradation [16].
The primary operation of these ANNs comprises two phases: training and testing. The training phase is normally done offline, so it is not a power-consumption concern. The trained ANN is then used to test arbitrary data records, which is performed on chip. For huge networks with millions of neurons, the testing phase, even though less computationally intensive than training, may still require substantial computation. The testing phase is essentially forward propagation, which comprises multiplication, summation, and activation operations. The most power-consuming of these is multiplication, which far outweighs activation and summation. Hence, the fundamental focus is to alleviate this problem by presenting a solution that is energy efficient. In this work, we first replace the conventional multiplier inside the neurons with the approximate ASM; eventually, we produce an artificial neuron without any multiplier block. Note that introducing the approximate multiplier into an NN may lead to a minimal reduction of accuracy while achieving a sizeable power reduction.
In the multiplication operation, the product is built from smaller bit sequences that are lower-order multiples of the input “I” (the multiplier input). The decomposition depends upon the multiplicand “W”, which in this case represents the synaptic weight. Table 11.2 shows the decomposition of two multiplication operations, W1 × I and W2 × I.
Note that if I, 3I, 5I, 7I, 9I, 11I, 13I, and 15I are pre-computed, the total multiplication is reduced to a small number of shifts and additions. These small bit sequences are called alphabets. In ASM [12–14], certain pre-specified alphabets are shifted and added instead of directly multiplying the multiplier and the multiplicand. These alphabets are collectively referred to as the alphabet set, which consists of lower-order multiples of the input, and a pre-compute bank is needed to generate them.
TABLE 11.2
Decomposition of the Multiplication Operation
Weights    Decomposition of Product
i. Generate the alphabets.
ii. Select the required alphabets.
iii. Shift the selected alphabets.
iv. Add the shifted alphabets.
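The four steps above can be sketched in Python as follows. This is a behavioral model only, assuming the alphabet set {1,3,5,7}; the function names and the nibble-wise decomposition of W are our illustrative choices, not the hardware's exact structure.

```python
ALPHABETS = (1, 3, 5, 7)

def quartet_decompose(q):
    """Express a 4-bit quartet q as (alphabet, shift), or None if the
    alphabet set cannot produce q by shifting alone."""
    if q == 0:
        return (0, 0)
    for a in ALPHABETS:
        v, s = a, 0
        while v <= q:
            if v == q:
                return (a, s)
            v, s = v << 1, s + 1
    return None

def asm_multiply(w, i):
    """Approximate ASM product W x I: select, shift, and add the
    pre-computed alphabet products a*I, quartet by quartet."""
    product, offset = 0, 0
    while w:
        dec = quartet_decompose(w & 0xF)     # one 4-bit quartet of W
        if dec is None:                      # unsupported quartet: handled by
            raise ValueError("unsupported")  # the constrained retraining below
        a, s = dec
        product += (a * i) << (s + offset)   # shift the alphabet product a*I
        w, offset = w >> 4, offset + 4
    return product

print(asm_multiply(0x35, 9))  # 0x35 = 53, so the exact product 53*9 = 477
```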
FIGURE 11.8 Four alphabet ASMs using CSHM architecture [Source: S. S. Sarwar 2016].
with a set of eight alphabets {1,3,5,7,9,11,13,15}, select, shift, and add operations are enough to generate any product. To obtain a significant performance improvement, we recommend using a minimum number of alphabets; all combinations then cannot be covered, resulting in a multiplicative approximation.
For example:
Consider 4 alphabets {1,3,5,7}. From these we can produce 12 of the 16 possible 4-bit combinations (inclusive of 0 (0000₂)) through shift operations (for example, from 1 (0001₂) we get 2 (0010₂), 4 (0100₂), and 8 (1000₂)). In this case, the unsupported quartet values are {9,11,13,15}. Therefore, since the alphabet set used does not support the LSB quartet 1001₂ (9₁₀), the product 01101001₂ × I cannot be generated with any combination of selection, shift, and addition. To overcome this problem, constrained training of ANNs is introduced such that the non-supported combinations do not occur. Also, the ANN application is error-resilient: we can use this method to obtain an appropriate set of weights while simultaneously imposing constraints on the network, causing minimal or no loss of network accuracy. Compared with the original training, the cost of retraining is small.
An algorithm to constrain the weights for a 12-bit ASM is explained below with an example:
The 12-bit synaptic weight is regarded as a series of three quartets P, Q, and R, where R is the LSB quartet and P is the MSB quartet. Figure 11.9 shows the 12-bit weight value divided into three quartets. Since the 2's complement binary number system is used, the first bit of P is the sign bit; we do not have to take the sign bit into consideration because only the absolute value is multiplied. Therefore, P has 8 combinations, from 0 (000₂) to 7 (111₂), and Q and R each have 16 combinations, from 0 (0000₂) to 15 (1111₂). If only 2 alphabets {1,3} are used, the maximum number of supported combinations out of 16 is only 8. In this case, 5 and 7 are not supported for P, while 5, 7, 9, 10, 11, 13, 14, and 15 are not supported for Q and R. Therefore, to minimize the loss of precision, we change these non-supported values to the closest supported values. Algorithm 1 is the weight-constraint mechanism for a 12-bit, 2-alphabet {1,3} multiplier.
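A minimal sketch of the weight-constraint idea behind Algorithm 1, assuming the 2-alphabet set {1,3}: each quartet of the 12-bit weight magnitude is clamped to the nearest value reachable from the alphabets by shifting. Ties are resolved toward the smaller value here; the book's Algorithm 1 may resolve them differently.

```python
SUPPORTED = (0, 1, 2, 3, 4, 6, 8, 12)   # values reachable from {1, 3} by shifts

def constrain_quartet(q):
    """Clamp an unsupported quartet to the closest supported value."""
    return min(SUPPORTED, key=lambda s: abs(s - q))

def constrain_weight(w12):
    """Constrain the quartets R (LSB), Q, and P (MSB, sign bit excluded)
    of a 12-bit weight magnitude."""
    r = constrain_quartet(w12 & 0xF)
    q = constrain_quartet((w12 >> 4) & 0xF)
    p = constrain_quartet((w12 >> 8) & 0x7)   # 3 value bits; sign handled separately
    return (p << 8) | (q << 4) | r

print(hex(constrain_weight(0x59B)))  # quartets 5, 9, 11 -> 4, 8, 12 -> 0x48c
```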
FIGURE 11.9 12-bit weight value decomposed into three quartets [Source: S. S.
Sarwar 2016].
FIGURE 11.10 Overview of the ANN design methodology [Source: S. S. Sarwar 2016].
training data set (TrData), test data set (data), and quality constraint (Q), which indicates the tolerable quality degradation in the implementation. Quality specifications are specific to each application.
To check the validity of our model, we use it in a face detection program, which detects whether a face is present in the input image data. Here, the number of final output neurons is just 2, with 1024 input neurons and 100 hidden-layer neurons. We first created 8-bit and 12-bit synaptic weights for the unconstrained (for conventional multipliers) and constrained (for ASM) conditions using the training data collection. Then, using the test data collection, we checked the network and obtained good results, with a maximum accuracy drop of 0.47%. Table 11.3 displays the results.
After this success, we used the MNIST dataset to solve a more complex “handwritten digit recognition” problem. As earlier, a similar method was used to generate the synaptic weights (here, the final number of output neurons is 10). Then, we used these synaptic weights to test the accuracy of the system in the designed processing engine.
The accuracy results are shown in Table 11.4.
TABLE 11.3
Accuracy Results of the NN for Face Detection
Width of Synapse    No. of Alphabets    Accuracy (%)    Accuracy Loss (%)
TABLE 11.4
Digit Recognition Accuracy Results of NN
Width of Synapse    No. of Alphabets    Accuracy (%)    Accuracy Loss (%)
TABLE 11.5
Benchmark Metrics Used
Application    Dataset    NN Model    Number of Layers    Number of Neurons    Number of Trainable Synapses
these tools. The NN is trained here using the corresponding training data set. Then, during retraining of the NN, a restriction on the weight updates is imposed for the minimum count of alphabets in the ASM-based neuron. The synaptic weights and test patterns from the trained NN are used as input to our processing engine. The processing engine is implemented in Verilog at the register transfer level (RTL) and synthesized with Synopsys Design Compiler Ultra using IBM 45 nm technology. This is used to compute the energy and area consumption under constant-speed conditions. Table 11.5 shows the layer configurations for the benchmarks used.
models and a design without the usage of a multiplier are provided, which can decrease the design complexity of a time-multiplexed ANN.
Considering that floating-point addition and multiplication operations occupy a larger area than their integer counterparts and require much more energy, the weights are converted to integers, the offset value is added, and the floating-point weight processing is carried out at startup. This transformation is achieved by multiplying each floating-point weight and offset by 2^q, where q represents the number of quantization bits, and then finding the lowest integer higher than the product or the integer nearest to it.
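The quantization step can be sketched as follows; q and the weight values are illustrative, and we round to the nearest integer, which is one reading of the "lowest number higher than the product or near value to it" criterion.

```python
def quantize(values, q):
    """Scale floats by 2**q and round to the nearest integer."""
    scale = 1 << q
    return [round(v * scale) for v in values]

weights = [0.742, -0.318, 1.005]
print(quantize(weights, q=8))  # [190, -81, 257]
```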
FIGURE 11.12 Neuron calculations at the kth layer using MAC blocks with ANN [Source:
M. Esmali Nojehdeh 2020].
weights of the input variables. If an ANN has λ layers with n_i neurons in layer i, where 1 ≤ i ≤ λ, the required number of MAC modules is Σ_{i=1}^{λ} n_i, the total count of neurons. The complexity of the MAC block and of the registers is determined by the output and input counts of each neuron and the weight values in each layer. The complexity of the control module is determined by the input count of each layer. The neuron calculations are obtained layer by layer: after the calculation of the previous layer is completed, the neuron calculation of the next layer starts. Therefore, sequencing can be achieved simply by providing an output-ready signal. After Σ_{i=1}^{λ} (I_i + 1) clock cycles, where I_i denotes the number of inputs of layer i and 1 ≤ i ≤ λ, the calculation of the entire ANN is obtained. The SMAC neuron architecture is shown in Figure 11.12.
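Under this cycle model, the latency is easy to tabulate. The sketch below uses a hypothetical 16-10-10 topology; the layer sizes are ours, not the chapter's.

```python
def smac_neuron_cycles(inputs_per_layer):
    """Total clock cycles: each layer with I_i inputs needs I_i + 1 cycles,
    since every neuron in a layer has its own MAC block working in parallel."""
    return sum(n_inputs + 1 for n_inputs in inputs_per_layer)

# Hypothetical 16-10-10 network: the hidden layer sees 16 inputs,
# the output layer sees the 10 hidden outputs.
print(smac_neuron_cycles([16, 10]))  # (16 + 1) + (10 + 1) = 28 cycles
```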
FIGURE 11.13 Designing an ANN by using a simple MAC block [Source: M. Esmali Nojehdeh 2020].
FIGURE 11.14 n-bit Ripple Carry Adder [Source: M. Esmali Nojehdeh 2020].
[Truth table comparing an exact full adder with approximate full-adder variants: for each input combination, the sum and carry bits of each approximate design are marked as matching (✓) or deviating (✗) from the exact outputs, together with the resulting error value.]
FIGURE 11.15 Exact 4-bit unsigned multiplier [Source: M. Esmali Nojehdeh 2020].
Along with the approximate multipliers, another method called LEBZAM [19] is also proposed, which is achieved by setting the r least significant output bits of the accurate multiplier to zero. Here, r represents the approximation level. Different algorithms have been proposed for the implementation of approximate multipliers and of the compressors used in multipliers [25–27].
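The LEBZAM idea reduces to masking low-order product bits, as the sketch below shows; the function name is ours, not from [19].

```python
def lebzam_multiply(a, b, r):
    """Exact multiply, then zero out the r least significant result bits."""
    return (a * b) >> r << r

print(lebzam_multiply(13, 11, r=3))  # exact 143 (0b10001111) -> 136 (0b10001000)
```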
The synthesis procedure is as follows:
Figure 11.16 shows the implementation of a 4-bit approximate multiplier where r = 3. Under the architecture introduced in Section 11.3, by utilizing approximate multipliers of various widths and approximation levels in the MAC modules of the ANN, the trade-off between hardware cost and accuracy can be explored, leading to a greatly reduced hardware complexity of the ANN. Likewise, by utilizing approximate adders, the hardware complexity of the ANN can be reduced further [19].
FIGURE 11.16 Approximate 4-bit unsigned multiplier with the least significant 3 bits set to logic value 0 [Source: M. Esmali Nojehdeh 2020].
TABLE 11.7
Results of SMAC Neuron Architecture Using Approximate Multipliers and Adders
Multiplier Type    Approximation Level (Hidden: Mul / Add; Output: Mul / Add)    Area    Delay    Power    Energy    Area Gain    Energy Gain
multiplier [18], the ANN is implemented under the SMAC ANN and SMAC neuron architectures. The mul12s 2NM and mul12s 2KM approximate multipliers take 12-bit inputs and are chosen among the other multipliers for their minimum area consumption and minimum error. Please note that the approximation levels of the multipliers and adders in the hidden layer and the output layer are calculated manually, taking the HMR limit rate into consideration, to give the ANN design the desired computational overhead and the important values. The specification is then described in Verilog and synthesized with the Cadence Genus platform and the TSMC 40 nm model library [19].
Tables 11.7 and 11.8 show the gate-level results of the ANN designs. Here, area, delay, and power represent the total area (μm²), the delay over the critical path (ns), and the total power consumption (mW). The latency represents the time (in ns) needed to obtain the ANN output after the input is applied; it is determined as the number of clock cycles multiplied by the clock period. To acquire an ANN output using the SMAC ANN and SMAC neuron architectures, the required numbers of clock cycles are calculated as 34 and 468, respectively. Furthermore, energy represents the energy consumption expressed in pJ, which is the product of latency and power consumption. We noticed that using the retiming technique in the synthesis tool can iteratively improve the clock period. The test data in the simulation were used to generate the switching-activity data needed to calculate the power consumption [19].
This test data set is also used to check the ANN specification. Under the SMAC neuron architecture, Table 11.7 lists the gate-level results of the ANN design in which only the exact multiplier in the MAC block is replaced by an approximate one. Note that the approximate multipliers were designed for a fixed scale, so an ANN architecture using the same multiplier throughout can have larger energy consumption, delay, and area than an ANN that uses different multipliers sized to each layer
TABLE 11.8
Results of SMAC ANN Architecture Using Approximate Multipliers and Adders
Approximation Level    Area    Delay    Power    Energy    Area Gain    Energy Gain
relative to these fixed-scale multipliers. This is also because multipliers and adders of optimized precision are used by the logic synthesis tool. Using the approximate multiplier, on the other hand, reduces the hardware complexity of the ANN by finding the required degree of approximation of the multipliers in the output layer and the hidden layer. Furthermore, the largest reduction in area, latency, and power usage comes from our approximate multiplier. Note that the trade-off between computational overhead and SMAC exactness can also be traversed by changing the approximation level of the multipliers [19].
Under the SMAC neuron architecture, Table 11.7 also displays the gate-level results of the SMAC neuron design in which the exact multipliers and adders in the MAC block are replaced by approximate multipliers and adders. Using an approximate adder together with an approximate multiplier significantly reduces the complexity of the ANN's hardware. The maximum gains in area and energy consumption are 43% and 64%, respectively, using the proposed approximate multipliers. The gate-level results of the ANN specification using the SMAC ANN architecture are shown in Table 11.8, in which only the exact multiplier is substituted with an approximate multiplier in the MAC block. Since only a single multiplier is required here, the approximate multiplier chosen yields the highest gains in area and power usage. In addition, the use of the approximate adder further reduces the hardware complexity, as seen in Table 11.8. It should be noted that combining the approximate multipliers and adders also improves the hardware results, as can be observed from the outputs [19].
Figure 11.17 shows the gains, in percentages, for the SMAC neuron and SMAC ANN obtained by replacing accurate designs with approximate designs. From Figure 11.17, by replacing accurate adders and multipliers with approximate adder and multiplier designs, respectively, the gains in area, delay, power, and energy for the SMAC neuron ranged from 1% to 64%, while the gains for the SMAC ANN ranged from 4% to 31%. Hence, an efficient ANN design is achieved through approximate arithmetic circuits.
FIGURE 11.17 Graph representing the gains in area, delay, power, and energy.
11.6 CONCLUSION
ANNs are one of the most well-established machine learning techniques and have a great scope of applications in approximation-tolerant fields. Here, approximate computing is embedded in the implementation of an NN model to increase the efficiency of the design. Hence, some machine learning algorithms are classified for better latency using approximation techniques. The arithmetic circuits are replaced with approximate modules to design efficient SMAC ANN and SMAC neuron architectures, where the results show that area, delay, power, and energy can be improved by 4% to 64% with the proposed models. Two types of arithmetic blocks were explained: the multiplier-less design of the ANN and the approximate multiply-accumulate block. The results of the alphabet set multiplier were compared across synapse widths for NNs with 4, 2, and 1 alphabets, respectively. A further result compares SMAC neurons with different approximate multipliers and adders, which are effective in energy saving, as per the earlier values. The SMAC ANN and SMAC neuron implementations are energy efficient at different approximation levels of adders and multipliers. As large-scale NNs gain great interest along with complexity, there will always be a need for reduced or optimized techniques for the efficient design of ANNs.
REFERENCES
[1] V. Kumar, R. Kant. 2019. “Approximate Computing for Machine Learning,” in C. Krishna, M. Dutta, and R. Kumar (eds), Proceedings of 2nd International Conference on Communication, Computing and Networking. Lecture Notes in Networks and Systems, vol. 46. Springer, Singapore. doi: 10.1007/978-981-13-1217-5_59.
[2] S. Mittal. 2016. “A Survey of Techniques for Approximate Computing,” ACM Computing Surveys, vol. 48, no. 4, pp. 1–33, Mar. doi: 10.1145/2893356.
[3] Z. Yang, A. Jain, J. Liang, J. Han, F. Lombardi. 2013. “Approximate XOR/XNOR-Based Adders for Inexact Computing,” 13th IEEE International Conference on Nanotechnology (IEEE-NANO 2013), Beijing, pp. 690–693. doi: 10.1109/NANO.2013.6720793.
[4] H. Junqi, T. N. Kumar, H. Abbas, F. Lombardi. 2017. “Simulation-Based Evaluation of Frequency Upscaled Operation of Exact/Approximate Ripple Carry Adders,” 2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Cambridge, pp. 1–6. doi: 10.1109/DFT.2017.8244437.
[5] P. J. Braspenning, F. Thuijsman, A. J. M. M. Weijters. 1991. “Artificial Neural Networks,” Heidelberg, Germany: Springer. doi: 10.1007/BFb0027019.
[6] M. E. Nojehdeh, M. Altun. 2020. “Systematic Synthesis of Approximate Adders and Multipliers with Accurate Error Calculations,” Integration, vol. 70, pp. 99–107. doi: 10.1016/j.vlsi.2019.10.001.
[7] Gopinath Rebala, Ajay Ravi, Sanjay Churiwala. 2019. “An Introduction to Machine Learning,” Heidelberg, Germany: Springer International Publishing, Aug. doi: 10.1007/978-3-030-15729-6.
[8] Miroslav Kubat. 2017. “An Introduction to Machine Learning, Second Edition,” Heidelberg, Germany: Springer International Publishing, Sep. doi: 10.1007/978-3-319-63913-0.
[9] Zhentao Gao, Yuanyuan Chen, Zhang Yi. 2020. “A Novel Method to Compute the Weights of Neural Networks,” Neurocomputing, vol. 407. ISSN 0925-2312. doi: 10.1016/j.neucom.2020.03.114.
[10] H. Younes, A. Ibrahim, M. Rizk et al. 2019. “Algorithmic Level Approximate Computing for Machine Learning Classifiers,” 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Genoa, Italy, pp. 113–114. doi: 10.1109/ICECS46596.2019.8964974.
[11] S. Venkataramani, S. T. Chakradhar, K. Roy et al. 2015. “Approximate Computing and the Quest for Computing Efficiency,” 52nd Annual Design Automation Conference (DAC ’15), San Francisco, California, pp. 1–6. doi: 10.1145/2744769.2751163.
[12] S. Sivanantham. 2013. “Low Power Floating Point Computation Sharing Multiplier for Signal Processing Applications,” International Journal of Engineering and Technology (IJET), vol. 5.2, pp. 979–985.
[13] G. Karakonstantis, K. Roy. 2007. “An Optimal Algorithm for Low Power Multiplierless FIR Filter Design Using Chebychev Criterion,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’07), Honolulu, HI, pp. II-49–II-52. doi: 10.1109/ICASSP.2007.366169.
[14] Jongsun Park, Hunsoo Choo, K. Muhammad et al. 2000. “Non-adaptive and Adaptive Filter Implementation Based on Sharing Multiplication,” IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (Cat. No.00CH37100), Istanbul, Turkey, vol. 1, pp. 460–463. doi: 10.1109/ICASSP.2000.862012.
CONTENTS
12.1 Introduction................................................................................................. 233
12.1.1 Reinforcement Learning and Markov Decision Process.............233
12.1.2 Hardware for Reinforcement Learning at the Edge....................235
12.2 Background.................................................................................................237
12.3 Hardware Realization of Simple Reinforcement Learning Algorithm.....239
12.3.1 Architecture-Level Description....................................................239
12.3.2 Flow of Data in the Hardware Architecture ...............................243
12.4 Results and Analysis of SRL Hardware Architecture ..............................243
12.5 Q-Learning and SRL Algorithm Applications ..........................................245
12.6 Future Work: Application and Hardware Design Overview ....................246
12.6.1 Hardware Design Overview.........................................................247
12.7 Conclusion ..................................................................................................250
Acknowledgment ...................................................................................................251
References..............................................................................................................251
12.1 INTRODUCTION
12.1.1 REINFORCEMENT LEARNING AND MARKOV DECISION PROCESS
A computational approach to modeling the interaction of an agent (such as a robot)
with its environment (such as the surroundings in which it functions or operates) and
the use of these interactions to modify the agent’s functional behavior (called its
actions) through the maximization of a notional metric termed a “return”, is often
the basis of reward-based learning. An interesting real-life example is that of a child
trying to wave hands, stand up, walk, and learn according to the surroundings while
trying to maximize his or her reward/learning with time. In a way, reinforcement
learning (RL) is goal-focused learning of an agent while interacting with an
algorithms, we will present extensive data and results for Xilinx FPGA fabrics
and platforms.
The chapter consists of several sections organized as follows. Section 12.2 gives
a brief background about the algorithms, techniques, and hardware architectures
available in the literature for RL. Section 12.3 illustrates our proposed hardware
architecture for the simple reinforcement learning (SRL) algorithm, with
architecture-level description and data flow descriptions across the modules.
Implementation and simulation results, along with performance data considering a
few key metrics obtained from our proposed hardware architecture, are presented in
Section 12.4. In this section, we also provide a comparative analysis of the available
hardware implementations for Q-learning as reported in the literature. Section 12.5
discusses several applications of the Q-learning and SRL algorithms. As part of
future work, illustration of an autonomous robot for agriculture/farming industry is
provided at both the hardware architectural and application level in Section 12.6.
Section 12.7 concludes the chapter.
12.2 BACKGROUND
Designing hardware to accelerate RL algorithms has been an active area of research,
with many engineers and scientists continuously proposing several architectures and
finding different ways to make possible their implementation at the edge. After a
proposal in 1989 by C. Watkins on Q-learning, a technical note by C. Watkins and P.
Dayan further provided an in-depth view on Q-learning in the early 1990s [22], by
detailing a convergence theorem and showing how the Q-learning algorithm provides
an optimal path, as long as all discrete actions are repeatedly sampled in all states
while discussing the Markov environments. Liu and Elhanany [23] proposed a pipelined hardware architecture that significantly reduces the delay caused by action selection and value-function updates. They provided a set of formal proofs related to
reduced delays due to their approach. The proposed approach enabled the authors to
mainly focus on application of Q-learning to large-scale or continuous action spaces.
Hwang et al. [24] proposed a hardware realization for a multilevel behavioral
robot that can execute complex tasks in an autonomous fashion and called it
Modular Behavioural Agent (MBA) with learning ability. The realization was made
by considering a template that embeds an RL mechanism with a critic–actor model,
and the proposed architecture was implemented on an FPGA that hosted a CPU
core. Their hardware setup demonstrated with examples the ability of goal-seeking
robots to reach their destinations in unstructured environments. In the literature,
much of current research work continues to focus on RL algorithms and ways to
accelerate them using specialized hardware architectures for edge applications.
Though this comes with certain challenges, a necessary vision is being framed by
both academia and the industry to address them.
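For reference, the tabular Q-learning update that these architectures accelerate can be sketched as follows; the learning rate, discount factor, and reward are placeholder values, though the 16-state, 4-action shape matches the environment described in Section 12.3.

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])

# 16 states x 4 actions, matching the environment of Section 12.3.
Q = [[0.0] * 4 for _ in range(16)]
q_update(Q, s=0, a=1, reward=1.0, s_next=1)
print(Q[0])  # [0.0, 0.1, 0.0, 0.0]
```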
Shi et al. [25] provided an introduction to the definition of edge computing along with several case studies to materialize the concept. Several challenges and opportunities were discussed as part of processing the data at the edge of the network. It is indeed essential to look at the advantages that come with edge deployment, ranging from privacy and data security to data bandwidth savings and response time.
238 VLSI and Hardware Implementations
Several applications that are part of the DRL paradigm (a combination of both RL and DNNs) are very difficult to deploy at the edge, as they require huge computational resources, demand higher power, and are not portable in the majority of
cases. As such, a convergence for both deep learning and edge computing has
emerged as a topic of consideration.
Wang et al. [26] provided a comprehensive survey of how smart devices that
generate huge amounts of data need to be processed at the edge without much delay
in response. As part of the survey, the authors illustrated application scenarios of
realizing an intelligent edge with a customized computing framework. With the aim
of facilitating AI in the day-to-day life of humans, efforts are being made by both
the hardware and software communities to minimize the cost of edge devices and
make them more efficient.
Designing hardware architectures for DRL algorithms targeted at FPGAs has been
a major area of interest with researchers proposing novel solutions for model-free
environments and policy optimization-based algorithms. One such solution is proposed by Hyungmin Cho et al. [27]. They presented an FPGA-based asynchronous advantage actor–critic (A3C) DRL platform called FA3C and demonstrated its advantages with respect to performance and energy efficiency when compared to an implementation based on an NVIDIA Tesla P100 GPU platform. In both implementations, the A3C agents were programmed to learn the control policies of
selected Atari games.
With diverse problems that arise in robotics, distributed control, and a variety of
other domains, multi-agent systems (MAS) could be used to propose solutions,
wherein the tasks, depending upon the necessity, are handled by multiple agents.
This, when done in conjunction with RL algorithms, allows researchers to propose
solutions based on strategies like fully cooperative, fully competitive, and general
(neither competitive nor cooperative) tasks. A complete overview of MAS and
Multi Agent Reinforcement Learning (MARL) is provided as part of the technical
report by Busoniu et al. [28] along with an example of coordinated transportation of
an object by two robots while discussing cooperative robotics. Though realization
of hardware for MARL algorithms comes with a challenge of parallelization of
computing resources and fast and efficient data sharing between several on-chip
modules, a considerable effort has been made by researchers in this direction. There
is also a proposed algorithm by Matta et al. [29] on how the standard Q-learning
algorithm has been extended as part of MAS to target applications like swarm
robotics, which allows a knowledge sharing mechanism between multiple agents.
Several works are available in the literature on how MARL operates in different scenarios involving real-life/social dilemmas and allows agents to learn and behave accordingly to accomplish tasks. Some of the research work reported in the literature has proven multi-agent behavior with simulations of the environment for complex tasks, while other works have experimentally shown the behavior on real hardware. Policy optimization algorithms involving multi-agent
behavior in the form of actor–critic strategy are also extensively studied as part of
knowledge transfer in order to mimic traditional student–teacher behavior. This
indeed demands parallel computations to be handled for multi-thread/multi-core
processing while allowing faster learning abilities and accomplishments.
FIGURE 12.2 State diagram representing an environment with 16 states and 4 actions.
has been designed using a state transition diagram in which each state is a register
entity holding a specific value according to its rewarding identity.
As shown in Figure 12.2, the environment has 16 states, with the start/initial state
being state0 and the end-state/goal being state15. In each state, the agent can take
any of the available four actions – up, down, right, and left. The state transition
diagram models a grid-based environment in which border states such as state0,
state1, state2, state3, state12, state13, etc., have only a subset of the above four
actions that allows the agent to enter or transit into another state, possibly a new
state. However, it is possible that the agent, with some of the actions in the subset,
continues to remain in the same state it was occupying earlier. For example, con
sider state0 in Figure 12.2. In this state if the agent takes either action up or action
right it will end up occupying states state4 and state1, respectively, while it will
continue to remain in state0 whenever it takes actions left or down. Similarly, while
in state12, taking actions right and down will move the agent into state13 and
state8, respectively, whereas taking actions up and left will leave the agent in the
same state12. Some states like state5 will allow the agent to go into newer states for
every action it takes – for example action up taken in state5 will move the agent into
state9, action down into state1, action right into state6, and action left into state4,
respectively. Some other states, where every action taken will allow the agent to
enter newer states, are the ones located at the center of the grid (modeling the
environment), for example, state6 and state10. Though a start state and an end state
are provided in the state transition diagram for reference, it has been designed in
such a way that it holds good even if the agent starts or ends, respectively, for any
start and end state pairs.
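The state-transition behavior described above can be modeled compactly in software. The sketch below assumes the 16 states form a 4x4 grid with state = 4*row + col; this encoding is our inference from the description, not a stated detail of the design.

```python
ACTIONS = {"up": (1, 0), "down": (-1, 0), "right": (0, 1), "left": (0, -1)}

def step(state, action):
    """Return the next state; stay put if the move would leave the 4x4 grid."""
    row, col = divmod(state, 4)
    dr, dc = ACTIONS[action]
    nr, nc = row + dr, col + dc
    if 0 <= nr < 4 and 0 <= nc < 4:
        return 4 * nr + nc
    return state  # e.g. 'left' or 'down' in state0 keeps the agent in state0

print(step(0, "up"), step(0, "right"), step(0, "left"))  # 4 1 0
print(step(5, "up"), step(5, "down"))                    # 9 1
```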
As part of providing rewards to the SRLA for the states it traverses, a reward
matrix has been designed by allotting each state with their respective reward points
or values, as shown in Figure 12.3. The main aim of any RL algorithm-based agent
is to collect positive reward points and maximize its cumulative reward while
reaching the goal. Depending upon the traversal path taken (a collection of different
states starting with the start state) for the actions executed by the SRLA in different
states, those actions that provide positive rewards to the agent when a particular
new next state is reached would be mapped with the new state. As the objective is to
maximize the cumulative reward, this leaves room for only a single or unique action
to be taken by an agent to reach a new state from any specific current state, thereby
reducing the memory required to store the values for other actions in a state-action
table (which is generally part of the Q-learning algorithm while populating the Q-table), as discussed in Notsu et al. [36].
The hardware architecture depicted in Figure 12.4 has two major components –
SRLA and environment. The main objective of the hardware architecture targeted for
rendering the SRL algorithm is to make its deployment possible in edge devices. As
discussed in earlier sections and in Notsu et al. [36], the SRL algorithm is very compact
as it results in the smallest memory footprint as well as a very low logic hardware
footprint when implemented with the architecture given in Figure 12.4. However, one
drawback reported in the literature is that, unlike the Q-learning algorithm, it does not always result in an optimal solution or path. While this is
certainly a disadvantage, SRL is good at providing sub-optimal solutions or paths for
the RL agent that enables the goal to be reached with a lesser amount of resources. This
is proven by our results and analysis given in the sections to follow.
The hardware architecture is designed with a pool of registers, Read Only
Memory (ROM), memory array, and some comparators as and when necessary to
mimic the functionality of the algorithm discussed in Notsu et al. [36]. The State
Register holds value of the current state of the agent. The Next State Register holds
the value of the new state reached by the agent for a specific action taken by it when
it is in the current state. The Action Register is responsible for storing the value of
any action that the agent chooses randomly from the set of actions applicable in the
current state while moving around in the environment. In this particular implementation, four actions were considered to navigate in the environment (up,
down, left, and right). A linear feedback shift register (LFSR) is employed to
generate the actions randomly, as shown in Figure 12.4. The Previous Register
shown in the architecture stores the value of the current state, once the agent moves
to a new next state. This is needed to facilitate comparison and reward-based action
assignment to the respective current state. Reward Register holds the respective
reward value the agent has collected while moving to a new next state. RewardM is
a ROM that is responsible for storing pre-determined reward values associated with
each state of the environment. Goal Verification Circuit is responsible for verifying
the correct state-actions mappings. State and Action Circuit is responsible for
writing the respective state and action mappings to the MemoryGood memory array
in a controlled and conditional way depending upon the types of rewards achieved
by the agent while navigating in the environment. Goal Verification Register is a
synchronous register that gets set if the conditional mapping of state and action is
met as per the rewarding criteria of state and vigilance.
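The action-generation mechanism can be sketched with a small Fibonacci LFSR. The tap positions and the mapping from the low two state bits to actions below are our assumptions for illustration; the chapter does not specify the actual LFSR polynomial.

```python
def lfsr4(seed=0b1001):
    """Yield an endless stream of 4-bit LFSR states (taps x^4 + x^3 + 1)."""
    state = seed
    while True:
        fb = ((state >> 3) ^ (state >> 2)) & 1   # XOR of the two tap bits
        state = ((state << 1) | fb) & 0xF
        yield state

actions = ["up", "down", "left", "right"]
gen = lfsr4()
print([actions[next(gen) & 0b11] for _ in range(6)])  # pseudo-random actions
```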
TABLE 12.1
Comparison of Proposed Hardware Architecture for SRL with Silva et al. [37] and Spanò et al. [38]
Parameter Under Consideration    Proposed Hardware Architecture for SRL (States = 16, Actions = 4)    Silva et al., Hardware Architecture for Q-Learning [37] (States = 12, Actions = 4)    Spanò et al., Hardware Architecture for Q-Learning [38] (States = 16, Actions = 4)
data in Table 12.1 prove that the proposed hardware architecture enables an implementation to be deployed as part of edge devices.
Da Silva et al. [37] and Spanò et al. [38] proposed hardware architectures for the implementation of Q-learning algorithms targeted at Xilinx Virtex-6 platforms and Xilinx Zynq UltraScale+ MPSoC ZCU106 evaluation kits, respectively. The authors present data related to FPGA resource utilization, power consumption, and other important design attributes for different data formats and different numbers of states and actions. A tabulated comparison is provided in Table 12.1 to convey the advantages of the proposed SRL hardware architecture over those proposed in Da Silva et al. [37] and Spanò et al. [38].
It is worth mentioning that the implementation results provided as part of the
proposed hardware architecture also include the environment as a state transition
diagram, where to the best of our knowledge both the architectures being compared
are concerned only about the agent’s computational, memory, and other associated
resource requirements. As seen, our proposed hardware architecture clearly has an
advantage with respect to resource utilization. It is weaker on the power consumption parameter, which is seen to be slightly higher. This discrepancy is
primarily due to the I/O pins, which were needed and realized for verification of
hardware. These I/O pins consume power, which amounts to almost 96% (23
mW) of the power consumed by the entire implementation, as clearly given in the
table. This could be easily reduced by design management of I/O pins for the
target FPGA [39]. Hence, the proposed architecture nearly outperforms both the
hardware architectures for the considered parameters, as shown in Table 12.1.
The main aim of the comparison is to prove the usefulness of the SRL hardware architecture for implementation on edge devices. The proposed FPGA-based hardware implementation also achieves a good execution speed in comparison to software-based execution of the SRL algorithm on a desktop CPU (x86-64 architecture). The execution speed is also on par with that of the architecture reported in Da Silva et al. [37]. As stated earlier, the SRL algorithm in itself does not guarantee an optimal path, as the Q-learning algorithm does; this should be considered its shortcoming. From the perspective of hardware implementation on edge devices, the proposed architecture appears to do better than the other architectures. It is the cost of the non-optimality of the path generated by the agent that needs to be taken into consideration when deciding on the choice of hardware architecture for edge devices implementing these applications.
TABLE 12.2
Autonomous Robot Description for Different Modes of Operation
Power ON: This state indicates the start of the robot by activating all the necessary sensors. It also allows the robot to enter any of the working/chemical filling/docking modes depending upon the necessity (e.g., if the chemical container is initially empty, the robot directly enters chemical filling mode before entering the actual working mode).
Working Mode: This mode is responsible for reading the sensory data and processing it at both the processor and co-processor level. The actual path planning and the other operations related to plant disease detection and chemical spraying happen in this mode.
Chemical Filling Mode: When the chemical/fertilizer in a container hosted on the body of the robot gets finished, this mode allows the robot to re-fill it at the designated dispensary point. The activation of this mode halts the working mode temporarily till the chemical is filled in its respective container up to a particular level.
Docking Mode: The docking mode is responsible for entering into a charging station when the power/battery resource has reached a certain minimum level. Docking mode is retained till the battery reaches a satisfactory level of charge, with the other working-mode sensors temporarily suspended.
Power OFF: This state de-activates all the sensors. It can be entered from any of the other states depending upon the necessity or an emergency, or as part of a safety measure, or upon completion of assigned tasks.
overview of such challenges and behaviors has already been provided in previous
sections with reference to detailed sources available in the literature. Major challenges
to be addressed as part of hardware design involve the parallelization of computational data to accommodate more agents and the reduction of delay in processing data and in distributing data through various data channels.
A general agent–environment interface for MAS is shown in Figure 12.7, along
with a representational diagram of how the multi-agent behavior could be achieved
12.7 CONCLUSION
An attempt has been made to provide an overview with respect to RL algorithms,
MDPs, specialized hardware architectures designed for the edge involving acceleration
ACKNOWLEDGMENT
The authors would like to express sincere gratitude to IIITB as well as Machine
Intelligence and Robotics Centre (Government of Karnataka) for supporting this work.
REFERENCES
[1] Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction,”
A Bradford Book, Cambridge, MA, USA, 2018.
[2] Csaba Szepesvari, “Algorithms for Reinforcement Learning – Synthesis lectures on
Artificial Intelligence and Machine Learning,” Morgan and Claypool Publishers,
San Rafael, CA, 2010.
[3] Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, and
Xu Chen, “Edge AI - Convergence of Edge Computing and Artificial Intelligence,”
Springer, Singapore, 2020.
[4] NVIDIA GPU – Turing Architecture. Available online at www.nvidia.com/en-us/geforce/turing/ – Accessed 29th November 2020.
[5] Coral Development Board – Edge TPU coprocessor. Available online at https://coral.ai/products/dev-board/ – Accessed 29th November 2020.
[6] Intel Neural Compute Stick 2. Available online at https://software.intel.com/content/www/us/en/develop/hardware/neural-compute-stick.html – Accessed 29th November 2020.
CONTENTS
13.1 Introduction................................................................................................. 256
13.2 Preliminaries ............................................................................................... 257
13.2.1 Framework for Implementation Vulnerability Analysis .............257
13.3 Profiled Side-Channel Attacks ...................................................................259
13.3.1 Deep Learning Architecture for Analysis ...................................259
13.3.2 Convolutional Neural Networks ..................................................259
13.4 Protected Countermeasure Techniques...................................................... 260
13.4.1 Unrolled Implementation .............................................................260
13.4.2 Threshold Implementation ...........................................................260
13.5 Case Study of GIFT Cipher.......................................................................261
13.5.1 GIFT Algorithm Description .......................................................261
13.5.2 Implementation Profiles ...............................................................261
13.5.3 Round (Naive) Implementation ...................................................262
13.5.4 (Un)Rolled Implementation .........................................................262
13.5.5 Partially (Un)Rolled Implementation with Threshold
Implementation Countermeasure .................................................262
13.5.6 Experiment Setup .........................................................................263
13.6 Description of PSCA on GIFT Using DeepSCA......................................263
13.6.1 Vulnerability Analysis..................................................................264
13.7 Conclusion and Future Work.....................................................................267
Acknowledgments.................................................................................................. 267
References..............................................................................................................267
13.1 INTRODUCTION
Side-channel attacks (SCAs) are becoming a cause for concern with the large-scale
deployment and use of resource-constrained devices. SCAs are implementation
attacks that gain information from the physical implementation of a computer
system, rather than weaknesses in the implemented algorithm itself. These attacks
were first pioneered by Paul Kocher in the 1990s, when it was discovered that the
circuit consumes different amounts of power depending on the input data that are
fed. These types of SCAs, where the variation in power consumption is exploited,
are known as power analysis attacks [1]. During the implementation of a cryptographic algorithm, the plaintext (input) is combined with the key in different
combinations to produce the ciphertext (output). An attacker exploits the data that
leaks through side channels to reduce the hypothetical key search space and identify
the dependency between the device and the secret key.
Hardware and software implementations of many block ciphers have been
practically compromised by continuous side-channel analysis, and their security has
been a long-standing issue for the embedded systems industry.
To protect devices from these attacks, researchers have come up with efficient countermeasure techniques, namely threshold implementation (TI) and private-t circuits. In 2006, Nikova et al. [2] proposed TI, a scheme based on secret sharing that is provably secure against first-order differential power analysis attacks. Several TI techniques have been proposed over the years, with variations in secret sharing and implementation techniques. Another countermeasure is private circuits, proposed by Ishai et al. [3]. A circuit is t-probing secure if any set of t intermediate variables is independent of the secret; circuit security then depends not on the type of side-channel leakage but on the amount, or the rate at which, information leaks from the device. The main goal of this technique is to mask and wrap the building blocks with random values so that they remain secure against attacks even if the attacker is able to observe t bits during one computational clock cycle.
However, this has not stopped the exploration of attacks on the countermeasures, specifically after the advent of profiled (template) attacks [4,5]. Profiled SCAs play a significant role in the security assessment of cryptographic implementations. They are the most potent type of attack, as the adversary can characterize the side-channel leakage of the device before the attack. In this scenario, the adversary gains access to the inputs and keys of the profiling device and uses them to characterize the physical leakage. A profiled SCA happens in two steps. First is a profiling phase, where the adversary characterizes the leakage distribution function for all possible secret key values with the help of the traces acquired from the profiling device. Second is an attack phase, where the adversary performs a key recovery attack on the target device. However, in a useful adversary model, many traces are required for the attack, and the measurement-handling and analysis expertise required is very high. Machine-learning-based attacks [6] have since widened the possibility of, and exposure to, such attacks.
Recent works have highlighted the advantages of employing deep learning (DL)
architectures [7] such as multi-layer perceptrons (MLPs) and convolutional neural
networks (CNN) as an alternative to the existing profiling SCA attacks. Despite
13.2 PRELIMINARIES
13.2.1 FRAMEWORK FOR IMPLEMENTATION VULNERABILITY ANALYSIS
Though cryptographic primitives are theoretically secure, their implementations might be vulnerable for reasons such as lack of awareness about the adversary and lack of knowledge about validation methodologies. The adopted countermeasure should be adequately vetted to avoid a chain of problems. The adopted evaluation method must be a standard one; otherwise, a proper framework has to be developed and augmented with existing standards for ease of validation. Validating the security of countermeasure techniques needs a comprehensive testing process, which is lacking in existing standards such as FIPS-140-2 (conformance-style testing) and Common Criteria
(evaluation-style testing). The focus here is on the development of a framework for secure implementation of crypto-primitives against implementation attacks. The framework should be fast, effective, and reliable for analysis. To achieve this effectively, the objective is divided into multiple stages, as shown in Figure 13.1.
Naive implementation. Naive implementation involves identification of the
algorithm building blocks, then realizing the blocks according to the application
requirement. In general, serial, round, and unrolled implementations are preferred
for constrained, conventional, and accelerator devices, respectively, along with
optimized implementation techniques [10].
Conformance-style test. Generally, naive implementations are vulnerable to SCAs, in particular to power attacks. Conformance-style testing helps in quickly identifying whether the implementation leaks potential information or not. For instance, the test vector leakage assessment (TVLA) [11] examines the difference of means, relative to the standard error, between two sets of traces. Though the χ² test [12] is effective in terms of success rate [13], TVLA is simpler for analysis.
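A minimal sketch of the TVLA statistic at a single sample point, using Welch's t-test between a fixed-input and a random-input trace set; the trace values and the conventional |t| > 4.5 threshold are illustrative.

```python
from statistics import mean, variance

def tvla_t(set_a, set_b):
    """Welch's t-statistic between two lists of values at one sample point."""
    na, nb = len(set_a), len(set_b)
    return (mean(set_a) - mean(set_b)) / (
        variance(set_a) / na + variance(set_b) / nb) ** 0.5

fixed_set = [1.02, 0.98, 1.05, 1.01, 0.99]   # traces with a fixed plaintext
rand_set = [0.71, 0.69, 0.74, 0.70, 0.72]    # traces with random plaintexts
print(abs(tvla_t(fixed_set, rand_set)) > 4.5)  # True -> potential leakage
```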
Exploitation using power analysis. The following points are essential to reveal the secret key of a naive implementation. First, identify a point of interest (PoI) in the algorithm. Second, capture the power consumption during execution of the algorithm. Third, the power consumption model should reflect the actual power consumption of the circuit. Last, an appropriate statistical distinguisher should be used to correlate the modeled and captured power consumption of the circuit to reveal the secret key. Among the many statistical methods [14], CPA is a reliable and widely adopted one.
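A sketch of a CPA distinguisher, assuming a Hamming-weight power model of an S-box output; the data shapes and helper names are ours, and a real attack would scan every sample point of every trace rather than a single leakage value per trace.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)   # assumes neither series is constant

def hamming_weight(v):
    return bin(v).count("1")

def cpa_best_guess(plaintexts, leakages, sbox):
    """Rank 4-bit key guesses by |corr(HW(S(p ^ k)), leakage)|."""
    def score(k):
        model = [hamming_weight(sbox[p ^ k]) for p in plaintexts]
        return abs(pearson(model, leakages))
    return max(range(16), key=score)
```

The key guess whose model correlates most strongly with the captured traces is retained as the candidate secret nibble.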
Countermeasure. Countermeasures break the correlation between the modeled and the captured power consumption. Masking and hiding are two preliminary approaches to protect the implementation. One provable masking technique is TI, which is based on secret sharing. Other methods, such as private-t circuits [3] and orthogonal direct sum masking, come with several trade-offs. Therefore, TI is preferred for an efficient countermeasure realization.
Evaluation of protected implementation. A big challenge is how to evaluate the protected implementation efficiently. A straightforward way is to perform TVLA once more and compare with the TVLA results of step 2 to understand the attack complexity. Another way is to perform (non-)profiling techniques using advanced SCAs, like the DL [15] approach. Recent works have explored SCA based on different DL techniques; in particular, CNNs [16,17] outperform other attacks. That is why the feedback mechanism is essential: it improves countermeasure techniques towards higher-order masking to increase the attack complexity. Most of these steps have evolved over the years. In this chapter, we focus on the evaluation of protected implementations, mainly using DL architectures with their hyper-parameters for
analysis. The rest of the section provides a brief introduction about profiled SCAs,
including CNN, and the protected countermeasure techniques.
In TI, a sensitive variable x is split into n shares x1, …, xn such that x = x1 ⊕ x2 ⊕ ⋯ ⊕ xn, and a target function z = f(x, y) is computed by component functions that operate on the shared inputs [x]^n and [y]^n. The sharing must satisfy the following properties:
• Correctness: The sum of the output shares gives the desired output.
• Non-completeness: Every function is independent of at least one share of each
of the input variables.
• Uniformity: The input, the output, and the distribution of its shared output
values have to be uniform. In other words, each possible shared output has to
be equally likely.
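These properties can be made concrete with the textbook three-share TI of a 2-input AND gate, sketched below. The component arrangement satisfies correctness and non-completeness (uniformity needs extra care and is not claimed here); it is the classic example, not necessarily the sharing used in the protected GIFT implementation discussed later.

```python
import secrets

def share3(x):
    """Split bit x into 3 shares with x = x1 ^ x2 ^ x3."""
    x1, x2 = secrets.randbits(1), secrets.randbits(1)
    return x1, x2, x1 ^ x2 ^ x

def ti_and(xs, ys):
    """Each output share avoids one input share index (non-completeness)."""
    x1, x2, x3 = xs
    y1, y2, y3 = ys
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)   # independent of share 1
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)   # independent of share 2
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)   # independent of share 3
    return z1, z2, z3

for x in (0, 1):
    for y in (0, 1):
        z = ti_and(share3(x), share3(y))
        assert z[0] ^ z[1] ^ z[2] == x & y   # correctness: shares recombine
```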
We adopted the GIFT cipher as a case study and evaluated profiled SCA using CNN
as follows.
TABLE 13.1
GIFT S-Box
x 0 1 2 3 4 5 6 7 8 9 a b c d e f
S(x) 1 a 4 c 6 f 3 9 2 d b 7 5 0 8 e
TABLE 13.2
GIFT P-Layer
i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
P(i) 0 17 34 51 48 1 18 35 32 49 2 19 16 33 50 3
i 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
P(i) 4 21 38 55 52 5 22 39 36 53 6 23 20 37 54 7
i 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
P(i) 8 25 42 59 56 9 26 43 40 57 10 27 24 41 58 11
i 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
P(i) 12 29 46 63 60 13 30 47 44 61 14 31 28 45 62 15
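Tables 13.1 and 13.2 can be applied directly in software; the sketch below implements the S-box substitution over the 16 state nibbles and the 64-bit bit permutation of one GIFT round, omitting key addition and round constants.

```python
GIFT_SBOX = [0x1, 0xa, 0x4, 0xc, 0x6, 0xf, 0x3, 0x9,
             0x2, 0xd, 0xb, 0x7, 0x5, 0x0, 0x8, 0xe]        # Table 13.1

GIFT_P = [0, 17, 34, 51, 48, 1, 18, 35, 32, 49, 2, 19, 16, 33, 50, 3,
          4, 21, 38, 55, 52, 5, 22, 39, 36, 53, 6, 23, 20, 37, 54, 7,
          8, 25, 42, 59, 56, 9, 26, 43, 40, 57, 10, 27, 24, 41, 58, 11,
          12, 29, 46, 63, 60, 13, 30, 47, 44, 61, 14, 31, 28, 45, 62, 15]  # Table 13.2

def sub_cells(state):
    """Apply the S-box to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for i in range(16):
        out |= GIFT_SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out

def perm_bits(state):
    """Move state bit i to position P(i)."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << GIFT_P[i]
    return out

state = perm_bits(sub_cells(0x0123456789ABCDEF))  # one (keyless) round slice
```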
are divided into two sets: the traces that are going to be used for training of the CNN
model and the traces that are going to be used for evaluating the functionality and
classification accuracy of the trained DL model. GIFT is implemented in three
distinct profiles as follows:
of the cipher gets diffused by the fourth round, and a new key schedule begins from
the fifth round onwards. Entropy increases as the rounds of operation increase.
The CNN architecture and the number of training and testing traces remain the
same for all three profiles. The following results were obtained for the three profiles
of the cipher, respectively.
The number of traces required to retrieve two key bits of the GIFT cipher when
implemented in a round-based manner was found to be around 175 traces, as shown
in Figure 13.5.
Profile 2: Unrolled Implementation: Two sets of results were obtained for this profile, corresponding to attacking the first and the fifth round, respectively.
The number of traces required to retrieve two key bits of the GIFT cipher when the first round of operation is attacked, and when the cipher is implemented in an unrolled fashion, comes to around 1750 traces, as shown in Figure 13.6. This is significantly more than the number of traces required in the case of the Round-based Implementation.
As expected, the Unrolled Implementation is more challenging to break than the naive implementation, since there are no registers to store the intermediate state values. Thus, the Unrolled Implementation can also be viewed as a form of countermeasure that increases the security and resistance of the cipher against side-channel vulnerabilities. In the conventional SCA scenario, the number of traces required to retrieve the key bits of an Unrolled Implementation is on the order of 100,000 traces; this number is substantially reduced in our case.
The number of traces required to retrieve two key bits of the GIFT cipher when the fifth round of operation is attacked, and when the cipher is implemented in an unrolled fashion, comes to around 4500 traces, as shown in Figure 13.7. This is expected because, according to the algorithm, all the key bits get diffused by the end of the fourth round, and theoretically it should become harder to retrieve key bits by attacking the fifth round.
Our result confirms this concept: the number of traces required is substantially higher because of the increase in entropy, thereby increasing the security of the cipher.
For this profile, we combined the TI countermeasure with the Unrolled Implementation
profile, and from our experiments, we found the implementation is secure up to 10,000
traces, proving that when we combine TI with the Unrolled Implementation of GIFT, it
should be harder to break the algorithm.
ACKNOWLEDGMENTS
The authors wish to acknowledge the technical staff and the financial support from SETS.
REFERENCES
[1] P. C. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Advances in Cryptology - CRYPTO ’99, 19th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15–19, 1999, Proceedings, ser. Lecture Notes in Computer Science, M. J. Wiener, Ed., vol. 1666. Springer, 1999, pp. 388–397. [Online]. Available: 10.1007/3-540-48405-1_25
[2] S. Nikova, V. Rijmen, and M. Schlaffer, “Secure hardware implementation of non-linear functions in the presence of glitches,” in Information Security and Cryptology - ICISC 2008, 11th International Conference, Seoul, Korea, December 3–5, 2008, Revised Selected Papers, ser. Lecture Notes in Computer Science, P. J. Lee and J. H. Cheon, Eds., vol. 5461. Springer, 2008, pp. 218–234. [Online]. Available: 10.1007/978-3-642-00730-9_14
[3] Y. Ishai, A. Sahai, and D. A. Wagner, “Private circuits: Securing hardware against probing attacks,” in Advances in Cryptology - CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa Barbara, California, USA, August 17–21, 2003, Proceedings, ser. Lecture Notes in Computer Science, D. Boneh, Ed., vol. 2729. Springer, 2003, pp. 463–481. [Online]. Available: 10.1007/978-3-540-45146-4_27
[4] S. Chari, J. R. Rao, and P. Rohatgi, “Template attacks,” in Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 13–15, 2002, Revised Papers, ser. Lecture Notes in Computer Science, B. S. Kaliski Jr., Ç. K. Koç, and C. Paar, Eds., vol. 2523. Springer, 2002, pp. 13–28. [Online]. Available: 10.1007/3-540-36400-5_3
[5] T. Bartkewitz and K. Lemke-Rust, “Efficient template attacks based on probabilistic multi-class support vector machines,” in Smart Card Research and Advanced Applications - 11th International Conference, CARDIS 2012, Graz, Austria, November 28–30, 2012, Revised Selected Papers, ser. Lecture Notes in Computer Science, S. Mangard, Ed., vol. 7771. Springer, 2012, pp. 263–276. [Online]. Available: 10.1007/978-3-642-37288-9_18
[6] G. Hospodar, B. Gierlichs, E. D. Mulder, I. Verbauwhede, and J. Vandewalle, “Machine learning in side-channel analysis: A first study,” Journal of Cryptographic Engineering, vol. 1, no. 4, pp. 293–302, 2011. [Online]. Available: 10.1007/s13389-011-0023-x
CONTENTS
14.1 Introduction................................................................................................. 272
14.2 Classification of Hardware Attacks ...........................................................273
14.2.1 Hardware Trojan Taxonomy........................................................274
14.2.1.1 Insertion Phase.............................................................275
14.2.1.2 Level of Description....................................................276
14.2.1.3 Activation Mechanism.................................................276
14.2.1.4 Effects of Hardware Trojans.......................................277
14.2.1.5 Location .......................................................................277
14.2.2 Types of Hardware Trojans .........................................................278
14.3 Countermeasures for Threats of Hardware Trojans in IoT Nodes...........280
14.3.1 Hardware Trojan Detection Approaches .....................................280
14.3.2 Hardware Trojan Diagnosis .........................................................281
14.3.3 Hardware Trojan Prevention........................................................281
14.4 Machine Learning Models .........................................................................282
14.4.1 Supervised Machine Learning .....................................................284
14.4.2 Unsupervised Machine Learning .................................................285
14.4.3 Dimensionality Reduction & Feature Selection..........................285
14.4.4 Design Optimization..................................................................... 286
14.5 Proposed Methodology...............................................................................286
14.5.1 Stage 1: Analysis of IoT Circuit Structure Features................... 286
14.5.2 Stage 2: Feature Extraction from Netlist.....................................286
14.5.3 Stage 3: Hardware Trojan Classifier Training ............................286
14.5.4 Stage 4: Detection of Hardware Trojan ......................................287
14.5.5 Comparison of HT Detection Models Based on ML .................288
14.6 Conclusion ..................................................................................................290
References..............................................................................................................290
14.1 INTRODUCTION
In semiconductor technology, there has been enormous progress, which has drawn broad participation in the design and development stages of integrated circuits (ICs) [1]. Because the design complexity of circuits increases continuously, specialized teams worldwide are required to manage this complexity and to improve the manufacturability and efficiency of the ICs.
However, the difficulty lies in securing the circuits against an adversary who may insert a malicious circuit at any stage of the manufacturing process [2]. In the design-fabrication process of ICs, various stages could be exploited by the adversary through attacks that make a circuit perform an unwanted function; such a malicious insertion, called a hardware Trojan (HT), impacts the trustworthiness and security of the device. An HT is defined as an extra circuit, embedded in the main circuit, that consists of payload logic and trigger logic, as shown in Figure 14.1. The payload logic is activated by a trigger signal, sent by the adversary's trigger logic, to perform an unwanted function in the main part of the circuit [3]. The attack may damage the hardware by changing the functionality of the device, obstructing execution, leaking information stored in the hardware, etc. These threats raise severe concern in critical applications such as medical equipment, mobile communications, reactors, defense applications such as aerospace, devices connected to the Internet of Things (IoT), etc. [4]. HTs are constantly evolving beyond chips to design layers, circuit components, and even whole devices, leading to security issues in the entire hardware ecosystem. Thus, hardware needs protection. Unlike software protection, hardware security protects the physical device from hardware vulnerabilities, adding a supplementary layer of security for critical hardware systems [5].
In recent years, globalization of the semiconductor industry has led to an increase in device manufacturing. This has enabled dramatic advances in wireless communication, embedded system applications, microelectronics, sensor technology, and the IoT, which are widely used by governments, academic institutes, and industries worldwide [6]. For example, in smart healthcare systems, patients can register for an appointment with a doctor for consultation from anywhere using IoT technology, and in industrial systems, production is managed in various dimensions using the IoT. IoT applications also provide communication between humans and nature [7]. Accordingly, various methodologies exist to protect IoT nodes from adversaries. One such methodology is the machine learning approach, a recent trend in hardware security for protecting against adversary attacks; machine learning algorithms are considered a major protection method against such attacks [8]. Although new models are continually proposed for HT detection, adversaries keep finding better methods to insert HTs into circuits. Machine learning is therefore effective for detecting unknown HTs in a circuit: a classifier is trained to detect new HTs, and the unwanted function is found by determining variations in the device's power consumption, run-time behavior, etc. Accordingly, machine learning can distinguish HT-affected circuits from unaffected ones. In this chapter, we discuss threats and countermeasures of hardware attacks, machine-learning–based approaches, and their applications.
• Design – This includes tools, IPs, standard cells, and models. Here, the models and cells are used by the designer as per the design requirements, but there is the possibility of an HT attack through IPs supplied by an IP vendor or through tools provided by an untrusted vendor.
• Fabrication – This stage includes the process of masking, lithography, and
packaging. The fabrication process may also be untrusted, and there is the
possibility of the addition of an extra HT circuit in the main circuit.
• Manufacturing test – This stage includes verification and testing of the ICs. It is secure if the verification is done by a trusted unit. After testing, the ICs are sent to market, where the devices may be distributed by a trusted or an untrusted distributor, the latter increasing vulnerability to HT attack.
These threats are difficult to avoid, as the processes are expensive and time consuming owing to the growth in IC manufacturing and global demand. Hence, it is challenging to devise defensive approaches that lessen the security risk caused by HT attacks [10], which can cause great economic loss and harm to society.
The attacks may occur during or before fabrication; detecting them is highly difficult for various reasons [11]:
• First, the complexity of IP blocks in the circuit makes the detection of minute modifications in IoT chips difficult.
• Second, physical inspection and reverse engineering methods consume a lot of time and are costly to apply, as ICs are scaled down to the nanometer range; detection is therefore difficult. Reverse engineering is also destructive in nature and is not an effective method, as it cannot determine whether the remaining ICs in a circuit are HT free.
• Third, an HT is activated only by the specific signal sent by the adversary as per the function requirement; hence, it is difficult to determine the affected signal.
• Fourth, faults present in the netlist of the design are determined by design-for-test techniques, but these tests can detect only stuck-at faults and thus cannot ensure the design is HT free.
payload and trigger mechanisms, thus differentiating them as analogue HTs and digital HTs [3]. Digital HTs include both combinational and sequential HTs. These types of HTs are integrated, extended, and further classified into an attribute-based HT taxonomy [16]. Attributes such as design location, design phase, abstraction level, type of logic, layout, and design function have improved the classification of HTs [17]. The HT taxonomy is shown in Figure 14.3, which presents an organized taxonomy built on the relationship between the location of the HT injected in a target chip and the targets that are compromised through the trigger input.
• Fabrication phase: During this phase, the adversary can apply their own masks to the wafers, which may have serious effects. The adversary can also change the chemical composition during fabrication, which increases power consumption and accelerates aging of the chip.
• Assembly phase: In this phase, the verified chips and the supplementary components are mounted on a printed circuit board (PCB). Here, the adversary may add two or more components to the PCB alongside the HT-free ICs; these components can later change the device's function or leak stored information.
• Testing phase: In this phase, the circuit is tested to verify its functions; however, even here the HTs are not discovered, as the adversary keeps the trigger nets secret. Through these secret nets, the adversary can activate the HT when required to collect stored data.
• System level: The modules used in the target hardware design, such as in-
terconnections, hardware blocks, and communication protocols, may be
triggered by the HTs present in any of the modules. Thus, at system level
there is a possibility of HT attack.
• Development environment: This comprises verification, simulation, synthesis, and tool validation. An HT can be inserted using CAD tools and scripts.
• Register transfer level: At this level, the HT insertion can be done easily by
the attacker. The adversary has control of the functional modules that are
described in terms of signals, registers, and Boolean function.
• Gate level: At this level, logic gates are interconnected to form the design. An HT here alters the size and layout of the design, and the attacker can change its functionality. These HTs can be sequential or combinational and are therefore called “functional Trojans”.
• Transistor level: Transistors are used to build logic gates. The attacker can therefore remove or insert a transistor to alter the functionality of the circuit, or change a transistor's size to modify circuit parameters. This type of HT attack can produce a large delay in the critical path of the circuit.
• Layout design: The circuit dimensions and locations are described at this level, also called the physical level of the circuit. The adversary inserts an HT by changing wire lengths, the spacing among circuit elements, or the alignment of layers.
badly and also leads to aging of the device. The dormant mechanism is activated by
another circuit, which is either externally or internally triggered.
14.2.1.5 Location
The HT can be present in any part of the chip. That is, it may be present in pro-
cessors, I/O devices, memory, the clock, or the power supply. The HT can be in
multiple components or in a single component. These HTs act independently where
they are present.
• IP-level HT: An HT is implanted in an individual IP core of the chip and is triggered by an internal condition or by rare nets or signals. When the HT is activated, it affects the IP core in which it resides, as shown in Figure 14.4(a).
FIGURE 14.4 Trojan threats: (a) IP-level threat, (b) bus-level threat, (c) chip-level threat [31].
TABLE 14.1
Summary of Types of Hardware Trojans (properties compared across IP-level HTs, bus-level HTs, and SoC-level HTs)
Thus, the affected IP core or the untrusted third-party IP core is activated by rare conditional signals, which severely damages the chip. IP-level HTs are similar to system-on-chip (SoC)-level HTs but differ in some aspects: IP-level HTs are implanted in individual IPs, whereas SoC-level HTs can be implanted in many IPs on the chip. IP-level HTs affect only the particular IP in which they are embedded, whereas SoC-level HTs affect other IPs and thereby the overall function of the SoC. IP-level HTs can be detected from functional and structural features, whereas SoC-level HTs require dynamic analysis of the whole SoC.
Bus-level HTs also differ in some aspects from SoC-level HTs: they are implanted in the circuit components that connect to buses rather than in IPs, whereas SoC-level HTs are implanted in more than one IP. Bus-level HTs damage the interconnections between cores, whereas SoC-level HTs impact the whole SoC. Bus-level HTs are detected from linker and router behaviors, whereas SoC-level HTs are detected from the behaviors of the IP cores.
analyzes the presence of HTs in the circuit. Logic test – the side-channel analysis approach becomes ineffective under process variations; the logic test is more effective in that case, as it generates test patterns for the circuit to detect HTs. Since the adversary can insert any number of HTs in the circuit, it is difficult to cover all HTs, and test patterns for each HT are hard to generate; thus, statistical approaches have been developed to generate the test vectors.
functional cells instead of nonfunctional cells. The standard cells inserted in the spare space are connected to form an autonomous circuit called “built-in self-authentication”.
• Split manufacturing – To improve the security of an IC, only the transistors and lower metal layers are fabricated at the untrusted foundry; that is, split manufacturing is used. An attacker who sees only the front-end-of-line (FEOL) layers, without the back-end-of-line (BEOL) layers, cannot recognize the appropriate places in a circuit for inserting an HT. Split manufacturing thus prevents HT insertion during floor planning, placement, and routing of the design.
training the model for prediction or classification [20]. The machine learning algorithm involves the general procedure shown in Figure 14.6, given as follows:
i. Pre-processing phase: In this phase, the relevant features of the design are selected first, and the data with these features are then extracted from the selected raw data, since they will be used to differentiate the values of the target outputs. To generate the samples for learning, scaling, feature selection, and dimensionality reduction are applied.
ii. Learning phase: In this phase, appropriate learning algorithms are selected and applied to derive models from the training dataset. Cross-validation, evaluation of results, and optimization are then carried out to obtain the final models.
iii. Evaluation phase: In this phase, the final models are tested on the test datasets to evaluate their performance.
iv. Prediction phase: In this phase, the final models are used to predict the target output values for new inputs.
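As an illustration of this four-phase procedure, the following is a minimal Python sketch using scikit-learn; the randomly generated feature matrix, the labels, the SVM classifier, and all parameter values are illustrative assumptions, not settings prescribed by this chapter.

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))           # placeholder per-net features
y = (rng.random(500) < 0.1).astype(int)  # placeholder labels (1 = HT net)

# Phase i: pre-processing -- scaling and dimensionality reduction.
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=10)),
                 ("clf", SVC(class_weight="balanced"))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Phase ii: learning -- cross-validated selection of the final model.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# Phase iii: evaluation -- test the final model on the held-out set.
print("test accuracy:", search.score(X_test, y_test))

# Phase iv: prediction -- classify nets from a new, unseen design.
X_new = rng.normal(size=(5, 20))
print("predicted labels:", search.best_estimator_.predict(X_new))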
Machine learning processes the data according to the nature of the data types; thus, for HT defensive mechanisms the learning tasks are categorized into two types, i.e. supervised machine learning and unsupervised machine learning. Supervised machine learning techniques train models on labelled data, and the final model is selected to predict the target classes for input data. Unsupervised machine learning techniques instead characterize the data by learning from unlabelled data, since labels for the target classes are unavailable. Supervised and unsupervised machine learning thus differ in the availability of data labels. An overview of machine learning algorithms, feature selection, optimization, and model enhancement techniques in defensive strategies against HTs is presented here.
A summary of machine learning algorithms is given in Table 14.2, which shows that supervised learning is effective compared with unsupervised learning because supervised learning can address cases with few features and thus achieves better classification results. Machine learning is particularly suitable for HT detection because a definite output can be analyzed for each input. Still, the process consumes considerable time, and multiple iterations must be performed to attain the optimum result.
TABLE 14.2
Summary of Machine Learning Algorithms

1. Supervised learning
   Advantages: cases with few features are handled quickly; the output for a given input is clear; classification results are relatively better; the learning models are not sensitive to noise.
   Disadvantages: overfitting and underfitting occur easily; it is unsuitable for multi-classification problems.

2. Unsupervised learning
   Advantages: no golden ICs are required; it is efficient and scalable for large datasets; models are simple, easy to implement, and independent of parameters.
   Disadvantages: it has poorer classification results; the optimal solution is not guaranteed to be accurate; the models are noise sensitive; the number of clusters must be specified.

3. Selection of features
   Advantages: redundant features are removed; HT detection accuracy is better; the attribute space is reduced.
   Disadvantages: it consumes more time; it can lose HT features; threshold values must be determined manually.

4. Design optimization
   Advantages: it is used in the DFS strategy; rare signals are decreased; the learning model's performance is improved.
   Disadvantages: it consumes more time; it is effective only for simple circuits; several iterations must be performed to obtain optimal solutions.
used, where each HT-included netlist in turn is treated as the testing set and the remaining HT-included netlists are treated as the training set. Likewise, to detect HT-free netlists, one such netlist is taken as the testing dataset and the remaining netlists are treated as the training set.
Throughout the training procedure, classifier quality is judged through appropriate indicators, and the classifier is retrained if the indicators and the detection of HT nets are not satisfactory; a sketch of this scheme follows.
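A minimal sketch of this leave-one-netlist-out scheme, assuming hypothetical per-netlist feature matrices and labels (randomly generated here) and a random forest classifier that is retrained with more trees whenever a validation F1 indicator stays below an assumed threshold:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
netlists = {f"bench{i}": (rng.normal(size=(200, 12)),
                          (rng.random(200) < 0.1).astype(int))
            for i in range(5)}  # placeholder benchmark netlists

for held_out, (X_test, y_test) in netlists.items():
    # Train on every netlist except the one held out for testing.
    X_train = np.vstack([X for n, (X, _) in netlists.items() if n != held_out])
    y_train = np.concatenate([y for n, (_, y) in netlists.items() if n != held_out])
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train, y_train, test_size=0.2, stratify=y_train, random_state=0)

    n_trees = 100
    while True:
        clf = RandomForestClassifier(n_estimators=n_trees,
                                     class_weight="balanced", random_state=0)
        clf.fit(X_tr, y_tr)
        indicator = f1_score(y_val, clf.predict(X_val), zero_division=0)
        if indicator >= 0.8 or n_trees >= 400:  # assumed quality threshold
            break
        n_trees *= 2  # retrain with a stronger model if unsatisfactory
    print(held_out, "test F1:",
          f1_score(y_test, clf.predict(X_test), zero_division=0))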
TABLE 14.3
Comparison of Machine Learning (ML) Approaches for HT Detection (for each detection technology: advantages and disadvantages of the non-ML-based strategy versus the ML-based strategy)
recreate the novel design of the end products. The reverse engineering technique can detect minute changes to ICs and thus has high detection accuracy, but it is a destructive method and consumes much time when applied to a complex IC. Even though it is time consuming, this method is applied to designs consisting of a small number of ICs. The time consumed by this traditional methodology is reduced by applying a suitable machine learning classifier that minimizes the number of steps to be carried out. Machine-learning–based reverse engineering avoids manually entering the netlist by generating the netlist automatically from an analysis of each layer of the IC design. The machine-learning–based non-destructive methodology also has some deficiencies, i.e. it is applicable only to simple designs and is costly.
• In circuit feature analysis [32], the analysis is based on circuit features extracted from the gate-level netlist. The extracted features are computed to determine whether a net is suspicious or not, and the HT is detected by analyzing switching activity and net features. Suspicious nets are activated under rare conditions supplied by the circuit inputs, which act as trigger inputs and activate the HT in the circuit. The n features extracted from individual nets are used to segregate infected HT nets from standard nets. Machine-learning–based feature analysis uses a classifier to distinguish infected nets from standard nets: the classifier is trained with the training set, and a testing set from an unknown netlist is then tested to identify suspicious activity (see the feature-extraction sketch after this list). The true positive rate of this method is high, but the accuracy and the true negative rate are lower. The learning model combines structural and functional features to generate a robust training set, thus improving the efficiency of HT detection.
• In side channel analysis [33,34], circuit parameters such as path delay, critical path, power, and temperature are measured to differentiate an IC infected with an HT from the golden IC, which is HT free. Side channel analysis examines variations in these parameters; any deviation of a parameter value from the golden IC's parameters indicates the presence of extra circuitry in the design. HT detection efficiency depends on the signal-to-noise ratio, as side channel analysis is affected by noise and process variation. To overcome this deficiency, machine learning is combined with the traditional method, e.g. an ANN, ELM, or BPNN used with side channel analysis, to improve the effective signal-to-noise ratio. However, such methods lack pre-processing of the sampled features and are volatile in nature. The approach analyzes various machine learning models and applies the most appropriate algorithm depending on the detection cases and the features of the circuit.
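To make the net-feature idea concrete, here is a small, self-contained Python sketch of per-net feature extraction from a toy gate-level netlist; the netlist, its format, and the two features (fan-in count and minimum logic depth from a primary input) are simplified assumptions for illustration, not the exact features of [32].

from collections import deque

# Hypothetical netlist: each internal net -> list of nets driving its gate.
netlist = {
    "n1": ["a", "b"],           # n1 = AND(a, b)
    "n2": ["n1", "c"],          # n2 = OR(n1, c)
    "trig": ["n1", "n2", "d"],  # wide-fan-in net, a typical trigger suspect
}
primary_inputs = {"a", "b", "c", "d"}

def fan_in(net):
    # Number of nets directly driving this net's gate.
    return len(netlist.get(net, []))

def depth_from_inputs(net):
    # Minimum logic depth from any primary input (backwards BFS).
    queue, seen = deque([(net, 0)]), {net}
    while queue:
        n, d = queue.popleft()
        if n in primary_inputs:
            return d
        for drv in netlist.get(n, []):
            if drv not in seen:
                seen.add(drv)
                queue.append((drv, d + 1))
    return -1  # net is unreachable from the primary inputs

features = {n: (fan_in(n), depth_from_inputs(n)) for n in netlist}
print(features)  # feature vectors that a trained classifier would score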
Machine learning algorithms are thus more effective and achieve higher precision rates. Machine learning techniques are not sensitive to noise, and their quality is assessed through indicators such as the true positive rate, the true negative rate, and the accuracy.
14.6 CONCLUSION
The threat of HTs has raised concerns among designers and has thus become a major focus of industrial and academic research. Here, we have explained the machine-learning–based HT detection approach by analyzing the various challenges and problems faced during research. The threats from adversaries have increased, and HT attacks are possible beyond the layers of the chip; hence, a basic model is presented for an HT defensive approach across the hardware ecosystem. This chapter validates the application of machine learning techniques against HT attacks. In the future, combinations of two or more classifiers will be used to detect HT attacks in systems on chip. Further, HT detection methodology will be used by governments, corporations, etc., to secure information from adversary attacks.
REFERENCES
[1] G. Sumathi, L. Srivani, D. T. Murthy, K. Madhusoodanan, and S. S. Murty, “A
review on HT attacks in PLD and ASIC designs with potential defence solutions,”
IETE Tech. Rev., vol. 35, no. 1, pp. 64–77, Jan. 2018, doi: 10.1080/02564602.2016.1246385.
[2] O. Sinanoglu, “Do you trust your chip?” in Proc. Int. Conf. Design Technol. Integr.
Syst. Nanosc. Era (DTIS), Apr. 2016, doi: 10.1109/dtis.2016.7483804.
[3] S. Bhunia, M. S. Hsiao, M. Banga, and S. Narasimhan, “Hardware Trojan attacks:
Threat analysis and countermeasures,” Proc. IEEE, vol. 102, no. 8, pp. 1229–1247,
Aug. 2014, doi: 10.1109/jproc.2014.2334493.
[4] A. Antonopoulos, C. Kapatsori, and Y. Makris, “Trusted analog/mixed-signal/RF
ICs: A survey and a perspective,” IEEE Des. Test, vol. 34, no. 6, pp. 63–76, Dec.
2017, doi: 10.1109/mdat.2017.2728366.
[5] M. Rostami, F. Koushanfar, and R. Karri, “A primer on hardware security: Models,
methods, and metrics,” Proc. IEEE, vol. 102, no. 8, pp. 1283–1295, Aug. 2014.
[6] R. Khan, S. U. Khan, R. Zaheer, and S. Khan, “Future Internet: The internet of
things architecture, possible applications and key challenges,” in Proc. 10th Int.
Conf. Frontiers Inf. Technol., Dec. 2013, pp. 257–260.
[7] F. Conti et al., “An IoT endpoint system-on-chip for secure and energy-efficient
near-sensor analytics,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 64, no. 9,
pp. 2481–2494, Sep. 2017.
[8] A. Kulkarni, Y. Pino, and T. Mohsenin, “Adaptive real-time Trojan detection
framework through machine learning,” in Proc. 2016 IEEE Int. Symp. Hardw.
Oriented Secur. Trust. HOST 2016, 2016, pp. 120–123.
[9] H. Salmani, “COTD: Reference-free hardware trojan detection and recovery based
on controllability and observability in gate-level netlist,” IEEE Trans. Inf. Forensics
Security, vol. 12, no. 2, pp. 338–350, Feb. 2017.
[10] H. Li, Q. Liu, and J. Zhang, “A survey of hardware Trojan threat and defense,”
Integration, vol. 55, pp. 426–437, Sep. 2016.
[11] H. Li, Q. Liu, J. Zhang, and Y. Lyu, “A survey of hardware Trojan detection,
diagnosis and prevention,” in Proc. 14th Int. Conf. Comput.-Aided Design Comput.
Graph. (CAD/Graph.), Aug. 2015, pp. 173–180, doi: 10.1109/cadgraphics.2015.41.
[12] K. Xiao, D. Forte, Y. Jin, R. Karri, S. Bhunia, and M. Tehranipoor, “Hardware
Trojans: Lessons learned after one decade of research,” ACM Trans. Des. Automat.
Electron. Syst., vol. 22, no. 1, pp. 1–23, May 2016, doi: 10.1145/2906147.
[13] M. Banga and M. S. Hsiao, “A region based approach for the identification of
hardware Trojans,” in Proc. IEEE Int. Workshop Hardw.-Oriented Secur. Trust, Jun.
2008, pp. 40–47, doi: 10.1109/hst.2008.4559047.
[14] X. Wang, M. Tehranipoor, and J. Plusquellic, “Detecting malicious inclusions in
secure hardware: Challenges and solutions,” in Proc. IEEE Int. Workshop Hardw.-
Oriented Secur. Trust, Jun. 2008, pp. 15–19, doi: 10.1109/hst.2008.4559039.
[15] J. Zhang, F. Yuan, L. Wei, Y. Liu, and Q. Xu, “VeriTrust: Verification for hardware
trust,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 7,
pp. 1148–1161, Jul. 2015, doi: 10.1109/tcad.2015.2422836.
[16] S. Moein, S. Khan, T. A. Gulliver, F. Gebali, and M. W. El-Kharashi, “An attribute-
based classification of hardware Trojans,” in Proc. 10th Int. Conf. Comput. Eng.
Syst. (ICCES), Dec. 2015, pp. 351–356, doi: 10.1109/icces.2015.7393074.
[17] S. Moein, T. A. Gulliver, F. Gebali, and A. Alkandari, “A new characterization of
hardware Trojans,” IEEE Access, vol. 4, pp. 2721–2731, 2016, doi: 10.1109/
access.2016.2575039.
[18] A. Basak, S. Bhunia, T. Tkacik, and S. Ray, “Security assurance for system-on-chip
designs with untrusted IPs,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 7,
pp. 1515–1528, Jul. 2017, doi: 10.1109/tifs.2017.2658544
[19] S. Narasimhan and S. Bhunia, “Hardware Trojan detection,” in M. Tehranipoor, C.
Wang (eds.) Introduction to Hardware Security and Trust. New York, NY:
Springer, 2012, pp. 51–57. https://doi.org/10.1007/978-1-4419-8080-9_15
[20] E. Alpaydin, “Introduction,” in Introduction to Machine Learning. Cambridge, MA,
USA: MIT Press, 2014, pp. 1–18.
[21] Y. Liu, G. Volanis, K. Huang, and Y. Makris, “Concurrent hardware Trojan de-
tection in wireless cryptographic ICs,” in Proc. IEEE Int. Test Conf. (ITC), Oct.
2015, pp. 1–8, doi: 10.1109/test.2015.7342386.
[22] K. Hasegawa, M. Yanagisawa, and N. Togawa, “A hardware-Trojan classification
method using machine learning at gate-level netlists based on Trojan features,”
IEICE Trans. Fundam., vol. E100-A, no. 7, pp. 1427–1438, Jul. 2017.
[23] X. Chen, L. Wang, Y. Wang, Y. Liu, and H. Yang, “A general framework for
hardware Trojan detection in digital circuits by statistical learning algorithms,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 36, no. 10,
pp. 1633–1646, Oct. 2017, doi: 10.1109/tcad.2016.2638442.
[24] J. Li, L. Ni, J. Chen, and E. Zhou, “A novel hardware Trojan detection based on BP
neural network,” in Proc. 2nd IEEE Int. Conf. Comput. Commun. (ICCC), Oct.
2016, pp. 2790–2794, doi: 10.1109/compcomm.2016.7925206.
[25] S. Wang, X. Dong, K. Sun, Q. Cui, D. Li, and C. He, “Hardware Trojan detection
based on ELM neural network,” in Proc. 1st IEEE Int. Conf. Comput. Commun.
Internet (ICCCI), Oct. 2016, pp. 400–403, doi: 10.1109/cci.2016.7778952.
[26] F. K. Lodhi, S. R. Hasan, O. Hasan, and F. Awwadl, “Power profiling of micro-
controller's instruction set for runtime hardware Trojans detection without golden
circuit models,” in Proc. Design, Automat. Test Europe Conf. Exhibit. (DATE),
Mar. 2017, pp. 294–297, doi: 10.23919/date.2017.7927002.
[27] N. Karimian, F. Tehranipoor, M. T. Rahman, S. Kelly, and D. Forte, “Genetic al-
gorithm for hardware Trojan detection with ring oscillator network (RON),” in Proc.
IEEE Int. Symp. Technol. Homeland Secur. (HST), Apr. 2015, pp. 1–6, doi: 10.1109/ths.2015.7225334.
[28] C. X. Bao, Y. Xie, Y. Liu, and A. Srivastava, “Reverse engineering-based hardware
Trojan detection,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol.
35, no. 1, pp. 49–57, Jan. 2016, doi: 10.1109/TCAD.2015.2488495.
[29] A. N. Nowroz, K. Hu, F. Koushanfar, and S. Reda, “Novel techniques for high-
sensitivity hardware Trojan detection using thermal and power maps,” IEEE
Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 33, no. 12, pp. 1792–1805,
Dec. 2014, doi: 10.1109/tcad.2014.2354293.
[30] B. Çakir and S. Malik, “Hardware Trojan detection for gate-level ICs using signal
correlation-based clustering,” in Proc. Design, Automat. Test Eur. Conf. Exhibit.
(DATE), 2015, pp. 471–476.
[31] X. Wang, Y. Zheng, A. Basak, and S. Bhunia, “IIPS: Infrastructure IP for secure
SoC design,” IEEE Trans. Comput., vol. 64, no. 8, pp. 2226–2238, Aug. 2015, doi:
10.1109/tc.2014.2360535.
[32] J. Smith, “Non-destructive state machine reverse engineering,” in Proc. 6th Int.
Symp. Resilient Control Syst. (ISRCS), Aug. 2013, pp. 120–124, doi: 10.1109/
isrcs.2013.6623762.
[33] X. Chen, Q. Liu, S. Yao, J. Wang, Q. Xu, Y. Wang, Y. Liu, and H. Yang,
“Hardware Trojan detection in third-party digital intellectual property cores by
multilevel feature analysis,” IEEE Trans. Comput.-Aided Design Integr. Circuits
Syst., vol. 37, no. 7, pp. 1370–1383, Jul. 2018, doi: 10.1109/tcad.2017.2748021.
[34] D. Jap, W. He, and S. Bhasin, “Supervised and unsupervised machine learning for
side-channel based Trojan detection,” in Proc. IEEE 27th Int. Conf. Appl.-Specific
Syst., Archit. Processors (ASAP), Jul. 2016, pp. 17–24, doi: 10.1109/asap.2016.7760768.
15 Integrated Photonics
for Artificial Intelligence
Applications
Ankur Saharia1, Kamal Kishor Choure2,
Nitesh Mudgal2, Rahul Pandey3, Dinesh Bhatia2,
Manish Tiwari1, and Ghanshyam Singh2
1 Manipal University Jaipur, Jaipur, Rajasthan, India
2 MNIT Jaipur, Jaipur, Rajasthan, India
3 SKIT Jaipur, Jaipur, Rajasthan, India
CONTENTS
15.1 Introduction to Photonic Neuromorphic Computing.................................293
15.2 Classification of Photonic Neural Network...............................................294
15.3 Photonic Neuron and Synapse ...................................................................298
15.4 Conclusion ..................................................................................................300
References..............................................................................................................300
machine learning [4], and computing. The ANN is considered to mimic the architecture of the human brain. In the von Neumann computer model, there is a physical separation between the computing unit and the memory that stores information, which limits computing efficiency when processing a large set of signals in parallel. To achieve high computing efficiency, on the other hand, ANNs rely on massive multiply-accumulate (MAC) computation for parallel processing of signals [5].
Researchers and industry are both coming forward to develop specific, tailored electronic architectures that counter the shortcomings of the present computer model for ANNs [6]. In the past few years, brain-inspired neuromorphic computing has achieved tremendous popularity for its unmatched computing efficiency. Researchers have developed several architectures and electronic devices that perform functions similar to neurons and synapses for neuromorphic computing [7–9], but the computing speed of electronic neuromorphic computing is limited, and Moore's law is faltering as nanotechnology reaches the atomic scale and fabrication costs rise [10]. Photonics technology, in the form of integrated photonics, provides the best alternative to fulfill the requirements of present-day high-speed computing technologies such as photonic neural network computing. Therefore, to overcome the limitations of electronic neuromorphic computing, a parallel platform has been introduced: photonic neuromorphic computing, which uses photons rather than electrons for processing and computation. Photonic neuromorphic computing has several advantages over electronic neuromorphic computing: it provides wide bandwidth, high processing and computational speed, low power requirements, and low latency [11,12]. Compared with electronic neuromorphic computing, photonic neuromorphic computing is still in its early stage of research and application. Silicon [13] and indium phosphide [14–16] are promising materials for integrated photonic platforms for neuromorphic computing.
Figure 15.1 shows the simple architecture of neuromorphic computing reproduced from Del Valle et al. [17]: an input neuron layer and an output neuron layer connected via synaptic connections. The input signal is fed to an input neuron via synapses for processing and computation. The main purpose of the neurons is signal processing and computation, while the synapses provide the weights and the storage memory. Each input is associated with a weight, so a weighted summation of the inputs is performed. To get the output close to the correct value, continuous training is performed. The mathematical representation is given in Sui et al. [18] as

S = \sum_{i=1}^{n} (W_i X_i + b_i)
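As a minimal numeric illustration of this weighted summation, the short NumPy fragment below assumes example values for the inputs X, the weights W, and the bias terms b; all values are arbitrary.

import numpy as np

X = np.array([0.5, 1.0, -0.2])  # inputs from the input neuron layer
W = np.array([0.8, -0.3, 0.6])  # synaptic weights
b = np.array([0.1, 0.1, 0.1])   # per-input bias terms, as in the formula

S = np.sum(W * X + b)           # S = sum over i of (W_i * X_i + b_i)
print(S)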
a. Multilayer Perceptron
Because information flows from one layer to the next successive layer, the multilayer perceptron is also called a feedforward neural network. The main purpose of the input layer is to pass the input to the other connected layers rather than to perform computation. To obtain the correct calculated output from the input, there must be a proper selection of weights and transfer functions. Training data are required for the supervised learning of a multilayer perceptron [19].
The advent of the SNN goes back to 1907, when Louis Lapicque [21] proposed the integrate-and-fire neuron, its basic computational unit. Information in an SNN is represented through spikes, whose precise timing and sequence carry the information transferred between artificial neurons. The architecture of an SNN consists of an input layer, a hidden layer, and an output layer, as shown in Figure 15.4. Every neuron of a layer connects to all neurons of the next successive layer, and the output of each neuron is the weighted sum of the previous ones; the information transfer involves a time difference. The uniqueness of the SNN is this time-delay property, which makes it more advantageous than other neural networks but also makes the SNN more complex in terms of training and configuration [22].
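A minimal leaky integrate-and-fire neuron sketch in Python, matching the spiking behaviour described above; the time constant, threshold, and input current are arbitrary illustrative values rather than parameters from this chapter.

import numpy as np

dt, tau = 1e-3, 20e-3        # time step and membrane time constant (s)
v_thresh, v_reset = 1.0, 0.0
v, spike_times = 0.0, []

current = np.full(200, 1.2)  # constant input drive
for t, i_in in enumerate(current):
    v += dt / tau * (-v + i_in)  # leaky integration of the input
    if v >= v_thresh:            # fire once the threshold is crossed
        spike_times.append(t * dt)
        v = v_reset              # reset the membrane after the spike
print(len(spike_times), "spikes; first at t =", spike_times[0], "s")

The timing of the resulting spikes, rather than their amplitude, is what carries the information in an SNN.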
d. Reservoir Computing
Reservoir computing is one of the most versatile computing types among the types of neuromorphic computing. It is primarily a recurrent-neural-network–based scheme: to improve its computational efficiency, the recurrent neural network (RNN)-based reservoir must be properly designed. As shown in Figure 15.5, reservoir computing has an input layer through which the input data are transformed into a high-dimensional space. The main purpose of the reservoir tank is to nonlinearly convert the provided input into this high-dimensional space, which helps the learning algorithms; the internal points of the reservoir tank are also referred to as “reservoir states”. The weights connecting the input layer and the reservoir are denoted W_in, while the weights between the reservoir and the output layer are denoted W_out. In reservoir computing, only the output weights are updated, while the input and reservoir weights remain fixed. Reservoir computing can be realized with the help of various components, devices, and substrates. Time-delay reservoir computing, in particular, is conceptually simple, has low power consumption, and uses little hardware [23,24].
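The following toy echo state network in Python illustrates this principle: the input weights W_in and the recurrent reservoir weights are fixed and random, and only the output weights W_out are trained (here by ridge regression on a one-step-ahead sine prediction task); the sizes, the spectral-radius scaling, and the task are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(42)
n_in, n_res = 1, 100

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))     # fixed input weights
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

u = np.sin(np.linspace(0, 20 * np.pi, 1000))[:, None]  # input signal
states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t in range(len(u) - 1):
    x = np.tanh(W_in @ u[t] + W @ x)  # nonlinear reservoir update
    states[t] = x

# Train only W_out by ridge regression onto the next input value.
S, target, ridge = states[:-1], u[1:, 0], 1e-6
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ target)
print("train MSE:", np.mean((S @ W_out - target) ** 2))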
TABLE 15.1
Some of the Recent Advancements of Photonic Techniques for Neuromorphic Computing

• Florian Denis-Le Coarer et al. [33] (reservoir computing) used nonlinear micro-ring resonators as nodes for all-optical reservoir computing on photonic chips; the proposed photonic architecture is based on silicon-on-insulator micro-ring resonators. The proposed reservoir design is able to solve a delayed XOR task at 20 Gb/s with bit error rates below 10−3 and injection power below 2.5 mW.

• Changming Wu et al. [34] (convolutional neural network) demonstrated on-waveguide metasurfaces made of the phase-change material Ge2Sb2Te5 for multimode photonic convolutional neural computing. They built a photonic kernel based on such arrays for convolutional neural networks for image processing and recognition tasks.

• Laurent Larger et al. [35] (reservoir computing) implemented reservoir computing based on an electro-optic (EO) phase-delay dynamic built with telecom-bandwidth devices for ultra-fast image processing. They demonstrated the computational efficiency of the proposed model, experimentally achieving one million words per second for speech recognition with a low error rate.

• Indranil Chakraborty et al. [36] (spiking neural network) proposed a phase-change dynamic Ge2Sb2Te5 (GST) device embedded on top of a micro-ring resonator for fast neuromorphic computing. The proposed GST-based all-photonic integrate-and-fire neuron is compatible with synapses in all-photonic spiking neural networks.

• Matthew N. Ashner et al. [37] (reservoir computing) showed the use of multimode micro-ring resonators, with optical nonlinearity and optical feedback, for standard reservoir computing. They demonstrated the equivalence of multimode micro-ring resonators with a reservoir computer by deriving the equation of light through the waveguide.

• Charis Mesaritakis et al. [38] (reservoir computing) proposed all-optical reservoir computing using an InGaAsP micro-ring resonator for optical network applications. They presented two application-oriented benchmark tests, exploiting the nonlinear properties of the micro-ring resonator, i.e. two-photon absorption and the Kerr effect.

• Mitsumasa Nakajima et al. [39] (reservoir computing) demonstrated scalable on-chip photonics for reservoir computing, with parallel processing via wavelength-division multiplexing, which provides ultra- (Continued)
15.4 CONCLUSION
In this chapter, we presented the vital role of photonics in neuromorphic computing. The photonic components for neural networks, the photonic neuron and the photonic synapse, were also discussed. Last, we presented a short survey of recent developments in photonic techniques for neuromorphic computing. Many challenges still need to be addressed for efficient computing; in particular, the compatibility between photonic neurons and photonic synapses within neuromorphic computing needs to be improved.
REFERENCES
[1] W. Maass, “Networks of spiking neurons: The third generation of neural network
models,” Neur. Netw., vol. 10, no. 9, pp. 1659, 1997.
[2] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J.
Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D.
Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K.
Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep
neural networks and tree search,” Nature, vol. 529, Jan. 2016.
[3] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous
activity,” Bull. Math. Biophys., vol. 5, no. 4, pp. 115–133, Dec. 1943.
[4] S. Samarasinghe, “Neural networks for applied sciences and engineering: From
fundamentals to complex pattern recognition.” Boca Raton, USA: Auerbach
Publications, 2016.
[5] B. W. Bai, H. W. Shu, X. J. Wang, et al., “Towards silicon photonic neural networks
for artificial intelligence,” Sci. China Inf. Sci., vol. 63, no. 6, p. 160403, 2020. doi: 10.1007/s11432-020-2872-3.
[6] L. De Marinis, M. Cococcioni, P. Castoldi, and N. Andriolli, “Photonic neural
networks: A survey,” IEEE Access, vol. 7, pp. 175827–175841, 2019. doi: 10.1109/ACCESS.2019.2957245.
[7] C. D. Schuman, T. E. Potok, R. M. Patton, et al., “A survey of neuromorphic computing and neural networks in hardware,” arXiv preprint arXiv:1705.06963, 2017.
[8] K. Roy, A. Jaiswal, and P. Panda, “Towards spike-based machine intelligence with
neuromorphic computing,” Nature, vol. 575, p. 607, 2019.
[9] J. D. Zhu, T. Zhang, Y. C. Yang, et al., “A comprehensive review on emerging
artificial neuromorphic devices,” Appl. Phys. Rev., vol. 7, p. 011312, 2020.
[10] J. Shalf, “The future of computing beyond Moore's law,” Phil. Trans. R. Soc. A,
vol. 378, p. 20190061, 2020. doi: 10.1098/rsta.2019.0061.
[11] P. R. Prucnal and B. J. Shastri, “Neuromorphic photonics.” Boca Raton: CRC
Press, 2017.
[12] Q. Zhang, H. Yu, M. Barbiero, B. Wang, and M. Gu, “Artificial neural networks
enabled by nanophotonics,” Light Sci. Appl., vol. 8, pp. 1–14, 2019.
[13] L. Chrostowski and M. Hochberg, “Silicon photonics design: From devices to
systems.” Cambridge, UK: Cambridge Univ. Press, 2015.
[14] L. A. Coldren, S. C. Nicholes, L. Johansson, S. Ristic, R. S. Guzzon, E. J. Norberg,
and U. Krishnamachari, “High performance InP-based photonic ICs: A tutorial,” J.
Lightw. Technol., vol. 29, no. 4, pp. 554–570, 2011.
[15] M. Smit, X. Leijtens, H. Ambrosius, et al., “An introduction to InP-based generic
integration technology,” Semicond. Sci. Technol., vol. 29, no. 8, 2014, Art. no. 083001.
[16] M. Smit, K. Williams, and J. van der Tol, “Past, present, and future of InP-based
photonic integration,” APL Photon., vol. 4, no. 5, 2019, Art. no. 050901.
[17] Javier Del Valle, Juan Ramirez, Marcelo Rozenberg, and Ivan Schuller, “Challenges
in materials and devices for resistive-switching-based neuromorphic computing,” J.
Appl. Phys., vol. 124, p. 211101, 2018. 10.1063/1.5047800.
[18] X. Sui, Q. Wu, J. Liu, Q. Chen, and G. Gu, “A review of optical neural networks,”
in IEEE Access, vol. 8, pp. 70773–70783, 2020. 10.1109/ACCESS.2020.2987333.
[19] M. W. Gardner and S. R. Dorling, “Artificial neural networks (the multilayer
perceptron)—a review of applications in the atmospheric sciences,” Atmos.
Environ., vol. 32, no. 14–15, pp. 2627–2636, 1998. ISSN 1352-2310, doi: 10.1016/S1352-2310(97)00447-0.
[20] V. H. Phung and E. J. Rhee, “A deep learning approach for classification of cloud
image patches on small datasets,” J. Inf. Commun. Converg. Eng., vol. 16,
pp. 173–178, 2018. 10.6109/jicce.2018.16.3.173.
[21] L. F. Abbott, “Lapicque’s introduction of the integrate-and-fire model neuron
(1907),” Brain Res. Bullet., vol. 50, pp. 303–304, 1999.
[22] T. Iakymchuk, A. Rosado, J. V. Frances, and M. Batallre, “Fast spiking neural
network architecture for low-cost FPGA devices,” in 7th International Workshop on
Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), York,
UK, pp. 1–6, 2012. 10.1109/ReCoSoC.2012.6322906.
[23] L. Appeltant, M. C. Soriano, G. Van der Sande, et al., “Information processing using a
single dynamical node as complex system,” Nat. Commun., vol. 2, p. 468, 2011.
[24] F. Duport, A. Smerieri, A. Akrout, et al., “Fully analogue photonic reservoir
computer,” Sci. Rep., vol. 6, p. 22381, 2016. 10.1038/srep22381.
[39] Mitsumasa Nakajima, Kenji Tanaka, and Toshikazu Hashimoto, “Scalable reservoir
computing on coherent linear photonic processor,” Commun. Phys., vol. 4, 2021. doi: 10.1038/s42005-021-00519-1.
[40] J. R. Ong, C. C. Ooi, T. Y. L. Ang, S. T. Lim, and C. E. Png, “Photonic convolutional neural networks using integrated diffractive optics,” IEEE J. Sel. Top.
Quantum Electron., vol. 26, no. 5, pp. 1–8, Sept.–Oct. 2020, Art. no. 7702108. doi: 10.1109/JSTQE.2020.2982990.
[41] Irene Estébanez, Janek Schwind, Ingo Fischer, and Apostolos Argyris, “Accelerating
photonic computing by bandwidth enhancement of a time-delay reservoir,”
Nanophotonics, vol. 9, no. 13, pp. 4163–4171, 2020. 10.1515/nanoph-2020-0184.