
Journal of Physics: Conference Series

PAPER • OPEN ACCESS

Implementation of Deep Neural Network Using VLSI by Integral


Stochastic Computation
To cite this article: Vijitha Khan et al 2021 J. Phys.: Conf. Ser. 1964 062091

doi:10.1088/1742-6596/1964/6/062091

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.



Vijitha Khan1*, R Parameshwaran2, G Arulkumaran3, and B Gopi4

1 Department of Electronics and Communication Engineering, Ahalia School of Engineering & Technology, Kozhippara, Kerala, India
2 Department of Electronics and Communication Engineering, National Institute of Technology, Tiruchirapalli, Tamil Nadu, India
3 Department of Electrical and Computer Engineering, Bule Hora University, Bule Hora, Ethiopia
4 Department of Electronics and Communication Engineering, Muthayammal Engineering College, Rasipuram, Tamil Nadu, India

Email: *[email protected], [email protected], [email protected], [email protected]

Abstract. Neural networks are efficient machine learning models, but they require substantial hardware and power during their computation phase. Stochastic computing (SC) has been introduced as a compromise that makes it feasible to implement such models within tight hardware and energy budgets. In SC, hardware requirements and energy cost are greatly reduced by marginally sacrificing inference precision and computation speed. Moreover, the efficiency of SC neural networks has been greatly improved by recent advances in SC techniques, making them comparable to conventional binary designs while using less hardware. In this paper we start with the layout of a basic SC neuron and then study different kinds of SC neural networks, including multilayer perceptrons, convolutional neural networks, and recurrent networks.
Recent developments in SC architectures that further improve the speed and reliability of machine-learning hardware are then addressed. The generality and simplicity of SC machine learning are demonstrated for both training and inference. Finally, the strengths and drawbacks of SC machine learning with respect to conventional binary alternatives are discussed.
Keywords: Deep neural network, SC, VLSI, FPGA, Computing

1. Introduction
In several cognitive computing applications, including pattern recognition, feature extraction, and device control, neural networks (NNs) are widely used. Their nonlinear behaviour, modular configurability, and self-adaptability make NN models convenient. Historically, such algorithms were inspired by processes in the nervous system and constructed to execute specific tasks of interest. A NN is normally described as a massively parallel machine composed of simple processing elements that resemble biological cells. Like the human brain, it acquires knowledge through a training phase and stores that knowledge in synaptic weights associated with the connections between neurons. Many kinds of neural networks have been built on different architectures and learning rules. A multilayer perceptron (MLP) is an artificial neural network in which several layers of neurons are interconnected; trained with the backpropagation algorithm, it supports gradient-descent-based learning of classification features.


An MLP can be used to classify patterns that are not linearly separable. A deep neural network (DNN) is composed of many more processing layers between its input and output, and, as an illustration, the accuracy of an MLP is improved significantly by making it deeper. A deep MLP can perform learning tasks such as image recognition, expression detection, and speech recognition. Convolutional neural networks (CNNs) prove more effective for image representation than many other network types and greatly reduce the memory needed to store the layer weights by using weight sharing and local connectivity. Recurrent neural networks (RNNs) are commonly used to solve temporal problems. The long short-term memory (LSTM) framework was introduced to improve the performance of RNNs and is now one of the most widely used recurrent architectures. Hardware implementations of neural networks offer an intrinsically high degree of parallelism and fast processing relative to general-purpose processors.

However, because NNs can contain a large number of neurons in a single layer, complex hardware is needed, resulting in thousands of parameters that must be adjusted to reach high accuracy. Although a large network may easily overfit the training set, many methods have been developed to address overfitting and weight noise; these approaches act directly on the activation function or the layer weights, and large networks then achieve better accuracy than small ones. NNs implemented on FPGAs and multicore processors follow this approach, but the power consumption of such architectures remains too high for embedded devices. Unlike traditional binary circuitry, stochastic computing offers low hardware complexity, ease of fabrication, and inherent error tolerance. It reduces basic arithmetic blocks, such as multipliers, adders, and subtractors, to a few simple logic gates, and finite state machines can implement nonlinear functions such as the sigmoid and hyperbolic tangent used in multilayer perceptrons. Such architectures enable SC neural networks to be deployed at a substantially lower hardware complexity by marginally compromising computing precision.

Furthermore, SC encodes computed values using stochastic bit-streams. It thus introduces randomness into SC neural networks, and this noise can in principle be exploited during estimation to improve precision at little computational cost. Because of the long stream lengths and the large number of stochastic number generators required in the circuit, however, it is difficult for SC neural networks to achieve lower processing latency and power consumption than conventional architectures. Many advanced SC processing approaches have been proposed to reduce the stream length and thereby improve latency and energy consumption. Other efforts concentrate on sharing and reusing random number generators to achieve better performance and energy savings. With these modern approaches, SC designs become competitive with conventional binary architectures in both accuracy and operating speed.

NNs are built from neurons as their functional components. The human brain is known to contain around 100 billion neurons. The cell body of a neuron collects input signals through synapses connected to the regulated synaptic outputs of other neurons. The signals are encoded as sequences of voltage spikes, or pulses, and transmitted along the axon to many other receptors; a typical cell arrangement of this kind is found in the visual cortex. The formation of new connections between neurons and the replacement of old ones, which changes the structure and size of the network, allows the system to adapt to its external environment. The neuron in an artificial NN is modelled as an information-processing unit based on its biological counterpart and is used as the main building block; its node structure is shown below.
Most SC neurons have the same form. As the essential unit of SC deep learning, an SC neuron uses an array of stochastic number generators (SNGs), an SC arithmetic circuit, and a probability estimator. The SC arithmetic circuit performs the function of the cell body: multiplication, addition, and activation circuits can be configured according to the neuron model. These arithmetic operations are needed for the estimations in specific computations and can be realised by various SC designs, as described in the rest of this article. A stochastic bit-stream is converted back into a binary number by the probability estimator (PE), which can be implemented with a simple counter circuit. The counter value is updated at each time step according to the bits of the incoming stream: the more 1s the original stream contains, the larger the count becomes by the time the full stream has been received. After normalisation by the stream length, the final counter value is an estimate of the probability encoded in the original stream.

Figure 1: Structure of a neuron
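To make the data path of such a stochastic neuron concrete, the following Python sketch gives a minimal behavioural model under our own assumptions (unipolar coding, an illustrative stream length N, AND-gate multipliers, a MUX-style adder, and a counter-based probability estimator); the function names and parameters are illustrative and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 2048  # stream length; illustrative value, not taken from the paper

def sng(p, n=N):
    """Stochastic number generator: unipolar bit-stream with P(bit=1) = p."""
    return (rng.random(n) < p).astype(np.uint8)

def probability_estimator(stream):
    """Counter-based PE: count the 1s and normalise by the stream length."""
    return int(stream.sum()) / len(stream)

def sc_neuron(inputs, weights):
    """Behavioural model of one SC neuron (unipolar, scaled arithmetic)."""
    # Multipliers: a bitwise AND multiplies two unipolar streams.
    products = [sng(x) & sng(w) for x, w in zip(inputs, weights)]
    # Adder: a MUX driven by a uniform select stream computes the scaled
    # sum (p1 + ... + pk) / k of the product streams.
    select = rng.integers(0, len(products), size=N)
    summed = np.choose(select, products)
    # The estimator converts the stream back to a value; an activation
    # circuit (FSM- or counter-based in real SC designs) would follow here.
    return probability_estimator(summed)

print(sc_neuron(inputs=[0.9, 0.2, 0.6], weights=[0.5, 0.8, 0.3]))
```

Note that the MUX-style addition scales the sum by the number of inputs, which practical SC designs must compensate for; integral stochastic streams, which carry integer values and are added with ordinary binary adders, avoid this scaling loss.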

Furthermore, backpropagation (BP) modules are required for the training step. Implementations of integer SC circuits for BP in MLPs show that it is possible to execute the BP loop using subtractors and comparators. The BP process is carried out by the BP modules in five steps, from calculating the error terms in the hidden layers through to the final update of the layer weights. A bipolar representation is used in this implementation, and two feedback signals from the weight-update unit are needed to indicate whether the corresponding weight has increased or decreased. Furthermore, to encode signed values in the calculation, the scheme uses additional stochastic sequences. SC BP circuits of this kind have been proposed to optimise the processing pipeline and to extend the computational range. The computation is based on extended stochastic logic (ESL), in which the binary value is represented by the quantities embedded in a pair of streams; ESL uses these longer stream representations to preserve accuracy and to expand the SC computing range.
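Since the BP discussion above relies on a signed representation, the short sketch below shows the standard bipolar SC coding, in which a value x in [-1, 1] is carried by a stream with P(bit=1) = (x + 1)/2 and multiplication becomes an XNOR gate. This is textbook SC shown only as an assumed illustration of the coding, not the specific BP circuit of this work.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
N = 4096  # stream length; illustrative

def bipolar_sng(x, n=N):
    """Encode x in [-1, 1] as a stream with P(bit=1) = (x + 1) / 2."""
    return (rng.random(n) < (x + 1) / 2).astype(np.uint8)

def bipolar_decode(stream):
    """Map the estimated probability back to [-1, 1]: x = 2*p - 1."""
    return 2 * stream.mean() - 1

def bipolar_multiply(a, b):
    """Bipolar multiplication is an XNOR of the two streams."""
    return (~(a ^ b)) & 1

x, w = 0.5, -0.4
prod = bipolar_multiply(bipolar_sng(x), bipolar_sng(w))
print(bipolar_decode(prod))  # close to x * w = -0.2
```

In ESL, by contrast, a value is carried by a pair of streams and recovered as their ratio, which is reported to widen the representable range.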

2. Proposed Method
Although various SC schemes have been reported, the precision of existing SC-DNN designs is still inadequate unless long SC stream lengths are used. The proposed SC-DNN uses very short stream lengths while retaining high accuracy in computation. Figure 2 shows the SC neuron framework devised to accomplish this goal, which combines a higher correlation between the operand streams with the high precision offered by the integral stochastic framework. The CI multiplier is adopted on the basis of the preceding discussion. In an SC-DNN, the two operands of a multiplication are typically the weight and the input arriving from the preceding layer.


Figure 2: SC-based Neural Network

A is a deterministic stochastic computing (DSC) stream generated from the weight, and B, whose composition is unconstrained, is the input stream arriving from the preceding layer. We align the 1s in B with those in A and skip the remaining bits to obtain the correct result. Because A is spread evenly, owing to its deterministic generation process, the output is more accurate than with a random SC (RSC) stream. A counter that increments only when both corresponding bits are 1 is used to accomplish this. The DSC stream is paired with the indicated SC stream through the converter or the AND gate at the additional expense of only an enable signal. With the operands drawn from the range 0 to 1, the standard multiplication result C and the converter-based multiplication result E both track the exact result closely. The mean error rate of the proposed SC multiplier, plotted alongside several conventional multiplication schemes, shows that it matches the best of them and outperforms RSC. Note that, for a fair comparison, a k-bit fixed-point multiplication has roughly the same accuracy as an SC multiplication of length 2^k. Moreover, the proposed architecture still performs better than conventional SC multiplication when the comparison conditions are varied.

Figure 3: The proposed CI multiplier
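The accuracy argument for the deterministically generated weight stream A can be illustrated numerically. The sketch below is our own approximation of the idea rather than the exact CI-multiplier circuit: an AND-gate/counter multiplication is compared when A is an evenly spread deterministic stream versus a purely random one, with the counter incrementing only when both bits are 1, as described above.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
N = 256  # stream length; illustrative

def deterministic_stream(p, n=N):
    """Evenly spread stream with round(p*n) ones (DSC-style weight stream)."""
    ones = int(round(p * n))
    idx = np.arange(ones) * n // max(ones, 1)
    s = np.zeros(n, dtype=np.uint8)
    s[idx] = 1
    return s

def random_stream(p, n=N):
    """Conventional random (RSC-style) stream with P(bit=1) = p."""
    return (rng.random(n) < p).astype(np.uint8)

def multiply_and_count(a, b):
    """Counter increments only when both bits are 1 (AND-gate multiply)."""
    return int(np.sum(a & b)) / len(a)

w, x = 0.75, 0.4
errors_dsc, errors_rsc = [], []
for _ in range(200):
    b = random_stream(x)
    errors_dsc.append(abs(multiply_and_count(deterministic_stream(w), b) - w * x))
    errors_rsc.append(abs(multiply_and_count(random_stream(w), b) - w * x))

print("mean |error|, deterministic A:", np.mean(errors_dsc))
print("mean |error|, random A:       ", np.mean(errors_rsc))
```

Because the deterministic stream carries no sampling noise of its own, only the randomness of B contributes to the error, which is why the evenly spread weight stream is the more accurate choice.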

In summary, irrespective of the comparison setting, the proposed SC multiplier delivers the highest accuracy, thereby significantly improving the energy efficiency of the SC-DNN. Within the artificial neuron, the activation function also strongly affects efficiency, and ReLU is the most widely used one; the proposed CI multiplier is shown in Fig. 3 above. Earlier SC clipped-ReLU designs are based on finite state machines (FSMs). However, the best FSM state count is difficult to determine, so considerable precision losses are introduced. A high-precision clipped-ReLU circuit based on counters rather than FSMs is therefore proposed in this section. A subtractor removes the contribution of stream Y from stream X in the loop, and the running difference is collected by an accumulator; the output bit is then determined by the sign of the accumulated value. Analytically, it can be shown that the resulting transfer characteristic is a clipped function, under the assumption that the bits in streams X and Y are independent and identically distributed, and that the expected state of the accumulator converges accordingly with the stream length.
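As a rough behavioural sketch of the counter-based clipping just described (our own reading of the text, not the exact circuit of this work), the model below accumulates the bitwise difference between the X and Y streams and emits a 1 only while the accumulated difference is positive, draining the accumulator by one per emitted 1; the saturation limit and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
N = 8192        # stream length; illustrative
ACC_MIN = -16   # lower saturation of the accumulator; illustrative

def sng(p, n=N):
    """Unipolar stochastic stream with P(bit=1) = p."""
    return (rng.random(n) < p).astype(np.uint8)

def counter_relu(x_stream, y_stream, acc_min=ACC_MIN):
    """Accumulate the bitwise difference X - Y; emit 1 while the running sum
    is positive, subtracting one per emitted 1. The long-run output rate then
    approximates max(0, px - py), a ReLU-like clipped transfer function."""
    acc, out = 0, np.zeros(len(x_stream), dtype=np.uint8)
    for t, (xb, yb) in enumerate(zip(x_stream, y_stream)):
        acc = max(acc + int(xb) - int(yb), acc_min)
        if acc > 0:
            out[t] = 1
            acc -= 1
    return out

for px, py in [(0.7, 0.3), (0.3, 0.7)]:
    z = counter_relu(sng(px), sng(py))
    print(px, py, "->", round(z.mean(), 3))  # roughly max(0, px - py)
```

With px and py held in [0, 1], the output rate is naturally capped at 1, which gives the clipping behaviour without an FSM state-count choice.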

3. Results and Discussion


The two approaches above reduce the required stream length by increasing the accuracy of the computing units. In addition, an adaptive-length method is highlighted in this paper to minimise the average stream length by using different lengths for different pictures. Because the hardware itself is not affected by changing the SC stream length, it is natural to use a short length for easy pictures and the longer length only for difficult pictures, thereby minimising the overall runtime. A comparison of the proposed method on a Virtex-7 FPGA is given in Table 1. Two candidate SC stream lengths, one short and one long, are provided, and a threshold associated with each length determines whether the result needs to be recomputed. First, the NN evaluates the input feature vector with the short stream length. The picture is judged to be easy, and the result is accepted, if the maximum output Outmax exceeds the corresponding threshold; otherwise, the picture is evaluated once more with the longer stream length.
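This picture-dependent length selection amounts to a simple early-exit rule. The sketch below captures only that control flow, with placeholder stream lengths, a placeholder confidence threshold, and a noisy stand-in for the SC-DNN forward pass, since the paper does not give these values.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

SHORT_LEN = 64        # placeholder short stream length
LONG_LEN = 1024       # placeholder long stream length
CONF_THRESHOLD = 0.8  # placeholder acceptance threshold on the top output

def sc_forward(true_scores, stream_length):
    """Stand-in for an SC-DNN forward pass: longer streams give less noisy
    estimates of the network outputs (illustrative noise ~ 1/sqrt(length))."""
    noise = rng.normal(0.0, 0.5 / np.sqrt(stream_length), size=len(true_scores))
    return np.clip(np.asarray(true_scores) + noise, 0.0, 1.0)

def adaptive_length_inference(true_scores):
    """Run with the short stream first; rerun with the long one only if the
    top output does not clear the confidence threshold (easy vs hard image)."""
    scores = sc_forward(true_scores, SHORT_LEN)
    if scores.max() >= CONF_THRESHOLD:
        return scores, SHORT_LEN
    return sc_forward(true_scores, LONG_LEN), LONG_LEN

easy = [0.05, 0.02, 0.97]   # confident picture: usually exits after SHORT_LEN
hard = [0.40, 0.45, 0.50]   # ambiguous picture: usually falls back to LONG_LEN
for name, s in [("easy", easy), ("hard", hard)]:
    _, used = adaptive_length_inference(s)
    print(name, "-> stream length used:", used)
```

The average stream length then sits between the two candidates, weighted by how often the short pass is accepted.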

Table 1: Comparison of the proposed method on Virtex-7

Description          Proposed    Existing
Area (LUTs)          101234      1013002
Latency (µs)         1.705       1.705
Throughput (Mbps)    3826        3822

Figure 4: Cascaded multipliers

The sequential cells of the full stochastic DNN architecture have also been synthesised. This architecture achieves a combined improvement in power dissipation with respect to a previously published ASIC design in a 45 nm technology node. The architecture synthesised in TSMC 40 nm occupies approximately 2.2 mm2, which corresponds to roughly an 18x benefit at a comparable technology node, as shown in Fig. 4. The compact integration of the max-pooling function and the surrounding data path, achieved by carefully exploiting the correlations between the signals, is the key reason for this result: it allows the design to be implemented with a relatively small amount of logic.

4. Conclusion
Integral SC makes it possible to implement precise systems in the stochastic domain and enables calculations to be performed with streams of various lengths, which can increase hardware efficiency. Using integral SC, an efficient stochastic implementation of a DBN is proposed. Both the analysis and the implementation results show that the suggested technique decreases area occupancy by up to 6%, while its latency equals the state of the art. It also achieves higher classification performance for a given area, reducing the hardware needed to reach the same misclassification error rate as a conventional binary-radix design, and the proposed framework uses less power than its binary counterpart. The fabricated design can save further energy by using a reduced architecture relative to the conventional implementations, at a small cost in accuracy.
Stochastic computing is a suitable approach for applying machine learning techniques in edge-computing hardware because of its small area and low energy usage. Nevertheless, numerous obstacles are encountered in the effort to produce good results. In this article we propose an efficient reduced structure to deal with the high area consumed by the computation units, the resolution loss caused by signal correlation, and the integration of the stochastic components. A fully parallel convolutional neural network is built on a single FPGA chip for the first time, producing better performance than conventional sequential binary architectures and demonstrating the architecture's compression capability by leveraging the correlation characteristics of the stochastic inputs.

