

An Area- and Energy-Efficient Spiking Neural Network With Spike-Time-Dependent Plasticity Realized With SRAM Processing-in-Memory Macro and On-Chip Unsupervised Learning

Shuang Liu, J. J. Wang, J. T. Zhou, S. G. Hu, Q. Yu, T. P. Chen, and Y. Liu

Abstract—In this article, we present a spiking neural network (SNN) based on both an SRAM processing-in-memory (PIM) macro and on-chip unsupervised learning with Spike-Time-Dependent Plasticity (STDP). Co-design of algorithm and hardware for a hardware-friendly SNN and an efficient STDP-based learning methodology is used to improve area and energy efficiency. The proposed macro utilizes charge sharing of capacitors to perform fully parallel Reconfigurable Multi-bit PIM Multiply-Accumulate (RMPMA) operations. A thermometer-coded Programmable High-precision PIM Threshold Generator (PHPTG) is designed to achieve low differential non-linearity (DNL) and high linearity. In the macro, each column of PIM cells and a comparator act as a neuron that accumulates membrane potential and fires spikes. A simplified Winner Takes All (WTA) mechanism is used in the proposed hardware-friendly architecture. By combining the hardware-friendly STDP algorithm with parallel Word Lines (WLs) and Processing Bit Lines (PBLs), we realize unsupervised learning and recognize the Modified National Institute of Standards and Technology (MNIST) dataset. The chip for the hardware implementation was fabricated with a 55 nm CMOS process. The measurement shows that the chip achieves a learning energy of 0.47 nJ/pixel, with a learning energy efficiency of 70.38 TOPS/W. This work paves a pathway for on-chip learning algorithms in PIM with lower power consumption and fewer hardware resources.

Index Terms—MNIST, on-chip unsupervised learning, processing-in-memory (PIM), spiking neural network (SNN), SRAM, spike-time-dependent plasticity (STDP).

Manuscript received 27 October 2022; revised 9 December 2022 and 2 January 2023; accepted 23 January 2023. Date of publication 6 February 2023; date of current version 23 March 2023. This work was supported by the NSFC under Grant 92064004. This paper was recommended by Associate Editor C. Frenkel. (Corresponding author: J. J. Wang.)
Shuang Liu, J. J. Wang, J. T. Zhou, S. G. Hu, Q. Yu, and Y. Liu are with the State Key Laboratory of Thin Solid Films and Integrated Devices, University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
T. P. Chen is with Nanyang Technological University, Singapore 639798 (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TBCAS.2023.3242413.
Digital Object Identifier 10.1109/TBCAS.2023.3242413

I. INTRODUCTION

AMONG many artificial neural networks (ANNs), spiking neural networks (SNNs) have achieved an unprecedented level of bionics. Compared with other neural networks, e.g., convolutional neural networks (CNNs), SNNs operate in a manner closer to the human brain. By replacing continuous values with discrete spiking events over time, the hardware and energy consumed in processing are greatly reduced. However, efficient implementation of SNNs remains one of the major challenges. Realization of SNNs involves two main processes: 1) efficient SNN training on discretized, non-differentiable data; and 2) design of energy-efficient and high-throughput SNN architectures adapted to artificial intelligence (AI) applications, such as edge computing, wearable smart devices, the Internet of Things (IoT), etc. [1], [2], [3].

The training of SNNs can be divided into two categories: supervised learning and unsupervised learning. One supervised approach is to train SNNs using improved back-propagation (BP) algorithms [4], [5], [6]. These methods rely on high-precision derivation and multiplication/division operations, which consume extra computing resources and energy. For example, BP algorithms usually utilize 32/64-bit floating-point digits for computation [7]. Besides, although supervised learning can achieve higher accuracy, it requires labelled training data, which can take a lot of effort and time. On the other hand, compared with supervised learning, which relies on labelled input and output training data, unsupervised learning processes unlabelled data, which is becoming more attractive due to its adaptability to various environments and applications [8].

As an extension of Hebb's learning rules [9], spike-time-dependent plasticity (STDP) is considered to be one of the outstanding candidates to provide an energy-efficient and low-cost solution for unsupervised on-chip learning. Recently, several works based on STDP unsupervised training were reported [10], [11], [12], [13], [14], [15], [16], [17]. For example, Kim et al. [12] presented a stochastic-STDP-based SNN, in which a linear feedback shift register (LFSR) is used to generate the updated weight. More recently, unsupervised learning based on STDP using resistive random-access memory (ReRAM) has been reported by several groups [18], [19], [20], [21]. Zhao et al. [18] proposed an SNN using memristor-based inhibitory synapses to realize the mechanisms of lateral inhibition and homeostasis with low hardware complexity.


Over the past few years, as an architecture that breaks through the von Neumann bottleneck [22], processing-in-memory (PIM) has become state-of-the-art in terms of energy efficiency, chip throughput, area, etc. [23], [24]. A PIM configuration conducts data storage and computing together, which eliminates the data transfer between storage units and computing units and thus saves time and energy during computing. Many PIM-based ANN designs have been reported [25], [26], [27], [28]. However, SNNs based on PIM with online unsupervised learning capability are still rarely implemented and reported.

This paper presents an SNN based on both a PIM SRAM macro and on-chip unsupervised learning with STDP. Co-design of algorithm and hardware for a hardware-friendly SNN and an efficient STDP-based learning methodology is used to improve area and energy efficiency. The proposed SNN consists of an input layer, an excitatory layer, and an inhibition layer with only one neuron, which is different from other SNNs reported in the literature [12], [14], [29], [30]. A carefully designed PIM SRAM macro with a Reconfigurable Multi-bit PIM Multiply-Accumulate (RMPMA) module and a Programmable High-precision PIM Threshold Generation (PHPTG) module was fabricated with a 55 nm CMOS technology. Charge sharing of capacitors is used to perform the PIM operation and is responsible for the accumulation of the neural membrane potential in the SNN. Spike-trace-based STDP is used to update the synaptic weights on the chip efficiently. By utilizing the hardware-friendly STDP algorithm as well as the parallel Word Lines (WLs) and Processing Bit Lines (PBLs), we achieve unsupervised learning and recognize the MNIST dataset. Further analysis shows that the PIM SRAM macro is competitive in terms of area and energy efficiency.

The remaining part of the paper is organized as follows: Section II describes the design of the charge-sharing-based PIM structure and operation. Section III presents the STDP learning technique. Section IV discusses the evaluation of the STDP learning operation and compares the proposed architecture with that of other works reported previously. A conclusion is given in Section V.

II. PIM STRUCTURE AND OPERATION

Fig. 1. The architecture of the PIM-based SNN with on-chip unsupervised learning.

As shown in Fig. 1, the proposed PIM architecture includes several main parts: a PIM-based Leaky Integrate and Fire Neural Module (LIFNM), an STDP-based Weight Update Module (WUM), a System Controller (SC), an Input Decoder, and Write/Read Input/Output (W/R IO). The LIFNM performs fully parallel PIM operations, including the RMPMA operation and the PHPTG operation, realized in a 9-transistors-and-1-capacitor-based (9T1C) SRAM macro by charge sharing of the capacitors integrated on top of the SRAM cells. Moreover, a comparator array is used in the LIFNM to calculate the Leaky Integrate and Fire (LIF) neuron's [31] membrane potential. A Spike Trace Module in the LIFNM is also introduced to simplify the updating of the STDP weights. In the STDP-based WUM, 4-bit flash analog-to-digital converters (ADCs) are used to convert the spike-trace voltages Vtrace, which are input to the RMPMA, into the digital domain to conduct the STDP calculation.

A. The Proposed 9T1C PIM Cell

Fig. 2. (a) Circuit schematic, top view, and (b) waveforms during the PIM operation of the proposed 9T1C SRAM cell.

Fig. 2(a) shows the 9T1C PIM SRAM bit cell, which consists of a conventional 6T SRAM to store the weight, a CMOS Transmission Gate (N1 and P1) to conduct the input with low voltage loss, an NMOS (N2), and a Metal-Oxide-Metal (MOM) capacitor for charge storage. With the cooperation of the global Word-line Input Module (WLIM) and the global Bit-line Setup Module (BLSM), the 1-bit unit can realize the PIM dot product of a 1-bit weight and a multi-bit-precision input in the analog domain. The PIM SRAM cell works in two modes: SRAM mode and PIM mode. In the SRAM mode, the weight can be written/read through the vertical Word Line (WL) and the horizontal Bit Lines (BL and BLB), in the same way as in a conventional SRAM array. The data stored in the 6T SRAM (W and WB) control the TG and thus determine whether the Input Word Line (IWL) is charged to the input voltage $V_{trace}$.

In the PIM mode, the PIM operation can be divided into three phases, as shown in Fig. 2(b). In the first phase, N2 is activated by the Processing Word Line (PWL) to discharge the Processing Bit Line (PBLB), which is grounded by the BLSM. Meanwhile, the voltage on PBL approaches VDD as RST is pulled down. To achieve higher throughput, the SRAM macro allows synchronous write/read operations, which are frequently conducted during unsupervised learning, in PIM Phase 1, owing to the separated word lines (WL and PWL) and bit lines (BL/BLB and PBL/PBLB). In PIM Phase 2, N2 is turned off; the IWL thus floats and the voltage on PBL is maintained at VDD. When IEN/IENB in the WLIM is enabled, if the data stored in the SRAM is "1" (W = 1), the voltage on IWL traces the PIM input $V_{trace}$; if W = 0, the voltage on IWL remains at VSS. In PIM Phase 3, N2 is turned on to set IWL to VSS, and the voltage on PBL then approaches a fixed value representing the PIM calculation result. The charge of the capacitor can be expressed as

$Q_c = c \times (V_{DD} - w \cdot V_{trace}) = c \times V_{DD} - c \times w \cdot V_{trace}$  (1)

where the dot product $w \cdot V_{trace}$ can be obtained through a linear transformation. To simplify the design and improve energy efficiency, we implement the linear transformation by adjusting the threshold generated by the PHPTG. A detailed discussion follows in a later section.
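As a behavioral illustration of (1), the following Python sketch models the idealized per-cell charge after the three PIM phases; the numerical values are illustrative assumptions (the supply and capacitance echo the 1.2 V and ~6 fF figures reported later), not extracted device parameters.

```python
# Idealized behavior of one 9T1C PIM cell, following Eq. (1).
# VDD and c are illustrative stand-ins, not fabricated parameters.

VDD = 1.2    # supply voltage in volts (the chip uses a 1.2 V supply)
c = 6e-15    # MOM capacitance in farads (~6 fF, per the layout description)

def cell_charge(w: int, v_trace: float) -> float:
    """Charge left on the cell capacitor after PIM Phases 1-3.

    Phase 1: PBL is reset toward VDD and the capacitor is pre-charged.
    Phase 2: if the stored weight W is '1', IWL follows the input V_trace.
    Phase 3: IWL is forced to VSS, leaving Q_c = c * (VDD - w * V_trace).
    """
    assert w in (0, 1), "each cell stores a 1-bit weight"
    return c * (VDD - w * v_trace)

# A stored '1' removes charge in proportion to the input trace voltage;
# a stored '0' leaves the full pre-charge, so the input is effectively gated.
print(cell_charge(1, 0.3))   # c * 0.9
print(cell_charge(0, 0.3))   # c * 1.2
```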

B. Reconfigurable Multi-Bit PIM Multiply-Accumulate Module

Fig. 3. (a) The RMPMA architecture, (b) multi-bit timing sequence, and (c) layout of the proposed RMPMA cell.

To perform multi-bit-weight dot products, the array is organized as one Reconfigurable Multi-bit PIM Multiply-Accumulate (RMPMA) unit for every four columns. Each orange block in Fig. 3(a) contains 8 × 4 9T1C cells (8 rows and 4 columns). Four switches (two S1 and two S2) are inserted between different rows, and three switches (S3, S4, and S5) are placed between the four columns, as shown in Fig. 3(a).

Fig. 3(b) shows the control sequence of the 1-bit, 2-bit, and 4-bit reconfigurable RMPMA. In the 1-bit mode, S3, S4, and S5 are turned off, and both S1 and S2 are turned on. Each PBL column in Fig. 3(a) is connected to 64 MOM capacitors, and the charges on the four PBL columns carry the same weight significance. The RMPMA unit can then be regarded as a full connection between 64 inputs and 4 neurons with 1-bit weights. In the 2-bit mode, S1 and S2 are turned on at first; after each SRAM PIM unit completes the PIM phases, S1 is turned off while S2 stays on, so the capacitances on the second and fourth columns are only half of those on the first and third columns; at this point, the RMPMA unit can be regarded as 2 neurons with 2-bit weights. By switching S1 and S2 synchronously on/off after the PIM phases are completed, a neuron with a 4-bit weight is obtained.

Eq. (1) presents the calculation of the dot product of the input and weight on one SRAM PIM cell. When RST in Fig. 2 is set high and PWL is enabled, the capacitors of each column start charge sharing, and the total charge on each PBL in the 1-bit mode can be expressed as

$Q = \sum_{i=1}^{n} c_0 \times (V_{DD} - w_i \cdot V_{trace,i})$  (2)

where n is the number of rows of the 9T1C array (n equals 64 in the 1-bit-weight design); $c_0$ is the MOM capacitance of each SRAM PIM cell; and $w_i$ and $V_{trace,i}$ are the weight and the input of the ith row, respectively. Thereby, the final voltage on PBL can be expressed as

$V_{PBL} = \dfrac{Q}{n \times c_0} = V_{DD} - \dfrac{\sum_{i=1}^{n} w_i \cdot V_{trace,i}}{n}$  (3)
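As an illustrative check of (2)–(3), the short sketch below computes the shared PBL voltage for the three RMPMA configurations; the effective capacitor counts follow the text (n = 64, 128, and 256), while the input voltages are arbitrary stand-ins.

```python
import numpy as np

VDD = 1.2  # chip supply voltage (V)

def pbl_voltage(weights: np.ndarray, traces: np.ndarray) -> float:
    """Shared-PBL voltage after charge sharing, Eq. (3):
    V_PBL = VDD - (sum_i w_i * V_trace_i) / n, with n capacitors on the line."""
    return VDD - float(np.dot(weights, traces)) / len(weights)

# Configurations described in the text: every four columns form an RMPMA unit,
# acting as 4 / 2 / 1 neuron(s) in the 1- / 2- / 4-bit modes, with
# n = 64, 128, and 256 capacitors per (merged) PBL, respectively.
rng = np.random.default_rng(0)
for mode, n in (("1-bit", 64), ("2-bit", 128), ("4-bit", 256)):
    w = rng.integers(0, 2, n)        # each cell still stores a 1-bit weight
    v = rng.uniform(0.0, 0.6, n)     # stand-in spike-trace input voltages
    print(mode, round(pbl_voltage(w, v), 4))
```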


It is worth noting that in the 2-bit and 4-bit modes, the control of S3, S4, and S5 is also required. In the 2-bit mode, S3 and S5 turn on but S4 turns off during charge sharing; thus, n in Equation (3) is 128 (= 64 × 2). In the 4-bit mode, all three of these switches stay on, and n becomes 256 (= 64 × 4), as shown in Fig. 3(b). In general, the proposed RMPMA macro has three configurations: 128 neurons with 1-bit weights, 64 neurons with 2-bit weights, or 32 neurons with 4-bit weights. Fig. 3(c) shows the layout of an RMPMA cell. By inserting a dummy structure or a switch between SRAM PIM cells, the consistency and scalability of the array can be guaranteed. Furthermore, the dummy structure compensates for the loss of precision caused by the parasitics of the switches.

C. Programmable High-Precision PIM Threshold Generation

Fig. 4. (a) The architecture and (b) the charge operation of PHPTG.

Borrowing a technique from thermometer-coded digital-to-analog converters (DACs) [32], we implement a thermometer-coded PHPTG to achieve low differential non-linearity (DNL). Fig. 4(a) shows the proposed Programmable High-precision PIM Threshold Generation (PHPTG) structure, which contains 64 × 128 9T1C cells, external reset transistors, and switches for extending precision. Similar to the RMPMA module, each four-column memory array forms a PHPTG unit, as shown in Fig. 4(a). Switches S6, S7, and S8 can configure the PHPTG into three modes: 1) for 1-bit weight precision, the Vertical Bit Lines (VBL0 ∼ VBL3) can output four 6-bit threshold voltages; 2) for 2-bit weight precision, VBL3 (VBL2) and VBL1 (VBL0) can generate two 7-bit threshold voltages; 3) for 4-bit weight precision, VBL0 ∼ VBL3 are shorted, and the accuracy of the output voltage is 8-bit. The right panel of Fig. 4(a) shows the schematic of the 9T1C cell in the PHPTG, which is the same as the 9T1C cell proposed for the RMPMA, except that the source of N2 is directly connected to VSS and all PWLs are connected to RST.

Fig. 4(b) shows the workflow of the PHPTG. In the PHPTG, two steps are required to complete the threshold voltage generation, i.e., charge reset and charge sharing. During charge reset, both plates of the capacitors inside the two SRAMs are discharged to VSS through the RST transistors. When entering charge sharing, the Vertical Processing Word Lines (VPWL and VPWLB) enable the transmission gates in the SRAM. A high level "1" stored in the upper SRAM cell charges the left plate of its capacitor to VDD. At the same time, the lower SRAM cell stores "0", keeping the voltage on the left plate of its capacitor at VSS. According to the principle of capacitor charge conservation, the voltage on the connected right plates of the two capacitors becomes VDD/2. In the fabricated chip, the voltage $V_{VBL}$ can be expressed as

$V_{VBL} = \dfrac{V_{DD} \times m}{n}$  (4)

where n is the total number of 9T1C cells connected to the VBL, and m is the number of "1"s stored among these cells.

In the RMPMA module, the number of neurons can be configured among 128, 64, and 32, with corresponding weight precisions of 1-bit, 2-bit, and 4-bit, respectively. Therefore, DACs with reconfigurable channels are necessary, and the precisions of these DACs also need to be adjusted according to the weight precision to reduce power consumption and area overhead. Although resistive digital-to-analog converters (RDACs), e.g., R-2R DACs [33], [34], do prevail over the proposed PHPTG with about 50% area reduction and 60% power savings under the same precision (8 b), and high-precision RDACs are compatible with lower precisions by ignoring some least significant bits (LSBs), integrating an 8-bit-precision RDAC under each column of PIM cells consumes more area (about twice the size of a 6-bit-precision PHPTG module in 55 nm). In the 2-bit/4-bit weight precision modes, every 2/4 columns of 6-bit-precision PHPTG modules can be directly cascaded into a 7-bit/8-bit DAC. If RDACs were used, there would be 1/3 redundant RDACs for every 2/4 columns in the 2-bit/4-bit weight precision modes. In addition, the proposed PHPTG provides the functions of both storage and DAC, and can be flexibly reconfigured and expanded under the constraints of precision and chip area.
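A minimal model of the thermometer-coded threshold generation of (4): the output depends only on the count m of "1"s among the n cells on a VBL, which is what yields the low DNL (in this idealized sketch, each additional "1" moves the output by exactly VDD/n).

```python
VDD = 1.2  # supply voltage (V)

def phptg_voltage(bits: list[int]) -> float:
    """Thermometer-coded threshold, Eq. (4): V_VBL = VDD * m / n,
    where m is the number of '1's stored in the n 9T1C cells on the VBL."""
    n = len(bits)
    m = sum(bits)
    return VDD * m / n

# 64 cells per column -> a 6-bit (64-level) threshold; storing one more '1'
# always raises the output by exactly VDD/64, i.e., ideally zero DNL.
column = [1] * 20 + [0] * 44
print(phptg_voltage(column))                # 1.2 * 20 / 64
print(phptg_voltage([1] * 21 + [0] * 43))   # exactly one LSB step higher
```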


D. Threshold Adjustment and Converting

Since the RMPMA module cannot directly provide the dot product of the input and the weights, we propose a threshold adjustment and conversion technique to achieve the dot product. From Eq. (3), the dot product of input and weights can be computed with the equation below,

$\sum_{i=1}^{n} w_i \cdot V_{trace,i} = n \times (V_{DD} - V_{PBL})$  (5)

where $\sum_{i=1}^{n} w_i \cdot V_{trace,i}$ is the membrane potential of the neuron in the SNN. The dot product determines the neuron firing by comparing the membrane potential with the threshold voltage ($V_{VBL}$), as described below,

$\delta = \begin{cases} 1, & \sum_{i=1}^{n} w_i \cdot V_{trace,i} > V_{VBL} \\ 0, & \sum_{i=1}^{n} w_i \cdot V_{trace,i} < V_{VBL} \end{cases}$  (6)

where δ = 1 means the neuron fires, and there is no firing when δ = 0. According to (5), (6) can be rewritten as

$\delta = \begin{cases} 1, & V_{PBL} < V_{DD} - \frac{V_{VBL}}{n} \\ 0, & V_{PBL} > V_{DD} - \frac{V_{VBL}}{n} \end{cases}$  (7)

The comparison between the membrane potential ($\sum_{i=1}^{n} w_i \cdot V_{trace,i}$) and the threshold voltage ($V_{VBL}$) is thus translated into a comparison between the voltage on PBL ($V_{PBL}$) and the adjusted threshold ($V_{DD} - V_{VBL}/n$). The adjusted threshold can be quantified and calculated by the WUM module, while the voltage $V_{PBL}$ on PBL is produced directly by the RMPMA module. This greatly simplifies the RMPMA operation and thereby improves the energy efficiency of the PIM operation.

Fig. 5. (a) Transfer function, and (b) voltage variability of the Programmable High-precision PIM Threshold Generation (PHPTG).

Fig. 5(a) shows the thermometer-coded digital value versus the output voltage on VBL in the PHPTG, exhibiting high linearity. We use Monte Carlo SPICE simulations to verify the On-Chip Variation (OCV) of this design with all the capacitors, transistors, and post-layout parasitic parameters taken into account. As shown in Fig. 5(b), over 1000 points, the voltage distribution of VBL has a sigma value of σ = 0.256 mV, indicating robustness to process variations. Such a small sigma value is due to the fact that the multilayer MOM structure has a large capacitance and thus reduces the influence of the parasitic parameters. Also, the CMOS TG suppresses the substrate bias effect and increases the output accuracy. The result is competitive with that in [25] and is consistent with the behavior we expect for precisely generating the STDP-modulated voltages and the threshold voltages.

The proposed on-chip unsupervised learning with STDP can also enhance the tolerance against various process, voltage, and temperature (PVT) conditions. The proposed STDP-based weight update can appropriately calibrate the error caused by the mismatch of capacitors or the PVT of the chip. For example, if the PHPTG/RMPMA voltage is smaller than the expected value, the STDP rule will adjust the weights to make it larger.
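The threshold translation of (5)–(7) amounts to a single comparison. As a sanity check, the sketch below verifies that the decision of (7) agrees with the decision of (6); all voltages are arbitrary stand-ins.

```python
import random

VDD = 1.2  # supply voltage (V)

def fires(v_pbl: float, v_vbl: float, n: int) -> bool:
    """Firing decision of Eq. (7): the comparison of the membrane potential
    against V_VBL is mapped onto one comparison of the PBL voltage
    against the adjusted threshold VDD - V_VBL / n."""
    return v_pbl < VDD - v_vbl / n

# Consistency check against Eq. (6) for a random case with n = 64 rows.
random.seed(1)
n = 64
w = [random.randint(0, 1) for _ in range(n)]
v = [random.uniform(0.0, 0.6) for _ in range(n)]
membrane = sum(wi * vi for wi, vi in zip(w, v))  # sum_i w_i * V_trace_i
v_pbl = VDD - membrane / n                       # Eq. (3)
v_vbl = 0.9                                      # an arbitrary programmed threshold
assert fires(v_pbl, v_vbl, n) == (membrane > v_vbl)   # Eqs. (6) and (7) agree
print(fires(v_pbl, v_vbl, n))
```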


E. Area- and Energy-Efficient Dynamic Comparator

Fig. 6. (a) The circuit schematic and (b) the Monte Carlo SPICE simulation result of the proposed area- and energy-efficient dynamic comparator.

A rail-to-rail dynamic comparator offers a large input range, which can accommodate membrane potentials and threshold voltages varying from VSS to VDD. Fig. 6(a) shows the proposed dynamic comparator for evaluating the neuronal membrane potentials. The comparator has almost no quiescent current (∼61 pA per comparator when SEN is at VSS). In the same process, the proposed comparator consumes approximately 2.5 times the dynamic power of the StrongARM latch reported in [35], whose input range is only half that of the proposed comparator. In Fig. 6(a), two input pairs share the latched output stage, which is competitive with other comparators in terms of area and power consumption, e.g., [27]. When SEN (SENB) is pulled down (up) to VSS (VDD), the input pair nodes PN/PP (NN/NP) start to discharge (charge) in advance. In addition, the output of the comparator (SOUT) is reset to VSS. Then, SEN (SENB) approaches VDD (VSS), and the voltage difference between the comparator inputs, VIP and VIN, determines whether PP/PN (NP/NN) is pulled up (down) faster. The following latch structure can distinguish the small offset between PP (NP) and PN (NN) and rapidly generate a rail-to-rail output through positive feedback. Fig. 6(b) shows the Monte Carlo SPICE simulation result of the proposed area- and energy-efficient dynamic comparator: over 1000 points, the offset voltage distribution of the comparator has a sigma value of 3.794 mV, a 1.98× improvement in offset voltage compared to [36].

III. STDP LEARNING

Fig. 7. Mapping the hardware-friendly SNN to the proposed 9T1C PIM.

In this section, we introduce the co-design of algorithm and hardware for the hardware-friendly SNN and the efficient STDP-based learning methodology. As shown in Fig. 7, the number of input neurons is the same as that of excitatory neurons, and the two types of neurons are fully connected. Similarly to [18], the inhibitory layer of the proposed SNN has only one neuron, which is used to realize the winner-takes-all (WTA) mechanism [37]. When an excitatory neuron fires, the excitation is transmitted to the inhibitory layer, which inhibits the firing of the other neurons in the excitatory layer. This mechanism is achieved by resetting the comparators of the remaining unfired excitatory neurons through the STDP/WTA Controller in the WUM shown in Fig. 1.

The mapping of the SNN is also shown in Fig. 7. The Spike Trace Module is responsible for the operations of the input layer. The fully-connected weights between the input layer and the excitatory layer, and the threshold voltage of each neuron, are stored in the 9T1C-based RMPMA and PHPTG, respectively. The comparator array, the RMPMA, the PHPTG, and some control logic in the System Controller constitute the excitatory layer. The inhibition layer is realized by the WTA Controller in the STDP-based WUM. During unsupervised learning, the STDP-based WUM can update the weights and threshold voltages in the RMPMA and PHPTG through writing/reading operations.

A. Hardware-Friendly STDP

Fig. 8. (a) Conventional and (b) the proposed trace-based STDP algorithm.

The conventional STDP algorithm is shown in Fig. 8(a). The weight update depends on the time interval and order between the pre-neuron and post-neuron spikes. If a pre-neuron's spike arrives before a post-neuron's firing, the pre-neuron can be considered to have a facilitative effect on the post-neuron's firing, known as long-term potentiation (LTP). As a result, the weights of pre-to-post neurons increase. On the other hand, if a pre-neuron fires after a post-neuron, this pre-neuron has an inhibitory effect on the post-neuron, which is called long-term depression (LTD). Thus, the weight is decreased. Each weight update value is a function of the pulse interval. Therefore, under the conventional design, when a spike arrives, a calculation is needed to update the weights, which is not friendly to hardware design and consumes a lot of energy.

To reduce hardware overhead and achieve low power consumption, two techniques are proposed. The first one is to use a trace technique to track the time-varying trajectory of the input layer of the SNN. During the tracking process, when an input spike arrives, the trace potential increases by a fixed value. When there is no spike input, the trace potential decays over time, as shown in the right panel of Fig. 8(b).

The second technique is to simplify the LTD process. In STDP algorithms, LTP is always triggered by the post-neuron. Since the pre-neurons are already traced, the value of LTP can be obtained simply by sampling the trace of each input neuron when the post-neuron fires. On the other hand, LTD is triggered by pre-neurons, for which maintaining a post-neuron trace for each input neuron is not easy to achieve. Therefore, in our design, the weights only need to be decreased by a fixed value when neurons trigger LTD, as shown in the right panel of Fig. 8(b). This method reduces the complexity of the circuit while keeping the accuracy acceptable. The detailed comparison is discussed in the evaluation section. In the proposed STDP algorithm, each trigger of the LTP process also triggers an LTD process, and the LTD needs one clock cycle to update.


Fig. 9. (a) Illustration of the spike trace circuit that can minimize the analog nonlinearity, and (b) the trace potential of the output as a function of the input spikes.

In the aforementioned LTP procedure, when an input spike arrives, the trace potential of the pre-neuron increases by a fixed value. To increase analog linearity, two clamp amplifiers are used in the Spike Trace Module in the PIM-based LIFNM, as shown in Fig. 9(a). The feedback of the amplifier makes the drain voltages of transistors NM1 and NM2 tend to be the same. When the input spike is enabled, the drain and source voltages of transistors PM1 and PM2 are approximately equal. Thus, no matter what the value of the trace potential is, the charging current of capacitor C is approximately equal to the reference current Iref. It should be noted that a suitable decay factor (Vdecay) is needed to let the trace potential decay over time, as shown in Fig. 9(b). The capacitor C in Fig. 9(a) is composed of a row of 9T1C capacitors in the RMPMA. The decay time is set to 160 ns, with Vdecay biased at around 300 mV by an external DAC. In charge-sharing PIMs [25], [38], [39], input drivers are necessary because of the parallel connection of the capacitors in the bit cells. In our design, the proposed Input Trace Module is responsible not only for formatting the inputs, but also for driving the RMPMA and reflecting the decay behavior of the LIF neurons. To reduce hardware overhead, we designed an Input Trace Module with full driving capability rather than a cascade of a low-power Input Trace Module and a strong PIM driver, which would require an additional capacitor and amplifier.

Fig. 10. Decay behavior driven by the input spike trace as well as waveforms at different terminals.

In LIF neurons, leakage behavior is necessary to avoid membrane potential saturation. To further improve the area efficiency, the neuron membrane potential decay behavior is realized by the decay of the input spike trace, as shown in Fig. 10. Assuming that the RMPMA module has two inputs (Trace1 and Trace2) from the Spike Trace Module, and that the synapse weights of Trace1 and Trace2 are both fixed at 1, Fig. 10 shows the decay behavior caused by the input. The resulting membrane potential on PBL is also illustrated in Fig. 10. As the PIM is driven by a clock, the membrane potential waveform on PBL is pulse-like, as shown in Fig. 10.

B. Parallel WLs and PBLs

Fig. 11. (a) Conventional memory access direction (blue), which is perpendicular to the direction of the PBLs (red); and (b) the proposed memory access direction (blue), which is parallel to the direction of the PBLs (red).

In conventional PIM, the WL is perpendicular to the PBL, meaning that the SRAM must be accessed row by row, as shown in Fig. 11(a). For STDP, the weight update occurs between all of the input neurons and the firing post-neuron. In other words, the updated STDP weights are stored in columns, aligned in the same direction as the PBL. This may lead to multiple writing/reading cycles during the STDP update, with the additional disadvantage that unrelated cells may be written/read. To avoid these issues, a WL direction parallel to the PBL is adopted to accommodate the column-by-column update characteristic of STDP, as shown in Fig. 11(b). This topology takes only one writing/reading cycle to complete updating/loading all the STDP weights.
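The cycle saving can be seen with a toy access model (array dimensions per this design; the one-cycle-per-access assumption is illustrative):

```python
import numpy as np

weights = np.zeros((64, 128), dtype=np.uint8)   # 64 inputs x 128 neurons
new_col = np.ones(64, dtype=np.uint8)           # updated weights of one fired neuron

# Conventional layout (Fig. 11(a)): WLs perpendicular to PBLs, so the array is
# accessed row by row; rewriting one neuron's column touches all 64 rows.
cycles_conventional = 0
for row in range(64):
    weights[row, 7] = new_col[row]              # one row access per cycle
    cycles_conventional += 1

# Proposed layout (Fig. 11(b)): WLs parallel to PBLs, so a single access covers
# the whole column holding the fired neuron's weights.
weights[:, 7] = new_col
cycles_proposed = 1

print(cycles_conventional, cycles_proposed)     # 64 cycles vs. 1 cycle
```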


C. STDP Learning Timing Diagram

Fig. 12. Timing diagram including LIF, read SRAM, STDP update, and write SRAM.

The timing diagram for 4-bit-weight STDP learning is shown in Fig. 12. In the LIF operation, the membrane potential starts to leaky-integrate and fire through four stages, i.e., three PIM phases and one phase that contains the precision configuration and comparator sensing. These four stages are performed periodically every two clocks. Once the voltage on a VBL exceeds the voltage on the corresponding PBL, e.g., VBLm and PBLm, the output of the mth comparator (Soutm) is pulled up. This triggers the WTA and disables SEN. Meanwhile, the trace voltages of the spike trace circuit in Fig. 9(a) are sampled by the ADCs in the WUM and later reset to VSS. In the read operation, the WLs in the RMPMA (WLm[63:0](RMPMA)) and the PHPTG (WLm[63:0](PHPTG)) are activated to read the 64 4-bit weights (QPBLm[63:0]) and the threshold values (QVBLm[63:0]) into the WUM.

In the STDP operation, two clocks are needed. At the first clock, QPBLm[63:0] and the results of the 64 ADCs are individually summed to achieve LTP, and each value in QVBLm[63:0] is summed with a fixed value to increase the threshold of the excitatory neuron. The LTD operation at the second clock is implemented by subtracting a fixed LTD value from QPBLm[63:0]. The LTD process is not negligible, as it keeps the weights in QPBLm[63:0] from saturating. In the write operation, the updated QPBLm[63:0] and QVBLm[63:0] are rewritten to the RMPMA and PHPTG, respectively. After STDP learning is finished, a new LIF operation begins, as shown in Fig. 12.
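One STDP update in the timing of Fig. 12 is a read-modify-write over a single neuron's 64 4-bit weights and its threshold codes; the sketch below mirrors that sequence. The increment values and the threshold code width are illustrative assumptions.

```python
import numpy as np

W_MAX = 15         # 4-bit weight range
THRESH_STEP = 1    # fixed threshold increase per firing (illustrative)
LTD_STEP = 1       # fixed LTD decrement applied at the second STDP clock

def stdp_cycle(qpbl: np.ndarray, qvbl: np.ndarray, adc: np.ndarray):
    """Read -> STDP (two clocks) -> write, for one fired neuron m.

    qpbl: QPBLm[63:0], the 64 4-bit weights read from the RMPMA.
    qvbl: QVBLm[63:0], the threshold codes read from the PHPTG
          (treated here as 64 4-bit entries for illustration).
    adc:  sampled 4-bit spike-trace codes from the 64 flash ADCs.
    """
    # Clock 1 (LTP): weights and ADC results are summed element-wise,
    # and each threshold value is increased by a fixed amount.
    qpbl = qpbl + adc
    qvbl = qvbl + THRESH_STEP
    # Clock 2 (LTD): a fixed value is subtracted to keep the weights
    # from saturating.
    qpbl = qpbl - LTD_STEP
    # Write back, clamped to the representable ranges.
    return np.clip(qpbl, 0, W_MAX), np.clip(qvbl, 0, W_MAX)

rng = np.random.default_rng(0)
qpbl, qvbl = stdp_cycle(rng.integers(0, 16, 64), rng.integers(0, 16, 64),
                        rng.integers(0, 16, 64))
print(qpbl[:8], qvbl[:8])
```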


IV. EVALUATION

Fig. 13. Optical micrograph of the prototype chip.

The proposed PIM-based SRAM macro with the capability of on-chip unsupervised STDP learning was manufactured in a 55 nm CMOS process. Fig. 13 shows a micrograph of the chip, which has an area of 0.21 mm² (excluding IO pads). In the LIFNM, the on-chip SRAM macro consists of 64 × 128 9T1C SRAM cells in the RMPMA and 64 × 128 9T1C SRAM cells in the PHPTG. We also built a larger SRAM macro with the STDP algorithm, based on the Cadence Advanced Mixed-Signal Simulator (AMS) and the PyTorch framework, to evaluate the accuracy and weight distribution.

Fig. 14. Layout of the proposed 9T1C bit cell.

The layout of the proposed 9T1C cell is shown in Fig. 14. For high-density integration, the MOM capacitor, formed by the third-to-sixth metals in the 6M1T (6 metal and 1 top metal) process, is fabricated on top of the 9T SRAM cell without requiring extra area. The MOM capacitor, occupying an area of 3.68 μm², has a capacitance of ∼6 fF. The proposed 9T1C cell is 26% larger in area than the 7T1R cell [40], owing to the two additional transistors, and is 0.86× the size of the 8T1C cell [25].

Fig. 15. Flowchart of the on-chip unsupervised learning of the proposed SNN.

The working flow of our testbench is shown in Fig. 15, including off-chip software preprocessing and on-chip unsupervised learning controlled by the state machine in the system controller (SC). The benchmark verification is carried out with the resized Modified National Institute of Standards and Technology (MNIST) dataset. The pulse-coding method adopted is the Poisson-distributed spike train, the same as in our previous work [16], whose firing rate is proportional to the pixel value; larger pixel values prompt higher firing rates. To evaluate the recognition accuracy, after all of the training images have been encoded and subjected to unsupervised STDP learning, the neurons in the excitatory layer are classified using the methods proposed in [14] and [16].

To meet the requirements of the test platform, the original 28 × 28 MNIST training and test images were resized to 8 × 8 for examining the power consumption and functionality of the chip, and to 16 × 16 for the software evaluation of the accuracy. We examined the images with weight precisions of 1 bit, 2 bits, and 4 bits; the corresponding PHPTG precisions are 6 bits, 7 bits, and 8 bits, respectively. At the beginning of the test, the synaptic weights and neuronal threshold voltages were randomly generated by off-chip software and written into the LIFNM. After the 60,000 unlabeled images in the resized training set were encoded and sent to the fabricated chip for learning, the excitatory layer was classified [14], [16]. After training, we used the testing set for inference to obtain the accuracy. We also read out the trained weights through the chip I/Os for more detailed analysis.
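The Poisson rate coding used by the testbench can be sketched as follows: each pixel independently emits a spike per time step with probability proportional to its intensity. The proportionality constant and step count below are illustrative, not the testbench's settings.

```python
import numpy as np

def poisson_encode(image: np.ndarray, steps: int = 100,
                   max_rate: float = 0.5) -> np.ndarray:
    """Poisson-distributed spike trains: a pixel of normalized intensity p
    spikes at each time step with probability p * max_rate, so the firing
    rate is proportional to the pixel value."""
    rng = np.random.default_rng(0)
    p = image.astype(float) / image.max() * max_rate
    return rng.random((steps,) + image.shape) < p   # boolean spike tensor

image = np.arange(64, dtype=float).reshape(8, 8)    # stand-in 8 x 8 input
spikes = poisson_encode(image)
print(spikes.shape, spikes.mean(axis=0)[0, :4])     # brighter -> higher rate
```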


Fig. 16. Reconstructed weight matrices of 10 neurons (a) before learning and (b) after on-chip unsupervised learning; and (c) distributions of the PHPTG voltage before and after learning.

Fig. 16(a) and (b) show the weight matrices before and after learning. Each image in Fig. 16(a) and (b) is a reconstructed matrix of the synaptic weights from all input neurons to one neuron in the excitatory layer. As can be seen in Fig. 16(a), before learning, each weight matrix is randomly distributed because the network has not learned anything yet. As STDP learning proceeds, the neurons converge to specific digits, as shown in Fig. 16(b). With the assistance of STDP, the neurons in the excitatory layer form fixed connections, which are distinctive because of the random weights and thresholds generated before learning. Fig. 16(c) shows the distributions of the PHPTG voltage before and after learning. It can be observed that after learning, the PHPTG voltage distribution changes from random to a specific distribution.

Fig. 17. Weight distributions under (a) 1-bit, (b) 2-bit, and (c) 4-bit weight precision.

Fig. 17(a), (b), and (c) present the distributions of the learned synaptic weights under different precisions (1 b, 2 b, and 4 b). Under each precision, most weights are zero, indicating that the synaptic weights are sparse. This means that there is no connection between most neurons in the input layer and a given neuron in the excitatory layer. This is also biologically interpretable, as each neuron only remains connected with a subset of its related neurons after a particular training. The second largest fraction of weights is the largest weight value under each precision, and the fractions of the other weights are very small. In addition, since most of the synaptic weights are zero, the left plate of the capacitor in the proposed 9T1C PIM SRAM cell in Fig. 2(a) does not frequently discharge during training and inference. Therefore, the proposed SRAM macro is energy efficient.

TABLE I. COMPARISON OF THE PROPOSED SNN MODEL WITH THE SNN MODEL IN [16]

Table I shows the comparison between the proposed SNN model and that in [16]. When the number of excitatory neurons is set to 1600 and the same classification method presented in [16] is used, the simulated recognition accuracy on the MNIST dataset reaches 90%, which is a bit lower than [16] due to the simplification of the inhibitory layer and the lower weight precision. Note that in our model, since the inhibitory layer is optimized, the number of weights is reduced by 1600 × 1600 compared with [16]. Furthermore, the weight precision is 4-bit in this work. In [16], the weight precision is 32-bit, resulting in a memory size of 8.63 MB, which is 172.6 times larger than that of our model. Even at the same accuracy (90%), Ref. [16] still requires a memory size of about 735 KB, which is about 14 times larger than ours.

Fig. 18. Simulated recognition accuracy as a function of the number of excitatory neurons for the 8-bit, 7-bit, and 6-bit PHPTG voltages. The classification method was proposed in [16].

When using the classification method in [16] for evaluation, the simulated recognition accuracy as a function of the number of excitatory neurons for the 8-bit, 7-bit, and 6-bit PHPTG voltages is presented in Fig. 18. The recognition accuracy increases with the number of neurons and with the precision of the PHPTG voltage.

Fig. 19. (a) Measured and (b) simulated reconstructed weight matrices using (c) an original 8 × 8 face image; and simulated reconstructed weight matrices under (d) 1-bit, (e) 2-bit, and (f) 4-bit weight precisions using (g) an original 19 × 19 face image.

For further exploration, a more complicated task based on the Face Data Set [41] was used for unsupervised learning in this work. Fig. 19(a) and (b) show the measured and simulated reconstructed weight matrices, respectively, using the original 8 × 8 face image in Fig. 19(c). Fig. 19(d), (e), and (f) show the simulated reconstructed weight matrices under different weight precisions. It can be observed that as the weight precision increases from 1 b to 4 b, the similarity between the weight matrix and the original image (Fig. 19(g)) gradually increases. For quantitative analysis, we introduce the normalized root mean square error (NRMSE) [29] to estimate the learning accuracy as follows:

$NRMSE = \sqrt{\dfrac{1}{N_{image} \cdot N_{pixel}} \sum_{j=1}^{N_{image}} \sum_{k=1}^{N_{pixel}} \left( x_{j,k} - \hat{x}_{j,k} \right)^2}$  (8)

where $N_{image}$ and $N_{pixel}$ represent the numbers of images and pixels, respectively, and $x_{j,k}$ and $\hat{x}_{j,k}$ are normalized to the range of 0 to 1. After 10 of the 8 × 8 images in [41] are applied to our chip for learning, we achieve an NRMSE of 0.26; the untrained NRMSE is 0.57, and the NRMSE in simulation is 0.25. The slight differences between measurement and simulation illustrate the robustness of the proposed architecture. For 10 of the 19 × 19 images, the NRMSEs under 1-bit, 2-bit, and 4-bit weight precisions are 0.39, 0.29, and 0.27, respectively, while the untrained NRMSEs are 0.55, 0.43, and 0.38, respectively. This means that the proposed unsupervised learning achieves a better NRMSE as the weight precision increases.
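Eq. (8) in code, assuming the images are already normalized to [0, 1]:

```python
import numpy as np

def nrmse(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Normalized root mean square error of Eq. (8) over a batch of images:
    x and x_hat have shape (N_image, N_pixel) with values in [0, 1]."""
    return float(np.sqrt(np.mean((x - x_hat) ** 2)))

rng = np.random.default_rng(0)
originals = rng.random((10, 64))                 # 10 images of 8 x 8 pixels
reconstructions = np.clip(originals + rng.normal(0, 0.1, (10, 64)), 0, 1)
print(round(nrmse(originals, reconstructions), 3))
```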


The PIM-based macro in this work has a 2 KB capacity (64 × 128 RMPMA and 64 × 128 PHPTG) and supports reconfigurable multi-bit weights/neurons. The chip supply voltage is 1.2 V. When the neurons work at 50 MHz in the 1-bit mode, the chip achieves a throughput of 204.8 GOPS, consuming 29.8 nJ for unsupervised learning and 22.5 nJ for inference.

Fig. 20. Measured power consumption of the proposed SNN SRAM macro.

Fig. 20 shows the power consumption measurements. The Spike Trace Module consumes 32% of the total power; tracing the input spikes and charging/discharging the capacitors in the RMPMA module account for most of this power. Moreover, because the weights in the RMPMA are sparse, most of the RMPMA cells do not charge/discharge. The PHPTG consumes about 31% of the total power, similarly for charging/discharging its capacitors. The RMPMA module itself consumes only 3.4% of the total power because its charging/discharging is supplied by the Spike Trace Module. The remaining 33.6% of the power consumption is mainly attributed to the system controller, the comparators, and the digital accumulations and ADC operations in the STDP-based WUM.

TABLE II. COMPARISON OF THE SNN CHIP OF THIS WORK WITH OTHERS REPORTED IN THE LITERATURE

Table II shows the comparison of the proposed SRAM macro with the SNN chips recently reported in the literature [12], [14], [29], [30]. Both the proposed architecture and [12] are PIM-based mixed-signal circuits (MSC). Refs. [14], [29], and [30] are realized with conventional digital, MSC, and field-programmable gate array (FPGA) circuits, respectively, which may incur a larger area overhead (e.g., 5.98 mm² in [14]) and higher learning energy (e.g., 29.8 nJ/pixel in [30]). Due to the integration of PIM cells and neurons, our work achieves lower learning energy (0.47 nJ/pixel), higher learning energy efficiency (70.38 TOPS/W), and higher area efficiency (975.2 GOPS/mm²) compared with those reported in [12], [14], [29], [30]. When using the classification method in [14], the accuracy can approach 92.1%, compared with 85%, 96.6%, and 82.52% reported in [12], [14], and [30], respectively. As can be observed in the table, our design is competitive with those reported in the literature [12], [14], [29], [30].

V. CONCLUSION

This paper introduces a PIM-based SNN utilizing on-chip unsupervised learning with area- and energy-efficient STDP. Co-design of algorithm and hardware for a hardware-friendly SNN and an efficient STDP-based learning methodology is used to improve area and energy efficiency. With the reconfigurable multi-bit precision and the hardware-friendly STDP, an improvement in the learning efficiency of the SNN has been achieved. On-chip unsupervised learning is demonstrated, and handwritten digit recognition is successfully implemented using the MNIST dataset. A PIM-based on-chip SNN with a low learning energy of 0.47 nJ/pixel and a high learning energy efficiency of 70.38 TOPS/W has been realized on our chip.

REFERENCES

[1] B. Varghese, N. Wang, S. Barbhuiya, P. Kilpatrick, and D. S. Nikolopoulos, "Challenges and opportunities in edge computing," in Proc. IEEE Int. Conf. Smart Cloud, 2016, pp. 20–26.
[2] M. Chan, D. Estève, J.-Y. Fourniols, C. Escriba, and E. Campo, "Smart wearable systems: Current status and future challenges," Artif. Intell. Med., vol. 56, no. 3, pp. 137–156, 2012.
[3] S. Li, L. D. Xu, and S. Zhao, "The Internet of Things: A survey," Inf. Syst. Front., vol. 17, no. 2, pp. 243–259, 2015.
[4] J. H. Lee, T. Delbruck, and M. Pfeiffer, "Training deep spiking neural networks using backpropagation," Front. Neurosci., vol. 10, 2016, Art. no. 508.
[5] Y. Jin, W. Zhang, and P. Li, "Hybrid macro/micro level backpropagation for training deep spiking neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 7005–7015.
[6] Y. Wu, L. Deng, G. Li, J. Zhu, Y. Xie, and L. Shi, "Direct training for spiking neural networks: Faster, larger, better," in Proc. AAAI Conf. Artif. Intell., 2019, pp. 1311–1318.
[7] A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse, "The reversible residual network: Backpropagation without storing activations," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 2211–2221.
[8] T. Hastie, R. Tibshirani, and J. Friedman, "Unsupervised learning," in The Elements of Statistical Learning. Berlin, Germany: Springer, 2009, pp. 485–585.
[9] S. J. Cooper, "Donald O. Hebb's synapse and learning rule: A history and commentary," Neurosci. Biobehavioral Rev., vol. 28, no. 8, pp. 851–874, 2005.
[10] P. U. Diehl and M. Cook, "Unsupervised learning of digit recognition using spike-timing-dependent plasticity," Front. Comput. Neurosci., vol. 9, 2015, Art. no. 99.


[11] G. K. Chen, R. Kumar, H. E. Sumbul, P. C. Knag, and R. K. Krishnamurthy, "A 4096-neuron 1M-synapse 3.8-pJ/SOP spiking neural network with on-chip STDP learning and sparse weights in 10-nm FinFET CMOS," IEEE J. Solid-State Circuits, vol. 54, no. 4, pp. 992–1002, Apr. 2019.
[12] D. Kim, X. She, N. M. Rahman, V. C. K. Chekuri, and S. Mukhopadhyay, "Processing-in-memory-based on-chip learning with spike-time-dependent plasticity in 65-nm CMOS," IEEE Solid-State Circuits Lett., vol. 3, pp. 278–281, 2020.
[13] G. Kim, K. Kim, S. Choi, H. J. Jang, and S.-O. Jung, "Area- and energy-efficient STDP learning algorithm for spiking neural network SoC," IEEE Access, vol. 8, pp. 216922–216932, 2020.
[14] H. Kim, H. Tang, W. Choi, and J. Park, "An energy-quality scalable STDP based sparse coding processor with on-chip learning capability," IEEE Trans. Biomed. Circuits Syst., vol. 14, no. 1, pp. 125–137, Feb. 2020.
[15] J. Park and S.-D. Jung, "Presynaptic spike-driven spike timing-dependent plasticity with address event representation for large-scale neuromorphic systems," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 67, no. 6, pp. 1936–1947, Jun. 2020.
[16] G. Qiao et al., "A neuromorphic-hardware oriented bio-plausible 7T1R online-learning spiking neural network model," IEEE Access, vol. 7, pp. 71730–71740, 2019.
[17] J. Wu et al., "Efficient design of spiking neural network with STDP learning based on fast CORDIC," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 68, no. 6, pp. 2522–2534, Jun. 2021.
[18] Z. Zhao et al., "A memristor-based spiking neural network with high scalability and learning efficiency," IEEE Trans. Circuits Syst. II: Exp. Briefs, vol. 67, no. 5, pp. 931–935, May 2020.
[19] Y. Zhou et al., "Complementary memtransistor-based multilayer neural networks for online supervised learning through (anti-) spike-timing-dependent plasticity," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 11, pp. 6640–6651, Nov. 2022.
[20] Y. Guo, H. Wu, B. Gao, and H. Qian, "Unsupervised learning on resistive memory array based spiking neural networks," Front. Neurosci., vol. 13, 2019, Art. no. 812.
[21] D. Ielmini, "Brain-inspired computing with resistive switching memory (RRAM): Devices, synapses and neural networks," Microelectronic Eng., vol. 190, pp. 44–53, 2018.
[22] J. Backus, "Can programming be liberated from the von Neumann style? A functional style and its algebra of programs," Commun. ACM, vol. 21, no. 8, pp. 613–641, 1978.
[23] X. Huang, C. Liu, Y.-G. Jiang, and P. Zhou, "In-memory computing to break the memory wall," Chin. Phys. B, vol. 29, no. 7, 2020, Art. no. 078504.
[24] D. Saito et al., "Analog in-memory computing in FeFET-based 1T1R array for edge AI applications," in Proc. IEEE Symp. VLSI Technol., 2021, pp. 1–2.
[25] Z. Jiang, S. Yin, J.-S. Seo, and M. Seok, "C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism," IEEE J. Solid-State Circuits, vol. 55, no. 7, pp. 1888–1897, Jul. 2020.
[26] X. Si et al., "A local computing cell and 6T SRAM-based computing-in-memory macro with 8-b MAC operation for edge AI chips," IEEE J. Solid-State Circuits, vol. 56, no. 9, pp. 2817–2831, Sep. 2021.
[27] H. Valavi, P. J. Ramadge, E. Nestler, and N. Verma, "A 64-tile 2.4-Mb in-memory-computing CNN accelerator employing charge-domain compute," IEEE J. Solid-State Circuits, vol. 54, no. 6, pp. 1789–1799, Jun. 2019.
[28] X. Si et al., "A dual-split 6T SRAM-based computing-in-memory unit-macro with fully parallel product-sum operation for binarized DNN edge processors," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 66, no. 11, pp. 4172–4185, Nov. 2019.
[29] H. Cho, H. Son, K. Seong, B. Kim, H.-J. Park, and J.-Y. Sim, "An on-chip learning neuromorphic autoencoder with current-mode transposable memory read and virtual lookup table," IEEE Trans. Biomed. Circuits Syst., vol. 12, no. 1, pp. 161–170, Feb. 2018.
[30] S. Li, Z. Zhang, R. Mao, J. Xiao, L. Chang, and J. Zhou, "A fast and energy-efficient SNN processor with adaptive clock/event-driven computation scheme and online learning," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 68, no. 4, pp. 1543–1552, Apr. 2021.
[31] L. F. Abbott, "Lapicque's introduction of the integrate-and-fire model neuron (1907)," Brain Res. Bull., vol. 50, no. 5/6, pp. 303–304, 1999.
[32] R. O. Topaloglu, "Process variation-aware multiple-fault diagnosis of thermometer-coded current-steering DACs," IEEE Trans. Circuits Syst. II: Exp. Briefs, vol. 54, no. 2, pp. 191–195, Feb. 2007.
[33] D. Marche and Y. Savaria, "Modeling R-2R segmented-ladder DACs," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 57, no. 1, pp. 31–43, Jan. 2010.
[34] J. Gowda, J. Bharadwaj, N. H. Sastry, C. R. Patel, and B. Sangeetha, "4-bit R-2R DAC in 18nm FinFET technology," in Proc. IEEE Int. Conf. Circuits, Controls Commun., 2021, pp. 1–5.
[35] B. Razavi, "The StrongARM latch [A circuit for all seasons]," IEEE Solid-State Circuits Mag., vol. 7, no. 2, pp. 12–17, 2015.
[36] Y. Chen, L. Lu, B. Kim, and T. T.-H. Kim, "Reconfigurable 2T2R ReRAM architecture for versatile data storage and computing in-memory," IEEE Trans. Very Large Scale Integr. Syst., vol. 28, no. 12, pp. 2636–2649, Dec. 2020.
[37] J. Wang et al., "Winner-takes-all mechanism realized by memristive neural network," Appl. Phys. Lett., vol. 115, no. 24, 2019, Art. no. 243701.
[38] E. J. Choi et al., "SRAM-based computing-in-memory macro with fully parallel one-step multibit computation," IEEE Solid-State Circuits Lett., vol. 5, pp. 234–237, 2022.
[39] B. Zhang et al., "PIMCA: A programmable in-memory computing accelerator for energy-efficient DNN inference," IEEE J. Solid-State Circuits, to be published, doi: 10.1109/JSSC.2022.3211290.
[40] Z. Lin, Y. Wang, C. Peng, X. Wu, X. Li, and J. Chen, "Multiple sharing nonvolatile SRAM with an improved read/write margin and reliable restore yield," IEEE Trans. Very Large Scale Integr. Syst., vol. 28, no. 3, pp. 607–619, 2019.
[41] "CBCL face database," accessed: Feb. 01, 2023. [Online]. Available: https://github.com/galeone/face-miner/tree/master/datasets/mitcbcl

Shuang Liu received the B.S. degree in microelectronics from the University of Electronic Science and Technology of China, Chengdu, China, where he is currently working toward the Ph.D. degree. His research interests include processing-in-memory circuits and neuromorphic systems.

J. J. Wang received the Ph.D. degree in microelectronics from the University of Electronic Science and Technology of China, Chengdu, China. He is currently a Postdoctoral Researcher with the University of Electronic Science and Technology of China. His research interests include digital circuit design, nonvolatile memory devices, and their applications in artificial intelligence.

J. T. Zhou received the B.S. degree in microelectronics from the University of Electronic Science and Technology of China, Chengdu, China, where he is currently working toward the M.S. degree. His research interests include neuromorphic computation and digital IC design.

S. G. Hu received the Ph.D. degree in microelectronics from the University of Electronic Science and Technology of China, Chengdu, China. His research interests include thin-film transistors, nonvolatile memory devices, and their applications in artificial intelligence. Since 2016, he has been an Associate Professor with the University of Electronic Science and Technology of China.

Q. Yu received the Ph.D. degree from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2010. He is currently a Professor and the Vice Dean with the School of Microelectronics and Solid-State Electronics, UESTC.

T. P. Chen received the Ph.D. degree from The University of Hong Kong, Hong Kong, in 1994. He is currently an Associate Professor with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

Y. Liu received the B.Sc. degree in microelectronics from Jilin University, Changchun, China, in 1998, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2005. From May 2005 to July 2006, he was a Research Fellow with Nanyang Technological University. In 2008, he joined the School of Microelectronics, University of Electronic Science and Technology of China, Chengdu, China, as a Full Professor. He is the author or coauthor of more than 130 peer-reviewed journal papers and more than 100 conference papers. His research includes memristor neural network systems, neuromorphic computing ICs, and AI-RFICs. In 2006, he was the recipient of the prestigious Singapore Millennium Foundation Fellowship; he holds one US patent and more than 30 China patents.
