Hai Jin
This work was supported by the EU’s Horizon Europe Research and Innovation Programme under Grant 101070679.
ABSTRACT In-memory computing (IMC) aims at executing numerical operations via physical processes,
such as current summation and charge collection, thus accelerating common computing tasks including the
matrix-vector multiplication. While extremely promising for memory-intensive workloads such as machine
learning and deep learning, IMC design and realization face significant challenges due to device
and circuit nonidealities. This work provides an overview of the research trends and options for IMC-based
implementations of deep learning accelerators with emerging memory technologies. The device
technologies, the computing primitives, and the digital/analog/mixed design approaches are presented.
Finally, the major device issues and metrics for IMC are discussed and benchmarked.
INDEX TERMS In-memory computing, deep learning, deep neural network, emerging memory technolo-
gies, matrix-vector multiplication.
I. INTRODUCTION
Today, artificial intelligence and its enabling technology, deep neural networks (DNN), have become largely popular in various applications such as image recognition, autonomous vehicles, speech recognition, and natural language processing. In the last five years, a state-of-the-art deep neural network model increased the number of its parameters by about 4 orders of magnitude, leading to a significant increase in computational and memory requirements for both the training and the inference operations [1], [2], [3], [4], [5], [6]. Traditional computing systems (Fig. 1a) typically store massive information on a memory unit that is physically connected to the computational unit by a data bus. The continuous data movement between the processing and the memory units represents the main bottleneck due to the limited bandwidth, long latency, sequential data processing, and high energy consumption [7], [8].

To minimize the latency and energy overhead of conventional von Neumann computers, in-memory computing (IMC) aims at performing the computation in close proximity to the memory or even in situ within the memory itself [9], [10]. The range of operations that can be executed within memory devices includes stateful logic [11], [12], pulse integration [13], [14], associative memory [15], [16], and stochastic computing [17]. The most popular and enabling IMC operation is, however, matrix-vector multiplication (MVM) via Ohm's and Kirchhoff's laws in a memory array [18], [19]. IMC has thus been largely targeted for hardware accelerators of DNN, where MVM is by far the most intensive workload. The ability to execute MVM in a single operation by activating all rows and all columns in parallel represents a key benefit of IMC that is unrivaled by other technologies. Despite the simplicity of the MVM concept and the potential advantages of IMC, the design options and the interaction between circuit operation and device nonidealities still represent a key open challenge.

This work provides an overview of IMC for DNN acceleration from the perspectives of device technology, circuit design, device-circuit interaction, and its impact on computing accuracy. Section II illustrates the emerging nonvolatile memory technologies that are currently considered for IMC. Section III presents an overview of various IMC circuit
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FIGURE 2. Graphic representation of the main emerging memory devices. (a) Resistive random access memory (RRAM). (b) Phase change memory (PCM).
(c) Ferroelectric random access memory (FeRAM). (d) Spin-transfer torque magnetic random access memory (STT-MRAM). (e) Ferroelectric field-effect
transistor (FeFET). (f) Spin-orbit torque magnetic random access memory (SOT-MRAM). (g) Electrochemical random access memory (ECRAM).
(h) Memtransistor device.
Devices in Fig. 2a-d have a two-terminal structure, which makes them suitable for high-density crosspoint architectures [10]. In many cases, two-terminal devices are connected to an access transistor, resulting in a one-transistor/one-resistor (1T1R) structure with improved control of the device current during programming and readout. Alternatively, three-terminal devices have been proposed. The ferroelectric field-effect transistor (FeFET in Fig. 2e) consists of a field-effect transistor in which the gate stack contains a ferroelectric layer [78]. The ferroelectric polarization is reflected by the threshold voltage VT of the device, resulting in a memory effect similar to floating gate devices. FeFET arrays with ferroelectric HfO2 have been recently demonstrated [35], [79].

The spin-orbit torque magnetic random access memory (SOT-MRAM in Fig. 2f) consists of a magnetic tunnel junction (MTJ) structure deposited on top of a line of heavy metal, such as Pt or W [80]. The MTJ is programmed in a P/AP state by a current flowing across the heavy-metal line via spin-orbit coupling. The cell is read by sensing the MTJ resistance, as in the STT-MRAM. The three-terminal structure allows the separation of the programming and the reading paths, improving the cycling endurance and the write speed [81].

The electrochemical random access memory (ECRAM in Fig. 2g) consists of a transistor device where the conductivity of the channel is modified in a nonvolatile way and can be reversed by injecting ionized dopants across an electrolyte layer [82]. ECRAM generally shows high endurance and extremely low power consumption thanks to the low-mobility channel, for instance, WO3 [83]. ECRAM also exhibits a controllable, linear weight update that is suitable for training accelerators [82], [84].

The memtransistor (Fig. 2h) consists of a transistor device with a 2D semiconductor material for the channel layer [85], [86], [87]. The memory behavior can be obtained by migration of dislocations in polycrystalline MoS2 [88], lateral migration of Ag across the source/drain electrodes [85], or charge trapping [89]. In some cases, MoS2 memtransistors display gradual weight-update characteristics that are useful for reservoir computing [89] and training accelerators [90].

A. COMPARISON OF NVM TECHNOLOGIES
In order to summarize and provide some quantitative information, Table 1 shows a comparison between the main emerging memories and the charge-based CMOS memories [91]. Fig. 3a shows a correlation plot of speed, evaluated as the inverse of the read time, and density, evaluated as the inverse of the cell area. Data from the literature are compared to the typical ranges for CMOS-based conventional memory technologies, such as SRAM, DRAM, and NAND Flash. The performance/cost of emerging NVM is usually intermediate between CMOS memories, with speed approaching DRAM whereas density is still generally between SRAM and DRAM.

Fig. 3b shows the array size as a function of the technology node for various NVM demonstrators. The capacity spans the whole range from embedded memory (1-100 MB) to standalone memory (1-100 GB). Note that smaller technology nodes do not necessarily lead to higher array capacity, which is due to the different maturity levels of the technologies. Fig. 3c shows the memory capacity of some NVM demonstrators as a function of the year, highlighting the continuous development of various memory technologies.

III. IN-MEMORY MATRIX-VECTOR MULTIPLICATION
Most IMC implementations aim at accelerating matrix-vector multiplication (MVM), which is by far the most essential computing primitive in deep learning and machine learning [92]. Fig. 4 shows a sketch of the MVM concept
FIGURE 3. Performances and characteristics of various emerging memory demonstrators. (a) Memory speed (expressed as the inverse of the read time)
as a function of the device miniaturization (expressed as the inverse of the cell size) [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41],
[42], [43], [44], [45], [46], [47], [48], [49], [50]. (b) Memory capacity as a function of the technology node [29], [30], [31], [32], [33], [34], [35], [36], [37], [38],
[39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67].
(c) Memory array capacity over the years [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50],
[51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67].
I_i = \sum_{j=1}^{N} G_{i,j} V_j,   (1)

where G_{i,j} is the conductance of the memory device at a certain position i, j, V_j the voltage applied at the jth column, and N is the number of columns and rows [10], [93]. MVM can thus be carried out by physical laws, in situ, without modifying or moving the stored parameters [10]. Most importantly, thanks to the inherent parallelism of the array, the MVM computation is virtually performed in one step independently of the size of the matrix, thus achieving an outstanding time complexity of O(1). Note that the memory array is typically compatible with the BEOL process, allowing for 3D stacking and a memory density scalable down to 4F^2/N, where N is the number of stacked layers and F is the feature size of the lithographic process.

Depending on the required specifications and the memory devices, various IMC implementations of MVM accelerators are possible. Fig. 5a shows the resistive crosspoint array, similar to Fig. 4, where device conductances can be programmed in the binary [94], [95] or multilevel domain [96], [97]. Steady-state currents collected at the grounded rows are generally acquired by a readout chain consisting of a transimpedance amplifier (TIA) and an analog-to-digital converter (ADC) [98]. A major limitation of this architecture is the programming operation, where voltages/currents might be difficult to control [99]. In particular, when applying various programming schemes [100], [101], a certain number of half-selected cells experience a non-negligible leakage current.

FIGURE 4. Crosspoint memory array based on resistive memories can perform matrix-vector multiplication directly in situ, by means of Ohm's law and Kirchhoff's current law. By applying a voltage vector at the columns, the analog conductive elements produce a current that is collected at the rows, conveniently biased at 0 V. The resulting output current vector is the multiplication of the conductance matrix G with the voltage vector V.
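Eq. (1) maps directly onto a matrix-vector product, which can be made explicit with a minimal numerical sketch; the conductance and voltage values below are illustrative, not taken from any cited device:

```python
import numpy as np

# Illustrative 4x4 conductance matrix G (in siemens): each entry is the
# programmed state of one memory device at row i, column j.
G = 1e-6 * np.array([[1.0, 0.5, 0.0, 2.0],
                     [0.0, 1.5, 1.0, 0.5],
                     [2.0, 0.0, 0.5, 1.0],
                     [0.5, 1.0, 2.0, 0.0]])

# Voltage vector applied at the columns (in volts); rows are held at 0 V.
V = np.array([0.2, 0.1, 0.3, 0.0])

# Eq. (1): the current at each grounded row is the sum of G[i, j] * V[j]
# over the columns. The physical array produces all row currents at once,
# which is what the O(1) time-complexity claim refers to.
I = G @ V

# Element-wise check against the explicit summation of Eq. (1).
I_loop = np.array([sum(G[i, j] * V[j] for j in range(G.shape[1]))
                   for i in range(G.shape[0])])
assert np.allclose(I, I_loop)
```

In hardware, the same currents are acquired through a TIA/ADC readout chain; the numerical model only captures the ideal, nonideality-free behavior.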
FIGURE 6. Example of applications that benefit from IMC matrix-vector multiplication. Depending on the update frequency requirements and the noise
sensitivity of the application, each hardware solution should combine memory devices with specific physical properties with adequate peripheral circuits.
For instance, applications that rely on one-time programming of weight values after an ex situ software-based training (e.g., DNN inference, CL-IMC, and
DCT) can trade off the need for accurate tuning algorithms with less stringent requirements on the cycling endurance of the device itself. On the other
hand, applications that demand frequent and continuous updates of the conductance matrix (e.g., DNN training) require efficient gradual programming
and endurance capabilities of the adopted memory device. Image “Pillars of Creation” from James Webb Space Telescope gallery [110].
FIGURE 7. DNN inference workload mainly consists of MVM, which is basically a Multiply-and-Accumulate operation. Crosspoint accelerators of DNN
inference can be classified depending on the way these two operations are performed. A fully digital approach relies on memory logic gates
implementing an XNOR-Multiply and on a counter for the accumulation. A mixed digital-analog approach requires an analog accumulation via Kirchhoff’s
current law (KCL). A fully analog approach relies on resistive elements that allow the encoding of multilevel weights and activations. Going from digital
to analog, the parallelism and the information density of the accelerator increase, at the expense of more severe parasitic effects and more complex
peripheral circuits. Further explorations of the fully analog approach are needed to unleash the potential of IMC for DNN inference acceleration.
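As a behavioral illustration of the fully digital option in Fig. 7, the XNOR-multiply and counter accumulation can be sketched in a few lines of Python; this is a functional model of the binary arithmetic, not a description of any specific macro:

```python
# Binary weights and activations in {-1, +1} are encoded as bits {0, 1}.
# The XNOR of two bits is 1 exactly when the two signed values agree,
# i.e., when their product is +1, so a counter (popcount) of the XNOR
# outputs accumulates the products.
def xnor_mac(w_bits, x_bits):
    n = len(w_bits)
    popcount = sum(1 - (w ^ x) for w, x in zip(w_bits, x_bits))  # XNOR, then count
    return 2 * popcount - n  # map the bit count back to a signed dot product

# Cross-check against the signed dot product.
w = [1, 0, 1, 1]  # encodes [+1, -1, +1, +1]
x = [1, 1, 0, 1]  # encodes [+1, +1, -1, +1]
signed = lambda bits: [2 * b - 1 for b in bits]
reference = sum(a * b for a, b in zip(signed(w), signed(x)))
assert xnor_mac(w, x) == reference  # both evaluate to 0 here
```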
weight update are ECRAM devices [125] and MoS2-based charge-trap memory [89], [90].

IV. IN-MEMORY ACCELERATION OF DNN INFERENCE
The computational workload of a DNN mostly consists of MVM with variable input vectors and stationary weight matrices, which can be directly accelerated by a memory array. Depending on whether the multiply and accumulate operations are performed in the analog or digital domain, three different options can be identified for MVM accelerators, as depicted in Fig. 7.

A. FULLY DIGITAL CIRCUITS
The fully digital approach relies on memory logic gates to perform the multiplication, and on counters to perform the sequential accumulation. To encode the binary alphabet of a
FIGURE 13. (a) Plot of the correlation between the conductance value G,
and its standard deviation, for a certain technology. (b) The simulated
relative current error of the MVM product as a function of the matrix size.
Device parameters were extracted from [79], [166], [184], [185].
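The kind of variability analysis behind Fig. 13 can be reproduced qualitatively with a small Monte Carlo sketch; the 5% relative conductance spread and the uniform conductance/voltage ranges below are illustrative assumptions, not the device parameters extracted in the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

def mvm_relative_error(n, rel_sigma=0.05, trials=200):
    """Monte Carlo estimate of the relative MVM current error for an
    n x n array whose programmed conductances deviate from their targets
    by a Gaussian relative spread rel_sigma."""
    errors = []
    for _ in range(trials):
        G = rng.uniform(0.5, 1.5, size=(n, n))   # target conductances (a.u.)
        V = rng.uniform(0.0, 1.0, size=n)        # input voltages (a.u.)
        G_prog = G * (1.0 + rel_sigma * rng.standard_normal((n, n)))
        I_ideal = G @ V
        I_real = G_prog @ V
        errors.append(np.linalg.norm(I_real - I_ideal) / np.linalg.norm(I_ideal))
    return float(np.mean(errors))
```

Under these assumptions, the uncorrelated device errors partially average out in the row summation, so the relative output error shrinks as the array grows; correlated errors, IR drop, and readout noise, which this sketch omits, can reverse the trend.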
[17] S. Gaba, P. Knag, Z. Zhang, and W. Lu, “Memristive devices [37] C.-C. Chou et al., “An N40 256K×44 embedded RRAM macro
for stochastic computing,” in Proc. IEEE Int. Symp. Circuits Syst. with SL-precharge SA and low-voltage current limiter to improve
(ISCAS), Jun. 2014, pp. 2592–2595. read and write performance,” in Proc. IEEE Int. Solid-State Circuits
[18] S. N. Truong and K.-S. Min, “New memristor-based cross- Conf. (ISSCC), Feb. 2018, pp. 478–480.
bar array architecture with 50-% area reduction and 48-% [38] T. Kim et al., “High-performance, cost-effective 2z nm two-deck
power saving for matrix-vector multiplication of analog neuro- cross-point memory integrated by self-align scheme for 128 Gb
morphic computing,” J. Semicond. Technol. Sci., vol. 14, no. 3, SCM,” in Proc. IEEE Int. Electron Devices Meeting (IEDM),
pp. 356–363, 2014. [Online]. Available: https://koreascience.kr/ Dec. 2018, pp. 37.1.1–37.1.4.
article/JAKO201420249945718.page [39] F. Arnaud et al., “Truly innovative 28nm FDSOI technology for auto-
[19] C. Li et al., “Analogue signal and image processing with large motive micro-controller applications embedding 16MB phase change
memristor crossbars,” Nat. Electron., vol. 1, no. 1, pp. 52–59, 2018. memory,” in Proc. IEEE Int. Electron Devices Meeting (IEDM),
[20] D. Keitel-Schulz and N. Wehn, “Embedded DRAM development: Dec. 2018, pp. 18.4.1–18.4.4.
Technology, physical design, and application issues,” IEEE Des. Test [40] Y.-C. Shih et al., “Logic process compatible 40-nm 16-Mb, embedded
Comput., vol. 18, no. 3, pp. 7–15, May/Jun. 2001. perpendicular-MRAM with hybrid-resistance reference, sub-µ
[21] B. Yan et al., “A 1.041-Mb/mm2 27.38-TOPS/W signed-INT8 a sensing resolution, and 17.5-nS read access time,” IEEE J. Solid-
dynamic-logic-based ADC-less SRAM compute-in-memory macro State Circuits, vol. 54, no. 4, pp. 1029–1038, Apr. 2019.
in 28nm with reconfigurable bitwise operation for AI and embedded [41] L. Wei et al., “13.3 a 7Mb STT-MRAM in 22FFL FinFET technology
applications,” in Proc. IEEE Int. Solid- State Circuits Conf. (ISSCC), with 4ns read sensing time at 0.9V using write-verify-write scheme
vol. 65, Feb. 2022, pp. 188–190. and offset-cancellation sensing technique,” in Proc. IEEE Int. Solid-
[22] Y.-D. Chih et al., “16.4 an 89TOPS/W and 16.3TOPS/mm2 all-digital State Circuits Conf. (ISSCC), Feb. 2019, pp. 214–216.
SRAM-based full-precision compute-in memory macro in 22nm for [42] P. Jain et al., “13.2 a 3.6Mb 10.1Mb/mm2 embedded non-volatile
machine-learning edge applications,” in Proc. IEEE Int. Solid- State ReRAM macro in 22nm FinFET technology with adaptive form-
Circuits Conf. (ISSCC), vol. 64, Feb. 2021, pp. 252–254. ing/set/reset schemes yielding down to 0.5V with sensing time of
[23] A. Agrawal et al., “Xcel-RAM: Accelerating binary neural networks 5ns at 0.7V,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC),
in high-throughput SRAM compute arrays,” IEEE Trans. Circuits Feb. 2019, pp. 212–214.
Syst. I, Reg. Papers, vol. 66, no. 8, pp. 3064–3076, Aug. 2019. [43] Y.-D. Chih et al., “13.3 a 22nm 32Mb embedded STT-MRAM
[24] R. Waser and M. Aono, “Nanoionics-based resistive switch- with 10ns read speed, 1M cycle write endurance, 10 years reten-
ing memories,” in Nanoscience and Technology. London, U.K.: tion at 150◦ C and high immunity to magnetic field interference,”
Macmillan, Aug. 2009, pp. 158–165. [Online]. Available: http:// in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2020,
www.worldscientific.com/doi/abs/10.1142/9789814287005_0016 pp. 222–224.
[25] D. Ielmini, “Resistive switching memories based on metal oxides: [44] Y.-C. Shih et al., “A reflow-capable, embedded 8Mb STT-MRAM
Mechanisms, reliability and scaling,” Semicond. Sci. Technol., vol. 31, macro with 9nS read access time in 16nm FinFET logic CMOS
no. 6, Jun. 2016, Art. no. 063002. [Online]. Available: https:// process,” in Proc. IEEE Int. Electron Devices Meeting (IEDM),
iopscience.iop.org/article/10.1088/0268-1242/31/6/063002 Dec. 2020, pp. 11.4.1–11.4.4.
[26] H.-S. P. Wong et al., “Metal–oxide RRAM,” Proc. IEEE, vol. 100, [45] D. Edelstein et al., “A 14 nm embedded STT-MRAM CMOS
no. 6, pp. 1951–1970, Jun. 2012. [Online]. Available: http:// technology,” in Proc. IEEE Int. Electron Devices Meeting (IEDM),
ieeexplore.ieee.org/document/6193402/ Dec. 2020, pp. 11.5.1–11.5.4.
[27] S. Balatti, S. Larentis, D. C. Gilmer, and D. Ielmini, “Multiple [46] V. B. Naik et al., “JEDEC-qualified highly reliable 22nm
memory states in resistive switching devices through controlled size FD-SOI embedded MRAM for low-power industrial-grade, and
and orientation of the conductive filament,” Adv. Mater., vol. 25, extended performance towards automotive-grade-1 applications,” in
no. 10, pp. 1474–1478, Mar. 2013. [Online]. Available: https:// Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2020,
onlinelibrary.wiley.com/doi/10.1002/adma.201204097 pp. 11.3.1–11.3.4.
[28] S. Yu, Y. Wu, and H.-S. P. Wong, “Investigating the switching dynam- [47] A. Fazio, “Advanced technology and systems of cross point memory,”
ics and multilevel capability of bipolar metal oxide resistive switching in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2020,
memory,” Appl. Phys. Lett., vol. 98, no. 10, 2011, Art. no. 103514. pp. 24.1.1–24.1.4.
[Online]. Available: https://doi.org/10.1063/1.3564883 [48] J. J. Sun et al., “Commercialization of 1Gb Standalone spin-transfer
[29] C.-C. Chou et al., “A 22nm 96KX144 RRAM macro with a self- torque MRAM,” in Proc. IEEE Int. Memory Workshop (IMW),
tracking reference and a low ripple charge pump to achieve a May 2021, pp. 1–4.
configurable read window and a wide operating voltage range,” in [49] T. Shimoi et al., “A 22nm 32Mb embedded STT-MRAM macro
Proc. IEEE Symp. VLSI Circuits, Jun. 2020, pp. 1–2. achieving 5.9ns random read access and 5.8MB/s write throughput
[30] H. Chung et al., “A 58nm 1.8V 1Gb PRAM with 6.4MB/s pro- at up to Tj of 150◦ C,” in Proc. IEEE Symp. VLSI Technol. Circuits
gram BW,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2011, (VLSI Technol. Circuits), Jun. 2022, pp. 134–135.
pp. 500–502. [50] S. M. Seo et al., “First demonstration of full integration and character-
[31] Y. Choi et al., “A 20nm 1.8V 8Gb PRAM with 40MB/s program ization of 4F2 1S1M cells with 45 nm of pitch and 20 nm of MTJ
bandwidth,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2012, size,” in Proc. Int. Electron Devices Meeting (IEDM), Dec. 2022,
pp. 46–48. pp. 10.1.1–10.1.4.
[32] T.-Y. Liu et al., “A 130.7 mm2 2-layer 32Gb ReRAM memory device [51] G. Servalli, “A 45nm generation phase change memory technology,”
in 24nm technology,” in IEEE Int. Solid-State Circuits Conf. Dig. in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2009,
Tech. Papers, Feb. 2013, pp. 210–211. pp. 1–4.
[33] M.-F. Chang et al., “19.4 embedded 1Mb ReRAM in 28nm [52] C. Gopalan et al., “Demonstration of conductive bridging random
CMOS with 0.27-to-1V read using swing-sample-and-couple sense access memory (CBRAM) in logic CMOS process,” in Proc. IEEE
amplifier and self-boost-write-termination scheme,” in IEEE Int. Int. Memory Workshop, May 2010, pp. 1–4.
Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2014, [53] S. H. Lee et al., “Highly productive PCRAM technology platform
pp. 332–333. and full chip operation: Based on 4F2 (84nm pitch) cell scheme for
[34] J. Zahurak et al., “Process integration of a 27nm, 16Gb cu 1 Gb and beyond,” in Proc. Int. Electron Devices Meeting, Dec. 2011,
ReRAM,” in Proc. IEEE Int. Electron Devices Meeting, Dec. 2014, pp. 3.3.1–3.3.4.
pp. 6.2.1–6.2.4. [54] A. Kawahara et al., “Filament scaling forming technique and level-
[35] S. Dünkel et al., “A FeFET based super-low-power ultra-fast embed- verify-write scheme with endurance over 107 cycles in ReRAM,” in
ded NVM technology for 22nm FDSOI and beyond,” in Proc. IEEE IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2013,
Int. Electron Devices Meeting (IEDM), Dec. 2017, pp. 19.7.1–19.7.4. pp. 220–221.
[36] Y. J. Song et al., “Demonstration of highly manufacturable STT- [55] M. Ueki et al., “Low-power embedded ReRAM technology for
MRAM embedded in 28nm logic,” in Proc. IEEE Int. Electron IoT applications,” in Proc. Symp. VLSI Technol. (VLSI Technol.),
Devices Meeting (IEDM), Dec. 2018, pp. 18.2.1–18.2.4. Jun. 2015, pp. T108–T109.
[56] C. Park et al., “Systematic optimization of 1 Gbit perpendicular [76] C. Chappert, A. Fert, and F. N. Van Dau, “The emergence of
magnetic tunnel junction arrays for 28 nm embedded STT-MRAM spin electronics in data storage,” Nat. Mater., vol. 6, no. 11,
and beyond,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 813–823, Nov. 2007. [Online]. Available: https://www.nature.
Dec. 2015, pp. 26.2.1–26.2.4. com/articles/nmat2024
[57] S.-W. Chung et al., “4Gbit density STT-MRAM using perpendicu- [77] R. Carboni et al., “Modeling of breakdown-limited endurance in spin-
lar MTJ realized with compact cell structure,” in Proc. IEEE Int. transfer torque magnetic memory under pulsed cycling regime,” IEEE
Electron Devices Meeting (IEDM), Dec. 2016, pp. 27.1.1–27.1.4. Trans. Electron Devices, vol. 65, no. 6, pp. 2470–2478, Jun. 2018.
[58] Y. J. Song et al., “Highly functional and reliable 8Mb STT-MRAM [Online]. Available: https://ieeexplore.ieee.org/document/8338113/
embedded in 28nm logic,” in Proc. IEEE Int. Electron Devices [78] A. I. Khan, A. Keshavarzi, and S. Datta, “The future of ferro-
Meeting (IEDM), Dec. 2016, pp. 27.2.1–27.2.4. electric field-effect transistor technology,” Nat. Electron., vol. 3,
[59] D. Shum et al., “CMOS-embedded STT-MRAM arrays in 2x nm no. 10, pp. 588–597, Oct. 2020. [Online]. Available: https://www.
nodes for GP-MCU applications,” in Proc. Symp. VLSI Technol., nature.com/articles/s41928-020-00492-7
Jun. 2017, pp. T208–T209. [79] M. Trentzsch et al., “A 28nm HKMG super low power embedded
[60] J. Y. Wu et al., “A 40nm low-power logic compatible phase change NVM technology based on ferroelectric FETs,” in Proc. IEEE Int.
memory technology,” in Proc. IEEE Int. Electron Devices Meeting Electron Devices Meeting (IEDM), Dec. 2016, pp. 11.5.1–11.5.4.
(IEDM), Dec. 2018, pp. 27.6.1–27.6.4. [80] I. M. Miron et al., “Perpendicular switching of a single ferromag-
[61] F. Arnaud et al., “High density embedded PCM cell in 28nm netic layer induced by in-plane current injection,” Nature, vol. 476,
FDSOI technology for automotive micro-controller applications,” no. 7359, pp. 189–193, Aug. 2011. [Online]. Available: http://www.
in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2020, nature.com/articles/nature10309
pp. 24.2.1–24.2.4. [81] K. Garello et al., “Ultrafast magnetization switching by spin-
[62] C.-F. Yang et al., “Industrially applicable read disturb model and orbit torques,” Appl. Phys. Lett., vol. 105, no. 21, Nov. 2014,
performance on mega-bit 28nm embedded RRAM,” in Proc. IEEE Art. no. 212402. [Online]. Available: https://aip.scitation.org/doi/full/
Symp. VLSI Technol., Jun. 2020, pp. 1–2. 10.1063/1.4902443
[63] S. H. Han et al., “28-nm 0.08 mm2 /Mb embedded MRAM for [82] J. Tang et al., “ECRAM as scalable synaptic cell for high-speed,
frame buffer memory,” in Proc. IEEE Int. Electron Devices Meeting low-power neuromorphic computing,” in Proc. IEEE Int. Electron
(IEDM), Dec. 2020, pp. 11.2.1–11.2.4. Devices Meeting (IEDM), Dec. 2018, pp. 13.1.1–13.1.4. [Online].
Available: https://ieeexplore.ieee.org/document/8614551/
[64] D. Min et al., “18nm FDSOI technology platform embedding PCM
[83] J. Lee, R. D. Nikam, D. Kim, and H. Hwang, “Highly scal-
& innovative continuous-active construct enhancing performance for
able (30 nm) and ultra-low-energy (∼5fJ/pulse) vertical sensing
leading-edge MCU applications,” in Proc. IEEE Int. Electron Devices
ECRAM with ideal synaptic characteristics using ion-permeable
Meeting (IEDM), Dec. 2021, pp. 13.1.1–13.1.4.
Graphene electrodes,” in Proc. Int. Electron Devices Meeting (IEDM),
[65] K. Lee et al., “28nm CIS-compatible embedded STT-MRAM for
Dec. 2022, pp. 2.2.1–2.2.4. [Online]. Available: https://ieeexplore.
frame buffer memory,” in Proc. IEEE Int. Electron Devices Meeting
ieee.org/document/10019326/
(IEDM), Dec. 2021, pp. 2.1.1–2.1.4.
[84] S. Kim et al., “Metal-oxide based, CMOS-compatible ECRAM
[66] T. Ito et al., “A 20Mb embedded STT-MRAM array achieving 72% for deep learning accelerator,” in Proc. IEEE Int. Electron Devices
write energy reduction with self-termination write schemes in 16nm Meeting (IEDM), Dec. 2019, pp. 35.7.1–35.7.4. [Online]. Available:
FinFET logic process,” in Proc. IEEE Int. Electron Devices Meeting https://ieeexplore.ieee.org/document/8993463/
(IEDM), Dec. 2021, pp. 2.2.1–2.2.4.
[85] M. Farronato, M. Melegari, S. Ricci, S. Hashemkhani, A. Bricalli, and
[67] C. Peters, F. Adler, K. Hofmann, and J. Otterstedt, “Reliability D. Ielmini, “Memtransistor devices based on MoS2 multilayers with
of 28nm embedded RRAM for consumer and industrial prod- volatile switching due to AG cation migration,” Adv. Electron. Mater.,
ucts,” in Proc. IEEE Int. Memory Workshop (IMW), May 2022, vol. 8, no. 8, Jan. 2022, Art. no. 2101161. [Online]. Available:
pp. 1–3. https://onlinelibrary.wiley.com/doi/10.1002/aelm.202101161
[68] S. Raoux, W. Wełnic, and D. Ielmini, “Phase change materials and [86] H. Lee et al., “Dual-gated MoS2 memtransistor crossbar
their application to nonvolatile memories,” Chem. Rev., vol. 110, array,” Adv. Funct. Mater., vol. 30, no. 45, Nov. 2020,
no. 1, pp. 240–267, Jan. 2010. [Online]. Available: https://pubs.acs. Art. no. 2003683. [Online]. Available: https://onlinelibrary.wiley.
org/doi/10.1021/cr900040x com/doi/10.1002/adfm.202003683
[69] M. Wuttig and N. Yamada, “Phase-change materials for rewriteable [87] R. A. John et al., “Ultralow power dual-gated subthreshold
data storage,” Nat. Mater., vol. 6, no. 11, pp. 824–832, Nov. 2007. oxide neuristors: An enabler for higher order neuronal tempo-
[Online]. Available: https://www.nature.com/articles/nmat2009 ral correlations,” ACS Nano, vol. 12, no. 11, pp. 11263–11273,
[70] P. Zuliani et al., “Overcoming temperature limitations in phase Nov. 2018. [Online]. Available: https://pubs.acs.org/doi/10.1021/
change memories with optimized Gex Sby Tez ,” IEEE Trans. Electron acsnano.8b05903
Devices, vol. 60, no. 12, pp. 4020–4026, Dec. 2013. [88] V. K. Sangwan et al., “Multi-terminal memtransistors from polycrys-
[71] D. Ielmini, A. Lacaita, A. Pirovano, F. Pellizzer, and R. Bez, talline monolayer molybdenum disulfide,” Nature, vol. 554, no. 7693,
“Analysis of phase distribution in phase-change nonvolatile mem- pp. 500–504, Feb. 2018. [Online]. Available: https://www.nature.
ories,” IEEE Electron Device Lett., vol. 25, no. 7, pp. 507–509, com/articles/nature25747
Jul. 2004. [Online]. Available: http://ieeexplore.ieee.org/document/ [89] M. Farronato, P. Mannocci, M. Melegari, S. Ricci,
1308435/ C. M. Compagnoni, and D. Ielmini, “Reservoir computing with
[72] P. Narayanan et al., “Fully on-chip MAC at 14nm enabled by accu- charge-trap memory based on a MoS2 channel for neuromorphic
rate row-wise programming of PCM-based weights and parallel engineering,” Adv. Mater., Oct. 2022, Art. no. 2205381.
vector-transport in duration-format,” in Proc. Symp. VLSI Technol., [90] M. Farronato, M. Melegari, S. Ricci, S. Hashemkani,
Jun. 2021, pp. 1–2. C. M. Compagnoni, and D. Ielmini, “Low-current, highly lin-
[73] T. Mikolajick et al., “FeRAM technology for high density ear synaptic memory device based on MoS2 transistors for online
applications,” Microelectron. Rel., vol. 41, no. 7, pp. 947–950, training and inference,” in Proc. IEEE 4th Int. Conf. Artif. Intell.
Jul. 2001. [Online]. Available: https://linkinghub.elsevier.com/ Circuits Syst. (AICAS), 2022, pp. 1–4.
retrieve/pii/S002627140100049X [91] D. Ielmini and S. Ambrogio, “Emerging neuromorphic devices,”
[74] T. S. Böscke, J. Müller, D. Bräuhaus, U. Schröder, and U. Böttger, Nanotechnology, vol. 31, no. 9, Feb. 2020, Art. no. 092001.
“Ferroelectricity in hafnium oxide thin films,” Appl. Phys. Lett., [Online]. Available: https://iopscience.iop.org/article/10.1088/1361-
vol. 99, no. 10, Sep. 2011, Art. no. 102903. [Online]. Available: 6528/ab554b
http://aip.scitation.org/doi/10.1063/1.3634052 [92] S. Shukla et al., “A scalable multi-TeraOPS core for AI train-
[75] S. Majumdar, “Back’ end CMOS compatible and flexible ferroelec- ing and inference,” IEEE Solid-State Circuits Lett., vol. 1, no. 12,
tric memories for neuromorphic computing and adaptive sensing,” pp. 217–220, Dec. 2018.
Adv. Intell. Syst., vol. 4, no. 4, Apr. 2022, Art. no. 2100175. [93] A. Chen, “A comprehensive crossbar array model with solutions
[Online]. Available: https://onlinelibrary.wiley.com/doi/10.1002/aisy. for line resistance and nonlinear device characteristics,” IEEE Trans.
202100175 Electron Devices, vol. 60, no. 4, pp. 1318–1326, Apr. 2013.
[94] W.-H. Chen et al., “A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2018, pp. 494–496.
[95] S. D. Spetalnick et al., “A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM binary/compute-in-memory macro with 4.23x improvement in density and >75% use of sensing dynamic range,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, Feb. 2022, pp. 1–3.
[96] T.-H. Kim, J. Lee, S. Kim, J. Park, B.-G. Park, and H. Kim, “3-bit multilevel operation with accurate programming scheme in TiOx/Al2O3 memristor crossbar array for quantized neuromorphic system,” Nanotechnology, vol. 32, no. 29, Apr. 2021, Art. no. 295201, doi: 10.1088/1361-6528/abf0cc.
[97] V. Milo et al., “Multilevel HfO2-based RRAM devices for low-power neuromorphic networks,” APL Mater., vol. 7, no. 8, Aug. 2019, Art. no. 081120. [Online]. Available: https://aip.scitation.org/doi/full/10.1063/1.5108650
[98] I. Yeo, M. Chu, S.-G. Gi, H. Hwang, and B.-G. Lee, “Stuck-at-fault tolerant schemes for memristor crossbar array-based neural networks,” IEEE Trans. Electron Devices, vol. 66, no. 7, pp. 2937–2945, Jul. 2019.
[99] D. Ielmini and G. Pedretti, “Device and circuit architectures for in-memory computing,” Adv. Intell. Syst., vol. 2, no. 7, 2020, Art. no. 2000040. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/aisy.202000040
[100] Y.-C. Chen et al., “An access-transistor-free (0T/1R) non-volatile resistance random access memory (RRAM) using a novel threshold switching, self-rectifying chalcogenide device,” in Proc. IEEE Int. Electron Devices Meeting, Dec. 2003, pp. 37.4.1–37.4.4.
[101] D. Ielmini and Y. Zhang, “Physics-based analytical model of chalcogenide-based memories for array simulation,” in Proc. Int. Electron Devices Meeting, Dec. 2006, pp. 1–4.
[102] M. Hu et al., “Memristor-based analog computation and neural network classification with a dot product engine,” Adv. Mater., vol. 30, no. 9, 2018, Art. no. 1705914. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/adma.201705914
[103] H. Cai et al., “Proposal of analog in-memory computing with magnified tunnel magnetoresistance ratio and universal STT-MRAM cell,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 69, no. 4, pp. 1519–1531, Apr. 2022.
[104] J. M. Lopez et al., “1S1R optimization for high-frequency inference on binarized spiking neural networks,” Adv. Electron. Mater., vol. 8, no. 8, 2022, Art. no. 2200323. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/aelm.202200323
[105] J. M. Lopez et al., “1S1R sub-threshold operation in crossbar arrays for low power BNN inference computing,” in Proc. IEEE Int. Memory Workshop (IMW), May 2022, pp. 1–4.
[106] G. W. Burr et al., “Access devices for 3D crosspoint memory,” J. Vacuum Sci. Technol. B, vol. 32, no. 4, Jul. 2014, Art. no. 040802. [Online]. Available: https://avs.scitation.org/doi/full/10.1116/1.4889999
[107] S. A. Chekol, J. Song, J. Park, J. Yoo, S. Lim, and H. Hwang, “Chapter 5—Selector devices for emerging memories,” in Memristive Devices for Brain-Inspired Computing (Woodhead Publishing Series in Electronic and Optical Materials), S. Spiga, A. Sebastian, D. Querlioz, and B. Rajendran, Eds. London, U.K.: Woodhead, Jan. 2020, pp. 135–164. [Online]. Available: https://www.sciencedirect.com/science/article/pii/B9780081027820000058
[108] Y.-C. Luo, A. Lu, J. Hur, S. Li, and S. Yu, “Design of non-volatile capacitive crossbar array for in-memory computing,” in Proc. IEEE Int. Memory Workshop (IMW), May 2021, pp. 1–4.
[109] S. Jung et al., “A crossbar array of magnetoresistive memory devices for in-memory computing,” Nature, vol. 601, no. 7892, pp. 211–216, Jan. 2022. [Online]. Available: https://www.nature.com/articles/s41586-021-04196-6
[110] “Pillars of Creation (NIRCam and MIRI Composite Image).” Accessed: Mar. 7, 2023. [Online]. Available: https://webbtelescope.org/contents/media/images
[111] S. N. Truong, S. Shin, S.-D. Byeon, J. Song, and K.-S. Min, “New twin crossbar architecture of binary memristors for low-power image recognition with discrete cosine transform,” IEEE Trans. Nanotechnol., vol. 14, no. 6, pp. 1104–1111, Nov. 2015.
[112] Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang, and D. Ielmini, “Solving matrix equations in one step with cross-point resistive arrays,” Proc. Nat. Acad. Sci. USA, vol. 116, no. 10, pp. 4123–4128, 2019.
[113] P. Mannocci, G. Pedretti, E. Giannone, E. Melacarne, Z. Sun, and D. Ielmini, “A universal, analog, in-memory computing primitive for linear algebra using memristors,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 12, pp. 4889–4899, Dec. 2021.
[114] Z. Sun, E. Ambrosi, G. Pedretti, A. Bricalli, and D. Ielmini, “In-memory PageRank accelerator with a cross-point array of resistive memories,” IEEE Trans. Electron Devices, vol. 67, no. 4, pp. 1466–1470, Apr. 2020.
[115] Z. Sun, G. Pedretti, A. Bricalli, and D. Ielmini, “One-step regression and classification with cross-point resistive memory arrays,” Sci. Adv., vol. 6, no. 5, 2020, Art. no. eaay2378.
[116] P. Mannocci, E. Melacarne, and D. Ielmini, “An analogue in-memory ridge regression circuit with application to massive MIMO acceleration,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 12, no. 4, pp. 952–962, Dec. 2022.
[117] M. Mahmoodi et al., “An analog neuro-optimizer with adaptable annealing based on 64 × 64 0T1R crossbar circuit,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2019, pp. 14–7.
[118] M. N. Bojnordi and E. Ipek, “Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning,” in Proc. IEEE Int. Symp. High Perform. Comput. Archit. (HPCA), 2016, pp. 1–13.
[119] Y. Kiat, Y. Vortman, and N. Sapir, “Feather moult and bird appearance are correlated with global warming over the last 200 years,” Nat. Commun., vol. 10, no. 1, p. 2540, 2019.
[120] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Nat. Acad. Sci. USA, vol. 79, no. 8, pp. 2554–2558, 1982.
[121] G. Pedretti et al., “A spiking recurrent neural network with phase-change memory neurons and synapses for the accelerated solution of constraint satisfaction problems,” IEEE J. Explor. Solid-State Comput. Devices Circuits, vol. 6, no. 1, pp. 89–97, Jun. 2020.
[122] F. Cai et al., “Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks,” Nat. Electron., vol. 3, no. 7, pp. 409–418, 2020.
[123] T. Dalgaty, E. Esmanhotto, N. Castellani, D. Querlioz, and E. Vianello, “Ex situ transfer of Bayesian neural networks to resistive memory-based inference hardware,” Adv. Intell. Syst., vol. 3, no. 8, 2021, Art. no. 2000103.
[124] S. Agarwal et al., “Resistive memory device requirements for a neural algorithm accelerator,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2016, pp. 929–938.
[125] X. Xu et al., “40× retention improvement by eliminating resistance relaxation with high temperature forming in 28 nm RRAM chip,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2018, pp. 20.1.1–20.1.4.
[126] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1,” Mar. 2016. [Online]. Available: http://arxiv.org/abs/1602.02830
[127] T. Simons and D.-J. Lee, “A review of binarized neural networks,” Electronics, vol. 8, no. 6, p. 661, Jun. 2019. [Online]. Available: https://www.mdpi.com/2079-9292/8/6/661
[128] M. Bocquet et al., “In-memory and error-immune differential RRAM implementation of binarized deep neural networks,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2018, pp. 20.6.1–20.6.4.
[129] H. Kim, Y. Kim, and J.-J. Kim, “In-memory batch-normalization for resistive memory based binary neural network hardware,” in Proc. 24th Asia South Pac. Design Autom. Conf. (ASPDAC), 2019, pp. 645–650. [Online]. Available: https://doi.org/10.1145/3287624.3287718
[130] E. Giacomin, T. Greenberg-Toledo, S. Kvatinsky, and P.-E. Gaillardon, “A robust digital RRAM-based convolutional block for low-power image processing and learning applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 2, pp. 643–654, Feb. 2019.
[131] S. Angizi, Z. He, A. Awad, and D. Fan, “MRIMA: An MRAM-based in-memory accelerator,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 39, no. 5, pp. 1123–1136, May 2020.
[132] Y. Long et al., “A ferroelectric FET-based processing-in-memory architecture for DNN acceleration,” IEEE J. Explor. Solid-State Comput. Devices Circuits, vol. 5, no. 2, pp. 113–122, Dec. 2019.
[133] H. Fujiwara et al., “A 5-nm 254-TOPS/W 221-TOPS/mm2 fully-digital computing-in-memory macro supporting wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, Feb. 2022, pp. 1–3.
[134] F. Tu et al., “A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 reconfigurable digital CIM processor with unified FP/INT pipeline and bitwise in-memory booth multiplication for cloud deep learning acceleration,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, Feb. 2022, pp. 1–3.
[135] H. Kim, Q. Chen, T. Yoo, T. T.-H. Kim, and B. Kim, “A 1-16b precision reconfigurable digital in-memory computing macro featuring column-MAC architecture and bit-serial computation,” in Proc. IEEE 45th Eur. Solid-State Circuits Conf. (ESSCIRC), Sep. 2019, pp. 345–348.
[136] C.-F. Lee et al., “A 12nm 121-TOPS/W 41.6-TOPS/mm2 all digital full precision SRAM-based compute-in-memory with configurable bit-width for AI edge applications,” in Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits), Jun. 2022, pp. 24–25.
[137] H. Oh, H. Kim, N. Kang, Y. Kim, J. Park, and J.-J. Kim, “Single RRAM cell-based in-memory accelerator architecture for binary neural networks,” in Proc. IEEE 3rd Int. Conf. Artif. Intell. Circuits Syst. (AICAS), Jun. 2021, pp. 1–4.
[138] X. Sun, S. Yin, X. Peng, R. Liu, J.-S. Seo, and S. Yu, “XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks,” in Proc. Design Autom. Test Europe Conf. Exhibit. (DATE), Mar. 2018, pp. 1423–1428.
[139] S. Yin, X. Sun, S. Yu, and J.-S. Seo, “High-throughput in-memory computing for binary deep neural networks with monolithically integrated RRAM and 90-nm CMOS,” IEEE Trans. Electron Devices, vol. 67, no. 10, pp. 4185–4192, Oct. 2020.
[140] Y.-F. Qin, R. Kuang, X.-D. Huang, Y. Li, J. Chen, and X.-S. Miao, “Design of high robustness BNN inference accelerator based on binary memristors,” IEEE Trans. Electron Devices, vol. 67, no. 8, pp. 3435–3441, Aug. 2020.
[141] A. P. Chowdhury, P. Kulkarni, and M. N. Bojnordi, “MB-CNN: Memristive binary convolutional neural networks for embedded mobile devices,” J. Low Power Electron. Appl., vol. 8, no. 4, p. 38, Dec. 2018. [Online]. Available: https://www.mdpi.com/2079-9268/8/4/38
[142] D. Saito et al., “Analog in-memory computing in FeFET-based 1T1R array for edge AI applications,” in Proc. Symp. VLSI Circuits, Jun. 2021, pp. 1–2.
[143] C. Matsui, K. Toprasertpong, S. Takagi, and K. Takeuchi, “Energy-efficient reliable HZO FeFET computation-in-memory with local multiply & global accumulate array for source-follower & charge-sharing voltage sensing,” in Proc. Symp. VLSI Circuits, Jun. 2021, pp. 1–2.
[144] J.-W. Su et al., “16.3 a 28nm 384kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 64, Feb. 2021, pp. 250–252.
[145] H. Jia, H. Valavi, Y. Tang, J. Zhang, and N. Verma, “A programmable heterogeneous microprocessor based on bit-scalable in-memory computing,” IEEE J. Solid-State Circuits, vol. 55, no. 9, pp. 2609–2621, Sep. 2020.
[146] Q. Dong et al., “15.3 a 351TOPS/W and 372.4GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2020, pp. 242–244.
[147] S. Yin, Z. Jiang, J.-S. Seo, and M. Seok, “XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks,” IEEE J. Solid-State Circuits, vol. 55, no. 6, pp. 1733–1743, Jun. 2020. [Online]. Available: https://ieeexplore.ieee.org/document/8959407/
[148] P.-F. Chiu, W. H. Choi, W. Ma, M. Qin, and M. Lueker-Boden, “A binarized neural network accelerator with differential crosspoint memristor array for energy-efficient MAC operations,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2019, pp. 1–5.
[149] Z. Jiang, S. Yin, J.-S. Seo, and M. Seok, “C3SRAM: In-memory-computing SRAM macro based on capacitive-coupling computing,” IEEE Solid-State Circuits Lett., vol. 2, no. 9, pp. 131–134, Sep. 2019.
[150] H. Wang et al., “A 32.2 TOPS/W SRAM compute-in-memory macro employing a linear 8-bit C-2C ladder for charge domain computation in 22nm for edge inference,” in Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits), Jun. 2022, pp. 36–37.
[151] H. Jia et al., “15.1 a programmable neural-network inference accelerator based on scalable in-memory computing,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 64, Feb. 2021, pp. 236–238.
[152] H. Valavi, P. J. Ramadge, E. Nestler, and N. Verma, “A mixed-signal binarized convolutional-neural-network accelerator integrating dense weight storage and multiplication for reduced data movement,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 141–142.
[153] D. Bankman, L. Yang, B. Moons, M. Verhelst, and B. Murmann, “An always-on 3.8 µJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28-nm CMOS,” IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 158–172, Jan. 2019.
[154] M. E. Sinangil et al., “A 7-nm compute-in-memory SRAM macro supporting multi-bit input, weight and output and achieving 351 TOPS/W and 372.4 GOPS,” IEEE J. Solid-State Circuits, vol. 56, no. 1, pp. 188–198, Jan. 2021.
[155] W. Wan et al., “A compute-in-memory chip based on resistive random-access memory,” Nature, vol. 608, no. 7923, pp. 504–512, Aug. 2022. [Online]. Available: https://www.nature.com/articles/s41586-022-04992-8
[156] H. Jiang, W. Li, S. Huang, and S. Yu, “A 40nm analog-input ADC-free compute-in-memory RRAM macro with pulse-width modulation between sub-arrays,” in Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits), Jun. 2022, pp. 266–267.
[157] C.-X. Xue et al., “15.4 a 22nm 2Mb ReRAM compute-in-memory macro with 121-28TOPS/W for multibit MAC computing for tiny AI edge devices,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2020, pp. 244–246.
[158] Q. Liu et al., “33.2 a fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2020, pp. 500–502.
[159] R. Khaddam-Aljameh et al., “HERMES-core—A 1.59-TOPS/mm2 PCM on 14-nm CMOS in-memory compute core using 300-ps/LSB linearized CCO-based ADCs,” IEEE J. Solid-State Circuits, vol. 57, no. 4, pp. 1027–1038, Apr. 2022.
[160] V. Joshi et al., “Accurate deep neural network inference using computational phase-change memory,” Nat. Commun., vol. 11, no. 1, p. 2473, May 2020. [Online]. Available: https://www.nature.com/articles/s41467-020-16108-9
[161] W.-S. Khwa et al., “A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5–65.0TOPS/W for tiny-AI edge devices,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, Feb. 2022, pp. 1–3.
[162] P. Deaville, B. Zhang, and N. Verma, “A 22nm 128-kb MRAM row/column-parallel in-memory computing macro with memory-resistance boosting and multi-column ADC readout,” in Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits), Jun. 2022, pp. 268–269.
[163] T. Soliman et al., “Ultra-low power flexible precision FeFET based analog in-memory computing,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2020, pp. 29.2.1–29.2.4.
[164] C.-X. Xue et al., “16.1 a 22nm 4Mb 8b-precision ReRAM computing-in-memory macro with 11.91 to 195.7TOPS/W for tiny AI edge devices,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 64, Feb. 2021, pp. 245–247.
[165] J.-M. Hung et al., “An 8-Mb DC-current-free binary-to-8b precision ReRAM nonvolatile computing-in-memory macro using time-space-readout with 1286.4-21.6TOPS/W for edge-AI devices,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), vol. 65, Feb. 2022, pp. 1–3.
[166] A. Glukhov et al., “Statistical model of program/verify algorithms in resistive-switching memories for in-memory neural network accelerators,” in Proc. IEEE Int. Rel. Phys. Symp. (IRPS), Mar. 2022, pp. 1–7.
[167] S. Ambrogio et al., “Reducing the impact of phase-change memory conductance drift on the inference of large-scale hardware neural networks,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2019, pp. 6.1.1–6.1.4.
[168] S. Ambrogio, S. Balatti, V. McCaffrey, D. C. Wang, and D. Ielmini, “Noise-induced resistance broadening in resistive switching memory—Part I: Intrinsic cell behavior,” IEEE Trans. Electron Devices, vol. 62, no. 11, pp. 3805–3811, Nov. 2015.
[169] N. Lepri, M. Baldo, P. Mannocci, A. Glukhov, V. Milo, and D. Ielmini, “Modeling and compensation of IR drop in crosspoint accelerators of neural networks,” IEEE Trans. Electron Devices, vol. 69, no. 3, pp. 1575–1581, Mar. 2022.
[170] N. Lepri, A. Glukhov, and D. Ielmini, “Mitigating read-program variation and IR drop by circuit architecture in RRAM-based neural network accelerators,” in Proc. IEEE Int. Rel. Phys. Symp. (IRPS), Mar. 2022, pp. 1–6.
[171] F. L. Aguirre, N. M. Gomez, S. M. Pazos, F. Palumbo, J. Suñé, and E. Miranda, “Minimization of the line resistance impact on memdiode-based simulations of multilayer perceptron arrays applied to pattern recognition,” J. Low Power Electron. Appl., vol. 11, no. 1, p. 9, Mar. 2021. [Online]. Available: https://www.mdpi.com/2079-9268/11/1/9
[172] C. Mackin et al., “Optimised weight programming for analogue memory-based deep neural networks,” Nat. Commun., vol. 13, no. 1, p. 3765, Jun. 2022. [Online]. Available: https://www.nature.com/articles/s41467-022-31405-1
[173] F. Zhang and M. Hu, “Mitigate parasitic resistance in resistive crossbar-based convolutional neural networks,” ACM J. Emerg. Technol. Comput. Syst., vol. 16, no. 3, pp. 1–25, 2020. [Online]. Available: https://doi.org/10.1145/3371277
[174] D. Joksas et al., “Nonideality-aware training for accurate and robust low-power memristive neural networks,” Adv. Sci., vol. 9, no. 17, 2022, Art. no. 2105784. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/advs.202105784
[175] V. Milo et al., “Accurate program/verify schemes of resistive switching memory (RRAM) for in-memory neural network circuits,” IEEE Trans. Electron Devices, vol. 68, no. 8, pp. 3832–3837, Aug. 2021.
[176] A. Athmanathan, M. Stanisavljevic, N. Papandreou, H. Pozidis, and E. Eleftheriou, “Multilevel-cell phase-change memory: A viable technology,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 6, no. 1, pp. 87–100, Mar. 2016.
[177] S. Ambrogio, S. Balatti, A. Cubeta, A. Calderoni, N. Ramaswamy, and D. Ielmini, “Statistical fluctuations in HfOx resistive-switching memory: Part I—Set/reset variability,” IEEE Trans. Electron Devices, vol. 61, no. 8, pp. 2912–2919, Aug. 2014.
[178] E. Pérez et al., “Analysis of the statistics of device-to-device and cycle-to-cycle variability in TiN/Ti/Al:HfO2/TiN RRAMs,” Microelectron. Eng., vol. 214, pp. 104–109, Jun. 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167931719301303
[179] D. Ielmini, S. Lavizzari, D. Sharma, and A. L. Lacaita, “Physical interpretation, modeling and impact on phase change memory (PCM) reliability of resistance drift due to chalcogenide structural relaxation,” in Proc. IEEE Int. Electron Devices Meeting, 2007, pp. 939–942.
[180] N. Ciocchini, E. Palumbo, M. Borghi, P. Zuliani, R. Annunziata, and D. Ielmini, “Modeling resistance instabilities of set and reset states in phase change memory with Ge-rich GeSbTe,” IEEE Trans. Electron Devices, vol. 61, no. 6, pp. 2136–2144, Jun. 2014.
[181] Y.-H. Lin et al., “Performance impacts of analog ReRAM non-ideality on neuromorphic computing,” IEEE Trans. Electron Devices, vol. 66, no. 3, pp. 1289–1295, Mar. 2019.
[182] I. Muñoz-Martín, S. Bianchi, O. Melnic, A. G. Bonfanti, and D. Ielmini, “A drift-resilient hardware implementation of neural accelerators based on phase change memory devices,” IEEE Trans. Electron Devices, vol. 68, no. 12, pp. 6076–6081, Dec. 2021.
[183] M. Bertuletti, I. Muñoz-Martín, S. Bianchi, A. G. Bonfanti, and D. Ielmini, “A multilayer neural accelerator with binary activations based on phase-change memory,” IEEE Trans. Electron Devices, vol. 70, no. 3, pp. 986–992, Mar. 2023.
[184] C.-C. Chang et al., “NV-BNN: An accurate deep convolutional neural network based on binary STT-MRAM for adaptive AI edge,” in Proc. 56th Annu. Design Autom. Conf. (DAC), 2019, pp. 1–6. [Online]. Available: https://doi.org/10.1145/3316781.3317872
[185] M. Le Gallo, A. Sebastian, G. Cherubini, H. Giefers, and E. Eleftheriou, “Compressed sensing with approximate message passing using in-memory computing,” IEEE Trans. Electron Devices, vol. 65, no. 10, pp. 4304–4312, Oct. 2018.
[186] P.-Y. Chen, X. Peng, and S. Yu, “NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2017, pp. 6.1.1–6.1.4. [Online]. Available: http://ieeexplore.ieee.org/document/8268337/
[187] S. Achour, R. Sarpeshkar, and M. C. Rinard, “Configuration synthesis for programmable analog devices with Arco,” ACM SIGPLAN Notices, vol. 51, no. 6, pp. 177–193, Aug. 2016. [Online]. Available: https://dl.acm.org/doi/10.1145/2980983.2908116
[188] S. Achour and M. Rinard, “Noise-aware dynamical system compilation for analog devices with Legno,” in Proc. 25th Int. Conf. Archit. Support Program. Lang. Oper. Syst., Mar. 2020, pp. 149–166. [Online]. Available: https://dl.acm.org/doi/10.1145/3373376.3378449
[189] S. Misailovic, M. Carbin, S. Achour, Z. Qi, and M. C. Rinard, “Chisel: Reliability- and accuracy-aware optimization of approximate computational kernels,” ACM SIGPLAN Notices, vol. 49, no. 10, pp. 309–328, Dec. 2014. [Online]. Available: https://dl.acm.org/doi/10.1145/2714064.2660231