
Reliable Computing of ReRAM Based Compute-in-Memory Circuits for AI Edge Devices

Invited paper

Meng-Fan Chang, National Tsing Hua University, Hsinchu, Taiwan
Je-Ming Hung, National Tsing Hua University, Hsinchu, Taiwan
Ping-Cheng Chen, I-Shou University, Kaohsiung, Taiwan
Tai-Hao Wen, National Tsing Hua University, Hsinchu, Taiwan

Abstract—Compute-in-memory macros based on non-volatile memory (nvCIM) are a promising approach to breaking through the memory bottleneck for artificial intelligence (AI) edge devices; however, the development of these devices involves unavoidable tradeoffs between reliability, energy efficiency, computing latency, and readout accuracy. This paper outlines the background of ReRAM-based nvCIM as well as the major challenges in its further development, including process variation in ReRAM devices and transistors and the small signal margins associated with variation in input-weight patterns. This paper also investigates the error model of an nvCIM macro and the corresponding degradation of inference accuracy as a function of that error model. Finally, we summarize recent trends and advances in the development of reliable ReRAM-based nvCIM macros.

Index Terms—Artificial intelligence, CNN edge processors, ReRAM, Computing-in-memory, Multiply-and-accumulate.

I. INTRODUCTION

Artificial intelligence (AI) edge devices and Internet-of-Things (AIoT) applications that involve convolutional neural networks (CNNs) perform a massive number of multiply-and-accumulate (MAC) operations, which generate an enormous amount of intermediate data. When using the conventional von Neumann computing architecture, this data must be transferred frequently between the processing element (PE) and memory. When applied to complex neural network models with high bit-precision and complex datasets, the transfer of intermediate data can result in long computing latency and high energy consumption. Non-volatile compute-in-memory (nvCIM) [1]-[34] is a promising approach to overcoming the so-called von Neumann bottleneck by combining computing functions with nonvolatile storage functions within a single macro. Implementing parallel analog MAC operations within memory cell arrays can greatly reduce the latency and energy consumption associated with computation as well as system wake-up. Nonvolatile memory (NVM) can also be used for power-off data storage to reduce power consumption in standby mode.

One of the most promising advances in NVM is one-transistor one-resistor (1T1R)-based resistive random-access memory (ReRAM), due to its high storage density, high resistance on-off ratio, and ease of manufacture using existing technology nodes.

Researchers have demonstrated the efficacy of ReRAM-based CIM macros in performing MAC operations from binary to 8-bit precision. Note that a reliable nvCIM macro should also support high bit-precision output with short computing latency, low energy consumption, and sufficient readout accuracy against process variation. These factors are particularly important when dealing with complex datasets (CIFAR-10, CIFAR-100, and ImageNet) and advanced neural network models.

The remainder of the paper is organized as follows. Section II introduces the background of nvCIM based on near-memory and in-memory computing architectures for MAC operations. Section III outlines the challenges inherent in the design of normal nvCIM macros. Section IV introduces the challenges in developing a reliable nvCIM macro. Section V summarizes recent trends in silicon-verified nvCIM macros. Section VI concludes this work.

II. NVCIM FOR MAC OPERATIONS: BACKGROUND

MAC operations involve multiplying inputs with weights (stored in the NVM cell array) and then accumulating the results. Most nvCIM macros possess mega-bit (Mb) memory capacity, which is generally sufficient for the storage of all weight data required by the neural network models implemented in tiny AI edge devices. As shown in Fig. 1, MAC operations can be performed using near-memory computing (NMC) or in-memory computing (IMC).

Fig. 1. Concepts underlying near-memory and in-memory NVM computation.
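As a point of reference for the sections that follow, the minimal NumPy sketch below shows the vector-matrix MAC that a CNN layer maps onto an nvCIM macro; the array sizes and bit-widths are hypothetical and chosen only for illustration, not taken from any specific macro in this paper.

```python
import numpy as np

# Hypothetical sizes: 64 inputs per accumulation and 32 output channels.
N_IN, N_OUT = 64, 32
rng = np.random.default_rng(0)

inputs = rng.integers(0, 256, size=N_IN)                # assumed 8b input activations
weights = rng.integers(-128, 128, size=(N_IN, N_OUT))   # assumed 8b signed weights, as stored in the array

# One MAC operation per output channel: multiply the input vector by the
# stored weight column, then accumulate the products.
mac_values = inputs @ weights                           # shape (N_OUT,)
print(mac_values[:4])
```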
A. MAC Operations using Near-Memory Computing (NMC)

As with typical memory devices, NMC arrays [32]-[34] perform data storage, while an adjacent computing circuit block performs MAC operations (see Fig. 2(a)). Weight data read out by sense amplifiers (SAs) from the memory array produces a digital output, whereas data read out by an analog readout circuit produces an analog output; either output can be used as an input for the computing block. This means that the computing circuit block can be implemented using digital or analog circuits.

When performing multi-bit MAC operations, an NMC device is able to extract weight data (W) from the memory cell array, multiply it with the digital input (IN), and perform accumulation and/or place-value computation within one memory cycle. In contrast, the von Neumann architecture requires one cycle for memory access/movement and a second cycle for MAC computing/accumulation.

Fig. 2. Architectures used for nonvolatile computing-in-memory (nvCIM): (a) NMC; and (b) IMC.
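A rough behavioral sketch of the NMC flow described in Section II.A is given below. The bit-sliced weight storage and the power-of-two place values are our assumptions for illustration; the paper does not specify how a particular NMC design partitions the weight bits.

```python
import numpy as np

def nmc_mac(inputs, weight_bits):
    """Digital near-memory MAC: weight_bits[b, i] holds bit b of weight i, as read
    out by the sense amplifiers; place values are applied by the digital block."""
    acc = 0
    for b in range(weight_bits.shape[0]):              # loop over weight bit positions
        partial = int(np.dot(inputs, weight_bits[b]))  # 1b-weight partial MAC
        acc += partial << b                            # apply place value 2^b and accumulate
    return acc

rng = np.random.default_rng(1)
weights = rng.integers(0, 256, size=64)                     # assumed unsigned 8b weights
weight_bits = np.array([(weights >> b) & 1 for b in range(8)])  # bit-sliced storage (assumption)
inputs = rng.integers(0, 16, size=64)                       # assumed 4b digital inputs
assert nmc_mac(inputs, weight_bits) == int(np.dot(inputs, weights))
```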

B. MAC Operations using In-Memory Computing (IMC)

The memory cell array in an IMC structure performs data storage as well as analog computation [1]-[31]. When performing MAC operations, each NVM memory cell is tasked with multiplying an input with a stored weight value, after which the bitline (BL) accumulates the results from the memory cells in the same column. The resulting analog voltage or current is then converted by a readout circuit (e.g., a voltage-mode analog-to-digital converter or a current-mode readout circuit) into a digital MAC value for output.

Fig. 2(b) illustrates the structure of an IMC-based nvCIM equipped with single-level-cell (SLC) ReRAM devices. This example has binary inputs applied to the wordlines (WLs), wherein high and low wordline voltages are respectively used to represent input values of 1 and 0. The current of the accessed memory cell (IMC) is derived by multiplying the input value with the weight data stored in the ReRAM cell (RMC). Thus, a multiplication result of "1" corresponds to a current in the low resistance state (ILRS), whereas a multiplication result of "0" corresponds to a current in the high resistance state (IHRS). The sum of the current values in a given column constitutes the bitline current (IBL) accessed during that MAC operation. Finally, a readout circuit converts IBL into a digital output. In-memory computing can also be applied to multi-level cell (MLC) memory devices, in which the 2^N-level resistance state of an MLC device can be used as N-bit weight data; however, the design of the corresponding circuitry poses numerous challenges. By using the in-memory-array computing technique, the energy efficiency of IMC could potentially exceed that of NMC.

In the following, we examine the challenges involved in designing IMC-based nvCIM and recent advances in this area.
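A minimal behavioral model of the SLC IMC column described above is sketched below. The nominal cell currents and the idealized readout mapping are assumptions for illustration only; they are not measured values from any macro discussed in this paper.

```python
import numpy as np

I_LRS, I_HRS = 10.0, 1.0       # assumed nominal cell currents (uA), illustration only

def bitline_current(inputs, weights):
    """inputs: binary WL activations; weights: 1 = LRS, 0 = HRS (SLC ReRAM)."""
    accessed = inputs.astype(bool)                     # only cells on activated WLs draw current
    cell_currents = np.where(weights[accessed] == 1, I_LRS, I_HRS)
    return cell_currents.sum()                         # accumulation on the BL

def readout(i_bl, n_wl_active):
    """Idealized readout: remove the HRS background and quantize IBL to a MAC value."""
    return int(round((i_bl - n_wl_active * I_HRS) / (I_LRS - I_HRS)))

rng = np.random.default_rng(2)
inp = rng.integers(0, 2, size=9)     # binary inputs on 9 wordlines
w = rng.integers(0, 2, size=9)       # binary weights stored as LRS/HRS
i_bl = bitline_current(inp, w)
print(readout(i_bl, inp.sum()), int(np.dot(inp, w)))   # ideally equal
```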
III. DEVELOPING NVCIM MACROS: CHALLENGES

A. Signal Margin due to read disturb voltage, cell-resistance variation, and resistance drift

Most NVM devices require high-precision low-voltage biasing to prevent read disturbance, as any read voltage or current stress across an NVM cell can lead to data corruption. Thus, the clamping voltage applied to the selected BL within a 1T1R cell array should not exceed the read disturb voltage. Note, however, that the use of a low bias voltage results in a small sensing margin and limited signal swing on the BL, which can degrade the accuracy of the readout circuit when performing MAC operations.

Note also that emerging NVM devices vary considerably in terms of cell resistance, due to process variation during mass production (see Fig. 3). The resistance variation in NVM cells constrains the signal margin between neighboring MAC values in the analog domain and limits the number of operations that can be performed within an NVM cell array [1], [36]. Resistance drift can also degrade readout accuracy over time, particularly in multi-level-cell (MLC) devices. When performing large MAC operations, devices in a moderate resistance state are more susceptible to resistance drift than are devices in the high resistance state (HRS) or low resistance state (LRS).

Fig. 3. Resistance distribution and summary table of foundry-provided ReRAM cells.

B. Influence of pattern-dependent variation on signal margins

Most existing NVM technologies are limited in terms of cell-resistance ratio; i.e., there is a narrow difference in memory cell current between LRS (ILRS) and HRS (IHRS). Therefore, unlike in memory devices with a large cell-resistance ratio, IHRS is non-negligible during MAC operations in ReRAM-based nvCIM macros.

The effect of variation in IHRS on the bitline current (IBL) depends on the input x weight configuration and the number (NWL) of activated wordlines. As shown in Fig. 4, this issue (referred to as pattern-dependent variation) can lead to a decrease in signal margin. As an example, let us consider an nvCIM macro with 9 accumulations per MAC operation. The IBL for MACV = 6 could be produced with six WLs switched on, involving six LRS cells and zero HRS cells. Likewise, the IBL for MACV = 6 could be produced with nine WLs switched on, involving six LRS cells and three HRS cells. This means that when the MACV is small, variations in bitline current are dominated by pattern-dependent variation, whereas when the MACV is large, variations in bitline current are dominated by cell-resistance variation. If the MACV were at a moderate level, then the bitline current would be susceptible to the influence of both cell-resistance and pattern-dependent variation.

Fig. 4. MAC current distribution resulting from pattern-dependent variation.
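The MACV = 6 example above can be made concrete with a small Monte Carlo sketch. All distribution parameters below (nominal currents and the relative variation) are assumptions for illustration; Fig. 4 shows the measured behavior. The point of the sketch is that the same nominal MAC value yields different IBL distributions depending on how many HRS cells sit on activated wordlines, which squeezes the margin to neighboring MAC levels.

```python
import numpy as np

rng = np.random.default_rng(3)
I_LRS, I_HRS = 10.0, 1.0      # assumed nominal cell currents (uA)
SIGMA = 0.08                  # assumed relative cell-current variation

def i_bl_samples(n_lrs, n_hrs, n=10000):
    """Bitline current for a column with n_lrs LRS and n_hrs HRS cells on active WLs."""
    lrs = rng.normal(I_LRS, SIGMA * I_LRS, size=(n, n_lrs)).sum(axis=1) if n_lrs else 0.0
    hrs = rng.normal(I_HRS, SIGMA * I_HRS, size=(n, n_hrs)).sum(axis=1) if n_hrs else 0.0
    return lrs + hrs

# MACV = 6 reached two ways: 6 active WLs (6 LRS) vs. 9 active WLs (6 LRS + 3 HRS).
case_a = i_bl_samples(6, 0)
case_b = i_bl_samples(6, 3)
print(f"case A: {case_a.mean():.1f} +/- {case_a.std():.2f} uA")
print(f"case B: {case_b.mean():.1f} +/- {case_b.std():.2f} uA")   # shifted and wider
```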

IV. DEVELOPING RELIABLE NVCIM MACROS: CHALLENGES

A. Computing Error versus System-level Inference Accuracy

Computing error can have a profound impact on the overall inference accuracy of neural network models. In [2], the authors proposed an 8Mb nvCIM macro that enables MAC operations with 8b inputs, 8b weights, 8-channel accumulation, and 19-bit output. Note that the 19-bit output was achieved by combining partial MAC values (pMACVs) from eight output channels over 8 cycles. Figs. 5(a) and (b) show the readout accuracy and readout distribution (error model) of an output channel in their system. The readout distribution of a readout path was measured under real-world variations, including process variation in transistors and ReRAM cell resistance. The measured readout yield ranged from 96.2% to 100% for lower partial MAC values (0 to 4) and from 92.7% to 95.3% for higher partial MAC values (5 to 7). Note that lower partial MAC values occur frequently in neural network models, whereas higher partial MAC values occur less frequently.

Fig. 6 shows the degradation of system-level inference accuracy when that system was applied to various neural network models and datasets. When applied to classifying the CIFAR-10 dataset using the ResNet-50, VGG-16, and ResNet-20 models, the degradation in inference accuracy was as follows: -0.79%, -0.63%, and -0.46% compared to the respective software baselines. When applied to classifying the CIFAR-100 dataset using the ResNet-50, VGG-16, and ResNet-20 models, the degradation in inference accuracy was as follows: -1.11%, -1.1%, and -0.92% compared to the respective software baselines. Note that the more complicated dataset results in greater degradation in inference accuracy. A precise programming algorithm [38], [40], [41] is also required when seeking to develop a reliable CIM macro based on non-volatile memory. Inference accuracy can be affected by mis-program errors in the place value of weight data; however, the severity of those effects depends on where the error occurs. For example, errors pertaining to the LSB readout channel are not as severe as those pertaining to the MSB readout channel.

Fig. 5. (a) Readout accuracy and (b) readout distribution of an output channel of [2].

Fig. 6. Inference accuracy degradation when applying [2] to various neural network models and datasets.
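The way such an error model enters inference can be sketched as follows. This is a simplified Monte Carlo harness with assumed per-level error probabilities and an assumed +/-1-level error magnitude; it is not the evaluation flow of [2]. Each partial MAC value is perturbed according to its readout yield before place-value recombination, and the perturbed MAC values are what the next network layer sees.

```python
import numpy as np

rng = np.random.default_rng(4)
# Assumed per-level readout yield (probability the channel returns the correct pMACV).
YIELD = {0: 1.00, 1: 0.99, 2: 0.98, 3: 0.97, 4: 0.962, 5: 0.953, 6: 0.94, 7: 0.927}

def noisy_readout(pmacv):
    """Return the partial MAC value, flipped to a neighboring level on a readout error."""
    if rng.random() < YIELD.get(int(pmacv), 0.95):
        return int(pmacv)
    return int(pmacv) + int(rng.choice([-1, 1]))   # assumed +/-1-level error magnitude

def combine(pmacvs, place_values):
    """Recombine partial MAC values from the output channels into the final MAC value."""
    return sum(noisy_readout(p) * pv for p, pv in zip(pmacvs, place_values))

pmacvs = [3, 1, 0, 5, 2, 0, 1, 4]             # example pMACVs from eight output channels
place_values = [2 ** k for k in range(8)]     # illustrative place values
print(combine(pmacvs, place_values))
```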
B. Readout Quantization versus System-level Inference Accuracy

Inference accuracy can also be affected by the CIM architecture and its readout quantization method. Fig. 7 shows the readout accuracy of a readout channel in a CIM macro example that performs MAC operations with 8b inputs, 8b weights, 16-channel accumulation, and 20-bit output. Each readout channel performs 2b-input, 1b-weight, 16-channel accumulation. The 20-bit output was achieved by combining partial MAC values (pMACVs) from eight output channels over 4 cycles. Note that each output channel exploits clipped quantization, which merges the analog pMAC values exceeding a selected threshold into a single digital value, while those analog pMAC values below the threshold are read out without quantization loss. The number of output levels on a readout channel is 48, and the threshold is set at 31 because pMAC values lower than 31 occur frequently in neural network models, as shown in Fig. 8. In the case of low pMACVs (<15), the analog readout circuit achieved readout accuracy of 98% to 100%. In cases of moderate to high pMACVs, readout accuracy ranged from 97% to 98%. When this CIM macro was applied to a ResNet-20 neural network model trained on the CIFAR-100 dataset, the inference accuracy was 0.78% lower than the software baseline. Thus, the quantization behavior of a CIM macro and its readout accuracy degradation can both impact the overall inference accuracy.

Fig. 7. Readout accuracy of partial MAC values on a readout channel.

Fig. 8. Distribution of partial MAC values on a readout channel when applied to the CIFAR-100 dataset using the ResNet-20 model.
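A small sketch of the clipped-quantization behavior described above is given below. The exact code assigned to clipped values is our assumption; the only behavior taken from the text is that pMAC values at or below the threshold pass through without quantization loss, while all larger values collapse into a single digital value.

```python
def clipped_quantize(pmacv, threshold=31):
    """Clipped quantization: values <= threshold are kept exactly,
    values above the threshold merge into a single clipped code."""
    return pmacv if pmacv <= threshold else threshold + 1

# pMACVs below the threshold (the common case per Fig. 8) are lossless;
# only the infrequent large values are merged.
print([clipped_quantize(v) for v in (0, 12, 31, 32, 45)])   # -> [0, 12, 31, 32, 32]
```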

V. NVCIM FOR MAC OPERATIONS: RECENT TRENDS

Fig. 9 illustrates the relationship between input precision x weight precision and the output ratio of recent nvCIM works based on pure analog (current/voltage) and hybrid (current/voltage + digital) readout schemes. Note that the output ratio is obtained by dividing the actual readout precision by the full output precision (the ideal MACV precision).

Most analog-CIM-based nvCIM devices with high full output precision (red symbols) have been unable to achieve a high output ratio, due to the difficulty of designing high-resolution readout circuits with high output precision. Devices with a limited output ratio are unable to deal with complex datasets, such as ImageNet or CIFAR-100. Their applicability to complex datasets is also hindered by inference accuracy degradation due to limited input precision x weight precision.

A number of nvCIM schemes using hybrid analog-digital CIM (blue symbols) have achieved a high output ratio without signal margin degradation. The partial MAC operations are conducted in the analog domain and read out by analog-to-digital converters (ADCs). Digital circuits then merge the partial MAC values generated by the ADCs into the final MAC value. A sufficient output ratio and input precision x weight precision enable these works to be applied to complicated datasets for advanced applications.

Fig. 9. Input precision x weight precision versus output ratio of recent nvCIM works.

Fig. 10. Figure of merit (FoM) of recent nvCIM works, in which the FoM is the product of input precision (IN), weight precision (W), output ratio (OUT-ratio), and energy efficiency (EF).

In assessing the performance of nvCIM macros, it is necessary to consider their energy efficiency as well as the inference accuracy of the AI edge devices to which they are applied. The inference accuracy required for most practical applications is achievable only under high input, weight, and output bit precision; however, tradeoffs are inevitable. In this paper, we sought to account for the various strengths and weaknesses of existing nvCIMs by comparing them based on a figure of merit (FoM) derived as the product of input precision, weight precision, output ratio, and energy efficiency. As shown in Fig. 10, nvCIMs developed before 2020 achieved high energy efficiency; however, their low input-weight-output precision, suitable only for simple datasets (e.g., MNIST or CIFAR-10), resulted in low FoMs. The higher bit precision of recent works has been shown to reduce data loss when dealing with complex datasets. We expect that the corresponding increase in FoM scores will continue steadily with further advances in energy efficiency as well as bit precision.
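For comparison purposes, the two metrics above reduce to simple arithmetic. The sketch below computes the output ratio and FoM for a hypothetical macro; the numbers are placeholders rather than any published design.

```python
def output_ratio(actual_readout_bits, full_output_bits):
    # Output ratio = actual readout precision / full (ideal MACV) output precision.
    return actual_readout_bits / full_output_bits

def fom(in_bits, w_bits, out_ratio, tops_per_w):
    # FoM = input precision x weight precision x output ratio x energy efficiency.
    return in_bits * w_bits * out_ratio * tops_per_w

# Hypothetical 8b-input, 8b-weight macro reading out 14 of 20 ideal output bits at 20 TOPS/W.
r = output_ratio(14, 20)
print(f"OUT-ratio = {r:.2f}, FoM = {fom(8, 8, r, 20.0):.1f}")
```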
VI. CONCLUSIONS

nvCIM is a promising candidate for improving the energy efficiency of AI edge devices by eliminating the memory bottleneck. This review article examined recent silicon-verified nvCIM macros and the challenges in attaining further advances in circuit design. We also modeled the influence of readout errors on inference accuracy. Future nvCIM macros for applications of higher complexity will require higher input-weight-output precision and a lower readout error rate; achieving this will require innovations in circuit design to overcome limitations on signal margin and variations in memory devices, while maintaining good macro-level energy efficiency and system-level inference accuracy.
REFERENCES

[1] Hung, JM., Xue, CX., Kao, HY. et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat Electron 4, 921–930 (2021).
[2] J.-M. Hung et al., "An 8-Mb DC-Current-Free Binary-to-8b Precision ReRAM Nonvolatile Computing-in-Memory Macro using Time-Space-Readout with 1286.4-21.6TOPS/W for Edge-AI Devices," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 1-3.
[3] Chen, WH., Dou, C., Li, KX. et al. CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors. Nat Electron 2, 420–428 (2019).
[4] Xue, CX., Chiu, YC., Liu, TW. et al. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices. Nat Electron 4, 81–90 (2021).
[5] C.-X. Xue et al., "Embedded 1-Mb ReRAM-Based Computing-in-Memory Macro With Multibit Input and Weight for CNN-Based AI Edge Processors," in IEEE Journal of Solid-State Circuits, vol. 55, no. 1, pp. 203-215, Jan. 2020.
[6] J.-H. Yoon et al., "A 40nm 64Kb 56.67 TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro with Active-Feedback-Based Read and In-Situ Write Verification," 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, pp. 404-406.
[7] Mochida, R. et al. A 4M Synapses integrated Analog ReRAM based 66.5 TOPS/W Neural-Network Processor with Cell Current Controlled Writing and Flexible Network Architecture. IEEE Symposium on VLSI Technology, 175-176 (2018).
[8] Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641-646 (2020).
[9] Liu, Q. et al. A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing. IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 500-501 (2020).
[10] Cai, F. et al. A Fully Integrated Reprogrammable Memristor–CMOS System for Efficient Multiply–Accumulate Operations. Nature Electronics 2, 290-299 (2019).
[11] Li, C. et al. Analogue signal and image processing with large memristor crossbars. Nature Electronics 1, 52-59 (2018).
[12] Wang, Z. et al. Fully memristive neural networks for pattern classification with unsupervised learning. Nature Electronics 1, 137-145 (2018).
[13] Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 60-67 (2018).
[14] Dong, Q. et al. A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications. IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 242-243 (2020).
[15] Chang, M.-F. et al. Embedded 1Mb ReRAM in 28nm CMOS with 0.27V to 1V Read Using Swing-Sample-and-Couple Sense Amplifier and Self-Boost-Write-Termination Scheme. IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 332-334 (2014).
[16] M. Giordano et al., "CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference," 2021 Symposium on VLSI Circuits, 2021, pp. 1-2.
[17] W.-S. Khwa et al., "A 40-nm, 2M-Cell, 8b-Precision, Hybrid SLC-MLC PCM Computing-in-Memory Macro with 20.5 - 65.0TOPS/W for Tiny-AI Edge Devices," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 1-3.
[18] J.-W. Su et al., "16.3 A 28nm 384kb 6T-SRAM Computation-in-Memory Macro with 8b Precision for AI Edge Chips," 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, pp. 250-252.
[19] W. Wan et al., "33.1 A 74 TMACS/W CMOS-RRAM Neurosynaptic Core with Dynamically Reconfigurable Dataflow and In-situ Transposable Weights for Probabilistic Graphical Models," 2020 IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 498-500.
[20] Le Gallo, M., Sebastian, A., Mathis, R. et al. Mixed-precision in-memory computing. Nat Electron 1, 246–253 (2018).
[21] R. Khaddam-Aljameh et al., "HERMES Core – A 14nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing," 2021 Symposium on VLSI Technology, 2021, pp. 1-2.
[22] Y. Liao et al., "Novel In-Memory Matrix-Matrix Multiplication with Resistive Cross-Point Arrays," 2018 IEEE Symposium on VLSI Technology, 2018, pp. 31-32.
[23] Ielmini, D., Wong, HS.P. In-memory computing with resistive switching devices. Nat Electron 1, 333–343 (2018).
[24] C.-X. Xue et al., "24.1 A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors," 2019 IEEE International Solid-State Circuits Conference (ISSCC), 2019, pp. 388-390.
[25] C.-X. Xue et al., "15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices," 2020 IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 244-246.
[26] C.-X. Xue et al., "16.1 A 22nm 4Mb 8b-Precision ReRAM Computing-in-Memory Macro with 11.91 to 195.7TOPS/W for Tiny AI Edge Devices," 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, pp. 245-247.
[27] J.-M. Hung, C.-J. Jhang, P.-C. Wu, Y.-C. Chiu and M.-F. Chang, "Challenges and Trends of Nonvolatile In-Memory-Computation Circuits for AI Edge Devices," in IEEE Open Journal of the Solid-State Circuits Society, vol. 1, pp. 171-183, 2021.
[28] P.-C. Wu et al., "A 28nm 1Mb Time-Domain Computing-in-Memory 6T-SRAM Macro with a 6.6ns Latency, 1241GOPS and 37.01TOPS/W for 8b-MAC Operations for Edge-AI Devices," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 1-3.
[29] C.-J. Jhang, C.-X. Xue, J.-M. Hung, F.-C. Chang and M.-F. Chang, "Challenges and Trends of SRAM-Based Computing-In-Memory for AI Edge Devices," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 68, no. 5, pp. 1773-1786, May 2021.
[30] J.-M. Hung, X. Li, J. Wu and M.-F. Chang, "Challenges and Trends in Developing Nonvolatile Memory-Enabled Computing Chips for Intelligent Edge Devices," in IEEE Transactions on Electron Devices, vol. 67, no. 4, pp. 1444-1453, April 2020.
[31] C. Xue and M. Chang, "Challenges in Circuit Designs of Nonvolatile-memory based computing-in-memory for AI Edge Devices," 2019 International SoC Design Conference (ISOCC), 2019, pp. 164-165.
[32] D. Rossi et al., "4.4 A 1.3TOPS/W @ 32GOPS Fully Integrated 10-Core SoC for IoT End-Nodes with 1.7μW Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode," 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, pp. 60-62.
[33] Y.-C. Chiu et al., "A 22nm 4Mb STT-MRAM Data-Encrypted Near-Memory Computation Macro with a 192GB/s Read-and-Decryption Bandwidth and 25.1-55.1TOPS/W 8b MAC for AI Operations," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 178-180.
[34] Y.-C. Chiu et al., "A 22-nm 1-Mb 1024-b Read Data-Protected STT-MRAM Macro With Near-Memory Shift-and-Rotate Functionality and 42.6-GB/s Read Bandwidth for Security-Aware Mobile Device," in IEEE Journal of Solid-State Circuits, vol. 57, no. 6, pp. 1936-1949, June 2022.
[35] Y.-C. Chiu et al., "A 40nm 2Mb ReRAM Macro with 85% Reduction in FORMING Time and 99% Reduction in Page-Write Time Using Auto-FORMING and Auto-Write Schemes," 2019 Symposium on VLSI Technology, 2019, pp. T232-T233.
[36] C.-C. Chou et al., "A 22nm 96KX144 RRAM Macro with a Self-Tracking Reference and a Low Ripple Charge Pump to Achieve a Configurable Read Window and a Wide Operating Voltage Range," 2020 IEEE Symposium on VLSI Circuits, 2020, pp. 1-2.
[37] X. Si et al., "Circuit Design Challenges in Computing-in-Memory for AI Edge Devices," 2019 IEEE 13th International Conference on ASIC (ASICON), 2019, pp. 1-4.
[38] C.-P. Lo et al., "A ReRAM Macro Using Dynamic Trip-Point-Mismatch Sampling Current-Mode Sense Amplifier and Low-DC Voltage-Mode Write-Termination Scheme Against Resistance and Write-Delay Variation," in IEEE Journal of Solid-State Circuits, vol. 54, no. 2, pp. 584-595, Feb. 2019.
[39] C. Dou et al., "Challenges of emerging memory and memristor based circuits: Nonvolatile logics, IoT security, deep learning and neuromorphic computing," 2017 IEEE 12th International Conference on ASIC (ASICON), 2017, pp. 140-143.
[40] W.-H. Chen et al., "A 16Mb dual-mode ReRAM macro with sub-14ns computing-in-memory and memory functions enabled by self-write termination scheme," 2017 IEEE International Electron Devices Meeting (IEDM), 2017, pp. 28.2.1-28.2.4.
[41] A. Lee et al., "A ReRAM-Based Nonvolatile Flip-Flop With Self-Write-Termination Scheme for Frequent-OFF Fast-Wake-Up Nonvolatile Processors," in IEEE Journal of Solid-State Circuits, vol. 52, no. 8, pp. 2194-2207, Aug. 2017.
[42] W.-H. Chen et al., "Circuit design for beyond von Neumann applications using emerging memory: From nonvolatile logics to neuromorphic computing," 2017 18th International Symposium on Quality Electronic Design (ISQED), 2017, pp. 23-28.
[43] S. D. Spetalnick et al., "A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM Binary/Compute-in-Memory Macro with 4.23x Improvement in Density and >75% Use of Sensing Dynamic Range," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 1-3.
[44] J. M. Correll et al., "An 8-bit 20.7 TOPS/W Multi-Level Cell ReRAM-based Compute Engine," 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, pp. 264-265.
[45] W.-H. Chen et al., "A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors," 2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018, pp. 494-496.
[46] Jung, S., Lee, H., Myung, S. et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601, 211–216 (2022).
[47] M. Chang et al., "A 40nm 60.64TOPS/W ECC-Capable Compute-in-Memory/Digital 2.25MB/768KB RRAM/SRAM System with Embedded Cortex M3 Microprocessor for Edge Recommendation Systems," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 1-3.
