Fault and Error Tolerance in Neural Networks A Review

This document provides an abstract for a paper that reviews fault and error tolerance in neural networks. The abstract discusses how faults are becoming a major constraint for computing devices due to defects in physical substrates. Neural networks are potentially capable of absorbing some faults due to their intrinsic properties. The paper presents a survey that focuses on established passive techniques to improve the limited intrinsic fault tolerance of neural networks, particularly for feedforward networks. It provides a taxonomy of main techniques and discusses challenges that remain to be addressed.

Uploaded by

Yohanna Yerima

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

78 views20 pages

Fault and Error Tolerance in Neural Networks A Review

Uploaded by

Yohanna Yerima

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 20

Received July 6, 2017, accepted August 8, 2017, date of publication August 21, 2017, date of current version September

19, 2017.
Digital Object Identifier 10.1109/ACCESS.2017.2742698

Fault and Error Tolerance in Neural

Networks: A Review
CESAR TORRES-HUITZIL1 , (Senior Member, IEEE), AND BERNARD GIRAU2
1 Information Technology Laboratory, CINVESTAV-Tamaulipas, Ciudad Victoria 87130, Mexico
2 Université de Lorraine and LORIA, 54000 Nancy, France
Corresponding author: Cesar Torres-Huitzil ([email protected])
This work was supported by Conacyt, Mexico, under Grant 237417.

ABSTRACT Beyond energy, the growing number of defects in physical substrates is becoming another
major constraint that affects the design of computing devices and systems. As the underlying semiconductor
technologies are getting less and less reliable, the probability that some components of computing devices
fail also increases, preventing designers from realizing the full potential benefits of on-chip exascale
integration derived from near atomic scale feature dimensions. As the quest for performance confronts
permanent and transient faults, device variation, and thermal issues, major breakthroughs in computing
efficiency are expected to benefit from unconventional and new models of computation, such as brain-
inspired computing. The challenge is then to find not only high-performance and energy-efficient, but also
fault-tolerant computing solutions. Neural computing principles remain elusive, yet as source of a promising
fault-tolerant computing paradigm. In the quest to fault tolerance can be translated into scalable and reliable
computing systems, hardware design itself and/or to use circuits even with faults has further motivated
research on neural networks, which are potentially capable of absorbing some degrees of vulnerability based
on their natural properties. This paper presents a survey on fault tolerance in neural networks manly focusing
on well-established passive techniques to exploit and improve, by design, such potential but limited intrinsic
property in neural models, particularly for feedforward neural networks. First, fundamental concepts and
background on fault tolerance are introduced. Then, we review fault types, models, and measures used to
evaluate performance and provide a taxonomy of the main techniques to enhance the intrinsic properties of
some neural models, based on the principles and mechanisms that they exploit to achieve fault tolerance
passively. For completeness, we briefly review some representative works on active fault tolerance in neural
networks. We present some key challenges that remain to be overcome and conclude with an outlook for this
field.

INDEX TERMS Fault tolerance, neural networks, redundancy, fault masking, fault models, taxonomy.

I. INTRODUCTION does not necessarily imply that it is the only goal that has
Artificial neural networks models have attracted intensive been or should be pursued [2]. Artificial neural networks are
research interest and enjoyed significant renewed growth in generally assumed to acquire some other desirable intrinsic
artificial intelligence related applications over the last two features of biological systems such as their tolerance against
decades, e.g., deep learning models based on a feedforward imprecision, uncertainty, and faults [3], which also make
deep network or multilayer perceptron [1]. Indeed for some them harder to study or design [4].
applications that extract data from the noisy physical envi- According to neurobiological studies, the human brain is
ronment, speech recognition and visual object recognition, able to tolerate a small amount of synapse or neuron faults,
they appear to be the preferable choice. In neural networks or even use noise as a source of computation [5]. Nervous
research, one of the main problems that has been addressed systems are complex, highly massive parallel information
is the architecture optimization, which aims at appropriately processing architectures made of seemingly imperfect and
choosing the neural architecture and its parameters for high slow, but exceptionally adaptive and power-efficient compo-
generalization performance at solving a given task. However, nents that carry out information processing functions [6], [7].
the fact that performance maximization is of primary concern Moreover, brains have the capability to relearn by growth of

2169-3536
2017 IEEE. Translations and content mining are permitted for academic research only.
17322 Personal use is also permitted, but republication/redistribution requires IEEE permission. VOLUME 5, 2017
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
C. Torres-Huitzil, B. Girau: Fault and Error Tolerance in NNs: A Review

new neurons and/or neural connections and/or retraining of imperfect components from the beginning or even use
the existing neural architecture [8]. Derived from these obser- components whose functionality degrades with time without
vations, it is commonly claimed that the majority of neural compromising functionality. As a consequence, computa-
network models, abstracted from biological ones, have built- tional organizations must be prepared for faults/errors, and
in or intrinsic fault tolerance properties due to their parallel provisioned to be able to exploit late-bound information
and distributed structure, and the fact that usually they contain about how variation and faults are affecting the system over
more neurons or processing elements than the necessary to time [24]. More specifically, from a pragmatic point of
solve a given problem, i.e., some natural redundancy due to view, the potential fault-tolerant property of neural models
overprovisioning. However, claiming such an equivalent fault will be crucial to the success of attempts to integrate large
tolerance only on the basis of rough architectural similarities neural models onto silicon for embedded applications, when
therefore cannot hold true in general, especially for small problems of yield become unavoidable [25], [26]. Custom
size neural networks [9], [10]. Furthermore, the assessment hardware implementations of neural networks can benefit the
of fault tolerance across different neural models still remains emerging high-performance machine learning applications
difficult to generalize, due to fault tolerance is network- and but faults can compromise the reliability of such acceler-
application- dependent, an inconsistent use of the principal ators under nanoscale manufacturing process in practical
concepts exists, and the lack of systematic methods and tools scenarios.
for evaluation across neural models. Fault tolerance in a conventional digital computing system
Computational studies have shown that neural networks is usually achieved by increasing its redundancy in space,
are robust to noisy inputs and they also provide graceful time or code, [27], [28] combined with some sort of cen-
degradation due to their resilience to inexact computations tralized voting-based strategies, which usually implies higher
when implemented in a physical substrate. The tolerance to implementation costs and lower performance that sometimes
approximation, for instance, can be leveraged for substantial make it even infeasible to be applied in computing systems
performance and energy gains through the design of cus- at large scale. Research around fault tolerance capabilities
tom low-precision neural accelerators that operate on sen- of neural networks is expected to provide novel solutions to
sory input streams [11]–[13]. However, in practice, a neural improve existing fault tolerance and reliability technologies
network has a very limited fault tolerance capability and, and play a more fundamental role in the future. The style
as a matter of fact, neural networks cannot be considered of neural computation, the parallel, and distributed architec-
intrinsically fault tolerant, without a proper design. Further- ture of neural models have been argued as the source for
more, as a consequence of computation and information are inherent fault tolerance but more general and comprehen-
naturally distributed in neural networks, error confinement sive analysis for large class of perturbations affecting neural
and replication techniques, key to conventional fault toler- computation, and large scale fault tolerance mechanisms tai-
ance solutions, cannot be applied directly so as to limit the lored to neural models must be envisioned at an affordable
error propagation when implemented in potentially faulty cost by further exploiting the inherent capabilities of neural
substrates. computing [29], [30]. As such, a literature review is important
Obtaining truly fault tolerant neural networks is still a to understand how fault/error tolerance in neural networks
very attractive and important issue to obtain more biologi- has been addressed and to gain insight in the foundations
cal plausible models, both for i) artificial intelligence based and recent developments in this field towards new promis-
solutions, where, for instance, pervasive embedded systems ing directions. This survey is of great value to investigate
will require smart objects fully merged with the environment how faults/errors will affect the operation of hardware neural
in which they are deployed to cope with unforeseeable condi- networks and whether the faults/errors can be mitigated by
tions [14]–[16], and ii) as a source to build reliable computing leveraging the intrinsic features of neural networks with com-
systems from unreliable components, as suggested by [17]. plementary techniques.
Rooted on the neural paradigm computing systems might take In the literature, several experimental and less analytic
advantage of new emerging devices at nanoscale dimensions works have been carried out to study neural networks fault
and deal both with manufacturing defects and transient faults tolerance related issues, which include the analysis on effect
as well [18], [19] and even considers faults/errors an essential of noise on the output sensitivity [31], [32], the weight
and intrinsic part of the design. error sensitivity [33]–[35], and the relationship among fault
In this last direction, the robustness and the potential fault- tolerance, generalization and model complexity [2], [10],
tolerant properties of neural models call for attention as per- [36]–[38]. Such works have been carried out at different
manent and transient faults, device variation, thermal issues, levels of abstraction, from very specific low level physical
and aging will force designers to abandon current assump- implementations to the high level intrinsic fault masking
tions that transistors, wires, and other circuit elements will capacity of neural paradigms. In fact, most works use a high
function perfectly over the entire lifetime of a computing sys- level approach focusing on errors instead of faults. Despite
tem, relying mainly on digital integrated circuits [20]–[23]. of an important number of works for fault tolerance in neural
To achieve real benefits from future technologies at networks exist, a survey providing a framework for fault
nanoscale, we must find inexpensive ways to exploit such tolerance study and a categorization for the discussion of