Fault and Error Tolerance in Neural Networks: A Review
Digital Object Identifier 10.1109/ACCESS.2017.2742698
ABSTRACT Beyond energy, the growing number of defects in physical substrates is becoming another
major constraint that affects the design of computing devices and systems. As the underlying semiconductor
technologies are getting less and less reliable, the probability that some components of computing devices
fail also increases, preventing designers from realizing the full potential benefits of on-chip exascale
integration derived from near atomic scale feature dimensions. As the quest for performance confronts
permanent and transient faults, device variation, and thermal issues, major breakthroughs in computing
efficiency are expected to benefit from unconventional and new models of computation, such as brain-
inspired computing. The challenge is then to find not only high-performance and energy-efficient, but also
fault-tolerant computing solutions. Neural computing principles remain elusive, yet they are regarded as the source of a promising fault-tolerant computing paradigm. In the quest for fault tolerance that can be translated into scalable and reliable computing systems, the exploitation of hardware design itself and/or the use of circuits even with faults has further motivated
research on neural networks, which are potentially capable of absorbing some degrees of vulnerability based
on their natural properties. This paper presents a survey on fault tolerance in neural networks manly focusing
on well-established passive techniques to exploit and improve, by design, such potential but limited intrinsic
property in neural models, particularly for feedforward neural networks. First, fundamental concepts and
background on fault tolerance are introduced. Then, we review fault types, models, and measures used to
evaluate performance and provide a taxonomy of the main techniques to enhance the intrinsic properties of
some neural models, based on the principles and mechanisms that they exploit to achieve fault tolerance
passively. For completeness, we briefly review some representative works on active fault tolerance in neural
networks. We present some key challenges that remain to be overcome and conclude with an outlook for this
field.
INDEX TERMS Fault tolerance, neural networks, redundancy, fault masking, fault models, taxonomy.
I. INTRODUCTION
Artificial neural network models have attracted intensive research interest and enjoyed significant renewed growth in artificial intelligence related applications over the last two decades, e.g., deep learning models based on a feedforward deep network or multilayer perceptron [1]. Indeed, for some applications that extract data from the noisy physical environment, such as speech recognition and visual object recognition, they appear to be the preferable choice. In neural networks research, one of the main problems that has been addressed is architecture optimization, which aims at appropriately choosing the neural architecture and its parameters for high generalization performance at solving a given task. However, the fact that performance maximization is of primary concern does not necessarily imply that it is the only goal that has been or should be pursued [2]. Artificial neural networks are generally assumed to acquire some other desirable intrinsic features of biological systems, such as their tolerance against imprecision, uncertainty, and faults [3], which also make them harder to study or design [4].

According to neurobiological studies, the human brain is able to tolerate a small amount of synapse or neuron faults, or even use noise as a source of computation [5]. Nervous systems are complex, massively parallel information processing architectures made of seemingly imperfect and slow, but exceptionally adaptive and power-efficient components that carry out information processing functions [6], [7]. Moreover, brains have the capability to relearn by growth of
new neurons and/or neural connections and/or retraining of the existing neural architecture [8]. Derived from these observations, it is commonly claimed that the majority of neural network models, abstracted from biological ones, have built-in or intrinsic fault tolerance properties due to their parallel and distributed structure, and the fact that they usually contain more neurons or processing elements than necessary to solve a given problem, i.e., some natural redundancy due to overprovisioning. However, claiming such an equivalent fault tolerance only on the basis of rough architectural similarities cannot hold true in general, especially for small size neural networks [9], [10]. Furthermore, the assessment of fault tolerance across different neural models remains difficult to generalize, because fault tolerance is network- and application-dependent, an inconsistent use of the principal concepts exists, and systematic methods and tools for evaluation across neural models are lacking.

Computational studies have shown that neural networks are robust to noisy inputs and that they also provide graceful degradation due to their resilience to inexact computations when implemented in a physical substrate. The tolerance to approximation, for instance, can be leveraged for substantial performance and energy gains through the design of custom low-precision neural accelerators that operate on sensory input streams [11]–[13]. However, in practice, a neural network has a very limited fault tolerance capability and, as a matter of fact, neural networks cannot be considered intrinsically fault tolerant without a proper design. Furthermore, as a consequence of computation and information being naturally distributed in neural networks, error confinement and replication techniques, key to conventional fault tolerance solutions, cannot be applied directly so as to limit the error propagation when implemented in potentially faulty substrates.

Obtaining truly fault tolerant neural networks is still a very attractive and important issue for obtaining more biologically plausible models, both i) for artificial intelligence based solutions, where, for instance, pervasive embedded systems will require smart objects fully merged with the environment in which they are deployed to cope with unforeseeable conditions [14]–[16], and ii) as a source for building reliable computing systems from unreliable components, as suggested in [17]. Rooted in the neural paradigm, computing systems might take advantage of new emerging devices at nanoscale dimensions, deal with both manufacturing defects and transient faults [18], [19], and even consider faults/errors an essential and intrinsic part of the design.

In this last direction, the robustness and the potential fault-tolerant properties of neural models call for attention, as permanent and transient faults, device variation, thermal issues, and aging will force designers to abandon current assumptions that transistors, wires, and other circuit elements will function perfectly over the entire lifetime of a computing system, relying mainly on digital integrated circuits [20]–[23].

To achieve real benefits from future technologies at nanoscale, we must find inexpensive ways to exploit such imperfect components from the beginning, or even use components whose functionality degrades with time without compromising the overall functionality. As a consequence, computational organizations must be prepared for faults/errors, and provisioned to be able to exploit late-bound information about how variation and faults are affecting the system over time [24]. More specifically, from a pragmatic point of view, the potential fault-tolerant property of neural models will be crucial to the success of attempts to integrate large neural models onto silicon for embedded applications, when problems of yield become unavoidable [25], [26]. Custom hardware implementations of neural networks can benefit the emerging high-performance machine learning applications, but faults can compromise the reliability of such accelerators under nanoscale manufacturing processes in practical scenarios.

Fault tolerance in a conventional digital computing system is usually achieved by increasing its redundancy in space, time or code [27], [28], combined with some sort of centralized voting-based strategy, which usually implies higher implementation costs and lower performance that sometimes make it even infeasible to apply in computing systems at large scale. Research around the fault tolerance capabilities of neural networks is expected to provide novel solutions to improve existing fault tolerance and reliability technologies, and to play a more fundamental role in the future. The style of neural computation, and the parallel and distributed architecture of neural models, have been argued to be the source of inherent fault tolerance, but more general and comprehensive analyses for large classes of perturbations affecting neural computation, and large scale fault tolerance mechanisms tailored to neural models, must be envisioned at an affordable cost by further exploiting the inherent capabilities of neural computing [29], [30]. As such, a literature review is important to understand how fault/error tolerance in neural networks has been addressed and to gain insight into the foundations and recent developments in this field towards new promising directions. This survey is of great value for investigating how faults/errors will affect the operation of hardware neural networks and whether the faults/errors can be mitigated by leveraging the intrinsic features of neural networks with complementary techniques.

In the literature, several experimental and less analytic works have been carried out to study neural network fault tolerance related issues, which include the analysis of the effect of noise on the output sensitivity [31], [32], the weight error sensitivity [33]–[35], and the relationship among fault tolerance, generalization and model complexity [2], [10], [36]–[38]. Such works have been carried out at different levels of abstraction, from very specific low level physical implementations to the high level intrinsic fault masking capacity of neural paradigms. In fact, most works use a high level approach focusing on errors instead of faults. Although an important number of works on fault tolerance in neural networks exist, a survey providing a framework for fault tolerance study and a categorization for the discussion of
formal techniques and methods that produce fault tolerant neural networks is still missing.

FIGURE 1. Cause-effect relationship between fault, error and failure, and its propagation from the physical-implementation level to the behavioral application level of a neural network model.

In this paper, a review of reported works addressing the fault tolerance of neural networks for a given behavioral fault/error model, which evaluate the impact of such errors on the neural computation in a rather technology-independent way, is presented. This paper proposes a categorizing framework that groups a number of existing techniques to improve fault tolerance into categories and compares their advantages and drawbacks. The rest of this paper is organized as follows. Section II presents background and some key concepts for fault tolerance in neural networks and discusses the similarities and differences between them. Section III formalizes fault tolerant neural networks and presents typical fault models and measures that have been used for fault tolerance assessment. Section IV presents the taxonomy and a discussion of the principal techniques that have been used to produce fault tolerant neural networks, focusing on passive fault tolerance. We present and discuss commonly cited techniques for each class in the taxonomy. In section V we describe some open challenges for current/future research. Section VI provides some concluding remarks.

II. BACKGROUND AND TERMINOLOGY
Neural networks are claimed to have a built-in or intrinsic fault tolerance property mainly due to their distributed connectionist structure. Fault tolerance in a neural network is directly related to the redundancy introduced because of spare capacity (over-provisioning), i.e., when the complexity of the problem is less than the raw computational capacity that the network can actually provide [39]. Nevertheless, the analysis and evaluation of fault tolerance remain difficult because many different architectural and functional features under diverse conceptual frameworks are usually involved, and there are no common systematic methods or tools for evaluation [40], [41]. Technical and quantitative reasoning about these features calls for clear definitions, highlighting their similarities and differences, as those concepts appear in different contexts and areas of application.

This section provides some basic definitions related to faults, fault models, fault tolerance and other related terms, which are widely used in computing systems at the hardware level and have also been applied and extended in neural computing. The interested reader is referred to [42]–[45] for further information on fault-tolerant systems, concepts and principles.

A. FAULT TYPES
There are three fundamental concepts in fault-tolerant systems: fault, error, and failure. A cause-effect relationship exists between them, from the physical level to the behavioral level, as conceptually shown in figure 1 for a neural network that performs a computational task and is implemented in a digital substrate.

A fault is an anomalous physical condition in a system that gives rise to an error. An error is a manifestation of a fault in a system, the deviation from the expected output, in which the logical state of an element differs from its intended value [43], [46]. A failure refers to a system's inability to perform its intended functionality or behavior because of errors in its elements or perturbations in its environment. Propagation of an error to the system level results in system failure; however, a fault in a system does not necessarily result in an error or failure, as it might remain inactive. A fault is said to be active when it produces an error; otherwise it is called dormant. Faults can be classified by their temporal characteristics as follows:
• A permanent fault is continuous and stable over time; it is mainly the result of irreversible physical damage.
• A transient fault may only persist for a short period of time and is often the result of external disturbances.

Transient faults which recur with some frequency are called intermittent. Usually, an intermittent fault results from marginal or unstable device operation, and such faults are more difficult to detect than permanent ones. Transient and intermittent faults cover the vast majority of faults which occur in digital computing systems built with current semiconductor technology [47]–[49]. Moreover, future implementation technologies are expected to suffer transient faults due to reduced device quality, exhibiting a high level of process and environmental variation as well as considerable performance degradation due to the potentially high stress of materials [50].
Timing faults change the timing behavior rather than the structure of circuits; they affect circuit parameters which define the timing characteristics of the device, such as propagation delay, hold and set-up times, etc.

Figure 2 shows a classification of some fault types according to their temporal characteristics, indicating some typical causes and mechanisms that generate them, which are modeled with the corresponding permanent/transient fault models, shown as gray rounded boxes in the figure. For instance, a bridging fault occurs when two leads in a logic network are connected accidentally and wired logic is performed at the connection [51]. There exist other fault classifications using different criteria, such as value and extent, as proposed in [42].

FIGURE 2. Fault types with some representative causes and mechanisms of permanent and transient faults and the corresponding fault models, shown as gray rounded boxes.

In order to facilitate the detection of faults and the correction of their errors, researchers develop fault models to examine the variety of faults that need to be tolerated during the operation of a given system.

B. FAULT MODELS
A fault model lists which components can become defective in a system, and also when and how they will misbehave. Fault models describe the physical manifestation of faults, the types of faults, and where and how they will occur in a system [52]. The two major requirements for defining fault models, which group faults that cause similar effects on the system, are in some sense contradictory. On one hand, accuracy is pursued, that is, realistic faults should be modeled; on the other hand, tractability is needed, so that complex or large scale systems can be studied at affordable computational costs. Research, therefore, deals with deriving realistic models at higher levels of abstraction which can accurately capture the faults at lower physical levels.

The following fault models have been widely and successfully used as abstractions of physical defect mechanisms in digital electronics devices and systems [53]–[55]:
• Stuck-at: a data or control line appears to be held exclusively high (stuck-at-1) or low (stuck-at-0).
• Random bit flip: a data or memory element has some incorrect, but random, value.

The stuck-at fault model has been the source of a great research effort in fault tolerance. It is still very popular, since it has been shown that many defects at the transistor and interconnection structures can be modeled, as permanent faults, at the logic level with reasonable accuracy [21], [56]. The stuck-at model is a binary model that does not capture the indeterminate states that faults may induce while occurring. Also, though less frequently for fault tolerance assessment in computing systems, stuck-open or stuck-short faults are considered in the literature [28]. Stuck-open models are necessary, for instance, to characterize the fact that a floating line has a high capacitance and retains its charge for a significant length of time in current semiconductor technology.

The random bit-flip model is intended to model transient faults that usually happen at registers or memory elements due to external perturbations, for instance, a single event upset. Under this model, damage/corruption is done only to the data and not to the circuit itself. Conceptually, it consists of a register bit that is switched randomly, resulting in that memory element holding a wrong logic value. The related pulse model, which accounts for bit flips produced in combinational logic, is used to differentiate them from the bit flips produced in memory circuits. Recall that single event effects (SEE) in microelectronics are mainly caused when highly energetic particles, present in the natural space environment, strike sensitive regions of a microelectronic circuit [57]; however, those effects are expected to happen in normal environments as well, due to near atomic scale integration.
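To make these two abstractions concrete, the following sketch (our illustration, not code from the surveyed works; a 32-bit word and a float32 weight encoding are assumed) applies a permanent stuck-at fault to one line of a data word and a transient bit flip to the stored representation of a weight:

    import numpy as np

    def stuck_at(word, bit, level):
        # Permanent stuck-at model: one line of a 32-bit word is held at 0 or 1.
        mask = np.uint32(1) << np.uint32(bit)
        word = np.uint32(word)
        return word | mask if level == 1 else word & ~mask

    def bit_flip(weight, bit):
        # Transient bit-flip model: one bit of a stored float32 weight is inverted.
        word = np.frombuffer(np.float32(weight).tobytes(), dtype=np.uint32)[0]
        word = word ^ (np.uint32(1) << np.uint32(bit))
        return np.frombuffer(np.uint32(word).tobytes(), dtype=np.float32)[0]

    w = 0.75
    print(bit_flip(w, 30))  # exponent bit upset: deviation of many orders of magnitude
    print(bit_flip(w, 2))   # low mantissa bit upset: negligible deviation

The asymmetry visible in this small example is one reason why behavioral error models, discussed below, often bound faulty values instead of tracking individual bits.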
As for fault-tolerant hardware implementations, high level fault models should be consistent with manufacturing or physical defects. Indeed, it has recently been shown that the classical stuck-at and bit-flip fault models are not enough to cope with the fault mechanisms of new deep-submicrometer technologies, and that new fault models are needed to cover aspects like transient pulses, indetermination, delays, stuck-opens, shorts, open-lines, and bridgings [54], [55], [58], [59], some of which are illustrated in figure 2 with their corresponding fault models in gray rounded boxes.

On the other hand, contrary to fault models, error models do not attempt to capture or locate the underlying physical effect of a failure [28]. They rather characterize the deviation, due to the fault, of the function performed from input to output within a system, at a higher level of abstraction for better tractability. Mapping criteria of physical faults onto the abstract errors are required to show the usability and consistency of the error analysis in evaluating the actual fault tolerance of a physical implementation of a system. Yet, similar errors can be induced by different types of faults, as no one-to-one correspondence might exist.

C. FAULT TOLERANCE AND RELATED TERMS
The main goal of this section is to identify the proper subset of concepts, and highlight their intersections, to develop a common and consistent understanding of their meaning without reference to a specific discipline or implementation medium, and then use them, together with the concepts exposed in the previous subsection, as an analytical framework for fault tolerance in neural networks. Some terms of interest related to the dependability or trustworthiness of a system [42], [60], [61] are as follows:
• Reliability: a system is reliable if it performs correctly with high probability in the presence of faults under previously stated conditions and for a specified period of time.
• Fault tolerance: the property that guarantees the proper operation of the system in the event of fault(s) within some of its components.
• Graceful degradation: a low sensitivity to occurring faults instead of a complete or catastrophic failure.
• Robustness: a property that allows a system to continue operating correctly despite noise in its inputs or variation in its parameters.
• Error resilience: tolerance to inexact or approximate computations, as originally designed.

Reliability is a quality over time and is associated with unexpected failures of systems. Understanding why these failures occur is key to improving system performance in specific working environments. Reliability is a measure of uncertainty, and therefore estimating reliability means using statistics and probability theory.

Fault tolerance is often associated with robustness to noisy inputs, i.e., functioning correctly in the presence of such inputs, but they are rather different terms [61]. Fault tolerance might generally exploit some sort of redundancy to provide the functionality needed to counterbalance the effects of faults. The redundancy might be manifested mainly in two ways: extra time or extra components [28].

Intuitively, the term graceful degradation means that a system tolerates failures by reducing its functionality or performance, rather than going into a catastrophic behavior. In order for graceful degradation to be possible, the system must have some level of reduced or auxiliary functionality; i.e., it must be possible to define the system's state as working but not completely functional.

Error resilience of systems means that they tolerate some accuracy reduction, or inexact computations [62], in return for potential resource savings. Approximate computing exploits the gap between the level of accuracy required by applications and that provided by the computing system, for achieving diverse optimizations; it is more related to specific implementations [11]. On the other hand, it can be said that a robust system provides a graceful loss in performance accuracy when perturbations (e.g. noise) affect its parameters. Hence, a system might both be resilient to lower accuracy (i.e., a reduced number of bits) and tolerant to a class of parameter fluctuations or perturbations [2].

According to the concepts exposed above, fault tolerance can be defined as the attribute of a system that allows it to preserve its expected behavior after faults have manifested themselves within the system [42]. More precisely, for the purpose of this review, a fault-tolerant system might be defined as one that has provisions to avoid failure, as measured by a figure of merit, after faults have caused errors within the system.

D. ACTIVE AND PASSIVE FAULT TOLERANCE
Fault tolerance can be classified into passive and active, taking into account the mechanisms by which it is achieved in a system. A system with passive fault tolerance does not react in any special way to compensate for the effect of internal faults; rather, it exploits the intrinsic redundancy and fault masking built into the system structure, which efficiently masks the fault effects, ensuring correct outputs in spite of such faults [63]. The system is designed to mask, by compensation, a given maximum number of faults. No diagnostics, relearning, or reconfiguration is needed in such a passive fault-tolerant system. Thus, fault detection and location can be totally avoided under this approach.

On the other hand, a system with active fault tolerance explicitly and dynamically recognizes and manages its redundant resources to compensate for the fault effects (by adaptation, retraining or self-repairing mechanisms) when they appear. Active fault tolerance requires special detection/localization and supervising/control components, whose design may turn out to be rather complex and intrusive [64]. Active fault tolerance provides a system with the ability to recover from faults by reallocating the tasks performed by the faulty elements to the fault-free ones [65].

Generally speaking, it is more difficult to achieve with the passive approach the same degree of fault tolerance as with an active approach, mainly because not all the faulty scenarios can be considered at system design time, and no repair or reconfiguration is possible afterwards. However, in a hybrid approach, passive and active tolerance can complement each other: a static base configuration masks a given number of faults, while faulty modules are detected online and replaced with fault-free ones in the base configuration [43].

III. FAULT-TOLERANT NEURAL NETWORKS
Since a neural network relies on its neurons to collectively perform its function, a claimed property of neural networks is that they can still perform their overall function even if some of the neurons/synapses are not functioning. Neural networks are not commonly built with the exact or minimum number of neurons to perform a computation for solving a given task. In fact, it has been experimentally observed and documented that such overprovisioning leads to a natural robustness and potential fault tolerance, considering a neural network as a fully parallel and distributed system where neurons/synapses can fail independently.
A. A BASIC DEFINITION
A neural network N performing a computation HN is said to be fault tolerant if the computation HNfault, performed by a faulty network Nfault obtained from N, is close to HN. Formally, for ε > 0, N is called ε-fault-tolerant [39], [66] if it tolerates faulty components (for instance neurons/synapses) for any subset of size at most nfails:

|HN(X) − HNfault(X)| ≤ ε, ∀X ∈ T    (1)

where X is any stimulus, applied to the networks N and Nfault, that belongs to the training set T or is part of the input data to be processed by the networks. Given a problem, the goal for fault tolerance is to determine the network N that performs the required computation and has the additional property that it is ε-fault-tolerant with respect to T.

Furthermore, in a strict sense, a neural network is truly or completely fault tolerant with respect to a class and number of faults if their effect, as measured by the chosen figure of merit, is null. The complete fault tolerance requirement can be weakened toward graceful degradation if we allow the increase in the error to remain below a predefined threshold, as stated in equation 1. Thus, recall that when a statement about fault tolerance is made, a failure condition or criterion of the network functionality should be implicitly assumed; this is the threshold below which the network can no longer perform its function according to the specification. As such, fault tolerance in neural networks depends on the definition of the acceptable degree of performance and on the intended application [67].
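As a minimal illustration of how the definition can be checked empirically (our sketch; the randomly initialized network and the tolerance value stand in for a trained network and a chosen ε), every single stuck-at-0 hidden-neuron fault is injected and the worst output deviation over the stimuli set is compared against ε:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)   # hidden layer, 8 units
    W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # output layer

    def forward(X, dead=None):
        # Network output; `dead` indexes one hidden neuron stuck at 0.
        H = np.tanh(X @ W1.T + b1)
        if dead is not None:
            H[:, dead] = 0.0
        return H @ W2.T + b2

    def is_eps_fault_tolerant(X, eps):
        # Check |HN(X) - HNfault(X)| <= eps for all single stuck-at-0 faults (nfails = 1).
        ref = forward(X)
        return all(np.max(np.abs(ref - forward(X, dead=j))) <= eps
                   for j in range(W1.shape[0]))

    T = rng.normal(size=(100, 4))   # stimuli standing in for the set T
    print(is_eps_fault_tolerant(T, eps=1.0))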
B. FAULTS IN NEURAL MODELS
At a high level of abstraction, fault tolerance within neural models might be analyzed through the effects of errors in the main operators that support the whole neural computational task, rather independently from their intended physical implementation. In fact, this has been the practice in most works reported in the literature. In a more comprehensive and structured approach, such as the one described in [29], after this initial step, physical faults affecting a specific implementation can be mapped onto such errors so that the expected fault tolerance of a given architectural implementation of a neural model can be estimated; further, by identifying critical neural components, complementary and ad-hoc fault tolerance policies can be applied to enhance the properties of the neural model implementation.

FIGURE 3. Abstract neuron model and its main components.

In neural networks, an error model can be defined depending only on the neuron behavior itself, rather independently of its physical implementation, which is usually targeted to a digital substrate, so as to estimate the influence of faults on the neural computation from the initial design stages. More specifically, in the behavioral neuron model, as conceptually shown in figure 3, errors may occur [68], [69]:
• As unexpected values of signals in the communication channels, due to faulty interconnections or noise.
• In the synaptic weight or the associated computation, which in the absence of implementation details can be considered as indistinguishable.
• In the neuron body itself, affecting the summation or the evaluation of the nonlinear activation function.

The first two errors, in digital implementations, are often modeled as both stuck-at-0 and stuck-at-1, since an asymmetric behavior for such faults has been reported [70]. Synapse errors are modeled as stuck-at-value, where the value lies within the domain of weights [wmin, wmax]. Errors caused by faults in the neuron body usually saturate its output to positive/negative values; thus they are modeled as stuck-at-1 or stuck-at-(−1), as the activation function often has this range. However, neurons not only can stop computing by saturating: in general they might even send a value different from the nominal expected output of their transfer function [65]. Neurons that can fail by transmitting arbitrary values (known in the literature as Byzantine neurons) have only recently been considered for fault tolerance assessment [39].

The stuck-at model essentially allows fault tolerance to be investigated at the behavioral level, independently of the actual implementation or detailed characteristics of physical faults. It abstracts and simplifies faults into stuck-at values affecting single components. Such an abstraction has been widely used in the testing of digital circuits and has proved sufficient to model a large number of physical faults. Some other faults/errors can even be considered for neurons, but they can mask each other, in the sense that it can be indistinguishable which fault occurred, for instance a fault in the synaptic operation itself (multiplication) instead of a fault in the weight storage. Considerations on physically realistic fault models for analog VLSI neural networks are also needed [71]–[73].

Among the most important works reported in the literature regarding fault models in neural networks, mostly feedforward multilayer networks, are the following.

Sequin and Clay [68] used a bottom-up approach to categorize the types of faults that usually might occur in neural networks, looking at the main components that comprise a network and focusing on fault cases that yield the worst effect on the overall performance of the network. In their modeling, the authors distinguished three types of units, input, output and
hidden units, all of which can fail and potentially impact the network operation differently because of their location. They focused on the following types of faults: i) missing hidden units stuck at an intermediate output value, regardless of their inputs, not delivering an effective signal, ii) saturated hidden units stuck at an extreme value, iii) missing weights, so called disabled, which do not transmit any signal, and iv) saturated weights, where the weight is driven to the maximum or minimum value of its allowed range.

Bolt [52] introduced a method to develop fault models for neural networks at the abstract level, considering the fault location and then defining the fault characteristics, by enumerating the manifestations of a fault to be such that the maximum harm is caused to the neural network. A fault model for the multi-layer perceptron was developed, at a high level of abstraction, thus allowing its inherent fault tolerance to be estimated. Two types of fault components were identified: stable entities, whose associated information does not change at any time, such as weights and activation functions, and temporary entities, whose information is only valid for a limited period of time, such as outputs and activation values.

Chandra and Singh [8] investigated pre-trained feedforward neural networks and proposed a framework of study for neural fault tolerance. Particularly, they proposed fault models and fault/error measures to quantitatively assess fault tolerance in such feedforward networks. According to their proposal, fault tolerance can be divided into three separate sets of categories: i) tolerance to faults/errors in the learning rule, ii) tolerance to faults/errors outside the neural network structure (incorrectness in the inputs due to noise), and iii) tolerance to faults/errors inside the neural network (structural faults). Specific faults were defined according to such proposed categories.

C. FAULT INJECTION AND MEASURES
For fault tolerance assessment, a fault injection method is required for gaining insights into the behavior of a system [74]. Especially, its critical components might be identified, which can then be protected against possible faults targeted to a specific physical implementation. Faults are probabilistically introduced into a neural model, and the degree of failure, i.e., the impact on the performed neural computation, is evaluated according to some measures. Figure 4 shows an example of a feedforward neural network and the corresponding derived networks when a faulty synapse/neuron is considered in the connection graph.

FIGURE 4. a) A feedforward neural network, b) A faulty synapse between node 5 and 8, c) Network considering that node 5 is faulty.

The measure of fault tolerance from many experiments can be evaluated against the number of considered faults injected into the neural model. The limit of the fault tolerance of the network, assessed in this way, is problem-dependent and is determined by operating scenarios of multiple faults that would lead to a violation of the performance constraints. With known failure rates and faults occurring at random locations, these worst-case scenarios can be used to estimate an upper bound for the fault tolerance of the neural network [67]. If a minimum number of faults, nfails, is established, it is necessary to prove that the network will perform well with nfails or fewer faults from a specified set.

For large neural networks, exhaustive testing of all possible single faults is prohibitive, not to mention that multiple faults might even occur concurrently. Hence, the strategy of randomly testing a small fraction of the total number of possible faults in a network has been adopted for tractability. It yields partial fault tolerance estimates that are statistically very close to those obtained by exhaustive testing. Moreover, when the fraction of faulty components tested is held fixed, the accuracy of the estimate generated by random testing is seen to improve as the network size grows [65].
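The following sketch illustrates this randomized strategy (ours; the network size, the stuck-at-0 single-weight fault model and the 10% sampling fraction are arbitrary choices for the example):

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(64, 16))
    W2 = rng.normal(size=(3, 64))

    def predict(X, fault=None):
        # Three-class classifier; `fault` = (layer, row, col) sets one weight to 0.
        A, B = W1.copy(), W2.copy()
        if fault is not None:
            layer, r, c = fault
            (A if layer == 0 else B)[r, c] = 0.0
        return np.argmax(np.tanh(X @ A.T) @ B.T, axis=1)

    X = rng.normal(size=(200, 16))
    y = predict(X)   # fault-free predictions serve as the baseline
    faults = [(0, r, c) for r, c in product(range(64), range(16))] \
           + [(1, r, c) for r, c in product(range(3), range(64))]
    sample = rng.choice(len(faults), size=len(faults) // 10, replace=False)
    rates = [np.mean(predict(X, faults[i]) != y) for i in sample]
    print(f"mean misclassification over sampled single faults: {np.mean(rates):.4f}")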
TABLE 1. Some typical measures used to assess performance and fault tolerance in neural networks.

Table 1 summarizes some general measures used to assess fault tolerance, which basically measure the performance distance (closeness) between a fault-free or baseline neural network and the derived faulty networks in classification tasks. The selection of measures is problem and neural model dependent, but they broadly fall into the two main categories of the neural paradigm: those requiring supervised and those requiring unsupervised learning. Chandra and Singh [8] suggested the use of the mean squared error (MSE) and the mean absolute percentage error (MAPE) to measure the effect of faults; particularly for classification problems, the percentage of misclassification is suggested. For other MSE-like and sensitivity related measures, see [37], [76], [77]. On the other hand, sensitivity measures the change in the output due to a change in the input or internal parameters [69]. The memory capacity has been used for evaluating fault tolerance in associative memories with faulty neurons [78].

For neural models for clustering tasks, silhouette statistics can be used as a fault tolerance measure, since ground truth labels are not known. They give a measure of the quality of the clusters obtained. Such a measure is defined for each sample and is composed of two scores: i) ai, the mean distance between a sample and all other points in the same cluster, and ii) bi, the mean distance between a sample and all other points in the next nearest cluster. The silhouette coefficient score is bounded between −1, for incorrect clustering, and +1, for highly dense clustering. Scores around zero indicate overlapping clusters.
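Concretely, the per-sample silhouette score combines these two distances as follows (the standard definition, stated here for reference):

si = (bi − ai) / max(ai, bi),  with −1 ≤ si ≤ +1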
As a matter of fact, such figures of merit measure two overlapping aspects: on one hand, how well the problem is solved, and on the other hand, the fault tolerance that the corresponding network would roughly provide. But those measures do not provide a more comprehensive fault tolerance assessment, such as, for example, tight bounds on the number of neurons that can fail without harming the result of a computation, in terms of weight and failure distribution [39]. This calls for new measures for a better understanding of the extent to which neural networks can be fault-tolerant.

IV. TAXONOMY OF FAULT TOLERANCE
Different starting points and criteria usually might lead to different taxonomies of fault tolerance. A general but widely adopted frame is to classify fault tolerance as passive or active, based on the principles and mechanisms that are exploited to achieve fault tolerance, as outlined in section II-D and particularly emphasized for fault tolerance in neural models in [67]. We follow this frame in reviewing the literature related to neural network fault tolerance, and we principally focus on methods and techniques to enhance fault tolerance passively. Nonetheless, other important works on fault tolerance in neural networks are also briefly referred to throughout this review. For instance, in the work reported in [79], an empirical study of the influence of the activation function on the fault tolerance properties of feedforward neural networks is presented, showing that the activation function has large relevance to the fault tolerance and generalization property of the network. Furthermore, for completeness, in section IV-B we briefly review some representative works on active fault tolerance in neural networks.

Before going into details, it is worth pointing out that the majority of reported works have focused on feedforward neural networks, and few attempts have been made to improve fault tolerance in some other neural models. In section IV-C, works that discuss and analyze the fault tolerance of non-feedforward neural networks will be briefly described, even though some works do not propose any specific technique for fault tolerance improvement. This issue is of importance since the studies and results in the literature concerned with fault tolerance in feedforward neural networks, despite their importance (e.g. for deep learning), are difficult to generalize and directly apply across other different neural models.

A. PASSIVE FAULT TOLERANCE
In the proposed taxonomy, as schematically shown in figure 5, the reviewed works are classified based on their main strategies to achieve or improve fault tolerance in the recall stage of neural networks without considering retraining, i.e., we mainly focus on passive fault tolerance. Since only passive fault tolerance is considered in depth herein, the main mechanisms to provide the needed redundancy or fault masking to enhance fault tolerance will be presented. Each technique is explained based on its characteristics, design objectives, and the fault types considered in the performed study.

Three main categories are identified in the passive approach, which group together related methods and techniques to enhance the intrinsic fault tolerance capabilities of neural networks: i) explicitly augmenting redundancy, ii) modifying learning/training algorithms, and iii) neural network optimization with constraints.

FIGURE 5. Taxonomy for techniques and methods to enhance fault tolerance in neural network models grouped into two main categories, passive and active fault tolerance. This survey mostly reviews research on the three subcategories of passive fault tolerance.

1) EXPLICITLY AUGMENTING REDUNDANCY

TABLE 2. Summary of some representative works for enhancing fault tolerance in neural models by explicitly adding redundancy in the network after training, with a representative example of a NN topology used in the experiments.
In the work reported in [81], networks were subjected, after training, to random connection cuts, as a physically plausible type of fault. Experiments, repeated several times and averaged, showed counterintuitively that fault tolerance does not improve as the number of hidden units increases, and that backpropagation training fails to exploit redundancy (additional hidden units). The authors proposed a mechanism called augmentation to improve fault tolerance, consisting in the replication of each hidden neuron and its associated connections. Since each node now has twice as many inputs as in the original network, the weights connecting the augmented network's hidden layer to the output layer must be half of those in the original network to maintain the same input-output mapping, as shown in the example in figure 6. Augmented networks showed better fault tolerance, and the inserted redundancy, the excess nodes, was verified by means of singular value decomposition.

FIGURE 6. a) A critical hidden neuron (7) in a feedforward neural network and b) Explicitly augmenting redundancy by duplicating neuron 7. The postsynaptic weights of neurons 7 and 7' are halved.
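The augmentation step itself is simple enough to state as code. The following sketch (ours, assuming a single hidden layer of tanh units) replicates one hidden neuron together with its incoming weights and halves the outgoing weights of both copies, leaving the input-output mapping unchanged:

    import numpy as np

    def augment(W_in, b, W_out, j):
        # Replicate hidden unit j and halve the outgoing weights of both copies.
        W_in2 = np.vstack([W_in, W_in[j:j+1]])        # duplicate incoming weights
        b2 = np.append(b, b[j])                       # duplicate its bias
        W_out2 = np.hstack([W_out, W_out[:, j:j+1]])  # duplicate outgoing weights
        W_out2[:, j] *= 0.5                           # original copy
        W_out2[:, -1] *= 0.5                          # new copy
        return W_in2, b2, W_out2

    rng = np.random.default_rng(0)
    W_in, b, W_out = rng.normal(size=(5, 3)), rng.normal(size=5), rng.normal(size=(2, 5))
    X = rng.normal(size=(10, 3))
    W_in2, b2, W_out2 = augment(W_in, b, W_out, j=2)
    y1 = np.tanh(X @ W_in.T + b) @ W_out.T
    y2 = np.tanh(X @ W_in2.T + b2) @ W_out2.T
    print(np.allclose(y1, y2))   # True: same mapping, but unit 2 is now redundant

Losing either copy of unit 2 now perturbs the output by only half of the unit's original contribution, which is precisely the masking effect that augmentation exploits.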
Chiu et al. [82] addressed fault tolerance of feedforward neural networks by measuring the sensitivity of links and nodes in the network output, and implemented a technique to ensure the design of networks that satisfy well-defined fault tolerance criteria. Their method takes as input a well-trained network and then follows two main steps: i) unimportant nodes in the hidden layers are removed, according to the sensitivity measure and a threshold, and ii) the pruned network is retrained and some redundant nodes are introduced into this network so as to share the task of the critical nodes (neurons with high sensitivity). Faults are injected as weight perturbations, and the sensitivity of links and nodes is evaluated in terms of the MSE. Two criteria for adding nodes are used: 1) adding extra nodes until the sensitivity of the current most critical node is less than some proportion of the sensitivity of the initial most critical node, and 2) adding extra nodes until the number of nodes is equal to the original number of nodes, in order to compare two networks of the same size. Weights in the augmented network are modified in a similar way as in [81]. The obtained results showed a considerable improvement in the fault tolerance (changes affecting weights) of networks trained for two multiclass classification problems.

Chiu et al. [83] extended their previous work [82] and, in this contribution, proposed three methods for improving the fault tolerance of feedforward neural networks under a hybrid approach, which involves both modifying training and using explicit redundancy. In the first method, weights are restricted to have low magnitudes during the backpropagation training, since fault tolerance is degraded by the use of high magnitude weights; at the same time, hidden nodes are added dynamically to the network to ensure that the desired performance can be obtained. The second method adds artificial faults to various components (nodes and links) of a network during training, since injecting a specific fault during training can produce a network that can tolerate that specific fault very well. Perturbation of weight values and stuck-at-zero faults were considered for synapses, and stuck-at-0/1 faults for nodes. The third method removes nodes that do not significantly affect the network output, and then adds new nodes that share the load of the most critical ones in the network. Note that the first two methods of this work can also be considered in the second category of the taxonomy for fault tolerance, as training was modified.

Phatak and Koren [65] studied fault tolerance in feedforward neural networks with a single hidden layer, considering permanent stuck-at type faults of single components. They proposed a method to synthesize fault tolerant neural networks by replication of the hidden units. The method exploits the computational characteristics of the intrinsic weighted summation performed by neurons. It starts with a near minimal network that learns the given input/output pattern mapping. The hidden neurons are replicated as a whole, and the inputs/biases of the output neurons are scaled down/up accordingly. There is no majority voter to explicitly mask out the faults. Compared to previous works that use stuck-at-0 permanent faults, herein the fault model was extended to allow permanent stuck-at-±W type faults on a single component (weight/bias). Analytical as well as extensive simulation results showed that a significant amount of application-dependent redundancy is needed to achieve complete fault tolerance, despite the somewhat simple restrictive assumption of single faults. Moreover, the authors pointed out, as a future extension, the inclusion of modifications of learning algorithms to find weights and biases that optimize fault tolerance, a promising alternative to be further explored.

Dias and Antunes [84] proposed a technique to improve fault tolerance by changing the architecture of feedforward neural networks after training, while keeping their input-output mapping unchanged. Following a similar approach to previous works in this category, this technique evaluates the elements of the network which are more sensitive to a fault and duplicates inputs, biases, weights or even neurons, according to the evaluation criteria. The fault model, used for weights and inputs, is the stuck-at model considering 0, min and max values. The proposed dividing technique diminishes the importance of a fault by splitting a potentially faulty synapse and dividing its original strength accordingly. A complete critical neuron can also be duplicated, including all of its
connections to the previous layer, while the connections coming from this neuron to the next layer will have half of their previous values in the unmodified network. Interestingly, the authors in a further work introduced the Fault Tolerance Simulation and Evaluation Tool, which evaluates and assists in improving fault tolerance for neural networks [85]. The tool is composed of three main sub-tools: the Insertion tool, to receive a neural network that was previously trained and prepared; the Evaluator, for evaluation of the fault tolerance; and the Improver, for improving the built-in fault tolerance in an integrated environment.

As a summary, methods for explicitly augmenting redundancy in the neural model can be effective, but they often result in large networks with too many hidden nodes and parameters. Thus, pruning, used to determine the relevance or contribution of hidden units and to identify excess units that might be removed to produce a reduced network, is of relevance in this approach. Even though these methods do not appear to be different from the conventional redundancy approach, such as triple modular redundancy schemes, they are different in one major respect. There is no majority voter to explicitly mask out the faults; instead, faults are masked by exploiting the intrinsic characteristics of neural networks, such as the weighted summation and the fact that the hidden-layer nodes operate close to their saturation points. However, most of the techniques in this category make a trained network fault tolerant by replication, similarly to the conventional approach for fault tolerance, whereas the main question for neural computing is about the inherent passive fault tolerance of neural networks, as discussed in [86].

2) MODIFYING TRAINING/LEARNING
These methods modify the conventional training/learning schemes used for neural network models in order to tolerate faults a posteriori, i.e., by explicitly targeting fault tolerance while training/learning to achieve the desired computational task. A summary of some representative works in this category is shown in table 3.

TABLE 3. Summary of some representative works for enhancing fault tolerance in neural models by modifying learning/training, with a representative example of a NN topology used in the experiments. BP stands for backpropagation, WP for weight perturbation, WC for weight constraint, PT for penalty term, and R for regularization.

According to [86], ANN models may be described by the conceptual relation between two main factors, as established in equation 2:

{ANN model} = {Architecture} + {Training/learning paradigm}    (2)

Following this conceptualization, two main subcategories can be identified in this category. On one hand, some works have focused on the training experience provided to the network, developing techniques to obtain fault-tolerant networks by adding noise or perturbations, or by direct fault injection during training. On the other hand, some other works focus on the learning rule, either by including a regularization/penalty term in the performance measure to be improved, so as to indirectly incorporate faults in conventional algorithms such as backpropagation, or by a major adaptation/modification of learning algorithms so as to, for instance, search for weight values that are more equally distributed and avoid saliency.

In the first subcategory, Sequin and Clay [68] showed that fault tolerance can be improved by a suitable training process, where a feedforward neural network is presented with representative faults so as to learn an internally redundant distributed representation. They modified the training procedure such that temporary random faulty hidden units
could be injected. For each pattern presentation, from one to three hidden neurons were randomly selected to be faulty. The resulting internal representation assumes a more spread-out and distributed form that is also more tolerant to faults. From experiments, both for classification and approximation tasks, it was observed that training with only single faults can lead to fault tolerance against multiple faults as well. Another key remark is that the weight values induce a sharpening of the transition regions of the sigmoids and thus produce more extreme binary output signals. However, for analog approximation tasks it is more difficult to mask the effect of faults.
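In retrospect, this training-time fault injection is close in spirit to what is now called dropout. A minimal sketch of the idea (ours; the tiny network, the task and the fault counts are illustrative assumptions) disables one to three random hidden units at each pattern presentation:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = 0.5 * rng.normal(size=(16, 2)), np.zeros(16)
    W2, b2 = 0.5 * rng.normal(size=(1, 16)), np.zeros(1)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)   # XOR-like task
    lr = 0.1

    for epoch in range(200):
        for i in range(len(X)):
            x, t = X[i:i+1], y[i:i+1]
            faulty = rng.choice(16, size=rng.integers(1, 4), replace=False)
            h = np.tanh(x @ W1.T + b1)
            h[:, faulty] = 0.0                        # inject stuck-at-0 hidden units
            out = 1.0 / (1.0 + np.exp(-(h @ W2.T + b2)))
            d_out = out - t                           # cross-entropy output gradient
            d_h = (d_out @ W2) * (1.0 - h ** 2)
            d_h[:, faulty] = 0.0                      # faulty units receive no update
            W2 -= lr * d_out.T @ h
            b2 -= lr * d_out.ravel()
            W1 -= lr * d_h.T @ x
            b1 -= lr * d_h.ravel()

    h = np.tanh(X @ W1.T + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2.T + b2)))
    print("fault-free training accuracy:", np.mean((p > 0.5) == y))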
Similarly, Arad and El-Amawy [87] presented an algorithm, derived from the backpropagation algorithm, with built-in measures to promote fault tolerance during training. They demonstrated that feedforward neural networks are able to tolerate any combination of two faulty hidden units, even with mixed fault types. They considered a pattern presentation as the execution of the forward pass of the backpropagation algorithm for a particular pattern, assuming a certain number of faulty hidden neurons with a relatively higher probability. A comprehensive presentation (CP) of pattern p is defined to be the execution of all desired presentations of the pattern. By varying the CP parameters, the network can be trained to exhibit different fault tolerance degrees and varying learning efficiencies. Each hidden-layer neuron can be assumed faulty with a relatively higher probability in each iteration. They argue that the ability to tolerate various fault types can be associated with an increase in the size of the training set, due to the larger number of fault types considered. Moreover, they confirmed that using critical fault types, stuck at the extreme values of the activation function, results in fault tolerance against any single faulty neuron stuck at any value which lies between the two extreme values.
Most works in the second subcategory, in order to cope with faults at the recall phase, add a regularization/penalty term to the training cost function so as to bias the solution toward fault-tolerant networks. Commonly, well-known learning algorithms, such as backpropagation, are modified by introducing such terms in the error function to promote uniform information distribution. Regularization is an essential technique that has proved useful to improve the generalization ability of neural networks [96]–[98]; by imposing smoothness constraints on the estimated function, small changes in inputs or parameters produce small changes in the computed outputs. Under this approach, there are two main terms in the objective function, as shown in equation 3:

E + λJ (3)

The first term E is the standard error term used in learning algorithms such as backpropagation. The second term J is the penalty function, which takes into account the errors that arise due to faults, and λ is the regularization parameter that controls the compromise between the degree of fault tolerance and accuracy. When E + λJ is minimized, the error between the target outputs and the faulty network outputs is reduced as well.
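To make equation 3 concrete, the sketch below (our illustration, not the algorithm of any single reviewed work) augments a standard mean squared error E with a penalty J estimated as the average error under single stuck-at-zero hidden-node faults; the reviewed works define J in various other ways.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the E + lambda*J objective of equation 3.
# The penalty J used here (mean error over single stuck-at-zero hidden
# nodes) is an assumption; the reviewed works define J in several ways.

class MLP(nn.Module):
    def __init__(self, n_in=8, n_hidden=16, n_out=1):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x, faulty_units=()):
        h = torch.sigmoid(self.hidden(x))
        if len(faulty_units) > 0:            # stuck-at-zero node faults
            h = h.clone()
            h[:, list(faulty_units)] = 0.0
        return self.out(h)

def fault_tolerant_loss(model, x, y, lam=0.1, n_hidden=16):
    mse = nn.MSELoss()
    E = mse(model(x), y)                     # standard error term
    J = sum(mse(model(x, faulty_units=[i]), y)
            for i in range(n_hidden)) / n_hidden
    return E + lam * J                       # equation 3
```

A gradient step on this objective simultaneously reduces the fault-free error and the average error of the single-fault networks; λ trades accuracy against fault tolerance, as discussed above.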
Wei et al. [88] proposed a modified learning algorithm to improve the fault tolerance of backpropagation (BP) networks. Since some weights influence the outputs more than others, they evaluate whether information is evenly distributed across the weights that lie between the same pair of successive layers: if a weight has a relatively high influence degree, its magnitude is temporarily constrained not to rise. Two types of faults were considered: i) node faults, stuck-at the node's extreme values, and ii) connection faults, which set the relevant weights to zero.

Edwards and Murray [89] proposed a method for enhancing fault tolerance via penalty terms, which are incorporated into the learning rule to optimize the networks for smoothness of the solution, towards low average weight saliency and optimally distributed computation. Such a method focuses on fault tolerance against small weight perturbations. Two penalty terms are introduced: a first-order term intuitively linked to the use of weight noise, particularly multiplicative noise, and a second-order term derived from the well-established statistical theory of smoothing splines. Neural networks were trained using a simple steepest descent algorithm with an incorporated line search technique to optimize the step size in multilayer perceptron networks. Fault tolerance was assessed using the average value of the error Hessian, as this value has been shown to be directly related to the inverse of the fault tolerance of a given network.

Hammadi et al. [90] proposed a constructive algorithm for fault tolerant feedforward neural networks, which starts with a single hidden neuron and incrementally adds neurons whenever the network fails to converge. The baseline algorithm is modified to update any weight whose relevance is less than a given threshold, and the weights are updated using the backpropagation algorithm. The relevance of a given weight is defined as the maximum error caused at the primary output by the stuck-at fault of this weight. The algorithm consists of three main stages: training a normal network using backpropagation, training of candidates, where only input-to-candidate and candidate-to-output connections are trained, and neuron addition. This process is repeated until the convergence criterion is satisfied or the maximum network size is reached. The main fault type considered was the loss of a connection between two neurons (open fault). The fault tolerance metric used was the percentage of recognized patterns as a function of the percentage of faulty weights in the network.
The constructive algorithm was based on a previous work [99], which proposed a learning method to enhance the fault tolerance ability that uses the Taylor expansion of the output around the fault-free weights to estimate the relevance of the weights to the output error. In this algorithm, the weight that produces the maximum relevance is decreased.
Cavalieri and Mirabella [91] proposed an algorithm that updates synaptic weights so as to distribute their absolute values as uniformly as possible in a multilayer perceptron with sigmoidal activation functions, based on the observation that a fault in large weights is critical for the fault tolerance of the network as a whole, particularly for weights in the output layer. The modified backpropagation algorithm updates each weight only as long as the new weight does not exceed a given threshold, which in turn is updated dynamically during the learning phases based on the current weight values. The basic principle of inhibiting large absolute weight values comes at the cost of a larger network training convergence time. Two kinds of faults were considered, stuck-at-0 and stuck-at-1, and fault tolerance was assessed when multiple faults occur.
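The weight-limiting principle can be sketched as follows; the dynamic threshold rule used here, a multiple of the current mean absolute weight, is our own illustrative assumption rather than the exact update rule of [91].

```python
import torch

def bounded_update(weights, grads, lr=0.01, k=2.0):
    # Standard backpropagation step, but an update is rejected whenever
    # it would push a weight's magnitude beyond a dynamic threshold
    # derived from the current weights (here: k * mean absolute weight).
    threshold = k * weights.abs().mean()
    proposed = weights - lr * grads
    accept = proposed.abs() <= threshold
    return torch.where(accept, proposed, weights)
```

Inhibiting large weights in this way spreads the computation over more connections, which is precisely what slows convergence, as noted above.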
Bernier et al. [100] and Bernier et al. [37] presented an algorithm that tries to maximize fault tolerance in a given network. Such an algorithm explicitly adds to the backpropagation learning rule a new term related to the mean square error degradation in the presence of weight deviations. The authors presented a quantitative measure to evaluate the fault tolerance and the noise immunity of a multilayer perceptron. This measure, termed mean squared sensitivity, was derived from an explicit relation between the mean squared error degradation of the multilayer perceptron in the presence of perturbations and the statistical sensitivity. This new term can be considered as a stabilizer that tends to smooth the squared error surface with respect to the weight values in order to obtain configurations that are stable against perturbations of their values. The proposed algorithm showed better performance with respect to fault tolerance, and similar performance with respect to learning abilities, compared to the conventional backpropagation algorithm.
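Although [37] and [100] derive the mean squared sensitivity analytically, a simple Monte Carlo proxy of the same quantity, the expected MSE degradation under small multiplicative weight perturbations, can be sketched as follows (ours, for illustration only; the sampling-based estimate is an assumption, not the authors' closed-form measure).

```python
import copy
import torch

def mse_degradation(model, x, y, sigma=0.05, n_trials=100):
    # Empirical estimate of the MSE degradation of `model` when every
    # parameter is perturbed multiplicatively by N(0, sigma^2) noise.
    mse = torch.nn.MSELoss()
    base = mse(model(x), y).item()
    total = 0.0
    for _ in range(n_trials):
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            for p in noisy.parameters():
                p.mul_(1.0 + sigma * torch.randn_like(p))
        total += mse(noisy(x), y).item() - base
    return total / n_trials
```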
Zhou et al. [92] introduced a method called T3 (Test-Train-Test) in order to exploit the fact that the performance of trained neural networks does not decrease linearly with the increasing severity of faults, characterized by a fault rate. The proposed method uses a multi-node open fault model where several faulty hidden nodes are considered concurrently. T3 utilizes a validation set to build the fault curve of a trained network; then it heuristically locates the inflection point of the fault curve and repeatedly trains the network according to the corresponding fault rate, so that spatial redundancy is added to the network in a proper manner. Eventually, the function of the faulty nodes is undertaken by the additionally appended nodes; both the number and the function of the appended nodes differ from those of the faulty nodes. The T3 algorithm was only applied to some feedforward neural networks whose hidden nodes are dynamically appended during training.
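Building such a fault curve amounts to measuring validation error while an increasing fraction of hidden nodes is opened; a minimal sketch is given below (it reuses the illustrative MLP with a faulty_units argument defined earlier, and is not the T3 implementation itself).

```python
import random
import torch

def fault_curve(model, x_val, y_val, n_hidden, rates, n_trials=20):
    # For each fault rate, average the validation error over random
    # multi-node open-fault configurations (faulty nodes output zero).
    curve = []
    for rate in rates:                       # e.g., [0.0, 0.1, ..., 0.9]
        n_faulty = int(rate * n_hidden)
        errs = []
        for _ in range(n_trials):
            faulty = random.sample(range(n_hidden), n_faulty)
            out = model(x_val, faulty_units=faulty)
            errs.append(torch.nn.functional.mse_loss(out, y_val).item())
        curve.append(sum(errs) / n_trials)
    return curve
```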
Simon [93] modified the recursive training algorithm [101] for the optimal interpolative classification network to include distributed fault tolerance against small weight perturbations from their trained values. Recall that the optimal interpolative network is a three-layer classification network that grows only as many middle-layer neurons as necessary to correctly classify the training set [102]. The proposed algorithm attempts to distribute the weights evenly throughout the network to achieve fault tolerance, in such a way that the sums of all rows of the weight matrix are equal and the sums of all columns of the weight matrix are also equal. The proposed technique can be viewed as a special-purpose regularization algorithm, as it imposes some structure on the network weights, specifically designed to minimize the effect of particular types of faults.

Xiao et al. [94] studied the performance of faulty radial basis function (RBF) networks considering a general node fault model, which includes stuck-at-zero, stuck-at-one, and stuck-at an arbitrary level, with arbitrary distribution. The authors derived an expression that describes the performance of faulty RBF networks and identifies an objective function. With this function, a training algorithm for the general node fault situation was developed. A mean prediction error (MPE) measure that is able to estimate the test set error of faulty networks is derived. As the previous works focused on feedforward networks, it is not straightforward to compare the results for these neural network models. In an extension of this work, [95] studied how the open weight fault and the multiplicative weight noise degrade the performance of RBF networks, and then developed two learning algorithms, one batch mode and one online mode. The first one produces the optimal weight vector with respect to the average training set error of faulty networks. The online mode follows a cyclic learning scheme: in each training cycle, an example is learned exactly once according to a fixed order. From the experiments it was found that when the number of RBF nodes increases, the fault-tolerant ability can be further improved.

As a summary, methods and techniques that modify training/learning algorithms often significantly increase the computational cost and slow down the convergence of training/learning, but the performance/fault-tolerance tradeoff is solved during this phase without any external interaction afterwards. They try to avoid the emergence of key or critical neural elements, i.e., synapses/neurons that, when faulty, have a great impact on the function of the network; in other words, they aim to distribute information evenly among the network weights. Learning can induce intrinsic fault/error masking ability by forcing neurons to work towards the saturation regions of the nonlinear activation functions, so that even a large variation of the weighted summation affects the neuron output only marginally [105]. Generally, fault tolerant neural networks generated by these methods appear to exhibit better generalization than unconstrained solutions, and it also appears that enforcing uniformity in the network is similar to making all hidden units equally relevant in the network.
TABLE 4. Summary of some representative works for enhancing fault tolerance in neural models posed as a constrained optimization problem, with a representative example of a NN topology used in the experiments.
3) OPTIMIZATION UNDER CONSTRAINTS
In this approach, the training/learning process and fault tolerance are transformed into an optimization problem solved by nonlinear optimization algorithms in order to find the neural network topology and its parameters that perform a given task and fulfill fault tolerance constraints as well [104]. Table 4 summarizes some representative works that fall in this category. The fault tolerance constraints in the optimization approach can be interpreted as imposing regularity conditions on the function estimated by the neural network with respect to the weight values.
Usually, the fault tolerance problem in this category has been formulated as a constrained minimax optimization problem where the goal is to minimize the maximum deviation from the desired output for each input in the presence of single unit faults in the neural network model:

min_W max_{i ∈ V_h} E(W^i) (4)

subject to the following constraints:

d^l − y^l = 0, ∀ l = 1, . . . , p (5)

Here the term E(W^i) in equation 4 represents the error in the network output when a hidden node i (taken from the set of hidden nodes V_h) is removed, as graphically shown in figure 4. The goal is to find a weight configuration that minimizes E(W^i) for all nodes; minimization of the maximum E(W^i) implies minimization of each of the E(W^i). The performance constraints, as indicated in equation 5, capture the requirement that when all the nodes in the network are functional, for each input x^l to the network the output y^l must equal the corresponding desired output d^l. The objective is to determine a weight matrix such that the network not only classifies the patterns or performs the computational task as desired, but is also maximally fault tolerant.
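In modern automatic-differentiation frameworks, the max in equation 4 can be handled directly through subgradients, as the following sketch shows (ours, assuming a model with a faulty_units argument like the earlier MLP sketch; the hard equality constraints of equation 5 are relaxed here into a penalty on the fault-free error, which is an assumption rather than the formulation used in the original works).

```python
import torch

def minimax_fault_loss(model, x, y, n_hidden, alpha=100.0):
    # E(W^i): output error with hidden node i removed (equation 4);
    # the fault-free fit required by equation 5 enters as a penalty.
    mse = torch.nn.MSELoss()
    errs = torch.stack([mse(model(x, faulty_units=[i]), y)
                        for i in range(n_hidden)])
    return errs.max() + alpha * mse(model(x), y)
```

Classic works predate such tooling, which explains the dedicated optimization machinery discussed next.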
The main difficulty with a minimax optimization problem is that the objective function is in general nondifferentiable; hence well-known gradient-based methods cannot be used to solve such problems. Fault tolerance posed as an optimization problem does not explicitly add or replicate any spatial redundancy in the network, nor does it involve the modification of standard training algorithms.
Neti et al. [66] proposed the concept of maximally fault tolerant feedforward neural networks, where the determination of weights was formulated as a nonlinear optimization problem. The authors provided a first formalization of the concept of epsilon-fault tolerant neural network, as introduced in section III. Their method selects the weights that perform the required computation and have the additional property that whenever any single hidden unit is deleted, the faulty network continues to perform the computation satisfactorily. Pattern recognition examples were analyzed, showing that uniformly fault tolerant solutions exist in a network with a single hidden layer. Uniformity of fault tolerance is a measure of the extent to which the computation performed is evenly distributed through the neurons in the hidden layer. To maximize the number of different single hidden units while finding the weights that minimize the error, a successive quadratic programming algorithm was used in this work to calculate the weights.

Deodhare et al. [103] presented a technique for generating feedforward networks tolerant to the loss of a node and its associated weights. The problem was also formulated as a minimax optimization problem and two different solutions were addressed: i) the optimization is converted to a sequence of unconstrained least-squares optimization problems, whose solutions converge to the solution of the original problem; then a gradient-based minimization technique is applied to the unconstrained minimization; ii) the problem is converted to a single unconstrained problem equivalent to the original one. The methods proposed here lead to networks that exhibit a partial degree of fault tolerance. However, the authors claim that those methods might be extended to ensure complete fault tolerance. Both in terms of time and space, achieving fault tolerance in this approach is significantly more expensive than with the previous methods. The authors argue that such a problem mainly arises from the choice of conventional optimization algorithms to perform the minimization; thus, more advanced methods should be further explored.

Considering that genetic algorithms are a powerful tool for optimization, some works have employed them to search for a solution to fault tolerance. Zhou and Chen [104] employed a genetic algorithm to improve the tolerance of feedforward neural networks against an open fault, where a hidden node and its associated weights are considered to be faulty. The proposed method does not explicitly add any redundancy to the network, as other works in this category do, nor does it modify the conventional training algorithm. The proposed method follows the key idea of genetic algorithms: maintain a population of neural networks, then use some fault-tolerance measures as fitness to promote the population to evolve good fault tolerance. Experiments show that the proposed method improves fault tolerance as well as the generalization ability of neural networks.
TABLE 5. Summary of some representative works that address active fault tolerance of neural networks.
Similarly, the approaches proposed in [111] and [112] use genetic algorithms as the optimization method of choice for obtaining fault tolerant multilayer neural networks. However, in the first work, a fault tolerant multi-layer neural network employing both hardware redundancy and weight retraining, in order to realize self-recovering neural networks, is proposed, i.e., it provides active fault-tolerance.
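The evolutionary loop of such approaches can be sketched as follows (a rough illustration under our own simplifying assumptions: mutation-only reproduction, networks with a faulty_units argument as in the earlier MLP sketch, and a fitness equal to the average error under single open node faults, which differs from the exact operators and fitness of [104], [111], [112]).

```python
import copy
import random
import torch

def ga_step(population, x, y, n_hidden, n_keep=5, noise=0.02):
    # One generation: rank networks by average error under single open
    # node faults (lower is fitter), keep the best, and refill the
    # population with mutated copies of the survivors.
    def fitness(net):
        errs = [torch.nn.functional.mse_loss(
                    net(x, faulty_units=[i]), y).item()
                for i in range(n_hidden)]
        return sum(errs) / n_hidden
    population.sort(key=fitness)
    survivors = population[:n_keep]
    children = []
    for parent in survivors:
        child = copy.deepcopy(parent)
        with torch.no_grad():
            for p in child.parameters():
                p.add_(noise * torch.randn_like(p))
        children.append(child)
    return survivors + children
```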
B. ACTIVE FAULT TOLERANCE
In this section, we present a brief review of some representative works that propose methods/techniques to achieve active fault tolerance in neural models when targeted to physical hardware implementations. Such works are summarized in table 5. Under this approach, low-latency fault detection and recovery techniques are required to ensure that a neural network is reset into a fault-free and consistent state after a fault has occurred and propagated.
Khunasaraphan et al. [106] introduced a self-recovery mechanism called weight shifting, applied to feedforward neural networks, and outlined a hardware architecture for implementing this technique. Once a link or a neuron is detected to be faulty, weight shifting is invoked. If some links of a given neuron are faulty, their weights are shifted to other fault-free links of the same neuron. In the case of a completely faulty neuron, all the output links of that neuron are considered to be faulty. The fault detection circuit for links/neurons was treated as a black box in this work.
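The core of the recovery step can be sketched as below; the equal-spread redistribution rule used here is a simplification of our own, not the exact shifting scheme of [106].

```python
import torch

def shift_weights(w_row, faulty_links):
    # w_row: 1-D tensor with one neuron's input weights. Faulty links
    # are opened (set to zero) and their total weight is spread equally
    # over the remaining fault-free links of the same neuron, roughly
    # compensating the lost contribution (assumes >= 1 healthy link).
    faulty = list(faulty_links)
    healthy = [i for i in range(len(w_row)) if i not in set(faulty)]
    shifted = w_row.clone()
    lost = shifted[faulty].sum()
    shifted[faulty] = 0.0
    shifted[healthy] += lost / len(healthy)
    return shifted
```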
In a further work [107], the authors applied weight shifting to recover self-organizing maps after faulty links/neurons occur during operation, without retraining or hardware repair. The fault detection or self-testing technique is not clearly described, but it is suggested that information coding techniques can be employed for such a purpose. Results were presented and validated for a 2 × 2 self-organizing map.
To address fault detection, Tanprasert et al. [113] presented a technique for detecting the faulty links and determining the faulty weights in single-output two-layered feedforward neural networks by using a set of predefined probing vectors as inputs.
Hashmi et al. [108] described a biologically plausible computational model of cortical perceptual maps and highlighted its inherent fault tolerance. They demonstrated that such cortical maps can tolerate some failure modes that can occur in commodity GPGPU systems. The model's software implementation can intrinsically preserve its functionality in the presence of faulty hardware, considering a stuck-at fault model, without requiring any reprogramming or recompilation. Periodic retraining of the application is needed to adapt to the new configuration, but without explicitly specifying that configuration; the learning process will automatically adjust to the faulty hardware. Fault injection experiments validated that such systems are inherently far more tolerant to permanent faults than conventional ones and that they can be applied for the robust implementation of tasks on future computing systems built of faulty components.

Deng et al. [109] studied the impacts of timing errors in hardware neural networks, feedforward multilayer models, to suit the de facto distribution of timing variation in each individual chip. They proposed a timing variation-aware retraining method, thereby mitigating the negative effects of timing violations through the intrinsic resilience of neural networks. Once the accumulated delay of all gates and wires along a path exceeds the specified clock cycle, a timing violation occurs. Specifically, when timing errors significantly affect the output results, they retrain the neural accelerators to change their weights, in this way circumventing excessive timing errors. Experimental results show that timing errors in neural accelerators can be well tolerated for different applications.

Finally, Naeem et al. [110] demonstrated that a network model of spiking neurons can self-repair in the presence of a uniform and significant (up to 80%) fault distribution. Recall that neurons of the central nervous system interact primarily with action potentials or spikes, which are stereotyped electrical impulses [120]. In this model, faults manifest themselves as silent or near-silent neurons because of a sudden drop in the probability of release (PR) at synaptic sites. The enhancement of the PR associated with non-faulty or healthy synapses, by the indirect retrograde signal, is a key step in the repair process. The authors hypothesize that this repair strategy is effective for a nonuniform fault distribution because the proposed repair mechanism relies on the level of neural activity within the network being sufficient to maintain calcium oscillations across all astrocytes. The authors point out that moving toward a more astrocentric computing paradigm that captures the self-repairing capability of the brain will open up an entirely new generation of brain-inspired autonomous computing systems.
C. BEYOND FEEDFORWARD NEURAL NETWORKS
As previously pointed out, most of the reviewed works have focused on feedforward neural networks, and few attempts have been made to analyze other neural models for fault tolerance, such as recurrent networks, RBF networks, associative memories, or self-organizing maps (SOM). This section provides a brief review of some such works, summarized in table 6, even though these do not propose any technique to improve fault tolerance in those models.
Protzel et al. [67] investigated the fault tolerance of continuous-time recurrent neural networks aimed at solving optimization problems. Networks were subjected to up to 13 simultaneous stuck-at faults for sizes up to 900 neurons. Fault locations were randomly selected, but no two stuck-at-1 faults were allowed to occur within the same row or column, as this automatically precludes a valid solution. In their study, mixed stuck-at faults were not considered, so as to distinguish and compare the effect of a different fault type in the same locations. They defined a conditional performance measure by viewing the faults as constraints to the problem. According to the results, optimization networks exhibit partial fault tolerance, which is achieved without the explicit use of redundant components.
Nijhuis and Spaaenenburg [9] studied the fault tolerance of neural associative memories using the Hopfield model under stuck-at-0 and stuck-at-1 faults in the neuron output, broken connections, and weight deviations from their nominal values. The fault tolerance was considered to be the probability that the network will still function if there are x faulty neurons and y faulty connections. Results showed that the degree of fault tolerance in such models strongly depends on the assumed physical faults and the nature of the stored information: the number of stored patterns, their correlation, and the desired radius of attraction. For small fault rates (below 20% of the total number of connections) such neural models proved to be less vulnerable to broken connections, whereas for large fault rates deviations in the connection strengths have less influence on the functioning of the network.
Leung et al. [114] studied the effect of multiplicative noise and open weight faults on the performance of bidirectional associative memories (BAM). Recall that a BAM is a two-layer heteroassociator that stores a prescribed set of bipolar pattern pairs, namely library pairs. They studied how many pattern pairs can be stored in a faulty BAM even when there are some errors in the initial stimulus patterns. They established some boundaries for the degradation factor in the memory capacity, and margins on noise providing a chance to recall the desired library pair.

Leduc-Primeau et al. [115] studied fault-tolerant associative memories based on c-partite graphs. By analytical and simulation results they show that these associative memories can be made resilient to faults by modifying the retrieval algorithm. Faults were grouped into those affecting the representation of the graph's adjacency relationships and those affecting the state of the retrieval algorithm. For a case study, the memory retains 88% of its efficiency when 1% of the storage cells are faulty, or 98% when 0.1% of the binary outputs of the retrieval algorithm are faulty.

Parra and Català [116] presented a sensitivity analysis for determining the most critical neural elements in an RBF network. Parametric faults in neural elements such as weights and biases, which involve a tolerance parameter related to multiplicative and additive noise in weights, were considered. The RMSE was used to calculate the approximation quality and as a measure of fault tolerance. The RBF networks and their topologies consisted of one hidden layer with Gaussian functions and one output layer with linear weighted addition. They concluded that the larger the weights, the worse the fault tolerance. Eickhoff and Rückert [117] studied the robustness of RBF networks in noisy and unreliable environments. If the network parameters are constrained, upper bounds on the MSE can be determined under noise-contaminated parameters and inputs. A technique to identify highly sensitive neurons, so that methods to increase reliability or to reduce noise can be applied to them, was also evaluated.

Sum et al. [121] studied two different node-fault-injection-based on-line learning algorithms: (1) injecting multinode faults during training and (2) weight decay combined with injecting multinode faults. By fault injection, either a fault or noise is introduced to the network before each step of training. Proofs of the convergence of the two node-fault-injection-based on-line training RBF methods have been given, and their corresponding objective functions were deduced. Some other fault tolerant learning methods for RBF networks have been proposed, such as those reported in [94], [95], and [121]–[124].
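A single step of such fault-injection-based training can be sketched as follows (an illustrative reading of the general idea, using the earlier MLP sketch's faulty_units argument, a zeroed-node fault model, and plain weight decay; the precise injection statistics and decay schedule of [121] are not reproduced).

```python
import random
import torch

def fault_injection_step(model, opt, x, y, n_hidden,
                         fault_rate=0.2, weight_decay=1e-4):
    # Before each training step, force a random subset of hidden nodes
    # to zero (multinode fault) and add weight decay to the loss.
    n_faulty = max(1, int(fault_rate * n_hidden))
    faulty = random.sample(range(n_hidden), n_faulty)
    loss = torch.nn.functional.mse_loss(model(x, faulty_units=faulty), y)
    loss = loss + weight_decay * sum((p ** 2).sum()
                                     for p in model.parameters())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```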
Yasunaga et al. [118] studied the fault-tolerant capability of SOMs in the presence of defective neurons under stuck-at faults.
It was shown that defective SOMs can eventually organize themselves for fault tolerance if the defective neuron stuck-output is larger than a critical stuck-output. Only a linear array was analyzed and discussed, where defective neurons were concentrated in one place in the array, forming what is called a defective-neuron cluster. In the experiments, 100 neurons, including six defective ones, were used. Talumassawatdi and Lursinsap [119] addressed fault tolerance improvement in SOMs by using a technique of fault immunization of the synaptic connections. Stuck-at-a faults, where a is a real value, were considered. Only one neuron can be faulty at any time, but no restriction on the number of faulty links of the neuron is assumed. Weights are immunized by adding a constant value that is increased or decreased as much as possible without creating any misclassification. Fault immunization is formulated as an optimization problem on finding the corresponding constant value for each neuron.
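The underlying search can be illustrated with a naive sketch (ours): for one neuron, grow the immunization constant until the first misclassification appears. The actual optimization formulation of [119] is more elaborate, and the classify callback below is a hypothetical placeholder.

```python
import torch

def immunization_constant(classify, weights, x_train, y_train,
                          step=0.01, max_c=10.0):
    # classify(weights, x) -> predicted labels (hypothetical helper).
    # Returns the largest constant c such that adding c to the neuron's
    # weights still reproduces the reference labels y_train.
    best, c = 0.0, step
    while c <= max_c:
        if not torch.equal(classify(weights + c, x_train), y_train):
            break
        best, c = c, c + step
    return best
```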
V. OPEN CHALLENGES
Research directions that merit further exploration to enhance fault tolerance in neural computing need to be pointed out. Some of these open issues (certainly not an exhaustive list) include the following:
1) Novel fault models: more realistic models need to be developed based on a deep understanding of modern fabrication technologies. Moreover, due to the variety of failure modes and the high level of interplay involved, combinatorial strategies may be needed to obtain consistent, high quality fault tolerant neural mechanisms.
2) Fault tolerance at the architecture and application level: efficient large scale fault tolerance mechanisms need to be designed while also leveraging the intrinsic characteristics of the underlying neural models. The scalability of fault tolerance is currently limited to small problem sizes, but neural networks are rapidly growing in size and complexity for emerging machine learning applications.
3) Fault tolerance across different neural models: the applicability of fault tolerance enhancement techniques is ad hoc to specific neural model topologies, particularly for feedforward neural networks; ensuring different fault tolerance techniques operating at different levels is a major research challenge.
4) Integration and coordination between different approaches need to be promoted. Even if a neural network has some inherent fault tolerance, additional mechanisms to enhance it should be incorporated for a specific implementation medium.
5) Interdisciplinary interaction: reinforce potential interactions between neuroscience, computational neuroscience, and neural networks so as to look for biologically plausible fault tolerant mechanisms that allow exploring active fault tolerance principles and self-repairing mechanisms.

VI. CONCLUSIONS
The connectionist and distributed nature of neural computing potentially leads to graceful degradation, as exhibited by most neural network models, i.e., neural networks will not suffer catastrophic failure, but any fault will influence the output to some degree, since all components take part in the computational task. Considering neurons and synapses as physical entities that can fail independently is key in a truly distributed and scalable computing model with biological plausibility. Fault tolerance is not inherent within neural networks, and is far from being complete; it does need to be specifically designed and built into the models. Approaching failure in neural network models can be detected by using a continuous measure. The additional computational complexity that arises in fault tolerance enhancement techniques is absent in standard neural network design.

In this paper, we presented a review of the main passive techniques used for improving the fault tolerance of neural networks, mainly for small size feedforward multilayer models, that exploit redundancy and fault masking. However, the obtained results for feedforward multilayer neural networks are currently both of relevance and a great opportunity for further exploration, since such networks are the quintessential elements of deep learning models, which have shown state-of-the-art performance on real-world artificial intelligence applications. The reviewed works have been categorized into three main categories based on key parameters and mechanisms to highlight their similarities and differences in pursuing fault tolerance in neural networks. The key role of fault tolerance in future computing systems has been highlighted by an important body of work. From a pragmatic point of view, the potential fault-tolerant property of neural models will be crucial to the success of attempts to integrate large scale neural models onto silicon, e.g., neural hardware accelerators, when problems of yield become unavoidable.

REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, ''Deep learning,'' Nature, vol. 521, pp. 436–444, May 2015.
[2] C. Alippi, ''Selecting accurate, robust, and minimal feedforward neural networks,'' IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 49, no. 12, pp. 1799–1810, Dec. 2002.
[3] H. R. Mahdiani, S. M. Fakhraie, and C. Lucas, ''Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors,'' IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 8, pp. 1215–1228, Aug. 2012.
[4] S. Srinivasan and C. F. Stevens, ''Robustness and fault tolerance make brains harder to study,'' BMC Biol., vol. 9, p. 46, Jun. 2011.
[5] W. Maass, ''Noise as a resource for computation and learning in networks of spiking neurons,'' Proc. IEEE, vol. 102, no. 5, pp. 860–880, May 2014.
[6] T. Sejnowksi and T. Delbruck, ''The language of the brain,'' Sci. Amer., vol. 307, pp. 54–59, Oct. 2012.
[7] W. Maass, ''To spike or not to spike: That is the question,'' Proc. IEEE, vol. 103, no. 12, pp. 2219–2224, Dec. 2015.
[8] P. Chandra and Y. Singh, ''Fault tolerance of feedforward artificial neural networks—A framework of study,'' in Proc. Int. Joint Conf. Neural Netw., vol. 1, Jul. 2003, pp. 489–494.
[9] J. A. G. Nijhuis and L. Spaaenenburg, ''Fault tolerance of neural associative memories,'' IEE Proc. E-Comput. Digit. Techn., vol. 136, no. 5, pp. 389–394, Sep. 1989.
[10] S. S. Venkatesh, ''The science of making ERORS: What error tolerance implies for capacity in neural networks,'' IEEE Trans. Knowl. Data Eng., vol. 4, no. 2, pp. 135–144, Apr. 1992.
[11] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, ''Neural acceleration for general-purpose approximate programs,'' IEEE Micro, vol. 33, no. 3, pp. 16–27, May 2013.
[12] Z. Du, A. Lingamneni, Y. Chen, K. V. Palem, O. Temam, and C. Wu, ''Leveraging the error resilience of neural networks for designing highly energy efficient accelerators,'' IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 34, no. 8, pp. 1223–1235, Aug. 2015.
[13] G. Volanis, A. Antonopoulos, A. A. Hatzopoulos, and Y. Makris, ''Toward silicon-based cognitive neuromorphic ICs—A survey,'' IEEE Design Test, vol. 33, no. 3, pp. 91–102, Jun. 2016.
[14] J. Arlat, Z. Kalbarczyk, and T. Nanya, ''Nanocomputing: Small devices, large dependability challenges,'' IEEE Security Privacy, vol. 10, no. 1, pp. 69–72, Jan. 2012.
[15] Z. Wang, K. H. Lee, and N. Verma, ''Overcoming computational errors in sensing platforms through embedded machine-learning kernels,'' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 8, pp. 1459–1470, Aug. 2015.
[16] D. Terry, ''Toward a new approach to IoT fault tolerance,'' Computer, vol. 49, no. 8, pp. 80–83, Aug. 2016.
[17] J. von Neumann, ''Probabilistic logics and the synthesis of reliable organisms from unreliable components,'' Autom. Stud., vol. 34, pp. 43–98, 1956.
[18] D. D. Thaker, R. Amirtharajah, F. Impens, I. L. Chuang, and F. T. Chong, ''Recursive TMR: Scaling fault tolerance in the nanoscale era,'' IEEE Design Test Comput., vol. 22, no. 4, pp. 298–305, Jul. 2005.
[19] K. Roy, B. Jung, D. Peroulis, and A. Raghunathan, ''Integrated systems in the more-than-Moore era: Designing low-cost energy-efficient systems using heterogeneous components,'' IEEE Design Test, vol. 33, no. 3, pp. 56–65, Jun. 2016.
[20] M. Haselman and S. Hauck, ''The future of integrated circuits: A survey of nanoelectronics,'' Proc. IEEE, vol. 98, no. 1, pp. 11–38, Jan. 2010.
[21] A. Eghbal, P. M. Yaghini, N. Bagherzadeh, and M. Khayambashi, ''Analytical fault tolerance assessment and metrics for TSV-based 3D network-on-chip,'' IEEE Trans. Comput., vol. 64, no. 12, pp. 3591–3604, Dec. 2015.
[22] A. Rahimi, L. Benini, and R. K. Gupta, ''Variability mitigation in nanometer CMOS integrated systems: A survey of techniques from circuits to software,'' Proc. IEEE, vol. 104, no. 7, pp. 1410–1448, Jul. 2016.
[23] S. Mittal and J. S. Vetter, ''A survey of techniques for modeling and improving reliability of computing systems,'' IEEE Trans. Parallel Distrib. Syst., vol. 27, no. 4, pp. 1226–1238, Apr. 2016.
[24] A. DeHon, N. Carter, and H. Quinn, ''CCC cross-layer reliability visioning study,'' Nat. Sci. Found., USA, Tech. Rep. LA-UR 10-08387, Mar. 2011. [Online]. Available: http://www.relxlayer.org
[25] P. M. Furth and A. G. Andreou, ''On fault probabilities and yield models for VLSI neural networks,'' IEEE J. Solid-State Circuits, vol. 32, no. 8, pp. 1284–1287, Aug. 1997.
[26] J. M. Shalf and R. Leland, ''Computing beyond Moore's law,'' Computer, vol. 48, no. 12, pp. 14–23, Dec. 2015.
[27] A. E. Barbour and A. S. Wojcik, ''A general constructive approach to fault-tolerant design using redundancy,'' IEEE Trans. Comput., vol. 38, no. 1, pp. 15–29, Jan. 1989.
[28] M. Peercy and P. Banerjee, ''Fault tolerant VLSI systems,'' Proc. IEEE, vol. 81, no. 5, pp. 745–758, May 1993.
[29] V. Piuri, ''Analysis of fault tolerance in artificial neural networks,'' J. Parallel Distrib. Comput., vol. 61, no. 1, pp. 18–48, 2001.
[30] A.-D. Almasi, S. Wozniak, V. Cristea, Y. Leblebici, and T. Engbersen, ''Review of advances in neural networks: Neural design technology stack,'' Neurocomputing, vol. 174, pp. 31–41, Jan. 2016.
[31] N. C. Hammadi and H. Ito, ''Improving the performance of feedforward neural networks by noise injection into hidden neurons,'' J. Intell. Robot. Syst., vol. 21, no. 2, pp. 103–115, 1998.
[32] P. J. Edwards and A. F. Murray, ''Toward optimally distributed computation,'' Neural Comput., vol. 10, no. 4, pp. 987–1005, Sep. 1998.
[33] N. S. Merchawi, S. R. T. Kumara, and C. R. Das, ''A probabilistic model for the fault tolerance of multilayer perceptrons,'' IEEE Trans. Neural Netw., vol. 7, no. 1, pp. 201–205, Jan. 1996.
[34] X. Zeng and D. S. Yeung, ''Sensitivity analysis of multilayer perceptron to input and weight perturbations,'' IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1358–1366, Nov. 2001.
[35] M. Stevenson, R. Winter, and B. Widrow, ''Sensitivity of feedforward neural networks to weight errors,'' IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 71–80, Mar. 1990.
[36] Y. LeCun, J. S. Denker, and S. A. Solla, ''Optimal brain damage,'' in Advances in Neural Information Processing Systems, D. S. Touretzky, Ed. San Francisco, CA, USA: Morgan Kaufmann, 1990, pp. 598–605.
[37] J. L. Bernier, J. Ortega, E. Ros, I. Rojas, and A. Prieto, ''A quantitative study of fault tolerance, noise immunity, and generalization ability of MLPs,'' Neural Comput., vol. 12, no. 12, pp. 2941–2964, 2000.
[38] E. B. Tchernev, R. G. Mulvaney, and D. S. Phatak, ''Investigating the fault tolerance of neural networks,'' Neural Comput., vol. 17, no. 7, pp. 1646–1664, Jul. 2005.
[39] E. M. El Mhamdi and R. Guerraoui, ''When neurons fail—Technical report,'' EPFL, Lausanne, Tech. Rep. EPFL-WORKING-217561, 2016.
[40] Y. Wang and A. Avižienis, ''A unified reliability model for fault-tolerant computers,'' IEEE Trans. Comput., vol. C-29, no. 11, pp. 1002–1011, Nov. 1980.
[41] A. Kulakov, M. Zwolinski, and J. S. Reeve, ''Fault tolerance in distributed neural computing,'' CoRR, vol. abs/1509.09199, pp. 1–9, Sep. 2015.
[42] A. Avižienis, ''Framework for a taxonomy of fault-tolerance attributes in computer systems,'' SIGARCH Comput. Archit. News, vol. 11, no. 3, pp. 16–21, Jun. 1983.
[43] V. P. Nelson, ''Fault-tolerant computing: Fundamental concepts,'' Computer, vol. 23, no. 7, pp. 19–25, Jul. 1990.
[44] A. K. Somani and N. H. Vaidya, ''Understanding fault tolerance and reliability,'' Computer, vol. 30, no. 4, pp. 45–50, Apr. 1997.
[45] P. G. Depledge, ''Fault-tolerant computer systems,'' IEE Proc. A-Phys. Sci., Meas. Instrum., Manage. Edu.-Rev., vol. 128, no. 4, pp. 257–272, May 1981.
[46] G. Buja and R. Menis, ''Dependability and functional safety: Applications in industrial electronics systems,'' IEEE Ind. Electron. Mag., vol. 6, no. 3, pp. 4–12, Sep. 2012.
[47] S. Gai, M. Mezzalama, and P. Prinetto, ''A review of fault models for LSI/VLSI devices,'' Softw. Microsyst., vol. 2, no. 2, pp. 44–53, Apr. 1983.
[48] J. Sosnowski, ''Transient fault tolerance in digital systems,'' IEEE Micro, vol. 14, no. 1, pp. 24–35, Feb. 1994.
[49] P. Pop, V. Izosimov, P. Eles, and Z. Peng, ''Design optimization of time- and cost-constrained fault-tolerant embedded systems with checkpointing and replication,'' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 3, pp. 389–402, Mar. 2009.
[50] N. Aymerich, S. D. Cotofana, and A. Rubio, ''Adaptive fault-tolerant architecture for unreliable technologies with heterogeneous variability,'' IEEE Trans. Nanotechnol., vol. 11, no. 4, pp. 818–829, Jul. 2012.
[51] K. C. Y. Mei, ''Bridging and stuck-at faults,'' IEEE Trans. Comput., vol. C-23, no. 7, pp. 720–727, Jul. 1974.
[52] G. Bolt, ''Fault models for artificial neural networks,'' in Proc. IEEE Int. Joint Conf. Neural Netw., vol. 2, Nov. 1991, pp. 1373–1378.
[53] A. Pancholy, J. Rajski, and L. J. McNaughton, ''Empirical failure analysis and validation of fault models in CMOS VLSI circuits,'' IEEE Design Test Comput., vol. 9, no. 1, pp. 72–83, Mar. 1992.
[54] P. Gil, J. Arlat, H. Madeira, Y. Crouzet, T. Jarboui, and K. Kanoun, ''Fault representativeness,'' Eur. Community Dependability Benchmarking Project, France, Tech. Rep. IST-200025425, 2002.
[55] D. de Andres, J. C. Ruiz, D. Gil, and P. Gil, ''Fault emulation for dependability evaluation of VLSI systems,'' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 4, pp. 422–431, Apr. 2008.
[56] J. A. Abraham and W. K. Fuchs, ''Fault and error models for VLSI,'' Proc. IEEE, vol. 74, no. 5, pp. 639–654, May 1986.
[57] P. E. Dodd and L. W. Massengill, ''Basic mechanisms and modeling of single-event upset in digital microelectronics,'' IEEE Trans. Nucl. Sci., vol. 50, no. 3, pp. 583–602, Jun. 2003.
[58] S. Ghosh and K. Roy, ''Parameter variation tolerance and error resiliency: New design paradigm for the nanoscale era,'' Proc. IEEE, vol. 98, no. 10, pp. 1718–1751, Oct. 2010.
[59] M. Y. C. Kao, K.-T. Tsai, and S.-C. Chang, ''A fault detection and tolerance architecture for post-silicon skew tuning,'' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 7, pp. 1210–1220, Jul. 2015.
[60] M. Al-Kuwaiti, N. Kyriakopoulos, and S. Hussein, ''A comparative analysis of network dependability, fault-tolerance, reliability, security, and survivability,'' IEEE Commun. Surveys Tuts., vol. 11, no. 2, pp. 106–124, 2nd Quart., 2009.
[61] R. Frei, R. McWilliam, B. Derrick, A. Purvis, A. Tiwari, and G. Di Marzo Serugendo, ''Self-healing and self-repairing technologies,'' Int. J. Adv. Manuf. Technol., vol. 69, no. 5, pp. 1033–1061, 2013.
[62] M. A. Breuer, ''Multi-media applications and imprecise computation,'' in Proc. 8th Euromicro Conf. Digit. Syst. Des. (DSD), Aug. 2005, pp. 2–7.
[63] M. Stanisavljević, A. Schmid, and Y. Leblebici, Fault-Tolerant Architectures and Approaches. New York, NY, USA: Springer, 2011, pp. 35–47.
[64] E. Dubrova, Hardware Redundancy. New York, NY, USA: Springer, 2013, pp. 55–86.
[65] D. S. Phatak and I. Koren, ''Complete and partial fault tolerance of feedforward neural nets,'' IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 446–456, Mar. 1995.
[66] C. Neti, M. H. Schneider, and E. D. Young, ''Maximally fault tolerant neural networks,'' IEEE Trans. Neural Netw., vol. 3, no. 1, pp. 14–23, Jan. 1992.
[67] P. W. Protzel, D. L. Palumbo, and M. K. Arras, ''Performance and fault-tolerance of neural networks for optimization,'' IEEE Trans. Neural Netw., vol. 4, no. 4, pp. 600–614, Jul. 1993.
[68] C. H. Sequin and R. D. Clay, ''Fault tolerance in artificial neural networks,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), vol. 1, Jun. 1990, pp. 703–708.
[69] K. Mehrotra, C. K. Mohan, and S. Ranka, ''Fault tolerance in neural networks,'' School Comput. Inf. Sci., Syracuse Univ., Syracuse, NY, USA, Tech. Rep. RL-TR-94-93, Jul. 1994.
[70] Y. Tohma and Y. Koyanagi, ''Fault-tolerant design of neural networks for solving optimization problems,'' IEEE Trans. Comput., vol. 45, no. 12, pp. 1450–1455, Dec. 1996.
[71] D. B. I. Feltham and W. Maly, ''Physically realistic fault models for analog CMOS neural networks,'' IEEE J. Solid-State Circuits, vol. 26, no. 9, pp. 1223–1229, Sep. 1991.
[72] A. S. Orgenci, G. Dundar, and S. Balkur, ''Fault-tolerant training of neural networks in the presence of MOS transistor mismatches,'' IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 48, no. 3, pp. 272–281, Mar. 2001.
[73] O. Temam, ''A defect-tolerant accelerator for emerging high-performance applications,'' SIGARCH Comput. Archit. News, vol. 40, no. 3, pp. 356–367, Jun. 2012.
[74] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer, ''Fault injection techniques and tools,'' Computer, vol. 30, no. 4, pp. 75–82, Apr. 1997.
[75] J. L. Bernier, J. Ortega, E. Ros, I. Rojas, and A. Prieto, ''Obtaining fault tolerant multilayer perceptrons using an explicit regularization,'' Neural Process. Lett., vol. 12, no. 12, pp. 107–113, 2001.
[76] O. Fontenla-Romero, E. Castillo, A. Alonso-Betanzos, and B. Guijarro-Berdias, ''A measure of fault tolerance for functional networks,'' Neurocomputing, vol. 62, pp. 327–347, Dec. 2004.
[77] R. Xu and D. Wunsch, II, ''Survey of clustering algorithms,'' IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645–678, May 2005.
[78] M. N. Shirazi and S. Maekawa, ''The capacity of associative memories with malfunctioning neurons,'' IEEE Trans. Neural Netw., vol. 4, no. 4, pp. 628–635, Jul. 1993.
[79] N. C. Hammadi and H. Ito, ''On the activation function and fault tolerance in feedforward neural networks,'' IEICE Trans. Inf. Syst., vol. E81-D, no. 1, pp. 66–72, 1998.
[80] L. C. Chu and B. W. Wah, ''Fault tolerant neural networks with hybrid redundancy,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), vol. 2, Jun. 1990, pp. 639–649.
[81] M. D. Emmerson and R. I. Damper, ''Determining and improving the fault tolerance of multilayer perceptrons in a pattern-recognition application,'' IEEE Trans. Neural Netw., vol. 4, no. 5, pp. 788–793, Sep. 1993.
[82] C.-T. Chiu, K. Mehrotra, C. K. Mohan, and S. Ranka, ''Robustness of feedforward neural networks,'' in Proc. IEEE Int. Conf. Neural Netw., vol. 2, Apr. 1993, pp. 783–788.
[83] C.-T. Chiu, K. Mehrotra, C. K. Mohan, and S. Ranka, ''Training techniques to obtain fault-tolerant neural networks,'' in Proc. 24th Int. Symp. Fault-Tolerant Comput. (FTCS) Dig. Papers, Jun. 1994, pp. 360–369.
[84] F. M. Dias and A. Antunes, ''Fault tolerance improvement through architecture change,'' in Artificial Neural Networks. Berlin, Germany: Springer, 2008, pp. 248–257.
[85] F. M. Dias, R. Borralho, P. Fontes, and A. Antunes, ''FTSET—A software tool for fault tolerance evaluation and improvement,'' Neural Comput. Appl., vol. 19, no. 5, pp. 701–712, 2010.
[86] P. Chandra and Y. Singh, ''Feedforward sigmoidal networks—Equicontinuity and fault-tolerance properties,'' IEEE Trans. Neural Netw., vol. 15, no. 6, pp. 1350–1366, Nov. 2004.
[87] B. S. Arad and A. El-Amawy, ''On fault tolerant training of feedforward neural networks,'' Neural Netw., vol. 10, no. 3, pp. 539–553, 1997.
[88] N. Wei, S. Yang, and S. Tong, ''A modified learning algorithm for improving the fault tolerance of BP networks,'' in Proc. IEEE Int. Conf. Neural Netw., vol. 1, Jun. 1996, pp. 247–252.
[89] P. J. Edwards and A. F. Murray, ''Penalty terms for fault tolerance,'' in Proc. Int. Conf. Neural Netw., vol. 2, Jun. 1997, pp. 943–947.
[90] N. C. Hammadi, T. Ohmameuda, E. Kaneko, and H. Ito, ''Fault tolerant constructive algorithm for feedforward neural networks,'' in Proc. Pacific Rim Int. Symp. Fault-Tolerant Syst., Dec. 1997, pp. 215–220.
[91] S. Cavalieri and O. Mirabella, ''A novel learning algorithm which improves the partial fault tolerance of multilayer neural networks,'' Neural Netw., vol. 12, no. 1, pp. 91–106, Jan. 1999.
[92] Z.-H. Zhou, S.-F. Chen, and Z.-Q. Chen, ''Improving tolerance of neural networks against multi-node open fault,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), vol. 3, 2001, pp. 1687–1692.
[93] D. Simon, ''Distributed fault tolerance in optimal interpolative nets,'' IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1348–1357, Nov. 2001.
[94] Y. Xiao, R.-B. Feng, C. S. Leung, and J. Sum, ''Objective function and learning algorithm for the general node fault situation,'' IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 4, pp. 863–874, Apr. 2016.
[95] C.-S. Leung, W. Y. Wan, and R. Feng, ''A regularizer approach for RBF networks under the concurrent weight failure situation,'' IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 6, pp. 1360–1372, Jun. 2017.
[96] T. Poggio and F. Girosi, ''Networks for approximation and learning,'' Proc. IEEE, vol. 78, no. 9, pp. 1481–1497, Sep. 1990.
[97] R. Reed, R. J. Marks, II, and S. Oh, ''Similarities of error regularization, sigmoid gain scaling, target smoothing, and training with jitter,'' IEEE Trans. Neural Netw., vol. 6, no. 3, pp. 529–538, May 1995.
[98] M. M. Islam, M. A. Sattar, M. F. Amin, X. Yao, and K. Murase, ''A new adaptive merging and growing algorithm for designing artificial neural networks,'' IEEE Trans. Syst., Man, Cybern. B (Cybern.), vol. 39, no. 3, pp. 705–722, Jun. 2009.
[99] N. Hammadi and H. Ito, ''A learning algorithm for fault tolerant feedforward neural networks,'' IEICE Trans. Inf. Syst., vol. E80-D, no. 1, pp. 21–27, 1997.
[100] J. Bernier, J. Ortega, I. Rojas, and A. Prieto, ''Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations,'' Neurocomputing, vol. 31, nos. 1–4, pp. 87–103, 2000.
[101] S.-K. Sin and R. J. P. DeFigueiredo, ''Efficient learning procedures for optimal interpolative nets,'' Neural Netw., vol. 6, no. 1, pp. 99–113, 1993.
[102] R. J. P. de Figueiredo, ''An optimal matching-score net for pattern classification,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), vol. 3, Jun. 1990, pp. 909–916.
[103] D. Deodhare, M. Vidyasagar, and S. S. Keethi, ''Synthesis of fault-tolerant feedforward neural networks using minimax optimization,'' IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 891–900, Sep. 1998.
[104] Z.-H. Zhou and S.-F. Chen, ''Evolving fault-tolerant neural networks,'' Neural Comput. Appl., vol. 11, nos. 3–4, pp. 156–160, Jun. 2003.
[105] S. Bettola and V. Piuri, ''High performance fault-tolerant digital neural networks,'' IEEE Trans. Comput., vol. 47, no. 3, pp. 357–363, Mar. 1998.
[106] C. Khunasaraphan, K. Vanapipat, and C. Lursinsap, ''Weight shifting techniques for self-recovery neural networks,'' IEEE Trans. Neural Netw., vol. 5, no. 4, pp. 651–658, Jul. 1994.
[107] C. Khunasaraphan, T. Tanprasert, and C. Lursinsap, ''Recovering faulty self-organizing neural networks: By weight shifting technique,'' in Proc. IEEE Int. Conf. Neural Netw., vol. 3, Jun. 1994, pp. 1513–1518.
[108] A. Hashmi, H. Berry, O. Temam, and M. Lipasti, ''Automatic abstraction and fault tolerance in cortical microarchitectures,'' in Proc. 38th Annu. Int. Symp. Comput. Archit. (ISCA), Jun. 2011, pp. 1–10.
[109] J. Deng et al., ''Retraining-based timing error mitigation for hardware neural networks,'' in Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE), Mar. 2015, pp. 593–596.
[110] M. Naeem, L. J. McDaid, J. Harkin, J. J. Wade, and J. Marsland, ''On the role of astroglial syncytia in self-repairing spiking neural networks,'' IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 10, pp. 2370–2380, Oct. 2015.
[111] E. Sugawara, M. Fukushi, and S. Horiguchi, ''Fault tolerant multi-layer neural networks with GA training,'' in Proc. 18th IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Nov. 2003, pp. 328–335.
[112] F. Su, P. Yuan, Y. Wang, and C. Zhang, ''The superior fault tolerance of artificial neural network training with a fault/noise injection-based genetic algorithm,'' Protein Cell, vol. 7, no. 10, pp. 735–748, Oct. 2016.
[113] T. Tanprasert, C. Tanprasert, and C. Lursinsap, ''Probing technique for neural net fault detection,'' in Proc. IEEE Int. Conf. Neural Netw., vol. 2, Jun. 1996, pp. 1001–1005.
[114] A. C.-S. Leung, P. F. Sum, and K. Ho, ''The effect of weight fault on associative networks,'' Neural Comput. Appl., vol. 20, no. 1, pp. 113–121, 2011.
[115] F. Leduc-Primeau, V. Gripon, M. G. Rabbat, and W. J. Gross, ''Fault-tolerant associative memories based on c-partite graphs,'' IEEE Trans. Signal Process., vol. 64, no. 4, pp. 829–841, Feb. 2016.
[116] X. Parra and A. Català, ''Sensitivity analysis of radial basis function networks for fault tolerance purposes,'' in Foundations and Tools for Neural Modeling. Alicante, Spain: Springer, 1999, pp. 566–572.
[117] R. Eickhoff and U. Rückert, ''Robustness of radial basis functions,'' Neurocomputing, vol. 70, nos. 16–18, pp. 2758–2767, 2007.
[118] M. Yasunaga, I. Hachiya, K. Moki, and J. H. Kim, ''Fault-tolerant self-organizing map implemented by wafer-scale integration,'' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 2, pp. 257–265, Jun. 1998.
[119] R. Talumassawatdi and C. Lursinsap, ''Fault immunization concept for self-organizing mapping neural networks,'' Int. J. Uncertainty, Fuzziness Knowl.-Based Syst., vol. 9, no. 6, pp. 781–790, 2001.
[120] R. Brette, ''Philosophy of the spike: Rate-based vs. spike-based theories of the brain,'' Frontiers Syst. Neurosci., vol. 9, p. 151, Nov. 2015.
[121] J. Sum, C.-S. Leung, and K. Ho, On Node-Fault-Injection Training of an RBF Network. Berlin, Germany: Springer, 2009, pp. 324–331.
[122] J. Pajarinen, J. Peltonen, and M. A. Uusitalo, ''Fault tolerant machine learning for nanoscale cognitive radio,'' Neurocomputing, vol. 74, no. 5, pp. 753–764, 2011.
[123] K. Ho, C.-S. Leung, and J. Sum, ''Training RBF network to tolerate single node fault,'' Neurocomputing, vol. 74, no. 6, pp. 1046–1052, 2011.
[124] R. Martolia, A. Jain, and L. Singla, ''Analysis & survey on fault tolerance in radial basis function networks,'' in Proc. Int. Conf. Comput., Commun. Autom. (ICCCA), May 2015, pp. 469–473.
[125] R.-B. Feng, Z.-F. Han, W. Y. Wan, and C.-S. Leung, ''Properties and learning algorithms for faulty RBF networks with coexistence of weight and node failures,'' Neurocomputing, vol. 224, pp. 166–176, Feb. 2017.

CESAR TORRES-HUITZIL received the master's degree in electronics and the Ph.D. degree in computer science from the National Institute for Astrophysics, Optics and Electronics in 1998 and 2003, respectively. He is currently a Researcher with the Information Technology Laboratory, Center for Research and Advanced Studies, in the research unit located in Tamaulipas, Mexico. His main research lines are around reconfigurable computing and the computational applications of FPGA devices in different domains, such as computer vision, digital signal processing, and neural computing.

BERNARD GIRAU received the Ph.D. degree in computer science from the Ecole Normale Superieure de Lyon in 1999. He is currently a Full Professor of computer science with Université de Lorraine and a Researcher with the Biscuit Team, LORIA Lorraine Laboratory, Nancy, France. His current research interests include embedded parallel connectionism and bio-inspired neural models for visual perception.