
Expert Systems with Applications 39 (2012) 1597–1606


Topological constraints and robustness in liquid state machines


Hananel Hazan *, Larry M. Manevitz
Department of Computer Science, University of Haifa, Mount Carmel, Haifa 31905, Israel

Keywords: Liquid State Machine; Reservoir computing; Small world topology; Robustness; Machine learning

Abstract: The Liquid State Machine (LSM) is a method of computing with temporal neurons, which can be used, amongst other things, for classifying intrinsically temporal data directly, unlike standard artificial neural networks. It has also been put forward as a natural model of certain kinds of brain functions. There are two results in this paper: (1) We show that Liquid State Machines as normally defined cannot serve as a natural model for brain function, because they are very vulnerable to failures in parts of the model. This result is in contrast to work by Maass et al., which showed that these models are robust to noise in the input data. (2) We show that specifying certain kinds of topological constraints (such as the "small world assumption"), which have been claimed to be reasonably plausible biologically, can restore robustness in this sense to LSMs.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Processing in artificial neurons is typically a-temporal. This is because the underlying basic neuronal model, that of Pitts and McCulloch (1943), is a-temporal by nature. As a result, most applications of artificial neural networks are related in one way or another to static pattern recognition. On the other hand, it has long been recognized in the brain science community that the McCulloch–Pitts paradigm is inadequate. Various models of differing complexity have been promulgated to explain the temporal capabilities (amongst other things) of natural neurons and neuronal networks.

However, during the last decade, computational scientists have begun to pay attention to this issue from the neurocomputation perspective as well, e.g. Fern and Sojakka (n.d.), Jaeger (2001a, 2001b, 2002), Lukosevicius and Jaeger (2009) and Maass, Natschläger, and Markram (2002a, 2002b, 2002d), and the computational capabilities of various models are being investigated.

One such model, the Liquid State Machine (LSM) (see Fig. 1) (Maass et al., 2002a), has had substantial success recently. The Liquid State Machine is a somewhat different paradigm of computation. It assumes that information is stored, not in "attractors" as is usually assumed in recurrent neural networks, but in the activity pattern of all the neurons which feed back in a sufficiently recurrent and inter-connected network. This information can then be recognized by any sufficiently strong classifier such as an Adaline (Widrow & Hoff, 1960), Back-Propagation, SVM¹ or Tempotron (Gutig & Sompolinsky, 2006). (The name "liquid state" comes from the idea that the history of, e.g., the timings of rocks thrown into a pond of water, is completely contained in the wave structure.) Moreover, the "persistence of the trace" (or, as Maass put it, the "fading memory" (Lukosevicius & Jaeger, 2009)) allows one to recognize at a temporal distance the signal that was sent to the liquid, as well as sequence and timing effects of inputs.

The Liquid State Machine is a recurrent neural network. In its usual format (Lukosevicius & Jaeger, 2009; Maass et al., 2002a), each neuron is a biologically inspired artificial neuron such as a leaky "integrate and fire" (LIF) neuron or an "Izhikevich" style neuron (Izhikevich, 2003). The connections between neurons define the dynamical process, and the recurrent connections define what we call the "topology" in this paper. The properties of the artificial neurons, together with these recurrences, result in any sequence of input history being transformed into a spatio-temporal activation pattern of the liquid. The nomenclature comes from the fact that one can intuitively look at the network as if it were a "liquid" such as a pond of water: the stimuli are rocks thrown into the water, and the ripples on the pond are the spatio-temporal pattern.

In the context of the LSM, the "detectors" are classifier systems that receive as input a state (or, in large systems, a sample of the elements of the liquid) and are trained to recognize patterns that evolve from a given class of inputs. Thus a detector could be an SVM, an Adaline (Widrow & Hoff, 1960), a perceptron (Pitts & McCulloch, 1943), a three level back propagation neural network, etc.

* Corresponding author. Tel.: +972 4 8288337; fax: +972 4 8288181. E-mail addresses: [email protected] (H. Hazan), [email protected] (L.M. Manevitz).
¹ SVM = support vector machine.

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.06.052

Fig. 1. Liquid State Machine framework.
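As a concrete illustration of the kind of neuron used inside the liquid, the following is a minimal discrete-time leaky integrate-and-fire update in Python (with NumPy). It is a sketch only: the function name and the parameter values are ours for illustration and are not taken from the CSIM code actually used in the experiments.

    import numpy as np

    def lif_step(v, i_syn, refr, dt=1.0, tau_m=30.0, v_rest=0.0,
                 v_reset=13.5, v_thresh=15.0, t_refr=3.0):
        """One discrete-time update for a population of LIF neurons.

        v     : membrane potentials, shape (N,)
        i_syn : summed recurrent/external input for this step, shape (N,)
        refr  : remaining refractory time per neuron, shape (N,)
        Returns the updated (v, refr) and a boolean spike vector.
        """
        active = refr <= 0.0
        # Leaky integration towards the resting potential, driven by the input.
        v = np.where(active, v + (dt / tau_m) * (-(v - v_rest) + i_syn), v)
        spikes = active & (v >= v_thresh)
        # Spiking neurons are reset and become refractory for t_refr time units.
        v = np.where(spikes, v_reset, v)
        refr = np.where(spikes, t_refr, np.maximum(refr - dt, 0.0))
        return v, refr, spikes

A liquid in this sense is simply a population of such neurons whose synaptic input at each step is produced by the recurrent connectivity (the "topology") together with the external stimulus.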

The term "detector" is standard in the LSM community and dates back to Maass et al. (Jaeger, 2001a; Lukosevicius & Jaeger, 2009; Maass, 2002; Maass & Markram, 2004; Maass et al., 2002b); the idea is that the "detectors" test whether the information for classification resides in the liquid, and thus they are not required to be biological. In this way, it is theoretically possible for the detectors to recognize any spatio-temporal signal that has been fed into the liquid; and thus the system could be used for, e.g., speech recognition, vision, etc.

This is an exciting idea and, e.g., Maass and his colleagues have published a series of papers on it. Amongst other things, they have recently shown that once a detector has been sufficiently trained at any time frame, it is resilient to noise in the input data and thus it can be used successfully for generalization (Bassett & Bullmore, 2006; Fern & Sojakka, n.d.; Maass et al., 2002b).

Furthermore, there is a claim that this abstraction is faithful to the potential capabilities of the natural neurons and thus is explanatory to some extent from the viewpoint of computational brain science. Note that one of the underlying assumptions is that the detector works without memory; that is, the detector should be able to classify based on instantaneous static information, i.e. by sampling the liquid at a specific time. That this is theoretically possible is the result of looking at the dynamical system of the liquid and noting that it is sufficient to cause the divergence of the two classes in the space of activation.

Note that the detector systems (e.g. a back propagation neural network, a perceptron or a support vector machine (SVM)) are not required to have any biological plausibility, either in their design or in their training mechanism, since the model does not try to account for the way the information is used in nature. Despite this, since natural neurons exist in a biological and hence noisy environment, for these models to be successful in this domain they must be robust to various kinds of noise. As mentioned above, Maass et al. (Lukosevicius & Jaeger, 2009; Maass, Legenstein, & Markram, 2002; Maass et al., 2002b; Maass & Markram, 2004) addressed one dimension of this problem by showing that the systems are in fact robust to noise in the input. Thus small random shifts in a temporal input pattern will not affect the LSM's ability to recognize the pattern. From a machine learning perspective, this means that the model is capable of generalization.

However, there is another component to robustness: that of the components of the system itself.

In this paper we report on experiments performed with various kinds of "damage" to the LSM, and unfortunately we have shown that the LSM with any of the above detectors is not resistant, in the sense that small damages to the LSM neurons degrade the trained classifiers dramatically, even to essentially random values (Hazan & Manevitz, 2010; Manevitz & Hazan, 2010).

Seeking to correct this problem, we experimented with different architectures of the liquid. The essential need of the LSM is that there should be sufficient recurrent connections so that, on the one hand, the network maintains the information in a signal, while on the other hand it separates different signals. The models typically used are random connections, or random connections with a bias towards "nearby" connections. Our experiments with these topologies show that the network is very sensitive to damage because the recurrent nature of the system causes substantial feedback.

Taking this as a clue, we tried networks with a "hub" or "small world" (Albert & Barabási, 2000; Barabási, 2000; Barabási & Albert, 1999) architecture. This architecture has been claimed (Achard, Salvador, Whitcher, Suckling, & Bullmore, 2006; Bassett & Bullmore, 2006; Varshney, Chen, Paniagua, Hall, & Chklovskii, 2011) to be "biologically feasible".

The intuition was that the hub topology, on the one hand, integrates information from many locations and so is resilient to damage in some of them; and, on the other hand, since such hubs follow a power law distribution, they are rare enough that damage usually does not affect them directly. This intuition was in fact borne out by our experiments.

2. Materials and methods

We simulated the Liquid State Machine with 243 leaky integrate and fire (LIF) neurons in the liquid, following the exact set up of Maass and using the code available in the Maass laboratory software "A neural Circuit SIMulator".² To test variants of topology we re-implemented the code, available at our website.³ The variants of the topologies implemented are described in the paper below, as are the types of damage. Input to the liquid was given at 30% of the neurons, with the same input at all input locations at a given time instance. The detectors of the basic networks were back propagation networks with three levels, with 3 neurons in the hidden level and one output neuron. In most experiments, the input was given by the output of all non-input neurons of the liquid (i.e. 170 inputs to the detector). In some experiments (see section below) the inputs to the detector were given over 20 time instances and so the detector had 3400 inputs. The networks were tested with 20 random temporal binary sequences of length 45 chosen with uniform distribution. The experiments were repeated 500 times and statistics reported.

² http://www.lsm.tugraz.at/csim/.
³ http://www.cri.haifa.ac.il/neurocomputation.
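For concreteness, a minimal Python sketch of the stimulus side of this protocol is given below: it generates random temporal binary sequences of the kind just described and chooses the liquid neurons that receive them. The function names and seed handling are ours; this is not the re-implementation used for the reported experiments.

    import numpy as np

    def make_stimuli(n_sequences=20, length=45, seed=0):
        """20 random temporal binary sequences of length 45; half are labelled as
        the class to be recognized and half as the class to be rejected
        (as in Section 3.1.1)."""
        rng = np.random.default_rng(seed)
        sequences = rng.integers(0, 2, size=(n_sequences, length))
        labels = np.array([1] * (n_sequences // 2) + [0] * (n_sequences - n_sequences // 2))
        return sequences, labels

    def choose_input_neurons(n_neurons=243, input_fraction=0.3, seed=0):
        """Pick the 30% of liquid neurons that all receive the same external
        spike at each time step of a sequence."""
        rng = np.random.default_rng(seed)
        n_inputs = int(round(input_fraction * n_neurons))
        return rng.choice(n_neurons, size=n_inputs, replace=False)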

3. Theory/calculations

As discussed in the introduction, in such a system there are two sources of potential instability. First is the issue of small variants in the input. Systems need to balance the need for separation with generalization. That is, on the one hand, one may need to separate inputs with small variations into separate treatment, but on the other hand, small variants may need to be treated as "noise" or generalization of the trained system. For the LSM, as typically presented in the literature, it is understood, e.g. from the work of Lukosevicius and Jaeger (2009) and Maass (2002), that the LSM and its variants do this successfully in the case of spatio-temporal signals.

The second issue concerns the sensitivity of the system to small changes in the system itself, which we choose to call "damages" in this paper. This is very important if, as is the case for the LSM, it is supposed to be explanatory for biological systems.

Our experiments therefore are based on simulating the LSM with temporal sequences and calculating how resistant it is to two main kinds of such damages. The damages chosen for investigation were: (1) at each time instance a certain percentage of neurons in the liquid would refuse to fire regardless of the internal charge in their state; (2) at each time instance a certain percentage of neurons would fire regardless of the internal charge, subject only to the limitation of the refractory period.

Since the basic results (see below) showed that the standard variants of the LSM were not robust to these damages at various small levels, we considered topological differences in the connectivity of the LSM.

3.1. First experiments: LSMs are not robust

3.1.1. The experiments
To test the resistance of the standard LSM to noise, we (i) downloaded the code of Maass et al. from his laboratory site⁴ and implemented two kinds of damage to the liquid, and (ii) re-implemented the LSM code so that we could handle variants. These models use a basic neuron of the "leaky integrate and fire" (LIF)⁵ variety, and in Maass' work the neurons are connected randomly but with some biologically inspired parameters: 20% inhibitory neurons and a connectivity constraint giving a preference to geometrically nearby neurons over more remote ones. (For precise details on these parameters, see the neural Circuit SIMulator⁴ and Maass and Markram (2002).) External stimuli to the network were always sent to 30% of the neurons, always chosen to be excitatory neurons. Initially, we experimented with two parameters: (i) the percentage of neurons damaged; (ii) the kinds of damage. The kinds were either transforming a neuron into a "dead" neuron, i.e. one that never fires, or transforming a neuron into a "generator" neuron, i.e. one which fires as often as its refractory period allows, regardless of its input. We did experiments with different kinds of detectors: Adaline (Widrow & Hoff, 1960), Back-Propagation, SVM and Tempotron (Gutig & Sompolinsky, 2006).

Classification of new data could then be done at any of the signal points. We ran experiments as follows: we randomly chose twenty temporal inputs, i.e. random sequences of 0s and 1s of length 45, corresponding to spike inputs over a period of time, and trained an LSM composed of 243 integrate and fire neurons as in the liquid (Maass & Markram, 2002) to recognize ten of these inputs and reject the other ten. Each choice of architecture was run 500 times, varying the precise connections randomly. We tested the robustness of the recognition ability of the network with the following parameters:

– The neurons in the network were either leaky integrate and fire neurons (Maass, 2002) or Izhikevich (Izhikevich, 2003) style neurons.
– The average connectivity of the networks was maintained at about 20%, chosen randomly in all cases although with different distributions.
– The damages were either "generators", i.e. the neurons issued a spike whenever their refractory period allowed it, or "dead" neurons that could not spike.
– The degree of damage was systematically checked at 0.1%, 0.5%, 1%, 5%, and 10% in randomly chosen neurons.

The results shown in tables throughout the paper are in percentages, over the (500) repeated tests. One hundred percent indicates that all 20 vectors of one test, over 500 repetitions of the test, were fully recognized correctly. Fifty percent indicates that only half the vectors over the 500 runs were recognized. (This corresponds to a chance baseline.) The graphs presented below show the full distribution of all the tests and the results over all the kinds of damage and all varieties of topologies. As expected, they distribute as a Gaussian, but note that the average success rate varies from a baseline of 10 successes (50%) for random guessing (see Fig. 2) to as high as almost 20 (98%) for generalization in certain cases and 88% for some of the damages.

3.2. Second experiments: modifications of the LSM

3.2.1. Different kinds of basic neurons
In attempts to restore the robustness to damage, we experimented with the possibility that a different kind of basic neuron might result in a more resilient network. Accordingly, we implemented the LSM with various variants of "leaky integrate and fire" neurons, e.g. with a history dependent refractory period (Manevitz & Marom, 2002), and by using the model of neurons due to Izhikevich (2003). The results under these variants were qualitatively the same as for the standard integrate and fire neuron. (The Izhikevich model produces much more dense activity in the network and thus the detector was harder to train, but in the end the network was trainable and the results under damage were very similar.) Accordingly, we report only results with the standard integrate and fire neuron as appears, e.g., in Maass' work (Maass, 2002).

Fig. 2. Results of identification of random vectors on an untrained LSM with uniform random connections. This is a baseline. The result is a Gaussian distribution around 10 vectors.

⁴ A neural Circuit SIMulator: http://www.lsm.tugraz.at/csim/.
⁵ LIF = leaky integrate and fire.
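To make the two kinds of damage used throughout these experiments concrete, the following Python sketch (ours; not part of the original or re-implemented simulation code) shows how "dead" and "generator" neurons could be imposed on the liquid at every time step.

    import numpy as np

    def select_damaged(n_neurons, fraction, rng):
        """Randomly pick the neurons to damage (fraction = 0.001 ... 0.10)."""
        n_damaged = int(round(fraction * n_neurons))
        return rng.choice(n_neurons, size=n_damaged, replace=False)

    def apply_damage(spikes, refr, dead_ids, generator_ids):
        """Override the liquid's spike vector for one time step.

        dead_ids      : neurons that never fire, regardless of internal charge
        generator_ids : neurons that fire whenever the refractory period allows
        """
        spikes = spikes.copy()
        spikes[dead_ids] = False
        spikes[generator_ids] = refr[generator_ids] <= 0.0
        return spikes

The damaged spike vector then feeds back into the recurrent dynamics exactly as an undamaged one would; it is this feedback that, as described above, amplifies even small amounts of damage.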

3.2.2. Allowing detectors to have memory
In trying to consider how to make the model more robust to damage, we investigated the fact that the detector has no memory. Perhaps, if we allow the detector to follow the development of the network for a substantial amount of time, both in training and running, it would be more robust. To check this, we took the most extreme other case: we assumed that the detector system in fact takes as input a full time course of 20 iterations of the output neurons of the liquid. This means that instead of a neural network with an input of 170, we had one with 20 times 170 time course inputs. It seemed reasonable that (i) with so much information, it should be relatively easy to train the detector; and (ii) one could hope that damage in the liquid would be local enough that, over the time period, the detector could correct for it. In order to test this, we re-implemented the LSM detector to allow for this time entry.

Our detector was trained and tested as follows. There were 170 output units. At a "signal point" each of them was sampled for the next 20 iterations and all of these values were used as a single data point for the detector. Thus the detector had 170 times 20 inputs. We chose separate detector points, typically at intervals of 50. We then used back propagation on these data points. This means that eventually the detector could recognize the signal at any of the "signal points"; after training there was no particular importance to the choice of separation of the signal points except that there was no overlap between the data points. While we did not control for any connections between the intervals of data points (i.e. 50, and we also checked other time intervals) and possible natural oscillations in the network, we do not believe there were any. As anticipated, there was no significant trouble in training the network to even 100% recognition of the training data.

The "detectors" were three level neural networks trained by back-propagation. We also did some experiments with the Tempotron (Gutig & Sompolinsky, 2006) and with a simple Adaline detector (Widrow & Hoff, 1960). Training for classification could be performed in the damage-less environment successfully with any of these detectors. Then we exhaustively ran tests on these possibilities.

In all of these tests, following Maass (2002), Maass and Markram (2002) and Maass et al. (2002a), we assumed that approximately 20% of the neurons of the liquid were of the inhibitory type. The architecture of the neural network detector was 204 input neurons (which were never taken from the neurons in the LSM which were also used as inputs to the LSM), 100 hidden level neurons and one neuron for the output. Results running the Maass et al. architecture are presented in Fig. 4 and Table 4 and can be compared with a randomly connected network of 10% average connectivity, see Table 2.

The bottom line (see the results section) was that even with low amounts of damage and under most kinds of connectivity, the networks would fail; i.e. the trained but damaged network's loss of function was very substantial and in many cases it could not perform substantially differently from random selection.

Fig. 3. Histogram of connection distributions when the output connections were randomly chosen according to a power law. Note that the input histogram is different from the output histogram.

Fig. 4. Maass LSM: (a) normal operation; (b) with 10% dead damage; (c) with 10% noise. One can easily discern the large change in the reaction of the network.
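A minimal sketch of how such a "memory" data point for the detector of Section 3.2.2 could be assembled is shown below; it is our own illustration and the variable names are not taken from the paper's code.

    import numpy as np

    def memory_data_point(liquid_states, signal_point, window=20, n_outputs=170):
        """Build one detector input with memory.

        liquid_states : array of shape (T, n_outputs) holding the recorded
                        activity of the liquid's output units at each time step
        signal_point  : time step at which the detector samples the liquid
        Returns a flat vector of window * n_outputs values (20 x 170 = 3400).
        """
        window_slice = liquid_states[signal_point:signal_point + window, :n_outputs]
        return window_slice.reshape(-1)

    def memory_dataset(liquid_states, first_point=0, interval=50, window=20):
        """Collect data points at signal points spaced `interval` steps apart,
        so that the sampled windows do not overlap."""
        last = liquid_states.shape[0] - window
        points = range(first_point, last + 1, interval)
        return np.stack([memory_data_point(liquid_states, p, window) for p in points])

Each such 3400-dimensional vector is then a single training or test example for the back-propagation detector.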

3.3. Third experiments: changing the architecture

Our next approach, and ultimately the successful one, was to experiment with different architectures. The underlying intuition is that the recurrent nature of the liquid results in feedback of information, making the network dynamics too sensitive to changes in the network. Since one can look at "damages" as instantaneous changes in the architecture, it seems reasonable to design architectures that can somehow "filter" out minor changes.

The liquids were varied in their topologies in the following ways:

1. Random connectivity. Each neuron in the network is connected to 20% of the other neurons in a random fashion. (i) In the original Maass topology the connections are chosen with a larger bias for nearby neurons (see Maass, 2002; Maass et al., 2002a; Maass, Natschläger, & Markram, 2002c). This is the literature standard and is what is usually meant by LSM. (ii) We also tested a network without such bias, i.e. the connections are chosen to 20% of the other neurons randomly and uniformly. The results presented below showed that these architectures are not robust.
2. Reducing the connectivity to 10% and 5% in the above arrangement. The intuition for this was that with lower connectivity, the feedback should be reduced. The results presented below show that this intuition is faulty and that these networks are even less robust than the above (see Tables 1, 2, 5 and 6).
3. Implementation of "hub" topologies in either input connectivity or output connectivity. The intuition here is that the relative rarity of "hubs" results in their damage being a very rare event. But when they are not damaged, they receive information from many sources and can thus filter out the damage, thus alleviating the feedback in the input case. In the output hub case, the existence of many hubs should allow the individual neurons to filter out noise.

The construction of hubs was done in various fashions:

a. Hand design of a network with one hub for input. See Appendix A for a full description of this design.
b. Small world topologies. Since small world topologies follow power law connectivity, they produce hubs. On the other hand, such topologies are thought to emerge in a "natural" fashion (Albert & Barabási, 2000; Barabási, 2000; Barabási & Albert, 1999; Varshney et al., 2011) and appear in real neuronal systems (Albert & Barabási, 2000; Bassett & Bullmore, 2006); see Fig. 3. Note, however, that in our context there are two directions in which to measure the power law: the input and output connectivity histograms of the neurons. We checked the following variants:

i. Input connectivity is power law. That is, we assign a link from a uniformly randomly chosen neuron to a second neuron chosen randomly according to a power law. In this case the input connectivity follows a power law, while the output connectivity follows a Gaussian distribution.
ii. Output connectivity is power law. That is, we reverse the above. In this case the input connectivity is Gaussian while the output connectivity is power law.
iii. Replacing "Gaussian" with "uniform" in case (i) above.
iv. Replacing "Gaussian" with "uniform" in case (ii) above.
v. We also tried choosing a symmetric network with power law connectivity (i.e. for both input and output). Note that in this case the same neurons served as "hubs" both for input and output.
vi. Finally, we designed an algorithm to allow distinct input and output power law connectivity. In this case the hubs in the two directions are distinct. Algorithms 1 and 2 below accomplish this task.

Table 1
Five percent uniform random connectivity without memory input to the detector.(a)

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     55%     53%    52%    51%    49%
Noisy neurons     100%     63%     54%    55%    51%    50%
Dead and noisy    100%     55%     52%    52%    50%    50%
Generalization    100%     93%     88%    80%    75%    78%

(a) For all the tables shown in this paper, 50% is the baseline of random classification.

Table 2
Ten percent uniform random connectivity without memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     56%     53%    51%    51%    49%
Noisy neurons     100%     73%     58%    54%    51%    52%
Dead and noisy    100%     59%     54%    52%    52%    51%
Generalization    100%    100%     93%    88%    83%    81%

Algorithm 1. Generate random numbers between a min and a max value with a power-law distribution.
Input: min, max, size, how_many_numbers; counter_array; array (initialized with the values min..max); magnify = 5

for i = 1 to how_many_numbers
    index = random(array.start, array.end)
    candidate = array[index]
    end_array = array.end
    AddCells(array, magnify)              // grow the array by magnify cells
    for t = 0 to magnify
        array[end_array + t] = candidate  // the chosen value becomes more likely
    end for
    shuffle(array)
    output_array[i] = candidate
    counter_array[candidate] = counter_array[candidate] + 1
end for
shuffle(counter_array)
Output: output_array, counter_array

Algorithm 2. Create the connectivity matrix for the liquid network using Algorithm 1.
Input: weight_matrix

use Algorithm 1 to create (array_list, counter_array)
counter = 0
for i = 1 to counter_array.length
    for t = 1 to counter_array[i]
        weight_matrix[i, array_list[counter]] = true
        counter = counter + 1
    end for
end for

One problem with the various algorithms for designing power law connectivity is that under a "fair" sampling, the network might not be connected. This means that such a network actually has a lower, effective connectivity. We decided to eliminate this problem by randomly connecting the disconnected components (either from an input or output perspective) to another neuron chosen randomly but proportionally to the connectivity. (This does not guarantee connectivity of the graph, but makes disconnection unlikely, so that the effective connectivity is not substantially affected.)
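For illustration, the following Python sketch produces a liquid connectivity matrix in the spirit of Algorithms 1 and 2, i.e. with distinct power-law "hubs" for input and output connections. It is a simplified stand-in (using independently drawn Zipf degree sequences) rather than the exact procedure above, and all names and parameter values are ours.

    import numpy as np

    def power_law_degrees(n, total_edges, exponent=2.0, rng=None):
        """Degree sequence whose histogram is roughly power law and whose
        sum is approximately total_edges."""
        if rng is None:
            rng = np.random.default_rng()
        raw = rng.zipf(exponent, size=n).astype(float)
        return np.maximum(1, np.round(raw / raw.sum() * total_edges)).astype(int)

    def double_power_law_matrix(n=243, connectivity=0.20, seed=0):
        """Directed 0/1 connectivity matrix in which both out-degrees and
        in-degrees follow (approximately) power laws; the input hubs and the
        output hubs are generally different neurons because the two degree
        sequences are drawn independently."""
        rng = np.random.default_rng(seed)
        total_edges = int(connectivity * n * n)
        out_deg = power_law_degrees(n, total_edges, rng=rng)
        in_weight = power_law_degrees(n, total_edges, rng=rng).astype(float)
        p = in_weight / in_weight.sum()      # preference for "input hub" targets
        w = np.zeros((n, n), dtype=bool)
        for i in range(n):
            k = min(int(out_deg[i]), n - 1)
            targets = rng.choice(n, size=k, replace=False, p=p)
            w[i, targets] = True
            w[i, i] = False                  # no self-connections
        return w

Isolated neurons (with zero in- or out-degree) would still need the repair step described above, i.e. reconnecting them to a neuron chosen randomly but proportionally to the connectivity; and the input versus output degree histograms of such a matrix (np.bincount(w.sum(axis=0)) versus np.bincount(w.sum(axis=1))) are the kind of distributions compared in Figs. 3 and 11.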

4. Results

4.1. First experiments: LSM is not robust

First, there was not much difference between the detectors, so eventually we restricted ourselves to the back-propagation detector. (Note that none of the units of the liquid accessed by the detectors were allowed to be input neurons of the liquid.) It turned out that while the detector is able to learn the randomly chosen test classes successfully, if there is sufficient average connectivity (e.g. 20%), almost any kind of damage caused the detector to have a very substantial decay in its detecting ability (see Table 3). Note that even with lower connectivity, which has less feedback, the same phenomenon occurs. See Table 1 (5% connectivity) and Table 2 (10% connectivity).

When the network is connected randomly but with a bias for geometric closeness as in Maass' distribution, the network is still very sensitive (although a bit less so). Compare Table 4 to Table 3.

After our later experiments, we returned to this point (see concluding remarks, below). In Fig. 4 we illustrate the difference in reaction of the network by a raster (ISI) display. Note that with 10% damage, it is quite evident to the eye that the network diverges dramatically from the noise-free situation. In Tables 1–4 one can see this as well with 5% noise for purely random connectivity. Actually, with low degrees of damage the detectors, even under the Maass connectivity (see Table 4), show dramatic decay in recognition, although not to the extremes of random connectivity. These results (see Tables 1–4) were robust and repeatable under many trials and variants.

Accordingly, we conclude that the LSM, either as purely defined with random connectivity, or as implemented in Maass et al. (2002a), cannot serve as a biologically relevant model.

4.2. Second experiments: varying the neurons and allowing the detectors to have memory

4.2.1. Variants of neurons (history dependent refractory period and Izhikevich)
The results under these variants were qualitatively the same as for the standard integrate and fire neuron. (The Izhikevich model produces much more dense activity in the network and thus the detector was harder to train, but in the end the network was trainable and the results under damage were very similar.) Accordingly, we report only results with the standard integrate and fire neuron as appears, e.g., in Maass' work.

4.2.2. Detectors with memory input
The "detectors" in our experiments were either three level neural networks trained by back-propagation, the Tempotron (Gutig & Sompolinsky, 2006), or a simple Adaline detector (Widrow & Hoff, 1960). Training for classification could be performed in the damage-less environment successfully with any of these detectors. We exhaustively ran tests on these possibilities, including damage degree and kinds and detector types.

Tables 5–8 show the results with different uniform connectivity in the liquid when there is memory input to the detector. Table 8 shows a similar result (as in Table 4) for the Maass connectivity with memory input to the detector. Histographs of sample results with 5% and 10% damage for the neural network detectors are presented in Figs. 5–13. (Since the results for the other detectors were similar, we did not run as many tests on them.) Note that Figs. 5–13 refer to the various kinds of hub architectures with the memory in the detector.

Table 3
Twenty percent uniform random connectivity without memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons       99%     60%     53%    51%    51%    50%
Noisy neurons      99%     86%     65%    58%    52%    50%
Dead and noisy     99%     65%     55%    53%    50%    51%
Generalization     99%    100%     97%    94%    87%    84%

Table 4
Twenty percent connectivity under Maass's distribution preferring local connections.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons       90%     60%     52%    51%    50%    50%
Noisy neurons      90%     78%     57%    52%    52%    52%
Dead and noisy     90%     54%     52%    53%    50%    50%
Generalization     90%     96%     93%    93%    84%    84%

Table 5
Five percent uniform random connectivity with memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     55%     53%    53%    51%    50%
Noisy neurons     100%     63%     54%    54%    53%    51%
Dead and noisy    100%     56%     53%    52%    51%    51%
Generalization    100%     93%     87%    80%    75%    79%

Table 6
Ten percent uniform random connectivity with memory input to the detector.(a)

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     58%     55%    53%    49%    50%
Noisy neurons     100%     74%     59%    57%    54%    50%
Dead and noisy    100%     61%     54%    55%    50%    50%
Generalization    100%     96%     92%    85%    82%    82%

(a) For all the tables shown in this paper, 50% is the baseline of random classification.

Table 7
Twenty percent uniform random connectivity with memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     63%     55%    52%    50%    50%
Noisy neurons     100%     87%     67%    61%    54%    52%
Dead and noisy    100%     68%     57%    52%    50%    49%
Generalization    100%     98%     97%    95%    89%    86%

Table 8
Maass's distribution as in Table 4 but with memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     61%     53%    49%    49%    50%
Noisy neurons     100%     79%     60%    55%    51%    49%
Dead and noisy    100%     64%     55%    52%    51%    52%
Generalization    100%    100%     96%    93%    84%    85%

Fig. 5. Histographs of correctness results in LSM networks with 20 time interval input, different amounts of "dead" neuron damage, average connectivity of 20% with a uniform random distribution on the connections.

Fig. 6. Histographs of correctness results in LSM networks with 20 time interval input, different amounts of "noise generator" neuron damage, average connectivity of 20% with a uniform random distribution on the connections.

In all of these tests, following Maass, we assumed that approximately 20% of the neurons of the liquid were of the inhibitory type. The architecture of the neural network detector was 204 input neurons (which were never taken from the neurons in the LSM which were also used as inputs to the LSM, times 30 time steps), 3 hidden level neurons and one neuron for the output. For 20% connections, the Maass et al. architecture without memory in the detector as presented in Table 4 can be compared with a uniform randomly connected network of 20% average connectivity without memory in the detector in Table 3, and can be compared as well with the Maass topology with memory in Table 8 and with uniform random connectivity of 20% with memory in Table 7. Note that Table 1 can also be compared to Table 5, and Table 2 can be compared to Table 6. Since this paper is about robustness, Figs. 5 and 6 present the full distribution of the experiments of Table 7 under these conditions with different degrees of damage. Note that with damage over 1%, the histogram deteriorates dramatically.

The bottom line of all these comparisons is that decreasing connectivity and adding memory to the detector slightly increase the robustness performance with low amounts of damage, but even with low amounts of damage and under all our variants of random connectivity, the networks would fail. That is, the trained but damaged network's loss of function was very substantial and in many cases it could not perform substantially differently from random classification (see Figs. 5 and 6).

4.3. Third experiments: varying the architecture of the network

4.3.1. Hand chosen one-hub topology
Since the Maass et al. topology and the uniform random distribution topology showed a high level of vulnerability to any small amount of damage, and since adding memory to the detector helped only marginally to recover from damage in the liquid, we started to create different topologies to test the robustness of the liquid with the same parameters as those set by Maass et al. (see Jaeger, 2001a; Lukosevicius & Jaeger, 2009; Maass, 2002; Maass et al., 2002a, 2002c, 2002d; Natschläger, Maass, & Markram, 2002a; Natschläger, Markram, & Maass, 2002b). One of those topologies is the hub topology that is described in detail in Appendix A. In this case, one can see from Table 9 and Figs. 7 and 8 that the robustness was substantially increased. However, under the construction as presented in Appendix A, substantial disconnected components can appear in the liquid. Moreover, in results not presented in this paper, the signal has weaker persistence, i.e. detectors are able to recognize the signals only in a substantially smaller time window.

Fig. 7. Histographs of correctness results in LSM networks with one hub distribution with different amounts of "noise generator" neuron damage.

Fig. 8. Histographs of correctness results in LSM networks with different amounts of "dead" neuron damage with one hub distribution.

Table 9
One hub network with memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     95%     88%    85%    76%    67%
Noisy neurons     100%     97%     91%    86%    70%    62%
Dead and noisy    100%     96%     89%    86%    75%    68%
Generalization    100%    100%     97%    97%    96%    95%

4.3.2. Small world topologies
The general connectivity in the human brain has been held to have some small world properties (Achard et al., 2006). Algorithm 1 is designed to obtain a hub topology using a more "natural" algorithm, thus creating a topology that will be robust to damage in the liquid and be more "natural" in its construction. One of the properties of the small world is the power-law distribution.

Table 10
Small world with a power-law distribution with memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons      100%     55%     51%    51%    50%    51%
Noisy neurons     100%     79%     58%    53%    50%    51%
Dead and noisy    100%     58%     51%    50%    48%    50%
Generalization    100%    100%     97%    93%    90%    89%

The results of the small world with a power-law distribution (see Table 10, Figs. 9 and 10) were, however, very similar to the Maass topology and to the uniform random topology in their robustness to damage in the liquid; on the other hand, they had improved generalization capability (see Table 10).

Looking closer at the distribution, as can be seen from Fig. 3, Algorithm 1 actually creates a power-law distribution in terms of total connections; but when we separate the connections into input and output connections, we see that while the output has a power-law distribution, the input connections have a roughly uniform random distribution.

Fig. 9. Histographs of correctness results in LSM networks with different amounts of "dead" neuron damage with small world topology obtained with a power-law distribution.

Fig. 10. Histographs of correctness results in LSM networks with different amounts of "noise generator" neuron damage for small world topology obtained with a power-law distribution.

4.3.3. Small world topologies with double power-law distribution
Accordingly, using Algorithms 1 and 2 we created a double power-law distribution (using the reverse order for input connections and output connections, as in Fig. 11). The robustness and the generalization ability were much improved. The best results were with a double power law where the distributions are over distinct neurons, and these are the results presented here in Tables 11 and 12 and Figs. 12 and 13.

Fig. 11. Connection distribution of small-world with double power-law.

Fig. 12. Histographs of correctness results in LSM networks with different amounts of "dead" neuron damage with small world topology obtained with a double power-law distribution.

Fig. 13. Histographs of correctness results in LSM networks with different amounts of "noise generator" neuron damage for small world topology obtained with a double power-law distribution.

Table 11
Small world with a double power-law distribution with memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons       96%     95%     87%    83%    74%    69%
Noisy neurons      96%     99%     93%    88%    72%    64%
Dead and noisy     96%     97%     89%    84%    70%    66%
Generalization     96%     99%     99%    98%    97%    97%

Table 12
Small world with a double power-law distribution without memory input to the detector.

Damage            None    0.1%    0.5%    1%     5%     10%
Dead neurons       62%     83%     67%    61%    56%    53%
Noisy neurons      62%     91%     75%    66%    54%    55%
Dead and noisy     62%     86%     69%    65%    52%    55%
Generalization     62%    100%     96%    95%    93%    91%

5. Discussion

In this work, we looked at the robustness of the LSM paradigm and, by experimenting with temporal sequences, showed that the basic structural set up in the literature is not robust to two kinds of damage, even at small levels of damage.

We also investigated this for various degrees of connectivity. While lowering the average degree of connectivity resulted in decreased sensitivity in all architectures to some extent, the bottom line is that decreased connectivity is ineffective. In addition, it became evident that lowering the connectivity also decreases the strength the network has in representability and, importantly, in the persistence of the signal. (That is, a low degree of connectivity causes the activity to die down quickly because of the lack of feedback. Thus the network is bounded in time and cannot recognize an "older" input signal.) Thus we see, as is to be expected from the analysis in Jaeger (2001a, 2001b, 2002) and Maass et al. (2002a), that a higher connectivity gives a larger set of "filters" that separate signals, but on the other hand makes the network more sensitive to changes.

Fig. 14. A graphical summary of the results presented in this paper. The "standard" LSM topologies, either uniform or as in Maass's original papers, are not robust; but small world topologies show an improvement, which is most marked in the case of a two-way power law distribution.

In any case, even with low connectivities, the random topology was not robust; nor was the Maass topology. (While not at random levels of identification, as we have seen, e.g. in Tables 1 and 2, it suffered very substantial decays with even small amounts of damage. In addition, other experiments (not shown here) with connectivities below 15–20% show that the networks do not maintain the trace for very long.)

We also investigated some variants in the kinds of neurons. It seems that the LSM (or "reservoir computing") concept does not change much vis-à-vis robustness to internal noise based on these choices.

We did see substantial improvement when supplying a window of time input to the detector rather than an instant of time. However, alone this was not sufficient.

The major effect was changing the topology of connectivity to accommodate the idea of hubs, power law and small world connectivity. Under these topologies, with the best result occurring when we have a power law histogram of both input and output connectivity to the neurons, with separate neurons as hubs in the two directions, the liquids are robust to damages.

6. Conclusions

We have shown experimentally that the basic LSM is not robust to "damages" in its underlying neurons and thus, without elaboration, cannot be seen to be a good fit as a model for biological computation. (We mention (data not shown here) that this result holds even if training is continued while the network is suffering damage.)

However, choosing certain power law topologies for the connectivity can result in more robust maintenance of the pertinent information over time. A graphical summary of the results for robustness under different topologies is given in Fig. 14.

In the papers (Bassett & Bullmore, 2006; Varshney et al., 2011), a distribution was chosen for biological reasons to allow a preference for close neurons. This distribution is superior to the totally random one, but is still not sufficiently robust. Choosing a power law distribution and being careful to make the assignments different for in and out connectivity proved to be the best. Since this is thought of as a potentially biological arrangement (Barabási & Albert, 1999; Bassett & Bullmore, 2006), LSM style networks with this additional topological constraint can, as of this date, be considered sufficiently biological. Other distributions may also work.

Acknowledgements

We want to acknowledge the distinction and support provided by the Sociedad Mexicana de Inteligencia Artificial (SMIA) and the 9th Mexican International Conference on Artificial Intelligence (MICAI-2010) in order to enhance, improve, and publish this work.

We thank the Caesarea Rothschild Institute for support of this research. The first author thanks Prof. Alek Vainstein for support in the form of a research fellowship. A short version of this work was presented at the MICAI-2010 meeting (Manevitz & Hazan, 2010), whom we thank for inviting us to write this extended version. We also thank the Maass laboratory for the public use of their code.

Appendix A

The one-hub architecture was constructed as follows:

• Divide all the neurons (240) into groups; the size of each group is randomly chosen between 3 and 6 neurons. Each neuron in a group connects to 2 of its neighbors in the same group.
• Choose 1/4 of the groups to be hubs; the rest of the groups we will call the base.
• For 20% connectivity (that is, 11,472 connections), 90% of the connections are from the base groups to the hub groups, 7% are from the hub groups to the base groups, and 3% are connections between the hub groups. To accomplish that:
  – Choose (10,324 times) a random neuron from the base groups and connect it with a randomly chosen neuron from a hub group.
  – Randomly choose (803 times) a neuron from a hub group and connect it to a randomly chosen neuron from the base neurons.
  – Connect (345 times) a randomly chosen hub neuron to another neuron from a different hub group (see Fig. A1).

Fig. A1. Hub topology.

References

Achard, S., Salvador, R., Whitcher, B., Suckling, J., & Bullmore, E. (2006). A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. The Journal of Neuroscience, 26(1), 63–72. doi:10.1523/JNEUROSCI.3874-05.2006.
Albert, R., & Barabási, A.-L. (2000). Topology of evolving networks: Local events and universality. Physical Review Letters, 85(24), 5234–5237.
Barabási, A.-L. (2000). Competition and multiscaling in evolving networks. arXiv:cond-mat/0011029.
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512. doi:10.1126/science.286.5439.509.
Bassett, D. S., & Bullmore, E. (2006). Small-world brain networks. The Neuroscientist, 12(6), 512–523. doi:10.1177/1073858406293182.
Fern, C., & Sojakka, S. (n.d.). Pattern recognition in a bucket.
Gutig, R., & Sompolinsky, H. (2006). The tempotron: A neuron that learns spike timing-based decisions. Nature Neuroscience, 9(3), 420–428. doi:10.1038/nn1643.
Hazan, H., & Manevitz, L. M. (2010). The liquid state machine is not robust to problems in its components but topological constraints can restore robustness. In IJCCI (ICFC-ICNC) (pp. 258–264).
Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6), 1569–1572. doi:10.1109/TNN.2003.820440.
Jaeger, H. (2001a). The "echo state" approach to analysing and training recurrent neural networks (GMD Report 148). German National Research Center for Information Technology.
Jaeger, H. (2001b). Short term memory in echo state networks (GMD Report 152). German National Research Center for Information Technology.
Jaeger, H. (2002). Adaptive nonlinear system identification with echo state networks.
Lukosevicius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149. doi:10.1016/j.cosrev.2009.03.005.
Maass, W. (2002). Paradigms for computing with spiking neurons. In J. L. van Hemmen, J. D. Cowan, & E. Domany (Eds.), Models of neural networks. Early vision and attention (Vol. 4, pp. 373–402). New York: Springer.
Maass, W., Legenstein, R. A., & Markram, H. (2002). A new approach towards vision suggested by biologically realistic neural microcircuit models. In Proceedings of the 2nd workshop on biologically motivated computer vision. Lecture notes in computer science. Springer.
Maass, W., & Markram, H. (2002). Temporal integration in recurrent microcircuits. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (2nd ed.). Cambridge: MIT Press.
Maass, W., & Markram, H. (2004). On the computational power of circuits of spiking neurons. Journal of Computer and System Sciences, 69(4), 593–616. doi:10.1016/j.jcss.2004.04.001.
Maass, W., Natschläger, T., & Markram, H. (2002a). Computational models for generic cortical microcircuits. In J. Feng (Ed.), Computational neuroscience: A comprehensive approach. CRC Press.
Maass, W., Natschläger, T., & Markram, H. (2002b). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531–2560.
Maass, W., Natschläger, T., & Markram, H. (2002c). A model for real-time computation in generic neural microcircuits. In Proceedings of NIPS 2002 (Vol. 15, pp. 229–236).
Maass, W., Natschläger, T., & Markram, H. (2002d). A fresh look at real-time computation in generic recurrent neural circuits. Tech. Report, Institute for Theoretical Computer Science, TU Graz, Graz, Austria.
Manevitz, L., & Hazan, H. (2010). Stability and topology in reservoir computing. In G. Sidorov, A. Hernández Aguirre, & C. Reyes García (Eds.), Advances in soft computing. Lecture notes in computer science (Vol. 6438, pp. 245–256). Berlin/Heidelberg: Springer. doi:10.1007/978-3-642-16773-7_21.
Manevitz, L. M., & Marom, S. (2002). Modeling the process of rate selection in neuronal activity. Journal of Theoretical Biology, 216(3), 337–343.
Natschläger, T., Maass, W., & Markram, H. (2002). The "Liquid Computer": A novel strategy for real-time computing on time series. Special Issue on Foundations of Information Processing of TELEMATIK, 8(1), 39–43.
Natschläger, T., Markram, H., & Maass, W. (2002). Computer models and analysis tools for neural microcircuits. In R. Kötter (Ed.), A practical guide to neuroscience databases and associated tools. Boston: Kluwer Academic Publishers.
Pitts, W., & McCulloch, W. S. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 52(1–2), 99–115; discussion 73–97.
Varshney, L. R., Chen, B. L., Paniagua, E., Hall, D. H., & Chklovskii, D. B. (2011). Structural properties of the Caenorhabditis elegans neuronal network. PLoS Computational Biology, 7(2), e1001066. doi:10.1371/journal.pcbi.1001066.
Widrow, B., & Hoff, M. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, Part 4 (pp. 96–104).
