STDP-based Neural Networks

1 Tehran, Iran
2 CerCo UMR 5549, CNRS – Université Toulouse 3, France
poral cortex (IT), where neural activity provides a robust, invariant, and linearly-separable object representation [10, 9]. Despite the extensive feedback connections in the visual cortex, the first feed-forward wave of spikes in IT (~100-150 ms post-stimulus presentation) appears to be sufficient for crude object recognition [54, 19, 33].

During the last decades, various computational models have been proposed to mimic this hierarchical feed-forward processing [15, 28, 48, 36, 31]. Despite the limited success of the early models [42, 16], recent advances in deep convolutional neural networks (DCNNs) have led to high-performing models [27, 58, 51]. Beyond their high precision, DCNNs can tolerate object variations as humans do [24, 25], use IT-like object representations [5, 22], and match the spatio-temporal dynamics of the ventral visual pathway [7].

Although the architecture of DCNNs is somewhat inspired by the primate visual system [29] (a hierarchy of computational layers with gradually increasing receptive fields), they totally neglect the actual neural processing and learning mechanisms of the cortex.

The computing units of DCNNs send floating-point values to each other corresponding to their activation level, whereas biological neurons communicate by sending electrical impulses (i.e., spikes). The amplitude and duration of all spikes are almost the same, so spikes are fully characterized by their emission times. Interestingly, mean spike rates are very low in the primate visual system (perhaps only a few hertz [50]). Hence, neurons appear to fire a spike only when they have an important message to send, and some information can be encoded in their spike times. Such spike-time coding leads to fast and extremely energy-efficient neural computation in the brain (the whole human brain consumes only about 10-20 watts [34]).

The current top-performing DCNNs are trained with the supervised back-propagation algorithm, which has no biological root. Although it works well in terms of accuracy, its convergence is rather slow because of the credit assignment problem [45]. Furthermore, given that DCNNs typically have millions of free parameters, millions of labeled examples are needed to avoid over-fitting. However, primates, especially humans, can learn from far fewer examples, while most of the time no labels are available. They may be able to do so thanks to spike-timing-dependent plasticity (STDP), an unsupervised learning mechanism that occurs in the mammalian visual cortex [38, 18, 37]. According to STDP, synapses through which a presynaptic spike arrived before (respectively after) a postsynaptic one are reinforced (respectively depressed).

To date, various spiking neural networks (SNNs) have been proposed to solve object recognition tasks. One group of these networks consists of converted versions of traditional DCNNs [6, 20, 13]. The main idea is to replace each DCNN computing unit with a spiking neuron whose firing rate is correlated with the output of that unit; the aim of these networks is to reduce the energy consumption of DCNNs. However, the inevitable drawbacks of such spike-rate coding are the need for many spikes per image and long processing times. Besides, the use of the back-propagation learning algorithm, and a single neuron having both positive (excitatory) and negative (inhibitory) output synapses, are not biologically plausible. On the other hand, there are SNNs which are spiking networks from the outset and learn spike patterns. The first group of these networks exploits learning methods such as autoencoders [40, 4] and back-propagation [1], which are not biologically plausible. The second group consists of SNNs with bio-inspired learning rules, but these have shallow architectures [3, 17, 44, 59, 11] or only one trainable layer [36, 2, 23].

In this paper, we propose an STDP-based spiking deep neural network (SDNN) with spike-time neural coding. The network comprises a temporal-coding layer followed by a cascade of consecutive convolutional (feature extractor) and pooling layers. The first layer converts the input image into an asynchronous spike train in which the visual information is encoded in the temporal order of the spikes. Neurons in convolutional layers integrate input spikes and emit a spike right after reaching their threshold. These layers are equipped with STDP to learn visual features. Pooling layers provide translation invariance and also compact the visual information [48].
Through the network, visual features get larger and more complex, such that neurons in the last convolutional layer learn and detect object prototypes. At the end, a classifier detects the category of the input image based on the activity of neurons in the last pooling layer, which have global receptive fields.

We evaluated the proposed SDNN on the Caltech face/motorbike and ETH-80 datasets, with large-scale images of various objects taken from different viewpoints. The proposed SDNN reached accuracies of 99.1% on the face/motorbike task and 82.8% on ETH-80, which indicates its capability to recognize several natural objects even under severe variations. To the best of our knowledge, there is no other spiking deep network which can recognize large-scale natural objects. We also examined the proposed SDNN on the MNIST dataset, which is a benchmark for spiking neural networks, and it reached 98.4% recognition accuracy. In addition to this high performance, the proposed SDNN is highly energy-efficient and works with a small number of spikes per image, which makes it suitable for neuromorphic hardware implementation. Although current state-of-the-art DCNNs have achieved stunning results on various recognition tasks, continued work on brain-inspired models could culminate in strong intelligent systems, and may even help us improve our understanding of the brain itself.

Proposed Spiking Deep Neural Network

A sample architecture of the proposed SDNN with three convolutional and three pooling layers is shown in Fig. 1. Note that the architectural properties (e.g., the number of layers and receptive field sizes) and the learning parameters should be optimized for the desired recognition task.

The first layer of the network uses Difference of Gaussians (DoG) filters to detect contrasts in the input image. It encodes the strength of these contrasts in the latencies of its output spikes (the higher the contrast, the shorter the latency). Neurons in convolutional layers detect more complex features by integrating input spikes from the previous layer, which detects simpler visual features. Convolutional neurons emit a spike as soon as they detect their preferred visual feature, which depends on their input synaptic weights. Through the learning, neurons that fire earlier perform the STDP and prevent the others from firing via a winner-take-all mechanism. In this way, more salient and frequent features tend to be learned by the network. Pooling layers provide translation invariance using a maximum operation, and also help the network to compress the flow of visual data. Neurons in pooling layers propagate the first spike received from neighboring neurons in the previous layer which are selective to the same feature. Convolutional and pooling layers are arranged in consecutive order. Receptive fields gradually increase through the network, and neurons in higher layers become selective to complex objects or object parts.

It should be noted that the internal potentials of all neurons are reset to zero before processing the next image. Also, learning only happens in convolutional layers, and it is done layer by layer. Since the calculations of each neuron are independent of those of other adjacent neurons, the convolution, pooling, and STDP operations are each performed in parallel on a GPU to speed up the computations.
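The first-spike propagation performed by pooling neurons, described above, reduces to a minimum operation over the spike times in the input window. The fragment below is a minimal NumPy sketch of that idea under our own assumptions (function and variable names are ours; spike times are held in a 2D array per neuronal map, with np.inf marking neurons that never fired):

    import numpy as np

    def first_spike_pooling(spike_times, window, stride):
        """Illustrative pooling over spike times: each pooling neuron
        propagates the earliest spike (minimum spike time) received from
        its input window in the corresponding map of the previous layer.
        `spike_times` is a 2D array of firing times, np.inf = no spike."""
        h, w = spike_times.shape
        out_h = (h - window) // stride + 1
        out_w = (w - window) // stride + 1
        out = np.full((out_h, out_w), np.inf)
        for i in range(out_h):
            for j in range(out_w):
                patch = spike_times[i * stride:i * stride + window,
                                    j * stride:j * stride + window]
                out[i, j] = patch.min()  # earliest spike wins
        return out

Taking the minimum over spike times is equivalent to the maximum operation over neural activity mentioned above, since earlier spikes encode stronger responses.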
DoG and temporal coding

The important role of the first stage in SNNs is to encode the input signal into discrete spike events in the temporal domain. This temporal coding determines the content and the amount of information carried by each spike, which deeply affects the neural computations in the network. Hence, using an efficient coding scheme in SNNs can lead to fast and accurate responses. Various temporal coding schemes can be used in visual processing (see ref. [53]). Among them, rank-order coding has been shown to be efficient for rapid processing (even, possibly, in retinal ganglion cells) [55, 43].
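To make the rank-order idea concrete, the sketch below applies a DoG filter to an image and converts contrast strengths into spike times, with stronger contrasts assigned to earlier time steps. It is an illustration under our own assumptions (the sigma values, the zero threshold, the ON-center-only restriction, and the even division of ranks over time steps are placeholders, not the authors' exact procedure):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_encode(image, sigma1=1.0, sigma2=2.0, n_steps=30, threshold=0.0):
        """Encode contrast strength in spike latencies: the higher the
        DoG contrast at a pixel, the earlier the emitted spike.
        `image` is a 2D float array; returns a spike time per pixel."""
        dog = gaussian_filter(image, sigma1) - gaussian_filter(image, sigma2)
        contrast = np.maximum(dog, 0.0)          # ON-center responses only
        spike_times = np.full(image.shape, np.inf)
        active = contrast > threshold
        # Rank pixels by contrast; split ranks evenly over the time steps.
        order = np.argsort(-contrast[active])
        ranks = np.empty_like(order)
        ranks[order] = np.arange(order.size)
        spike_times[active] = ranks * n_steps // max(order.size, 1)
        return spike_times                        # time step of each spike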
Cells in the first layer of the network apply a DoG filter over their receptive fields to detect positive or negative contrasts in the input image. The DoG filter well approximates the center-surround properties of the ganglion cells of the retina. When presented with an image, these DoG cells detect the contrasts and emit spikes whose latencies encode the contrast strength, as described above.
Figure 1: A sample architecture of the proposed SDNN with three convolutional and three pooling layers. The first layer applies ON- and OFF-center DoG filters of size w_1^D × w_2^D on the input image and encodes the image contrasts in the timing of the output spikes. The ith convolutional layer, Conv i, learns combinations of the features extracted in the previous layer. The ith pooling layer, Pool i, provides translation invariance for the features extracted in the previous layer and compresses the visual information using a local maximum operation. Finally, the classifier detects the object category based on the feature values computed by the global pooling layer. The window sizes of the ith convolutional and pooling layers are indicated by w_{1,2}^{c_i} and w_{1,2}^{p_i}, respectively, and the number of neuronal maps of the ith convolutional and pooling layers is indicated by n_i below each layer.
The internal potential of the ith neuron is updated as follows:

    V_i(t) = V_i(t-1) + \sum_j W_{j,i} S_j(t-1),    (1)

where V_i(t) is the internal potential of the ith convolutional neuron at time step t, W_{j,i} is the synaptic weight between the jth presynaptic neuron and the ith convolutional neuron, and S_j is the spike train of the jth presynaptic neuron (S_j(t-1) = 1 if the neuron has fired at time t-1, and S_j(t-1) = 0 otherwise). If V_i exceeds its threshold, V_thr, the neuron emits a spike and V_i is reset.
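For concreteness, one time step of Eq. (1), together with the threshold test, can be sketched as follows. This is a minimal NumPy fragment with hypothetical names; the reset-to-zero is our assumption (based on the statement that potentials are reset to zero), and the convolutional weight sharing is abstracted into a plain weight matrix:

    import numpy as np

    def update_potentials(V, W, S_prev, v_thr):
        """One step of Eq. (1) for non-leaky integrate-and-fire neurons.
        V: potentials (n_post,); W: weights (n_pre, n_post);
        S_prev: 0/1 presynaptic spikes emitted at time t-1."""
        V = V + W.T @ S_prev   # V_i(t) = V_i(t-1) + sum_j W_ji * S_j(t-1)
        fired = V >= v_thr     # emit a spike upon crossing the threshold
        V[fired] = 0.0         # reset (our assumption: back to zero)
        return V, fired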
Another important role of pooling layers is to compress the visual information. Given the maximum operation performed in pooling layers, adjacent neurons with overlapping inputs would carry redundant information (each spike is sent to many neighboring pooling neurons). Hence, in the proposed network, the overlap between the input windows of two adjacent pooling neurons (belonging to the same map) is set to be very small. This helps to compress the visual information by eliminating redundancies, and also reduces the size of the subsequent layers.
tains all synapses in an excitatory mode, in addition to implementing a soft-bound effect.

Note that choosing large values for the learning parameters (i.e., a+ and a-) will decrease the learning memory; neurons would then learn the last presented images and unlearn previously seen ones. Conversely, choosing tiny values would slow down the learning process. At the beginning of the learning, when the synaptic weights are random, neurons are not yet selective to any specific pattern and respond to many different patterns; therefore, the probability for a synapse to get depressed is higher than that of being potentiated. Hence, if a- is set greater than a+, the synaptic weights gradually decay, to the point where neurons cannot reach their threshold and fire anymore. It is therefore better for a+ to be greater than a-. However, if a+ is set much greater than a-, neurons will tend to learn more than one pattern and respond to all of them. All in all, it is better to choose a+ and a- neither too big nor too small, and to set a+ a bit greater than a-.
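The paragraph above constrains only the signs and relative sizes of a+ and a-. A simplified STDP update consistent with it, and with the soft-bound remark at the start of this section, multiplies the learning rate by w(1 - w) so that weights stay in [0, 1]. The sketch below is our reading of the text, not a verbatim reproduction of the paper's rule:

    import numpy as np

    def stdp_update(w, pre_times, post_time, a_plus=0.004, a_minus=0.003):
        """Simplified STDP sketch: synapses whose presynaptic spike arrived
        at or before the postsynaptic spike are potentiated, the others are
        depressed. The w * (1 - w) factor implements the soft bound that
        keeps weights between 0 and 1 and drives them toward 0 or 1."""
        ltp = pre_times <= post_time              # causal synapses
        dw = np.where(ltp, a_plus, -a_minus) * w * (1.0 - w)
        return np.clip(w + dw, 0.0, 1.0)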
During the learning of a convolutional layer, neurons in the same map, detecting the same feature in different locations, integrate input spikes and compete with each other to do the STDP. The first neuron which reaches the threshold and fires, if any, is the winner (global intra-map competition). The winner triggers the STDP and updates its synaptic weights. As mentioned before, neurons in different locations of the same map share the same input synaptic weights (i.e., weight sharing), so as to be selective to the same feature. Hence, the winner neuron prevents the other neurons in its own map from doing STDP, and duplicates its updated synaptic weights into them. There is also a local inter-map competition for STDP: when a neuron is allowed to do the STDP, it prevents the neurons of other maps within a small neighborhood around its location from doing STDP. This competition is crucial to encourage neurons of different maps to learn different features.

Because of the discretized time variable in the proposed model, it is probable that several competitor neurons fire at the same time step. One possible scenario is to pick one randomly and allow it to do STDP. A better alternative, however, is to pick the one with the highest potential, indicating a higher similarity between its learned feature and the input pattern.

The synaptic weights of convolutional neurons are initialized with random values drawn from a normal distribution with mean mu = 0.8 and standard deviation sigma = 0.05. Note that with a small mu, neurons would not reach their threshold to fire and would not learn anything. Also, with a large sigma, some initial synaptic weights will be smaller (larger) than others and contribute less (more) to the neuron's activity; given the STDP rule, they then have a higher tendency to converge to zero (one). In other words, the dependency on the initial weights is higher for a large sigma.

As the learning of a specific layer progresses, its neurons gradually converge to the different visual features which are frequent in the input images. As mentioned before, learning in the subsequent convolutional layer starts whenever the learning in the current convolutional layer is finalized. Here we measure the learning convergence of the lth convolutional layer as

    C_l = \sum_f \sum_i w_{f,i} (1 - w_{f,i}) / n_w,    (4)

where w_{f,i} is the ith synaptic weight of the fth feature and n_w is the total number of synaptic weights (independent of the features) in that layer. C_l tends to zero as each of the synaptic weights converges towards zero or one. Therefore, we stop the learning of the lth convolutional layer whenever C_l is sufficiently close to zero (i.e., C_l < 0.01).
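Eq. (4) and the stopping rule amount to a few lines; for instance, with all of a layer's weights gathered in one array:

    import numpy as np

    def learning_converged(weights, c_stop=0.01):
        """Eq. (4): C_l = sum_f sum_i w_fi * (1 - w_fi) / n_w.
        C_l approaches zero as every weight settles near 0 or 1."""
        c_l = np.sum(weights * (1.0 - weights)) / weights.size
        return c_l < c_stop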
Global pooling and classification

The global pooling layer is only used in the classification phase. Neurons of the last layer perform a global max pooling over their corresponding neuronal maps in the last convolutional layer. Such a pooling operation provides global translation invariance for the prototypical features extracted in the last convolutional layer. Hence, there is only one output value per feature, indicating the presence of that feature in the input image. The output of the global pooling layer over the training images is used to train a linear SVM classifier.
In the testing phase, the test object image is processed by the network and the output of the global pooling layer is fed to the classifier to determine its category.

To compute the output of the global pooling layer, first, the thresholds of the neurons in the last convolutional layer were set to infinity, and then their final potentials (after propagating the whole spike train generated by the input image) were measured. These final potentials can be seen as the number of early spikes in common between the current input and the prototypes stored in the last convolutional layer. Finally, the global pooling neurons compute the maximum potential over their corresponding neuronal maps as their output value.
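Putting the previous two paragraphs together, the classification-phase readout can be sketched as follows. The names are ours: propagate_with_infinite_threshold stands in for the network's spike propagation with the last convolutional layer's thresholds set to infinity, and scikit-learn's LinearSVC plays the role of the linear SVM classifier:

    import numpy as np
    from sklearn.svm import LinearSVC

    def global_pooling_features(final_potentials):
        """final_potentials: (n_maps, h, w) potentials of the last
        convolutional layer after the whole spike train has been
        propagated; one output per map: its maximum potential."""
        return final_potentials.max(axis=(1, 2))

    # Sketch of the training pipeline (propagate_with_infinite_threshold
    # and the image/label lists are hypothetical):
    # X = np.stack([global_pooling_features(
    #         propagate_with_infinite_threshold(img)) for img in train_images])
    # clf = LinearSVC(C=1.0).fit(X, train_labels)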
Results

Caltech face/motorbike dataset

We evaluated our SDNN on the face and motorbike categories of the Caltech 101 dataset available at http://www.vision.caltech.edu (see Fig. 4 for sample pictures). The training set contains 200 randomly selected images per category, and the remaining images constitute the test set. The test images are not seen during the learning phase, but are used afterward to evaluate the performance on novel images. This standard cross-validation procedure allows measuring the system's ability to generalize, as opposed to learning the specific training examples. All images were converted to grayscale and rescaled to be 160 pixels in height (preserving the aspect ratio). In all the experiments, we used linear SVM classifiers with penalty parameter C = 1.0 (optimized by a grid search in the range (0, 10]).

Here, we used a network similar to Fig. 1, with three convolutional layers, each followed by a pooling layer. For the first layer, only ON-center DoG filters of size 7 × 7 with standard deviations of 1 and 2 pixels are used. The first, second, and third convolutional layers consist of 4, 20, and 10 neuronal maps with convolution-window sizes of 5 × 5, 16 × 16 × 4, and 5 × 5 × 20, and firing thresholds of 10, 60, and 2, respectively. The pooling window sizes of the first and second pooling layers are 7 × 7 and 2 × 2, with strides of 6 and 2, respectively. The third pooling layer performs a global max pooling operation. The learning rates of all convolutional layers are set to a+ = 0.004 and a- = 0.003. In addition, each image is processed for 30 time steps.
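For readability, the configuration just described can be gathered in one place. This is a plain data structure of our own; it merely mirrors the numbers above and is not a file format used by the authors:

    caltech_sdnn = {
        "dog": {"size": (7, 7), "sigmas": (1, 2), "polarity": "on-center"},
        "conv": [
            {"maps": 4,  "window": (5, 5),      "threshold": 10},
            {"maps": 20, "window": (16, 16, 4), "threshold": 60},
            {"maps": 10, "window": (5, 5, 20),  "threshold": 2},
        ],
        "pool": [
            {"window": (7, 7), "stride": 6},
            {"window": (2, 2), "stride": 2},
            {"window": "global"},
        ],
        "stdp": {"a_plus": 0.004, "a_minus": 0.003},
        "time_steps": 30,
    }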
Fig. 2 shows the preferred visual features of some neuronal maps in the first, second, and third convolutional layers through the learning process. To visualize the visual feature learned by a neuron, a backward reconstruction technique is used: the visual features in the current layer can be reconstructed as weighted combinations of the visual features in the previous layer. This backward process continues down to the first layer, whose preferred visual features are computed by DoG functions.
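The backward reconstruction just mentioned can be sketched recursively: a feature at layer l is visualized as the weight-weighted sum of the (already reconstructed) features of layer l - 1. The fragment below is our illustration of that idea, not the authors' code; it assumes each feature's weights are stored as (h, w, n_prev) arrays and that layer 0 holds small DoG kernels:

    import numpy as np

    def reconstruct(layer, feature, weights, dog_patches):
        """Visualize a learned feature by recursively projecting it back
        to image space as a weighted combination of the features of the
        previous layer (layer 0 holds the DoG kernels)."""
        if layer == 0:
            return dog_patches[feature]
        ker = weights[layer][feature]            # (h, w, n_prev) synapses
        prev = [reconstruct(layer - 1, f, weights, dog_patches)
                for f in range(ker.shape[2])]
        ph, pw = prev[0].shape
        kh, kw = ker.shape[:2]
        canvas = np.zeros((kh + ph - 1, kw + pw - 1))
        for i in range(kh):                      # paste each weighted patch
            for j in range(kw):
                for f in range(ker.shape[2]):
                    canvas[i:i + ph, j:j + pw] += ker[i, j, f] * prev[f]
        return canvas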
As shown in Fig. 2A, interestingly, each of the four neuronal maps of the first convolutional layer converges to one of four orientations: pi/4, pi/2, 3pi/4, and pi. This shows how efficiently the association of the proposed temporal coding in DoG cells and the unsupervised learning method (the STDP and learning competition) leads to highly diverse edge detectors which can represent the input image with edges of different orientations. These edge detectors are similar to the simple cells of the primary visual cortex (i.e., area V1) [8].

Fig. 2B shows the learning progress for the neuronal maps of the second convolutional layer. As mentioned, the first convolutional layer detects edges of different orientations all over the image, and due to the temporal coding used, neurons corresponding to edges with higher contrasts (i.e., salient edges) fire earlier. On the other hand, STDP naturally tends to learn those combinations of edges that are consistently repeated in the training images (i.e., common features of the target objects). Besides, the learning competition tends to prevent the neuronal maps from learning similar visual features. Consequently, neurons in the second convolutional layer learn the most salient, common, and diverse visual features of the target objects, and do not learn the backgrounds, which drastically change between images. As seen in Fig. 2B, each of the maps gradually learns a different visual feature (a combination of oriented edges) representing a face or motorbike feature.
Figure 2: The synaptic changes of some neuronal maps in different layers through learning on the Caltech face/motorbike dataset. A) The first convolutional layer becomes selective to oriented edges. B) The second convolutional layer converges to object parts. C) The third convolutional layer learns the object prototypes and responds to whole objects.
The learning progress for two neuronal maps of the third convolutional layer is shown in Fig. 2C. As seen, one of them gradually becomes selective to a complete motorbike prototype, as a combination of motorbike features such as the back wheel, middle body, handle, and front wheel detected in the second layer. The other map learns a whole face prototype as a combination of facial features. Indeed, the third convolutional layer learns whole object prototypes using the intermediate-complexity features detected in the previous layer. Neurons in the second layer compete with each other and send spikes toward the third layer as they detect their preferred visual features. Since different combinations of these features are detected for each object category, neuronal maps of the third layer will learn different prototypes of different categories. Therefore, the STDP and the learning competition mechanism direct the neuronal maps of the third convolutional layer to learn highly category-specific prototypes.
Figure 3: Each curve shows the variation of the convergence index through the learning of a convolutional layer. The weight histograms of the first convolutional layer at some critical points during the learning are shown next to its convergence curve.
Figure 4: The spiking activity of the convolutional layers for the face and motorbike images. The preferred features of the neuronal maps in each convolutional layer are shown on the right; each feature is coded by a specific border color. The spiking activity of the convolutional layers, accumulated over all the time steps, is shown in the corresponding panels. Each point in a panel indicates that a neuron at that location fired at some time step, and the color of the point indicates the preferred feature of the activated neuron.
borders are demonstrated on top, and their corresponding spiking activities are shown in the panels below them. Each colored point inside a panel indicates the neuronal map of the neuron which fired at that location at a time step. As seen, neurons of the DoG layer detect image contrasts, and the edge detectors in the first convolutional layer detect the orientations of edges. Neurons in the second convolutional layer, which are selective to intermediate-complexity features, detect their preferred visual features by combining input spikes from the edge-detector cells of the first layer. Finally, the coincidence of these features activates neurons in the third convolutional layer, which are selective to object prototypes. As seen, when a face (motorbike) image is presented, neurons in the face (motorbike) maps fire. To better illustrate the learning progress of all the layers, as well as their spiking activity in the temporal domain, we prepared a short video (see Video 1).

As mentioned in the previous section, the output of the global pooling layer is used by a linear SVM classifier to specify the object category of the input images. We trained the proposed SDNN on the training images and evaluated it over the test images, where the model reached a categorization accuracy of 99.1 ± 0.2%.
Figure 5: Recognition accuracies (mean ± std) of the proposed SDNN for different numbers of training images per category used in the STDP-based feature learning and/or in training the classifier. The red curve presents the model's accuracy when different numbers of images are used to train both the network (by unsupervised STDP) and the classifier. The blue curve shows the model's accuracy when the layers of the network are trained using STDP with 400 images (200 from each category), and the classifier is trained with different numbers of labeled images per category.

This shows how well the object prototypes, learned in the highest layer, can represent the object categories. Furthermore, we also calculated the single-neuron accuracy; in more detail, we separately computed the recognition accuracy of each neuron in the global pooling layer. Surprisingly, some single neurons reached an accuracy of 93%, and the mean accuracy was 89.8%. Hence, it can be said that single neurons in the highest layer are highly class-specific, and that different neurons carry complementary information which altogether provides robust object representations.

To better demonstrate the role of the STDP learning rule in the proposed SDNN, we used random features in different convolutional layers and assessed the final accuracy. To this end, we first trained all three convolutional layers of the network (using STDP) until they had all converged (synaptic weights had a bimodal distribution centered on 0 and 1). Then, for a given convolutional layer, we counted the number of active synapses (i.e., those close to one) in each of its feature maps. Corresponding to each learned feature in the first convolutional layer, we generated a random feature with the same number of active synapses. For the second and third convolutional layers, the number of active synapses in the random features was doubled. Note that with fewer active synapses, neurons in the second and third convolutional layers could not reach their threshold, while with more active synapses, many of the neurons tended to fire together. We then evaluated the network using these random features. Table 1 presents the SDNN accuracy with random features in the third, or second and third, or all three convolutional layers. As seen, the lower the layers whose learned features are replaced with random ones, the more the accuracy decreases; with random features in the second layer, in particular, the accuracy drops severely. This shows the critical role of intermediate-complexity features for object categorization.

We also evaluated how robust the proposed SDNN is to noise. To this end, we added white noise to the neurons' thresholds during both the training and testing phases. Indeed, for each image, we added to the threshold of each neuron, V_thr, a random value drawn from a uniform distribution in the range of ±alpha% of V_thr. We evaluated the proposed SDNN for different amounts of noise (from alpha = 5% to 50%), and the results are provided in Table 2. Up to 20% noise, the accuracy remains reasonable, but as the noise level increases further, the accuracy dramatically drops, reaching chance level at 50% noise. In other words, the network is more or less able to tolerate the instability caused by noise below 20%; beyond that, the neurons' behavior changes drastically during learning, and STDP cannot extract informative features from the input images.
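The threshold perturbation used in this experiment is simple to state; a sketch with our own function name, redrawing the noise for every neuron and every image:

    import numpy as np

    def jittered_threshold(v_thr, alpha_percent, rng):
        """A firing threshold perturbed by white noise drawn uniformly
        from +/- alpha% of v_thr."""
        return v_thr * (1.0 + rng.uniform(-alpha_percent, alpha_percent) / 100.0)

    # e.g. jittered_threshold(10.0, 20, np.random.default_rng())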
In another experiment, we changed the number of training samples and calculated the recognition accuracy of the proposed SDNN. For instance, we trained the network and the classifier with 5 samples from each category, and then evaluated the final system on the test samples. As shown by the red curve in Fig. 5, with 5 images per category the model reached an accuracy of 78.2%, and only 40 images per category are sufficient to reach 95.1% recognition accuracy. Although having more training samples leads to higher accuracies, the proposed SDNN can extract diagnostic features and reach reasonable accuracies even using only a few tens of training images.
Table 1: Recognition accuracies of the proposed SDNN with random features in different convolutional layers.

Conv layers with random features    None    3rd     2nd & 3rd    1st & 2nd & 3rd
Accuracy (%)                        99.1    80.2    67.8         66.3
Table 2: Recognition accuracies of the proposed SDNN for different amounts of noise.
Figure 6: Some sample images of different object categories of ETH-80 from different viewpoints. For each image, the preferred feature of an activated neuron in the third convolutional layer is shown below it.
Due to the unsupervised nature of STDP, the proposed SDNN does not suffer much from the overfitting challenge caused by small training-set sizes in supervised learning algorithms such as back-propagation. In the real world, the number of labeled samples is very low; accordingly, learning in humans and other primates is mainly unsupervised. Here, in another experiment, we trained the network using STDP over 200 images per category, and then used different portions of these images as labeled samples (from 1 to 200 images per category) to train the classifier. This lets us see whether the visual features that the network has learned from the unlabeled images (using unsupervised STDP) are sufficient for solving the categorization task with only a few labeled training samples. As shown by the blue curve in Fig. 5, with only one sample per category the model could reach an average accuracy of 93.8%. This suggests that the unsupervised STDP can provide a rich feature space which explains the object space well and reduces the need for labeled examples.

ETH-80 dataset

The ETH-80 dataset contains eight different object categories: apple, car, toy cow, cup, toy dog, toy horse, pear, and tomato (10 instances per category). Each object is photographed from 41 viewpoints with different view angles and tilts. Some examples of objects in this dataset are shown in Fig. 6. ETH-80 is a good benchmark to show how the proposed SDNN can handle multi-object categorization tasks with high inter-instance variability, and how it can tolerate large viewpoint variations. In all the experiments, we used linear SVM classifiers with penalty parameter C = 1.2 (optimized by a grid search in the range (0, 10]).

Five randomly chosen instances of each object category are selected for the training set used in the learning phase. The remaining instances constitute the testing set, and are not seen during the learning phase.
Table 3: Recognition accuracies of the proposed SDNN and some other methods over the ETH-80 dataset. Note that all the models are trained on 5 object instances of each category and tested on the other 5 instances.

Method                    Accuracy (%)
HMAX [23]                 69.0
Convolutional SNN [23]    81.1
Pre-trained AlexNet       79.5
Fine-tuned AlexNet        94.2
Supervised DCNN           81.9
Unsupervised DCA          80.7
Proposed SDNN             82.8
All the object images were converted to grayscale. To evaluate the proposed SDNN on ETH-80, we used a network architecturally similar to the one used for the Caltech face/motorbike dataset. The other parameters are also similar, except for the number of neuronal maps in the second and third convolutional and pooling layers; here we used 400 neuronal maps in each of these layers. Similar to the Caltech dataset, the neuronal maps of the first convolutional layer converged to the four oriented edges. Neurons in the second and third convolutional layers also became selective to intermediate features and object prototypes, respectively. Fig. 6 shows sample images from the ETH-80 dataset and the preferred features of some neuronal maps in the third convolutional layer which are activated by those images. As seen, neurons in the highest layer respond to different views of different objects and, altogether, provide an invariant object representation. Thus, the network learns 2D, view-dependent prototypes of each object category to achieve 3D representations.

As mentioned before, we evaluated the proposed SDNN over the test instances of each object category, which are not shown to the network during training. The recognition accuracies of the proposed SDNN and some other models on the ETH-80 dataset are presented in Table 3. HMAX [49] is one of the classic computational models of the object recognition process in the visual cortex; it has 5000 features and uses a linear SVM as the classifier (see [23] for more details). Convolutional SNN [23] is an extension of Masquelier et al. 2007 [36] which had one trainable layer with 1200 visual features and used a linear SVM for classification. AlexNet is the first DCNN that significantly improved recognition accuracy on the Imagenet dataset. Here, we compared our proposed SDNN with both the Imagenet pre-trained and the ETH-80 fine-tuned versions of AlexNet. To obtain the accuracy of the pre-trained AlexNet, images were shown to the model and the feature vectors of the last layer were used to train and test a linear SVM classifier. Also, to fine-tune the Imagenet pre-trained AlexNet on ETH-80, its decision layer was replaced by an eight-neuron decision layer and trained by the stochastic gradient descent learning algorithm.

Given that the pre-trained AlexNet has not seen ETH-80 images, while the fine-tuned AlexNet has already been trained on millions of images from the Imagenet dataset, the comparison may not be entirely fair. Therefore, we also compared the proposed SDNN to a supervised DCNN with the same structure, but having two extra dense and decision layers on top, with 70 and 8 neurons, respectively. The DCNN was trained on the gray-scaled images of ETH-80 using the backpropagation algorithm. The ReLU and soft-max activation functions were employed for the intermediate and decision layers, respectively. We used a cross-entropy loss function with L2 kernel regularization and L1 activity regularization terms. We also performed 50% dropout regularization on the dense layer. Hyper-parameters such as the learning rates, momentums, and regularization factors were optimized using a grid search. Also, we used an early-stopping strategy to prevent the network from overfitting. Eventually, the supervised DCNN reached an average accuracy of 81.9% (see Table 3). Note that we tried to evaluate the DCNN with a dense layer of more than 70 neurons, but each time the network quickly overfitted on the training data, with no accuracy improvement on the test set. It seems that the supervised DCNN suffers from the lack of sufficient training data.

In addition, we compared the proposed SDNN to a deep convolutional autoencoder (DCA), one of the best unsupervised learning algorithms in machine learning. We developed a DCA with an encoder network having the same architecture as our SDNN, followed by a decoder network with a reversed architecture.
The ReLU activation function was used in the convolutional layers of both the encoder and decoder networks. We used the cross-entropy loss function and the stochastic gradient descent learning algorithm to train the DCA. The learning parameters (i.e., the learning rate and momentum) were optimized through a grid search. When the learning had converged, we eliminated the decoder part and used the encoder's output representations to train and test a linear SVM classifier. Note that the DCA was trained on the gray-scaled images of ETH-80. The DCA reached an average accuracy of 80.7% (see Table 3); thus the proposed SDNN outperforms a DCA network with the same structure.

Figure 7: The confusion matrix of the proposed SDNN over the ETH-80 dataset.

We also evaluated a version of the proposed SDNN with only two convolutional layers. Indeed, we removed the third convolutional layer, applied the global pooling over the second convolutional layer, and trained a new SVM classifier over its output. The model's accuracy dropped by 5%, reaching 77.4%. Although the features in the second layer represent object parts of intermediate complexity, it is the combination of these features in the last layer which completes the object representations and makes the classification easier. In other words, the similarity between the smaller parts of objects of different categories but with similar shapes (e.g., apple and potato) is higher than between the whole objects; hence, the visual features in the higher layers can provide better object representations.

Figure 8: Some samples of the ETH-80 dataset misclassified by the proposed SDNN.

In a subsequent analysis, we computed the confusion matrix to see which categories are most often confused with each other. Fig. 7 illustrates the confusion matrix of the proposed SDNN over the ETH-80 dataset. As seen, most of the errors are due to the miscategorization of dogs, horses, and cows. We checked whether these errors belong to some specific viewpoints or not, and found that the errors are uniformly distributed across the different viewpoints. Some misclassified samples are shown in Fig. 8. Overall, it can be concluded that these categorization errors are due to the overall shape similarity between these object categories.

Another important aspect of the proposed SDNN is its computational efficiency. For each ETH-80 image, on average, about 9100 spikes are emitted across all the layers, i.e., about 0.02 spikes per neuron per image. Note that the number of inhibitory events is equal to the number of spikes. Together, these point to the fact that the proposed SDNN can recognize objects with high precision but low computational cost. This efficiency is caused by the association of the proposed temporal coding and the STDP learning rule, which leads to a sparse but informative visual code.

MNIST dataset

MNIST [30] is a benchmark dataset for SNNs which has been widely used [21, 59, 44, 39, 11, 12]. We also evaluated our SDNN on the MNIST dataset, which contains 60,000 training and 10,000 test handwritten single-digit images. Each image is of size 28 × 28 pixels and contains one of the digits 0-9. For the first layer, ON- and OFF-center DoG filters with standard deviations of 1 and 2 pixels are used. The first and second convolutional layers respectively consist of 30 and 100 neuronal maps with 5 × 5 convolution windows and firing thresholds of 15 and 10. The pooling window of the first pooling layer was of size 2 × 2 with a stride of 2.
The second pooling layer performs a global max operation. Note that the learning rates of all convolutional layers were set to a+ = 0.004 and a- = 0.003. In all the experiments, we used linear SVM classifiers with penalty parameter C = 2.4 (optimized by a grid search in the range (0, 10]).

Fig. 9 shows the preferred features of some neuronal maps in the first convolutional layer. The green and red colors correspond to ON- and OFF-center DoG filters. Interestingly, this layer converged to Gabor-like edge detectors with different orientations, phases, and polarities. These edge features are combined in the next layer and provide an easily separable digit representation.

Figure 9: The Gabor-like features learned by the neuronal maps of the first convolutional layer from the MNIST images. The red and green colors respectively indicate the strength of the input synapses from ON- and OFF-center DoG cells.
The recognition performance of the proposed method and of some recent SNNs on the MNIST dataset is provided in Table 4. As seen, the proposed SDNN outperforms unsupervised SNNs, reaching 98.4% recognition accuracy. Besides, the accuracy of the proposed SDNN is close to the 99.1% accuracy of the totally supervised rate-based SDNN [12], which is indeed the converted version of a traditional DCNN trained by back-propagation. An important advantage of the proposed SDNN is its use of far fewer spikes: our SDNN uses only about 600 spikes per MNIST image in total, over all layers, while the supervised rate-based SDNN uses thousands of spikes per layer [12]. Also, because of the rate-based neural coding used in such networks, they need to process images for hundreds of time steps, while our network processes the MNIST images in only 30 time steps. Notably, whenever a neuron in our network fires, it inhibits the other neurons at the same position; therefore, the total number of inhibitory events per MNIST image is equal to the number of spikes (i.e., 600). As stated in Section 2, the proposed SDNN uses a temporal code which encodes the information of the input image in the spike times, with each neuron in all layers allowed to fire at most once. This temporal code, associated with the unsupervised STDP rule, leads to fast, accurate, and efficient processing.

Discussion

Recent supervised DCNNs have reached high accuracies on the most challenging object recognition datasets, such as Imagenet. The architecture of these networks is largely inspired by the deep hierarchical processing in the visual cortex. For instance, DCNNs use retinotopically arranged neurons with restricted receptive fields, and the receptive field size and feature complexity of the neurons gradually increase through the layers. However, the learning and neural processing mechanisms applied in DCNNs are inconsistent with the visual cortex, where neurons communicate using spikes and learn the input spike patterns in a mainly unsupervised manner. Employing such mechanisms in DCNNs could improve their energy consumption and decrease their need for expensive supervised learning with millions of labeled images.

A popular approach in previous research is to convert pre-trained supervised DCNNs into equivalent spiking networks. To simulate the floating-point calculations of DCNNs, these have to use the firing rate as the neural code, which in turn increases the number of required spikes and the processing time. For instance, the converted version of a simple two-layer DCNN for the MNIST dataset, with images of 28 × 28 pixels, requires thousands of spikes and hundreds of time steps per image. On the other hand, there are some SDNNs [40, 4, 1] which are originally spiking networks but employ firing-rate coding and biologically implausible learning rules such as autoencoders and back-propagation.
Table 4: Recognition accuracies of the proposed SDNN and some other SNNs over the MNIST dataset.
Here, we proposed an STDP-based SDNN with spike-time coding. Each neuron was allowed to fire at most once, where its spike time indicates the significance of its visual input. Therefore, neurons that fire earlier carry more salient visual information, and hence they were allowed to perform the STDP and learn the input patterns. As the learning progressed, each layer converged to a set of diverse but informative features, with the feature complexity gradually increasing through the layers, from simple edge features to object prototypes. The proposed SDNN was evaluated on several image datasets and reached high recognition accuracies. This shows how the proposed temporal coding and learning mechanism (STDP and learning competition) lead to discriminative object representations.

The proposed SDNN has several advantages over its counterparts. First, our proposed SDNN is the first spiking neural network with more than one learnable layer which can process large-scale natural object images. Second, due to the use of an efficient temporal coding, which encodes the visual information in the times of the first spikes, it can process the input images with a low number of spikes and in few processing time steps. Third, the proposed SDNN exploits the bio-inspired and totally unsupervised STDP learning rule, which can learn the diagnostic object features and neglect the irrelevant backgrounds.

We compared the proposed SDNN to several other networks, including unsupervised methods such as HMAX and convolutional autoencoder networks, and supervised methods such as DCNNs. The proposed SDNN could outperform the unsupervised methods, which shows its advantage in extracting more informative features from the training images. Also, it was better than the supervised deep network, which largely suffered from overfitting and the lack of sufficient training data. Although state-of-the-art supervised DCNNs have stunning performance on large datasets like Imagenet, contrary to the proposed SDNN they run into trouble with small and medium-size datasets.

Our SDNN could be efficiently implemented in parallel hardware (e.g., FPGAs [57]) using the address-event representation (AER) [52] protocol. With AER, spike events are represented by the addresses of the sending and receiving neurons, and time is represented by the asynchronous occurrence of the spike events. Since such hardware is much faster than biological hardware, simulations could run several orders of magnitude faster than real time [47]. The primate visual system extracts the rough content of an image in about 100 ms [54, 19, 26, 33]. We thus speculate that some dedicated hardware will be able to do the same in the order of a millisecond or less.
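Under AER, the whole activity of the network reduces to a stream of address-stamped events; the record below is a toy illustration of such an event (our own field layout, not a hardware specification):

    from dataclasses import dataclass

    @dataclass
    class AddressEvent:
        """One AER event: the address of the spiking neuron. On a real
        AER link, time is implicit in the asynchronous arrival of the
        event rather than stored as a field."""
        layer: int
        feature_map: int
        x: int
        y: int
        timestep: int

    spike_stream = [AddressEvent(1, 3, 12, 40, 2),
                    AddressEvent(2, 0, 5, 9, 7)]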
Also, the proposed SDNN can be modified to use spiking retinal models [56, 35] as the input layer. These models mimic the spatiotemporal filtering of the retinal ganglion cells with center/surround receptive fields. Alternatively, we could use neuromorphic asynchronous event-based cameras such as the dynamic vision sensor (DVS), which generate output events when they capture transients in the scene [32]. Finally, due to the DoG filtering in the input layer of the proposed SDNN, some visual information, such as texture and color, is lost. Hence, future studies should focus on encoding these additional pieces of information in the input layer.
Biological evidence indicates that, in addition to the unsupervised learning mechanisms (e.g., STDP), there are also dopamine-based reinforcement learning strategies in the brain [41]. Besides, although it is still unclear how supervised learning is implemented in biological neural networks, it seems that for some tasks (e.g., motor control and sensory input prediction) the brain must constantly learn temporal dynamics based on error feedback [14]. Employing such reinforcement and supervised learning strategies could improve the proposed SDNN in aspects which are out of reach of unsupervised learning methods. In particular, they could help to reduce the number of required features and to extract optimized, task-dependent features.

Supporting Information

Video 1 We prepared a video, available at https://youtu.be/u32Xnz2hDkE, showing the learning progress and neural activity over the Caltech face and motorbike task. Here we presented the face and motorbike training examples, propagated the corresponding spike waves, and applied the STDP rule. The input image is presented at the top-left corner of the screen. The output spikes of the input layer (i.e., the DoG layer) at each time step are presented in the top-middle panel, and the accumulation of these spikes is shown in the top-right panel. For each of the subsequent convolutional layers, the preferred features, the output spikes at each time step, and the accumulation of the output spikes are presented in the corresponding panels. Note that 4, 8, and 2 features from the first, second, and third convolutional layers, respectively, are selected and shown. As mentioned, the learning occurs layer by layer; thus, the label of the layer which is currently learning is shown in red. As seen, the first layer learns to detect edges, the second layer learns intermediate features, and finally the third layer learns face and motorbike prototype features.

Acknowledgments

This research received funding from the European Research Council under the European Union's 7th Framework Program (FP/2007-2013) / ERC Grant Agreement n. 323711 (M4 project). The authors thank the NVIDIA Academic Programs team for donating GPU hardware.

References

[1] Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, and Zhouhan Lin. Towards biologically plausible deep learning. arXiv:1502.04156, 2015.

[2] Michael Beyeler, Nikil D Dutt, and Jeffrey L Krichmar. Categorization and decision-making in a neurobiologically plausible spiking network using a STDP-like learning rule. Neural Networks, 48:109-124, 2013.

[3] Joseph M Brader, Walter Senn, and Stefano Fusi. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Computation, 19(11):2881-2912, 2007.

[4] Kendra S Burbank. Mirrored STDP implements autoencoder learning in a network of spiking neurons. PLoS Computational Biology, 11(12):e1004566, 2015.

[5] Charles F Cadieu, Ha Hong, Daniel LK Yamins, Nicolas Pinto, Diego Ardila, Ethan A Solomon, Najib J Majaj, and James J DiCarlo. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Computational Biology, 10(12):e1003963, 2014.

[6] Yongqiang Cao, Yang Chen, and Deepak Khosla. Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision, 113(1):54-66, 2015.

[7] Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, and Aude Oliva. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6, 2016.
[8] Arnaud Delorme, Laurent Perrinet, and Simon J Thorpe. Networks of integrate-and-fire neurons using rank order coding B: Spike timing dependent plasticity and emergence of orientation selectivity. Neurocomputing, 38:539-545, 2001.

[9] James J DiCarlo and David D Cox. Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8):333-341, 2007.

[10] James J DiCarlo, Davide Zoccolan, and Nicole C Rust. How does the brain solve visual object recognition? Neuron, 73(3):415-434, 2012.

[11] Peter U Diehl and Matthew Cook. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience, 9:99, 2015.

[12] Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In IEEE International Joint Conference on Neural Networks (IJCNN), pages 1-8, Killarney, Ireland, July 2015.

[13] Peter U Diehl, Guido Zarrella, Andrew Cassidy, Bruno U Pedroni, and Emre Neftci. Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware. In IEEE International Conference on Rebooting Computing, pages 1-8, San Diego, California, USA, October 2016.

[14] Kenji Doya. Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10(6):732-739, 2000.

[15] K Fukushima. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193-202, 1980.

[16] Masoud Ghodrati, Amirhossein Farzmahdi, Karim Rajaei, Reza Ebrahimpour, and Seyed-Mahdi Khaligh-Razavi. Feedforward object-vision models only tolerate small image variations compared to human. Frontiers in Computational Neuroscience, 8(74):1-17, 2014.

[17] Stefan Habenschuss, Johannes Bill, and Bernhard Nessler. Homeostatic plasticity in Bayesian spiking networks as expectation maximization with posterior constraints. In Advances in Neural Information Processing Systems, pages 773-781, Lake Tahoe, Nevada, USA, December 2012.

[18] Shiyong Huang, Carlos Rozas, Mario Treviño, Jessica Contreras, Sunggu Yang, Lihua Song, Takashi Yoshioka, Hey-Kyoung Lee, and Alfredo Kirkwood. Associative Hebbian synaptic plasticity in primate visual cortex. The Journal of Neuroscience, 34(22):7575-7579, 2014.

[19] Chou P Hung, Gabriel Kreiman, Tomaso Poggio, and James J DiCarlo. Fast readout of object identity from macaque inferior temporal cortex. Science, 310(5749):863-866, 2005.

[20] Eric Hunsberger and Chris Eliasmith. Spiking deep networks with LIF neurons. arXiv:1510.08829, 2015.

[21] Shaista Hussain, Shih-Chii Liu, and Arindam Basu. Improved margin multi-class classification using dendritic neurons with morphological learning. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 2640-2643, Melbourne, VIC, Australia, 2014.

[22] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11):e1003915, 2014.

[23] Saeed Reza Kheradpisheh, Mohammad Ganjtabesh, and Timothée Masquelier. Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition. Neurocomputing, 205:382-392, 2016.
[24] Saeed Reza Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, and Timothée Masquelier. Deep networks resemble human feed-forward vision in invariant object recognition. Scientific Reports, 6:32672, 2016.

[25] Saeed Reza Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, and Timothée Masquelier. Humans and deep networks largely agree on which kinds of variation make object recognition harder. Frontiers in Computational Neuroscience, 10:92, 2016.

[26] H Kirchner and S J Thorpe. Ultra-rapid object detection with saccadic eye movements: Visual processing speed revisited. Vision Research, 46(11):1762-1776, 2006.

[27] Alex Krizhevsky, I Sutskever, and GE Hinton. Imagenet classification with deep convolutional neural networks. In Neural Information Processing Systems (NIPS), pages 1-9, Lake Tahoe, Nevada, USA, 2012.

[28] Y LeCun and Y Bengio. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks, pages 255-258. Cambridge, MA: MIT Press, 1998.

[29] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436-444, 2015.

[30] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

[31] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Pages 1-8, New York, New York, USA, 2009. ACM Press.

[32] P Lichtsteiner, C Posch, and T Delbruck. A 128×128 120 dB 15 µs-latency temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2):566-576, 2007.

[33] Hesheng Liu, Yigal Agam, Joseph R Madsen, and Gabriel Kreiman. Timing, timing, timing: fast decoding of object information from intracranial field potentials in human visual cortex. Neuron, 62(2):281-290, 2009.

[34] Wolfgang Maass. Computing with spikes. Special Issue on Foundations of Information Processing of TELEMATIK, 8(1):32-36, 2002.

[35] Pablo Martínez-Cañada, Christian Morillas, Begoña Pino, Eduardo Ros, and Francisco Pelayo. A computational framework for realistic retina modeling. International Journal of Neural Systems, 26(7):1650030, 2016.

[36] Timothée Masquelier and Simon J Thorpe. Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Computational Biology, 3(2):e31, 2007.

[37] David BT McMahon and David A Leopold. Stimulus timing-dependent plasticity in high-level vision. Current Biology, 22(4):332-337, 2012.

[38] C Daniel Meliza and Yang Dan. Receptive-field modification in rat visual cortex induced by paired visual stimulation and single-cell spiking. Neuron, 49(2):183-189, 2006.

[39] Peter O'Connor, Daniel Neil, Shih-Chii Liu, Tobi Delbruck, and Michael Pfeiffer. Real-time classification and sensor fusion with a spiking deep belief network. Frontiers in Neuroscience, 7:178, 2013.

[40] Priyadarshini Panda and Kaushik Roy. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In IEEE International Joint Conference on Neural Networks (IJCNN), pages 1-8, Vancouver, Canada, July 2016.

[41] Marco Pignatelli and Antonello Bonci. Role of dopamine neurons in reward and aversion: a synaptic plasticity perspective. Neuron, 86(5):1145-1157, 2015.
[42] Nicolas Pinto, Youssef Barhomi, David D Cox, and James J DiCarlo. Comparing state-of-the-art visual features on invariant object recognition tasks. In IEEE Workshop on Applications of Computer Vision (WACV), pages 463-470, Kona, Hawaii, USA, 2011.

[43] Geoffrey Portelli, John M Barrett, Gerrit Hilgen, Timothée Masquelier, Alessandro Maccione, Stefano Di Marco, Luca Berdondini, Pierre Kornprobst, and Evelyne Sernagor. Rank order coding: a retinal information decoding strategy revealed by large-scale multielectrode array retinal recordings. eNeuro, 3(3):ENEURO.0134, 2016.

[44] Damien Querlioz, Olivier Bichler, Philippe Dollfus, and Christian Gamrat. Immunity to device variations in a spiking neural network with memristive nanodevices. IEEE Transactions on Nanotechnology, 12(3):288-295, 2013.

[45] Edmund T. Rolls and Gustavo Deco. Computational Neuroscience of Vision. Oxford University Press, Oxford, UK, 2002.

[46] Guillaume A Rousselet, Simon J Thorpe, and Michèle Fabre-Thorpe. Taking the max from neuronal responses. Trends in Cognitive Sciences, 7(3):99-102, 2003.

[47] T Serrano-Gotarredona, T Masquelier, T Prodromakis, G Indiveri, and B Linares-Barranco. STDP and STDP variations with memristors for spiking neuromorphic learning systems. Frontiers in Neuroscience, 7:2, 2013.

[48] T Serre, L Wolf, S Bileschi, M Riesenhuber, and T Poggio. Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):411-426, 2007.

[49] Thomas Serre, Aude Oliva, and Tomaso Poggio. A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104(15):6424-6429, 2007.

[50] Shy Shoham, Daniel H O'Connor, and Ronen Segev. How silent is the brain: is there a dark matter problem in neuroscience? Journal of Comparative Physiology A, 192(8):777-784, 2006.

[51] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.

[52] M Sivilotti. Wiring considerations in analog VLSI systems with application to field-programmable networks. PhD thesis, Comput. Sci. Div., California Inst. Technol., Pasadena, CA, 1991.

[53] Simon Thorpe, Arnaud Delorme, and Rufin Van Rullen. Spike-based strategies for rapid processing. Neural Networks, 14(6):715-725, 2001.

[54] Simon Thorpe, Denis Fize, Catherine Marlot, et al. Speed of processing in the human visual system. Nature, 381(6582):520-522, 1996.

[55] Rufin Van Rullen and Simon J Thorpe. Rate coding versus temporal order coding: what the retinal ganglion cells tell the visual cortex. Neural Computation, 13(6):1255-1283, 2001.

[56] Adrien Wohrer and Pierre Kornprobst. Virtual Retina: a biological retina model and simulator, with contrast gain control. Journal of Computational Neuroscience, 26(2):219-249, 2009.

[57] A. Yousefzadeh, T. Serrano-Gotarredona, and B. Linares-Barranco. Fast pipeline 128×128 pixel spiking convolution core for event-driven vision processing in FPGAs. In Proceedings of the 1st International Conference on Event-Based Control, Communication and Signal Processing (EBCCSP), 2015.

[58] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision (ECCV), pages 818-833, Zurich, Switzerland, September 2014.
[59] Bo Zhao, Ruoxi Ding, Shoushun Chen, Bernabe Linares-Barranco, and Huajin Tang. Feedforward categorization on AER motion events using cortex-like features in a spiking neural network. IEEE Transactions on Neural Networks and Learning Systems, 26(9):1963-1978, 2015.