Logarithmic Multiplier in Hardware Implementation of Neural Networks
Uroš Lotrič and Patricio Bulić
University of Ljubljana
Abstract. Neural networks on chip have found some niche areas of application, ranging from mass consumer products requiring low cost to real-time systems requiring fast response. Regarding the latter, iterative logarithmic multipliers show great potential for increasing the performance of hardware neural networks. By reducing the size of the multiplication circuit, the concurrency, and consequently the speed, of the model can be greatly improved. The proposed hardware implementation of the multilayer perceptron with on-chip learning ability confirms the potential of the concept. The experiments performed on the Proben1 benchmark datasets show that the adaptive nature of the proposed neural network model compensates for the errors caused by inexact calculations while simultaneously increasing performance and reducing power consumption.
1 Introduction
Artificial neural networks are commonly implemented as software models running on general-purpose processors. Although widely used, these systems usually operate on the von Neumann architecture, which is sequential in nature and as such cannot exploit the inherent concurrency present in artificial neural networks. On the other hand, hardware solutions, specially tailored to the architecture of neural network models, can better exploit the massive parallelism, thus achieving much higher performance and lower power consumption than ordinary systems of comparable size and cost. Therefore, hardware implementations of artificial neural network models have found their place in niche applications like image processing, pattern recognition, speech synthesis and analysis, adaptive sensors with teach-in ability, and so on.
Neural chips are available in analogue and digital hardware designs [1,2]. The analogue designs can take advantage of many interesting analogue electronic elements which directly perform the neural networks' functionality, resulting in very compact solutions. Unfortunately, these solutions are susceptible to noise, which limits their precision, and are severely limited with respect to on-chip learning. On the other hand, digital solutions are noise tolerant and have no technological obstacles to on-chip learning, but result in larger circuits. Since the design of
The iterative logarithmic multiplier (ILM) was proposed by Babic et al. in [5]. It simplifies the logarithm approximation introduced in [3] and adds an iterative error-correction scheme that can make the error as small as required, with the possibility of achieving an exact result.
A non-negative integer $N$ with the leading one at bit position $k$ can be written as
$$N = 2^{k} + N^{(1)} \,, \qquad (1)$$
where $N^{(1)} = N - 2^{k}$ is the remainder after the leading one is removed. The exact product of two such numbers is then
$$P_{\mathrm{true}} = N_1 N_2 = 2^{k_1 + k_2} + N_1^{(1)} 2^{k_2} + N_2^{(1)} 2^{k_1} + N_1^{(1)} N_2^{(1)} \,. \qquad (2)$$
The first three terms, forming the approximation
$$P_{\mathrm{approx}}^{(0)} = 2^{k_1 + k_2} + N_1^{(1)} 2^{k_2} + N_2^{(1)} 2^{k_1} \,, \qquad (3)$$
can be calculated by applying only a few shift and add operations, while the omitted term
$$E^{(0)} = N_1^{(1)} \cdot N_2^{(1)} \,, \quad E^{(0)} \geq 0 \,, \qquad (4)$$
is the error of the approximation. In each iteration the same approximation scheme is applied to the current error, so that
$$E^{(0)} = C^{(1)} + E^{(1)} \,, \qquad (5)$$
where $C^{(1)}$ is the approximate value of $E^{(0)}$, and $E^{(1)}$ the corresponding absolute error. The combination of Eq. 2 and Eq. 5 gives
$$P_{\mathrm{true}} = P_{\mathrm{approx}}^{(0)} + C^{(1)} + E^{(1)} = P_{\mathrm{approx}}^{(1)} + E^{(1)} \,. \qquad (6)$$
Table 1. Average and maximal relative errors for the 16-bit iterative multiplier [5]

number of iterations i         0      1      2      3
average $E_r^{(i)}$ [%]      9.4   0.98   0.11   0.01
maximal $E_r^{(i)}$ [%]     25.0   6.25   1.56   0.39
The number of iterations required for an exact result is equal to the number of bits with the value 1 in the operand with the smaller number of such bits. Babic et al. [5] showed that in the worst-case scenario the relative error introduced by the proposed multiplier, $E_r^{(i)} = E^{(i)}/(N_1 N_2)$, decays exponentially with the rate $2^{-2(i+1)}$. Table 1 presents the average and maximal relative errors with respect to the number of considered iterations.
The proposed method assumes non-negative numbers. To apply the method to signed numbers, it is most appropriate to specify them in sign-and-magnitude representation. In that case, the sign of the product is calculated as the exclusive-or (XOR) of the sign bits of the two multiplicands.
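To make the approximation scheme above concrete, the following minimal Python sketch mimics the arithmetic of the iterative logarithmic multiplier with a configurable number of error-correction steps. It is a software illustration of Eqs. 1-6, not the authors' pipelined hardware design; the function names, the signed wrapper, and the small random test are ours.

```python
def leading_one(n):
    """Bit position k of the leading one, i.e. floor(log2(n)) for n > 0."""
    return n.bit_length() - 1


def ilm_multiply(n1, n2, corrections=1):
    """Approximate product of two non-negative integers following Eqs. 1-6:
    P_approx = 2**(k1+k2) + N1'*2**k2 + N2'*2**k1, where N' = N - 2**k is the
    residue left after removing the leading one.  Each correction step applies
    the same scheme to the residue product E = N1'*N2'."""
    if n1 == 0 or n2 == 0:
        return 0
    k1, k2 = leading_one(n1), leading_one(n2)
    r1, r2 = n1 - (1 << k1), n2 - (1 << k2)        # residues N1^(1), N2^(1)
    approx = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    if corrections > 0:                            # add correction term C^(i)
        approx += ilm_multiply(r1, r2, corrections - 1)
    return approx


def ilm_multiply_signed(a, b, corrections=1):
    """Sign-and-magnitude handling: the product sign is the XOR of the operand signs."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * ilm_multiply(abs(a), abs(b), corrections)


if __name__ == "__main__":
    import random
    worst = 0.0
    for _ in range(10000):
        a = random.randint(1, 2**16 - 1)
        b = random.randint(1, 2**16 - 1)
        err = (a * b - ilm_multiply(a, b, corrections=1)) / (a * b)
        worst = max(worst, err)
    print(f"worst observed relative error, one correction: {worst:.4f}")  # <= 0.0625
```

With one correction circuit the observed worst-case relative error stays below the 6.25 % bound listed in Table 1.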
Fig. 1. Block diagrams of a. a pipelined iterative logarithmic multiplier with one error-correction circuit, and b. its basic block
One of the most widely used neural networks is the multilayer perceptron, which gained its popularity with the development of the back-propagation learning algorithm [6]. Despite the simplicity of the idea, the learning phase still presents a major challenge when hardware implementations of the model are in question.
A multilayer perceptron is a feed-forward neural network consisting of a set
of source nodes forming the input layer, one or more hidden layers of computa-
tion nodes, and an output layer of computation nodes. A computation node or
a neuron $n$ in layer $l$ first computes an activation potential $v_n^l = \sum_i \omega_{ni}^l x_i^{l-1}$, a linear combination of the weights $\omega_{ni}^l$ and the outputs from the previous layer $x_i^{l-1}$. To get the neuron output, the activation potential is passed to an activation function, $x_n^l = \varphi(v_n^l)$, for example $\varphi(v) = \tanh(v)$. The objective of a learning algorithm is to find a set of weights and biases that minimizes the performance function, usually defined as the squared error between the calculated outputs and the target values. For the back-propagation learning rule, the weight update equation in its simplest form becomes $\Delta\omega_{ni}^l = \eta\,\delta_n^l x_i^{l-1}$, with $\delta_n^l = \varphi'(v_n^l)(t_n - x_n^l)$ in the output layer and $\delta_n^l = \varphi'(v_n^l) \sum_o \delta_o^{l+1} \omega_{no}^{l+1}$ otherwise, where $\eta$ is a learning parameter and $t_n$ the $n$-th element of the target output.
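As an illustration of these update rules, here is a small NumPy sketch of one on-line back-propagation step for a perceptron with a single hidden layer. It follows the formulas above with our own variable names; the activation $\varphi(v) = \tanh(1.4 v)$ matches the one used later in the hardware model, but the code is only an explanatory floating-point sketch, not the paper's fixed-point implementation.

```python
import numpy as np

def phi(v):
    """Activation used later in the hardware model: tanh(1.4 v)."""
    return np.tanh(1.4 * v)

def phi_prime(v):
    """Derivative of tanh(1.4 v)."""
    return 1.4 * (1.0 - np.tanh(1.4 * v) ** 2)

def backprop_step(x, t, W1, W2, eta):
    """One on-line back-propagation update for a perceptron with one hidden layer.
    W1: hidden-layer weights (n_hidden, n_in), W2: output-layer weights (n_out, n_hidden)."""
    v1 = W1 @ x                       # hidden activation potentials v_n
    x1 = phi(v1)                      # hidden outputs x_n
    v2 = W2 @ x1                      # output activation potentials
    x2 = phi(v2)                      # network outputs
    d2 = phi_prime(v2) * (t - x2)     # delta in the output layer
    d1 = phi_prime(v1) * (W2.T @ d2)  # delta back-propagated to the hidden layer
    W2 += eta * np.outer(d2, x1)      # Delta w = eta * delta_n * x_i
    W1 += eta * np.outer(d1, x)
    return W1, W2
```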
4 Experimental Work
To assess the performance of the iterative logarithmic multiplier, a set of experiments was performed on multilayer perceptron neural networks with one hidden layer. The models were compared in terms of classification or approximation accuracy, speed of convergence, and power consumption. Three types of models were evaluated: a) an ordinary software model (SM) using floating-point arithmetic, b) a hardware model with exact matrix multipliers (HM_M), and c) the proposed hardware model using the iterative logarithmic multipliers with one error-correction circuit (HM_L).
The models were evaluated on the Proben1 collection of freely available benchmarking problems for neural network learning [9]. This rather heterogeneous collection contains 15 data sets from 12 different domains, and all but one consist of real-world data. Among them, 11 data sets are from the area of pattern classification and the remaining four from the area of function approximation. The data sets, containing from a few hundred to a few thousand input-output samples, are already divided into training, validation and test sets, generally in the proportion 50 : 25 : 25. The number of attributes in the input samples ranges from 9 to 125 and in the output samples from 1 to 19. Before modelling, all input and output samples were rescaled to the interval [−0.8, +0.8].
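A per-attribute min-max rescaling of this kind might look as follows; the exact scaling procedure is not spelled out in the paper, so this is only an assumption for illustration.

```python
import numpy as np

def rescale(column, lo=-0.8, hi=0.8):
    """Min-max rescale one attribute column to the interval [lo, hi]."""
    cmin, cmax = column.min(), column.max()
    return lo + (column - cmin) * (hi - lo) / (cmax - cmin)
```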
The testing of the models on each of the data sets mentioned above was performed in two steps. After the best software models were found, the hardware models were built, keeping the same number of neurons in the hidden layer.
During the software model optimization, the topology parameters as well as the learning parameter $\eta$ were varied. Since the number of inputs and outputs is predefined by a data set, the only parameter influencing the topology of the model is the number of neurons in the hidden layer. It was varied from one to a maximum value, determined so that the number of model weights did not exceed the number of training samples. The learning process in the back-propagation scheme heavily depends on the learning parameter $\eta$. Since the data sets are very heterogeneous, the values $2^{-2}, 2^{-4}, \ldots, 2^{-12}$ were used for the learning parameter $\eta$. Powers of two are very suitable for hardware implementation because the multiplications can be replaced by shift operations, as illustrated below.
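For instance, in fixed-point arithmetic a multiplication by $\eta = 2^{-s}$ reduces to an arithmetic right shift by $s$ bits; a hypothetical helper could look like this:

```python
def scale_by_eta(value_fixed: int, shift: int) -> int:
    """Multiply a fixed-point integer by eta = 2**-shift using an
    arithmetic right shift instead of a hardware multiplier."""
    return value_fixed >> shift  # e.g. shift = 4 corresponds to eta = 2**-4
```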
While the software model uses 64-bit floating-point arithmetic, both hardware models use fixed-point arithmetic with weights represented with 16, 18, 20, 22, or 24 bits. For both hardware models the weights were limited to the interval [−4, +4]. The processing values, including inputs and outputs, were represented with 16 bits in the interval [−1, +1]. The values of the activation function $\varphi(v) = \tanh(1.4 v)$ and of its derivative for 256 equidistant values of $v$ from the interval [−2, 2] were stored in two separate lookup tables.
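The following sketch emulates this number representation in software: weights quantized to a signed fixed-point grid on [−4, +4], signals on [−1, +1], and a 256-entry lookup table for $\varphi(v) = \tanh(1.4 v)$ on [−2, 2]. The scaling and rounding choices are our assumptions; the paper does not spell out the exact fixed-point format.

```python
import numpy as np

def quantize(x, bits, value_range):
    """Round to the nearest level of a signed fixed-point grid covering
    [-value_range, +value_range] with 2**bits levels, saturating at the edges."""
    levels = 1 << (bits - 1)
    scale = levels / value_range
    return np.clip(np.round(x * scale), -levels, levels - 1) / scale

# assumed formats: 18-bit weights on [-4, 4], 16-bit signals on [-1, 1]
quantize_weight = lambda w: quantize(w, 18, 4.0)
quantize_signal = lambda x: quantize(x, 16, 1.0)

# 256-entry lookup table for phi(v) = tanh(1.4 v), v in [-2, 2]
V_GRID = np.linspace(-2.0, 2.0, 256)
PHI_LUT = np.tanh(1.4 * V_GRID)

def phi_lookup(v):
    """Activation via table lookup: clamp v to [-2, 2] and pick the nearest entry."""
    idx = np.clip(np.round((v + 2.0) / 4.0 * 255), 0, 255).astype(int)
    return PHI_LUT[idx]
```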
By applying the early stopping criterion, the learning phase was stopped as soon as the classification or approximation error on the validation set started to grow. The analysis on the test set was performed with the model parameters which gave the minimal value of the normalized squared error on the validation set. The normalized squared error is defined as the squared difference between the calculated and the target outputs, averaged over all samples and output
attributes, and divided by the squared difference between the maximal and the minimal value of the output attributes. The results presented in the following are given only for the test-set samples, which were not used during the learning phase.
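Written out, this definition corresponds to the following expression, where $x_{s,j}$ and $t_{s,j}$ denote the calculated and the target value of output attribute $j$ for sample $s$, $S$ and $J$ are the numbers of samples and output attributes, and $t^{\max}$ and $t^{\min}$ are the extreme values of the output attributes (the symbols are ours, introduced here only for clarity):

$$ E = \frac{1}{S\,J\,\bigl(t^{\max} - t^{\min}\bigr)^2} \sum_{s=1}^{S} \sum_{j=1}^{J} \bigl( x_{s,j} - t_{s,j} \bigr)^2 $$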
Fig. 3. Performance of the models with respect to the weight precision on the Hearta1 data set

In Fig. 3, a typical dependence of the normalized squared error on the weight precision is presented. The normalized squared error decreases exponentially with increasing precision of the weights. However, increasing the precision of the weights also requires more and more hardware resources. Since there is a big drop in the normalized squared error from 16- to 18-bit precision, and since we can make use of the numerous prefabricated 18 × 18-bit matrix multipliers in the new Xilinx FPGA programmable circuits, our further analysis is confined to 18-bit weight precision.
The model performance for some selected data sets from the Proben1 collection is given in Table 3. Average values and standard deviations for all three types of models over ten runs are given in terms of three measures: the number of epochs, the normalized squared error $E_{te}$, and the percentage of misclassified samples $p_{te}^{miss}$. The latter is only given for the data sets from the classification domain.
The results obtained for the software models using the back-propagation algorithm are similar to those reported in [9], where more advanced learning techniques were applied. The most noticeable difference between the software and hardware models is in the number of epochs needed to train a model. For many data sets, the number of epochs in the case of the hardware models is an order of magnitude smaller than in the case of the software models. The reason probably lies in the inability of the hardware models to further optimize the weights due to their limited-precision representation.
As a rule, the hardware models exhibit slightly poorer performance in terms of the normalized squared error and the percentage of misclassified samples. The discrepancy is very large for the gene1 and thyroid1 data sets, where a weight representation with more than 18 bits is needed to close the gap.
The comparison of the hardware models HM_M and HM_L reveals that the replacement of the exact matrix multipliers with the proposed approximate iterative logarithmic multipliers does not have any notable effect on the performance of the models. The reasons for the very good compensation of the errors caused by
Table 3. Performance of the software and hardware models on some data sets. For each data set, the results obtained with the models SM, HM_M, and HM_L are given in the first, second, and third row, respectively.
Table 4. Estimation of FPGA device utilization for a neural network model with 32
inputs, 8 hidden neurons and 10 outputs using 16 × 18-bit matrix multipliers
The estimates show that replacing the exact matrix multipliers with the iterative logarithmic multipliers can lead to more than 10 % smaller device utilization and more than 30 % smaller power consumption.
5 Conclusion
Neural networks offer a high degree of internal parallelism, which can be efficiently exploited in custom-designed chips. Neural network processing comprises a huge number of multiplications, i.e. arithmetic operations consuming a lot of space, time and power. In this paper we have shown that exact matrix multipliers can be replaced with approximate iterative logarithmic multipliers with one error-correction circuit.
Due to the highly adaptive nature of neural network models, which compensates for the erroneous calculations, the replacement of the multipliers did not have any notable impact on the models' processing and learning accuracy. Moreover, the proposed logarithmic multipliers require fewer resources on a chip, which leads to smaller designs on the one hand and, on the other hand, to designs with more concurrent units on the same chip. Lower resource consumption per multiplier also results in more power-efficient circuits. The power consumption, reduced by roughly 20 %, makes the hardware neural network models with iterative logarithmic multipliers favourable candidates for battery-powered applications.
Acknowledgments
This research was supported by the Slovenian Research Agency under grants P2-0241 and P2-0359, and by the Slovenian Research Agency and the Ministry of Civil Affairs, Bosnia and Herzegovina, under grant BI-BA/10-11-026.
References
1. Zhu, J., Sutton, P.: FPGA implementations of neural networks - a survey of a
decade of progress. In: Cheung, P.Y.K., Constantinides, G.A., de Sousa, J.T. (eds.)
FPL 2003. LNCS, vol. 2778, pp. 1062–1066. Springer, Heidelberg (2003)
2. Dias, F.M., Antunes, A., Mota, A.M.: Artificial neural networks: a review of commercial hardware. Engineering Applications of Artificial Intelligence 17, 945–952 (2004)
3. Mitchell, J.N.: Computer multiplication and division using binary logarithms. IRE
Transactions on Electronic Computers 11, 512–517 (1962)
4. Mahalingam, V., Ranganathan, N.: Improving Accuracy in Mitchell's Logarithmic Multiplication Using Operand Decomposition. IEEE Transactions on Computers 55, 1523–1535 (2006)
5. Babic, Z., Avramovic, A., Bulic, P.: An Iterative Logarithmic Multiplier.
Microprocessors and Microsystems 35(1), 23–33 (2011) ISSN 0141-9331,
doi:10.1016/j.micpro.2010.07.001
6. Haykin, S.: Neural networks: a comprehensive foundation, 2nd edn. Prentice-Hall,
New Jersey (1999)