2012 IEEE Symposium on Humanities, Science and Engineering Research

Design of Single Neuron on FPGA

Khairudin Mohamad, Mohamad Faiz Omar Mahmud, Fadzilatul Husna Adnan, Wan Fazlida Hanim Abdullah
Fakulti Kejuruteraan Elektrik, Universiti Teknologi MARA, Selangor, Malaysia
[email protected]

978-1-4673-1310-0/12/$31.00 ©2012 IEEE

Abstract— This paper presents a digital design of a neuron architecture on a field-programmable gate array (FPGA). The objective of this project is to translate data from electrochemical sensor signals and process the data with a neuron structure on digital hardware. The hardware realization of a neural network requires investigation of many design issues relating to signal interfacing and the design of a single neuron. The analysis focuses on the effect of digital design decisions, such as module architecture, on data accuracy and delay. The work touches on analogue-to-digital interfacing, data structure and digital module design, including the adder, multiplier and multiplier accumulator (MAC). A major component of the algorithm is the design of the activation function. The chosen activation function is the hyperbolic tangent, which is approximated by a Taylor series expansion. The neuron is evaluated on an Altera DE2-70 FPGA. Performance is evaluated in terms of functionality, resource usage and timing analysis. For the data structure, it was demonstrated that increasing the number of fractional bits increases the precision. The neuron functionality was demonstrated on a digital platform. It was found that the Carry Look Ahead design produced 25% less delay than the Ripple Carry Adder in the MAC performance.

Keywords-component; formatting; style; styling; insert (key words)

I. INTRODUCTION

Electrochemical sensors are often used to determine concentrations of various analytes in testing samples such as fluids and dissolved solid materials. Electrochemical sensors are frequently used in occupational safety, medical engineering, process measuring engineering and environmental analysis [1]. Artificial neural networks (ANN) are known to be able to improve electrochemical sensor signal interpretation [2]. In general, hardware realization requires a good compromise between accuracy and complexity of the processing units to allow a low-cost, effective device [3]. This paper describes a system realization that translates data from an electrochemical sensor for a neuron to process on an FPGA. The effect of different digital module architectures on the neuron design is investigated. The structure of a neuron is split into various sub-blocks; these blocks are first implemented individually and then integrated to form the entire neuron. The digital platform is a Field Programmable Gate Array (FPGA). The approach for this project can be represented as the block diagram shown in Fig. 1. The key issue in designing this system is modular design for reconfigurability. The first issue is to convert the signal from analog to digital form by sampling it with an analog-to-digital converter (ADC), which turns the analog signal into a stream of numbers. The next module is the design of the mathematical operation. This includes issues relating to data structure, the design of the Multiplier Accumulator (MAC) and the activation function implementation. The final module displays the result from the data accumulated by the neuron. A 7-segment driver is included to enable reading and displaying the data that comes out of the neuron architecture.

Fig. 1. General flow of the project system linking applied chemical sensor to digital processing

II. DESIGN AND METHODOLOGY

This section presents the design of the submodules that implement Fig. 1. It covers interfacing issues such as the analog-to-digital implementation, the data structure and the neuron architecture topology.

A. Analog to Digital Interfacing

In this project the 10-bit ADC chip MCP3001 was used to convert the analog signal from the electrochemical sensor to digital form. The Microchip Technology Inc. MCP3001 is a successive approximation 10-bit A/D converter with onboard sample and hold circuitry. The device provides a single pseudo-differential input. Communication with the device is done using a simple serial interface compatible with the SPI protocol. SPI is an interface that allows one chip to communicate with one or more other chips, in this case the MCP3001 ADC chip with the Altera DE2-70 FPGA board. The SPI algorithm must be implemented in a hardware description language (HDL) on the FPGA. Fig. 2 demonstrates the SPI interfacing. As shown in Fig. 2, the wires are called SCK, MOSI, MISO and SSEL; one of the chips is called the SPI master and the other the SPI slave. A clock is generated by the master, and one bit of data is transferred each time the clock toggles. Data is serialized before being transmitted, so that it fits on a single wire. There are two wires for data, one for each direction. The master and slave know beforehand the details of the communication (bit order, length of data words exchanged, etc.). The master is the one who initiates communication. Because SPI is synchronous and full-duplex, every time the
clock toggles, two bits are actually transmitted (one in each direction). In terms of performance, SPI can easily achieve a few Mbps (megabits per second) [4]. For this module, the approach taken is a hardware implementation of the existing technique, tailored to a 10-bit environment.

Fig. 2. SPI interfacing applied to signal

B. Digital Design: Data Structure and Modules for Neuron

In this section, there are two major parts: the data structure and the digital modules for neuron design on FPGA. For design tools, ModelSim is used to simulate the design at multiple stages throughout the design process, and Quartus is used to program the board.

Generally, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. Data structures are generally based on the ability of a computer/chip to fetch and store data at any place in its memory, specified by an address that can be manipulated by the program. For this project, the data computed from the ADC is converted into a fixed-point number representation. Fixed-point DSPs use 2's complement fixed-point numbers in different Q formats. Among the major issues in the data structure is the technique for converting a fixed-point number from a Q format to an integer value so that it can be stored in memory and recognized by the simulator. It is also necessary to keep track of the position of the binary point when manipulating fixed-point numbers in Verilog code.

Fig. 3. Flow of fixed point arithmetic conversion

The DSP (Digital Signal Processing) flow for the conversion to Q format representation is shown in Fig. 3. As shown in the flowchart, a fractional number is converted to an integer value that can be recognized by a DSP assembler using the Q15 format. The number is first normalized, then scaled down by 2 to a value that can be accommodated by the number of bits. Finally, the value is rounded (truncated) to an integer value and represented as a binary number [5].

A neuron can be viewed as processing data in three steps: the weighting of its input values, the summation of them all and their filtering by an activation function. The neuron can be expressed by the following equation:

$$ y_j = f\left(\sum_i w_{ij}\, x_i + \theta_j\right) \qquad (1) $$

where y is the output of the neuron, w is the synaptic weight, x is the input, θ is the bias and f is the activation function. The subscript i denotes the preceding neuron and j the neuron considered.

The neuron computes the products of its inputs with the corresponding synaptic weights, and then the results are added. The result is presented to a comparison unit designed to represent an appropriate activation function such as linear, sigmoid or hyperbolic tangent [3]. The equation is shown as a block diagram in Fig. 4. For the weighted inputs to be calculated in parallel using conventional design techniques, a large number of multiplier units would be required. To avoid this, a multiplier/accumulator architecture has been selected. It takes the inputs serially, multiplies each with the corresponding weight and accumulates their sum in a register [6-7].

Fig. 4. Structure of neuron [7]. (Block labels: sensor array signals, readout circuit, DC voltage signal Vn, trained weight value wn, activation function, predicted output yj.)

The block diagram and flow of the hardware implementation are shown in Fig. 5 and Fig. 6. The accumulator unit is composed of a bit-serial adder and a 16-bit register. The multiplier accumulator consists of an adder and a multiplier. MACs are frequently used in general computing and are especially critical to the performance of digital signal processing applications. A MAC typically operates on a digital, and usually binary, multiplier quantity and a corresponding digital multiplicand quantity and generates a binary product. The multiplier accumulator proposed in this project consists of an adder and a multiplier that can handle 4 channels of input (an array of sensors).
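As a rough software model of the serial multiplier/accumulator described above, the sketch below multiplies four input channels by their weights one at a time and accumulates the products, following the form of equation (1) before the activation function. The Q15 scaling, the saturation to a signed 16-bit result, the name `mac_neuron` and the sample values are illustrative assumptions, not details taken from the hardware design.

```python
Q = 15                      # assumed number of fractional bits (Q15)
Q_MAX = (1 << Q) - 1        # largest Q15 value, 0x7FFF
Q_MIN = -(1 << Q)           # smallest Q15 value, -0x8000


def saturate(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, value))


def mac_neuron(inputs, weights, bias=0):
    """Serial multiply-accumulate over Q15 integers (hypothetical helper).

    The product of two Q15 numbers is a Q30 number, so the running sum
    is kept at Q30 and scaled back to Q15 once at the end.
    """
    acc = bias << Q                      # bias promoted from Q15 to Q30
    for x, w in zip(inputs, weights):    # one channel per step, as in a serial MAC
        acc += x * w
    return saturate(acc >> Q)            # back to Q15, clamped to 16 bits


# Four hypothetical sensor channels and weights, already in Q15.
x = [16384, 8192, -8192, 4096]       # 0.5, 0.25, -0.25, 0.125
w = [16384, 16384, 16384, -32768]    # 0.5, 0.5, 0.5, -1.0
net = mac_neuron(x, w)
print(net, net / (1 << Q))           # 4096 0.125
```

Only one multiplier is exercised per step, which mirrors the reason given above for choosing a MAC over a fully parallel arrangement; the result would then be passed to the activation function.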

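The fixed-point conversion flow of Fig. 3 can be illustrated in software. The sketch below follows common Q15 practice: it scales a fraction by 2^15, truncates it to an integer and clamps it to the signed 16-bit range. The helper names, and the treatment of a 10-bit ADC code as an unsigned fraction of full scale, are assumptions made for illustration and may differ from the conversion used in the actual design.

```python
Q = 15                         # fractional bits in the Q15 format
SCALE = 1 << Q                 # 2**15 = 32768


def to_q15(value):
    """Convert a fraction in [-1, 1) to a Q15 integer by scaling and truncating."""
    scaled = int(value * SCALE)                  # int() truncates toward zero
    return max(-SCALE, min(SCALE - 1, scaled))   # clamp to the signed 16-bit range


def from_q15(raw):
    """Convert a Q15 integer back to a floating-point fraction."""
    return raw / SCALE


def adc10_to_q15(code):
    """Map an unsigned 10-bit ADC code (0..1023) to Q15, assuming code/1024 of full scale."""
    if not 0 <= code <= 1023:
        raise ValueError("expected a 10-bit code")
    return code << (Q - 10)                      # code / 1024 expressed in Q15


print(to_q15(0.5))                               # 16384
print(format(to_q15(0.5) & 0xFFFF, "016b"))      # 16-bit two's complement pattern
print(from_q15(to_q15(0.3)))                     # close to 0.3, limited by 15 fractional bits
print(adc10_to_q15(512))                         # 16384, i.e. half of full scale
```

The round-trip error of `to_q15` is bounded by one least significant bit, 2^-15, which is the sense in which more fractional bits give more precision.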

Fig. 5. Flow of proposed neuron architecture

The activation function in a backpropagation network defines the way to obtain the output of a neuron given the collective input from source synapses. The backpropagation algorithm requires the activation function to be continuous and differentiable. It is desirable to have an activation function whose derivative is easy to compute. The tanh approximation by Taylor series expansion that is used in the hardware calculation is given by equation (2) [10-11]:

$$ y = x - \frac{x^3}{3} + \frac{2x^5}{15} + \cdots \qquad (2) $$

The design flow is presented in Fig. 8.
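To give a feel for the accuracy of equation (2) when truncated to three terms, the following sketch compares it with the library tanh in floating point. The function name `tanh_taylor` and the sample inputs are chosen for illustration only; the hardware version works on fixed-point values rather than floats.

```python
import math


def tanh_taylor(x):
    """Three-term Taylor approximation of tanh, as in equation (2)."""
    x2 = x * x
    x3 = x2 * x
    x5 = x3 * x2
    return x - x3 / 3 + 2 * x5 / 15


for x in (0.1, 0.25, 0.5, 0.75, 1.0):
    approx = tanh_taylor(x)
    exact = math.tanh(x)
    print(f"x={x:4.2f}  taylor={approx:.6f}  tanh={exact:.6f}  "
          f"error={abs(approx - exact):.6f}")
```

The error grows with |x|, and the series converges only for |x| < π/2, so a three-term truncation is adequate only when the accumulated input stays in a small range around zero.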

Fig. 8. Design flow of activation function

Fig. 6. Signal handling of multiplier accumulator

The architecture for the MAC is shown in Fig. 7. With the tree configuration shown in Fig. 12, the use of tile logic is quite uneven and less efficient than with a chain. The idea of this configuration is that the two values from the multipliers are added separately. The partial results are then summed at adder4 to give the final output.

III. RESULTS AND DISCUSSION

The analysis considers the clock-to-output delay (tCO), which is the time to obtain a valid output at an output pin fed by a register, for all output pins. For all input pins it considers the setup time (tSU), the minimum length of time that data must arrive before the active clock edge, and the hold time (tH), the minimum length of time that data must be stable after the active clock edge. The pin-to-pin delay (tPD), the time required for an input pin signal to propagate through combinatorial logic and appear at an external output pin, is taken into consideration for any pin-to-pin combinatorial paths in the design [12]. The built-in timing analysis algorithm in Quartus is used to measure the performance of the design at each stage of the project.
A comparison was made between the timing performance of the Booth multiplier and the shift-add multiplier in the architecture shown in Fig. 7. The timing performance of the multiplier itself is compared first, as shown in Table I: for tCO, the Booth multiplier is 18.6% faster than the basic shift-add structure. The effect of the multiplier performance is amplified in the MAC architecture, as shown in Table II, where the MAC structure built with the shift-add multiplier is slower by 70.3%.

TABLE I. TIMING PERFORMANCE OF MULTIPLIER MODULE IN NEURON

Multiplier Architecture    tSU (ns)    tCO (ns)    tH (ns)
Booth Multiplier           5.611       7.399       0.232
Array Multiplier           5.851       9.131       0.914

Fig. 7. Tree configuration of Multiplier Accumulator
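The paper does not list its multiplier implementations, so the following is only a generic textbook sketch of the two schemes being compared: a plain shift-add multiplier and a radix-2 Booth multiplier. It counts add/subtract steps in software to suggest why Booth recoding can need fewer operations; the function names and the 8-bit width are assumptions, and the counts are not a substitute for the measured timings.

```python
def shift_add_multiply(a, b, bits=8):
    """Unsigned shift-add: add the shifted multiplicand for every 1 bit of b."""
    product, additions = 0, 0
    for i in range(bits):
        if (b >> i) & 1:
            product += a << i
            additions += 1
    return product, additions


def booth_multiply(a, b, bits=8):
    """Radix-2 Booth multiplication of signed integers.

    The multiplier b is scanned from the least significant bit, with an
    implicit 0 to its right. Where the current bit is 1 and the bit to its
    right is 0, the shifted multiplicand is subtracted; where the current
    bit is 0 and the bit to its right is 1, it is added. Runs of equal
    bits need no operation.
    """
    product, operations = 0, 0
    prev = 0
    for i in range(bits):
        cur = (b >> i) & 1          # Python's >> sign-extends a negative b
        if cur == 1 and prev == 0:
            product -= a << i
            operations += 1
        elif cur == 0 and prev == 1:
            product += a << i
            operations += 1
        prev = cur
    return product, operations


print(shift_add_multiply(7, 126))   # (882, 6)
print(booth_multiply(7, 126))       # (882, 2)
print(booth_multiply(-7, 126))      # (-882, 2)
```

For a multiplier such as 126 (binary 01111110), the long run of ones costs six additions in the shift-add scheme but only one subtraction and one addition with Booth recoding. Multipliers with alternating bits can favour shift-add instead, so the benefit depends on the data.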


TABLE II. TIMING PERFORMANCE OF MULTIPLIER ACCUMULATOR UNIT

Timing parameter    Using Booth Multiplier    Using Shift-Add Multiplier
tSU (ns)            5.040                     15.049
tCO (ns)            9.805                     31.605
tH (ns)             -0.435                    17.867

For the activation function, two approaches were compared. Approach 1 is constructed using behavioral statements of equation (2) up to 3 terms of the expansion with continuous assignment, utilizing the inbuilt library architectures. In Approach 1, whenever the value of a variable on the right-hand side changes, the expression is re-evaluated and the value of the left-hand side is updated. Approach 2 uses RTL statements of the multiplier architecture and the registers designed in the previous stage. Table III shows that the time required for an input pin signal to propagate through combinatorial logic and appear at an external output pin for Approach 1 is 61.272 ns, which is 13.9 ns higher than for the design based on Approach 2. This is because parallel multiplication was used in Approach 2 but not in Approach 1.

TABLE III. TIMING PERFORMANCE OF ACTIVATION FUNCTION UNIT

Activation Function Architecture                       tPD
Approach 1: Behavioral Statement                       61.272 ns
Approach 2: Multiplier and registers RTL Statements    47.372 ns

IV. CONCLUSION

Modules for a neuron structure with a hyperbolic tangent activation function have been designed for hardware realization on a digital platform. The work demonstrates that the performance of a neuron architecture on FPGA depends strongly on the methodology, the coding style and the type of multiplier and adder used. The neuron was implemented on FPGA with a propagation delay of 38.832 ns and a maximum fanout of 68. From the point of view of neuron architecture, it was found that using a tree structure for the MAC with a Booth multiplier gives lower delay than using a ripple-carry adder and a shift-add multiplier as the main components of the architecture. The activation function achieved 23% better performance using array multipliers than using behavioral statements of the mathematical expansion.

ACKNOWLEDGMENTS

The authors would like to thank Universiti Teknologi MARA for funding the research work through the Excellence Fund Grant 600-RMI/ST/DANA 5/3/Dst(171/2011).

REFERENCES

[1] Wan Fazlida Hanim Abdullah, Masuri Othman, Mohd Alaudin Mohd Ali, and Md. Shabiul Islam. Improving ion-sensitive field-effect transistor selectivity with backpropagation neural network. WSEAS Transactions of Circuits and Systems. 9(11): 700-712, 2010.
[2] Stradiotto, N. R., Yamanaka, H., & Zanoni, M. V. B. Electrochemical sensors: a powerful tool in analytical chemistry. Journal of the Brazilian Chemical Society 14: 159-173, 2003.
[3] Yamina TARIGHT, Michel HUBIN. FPGA Implementation of a Multilayer Perceptron Neural Network using VHDL. Proceedings of ICSP '98, [page 1-3].
[4] 'Overview and Use of the PICmicro Serial Peripheral Interface', Microchip(TM), [page 1-9].
[5] Poole, D. Linear algebra: a modern introduction. Belmont, CA, Thomson Brooks/Cole, 2005.
[6] Polikar, R. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine 6(3): 21-45, 2006.
[7] A. Durg, W. V. Stoelker, J. P. Cookson, S. E. Umbaugh and R. H. Moss, "Identification of Variegating Coloring in Skin Tumors: Neural Network vs Rule Based Induction methods", IEEE Eng. in Med. and Biol., vol. 12, pp. 71-74 & 98, 1993.
[8] G.P.K Economou, E.P. Mariatos, N.M. Economopoulos, D. Lymberopoulos, and C.E. Goutis. FPGA Implementation of Artificial Neural Networks: An application on Medical Expert Systems. Department of Electrical Engineering, University of Patras, GR 261 10, Patras, Greece.
[9] A.R Ormondi and J.C Rajapakse, FPGA Implementation of Neural Network, [page 271-296], 2006.
[10] Simon Haykin, 'Neural Networks and Learning Machines', third edition, [page 40-45].
[11] Benard Widrow, David E. Rumelhart, and Michael A. Lehr, "Neural networks: Applications in industry, business and science," Communications of the ACM, vol. 37, no. 3, pp. 93–105, Mar. 1994.
[12] Altera Corporation, 'Cyclone III Device Handbook, Volume 1', [chapter 5, page 1-8], July 2007.
