
Hardware Implementation of an Artificial Neural Network

Nazeih M. Botros, Ph.D., and M. Abdul-Aziz

Department of Electrical Engineering
Southern Illinois University
Carbondale, IL 62901
Tel: 618-536-2364  Fax: 618-453-7455

Abstract-In this study we present a hardware implementation of a fully digital and fully interconnected feed-forward back-propagation artificial neural network using Xilinx FPGAs. The network consists of an input layer with five nodes, a single hidden layer with four nodes, and an output layer with two nodes. These nodes are fully interconnected to each other between adjacent layers. Training is done off-line on a conventional digital computer, where the final values of the weights are obtained. Each node is implemented with two XC3042 FPGAs and a 1K x 8 EPROM. The network is tested successfully by comparing the values of the output nodes for different input patterns with those obtained from simulating the network on a PC. The number of FPGAs used can be significantly decreased, and the speed increased, if a 4K or higher family FPGA is used.
I. INTRODUCTION
In recent years, the use of Field Programmable Gate Arrays (FPGAs) in realizing complex hardware systems has accelerated. The relatively low cost and the ease of implementation and reprogramming of FPGAs offer attractive features for the hardware designer in comparison with VLSI technology [1]. Field programmable gate arrays are high-density ASICs that can be configured by the user. They combine the flexibility of gate arrays with desktop programmability.

The artificial neural network implemented in this study is a three-layer back-propagation network; see Figure 1. The network is used to classify selected input patterns. It consists of 3 layers: an input layer with 5 nodes, a hidden layer with 4 nodes, and an output layer with 2 nodes. The network is selected due to its high performance as a classifier and the ease of its training procedure [2]. The input to the network is a continuous-valued vector $x_1, x_2, \ldots, x_5$. The output of the network is the class of the current input.

Figure 1. The Artificial Neural Network.
The output of node $j$, $y_j$, is calculated as follows:

$$y_j = f\Bigl(\sum_i w_{ij} x_i - \theta_j\Bigr) \qquad (1)$$

where $\theta_j$ is the offset (bias) of node $j$, $w_{ij}$ is the weight of the connection between node $j$ and node $i$, and the function $f$ is the sigmoid non-linearity:

$$f(a) = \frac{1}{1 + e^{-a}} \qquad (2)$$

Training of the network is carried out as follows:
i) Initialize the weights and offsets. Each node, except the input nodes, is assigned an initial offset. The input nodes have no offset; they act as buffers.
ii) Starting from the output layer and going backward to the input layer, adjust the weights and offsets recursively until the weights stabilize. The weights and offsets are adjusted by using the formulas:

$$w_{ij}^{\,new} = w_{ij}^{\,old} + \eta\,\varepsilon_j x_i \qquad (3)$$

$$\theta_j^{\,new} = \theta_j^{\,old} + \eta\,\varepsilon_j \qquad (4)$$

where $\eta$ is the gain factor, assumed to be 0.5, and $\varepsilon_j$ is the error. In this study, the weights are considered stabilized if the value of each new weight is greater than 95% of its previous (old) value. If node $j$ is an output node, then

$$\varepsilon_j = y_j (1 - y_j)(d_j - y_j) \qquad (5)$$

where $d_j$ is the desired output of node $j$ and $y_j$ is the actual output.
The desired output for all output nodes is set to zero except for the node that corresponds to the current input training set, which is set to 1 [2]. If node $j$ is a hidden node, then

$$\varepsilon_j = y_j (1 - y_j) \sum_k \varepsilon_k w_{jk} \qquad (6)$$

where $k$ runs over all nodes in the layers above node $j$.
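To make the training procedure concrete, the following sketch implements equations (1)-(6) in software, as the off-line training on a PC would. The 5-4-2 topology and the gain factor of 0.5 come from the paper; the random initialization, the toy patterns, and the fixed epoch count are illustrative assumptions (the paper trains until the weights stabilize per its 95% rule).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(a):
    return 1.0 / (1.0 + np.exp(-a))            # sigmoid non-linearity, eq. (2)

# 5-4-2 topology: weights and per-node offsets (input nodes have no offset)
w_h, th_h = rng.uniform(-0.5, 0.5, (5, 4)), rng.uniform(-0.5, 0.5, 4)
w_o, th_o = rng.uniform(-0.5, 0.5, (4, 2)), rng.uniform(-0.5, 0.5, 2)
eta = 0.5                                      # gain factor, as in the paper

# Toy training patterns (assumed): one prototype per output class
X = np.array([[0.9, 0.8, 0.1, 0.2, 0.1],
              [0.1, 0.2, 0.9, 0.8, 0.9]])
D = np.array([[1.0, 0.0],                      # desired output: 1 for the node
              [0.0, 1.0]])                     # of the input's class, else 0

for _ in range(5000):                          # epoch cap is an assumption
    for x, d in zip(X, D):
        y_h = f(x @ w_h - th_h)                # eq. (1), hidden layer
        y_o = f(y_h @ w_o - th_o)              # eq. (1), output layer
        e_o = y_o * (1 - y_o) * (d - y_o)      # eq. (5), output-node error
        e_h = y_h * (1 - y_h) * (w_o @ e_o)    # eq. (6), hidden-node error
        w_o += eta * np.outer(y_h, e_o)        # eq. (3)
        th_o += eta * e_o                      # eq. (4), sign as printed
        w_h += eta * np.outer(x, e_h)
        th_h += eta * e_h

print(f(f(X @ w_h - th_h) @ w_o - th_o))       # trained network outputs
```

The final weight and offset values from such a run are what get quantized and loaded into the node hardware described next.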
II. GENERAL ARCHITECTURE

Figure 2 shows the general architecture of the network. The input layer consists of five nodes (neurons), the hidden layer of four nodes, and the output layer of two nodes. These nodes are fully interconnected to each other between adjacent layers. Training is done by off-line simulation of the network on a PC; the final values of the weights are obtained at the end of the training session. The input layer does no processing but simply buffers the data. The nodes in the other layers first form a weighted sum of their inputs. The 8-bit input to the hidden and output nodes, $y_i$, is multiplied by 8-bit signed weights $w_{ij}$ to form a 16-bit signed product. The products (five in the hidden layer and four in the output layer) and a 16-bit signed bias value $\theta_j$ are accumulated into a 20-bit accumulator. The 20-bit sum in the accumulator is then scaled to a 10-bit value. The 10-bit scaling is selected because it is the minimum number of bits that can be retained without deteriorating the accuracy of the sum. This 10-bit scaled sum serves as the address of a 1K x 8 EPROM in which the sigmoid activation function $f$ is realized as a lookup table. The activation function produces an 8-bit output. A two's complement number system is employed to handle the multiplications and additions of negative numbers.
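The datapath just described can be modeled bit-true in a few lines. The 8-bit operands, 16-bit products, 20-bit accumulation, 10-bit scaled EPROM address, and 8-bit activation output all follow the text above; the binary-point positions (inputs and weights in Q2.5, the address covering roughly -8 to +8) are assumptions, since the paper does not state the exact fixed-point formats.

```python
import numpy as np

FRAC = 5                                   # assumed fractional bits (Q2.5 operands)

def q8(v):
    """Quantize a real value to 8-bit two's complement with FRAC fractional bits."""
    return int(np.clip(round(v * (1 << FRAC)), -128, 127))

def node(x_q, w_q, bias_q):
    """One hidden/output node: 8x8 multiplies, 20-bit accumulate, scale, look up."""
    acc = bias_q                           # 16-bit signed bias (2*FRAC fractional bits)
    for x, w in zip(x_q, w_q):
        acc += x * w                       # each 8x8 product is 16-bit signed
    assert -(1 << 19) <= acc < (1 << 19)   # sum must fit the 20-bit accumulator
    addr = int(np.clip((acc >> 4) + 512, 0, 1023))  # 10-bit EPROM address (assumed map)
    a = acc / (1 << 2 * FRAC)              # the real-valued weighted sum
    return int(round(255 / (1 + np.exp(-a))))  # 8-bit output; hardware reads ROM[addr]

# A hidden node: five 8-bit inputs and weights, one 16-bit bias
y = node([q8(0.9), q8(0.1), q8(-0.4), q8(0.7), q8(0.2)],
         [q8(1.5), q8(-0.8), q8(0.3), q8(0.9), q8(-1.2)],
         q8(0.25) << FRAC)                 # bias aligned to the product scale
print(y)                                   # 8-bit activation value, 0..255
```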

Figure 2. General Architecture of the Network.

Figure 3 shows a schematic diagram of the processing flow for a hidden node. Each node in the network is built from two XC3042 FPGAs and a 1K x 8 EPROM. The first FPGA carries the input latches and the multipliers. The second carries a 20-bit fast adder/accumulator circuit and the scaling logic. These two FPGAs compute the weighted sum of the five inputs (four in the output layer) and the bias value and then scale the result. The EPROM holds the activation function for the node. The system clock is 4 MHz, which is the maximum speed that could be achieved due to the use of slower EPROMs and two FPGAs per node. The entire network is driven by a microprogrammed controller. The controller generates a proper sequence of signals to control the timing for both layers. Computations for nodes of the same layer are done in parallel.

III. DETAILS OF THE ARCHITECTURE

Multiplication

Since the weights and biases are constants (predetermined), multiplication of any number by them can be done in a look-up table fashion. The CLBs (Configurable Logic Blocks) of the FPGA are programmed to realize these look-up tables. Multiplying an 8-bit number by an 8-bit constant produces a sixteen-bit product. The 8x8 multiplication is broken into two 8x4 multiplications and one addition. The most significant partial product (8x4) is shifted by four bits before being added to the least significant partial product. The shift is realized by physically shifting (routing) the most significant bits. One CLB is used to generate two bits of the product, so the twelve bits of a partial product are generated using six CLBs.
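The decomposition is easy to verify in software. The sketch below stands in for the CLB look-up tables: the 8-bit input is split into two nibbles, each 8x4 partial product is read from a precomputed table for the constant weight, and the high partial product is shifted four bits before the single addition. The signed treatment of the upper nibble is spelled out here as the natural two's-complement reading; the illustrative weight value is arbitrary.

```python
def make_tables(w):
    """The two 8x4 look-up tables the CLBs would hold for constant weight w."""
    t_low = [w * n for n in range(16)]                          # unsigned low nibble
    t_high = [w * (n if n < 8 else n - 16) for n in range(16)]  # signed high nibble
    return t_low, t_high

def const_mul(x_byte, tables):
    """8x8 constant multiply as two 8x4 look-ups, a 4-bit shift, and one add."""
    t_low, t_high = tables
    lo = x_byte & 0xF                       # least significant partial product
    hi = (x_byte >> 4) & 0xF                # carries the sign in two's complement
    return (t_high[hi] << 4) + t_low[lo]    # 16-bit signed product

# Exhaustive check against ordinary multiplication for one illustrative weight
w = -93
tables = make_tables(w)
assert all(const_mul(x & 0xFF, tables) == w * x for x in range(-128, 128))
```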
Summation

The next task performed by the nodes is to reduce the partial products to a single 20-bit sum. Twenty bits are selected so that no overflow can occur. A 20-bit fast carry look-ahead adder is designed to carry out the summations. Each node in the hidden layer adds ten 16-bit partial products and a bias into a 20-bit positive-edge-triggered accumulator to produce a single sum. The output nodes perform the same operation but with eight partial products.
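The no-overflow claim can be checked with one line of worst-case arithmetic, assuming full-scale 16-bit signed operands:

```python
# Ten 16-bit signed partial products plus a 16-bit signed bias, all full scale:
worst = 11 * (1 << 15)              # 360448
assert worst < (1 << 19)            # 524288: fits a 20-bit signed accumulator
```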
Scaling and Activation Function

The final task performed by the nodes of the hidden and output layers is the scaling and the application of the activation (sigmoid) function. The final result of the addition of all the partial products and the bias value is stored in a 20-bit accumulator. An investigation of the behavior of the sigmoid function

$$f(a) = \frac{1}{1 + e^{-a}}$$

shows that it saturates to approximately 1.0 when $a \geq +7$ and to approximately 0.0 when $a \leq -8$. Accordingly, the 20 bits are scaled to 9 bits.
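The saturation figures are straightforward to confirm, and the same mapping yields the byte image that would be burned into the 1K x 8 EPROM. Spreading the 1024 addresses linearly over the non-saturated range of about -8 to +8 is an assumption consistent with the analysis above; the paper does not give the exact address encoding.

```python
import math

sig = lambda a: 1.0 / (1.0 + math.exp(-a))
print(sig(7.0), sig(-8.0))          # ~0.9991 and ~0.0003: saturation as described

# 1K x 8 EPROM image: address 0..1023 maps linearly onto arguments in [-8, +8)
rom = bytes(round(255 * sig((addr - 512) / 64.0)) for addr in range(1024))
assert len(rom) == 1024 and rom[0] == 0 and rom[-1] == 255
```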
Control Unit

A separate microprogrammed controller drives the entire circuit. Two 4-bit asynchronous counters are cascaded to generate the addresses for the control memory. The counters are driven by the 4 MHz system clock. The asynchronous clear inputs of the counters are connected to a push-button switch. The enable of the least significant counter is connected to the Q output of a JK flip-flop whose Preset and Clear inputs are driven by two control signals produced by the control memory. See Figure 4.
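A behavioral model of this address generator is useful for checking the sequencing. The sketch below cascades the two 4-bit counters into an 8-bit control-memory address, with the low counter's enable gated by the JK flip-flop's Q output and the push-button clear, as described; the control-memory contents themselves are not modeled.

```python
class ControlAddressGenerator:
    """Two cascaded 4-bit counters gated by a JK flip-flop, per the description."""
    def __init__(self):
        self.low = self.high = 0    # the two 4-bit counters
        self.q = 1                  # JK flip-flop output enabling the low counter

    def clear(self):
        """Push-button: asynchronous clear of both counters."""
        self.low = self.high = 0

    def tick(self):
        """One edge of the 4 MHz system clock."""
        if self.q:
            self.low = (self.low + 1) & 0xF
            if self.low == 0:       # carry ripples into the high counter
                self.high = (self.high + 1) & 0xF

    @property
    def address(self):
        """8-bit address presented to the control memory."""
        return (self.high << 4) | self.low

gen = ControlAddressGenerator()
for _ in range(20):
    gen.tick()
print(gen.address)                  # -> 20
```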
IV. RESULTS

The network is simulated in software, and the same input patterns are applied to both the software and the hardware networks. In both cases the outputs are calculated. Table I shows the outputs $y_1$ and $y_2$ of both networks (see Figure 1). As shown in the table, the hardware network (Hard W.) performs correctly. The hardware implementation computes 4 million interconnections (approximately 70,000 decisions) per second. This speed allows the implementation of the network in real-time applications.

V. DISCUSSION AND CONCLUSION

We have presented a successful hardware implementation of a simple artificial neural network. The implementation can be expanded to realize more complex networks. Reconfigurability and adaptability are the main features of the hardware. For a new application, only the weights, biases, and scaling parameters need to be reconfigured on the CLBs, without changing the basic design. The network is easily expandable simply by adding more nodes of the same design.

Xilinx FPGAs, and similar FPGAs, are found to be feasible and efficient tools for the design of neural nets. They offer acceptable densities without the cost and lengthy design cycles of full-custom circuits. Their reconfigurability and desktop programmability allow design changes to be made at the user's terminal, thereby avoiding fabrication cycle times and non-recurring engineering charges. Although (due to our limited funds) the use of two XC3042 FPGAs (50 MHz) and a 1K x 8 EPROM (450 ns) per node makes the network bulky, we found that its size and speed can be greatly improved by using higher-density FPGAs. The XC3090 FPGA can easily accommodate the circuits in the two XC3042s used in this study. It will also significantly reduce the size and increase the speed by eliminating the 55 ns (approx.) delay between the I/O pins of the two FPGAs.
Figure 3. Schematic of the digital neuron (hidden unit). (Blocks: control unit, input buffers, 8 x 8 two's complement multiplier, product select, 20-bit accumulator, 1K x 8 EPROM activation function, 8-bit output to the next layer.)
RAMs can be implemented inside the 4K FPGA series. These RAMs can be programmed as sigmoid lookup tables and downloaded with the bit stream during configuration, further increasing the speed and reducing the size to one chip per node. The use of pipelining within each node, as well as between successive layers of the network, together with higher-speed EPROMs and FPGAs, can also greatly increase the speed. Very high density FPGAs may provide room to build more than one neuron in one chip. At the present time we are modifying the design to include on-chip training.

VI. REFERENCES
[1] C. E. Cox and W. Ekkehard Blanz, "Ganglion - A Fast Hardware Implementation of a Connectionist Classifier," IEEE Custom Integrated Circuits Conference (CICC), Phoenix, Arizona, 1991.

[2] R. Lippmann, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, pp. 4-22, April 1987.

Figure 4. The Control Circuit.




Table I. Results of the Software and Hardware Networks.

