ANN on FPGA (IEEE)
Figure 3 shows a schematic diagram of the processing flow for the hidden node. Each node in the network is built from two XC3042 FPGAs and a 1K by 8 EPROM. The first FPGA carries the input latches and multipliers. The second carries a 20-bit fast adder/accumulator circuit and the scaling logic. These two FPGAs compute the weighted sum of five inputs (four in the output layer) and the bias value and then scale the result. The EPROM holds the activation function for the node. The system clock is 4 MHz, which is the maximum speed that could be achieved due to the use of slower EPROMs and two FPGAs per node. The entire network is driven by a microprogrammed controller. The controller generates the proper sequence of signals to control the timing for both layers. Computations for nodes of the same layer are done in parallel.

III. DETAILS OF THE ARCHITECTURE

Multiplication

Since the weights and biases are constants (predetermined), multiplication of any number with them can be done in a look-up table fashion. The CLBs (Configurable Logic Blocks) of the FPGA are programmed to realize these look-up tables. Multiplying an 8-bit number by an 8-bit constant produces a sixteen-bit product. The 8x8 multiplication is broken into two 8x4 multiplications and one addition. The most significant partial product (8x4) is shifted by four bits before adding it to the least significant partial product. The shift is realized by physically shifting (routing) the most significant bits. One CLB is used to generate two bits of the product. The twelve bits of the partial product are generated by using six CLBs.

Summation

The next task performed by the nodes is to combine the partial products into a single 20-bit sum. The 20 bits are selected so that no overflow can happen. A 20-bit fast carry look-ahead adder is designed to carry out the summations. Each node in the hidden layer adds up ten 16-bit partial products and a bias into a 20-bit positive-edge-triggered accumulator to produce a single sum. The output nodes perform the same but for eight partial products.

Scaling and Activation Function

The final task performed by the nodes of the hidden and output layers is the scaling and the application of the activation (sigmoid) function. The final result of the addition of all the partial products and the bias value is stored in a 20-bit accumulator. An investigation of the behavior of the sigmoid function

    f(u) = 1 / (1 + e^(-u))

shows that it saturates to approximately 1.0 when u >= +7 and saturates to approximately 0.0 when u <= -8. Accordingly, the 20 bits are scaled to 9 bits.

Control Unit

A separate microprogrammed controller drives the entire circuit. Two 4-bit asynchronous counters are cascaded to generate the addresses for the control memory. The counters are driven by the 4 MHz system clock. The asynchronous clear inputs of the counters are connected to a push-button switch. The enable input of the least significant counter is connected to the Q output of a JK flip-flop whose Preset and Clear inputs are driven by two control signals produced by the control memory. See Figure 4.

IV. RESULTS

The network is simulated by software, and the same input patterns are applied to both the software and the hardware network. In both cases the outputs are calculated. Table I shows the outputs y1 and y2 of both networks. See Figure 1. As shown in this table, the hardware network (Hard W.) performs correctly. The hardware implementation computes 4 million interconnections (approx. 70,000 decisions) per second. This speed allows the implementation of the network in real-time applications.

V. DISCUSSION AND CONCLUSION

We have presented a successful hardware implementation of a simple artificial neural network. The implementation can be expanded to realize more complex networks. Reconfigurability and adaptability are the main features of the hardware. For a new application, only the weights, biases and scaling parameters need to be reconfigured on the CLBs, without changing the basic design. It is easily expandable just by adding more nodes with the same design.

Xilinx FPGAs and other similar FPGAs are found to be feasible and efficient tools for the design of neural nets. They offer acceptable densities without the cost and lengthy design cycles of full custom circuits. Their reconfigurability and desktop programmability allow design changes to be made at the user's terminal, thereby avoiding fabrication cycle times and non-recurring engineering charges. Although (due to our limited funds) the use of two XC3042 FPGAs (50 MHz) and a 1K x 8 EPROM (450 ns) per node makes the network bulky, we found that its size and speed can be greatly improved by using higher density FPGAs. The FPGA XC3090 can easily accommodate the circuits in the two XC3042s used in this study. It will also significantly reduce the size as well as increase the speed by eliminating the 55 ns (approx.) delay between the I/O pins of the two FPGAs. RAMs can be
[Figure 3. Schematic of the processing flow for the hidden node: control unit, 20-bit accumulator, 20-bit sum, and a 1K x 8 EPROM holding the activation function.]
Figure 4. The Control Circuit.
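The constant-coefficient multiplication described in Section III can be sketched in software. This is a minimal Python model of the scheme, not the actual CLB configuration: the example weight is an assumption, and the 16-entry tables stand in for the look-up tables programmed into the CLBs.

```python
# Model of the paper's look-up-table multiplication: an 8-bit input times an
# 8-bit constant weight, split into two 8x4 multiplies with precomputed tables.

WEIGHT = 0x5A  # hypothetical example weight; the real weights are predetermined

# One 16-entry table per 4-bit nibble of the input (each entry is a 12-bit
# partial product, generated in hardware by six CLBs, two output bits each).
low_table = [WEIGHT * n for n in range(16)]   # for input bits 3..0
high_table = [WEIGHT * n for n in range(16)]  # for input bits 7..4

def kcm_multiply(x):
    """Multiply 8-bit x by the constant weight using two 8x4 lookups."""
    lo = low_table[x & 0xF]          # least significant partial product
    hi = high_table[(x >> 4) & 0xF]  # most significant partial product
    return (hi << 4) + lo            # 4-bit shift (routing), then one addition

# Cross-check against direct multiplication for every 8-bit input.
assert all(kcm_multiply(x) == WEIGHT * x for x in range(256))
```

The shift by four appears only as routing, which is why the hardware needs just the two tables and a single adder per multiplier.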
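The Summation subsection claims that 20 bits suffice for the accumulator with no overflow. A quick worked check, assuming unsigned values and a bias of at most 16 bits (the paper does not state the bias width explicitly):

```python
# Worst case for a hidden node: ten partial products of at most 16 bits each,
# plus a bias assumed here to be at most 16 bits as well.
MAX_16BIT = 2**16 - 1
worst_case_sum = 10 * MAX_16BIT + MAX_16BIT  # = 720,885
assert worst_case_sum < 2**20                # 1,048,576 -> fits in 20 bits
```

The output nodes sum only eight partial products, so they have even more headroom.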
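The scaling and activation stage can also be modeled in software. This sketch is our interpretation, not the exact EPROM contents: the weighted sum is clipped to the unsaturated range [-8, +8), reduced to the paper's 9-bit index, and used to address a sigmoid table with 8-bit entries (the paper's EPROM is 1K x 8; the exact address mapping is an assumption here).

```python
import math

INDEX_BITS = 9           # the scaled address width from the paper
ENTRIES = 2**INDEX_BITS  # 512 entries of 8 bits each

def build_sigmoid_table():
    """Precompute 8-bit approximations of f(u) = 1/(1 + e^(-u))."""
    table = []
    for i in range(ENTRIES):
        u = -8.0 + 16.0 * i / ENTRIES  # map index back to u in [-8, +8)
        table.append(round(255 * (1.0 / (1.0 + math.exp(-u)))))
    return table

TABLE = build_sigmoid_table()

def activate(u):
    """Clip u, scale it to a 9-bit index, and look up the 8-bit output."""
    u = max(-8.0, min(u, 8.0 - 16.0 / ENTRIES))
    index = int((u + 8.0) * ENTRIES / 16.0)
    return TABLE[index]

assert activate(8.0) == 255  # saturated to ~1.0 (u >= +7)
assert activate(-8.0) == 0   # saturated to ~0.0 (u <= -8)
```

The saturation behavior is what makes the 20-to-9-bit scaling lossless in practice: outside [-8, +8) the 8-bit output is already pinned at 0 or 255.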
1256
1257