VHDL MODELING AND FPGA BASED
IMPLEMENTATION OF FAST FOURIER
TRANSFORM PROCESSOR

DISSERTATION
SUBMITTED IN PARTIAL FULFILMENT OF
THE REQUIREMENTS FOR THE AWARD OF THE DEGREE OF

Master of Technology (Electronics)

(Electronics Circuits and System Design)

SYED MOHAMMED QAMARUZZAMAN

Under the Supervision of

PROF. SHUJA A. ABBASI
MR. M. HASAN

DEPARTMENT OF ELECTRONICS ENGINEERING

ALIGARH MUSLIM UNIVERSITY
ALIGARH (INDIA)

1999-2000
CONTENTS

Dissertation approval sheet
Certificate
Acknowledgement
Chapter-1 Introduction
Chapter-2 Fast Fourier transform algorithms
2.1 Introduction
2.2 Decimation-in-time FFT algorithms
2.3 Decimation-in-frequency FFT algorithms
2.4 Prevention of overflow in fixed point arithmetic
Chapter-3 Introduction to VHDL
3.1 VHDL advantages
3.2 VHDL design flow
3.3 Logic synthesis
Chapter-4 Design of FFT processor
4.1 Architecture of FFT processor
4.2 Radix-2 pipeline FFT
4.3 VHDL coding and synthesis of radix-2 pipeline FFT
4.3.1 Architecture of butterfly
(a) Add-Subtract module
(b) Two's complement multiplier
(c) Correction and rounding-off circuit
(d) Overall code of butterfly circuit
4.3.2 Shift registers
(a) Shift register (Sh4_1_16)
(b) Shift register (Sh2_2_16)
(c) Shift register (Sh1_2_16)
4.3.3 Switches
(a) Switch (Sw4_1)
(b) Switch (Sw4_2)
4.3.4 Counter
4.3.5 Weight factor generator
4.3.6 Pipeline fast Fourier transform circuit
Chapter-5 FPGA based implementation
5.1 Introduction
5.2 FPGA architecture
(a) SRAM FPGAs
(b) ANTIFUSE FPGAs
(c) CPLD FPGAs
5.3 XILINX FPGA families
5.3.1 CMOS XC4000 series
Chapter-6 Simulation and implementation results
6.1 Simulation results
(a) Simulation results for butterfly circuit
(b) Timing diagram of FFT circuit
(c) Gate level schematics generated by the synthesis tool
6.2 Implementation results
Chapter-7 Conclusion and future scope
References
Appendix
DISSERTATION APPROVAL SHEET

Dissertation entitled "VHDL modeling and FPGA based implementation of fast

Fourier transform processor" by Syed Mohammed Qamaruzzaman is approved

for the award of the degree of Master of Technology in Electronics Engineering

(Electronics Circuits and System Design)

Internal Examiner External Examiner

CHAIRMAN
DEPARTMENT OF ELECTRONICS ENGINEERING
Z.H. COLLEGE OF ENGINEERING AND TECHNOLOGY
A.M.U. ALIGARH
INDIA
CERTIFICATE

Certified that the dissertation entitled "VHDL modeling and FPGA based

implementation of fast Fourier transform processor" which is being submitted by

Mr. Syed Mohammed Qamaruzzaman in partial fulfillment of the requirements for

the award of the degree of the Master of Technology in Electronics Engineering

(Electronics circuits and system design) of Aligarh Muslim University, Aligarh, is a

record of the candidate's own work carried out by him under our supervision and

guidance. The matter embodied in this dissertation has not been submitted for the

award of any other degree.

Prof. Shuja Ahmad Abbasi          Mr. Mohd Hasan

(Reader)

DEPARTMENT OF ELECTRONICS ENGINEERING

Z.H. COLLEGE OF ENGINEERING AND TECHNOLOGY
A.M.U. ALIGARH
INDIA
Acknowledgement

Praise is due to Almighty Allah, most beneficent and merciful, who
with His mercy made me complete my dissertation successfully.

I consider it a matter of pride and pleasure to express my sincere
gratitude to my esteemed supervisors Prof. Shuja Ahmad Abbasi and
Mr. Mohd Hasan (Reader), Deptt. of Electronics Engineering, whose
moral and constant encouragement, kind attitude and affectionate nature
encouraged me all the way through.

I take this opportunity to express my gratitude to Prof.
Mahfooz ur Rehman, Chairman, Deptt. of Electronics Engineering,
AMU, for providing me various facilities.

I feel privileged and highly obliged to my parents and all my family
members for their assistance, blessings and cooperation.

My special thanks go to all the lab attendants for their cooperation.

A final and deepest appreciation goes to my friends, particularly
Sajjad Ahmad Lone, for the moral boost they gave me.

Qamaruzzaman

August 2000
CHAPTER-1
INTRODUCTION

Digital signal processing, a field which has its roots in 17th and 18th century
mathematics, has become an important modern tool in a multitude of diverse fields of
science and technology. The techniques and applications of this field are as old as
Newton and Gauss and as new as digital computers and integrated circuits.
Digital signal processing is concerned with the representation of signals by
sequences of numbers or symbols and the processing of these sequences. The purpose
of such processing may be to estimate characteristic parameters of a signal or to
transform a signal into a form, which is in some sense more desirable.
The evolution of a new point of view toward digital signal processing was
further accelerated by the disclosure in 1965 of an efficient algorithm for computation
of Fourier transforms. This class of algorithms has come to be known as the fast
Fourier transform or FFT. The implications of the FFT were significant from a
number of points of view. Many signal-processing algorithms, which had been
developed on digital computers, required processing times several orders of
magnitude greater than real time. Often this was tied to the fact that spectrum
analysis was an important component of the signal processing and that no efficient
means had been known for implementing it. The fast Fourier transform algorithm
reduced the computation time of the Fourier transform by orders of magnitude. This
permitted the implementation of increasingly sophisticated signal processing
algorithms with processing times that allowed interaction with the system.
Furthermore, with the realization that the fast Fourier transform algorithm might, in
fact, be implementable in special purpose digital hardware, many signal processing
algorithms which previously had appeared to be impractical began to appear to have
practical implementations with special purpose digital hardware.

The techniques and applications of digital signal processing are expanding at a
tremendous rate. With the advent of large-scale integration and the resulting reduction
in cost and size of digital components, together with increasing speed, the class of
applications of digital signal processing techniques is growing. Special purpose digital
filters can now be implemented at sampling rates in the megahertz range. Special
purpose processors for implementing the fast Fourier transform at high data rates are
commercially available. Simple digital filters have been integrated on circuit chips.
Almost all current discussions of speech bandwidth compression systems are directed
toward all digital implementation because these are now most practical. Digital
processors also form an integral part of many modern radar and sonar systems. In
addition to the development of special purpose digital signal processing hardware,
there are available special programmable digital signal processing computers whose
architecture is matched to signal processing problems such as TMS320 and ADSP-
2100. Such computers are finding application in real-time signal processing as well as
for real-time simulations directed toward the development of special purpose digital
hardware.
Recently, field-programmable gate arrays (FPGAs) have become very popular
for implementing application-specific integrated circuits (ASICs). FPGAs are well
suited to implementing the FFT because a high degree of parallelism, and thus
high speed, can be achieved.
CHAPTER-2
FAST FOURIER TRANSFORM
ALGORITHMS

2.1 Introduction

Discrete Fourier transform plays an important role in the analysis, the design,
and the implementation of digital signal processing algorithms and systems. One of
the reasons that Fourier analysis is of such wide-ranging importance in digital signal
processing is because of the existence of efficient algorithms for computing the
discrete Fourier transform [1].
The discrete Fourier transform (DFT) is

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \qquad k = 0, 1, \ldots, N-1 \qquad [2.1]$$

where $W_N = e^{-j(2\pi/N)}$. The inverse discrete Fourier transform (IDFT) is

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn}, \qquad n = 0, 1, \ldots, N-1 \qquad [2.2]$$

In Eqs. [2.1] and [2.2], both x(n) and X(k) may be complex. The expressions of Eqs.
[2.1] and [2.2] differ only in the sign of the exponent of $W_N$ and in a scale factor
1/N. Thus a discussion of computation procedures for Eq. [2.1] applies with
straightforward modifications to Eq. [2.2].
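
As a minimal sketch of Eqs. [2.1] and [2.2], the two sums can be evaluated directly in Python. The function names `dft` and `idft` and the four-point test sequence are illustrative assumptions and do not come from the dissertation.

```python
import cmath


def dft(x):
    """Direct evaluation of Eq. [2.1]: X(k) = sum over n of x(n) * W_N^(kn)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)  # W_N = e^(-j*2*pi/N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]


def idft(X):
    """Direct evaluation of Eq. [2.2]: opposite exponent sign and a 1/N scale factor."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]


x = [1, 2, 3, 4]      # arbitrary four-point test sequence
X = dft(x)            # forward transform
x_back = idft(X)      # should reproduce x up to rounding error
```

Both functions share the same loop structure, which mirrors the remark that the two expressions differ only in the sign of the exponent and the 1/N factor.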

To indicate the importance of efficient computation schemes, it is instructive
to consider the direct evaluation of the DFT equations. Since x(n) may be complex we
can write

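$$X(k) = \sum_{n=0}^{N-1} \Big\{ \big( \mathrm{Re}[x(n)]\,\mathrm{Re}[W_N^{kn}] - \mathrm{Im}[x(n)]\,\mathrm{Im}[W_N^{kn}] \big) + j \big( \mathrm{Re}[x(n)]\,\mathrm{Im}[W_N^{kn}] + \mathrm{Im}[x(n)]\,\mathrm{Re}[W_N^{kn}] \big) \Big\},$$

$$k = 0, 1, \ldots, N-1 \qquad [2.3]$$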

From Eq. [2.3] it is clear that for each value of k, the direct computation of X(k)
requires 4N real multiplications and (4N - 2) real additions. Since X(k) must be
computed for N different values of k, the direct computation of the discrete Fourier
transform of a sequence x(n) requires $4N^2$ real multiplications and N(4N - 2) real
additions or, alternatively, $N^2$ complex multiplications and N(N - 1) complex
additions. In addition to the multiplications and additions called for by Eq. [2.3], the
implementation of the computation of the DFT on a general-purpose digital computer
or with special-purpose hardware of course requires provision for storing and
accessing the input sequence values x(n) and values of the coefficients $W_N^{kn}$. Since
the amount of accessing and storing of data in numerical computation algorithms is
generally proportional to the number of arithmetic operations, it is generally accepted
that a meaningful measure of complexity, or of the time required to implement a
computational algorithm, is the number of multiplications and additions required.
Thus, for the direct computation of the discrete Fourier transform, a convenient
measure of the efficiency of the computation is the fact that $4N^2$ real multiplications
and N(4N - 2) real additions are required. Since the amount of computation, and thus
the computation time, is approximately proportional to $N^2$, it is evident that the
number of arithmetic operations required to compute the DFT by the direct method
becomes very large for large values of N. For this reason, computational procedures
that reduce the number of multiplications and additions are of considerable interest.
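
The operation counts quoted above can be checked with a short tally. This is only a sketch: the function name `direct_dft_real_ops` is hypothetical, and it assumes that a complex multiplication costs 4 real multiplications plus 2 real additions and that a complex addition costs 2 real additions.

```python
def direct_dft_real_ops(N):
    """Tally the real operations needed to evaluate Eq. [2.3] directly."""
    real_mults = 0
    real_adds = 0
    for k in range(N):
        # N complex products x(n) * W_N^(kn): 4 real mults and 2 real adds each
        real_mults += 4 * N
        real_adds += 2 * N
        # N - 1 complex additions to sum the N products: 2 real adds each
        real_adds += 2 * (N - 1)
    return real_mults, real_adds


for N in (8, 64, 1024):
    mults, adds = direct_dft_real_ops(N)
    print(N, mults, adds)
```

For N = 8 the tally gives 256 real multiplications and 240 real additions, matching $4N^2$ and N(4N - 2).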
Most approaches to improving the efficiency of the computation of the DFT
exploit one or both of the following special properties of the quantities $W_N^{kn}$:

1. $W_N^{k(N-n)} = (W_N^{kn})^*$ (symmetry)
2. $W_N^{kn} = W_N^{k(n+N)} = W_N^{(k+N)n}$ (periodicity)

For example, using the first property, i.e., the symmetry of the cosine and sine
functions, we can group terms in Eq. [2.3] as

$$\mathrm{Re}[x(n)]\,\mathrm{Re}[W_N^{kn}] + \mathrm{Re}[x(N-n)]\,\mathrm{Re}[W_N^{k(N-n)}] = \big(\mathrm{Re}[x(n)] + \mathrm{Re}[x(N-n)]\big)\,\mathrm{Re}[W_N^{kn}]$$

and

$$-\mathrm{Im}[x(n)]\,\mathrm{Im}[W_N^{kn}] - \mathrm{Im}[x(N-n)]\,\mathrm{Im}[W_N^{k(N-n)}] = -\big(\mathrm{Im}[x(n)] - \mathrm{Im}[x(N-n)]\big)\,\mathrm{Im}[W_N^{kn}]$$

Similar groupings can be found for the other terms in Eq. [2.3]. By this method, the
number of multiplications can be reduced by approximately a factor of 2. Also we can
take advantage of the fact that for certain values of the product kn, the sine and cosine
functions take on the values 1 or 0, thereby eliminating the need for multiplications.
However, reductions of this type still leave us with an amount of computation that is
approximately proportional to $N^2$. Fortunately, the second property, i.e., the
periodicity of the complex sequence $W_N^{kn}$, can be employed in achieving significantly
greater reductions of the computation.
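
The symmetry property $W_N^{k(N-n)} = (W_N^{kn})^*$ and the periodicity property $W_N^{kn} = W_N^{k(n+N)} = W_N^{(k+N)n}$ can be confirmed numerically. The sketch below is illustrative only; the helper names `twiddle` and `check_properties` and the tolerance value are assumptions.

```python
import cmath


def twiddle(N, exponent):
    """Return W_N raised to the given integer exponent, with W_N = e^(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / N)


def check_properties(N, tol=1e-9):
    """Check symmetry and periodicity of W_N^(kn) for all 0 <= k, n < N."""
    for k in range(N):
        for n in range(N):
            base = twiddle(N, k * n)
            symmetric = abs(twiddle(N, k * (N - n)) - base.conjugate()) < tol
            periodic_n = abs(twiddle(N, k * (n + N)) - base) < tol
            periodic_k = abs(twiddle(N, (k + N) * n) - base) < tol
            if not (symmetric and periodic_n and periodic_k):
                return False
    return True


print(check_properties(8))
```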

Computational algorithms that exploit both the symmetry and periodicity of the
sequence $W_N^{kn}$ were known long before the era of high-speed digital computation. At
that time, any scheme that reduced hand computation by even a factor of 2 was
welcomed. Runge [2] and later Danielson and Lanczos [3] described algorithms for
which computation was roughly proportional to $N \log N$ rather than $N^2$. The
possibility of greatly reduced computation was generally overlooked until about 1965,
when Cooley and Tukey [1] published an algorithm for the computation of the
discrete Fourier transform that is applicable when N is a composite number; i.e., N is
the product of two or more integers. The publication of this paper touched off a flurry
of activity in the application of the discrete Fourier transform to signal processing and
resulted in the discovery of a number of computational algorithms which have come
to be known as fast Fourier transform, or simply FFT, algorithms. Collectively, the
entire set of such algorithms is often loosely referred to as "the FFT" [5].
The fundamental principle that all these algorithms are based upon is that of
decomposing the computation of the discrete Fourier transform of a sequence of
length N into successively smaller discrete Fourier transforms. The manner in which
this principle is implemented leads to a variety of different algorithms, all with
comparable improvements in computational speed. Here we will discuss two classes
of algorithms.
1. Decimation-in-time FFT algorithm.
2. Decimation-in-frequency FFT algorithm.

2.2 Decimation-in-time FFT Algorithms

To achieve the dramatic increase in efficiency, it is necessary to decompose
the DFT computation into successively smaller DFT computations. In this process we
exploit both the symmetry and the periodicity of the complex exponential
$W_N^{kn} = e^{-j(2\pi/N)kn}$. Algorithms in which the decomposition is based on decomposing
the sequence x(n) into successively smaller subsequences are called decimation-in-time
algorithms. The principle of decimation-in-time is most conveniently illustrated
by considering the special case of N an integer power of 2; i.e.,

$$N = 2^v$$

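Since N is an even integer, we can consider computing X(k) by separating x(n) into
two N/2-point sequences consisting of the even-numbered points in x(n) and the odd-
numbered points in x(n). With X(k) given by

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad [2.4]$$

and separating x(n) into its even- and odd-numbered points we obtain

$$X(k) = \sum_{n\ \mathrm{even}} x(n)\, W_N^{nk} + \sum_{n\ \mathrm{odd}} x(n)\, W_N^{nk}$$

or with the substitution of variables n = 2r for n even and n = 2r + 1 for n odd,

$$X(k) = \sum_{r=0}^{(N/2)-1} x(2r)\, W_N^{2rk} + \sum_{r=0}^{(N/2)-1} x(2r+1)\, W_N^{(2r+1)k}$$

$$\phantom{X(k)} = \sum_{r=0}^{(N/2)-1} x(2r)\, (W_N^2)^{rk} + W_N^k \sum_{r=0}^{(N/2)-1} x(2r+1)\, (W_N^2)^{rk} \qquad [2.5]$$

But $W_N^2 = W_{N/2}$, since

$$W_N^2 = e^{-2j(2\pi/N)} = e^{-j2\pi/(N/2)} = W_{N/2}$$

Consequently Eq. [2.5] can be written as

$$X(k) = \sum_{r=0}^{(N/2)-1} x(2r)\, W_{N/2}^{rk} + W_N^k \sum_{r=0}^{(N/2)-1} x(2r+1)\, W_{N/2}^{rk} = G(k) + W_N^k H(k) \qquad [2.6]$$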

Each of the sums in Eq. [2.6] is recognized as an N/2-point DFT, the first sum, G(k),
being the N/2-point DFT of the even-numbered points of the original sequence and the
second, H(k), being the N/2-point DFT of the odd-numbered points of the original sequence.
Although the index k ranges over N values, k = 0, 1, ..., N - 1, each of the sums need
only be computed for k between 0 and N/2 - 1, since G(k) and H(k) are each periodic
in k with period N/2. After the two DFTs corresponding to the two sums in Eq. [2.6]
are computed, they are then combined to yield the N-point DFT, X(k). Fig. 2.1
indicates the computation involved in computing X(k) according to Eq. [2.6] for an
eight-point sequence, i.e., for N = 8. In this figure we have used the signal flow graph
conventions for representing difference equations [5,7]. That is, branches entering a
node are summed to produce the node variable. When no coefficient is indicated, the
branch transmittance is assumed to be one. For other branches, the transmittance of a
branch is an integer power of $W_N$.
Fig. 2.1 Flow graph of the decimation-in-time decomposition of an N-point
DFT computation into two N/2-point DFT computations (N = 8).

Thus we note in Fig. 2.1 that two four-point DFTs are computed, with G(k)
designating the four-point DFT of the even-numbered points and H(k) designating the
four-point DFT of the odd-numbered points. X(0) is then obtained by multiplying
H(0) by $W_N^0$ and adding the product to G(0). X(1) is obtained by multiplying H(1) by
$W_N^1$ and adding the result to G(1). For X(4) we would want to multiply H(4) by $W_N^4$
and add the result to G(4). However, since G(k) and H(k) are both periodic in k with
period 4, H(4) = H(0) and G(4) = G(0). Thus X(4) is obtained by multiplying H(0) by
$W_N^4$ and adding the result to G(0).
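
A minimal sketch of this single decomposition step: two N/2-point DFTs are computed directly and then combined as X(k) = G(k) + $W_N^k$ H(k), with G and H indexed modulo N/2 to use their periodicity. The function names `dft` and `dit_one_stage` and the test sequence are illustrative assumptions.

```python
import cmath


def dft(x):
    """Direct DFT, used here for the two half-length transforms and as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def dit_one_stage(x):
    """One decimation-in-time step: X(k) = G(k) + W_N^k * H(k)."""
    N = len(x)
    half = N // 2
    G = dft(x[0::2])   # N/2-point DFT of the even-numbered points
    H = dft(x[1::2])   # N/2-point DFT of the odd-numbered points
    # G and H are periodic in k with period N/2, hence the index k % half
    return [G[k % half] + cmath.exp(-2j * cmath.pi * k / N) * H[k % half]
            for k in range(N)]


x = [1, 0, -1, 2, 3, -2, 0.5, 4]   # arbitrary eight-point sequence (N = 8)
X_split = dit_one_stage(x)
X_direct = dft(x)
```

For this input, `X_split` and `X_direct` agree to within rounding error, and `X_split[4]` is built from G(0) and H(0) exactly as described for X(4).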
With the computation restructured according to Eq. [2.6], we can compare the
number of multiplications and additions required with those required for a direct
computation of the DFT. Previously we saw that for direct computation without
exploiting symmetry, $N^2$ complex multiplications and additions were required. By
comparison, Eq. [2.6] requires the computation of two N/2-point DFTs, which in turn
requires $2(N/2)^2$ complex multiplications and approximately $2(N/2)^2$ complex
additions. Then the two N/2-point DFTs must be combined, requiring N complex
multiplications, corresponding to multiplying the second sum by $W_N^k$, and then N
complex additions, corresponding to adding that product to the first sum.
Consequently, the computation of Eq. [2.6] for all values of k requires $N + 2(N/2)^2$ or
$N + N^2/2$ complex multiplications and complex additions. It is easy to verify that for
N > 2, $N + N^2/2$ will be less than $N^2$.
Equation [2.6] corresponds to breaking the original N-point computation into
two N/2-point computations. If N/2 is even, as it always is when N is equal to a power
of 2, then we can consider computing each of the N/2-point DFTs in Eq. [2.6] by
breaking each of the sums in Eq. [2.6] into two N/4-point DFTs, which would then be
combined to yield the N/2-point DFTs. Thus G(k) and H(k) in Eq. [2.6] would be
computed as indicated below, with g(r) = x(2r) and h(r) = x(2r + 1) denoting the
even- and odd-numbered subsequences:

$$G(k) = \sum_{r=0}^{(N/2)-1} g(r)\, W_{N/2}^{rk} = \sum_{l=0}^{(N/4)-1} g(2l)\, W_{N/2}^{2lk} + \sum_{l=0}^{(N/4)-1} g(2l+1)\, W_{N/2}^{(2l+1)k}$$

or

$$G(k) = \sum_{l=0}^{(N/4)-1} g(2l)\, W_{N/4}^{lk} + W_{N/2}^{k} \sum_{l=0}^{(N/4)-1} g(2l+1)\, W_{N/4}^{lk} \qquad [2.7]$$

Similarly,

$$H(k) = \sum_{l=0}^{(N/4)-1} h(2l)\, W_{N/4}^{lk} + W_{N/2}^{k} \sum_{l=0}^{(N/4)-1} h(2l+1)\, W_{N/4}^{lk} \qquad [2.8]$$

Thus if the four-point DFTs in Fig. 2.1 are computed according to Eqs. [2.7]
and [2.8], then that computation would be carried out as indicated in Fig. 2.2.
Inserting the computation indicated in Fig. 2.2 into the flow graph of Fig. 2.1, we
obtain the complete flow graph of Fig. 2.3. Note that we have used the fact that

$$W_{N/2} = W_N^2$$

For the eight-point DFT that we have been using as an illustration, the
computation has been reduced to a computation of two-point DFTs. The two-point
DFT of, for example, x(0) and x(4), is depicted in Fig. 2.4. With the computation of
Fig. 2.4 inserted in the flow graph of Fig. 2.3, we obtain the complete flow graph for
computation of the eight-point DFT, as shown in Fig. 2.5.
Fig. 2.2 Flow graph of the decimation-in-time decomposition of an N/2-point
DFT computation into two N/4-point DFT computations (N = 8).

Fig. 2.3 Result of substituting Fig. 2.2 into Fig. 2.1.

Fig. 2.4 Flow graph of a two-point DFT (branch transmittances $W_N^0 = 1$ and
$W_2 = W_N^{N/2} = -1$).

Fig. 2.5 Flow graph of complete decimation-in-time decomposition
of an eight-point DFT computation.
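
The complete decomposition can be sketched as a recursive routine that applies the even/odd split until single-point transforms remain. This is an illustrative sketch rather than the processor design: the names `fft_dit` and `dft` are assumptions, the recursion stops at one-point transforms (equivalent to ending with two-point butterflies), and each stage performs N multiplications by $W_N^k$ as in the flow graph, without the further savings from symmetry.

```python
import cmath


def dft(x):
    """Direct DFT, used only as a reference for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)             # a one-point DFT is the sample itself
    if N % 2:
        raise ValueError("length must be a power of 2")
    half = N // 2
    G = fft_dit(x[0::2])           # transform of even-numbered points
    H = fft_dit(x[1::2])           # transform of odd-numbered points
    # Combine as X(k) = G(k) + W_N^k * H(k), using periodicity of G and H
    return [G[k % half] + cmath.exp(-2j * cmath.pi * k / N) * H[k % half]
            for k in range(N)]


x = [1, 0, -1, 2, 3, -2, 0.5, 4]   # arbitrary eight-point input
X_fast = fft_dit(x)
X_ref = dft(x)
```

The recursion depth equals $\log_2 N$, which corresponds to the number of stages in the flow graph.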

For the more general case with N a power of 2 whose exponent is greater than 3, we would
proceed by decomposing the N/4-point transforms in Eqs. [2.7] and [2.8] into N/8-
point transforms, and continue until left with only two-point transforms. This requires
v stages of computation, where $v = \log_2 N$. Previously we found that in the original
decomposition of an N-point transform into two N/2-point transforms, the number of
complex multiplications and additions required was $N + 2(N/2)^2$. When the N/2-point
transforms are decomposed into N/4-point transforms, the factor of $(N/2)^2$ is
replaced by $N/2 + 2(N/4)^2$, so the overall computation then requires $N + N + 4(N/4)^2$
complex multiplications and additions. If $N = 2^v$, this can be done at most $v = \log_2 N$
times, so that after carrying out this decomposition as many times as possible the
number of complex multiplications and additions is equal to $N \log_2 N$.
The flow graph of Fig. 2.5 displays the operations explicitly. By counting
branches with transmittances of the form $W_N^r$, we note that each stage has N
complex multiplications and N complex additions. Since there are $\log_2 N$ stages, we
have, as before, a total of $N \log_2 N$ complex multiplications and additions. This is the
substantial computational savings that we previously indicated was possible. We shall see
that the symmetry and periodicity of $W_N^r$ can be exploited to obtain further
reductions in computation.
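
The count of $N \log_2 N$ follows from a simple recurrence: each split costs N complex multiplications and leaves two half-length transforms. The sketch below tabulates this against the direct count $N^2$ and the single-split count $N + 2(N/2)^2$; the function names are hypothetical.

```python
def direct_mults(N):
    """Complex multiplications for direct evaluation of the DFT."""
    return N * N


def one_split_mults(N):
    """One decimation-in-time split followed by two direct N/2-point DFTs."""
    return N + 2 * (N // 2) ** 2


def full_dit_mults(N):
    """Repeat the split until one-point transforms remain (N a power of 2)."""
    if N == 1:
        return 0
    return N + 2 * full_dit_mults(N // 2)


for v in range(1, 11):
    N = 2 ** v
    print(N, direct_mults(N), one_split_mults(N), full_dit_mults(N))
```

For N = 1024 the recurrence gives 10240 complex multiplications, compared with 1048576 for direct evaluation.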

2.3 Decimation-in-frequency FFT Algorithms

The decimation-in-time FFT algorithms were all based upon the
decomposition of the DFT computation by forming smaller and smaller subsequences
of the input sequence, x(n). Alternatively we can consider dividing the output
sequence, X(k), into smaller and smaller subsequences in the same manner. The class
of FFT algorithms based on this procedure is commonly referred to as decimation-in-frequency.
To derive the decimation-in-frequency forms of the FFT algorithm for N a
power of 2, we can first divide the input sequence into the first half and the last half of
the points so that

$$X(k) = \sum_{n=0}^{(N/2)-1} x(n)\, W_N^{nk} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{nk}$$

or

$$X(k) = \sum_{n=0}^{(N/2)-1} x(n)\, W_N^{nk} + W_N^{(N/2)k} \sum_{n=0}^{(N/2)-1} x(n + N/2)\, W_N^{nk} \qquad [2.9]$$

It is important to observe that while Eq. [2.9] contains two summations over N/2
points, each of these summations is not an N/2-point DFT since $W_N^{nk}$ rather than
$W_{N/2}^{nk}$ appears in each of the sums. Combining the two summations in Eq. [2.9] and
using the fact that $W_N^{(N/2)k} = (-1)^k$ we obtain

$$X(k) = \sum_{n=0}^{(N/2)-1} \big[ x(n) + (-1)^k\, x(n + N/2) \big]\, W_N^{nk} \qquad [2.10]$$

Let us now consider k even and k odd separately, with X(2r) and X(2r + 1)
representing the even-numbered points and the odd-numbered points, respectively, so
that

$$X(2r) = \sum_{n=0}^{(N/2)-1} \big[ x(n) + x(n + N/2) \big]\, W_N^{2rn} \qquad [2.11]$$

$$X(2r+1) = \sum_{n=0}^{(N/2)-1} \big[ x(n) - x(n + N/2) \big]\, W_N^{n}\, W_N^{2rn}, \qquad r = 0, 1, \ldots, (N/2) - 1 \qquad [2.12]$$

Equations [2.11] and [2.12] can be recognized as N/2-point DFTs; in the case of Eq.
[2.11], of the sum of the first half and the last half of the input sequence, and in the
case of Eq. [2.12], of the product of $W_N^n$ with the difference of the first half and the
last half of the input sequence. As distinguished from Eq. [2.10], the two summations
in Eqs. [2.11] and [2.12] correspond to N/2-point DFTs because

$$W_N^{2} = W_{N/2}$$

Thus on the basis of Eqs. [2.11] and [2.12] with g(n) = x(n) + x(n + N/2) and h(n) =
x(n) - x(n + N/2), the DFT can be computed by first forming the sequences g(n) and
h(n), then computing $h(n) W_N^n$, and finally computing the N/2-point DFTs of these
two sequences to obtain the even-numbered output points and odd-numbered output
points, respectively. The procedure suggested by Eqs. [2.11] and [2.12] is illustrated
for the case of an eight-point DFT in Fig. 2.6.
Proceeding in a manner similar to that followed in deriving the decimation-in-
time algorithm, we note that since N is a power of 2, N/2 is even, and consequently,
the N/2-point DFTs can be computed by computing the even-numbered and odd-
numbered output points for those DFTs separately. As in the case of the original
decomposition leading to Eqs. [2.11] and [2.12], this is accomplished by combining
the first half and the last half of the input points for each of the N/2-point DFTs and
then computing N/4-point DFTs. The flow graph resulting from taking this step for the
eight-point example is shown in Fig. 2.7. For the eight-point example, the
computation has now been reduced to the computation of two-point DFTs, which, as
was discussed previously, are implemented by adding and subtracting the input points.
Thus the two-point DFTs in Fig. 2.7 can be replaced by the computation shown in
Fig. 2.8, so the computation of the eight-point DFT becomes that shown in Fig. 2.9.
By counting the arithmetic operations in Fig. 2.9, and generalizing to $N = 2^v$,
we see that the computation of Fig. 2.9 requires $(N/2) \log_2 N$ complex multiplications
and $N \log_2 N$ complex additions. Thus the total computation is the same for the
decimation-in-frequency and the decimation-in-time algorithms.
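
The procedure of Eqs. [2.11] and [2.12] can be sketched recursively: form g(n) and h(n), multiply h(n) by $W_N^n$, then transform the two half-length sequences to obtain the even- and odd-numbered outputs. The names `fft_dif` and `dft` and the test input are illustrative assumptions, and the recursion is carried down to one-point transforms.

```python
import cmath


def dft(x):
    """Direct DFT, used only as a reference for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of 2")
    half = N // 2
    # g(n) = x(n) + x(n + N/2): feeds the even-numbered outputs, Eq. [2.11]
    g = [x[n] + x[n + half] for n in range(half)]
    # h(n) * W_N^n with h(n) = x(n) - x(n + N/2): feeds the odd outputs, Eq. [2.12]
    h = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
         for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif(g)   # X(2r)
    X[1::2] = fft_dif(h)   # X(2r + 1)
    return X


x = [1, 0, -1, 2, 3, -2, 0.5, 4]   # arbitrary eight-point input
X_dif = fft_dif(x)
X_ref = dft(x)
```

Each level performs N/2 multiplications by $W_N^n$, which is consistent with the $(N/2) \log_2 N$ multiplication count quoted above.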

Fig. 2.6 Flow graph of the decimation-in-frequency decomposition of an N-point
DFT computation into two N/2-point DFT computations (N = 8).

Fig. 2.7 Flow graph of the decimation-in-frequency decomposition of an
eight-point DFT computation into four two-point DFT computations.

Fig. 2.8 Flow graph of a typical two-point DFT as required in
the last stage of decimation-in-frequency decomposition.

Fig. 2.9 Flow graph of complete decimation-in-frequency decomposition
of an eight-point DFT computation.

2.4 Prevention of overflow in fixed point arithmetic

In implementing an FFT algorithm with fixed-point arithmetic we must guard
against overflow. The following techniques are used for preventing overflow:
(i) If an N-point FFT is implemented in fixed-point arithmetic, the input sequence
is attenuated by a factor of N, so that
$|x(n)| < 1/N, \quad 0 \le n \le N-1$
This ensures no overflow, but such heavy attenuation severely degrades the
signal-to-noise ratio.
(ii) Overflow can be prevented by requiring that $|x(n)| < 1$, i.e., that the input sequence is
a fraction, and incorporating an attenuation of 1/2 at the input to each
stage, i.e., a right shift by one bit at the input of every stage.
(iii) A third approach to avoiding overflow is the use of block floating point. In
this procedure the original array is normalized to the far left of the computer
word, with the restriction that $|x(n)| < 1$; the computation proceeds in fixed-
point manner, except that after every addition there is an overflow test. If
overflow is detected, the entire array is divided by 2 and the computation
continues. The number of necessary shifts is counted to determine a scale
factor or exponent for the entire final array.
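
Technique (ii) can be illustrated with a small sketch that halves the inputs of every stage and records the largest magnitude produced. Floating-point numbers stand in for fixed-point fractions here, so this shows only the magnitude bound and the overall 1/N scaling, not quantization effects. The names `fft_half_scaled`, `dft`, and `peak`, and the test input, are assumptions.

```python
import cmath


def dft(x):
    """Direct DFT, used only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft_half_scaled(x, peak):
    """Decimation-in-time FFT with an attenuation of 1/2 at the input of every stage.

    peak is a one-element list that records the largest magnitude produced.
    The result equals DFT(x) / N because each of the log2(N) stages halves its inputs.
    """
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    G = fft_half_scaled(x[0::2], peak)
    H = fft_half_scaled(x[1::2], peak)
    X = []
    for k in range(N):
        g = 0.5 * G[k % half]      # one-bit right shift in hardware
        h = 0.5 * H[k % half]
        value = g + cmath.exp(-2j * cmath.pi * k / N) * h
        peak[0] = max(peak[0], abs(value))
        X.append(value)
    return X


x = [0.9, -0.9, 0.9, -0.9, 0.9, -0.9, 0.9, -0.9]   # |x(n)| < 1, energy concentrated at k = 4
peak = [0.0]
X_scaled = fft_half_scaled(x, peak)
reference = [value / len(x) for value in dft(x)]
```

Because each butterfly output is bounded by half the sum of two magnitudes below 1, every intermediate value stays below 1, while the unscaled X(4) for this input would be 7.2.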

CHAPTER-3
INTRODUCTION TO VHDL

As the size and complexity of digital systems increase, more computer
aided design tools are introduced into the hardware design process. The early
paper-and-pencil design methods have given way to sophisticated design entry, verification
and automatic hardware generation tools. The newest addition to this design
methodology is the introduction of Hardware Description Languages (HDL). Based
on HDLs, new digital system CAD (Computer Aided Design) tools have been
developed and are now being utilized by the hardware designers. Hardware
description languages are used to describe hardware for the purpose of simulation,
modeling, testing, design, and documentation of digital systems. These languages
provide a convenient and compact format for the hierarchical representation of
functional and wiring details of digital systems. Some Hardware Description
Languages consist of a simple set of symbols and notations which replace schematic
diagrams of digital circuits, while others are more formally defined and may present
the hardware at one or more levels of abstraction. Available software for HDLs
includes simulators and hardware synthesis programs. For the design of large digital
systems, much engineering time is spent in changing formats for using various design
aids and simulators. An integrated design environment is useful for better design
efficiency in these systems. In an ideal design environment, the high level description
of the system is understandable to the managers and to the designers, and it uniquely
and unambiguously defines the hardware. This high level description can serve as the
documentation for the part as well as an entry point into the design process. As the
design process advances, additional details are added to the initial description of the
part. These details enable the simulation and testing of the system at various levels of
abstraction. By the last stage of design, the initial description has evolved into a
detailed description, which can be used by a program controlled machine for
generation of final hardware in the form of layout, printed circuit board, or gate
arrays. This ideal design process exists only if a language exists to describe hardware
at various levels so that it can be understood by the managers, users, designers,
testers, simulators and machines. The IEEE standard VHDL hardware description
language is such a language. VHDL stands for VHSIC Hardware Description
Language, where VHSIC is Very High Speed Integrated Circuit. In 1980 the US
government launched the VHSIC project to enhance the electronic design process,
technology and procurement, spawning development of many advanced integrated
circuit process technologies.
VHDL was defined because a need existed for an integrated design and
documentation language to communicate design data between various levels of
abstraction. At the time, none of the existing hardware description languages fully
satisfied these requirements, and the lack of precision in English made it too
ambiguous for this purpose. Introducing VHDL and synthesis enables the design
community to explore a new design methodology. The traditional design approach, as
shown in Fig. 3.1, starts with drawing schematics and then performs functional and
timing simulation based on the same schematic. If there is any design error, the
process iterates back to update the schematics. After the layout, functions and
back-annotated timing are verified again with the same schematics.

Fig. 3.1 Traditional schematic design approach (blocks: Schematic; Function and
time checking; Layout).

The VHDL based design approach is illustrated in Fig. 3.2. The design is
functionally described with VHDL. VHDL simulation is used to verify the
functionality of the design. In general, modifying VHDL source code is much faster
than changing schematics. This allows designers to reach functionally correct
designs faster, to explore more architecture trade-offs, and to have more impact on the
designs. Once the function matches the requirements, the VHDL code is synthesized to
generate schematics (or equivalent netlists). The netlist can be used to lay out the
circuit and to verify the timing requirements (both before and after the layout). The
design changes can be made by modifying the VHDL code or changing the
constraints (timing, area and so on) in the synthesis. This new design approach and
methodology has improved the design process by shortening the design time, reducing
the number of design iterations, and increasing the design complexity that designers
can manage.

Fig. 3.2 VHDL based design approach (blocks: VHDL; Synthesis; Functional
simulation; Timing verification; Layout).

3.1 VHDL advantages:

VHDL offers the following advantages for digital design.
(I) Standard: VHDL is an IEEE standard. It reduces confusion and
makes interfaces between tools, companies, and products easier. Any
development to the standard would have better chances of lasting
longer and have less chance of becoming obsolete due to
incompatibility with others.
(II) Government support: VHDL is the result of the VHSIC program; hence, it is
clear that the US government supports the VHDL standard for
electronics procurement. The Department of Defense (DOD) requires
contractors to supply VHDL for all ASIC designs.
(III) Industry support: With the advent of more powerful and efficient
VHDL tools has come the growing support of the electronic industry.
Companies use VHDL tools not only with regard to defense contracts,
but also for their commercial designs.

(IV) Portability: The same VHDL code can be simulated and used in
many design tools and at different stages of the design process. This
reduces dependency on a set of design tools whose limited capability
may not be competitive in later markets. Design data in standard VHDL
is also much easier to transfer than the design database of a
proprietary design tool.
(V) Modeling capability: VHDL was developed to model all levels of
design, from electronic boxes to transistors. VHDL can accommodate
behavioral constructs and mathematical routines that describe complex
models, such as queuing networks and analog circuits. It allows
multiple architectures to be associated with the same design during
various stages of the design process. As shown in Fig. 3.3, VHDL can
describe anything from low-level transistors up to very large systems.

[Figure: design complexity axis running from Transistor through Gate and Block to ASIC.]

Fig. 3.3 VHDL modeling capability.

(VI) Reusability: Certain common designs can be described, verified and
modified slightly in VHDL for future use. This eliminates reading and
marking changes to schematic pages, which is time consuming and
subject to error. For example, parameterized multiplier VHDL code
can be reused easily by changing the width parameter so that the same
VHDL code can do either 16 by 16 or 12 by 12 multiplication.
(VII) Technology and foundry independence: The functionality and
behavior of the design can be described with VHDL and verified,
making it foundry and technology independent. This frees the designer
to proceed without having to wait for the foundry and technology to be
selected.
(VIII) Documentation: VHDL is a design description language, which
allows documentation to be located in a single place by embedding it
in the code. The combining of comments and the code that actually
dictates what the design should do reduces the ambiguity between
specification and implementation.
(IX) New design methodology: Using VHDL with synthesis creates a new
methodology that increases design productivity, shortens the design
cycle, and lowers cost. It amounts to a revolution comparable to that
introduced by the automatic semi-custom layout synthesis tools of the
last few years.

3.2 VHDL design flow

In the beginning, a digital system is described in VHDL at the behavioral
level. The behavioral level description is then simulated extensively. After that, the
system is described in VHDL at the more complex RTL level and simulated again
before going down to the most complex gate level. There is no need to enter
the schematics at the gate level because they can be generated by a
synthesis tool from the RTL level description of the design. The high-level design
description is technology independent, which is very attractive because the same
design description can be ported onto any technology. It is possible to end up
with a better design through this approach because decisions regarding the
architecture are taken at the less complex behavioral or RTL level, where
different design trade-offs can be analyzed more quickly because simulation
is much more efficient at higher levels of design abstraction than at the gate level.
The VHDL design process flowchart is shown in Fig. 3.4.

[Figure: flowchart with the blocks DESIGN IDEA, BEHAVIORAL DESCRIPTION IN VHDL, SIMULATION, RTL DESCRIPTION IN VHDL, MODIFY RTL DESCRIPTION, DESIGN NOT FEASIBLE, and LAYOUT & MASKS AND FURTHER SPICE SIMULATION.]

Fig. 3.4 VHDL design flow.

3.3 Logic synthesis

Synthesis, in the domain of digital design, is the process of translation and
optimization. For example, layout synthesis is the process of taking a design netlist
and translating it into a form of data that facilitates placement and routing, resulting in
optimized timing and/or chip size. Logic synthesis, on the other hand, is the process
of taking a form of input (VHDL code), translating it into another form (Boolean equations
and synthesis tool specific structures), and then optimizing it in terms of propagation
delay and/or area.

CHAPTER-4
DESIGN OF FFT PROCESSOR

4.1 Architecture of FFT Processor

The architecture of the FFT processor is designed to achieve a high degree
of parallelism and thus high speed. In most general-purpose computers, a single
hardware multiplier is available. In the pipeline FFT there can be as many as ten
separate "butterfly boxes" (for a 1024-point radix-2 FFT), which correspond to 40 real
multipliers (since each butterfly contains a complex multiplier that contains 4 real
multipliers). The pipeline FFT structure is 2 to 20 times more efficient than any
general-purpose computer structure. Because of its high efficiency and also because
of a relatively simple control mechanism, the pipeline FFT appears at present to be
the most important special FFT processor for very high-speed applications.
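
The resource count quoted above follows from one butterfly per radix-2 stage and four real multipliers per butterfly. A minimal Python sketch of that arithmetic is given here; the function name `pipeline_resources` is an illustrative assumption, not part of the design.

```python
def pipeline_resources(n_points: int) -> tuple[int, int]:
    """Return (butterflies, real multipliers) for an N-point radix-2 pipeline FFT."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    stages = n_points.bit_length() - 1   # log2(N) stages, one butterfly each
    return stages, 4 * stages            # four real multipliers per complex multiplier

print(pipeline_resources(1024))  # (10, 40)
print(pipeline_resources(8))     # (3, 12)
```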

4.2 Radix-2 Pipeline FFT

The flow diagram for the in-place 8-point FFT with normally ordered inputs and
bit-reversed outputs is shown in Fig. 4.1.

[Figure: signal flow graph with inputs in natural order and outputs in bit-reversed order:

x(0) -> X(0)
x(1) -> X(4)
x(2) -> X(2)
x(3) -> X(6)
x(4) -> X(1)
x(5) -> X(5)
x(6) -> X(3)
x(7) -> X(7)]

Fig. 4.1 In-place 8-point FFT with normally ordered inputs and bit-reversed outputs.
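
The output ordering of Fig. 4.1 can be reproduced by reversing the 3-bit binary index of each position. The following Python sketch is only an illustration; the helper name `bit_reverse` is an assumption.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the current LSB to the other end
        index >>= 1
    return result

order = [bit_reverse(n, 3) for n in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```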

Let us assume that the signal samples appear at the input sequentially, x(0),
x(1), etc. Fig. 4.2 shows a very simple arrangement for performing the first stage
of an FFT corresponding to the flow diagram of Fig. 4.1. The first four samples x(0)
through x(3) are switched into the four-stage delay element. The next four samples are
switched to the other input line of the system. Assuming that the butterfly
computation time is exactly equal to the sampling interval, the entire first stage of the
FFT is performed in the four sample intervals following the switching.
The results of the first stage, labeled x'(n), appear in parallel pairs at the butterfly
output.

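[Figure: the input x(n) is switched into a 4-stage delay element for n = 0, 1, 2, 3 and directly to the other butterfly input for n = 4, 5, 6, 7; a coefficient memory feeds the first butterfly.]

Fig. 4.2 First FFT pipeline stage.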

Since the coefficient W changes from sample to sample, the coefficient
memory must deliver its information to the butterfly at the same rate (the
sampling rate) as the signal. It is evident from Fig. 4.1 that the structural form of
stage 1 is repeated twice in stage 2. Thus, an arrangement has to be devised that will
process x'(n) {n = 0, 1, ..., 3} and x'(n) {n = 4, 5, ..., 7} in a manner similar to the way
x(n) {n = 0, 1, ..., 7} was processed. This contrivance is shown in Fig. 4.3. By using
appropriate delays and switches, the partly processed samples are lined up in exactly
the way specified by Fig. 4.1. Thus, the "spacing" (difference between the samples in
time) is four time units for the first butterfly and two time units for the second. A
complete 8-point pipeline FFT is shown in Fig. 4.4. The symmetry in the structure can
be exploited to construct pipeline FFTs with larger N through extrapolation.
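
The halving of the spacing from stage to stage can be checked with a small software model. The Python sketch below is an unscaled floating-point radix-2 DIF FFT, not a model of the delay and switch hardware; the function names are illustrative assumptions. It pairs samples N/2 apart in the first stage, N/4 apart in the second, and so on, and leaves the result in bit-reversed order.

```python
import cmath

def bit_reverse(index: int, bits: int) -> int:
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def dif_fft(x):
    """Radix-2 decimation-in-frequency FFT; output is in bit-reversed order."""
    n = len(x)
    data = [complex(v) for v in x]
    spacing = n // 2                      # 4, 2, 1 for an 8-point transform
    while spacing >= 1:
        for start in range(0, n, 2 * spacing):
            for k in range(spacing):
                a = data[start + k]
                b = data[start + k + spacing]
                w = cmath.exp(-2j * cmath.pi * k / (2 * spacing))  # weight factor
                data[start + k] = a + b
                data[start + k + spacing] = (a - b) * w
        spacing //= 2
    return data

def dft(x):
    """Direct DFT used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]
```

Position i of the `dif_fft` result holds the frequency sample with index `bit_reverse(i, 3)` for an 8-point input.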

[Figure: 4-stage delay element, first butterfly, switch SW and 2-stage delay elements feeding the second butterfly. Sample indices on the labelled lines:

A  0 1 2 3
B  4 5 6 7
C  0 1 2 3
D  4 5 6 7
G  0 1 4 5
H  2 3 6 7

SW straight through: C->E, D->F; otherwise crisscross.]

Fig. 4.3 First and second stage of 8-point pipeline FFT, radix 2, DIF.

[Figure: cascade of a 4-stage delay element, first butterfly, switch SW1, 2-stage delay elements, second butterfly, switch SW2, 1-stage delay elements and third butterfly, with the sample indices marked on each labelled line. SW1 straight through: D->F, E->G; SW1 crisscross: D->G, E->F. SW2 likewise alternates between straight through and crisscross.]

Fig. 4.4 Complete 8-point, radix 2, pipeline FFT, DIF.

The following points are important with reference to Fig. 4.4.
1. The delay in a given stage is half the delay in the preceding stage.
2. The arithmetic elements are busy only half the time.
3. Each switch switches at double the rate of its predecessor.
4. The basic clocking interval of the whole system is naturally equal to the
sampling interval.
5. The output is bit reversed as a function of real time.
To prove statement 5, note that the indices in Fig. 4.4 are in
exact correspondence with the indices in Fig. 4.1. Since in Fig. 4.1 the resultant
output is bit reversed, so is the output of Fig. 4.4. Fig. 4.4 is a specific implementation
of Fig. 4.1, and thus possesses all its properties in addition to timing properties. The
pipeline FFT structure has a two-port output so that two frequency samples at a time
are available. The important point is that the indices shown on the last two lines of
Fig. 4.4 are in actuality the bit-reversed indices of the output frequency samples.
Statement 2 is a rather tricky point: the on time of the
butterfly really depends on how the input is interfaced with the processor. For
example, in Fig. 4.5 it is required that contiguous data blocks be processed in real
time.

[Figure: real-time input of contiguous N-sample data blocks and the alternating ON/OFF periods of the 1st, 2nd, 3rd and 4th butterflies, with offsets N/2, 3N/4, 7N/8 and 15N/16 marked for the 1st to 4th butterflies respectively.]

Fig. 4.5 On-off times for arithmetic elements processing contiguous blocks of data.

It is clear from Fig. 4.2 through 4.4 that processing cannot begin until half of the
data block has entered the processor. The first stage is completed in the next (N/2)
cycles. At this moment, the first butterfly is turned off until the initial (N/2) values of
the next data block have been gathered into the 4-stage delay element. The other butterflies
follow the same pattern with a delay. Therefore, the overall system efficiency is
50% since every butterfly is on exactly half the time. The timing diagram for the 8-point
FFT is shown in Fig. 4.6.
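
The offsets marked in Fig. 4.5 (N/2, 3N/4, 7N/8, 15N/16) and the 50% duty cycle can be tabulated with a short Python sketch. It assumes, as an idealization, that each stage adds only its own delay-element latency; the function names are hypothetical.

```python
def butterfly_start(stage: int, n_points: int) -> int:
    """Sample index at which the butterfly of a 1-based stage first turns on."""
    return n_points - n_points // (2 ** stage)   # N/2, 3N/4, 7N/8, ...

def duty_cycle(n_points: int) -> float:
    """Each butterfly performs N/2 operations per N-sample block."""
    return (n_points // 2) / n_points

print([butterfly_start(s, 16) for s in range(1, 5)])  # [8, 12, 14, 15]
print(duty_cycle(16))                                  # 0.5
```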

[Figure: timing waveforms for CLK, RESET, the sampling pulse, the input samples, COUNT, the butterfly operations and the switch operation (criss-cross or straight through).]

Fig. 4.6 Timing diagram for 8-point FFT.

4.3 VHDL Coding and synthesis of radix-2 Pipeline FFT

The radix-2 pipeline FFT consists of butterfly circuits, which compute the
butterfly operation; shift registers, which provide the appropriate delays to the
data; switches, which route the data in the desired fashion; a weight
factor generator circuit, which provides weight factors to the butterflies; and a counter,
which controls the operation of the switches and the weight factor generator circuit. In Fig. 4.7 the
radix-2 pipeline FFT (Plff) is shown with its various components, namely shift
registers (Sh4_1_16 (one block), Sh2_2_16 (two blocks), Sh1_2_16 (two blocks)), two
switches (Sw4_1, Sw4_2), three butterfly blocks (Bfl), one counter (Cn), and one
weight factor generator block (Wfg), all connected in a particular fashion.

[Figure: top-level schematic with the blocks Sh4_1_16, Sw4_1, Sh2_2_16, Sw4_2 and Sh1_2_16 in cascade.]

Fig. 4.7 Complete 8-point, radix 2, pipeline FFT, DIF.

The radix-2 butterfly (Bfl) consists of a divide-by-two and adder/subtracter module
(Adsb16), four two's complement multiplier modules (Tm16), and a correction and
rounding-off module (CR16). The hierarchical order of modules is shown in Fig. 4.8.

Plff
  Sh4_1_16
  Sh2_2_16
  Sh1_2_16
  Sw4_1
  Sw4_2
  Cn
  Wfg
  Bfl
    Adsb16
    Tm16
    CR16

Fig. 4.8 Hierarchical order of modules.

4.3.1 Architecture of butterfly

The signal flow graph of butterfly is shown in Fig. 4.9. As shown in Fig. 4.9
the butterfly circuit requires two adders, two subtracters and one complex multiplier.

[Figure: butterfly with inputs A = A_R + jA_I, B = B_R + jB_I and weight W = W_R + jW_I; outputs (A_R + B_R) + j(A_I + B_I) and {(A_R - B_R) + j(A_I - B_I)} x (W_R + jW_I).]

Fig. 4.9 Signal flow graph of butterfly

The expression $\{(A_R - B_R) + j(A_I - B_I)\} \times (W_R + jW_I)$ can be expanded as follows.
\[
\{(A_R - B_R) + j(A_I - B_I)\} \times (W_R + jW_I)
= [(A_R - B_R)W_R - (A_I - B_I)W_I] + j[(A_R - B_R)W_I + (A_I - B_I)W_R] \tag{4.1}
\]

The expression shows that four real multipliers are required for complex number
multiplication. The block diagram of butterfly is shown in Fig. 4.10. It has three
blocks namely ADSB16, TM16, CR16. Here overflow is prevented by having
|x(n)| < 1, i.e. the input sequence is normalized to a fraction, and by incorporating an
attenuation of 1/2 at the input of each stage (right shifting the input at every stage).
The prevention of overflow, and addition and subtraction are performed in block
ADSB16. TM16 is a two's complement multiplier, which performs multiplication
with weight factors. CR16 is correction and rounding-off circuit.
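
The butterfly arithmetic described above can be sketched in Python as follows. This is a floating-point illustration only (it ignores the 16-bit word length and the rounding done in CR16); the function name `butterfly` is an assumption.

```python
def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Scaled radix-2 DIF butterfly using four real multiplications."""
    # Attenuate both inputs by 1/2 to prevent overflow, as described in the text.
    ar, ai = a.real / 2, a.imag / 2
    br, bi = b.real / 2, b.imag / 2
    sum_out = complex(ar + br, ai + bi)
    dr, di = ar - br, ai - bi
    # Eq. 4.1: real and imaginary parts of the product with the weight factor.
    prod_re = dr * w.real - di * w.imag
    prod_im = dr * w.imag + di * w.real
    return sum_out, complex(prod_re, prod_im)
```

The second output equals ((a - b) / 2) * w, which is what a single complex multiplication would give.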

[Figure: ADSB16 receives the 16-bit inputs and produces the LSB outputs ARLZ, AILZ, BRLZ, BILZ together with the sum and difference buses; four TM16 multipliers with 31-bit outputs PROD[30:0] feed CR16, which also receives RSLCOR[15:0] and ISLCOR[15:0].]

Fig. 4.10 Block diagram of butterfly circuit.

(a) Add-Subtract module (ADSB16)

This is a combination of adders and subtractors along with an overflow
prevention circuit. The block diagram of ADSB16 is shown in Fig. 4.11. The behavior of
the module is described in the VHDL code given below.

[Figure: AR, AI, BR and BI each pass through a divide-by-two module; the halved values feed an adder for the real part (ABSR), an adder for the imaginary part (ABSI), a subtractor for the real part (ABDR) and a subtractor for the imaginary part (ABDI).]

Fig. 4.11 Block diagram of ADSB module.

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity ADSB16 is
  Generic (SIZE : INTEGER := 16);
  Port (AR, AI, BR, BI : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        ARL, AIL, BRL, BIL : OUT STD_ULOGIC;
        ABSR, ABSI, ABDR, ABDI
          : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End ADSB16;
Architecture BEHAVE of ADSB16 is
  Signal ARI, AII, BRI, BII : STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
  -- Two's complement negation: invert all bits and add one
  FUNCTION TC(QQ : STD_ULOGIC_VECTOR) return STD_ULOGIC_VECTOR is
    Constant INC : STD_ULOGIC_VECTOR(SIZE - 1 downto 0)
      := INT_TO_STD_ULOGIC_VECTOR(1, SIZE);
  Begin
    return (NOT(QQ) + INC);
  End TC;
Begin
  -- Divide by two: arithmetic right shift with the sign bit replicated;
  -- the shifted-out LSB is made available on ARL, AIL, BRL, BIL
  ARI(SIZE - 2 downto 0) <= AR(SIZE - 1 downto 1);
  ARI(SIZE - 1) <= AR(SIZE - 1);
  ARL <= AR(0);
  AII(SIZE - 2 downto 0) <= AI(SIZE - 1 downto 1);
  AII(SIZE - 1) <= AI(SIZE - 1);
  AIL <= AI(0);
  BRI(SIZE - 2 downto 0) <= BR(SIZE - 1 downto 1);
  BRI(SIZE - 1) <= BR(SIZE - 1);
  BRL <= BR(0);
  BII(SIZE - 2 downto 0) <= BI(SIZE - 1 downto 1);
  BII(SIZE - 1) <= BI(SIZE - 1);
  BIL <= BI(0);
  -- Sums and differences of the halved inputs
  ABSR <= ARI + BRI;
  ABSI <= AII + BII;
  ABDR <= ARI + TC(BRI);
  ABDI <= AII + TC(BII);
End BEHAVE;
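
As a cross-check of the listing above, the same operations can be modelled on 16-bit words in Python. This is a sketch of a reference model, assuming 16-bit two's complement bit patterns passed as non-negative integers; the function names are illustrative and the LSB outputs are returned alongside the four result buses.

```python
SIZE = 16
MASK = (1 << SIZE) - 1

def to_signed(word: int) -> int:
    """Interpret a SIZE-bit pattern as a two's complement integer."""
    word &= MASK
    return word - (1 << SIZE) if word & (1 << (SIZE - 1)) else word

def halve(word: int) -> tuple[int, int]:
    """Arithmetic right shift by one; returns (shifted word, dropped LSB)."""
    return (to_signed(word) >> 1) & MASK, word & 1

def negate(word: int) -> int:
    """Two's complement negation: NOT(word) + 1, truncated to SIZE bits."""
    return ((~word) + 1) & MASK

def adsb16(ar: int, ai: int, br: int, bi: int) -> dict:
    (ari, arl), (aii, ail), (bri, brl), (bii, bil) = map(halve, (ar, ai, br, bi))
    return {
        "ARL": arl, "AIL": ail, "BRL": brl, "BIL": bil,
        "ABSR": (ari + bri) & MASK,
        "ABSI": (aii + bii) & MASK,
        "ABDR": (ari + negate(bri)) & MASK,
        "ABDI": (aii + negate(bii)) & MASK,
    }
```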

(b) Two's complement multiplier (TM16)

This module multiplies two numbers in two's complement format. The multiplier
used is an array multiplier and thus it is a very fast multiplier. Four real multipliers are
required for complex number multiplication.

Architecture of two's complement multiplier

The two's complement multiplier is shown in Fig. 4.12.

[Figure: array of B and BI cells with column inputs INY(3) to INY(0), row inputs INX(0) to INX(3) and outputs PROD(6) to PROD(0).]

Fig. 4.12 Four bit two's complement multiplier.

It has two cells, namely 'B' and 'BI', as shown in Fig. 4.13. The functions of
cells B and BI are defined in the truth tables in Fig. 4.14.

Cell B
It has two inputs, namely 'a' and 'b', two control inputs, namely 'x_i' and 'x_(i-1)',
and two outputs, namely 'z' and 'bo'.

[Figure: symbols of cells B and BI with their outputs bo and z.]

Fig. 4.13 Block diagram of cells B and BI.

Cell BI
It has two inputs, namely 'a' and 'b', two control inputs, namely 'x_i' and 'x_(i-1)',
and one output, namely 'z'. The truth table for BI is the same as for B except that there
is no 'bo' output.

TRUTH TABLE FOR B

x_i  x_(i-1)  z               bo
0    0        a               b
0    1        a + b           b
1    0        a + NOT(b) + 1  b
1    1        a               b

TRUTH TABLE FOR BI

x_i  x_(i-1)  z
0    0        a
0    1        a + b
1    0        a + NOT(b) + 1
1    1        a

Fig. 4.14 Truth tables for cells B and BI.
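
The truth tables above (pass, add b, or subtract b depending on x_i and x_(i-1)) correspond to the recoding used in radix-2 Booth multiplication. The Python sketch below applies that rule to signed integers; it shifts the multiplicand left instead of shifting the partial sum right as the array does, and the function name is an assumption.

```python
def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply signed integers x (multiplier, `bits` wide) and y by Booth recoding."""
    acc = 0
    prev = 0                       # x_(-1) is taken as 0 for the first row
    for i in range(bits):
        cur = (x >> i) & 1         # bit i of the two's complement multiplier
        if (cur, prev) == (0, 1):
            acc += y << i          # row "0 1": add the multiplicand
        elif (cur, prev) == (1, 0):
            acc -= y << i          # row "1 0": a + NOT(b) + 1, i.e. subtract
        prev = cur                 # rows "0 0" and "1 1": pass through
    return acc

print(booth_multiply(-3, 5, 4))    # -15
```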

The block diagram for two's complement multiplier as shown in Fig. 4.12 can
be simplified and is shown in Fig. 4.15.
[Figure: simplified array built from BLEX, BLI and BLII cells with inputs INY(3:0) and INX(0) to INX(3), and outputs PROD(0) to PROD(3) and PROD(6:4).]

Fig. 4.15 Four bit two's complement multiplier.

It has three cells namely 'BLEX', 'BLI' and 'BLII' for which symbols and
their truth table are shown in Fig. 4.16 and Fig. 4.17 respectively.

[Figure: cell BLEX with input K(SZ : 0) and output P(SZx2 : 0); cell BLI with inputs A(S : 0), B(S : 0) and outputs BO(S - 1 : 0), Z(S - 1 : 0), zl; cell BLII with inputs A(S : 0), B(S : 0) and outputs Z(S - 1 : 0), zl. Here SZ and S are MSB indices.]

Fig. 4.16 Block diagram of cells BLEX, BLI and BLII.

TRUTH TABLE FOR CELL BLEX

P(SZx2 : SZ + 1) = K(SZ) & K(SZ) & ... & K(SZ), (SZx2 - SZ) times
P(SZ : 0) = K(SZ : 0)

TRUTH TABLE FOR CELL BLI

x_i  x_(i-1)  Z                        zl                   BO
0    0        A(S : 1)                 A(0)                 B(S - 1 : 0)
0    1        (A + B)(S : 1)           (A + B)(0)           B(S - 1 : 0)
1    0        (A + NOT(B) + 1)(S : 1)  (A + NOT(B) + 1)(0)  B(S - 1 : 0)
1    1        A(S : 1)                 A(0)                 B(S - 1 : 0)

TRUTH TABLE FOR CELL BLII

x_i  x_(i-1)  Z                        zl
0    0        A(S : 1)                 A(0)
0    1        (A + B)(S : 1)           (A + B)(0)
1    0        (A + NOT(B) + 1)(S : 1)  (A + NOT(B) + 1)(0)
1    1        A(S : 1)                 A(0)

Fig. 4.17 Truth tables for cells BLEX, BLI and BLII

The two's complement multiplier shown in Fig. 4.15 can be further simplified, as
shown in Fig. 4.19. It has four cells, namely 'BLKO_I', 'BLKO_II', 'BLKI' and
'BLKII', whose symbols and truth tables are shown in Fig. 4.18 and Fig. 4.20
respectively.

[Figure: cell BLKO_I with input p(SZ : 0) and output j(SZx2 : 0); cell BLKO_II with input k(S : 0), control input l and outputs n(S - 1 : 0), m; cell BLKI with inputs a(S : 0), b(S : 0), btc(S : 0), xi_xi_1 and outputs zh(S - 1 : 0), zl, bo(S - 1 : 0), botc(S - 1 : 0); cell BLKII with inputs a(S : 0), b(S : 0), btc(S : 0), xi_xi_1 and outputs zh(S - 1 : 0), zl. Here SZ and S are MSB indices.]

Fig. 4.18 Block diagram of cells BLKO_I, BLKO_II, BLKI and BLKII

[Figure: multiplier built from the BLKO, BLKI and BLKII cells with inputs INY(3 : 0), INX(0), INX(1 : 0), INX(2 : 1) and INX(3 : 2), and outputs PROD(0) to PROD(3) and PROD(6 : 4).]

Fig. 4.19 Four bit two's complement multiplier.
TRUTH TABLE FOR CELL BLKO_I

j(SZx2 : SZ + 1) = p(SZ) & p(SZ) & ... & p(SZ), (SZx2 - SZ) times
j(SZ : 0) = p(SZ : 0)

TRUTH TABLE FOR CELL BLKO_II

l  n                m
0  00...0 (S 0's)   0
1  k(S : 1)         k(0)

TRUTH TABLE FOR CELL BLKI

xi  xi_1  zh                zl            bo            botc
0   0     a(S : 1)          a(0)          b(S - 1 : 0)  btc(S - 1 : 0)
0   1     (a + b)(S : 1)    (a + b)(0)    b(S - 1 : 0)  btc(S - 1 : 0)
1   0     (a + btc)(S : 1)  (a + btc)(0)  b(S - 1 : 0)  btc(S - 1 : 0)
1   1     a(S : 1)          a(0)          b(S - 1 : 0)  btc(S - 1 : 0)

Fig. 4.20 Truth tables for cells BLKO_I, BLKO_II and BLKI

Truth table for cell BLKII is not shown in Fig. 4.20 but it is same as cell
BLKI except it has only two outputs namely 'zh' and 'zl'. Now VHDL code can be
written and is shown below.

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.std_logic_1164.all;
entity Tm16 is
  Generic (SIZE : INTEGER := 16);
  Port (INX, INY : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        PROD : OUT STD_ULOGIC_VECTOR(SIZE * 2 - 2 downto 0));
end Tm16;
architecture behave of Tm16 is
  -- Row i of the array occupies (SIZE * 2 - 2 - i) bits of ai, bi and btci.
  -- The upper bound printed in the source is garbled; the bound below is
  -- reconstructed from the slices used in the generate statements.
  signal ai, bi, btci :
    STD_ULOGIC_VECTOR((SIZE - 1) * (SIZE * 2 - 2)
                      - (SIZE - 1) * (SIZE - 2) / 2 - 1 downto 0);
  -- First row: sign-extends Y and selects 0 or -Y according to X
  procedure BLKO (signal X : IN STD_ULOGIC;
                  signal Y : IN STD_ULOGIC_VECTOR;
                  signal zl : OUT STD_ULOGIC;
                  signal zh : OUT STD_ULOGIC_VECTOR;
                  signal bo, botc : OUT STD_ULOGIC_VECTOR) is
    Constant INC : STD_ULOGIC_VECTOR(Y'High * 2 - Y'Low downto Y'Low)
      := INT_TO_STD_ULOGIC_VECTOR(1, Y'Length * 2 - 1);
    Constant G : STD_ULOGIC_VECTOR(Y'High * 2 - Y'Low downto Y'Low)
      := INT_TO_STD_ULOGIC_VECTOR(0, Y'Length * 2 - 1);
    Variable YX, YXTC : STD_ULOGIC_VECTOR(Y'High * 2 - Y'Low downto
      Y'Low);
    Variable zi : STD_ULOGIC_VECTOR(Y'High * 2 - Y'Low downto Y'Low);
  begin
    YX(Y'High downto Y'Low) := Y;
    LO1 : For i in 1 to Y'Length - 1 loop
      YX(Y'High + i) := Y(Y'High);      -- sign extension
      YXTC := ((NOT YX) + INC);         -- two's complement of extended Y
    end loop LO1;
    Case X is
      WHEN '0' => zi := G;
      WHEN others => zi := YXTC;
    End Case;
    zl <= zi(Y'Low);
    zh <= zi(Y'High * 2 - Y'Low downto Y'Low + 1);
    bo <= YX(Y'High * 2 - Y'Low - 1 downto Y'Low);
    botc <= YXTC(Y'High * 2 - Y'Low - 1 downto Y'Low);
    return;
  End BLKO;
  -- Intermediate rows: add b, add btc (= -b) or pass a, then shift
  procedure BLKI (signal a, b, btc : IN STD_ULOGIC_VECTOR;
                  signal xi_xi_1 : IN STD_ULOGIC_VECTOR(1 downto 0);
                  signal zl : OUT STD_ULOGIC;
                  signal zh : OUT STD_ULOGIC_VECTOR;
                  signal bo, botc : OUT STD_ULOGIC_VECTOR) is
    Constant G : STD_ULOGIC_VECTOR(b'High downto b'Low)
      := INT_TO_STD_ULOGIC_VECTOR(0, b'Length);
    Variable zix, zi : STD_ULOGIC_VECTOR(b'High downto b'Low);
  begin
    Case xi_xi_1 is
      WHEN "01" => zix := b;
      WHEN "10" => zix := btc;
      WHEN Others => zix := G;
    End Case;
    zi := a + zix;
    zl <= zi(b'Low);
    zh <= zi(b'High downto b'Low + 1);
    bo <= b(b'High - 1 downto b'Low);
    botc <= btc(btc'High - 1 downto btc'Low);
    return;
  End BLKI;
  -- Last row: same as BLKI but without the bo and botc outputs
  procedure BLKII (signal a, b, btc : IN STD_ULOGIC_VECTOR;
                   signal xi_xi_1 : IN STD_ULOGIC_VECTOR(1 downto 0);
                   signal zl : OUT STD_ULOGIC;
                   signal zh : OUT STD_ULOGIC_VECTOR) is
    Constant G : STD_ULOGIC_VECTOR(b'High downto b'Low)
      := INT_TO_STD_ULOGIC_VECTOR(0, b'Length);
    Variable zix, zi : STD_ULOGIC_VECTOR(b'High downto b'Low);
  begin
    Case xi_xi_1 is
      WHEN "01" => zix := b;
      WHEN "10" => zix := btc;
      WHEN Others => zix := G;
    End Case;
    zi := a + zix;
    zl <= zi(b'Low);
    zh <= zi(b'High downto b'Low + 1);
    return;
  End BLKII;
begin
  GO : For i in 0 to SIZE - 1 generate
    GI : if (i = 0) generate
      BLKO(X => INX(i), Y => INY, zl => PROD(i),
        zh => ai((i + 1) * (SIZE * 2 - 2) - (i * (i + 1) / 2) - 1
          downto i * (SIZE * 2 - 2) - i * (i - 1) / 2),
        bo => bi((i + 1) * (SIZE * 2 - 2) - (i * (i + 1) / 2) - 1
          downto i * (SIZE * 2 - 2) - i * (i - 1) / 2),
        botc => btci((i + 1) * (SIZE * 2 - 2) - (i * (i + 1) / 2) - 1
          downto i * (SIZE * 2 - 2) - i * (i - 1) / 2));
    end generate;
    GII : if (i > 0) AND (i < SIZE - 1) generate
      BLKI(a => ai(i * (SIZE * 2 - 2) - i * (i - 1) / 2 - 1
          downto (i - 1) * (SIZE * 2 - 2) - (i - 1) * (i - 2) / 2),
        b => bi(i * (SIZE * 2 - 2) - i * (i - 1) / 2 - 1
          downto (i - 1) * (SIZE * 2 - 2) - (i - 1) * (i - 2) / 2),
        btc => btci(i * (SIZE * 2 - 2) - i * (i - 1) / 2 - 1
          downto (i - 1) * (SIZE * 2 - 2) - (i - 1) * (i - 2) / 2),
        xi_xi_1(0) => INX(i - 1), xi_xi_1(1) => INX(i), zl => PROD(i),
        zh => ai((i + 1) * (SIZE * 2 - 2) - i * (i + 1) / 2 - 1
          downto i * (SIZE * 2 - 2) - i * (i - 1) / 2),
        bo => bi((i + 1) * (SIZE * 2 - 2) - i * (i + 1) / 2 - 1
          downto i * (SIZE * 2 - 2) - i * (i - 1) / 2),
        botc => btci((i + 1) * (SIZE * 2 - 2) - i * (i + 1) / 2 - 1
          downto i * (SIZE * 2 - 2) - i * (i - 1) / 2));
    end generate;
    GIII : if (i = SIZE - 1) generate
      BLKII(a => ai(i * (SIZE * 2 - 2) - i * (i - 1) / 2 - 1
          downto (i - 1) * (SIZE * 2 - 2) - (i - 1) * (i - 2) / 2),
        b => bi(i * (SIZE * 2 - 2) - i * (i - 1) / 2 - 1
          downto (i - 1) * (SIZE * 2 - 2) - (i - 1) * (i - 2) / 2),
        btc => btci(i * (SIZE * 2 - 2) - i * (i - 1) / 2 - 1
          downto (i - 1) * (SIZE * 2 - 2) - (i - 1) * (i - 2) / 2),
        xi_xi_1(0) => INX(i - 1), xi_xi_1(1) => INX(i),
        zl => PROD(i), zh => PROD(SIZE * 2 - 2 downto SIZE));
    end generate;
  end generate;
end behave;

(c) Correction and rounding-off circuit (CR16)

The weight factor 1.0 cannot be represented in two's complement (fraction)
format and one of the weight factors (W_N^0) happens to be 1.0. This problem is
overcome by choosing a number out of the 2^16 possible combinations which is not used
as a weight factor. The weight factors for a 1024-point FFT are calculated and a
number other than these (1024/2) - 1 = 511 weight factors is chosen to represent the
unity weight factor, say 012C (hex). A number multiplied by 1.0 gives the number
itself, but when a number is multiplied by 012C (hex) the output of the multiplier is not
correct, so it has to be corrected. This correction is done in the correction module.
The output from the multiplier is 31 bits wide and has to be rounded to 16 bits. Suppose
a 5-bit number is available and it is to be rounded off to 3 bits; then the 2 LSBs are
compared with 10 (binary). If the 2 LSBs are greater than or equal to 10 (binary), then
001 (binary) is added to the 3 MSBs; otherwise 000 (binary) is added. After the addition
the number available is rounded off to 3 bits. The correction and rounding-off circuit is shown in
Fig. 4.21. Truth tables for MUX1 and MUX2 are shown in Fig. 4.22. The VHDL code
is shown below.

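[Figure: multiplexers select between the multiplier outputs D, E, F, G[30:0] and the inputs A, B[15:0] padded with fifteen zeros, under control of RSLCOR[15:0] and ISLCOR[15:0]; adders and rounding stages using the constant 0100000000000000 and the bits AZI(15), BZI(15) produce the outputs.]

Fig. 4.21 Correction and rounding-off circuit.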

Truth table for MUX1

INPUT        OUTPUT
T            U  V
012C (HEX)   P  Q
Others       R  S

Truth table for MUX2

INPUT    OUTPUT
L        K
0        I
Others   J

Fig. 4.22 Truth table for MUX1 and MUX2

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity CR16 is
  Generic (SIZE : INTEGER := 16);
  Port (RSLCOR, ISLCOR, A, B : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        D, E, F, G : IN STD_ULOGIC_VECTOR(SIZE * 2 - 2 downto 0);
        H, I : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End CR16;
Architecture BEHAVE of CR16 is
  Signal COI, COII, COIII, COIV, ICOIV, IICOIII
    : STD_ULOGIC_VECTOR(SIZE * 2 - 2 downto 0);
  Signal AZI, BZI : STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
Begin
  -- Correction: when a weight factor equals the unity code 012C (hex),
  -- replace the product by the operand itself, aligned to the product format
  PO : Process (RSLCOR, ISLCOR, A, B, D, E, F, G)
    Constant SIZE : INTEGER := 16;
    Variable RSL, ISL : STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
  Begin
    RSL := RSLCOR; ISL := ISLCOR;
    Case RSL is
      WHEN "0000000100101100" => COI(SIZE * 2 - 2 downto SIZE - 1)
          <= A(SIZE - 1 downto 0);
        COI(SIZE - 2 downto 0) <= "000000000000000";
        COIII(SIZE * 2 - 2 downto SIZE - 1)
          <= B(SIZE - 1 downto 0);
        COIII(SIZE - 2 downto 0) <= "000000000000000";
      WHEN Others => COI <= D; COIII <= F;
    End Case;
    Case ISL is
      WHEN "0000000100101100" => COII(SIZE * 2 - 2 downto SIZE - 1)
          <= A(SIZE - 1 downto 0);
        COII(SIZE - 2 downto 0) <= "000000000000000";
        COIV(SIZE * 2 - 2 downto SIZE - 1)
          <= B(SIZE - 1 downto 0);
        COIV(SIZE - 2 downto 0) <= "000000000000000";
      WHEN Others => COII <= E; COIV <= G;
    End Case;
  End Process PO;
  ICOIV <= (COI + COIV);
  IICOIII <= (COII + COIII);
  -- Rounding: AZI(15) is set when the 15 discarded LSBs are at least one half
  P2 : Process (ICOIV, AZI)
    Constant SIZE : INTEGER := 16;
  Begin
    AZI <= ('0' & ICOIV(SIZE * 2 - 2 - 16 downto 0))
      + "0100000000000000";
    Case AZI(SIZE * 2 - 2 - 15) is
      WHEN '0' => H <= ICOIV(SIZE * 2 - 2 downto SIZE - 1);
      WHEN others => H <= ICOIV(SIZE * 2 - 2 downto SIZE - 1)
        + "0000000000000001";
    End Case;
  End Process P2;
  P3 : Process (IICOIII, BZI)
    Constant SIZE : INTEGER := 16;
  Begin
    BZI <= ('0' & IICOIII(SIZE * 2 - 2 - 16 downto 0))
      + "0100000000000000";
    Case BZI(SIZE * 2 - 2 - 15) is
      WHEN '0' => I <= IICOIII(SIZE * 2 - 2 downto SIZE - 1);
      WHEN others => I <= IICOIII(SIZE * 2 - 2 downto SIZE - 1)
        + "0000000000000001";
    End Case;
  End Process P3;
End BEHAVE;
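
The rounding rule described for CR16 (compare the discarded LSBs with one half and add one to the retained MSBs if they are greater or equal) can be illustrated with a small Python sketch on unsigned bit patterns. The function name and the wrap-around on overflow are assumptions made to mirror a fixed-width adder.

```python
def round_off(value: int, in_bits: int, out_bits: int) -> int:
    """Round an in_bits-wide bit pattern to its out_bits most significant bits."""
    drop = in_bits - out_bits
    msbs = value >> drop                       # retained bits
    lsbs = value & ((1 << drop) - 1)           # discarded bits
    half = 1 << (drop - 1)                     # 10 (binary) when two bits are dropped
    increment = 1 if lsbs >= half else 0
    return (msbs + increment) & ((1 << out_bits) - 1)

print(bin(round_off(0b10110, 5, 3)))  # 0b110
print(bin(round_off(0b10101, 5, 3)))  # 0b101
```

With `in_bits = 31` and `out_bits = 16` the threshold becomes 0100 0000 0000 0000 (binary) on the 15 discarded bits, the constant that appears in the listing above.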

(d) Overall code of butterfly circuit (BFL)

The overall code of the butterfly is shown below; it describes the
interconnection of the three modules, namely ADSB16, TM16 and CR16.

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity BFL is
  Generic (SIZE : INTEGER := 16);
  Port (INAR, INAI, INBR, INBI, INWR, INWI
          : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        ARLZ, AILZ, BRLZ, BILZ : OUT STD_ULOGIC;
        OSR, OSI, ODWR, ODWI
          : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End BFL;
Architecture BEHAVE of BFL is
  Component ADSB16 Port (AR, AI, BR, BI
      : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
    ARL, AIL, BRL, BIL : OUT STD_ULOGIC;
    ABSR, ABSI, ABDR, ABDI
      : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
  End Component;
  Component TM16 Port (INX, INY
      : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
    PROD
      : OUT STD_ULOGIC_VECTOR(SIZE * 2 - 2 downto 0));
  End Component;
  Component CR16 Port (RSLCOR, ISLCOR, A, B
      : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
    D, E, F, G
      : IN STD_ULOGIC_VECTOR(SIZE * 2 - 2 downto 0);
    H, I : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
  End Component;
  For all : ADSB16 USE ENTITY WORK.ADSB16;
  For all : TM16 USE ENTITY WORK.TM16;
  For all : CR16 USE ENTITY WORK.CR16;
  Signal ABDRX, ABDIX : STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
  Signal DX, EX, FX, GX : STD_ULOGIC_VECTOR(SIZE * 2 - 2 downto 0);
Begin
  -- Scaled sum and difference of the two inputs
  CP0 : ADSB16 Port Map (INAR, INAI, INBR, INBI, ARLZ, AILZ,
                         BRLZ, BILZ, OSR, OSI, ABDRX, ABDIX);
  -- Four real multiplications of the difference by the weight factor
  CP1 : TM16 Port Map (ABDRX, INWR, DX);
  CP2 : TM16 Port Map (ABDRX, INWI, EX);
  CP3 : TM16 Port Map (ABDIX, INWR, FX);
  CP4 : TM16 Port Map (ABDIX, INWI, GX);
  -- Unity weight correction, combination and rounding
  CP5 : CR16 Port Map (INWR, INWI, ABDRX, ABDIX, DX, EX, FX,
                       GX, ODWR, ODWI);
End BEHAVE;

4.3.2 Shift registers

As mentioned above, there are three shift register modules, namely Sh4_1_16,
Sh2_2_16 and Sh1_2_16. All shift registers are positive edge triggered.

(a) Shift register (Sh4_1_16)

This module is a collection of sixteen 4-bit shift registers. The inputs of the
shift registers are merged to form a 16-bit input bus, namely 'A', and the outputs of the
registers are merged to form a 16-bit output bus, namely 'C'. The VHDL code is
given below.

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity SH4_1_16 is
  Generic (SIZE : INTEGER := 16);
  Port (A : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        CLK : IN STD_ULOGIC;
        C : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End SH4_1_16;
Architecture BEHAVE of SH4_1_16 is
  Signal AI, AII, AIII : STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
Begin
  Process
  Begin
    -- Four registers in series: A -> AI -> AII -> AIII -> C
    WAIT UNTIL CLK'EVENT AND CLK='1';
    AI <= A;
    AII <= AI;
    AIII <= AII;
    C <= AIII;
  End Process;
End BEHAVE;
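
The behavior of such a delay element can be modelled in Python with a fixed-length queue. This is a sketch from the point of view of a downstream register sampling on the same clock: each call to `clock` represents one rising edge and returns the word that entered the given number of edges earlier. The class name `DelayLine` is an assumption.

```python
from collections import deque

class DelayLine:
    """Word-wide shift register with a fixed number of stages."""

    def __init__(self, stages: int, initial=None):
        self._regs = deque([initial] * stages, maxlen=stages)

    def clock(self, word):
        """Shift in `word` and return the word leaving the last stage."""
        oldest = self._regs[0]
        self._regs.append(word)    # maxlen drops the oldest entry automatically
        return oldest

delay4 = DelayLine(4)
print([delay4.clock(n) for n in range(8)])
# [None, None, None, None, 0, 1, 2, 3]
```

The same class with `stages=2` or `stages=1` would play the role of the 2-stage and 1-stage delay elements.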

(b) Shift register (Sh2_2_16)

This module is a collection of thirty-two 2-bit shift registers. The inputs of each
group of sixteen shift registers are merged to form two 16-bit input buses, namely 'A' and 'B',
and the outputs of each group of sixteen shift registers are merged to form two 16-bit
output buses, namely 'C' and 'D'. The VHDL code is given below.

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity SH2_2_16 is
  Generic (SIZE : INTEGER := 16);
  Port (A, B : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        CLK : IN STD_ULOGIC;
        C, D : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End SH2_2_16;
Architecture BEHAVE of SH2_2_16 is
  Signal AI, BI : STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
Begin
  Process
  Begin
    -- Two registers in series on each of the two buses
    WAIT UNTIL CLK'EVENT AND CLK='1';
    AI <= A;
    C <= AI;
    BI <= B;
    D <= BI;
  End Process;
End BEHAVE;

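(c) Shift register (Sh1_2_16)

This module is a collection of thirty-two one-bit shift registers. The inputs are
merged to form two 16-bit input buses, namely 'A' and 'B', and the outputs are merged
to form two 16-bit output buses, namely 'C' and 'D'. The VHDL code is given below.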

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity SH1_2_16 is
  Generic (SIZE : INTEGER := 16);
  Port (A, B : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        CLK : IN STD_ULOGIC;
        C, D : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End SH1_2_16;
Architecture BEHAVE of SH1_2_16 is
Begin
  Process
  Begin
    -- One register on each of the two buses
    WAIT UNTIL CLK'EVENT AND CLK='1';
    C <= A;
    D <= B;
  End Process;
End BEHAVE;

4.3.3 Switches

There are two switches, namely Sw4_1 and Sw4_2, which route the data in the
particular fashion required by the signal-flow graph shown in Fig. 4.1. Routing of
data is controlled by the counter (Cn), which will be described later.

(a) Switch (Sw4_1)

The switch has four input buses of sixteen bits each, namely 'A', 'B', 'C', 'D';
a two-bit select bus, namely 'SEL', which controls the routing of data; and four output
buses of sixteen bits each, namely 'E', 'F', 'G', 'H'. The truth table for switch Sw4_1
is shown in Fig. 4.23. The behavior of the switch is described in the VHDL code given
below.

TRUTH TABLE FOR SWITCH Sw4_1

INPUT   OUTPUT
SEL     E  F  G  H
00      C  D  A  B
01      A  B  C  D
10      A  B  C  D
11      C  D  A  B

[Figure: connections between the inputs A, B, C, D[15:0] and the outputs E, F, G, H[15:0] for SEL[1:0] = '10', '01' and for SEL[1:0] = '00', '11'.]

Fig. 4.23 Truth table for switch Sw4_1

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity SW4_1 is
  Generic (SIZE : INTEGER := 16);
  Port (A, B, C, D : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
        SEL : IN STD_ULOGIC_VECTOR(1 downto 0);
        E, F, G, H : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End SW4_1;
Architecture BEHAVE of SW4_1 is
Begin
  PO : Process (A, B, C, D, SEL)
  Begin
    Case SEL is
      -- "10" and "01": inputs pass straight to the outputs
      WHEN "10" => E <= A; F <= B; G <= C; H <= D;
      WHEN "01" => E <= A; F <= B; G <= C; H <= D;
      -- otherwise the two input pairs are exchanged
      WHEN Others => G <= A; H <= B; E <= C; F <= D;
    End Case;
  End Process PO;
End BEHAVE;

(b) Switch (Sw4_2)

The switch has four input buses of sixteen bits each, namely 'A', 'B', 'C', 'D';
a select line, namely 'SELO', which controls the routing of data; and four output buses of
sixteen bits each, namely 'E', 'F', 'G', 'H'. The truth table for switch Sw4_2 is shown
in Fig. 4.24. The behavior of the switch is described in the VHDL code given below.

TRUTH TABLE FOR SWITCH Sw4_2

INPUT   OUTPUT
SELO    E  F  G  H
0       C  D  A  B
1       A  B  C  D

[Figure: routing of inputs A[15:0], B[15:0], C[15:0], D[15:0] to outputs E[15:0], F[15:0], G[15:0], H[15:0]; straight through for SELO = '1' and crossed for SELO = '0'.]

Fig. 4.24 Truth table for switch Sw4_2

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;

Entity SW4_2 is
Generic (SIZE : INTEGER := 16);
Port (A, B, C, D : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
SELO : IN STD_ULOGIC;
E, F, G, H : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End SW4_2;
Architecture BEHAVE of SW4_2 is
Begin
PO : Process (A, B, C, D, SELO)
Begin
Case SELO is
WHEN '0' => G <= A; H <= B; E <= C; F <= D;
WHEN Others => E <= A; F <= B; G <= C; H <= D;
End Case;
End Process PO;
End BEHAVE;

4.3.4 Counter (Cn)

Counter Cn is a two-bit counter. It has one 2-bit output, 'COUNT', and the
lower bit of COUNT (i.e. COUNT(0)) is also brought out as an output line,
'CSWII'. The counter is negative-edge triggered and is cleared asynchronously
while RSTn is '1'. The behavior of the counter is described
in the VHDL code given below.

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity CN is
Port (RSTn, CLK : IN STD_ULOGIC;
COUNT : BUFFER STD_ULOGIC_VECTOR(1 downto 0 ) ;
CSWII : BUFFER STD_ULOGIC);
End CN;
Architecture BEHAVE of CN is
Begin
Process (RSTn, CLK )
Begin
If ( RSTn = '1') then
COUNT <= "00" ;
Elsif (CLK'event and CLK = '0') then
COUNT <= COUNT + "01";

End if;
End Process;
CSWII <= COUNT(0);
End BEHAVE;
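
The counter behaviour described above (asynchronous clear, increment on the falling clock edge, CSWII tied to COUNT(0)) can be sketched as a small Python model. The class name `CounterCn` and the helper `run` are hypothetical and exist only for this illustration; the model makes no claim about timing.

```python
class CounterCn:
    """Behavioural model of the 2-bit counter Cn (illustrative only)."""

    def __init__(self):
        self.count = 0      # 2-bit COUNT value
        self.prev_clk = 0   # previous clock level, used to detect edges

    def step(self, rst, clk):
        """Apply one set of input levels; return (COUNT as bit string, CSWII)."""
        if rst == 1:
            self.count = 0                      # asynchronous clear
        elif self.prev_clk == 1 and clk == 0:
            self.count = (self.count + 1) % 4   # falling edge: wrap at 4
        self.prev_clk = clk
        return format(self.count, "02b"), self.count & 1


def run(cycles):
    """Reset the counter, then apply the given number of full clock cycles."""
    cn = CounterCn()
    cn.step(rst=1, clk=0)
    outputs = []
    for _ in range(cycles):
        cn.step(rst=0, clk=1)                   # rising edge: no change
        outputs.append(cn.step(rst=0, clk=0))   # falling edge: increment
    return outputs


if __name__ == "__main__":
    for count, cswii in run(5):
        print(count, cswii)
```

In the pipeline FFT the 2-bit value drives the select input of Sw4_1 and of the weight factor generator, while the single bit CSWII drives Sw4_2, so the sequence printed here is the control sequence seen by those blocks.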

4.3.5 Weight factor generator Wfg

As the name implies, the weight factor generator generates the 16-bit weight factors for
the three butterflies. It is controlled by counter Cn so that the appropriate weight
factors are produced at the appropriate time. It has one 2-bit control input, 'SELC', and six
16-bit outputs, 'WFRI', 'WFII', 'WFRII', 'WFIII', 'WFRIII' and 'WFIIII'. The
truth table for the weight factor generator is shown in Fig. 4.25. The behavior of the weight
factor generator is described in the VHDL code given below.

Truth table for Weight factor generator

INPUT   OUTPUT
SELC    WFRI   WFII   WFRII  WFIII  WFRIII WFIIII
00      A57E   A57E   0000   8000   012C   0000
01      012C   0000   012C   0000   012C   0000
10      5A82   A57E   0000   8000   012C   0000
11      0000   8000   012C   0000   012C   0000

Fig. 4.25 Truth table for Weight factor generator

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity WFG is
Generic (SIZE : INTEGER := 16);
Port (SELC : IN STD_ULOGIC_VECTOR(1 downto 0 ) ;
WFRI, WFII, WFRII, WFIII, WFRIII, WFIIII
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End WFG;
Architecture BEHAVE of WFG is
Begin
P0 : Process(SELC)
Begin
Case SELC is
WHEN "10" => WFRI <= "0101101010000010"; --5A82
WFII <= "1010010101111110"; --A57E
WHEN "11" => WFRI <= "0000000000000000"; --0000
WFII <= "1000000000000000"; --8000
WHEN "00" => WFRI <= "1010010101111110"; --A57E
WFII <= "1010010101111110"; --A57E
WHEN Others => WFRI <= "0000000100101100"; --012C
WFII <= "0000000000000000"; --0000
End Case;
End Process P0;
P1 : Process(SELC)
Begin
Case SELC is
WHEN "00" => WFRII <= "0000000000000000"; --0000
WFIII <= "1000000000000000"; --8000
WHEN "10" => WFRII <= "0000000000000000"; --0000
WFIII <= "1000000000000000"; --8000
WHEN Others => WFRII <= "0000000100101100"; --012C
WFIII <= "0000000000000000"; --0000
End Case;
End Process P1;
P2 : Process(SELC)
Begin
Case SELC is
WHEN Others => WFRIII <= "0000000100101100"; --012C
WFIIII <= "0000000000000000"; --0000
End Case;
End Process P2;
End BEHAVE;
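
The non-trivial entries of Fig. 4.25 (5A82, A57E, 8000) are consistent with the 8-point twiddle factors written as 16-bit two's complement fractions scaled by 2^15. The Python sketch below shows that conversion under this assumption; the helper names `q15_hex` and `twiddle_q15` are hypothetical. It does not attempt to derive the word 012C that the table uses for the remaining entries.

```python
import math


def q15_hex(value):
    """Convert a real number in [-1, 1) to a 4-digit two's complement hex word."""
    n = round(value * 32768)
    n = max(-32768, min(32767, n))   # saturate to the 16-bit range
    return format(n & 0xFFFF, "04X")


def twiddle_q15(k, n_points=8):
    """Return (real, imaginary) hex words of W_N^k = exp(-j*2*pi*k/N)."""
    angle = -2.0 * math.pi * k / n_points
    return q15_hex(math.cos(angle)), q15_hex(math.sin(angle))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, twiddle_q15(k))
```

For k = 1, 2 and 3 the printed pairs can be compared with the (WFRI, WFII) columns for SELC = 10, 11 and 00 respectively.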

4.3.6 Pipeline Fast Fourier transform circuit

The overall code for the pipeline FFT is shown below. It mainly describes the
interconnection of all the blocks discussed previously.

library SYNTH;
use SYNTH.VHDLSYNTH.all;
library IEEE;
use IEEE.STD_LOGIC_1164.all;
Entity PLFF is
Generic (SIZE : INTEGER := 16);
Port (INSMP : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
CL, REST : IN STD_ULOGIC;
IARL, IAIL, IBRL, IBIL, IIARL, IIAIL, IIBRL, IIBIL,
IIIARL, IIIAIL, IIIBRL, IIIBIL
: OUT STD_ULOGIC;
OPAR, OPAI, OPBR, OPBI
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End PLFF;
Architecture BEHAVE of PLFF is
Component SH4_1_16 Port (A : IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
CLK : IN STD_ULOGIC;
C : OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End Component;
Component BFL Port (INAR, INAI, INBR, INBI, INWR, INWI
: IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
ARLZ, AILZ, BRLZ, BILZ : OUT STD_ULOGIC;
OSR, OSI, ODWR, ODWI
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End Component;
Component SH2_2_16 Port (A, B
: IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
CLK : IN STD_ULOGIC;
C, D
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End Component;
Component SH1_2_16 Port (A, B
: IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
CLK : IN STD_ULOGIC;
C, D
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End Component;
Component SW4_1 Port (A, B, C, D
: IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
SEL : IN STD_ULOGIC_VECTOR(1 downto 0);
E, F, G, H
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End Component;
Component SW4_2 Port (A, B, C, D
: IN STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
SELO : IN STD_ULOGIC;
E, F, G, H
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End Component;
Component CN Port (RSTn, CLK : IN STD_ULOGIC;
COUNT : BUFFER STD_ULOGIC_VECTOR(1 downto 0);
CSWII : BUFFER STD_ULOGIC);
End Component;
Component WFG Port (SELC : IN STD_ULOGIC_VECTOR(1 downto 0);
WFRI, WFII, WFRII, WFIII, WFRIII, WFIIII
: OUT STD_ULOGIC_VECTOR(SIZE - 1 downto 0));
End Component;
For COMPA: SH4_1_16 USE ENTITY WORK.SH4_1_16;
For COMPB: BFL USE ENTITY WORK.BFL;
For COMPC: SH2_2_16 USE ENTITY WORK.SH2_2_16;
For COMPD: SW4_1 USE ENTITY WORK.SW4_1;
For COMPE: SH2_2_16 USE ENTITY WORK.SH2_2_16;
For COMPF: BFL USE ENTITY WORK.BFL;
For COMPG: SH1_2_16 USE ENTITY WORK.SH1_2_16;
For COMPH: SW4_2 USE ENTITY WORK.SW4_2;
For COMPI: SH1_2_16 USE ENTITY WORK.SH1_2_16;
For COMPJ: BFL USE ENTITY WORK.BFL;
For COMPK: CN USE ENTITY WORK.CN;
For COMPL: WFG USE ENTITY WORK.WFG;
Signal ISHIVC, IIIWI, IWR, IWI, ISWA, ISWB, ISHIIA, ISHIIB,
ISWC, ISWD, ISWE, ISWF, ISWG, ISWH, IISHIIC, IISHIID,
IIWR, IIWI, IISWA, IISWB, ISHIA, ISHIB, IISWC, IISWD,
IISWE, IISWF, IISWG, IISWH, IISHIC, IISHID, IIIWR
: STD_ULOGIC_VECTOR(SIZE - 1 downto 0);
Signal SELIX : STD_ULOGIC_VECTOR(1 downto 0);
Signal SELOX : STD_ULOGIC;
Begin
COMPA : SH4_1_16 Port Map (INSMP, CL, ISHIVC);
COMPB : BFL Port Map (ISHIVC, IIIWI, INSMP, IIIWI, IWR, IWI,
IARL, IAIL, IBRL, IBIL, ISWA, ISWB, ISHIIA,
ISHIIB);
COMPC : SH2_2_16 Port Map (ISHIIA, ISHIIB, CL, ISWC, ISWD);
COMPD : SW4_1 Port Map (ISWA, ISWB, ISWC, ISWD, SELIX,
ISWE, ISWF, ISWG, ISWH);
COMPE : SH2_2_16 Port Map (ISWE, ISWF, CL, IISHIIC, IISHIID);
COMPF : BFL Port Map (IISHIIC, IISHIID, ISWG, ISWH, IIWR, IIWI,
IIARL, IIAIL, IIBRL, IIBIL,
IISWA, IISWB, ISHIA, ISHIB);
COMPG : SH1_2_16 Port Map (ISHIA, ISHIB, CL, IISWC, IISWD);
COMPH : SW4_2 Port Map (IISWA, IISWB, IISWC, IISWD, SELOX,
IISWE, IISWF, IISWG, IISWH);
COMPI : SH1_2_16 Port Map (IISWE, IISWF, CL, IISHIC, IISHID);
COMPJ : BFL Port Map (IISHIC, IISHID, IISWG, IISWH, IIIWR, IIIWI,
IIIARL, IIIAIL, IIIBRL, IIIBIL,
OPAR, OPAI, OPBR, OPBI);
COMPK : CN Port Map (REST, CL, SELIX, SELOX);
COMPL : WFG Port Map (SELIX, IWR, IWI, IIWR, IIWI, IIIWR, IIIWI);
End BEHAVE;

The code is synthesized and simulated using Viewlogic's synthesis tool. The
schematics are attached in chapter 6, 'Simulation and implementation results'.

CHAPTER-5
FPGA BASED IMPLEMENTATION

5.1 Introduction

An application specific integrated circuit (ASIC) can be defined in the broadest
sense as an IC designed for a particular application or end use, such as a compact
disc player or a telecommunication system (cellular etc.). ASICs stand in sharp
contrast to standard IC products such as memories or microprocessors, which are
typically designed for use in a wide range of applications. ASICs are typically
designed, at least in part, by someone other than the semiconductor vendor's
personnel. Most often, the designer is the customer. This fact, coupled with the
differences in design objectives such as performance, area, and time to market, further
differentiates ASICs from other types of designs. ASICs have driven an expansion of
the semiconductor industry, have fundamentally altered the IC business, and have
resulted in a significant increase in the number of IC designs and designers.
To implement an ASIC design one can choose among several methods, as shown in
Fig. 5.1.

[Figure: taxonomy tree rooted at ASICs. Labels recoverable from the figure: Full Custom, Linear, Semicustom, FPLDs; Gate Arrays, Standard cell, Mixed Arrays; FPGAs, PLDs, ASPLs; PALs, PLAs, EPROMs, EEPROMs.]

Fig. 5.1 ASIC taxonomy

There are basically four major technologies in common use today for ASIC
implementation. These are field-programmable logic devices (FPLDs), gate arrays,
standard cells, and full custom.
FPLDs are characterized by their ability to be configured by the customer.
Although known by many mnemonics, all FPLD devices are basically one of two
types: (1) programmable logic devices (PLDs) and (2) field-programmable gate arrays
(FPGAs).
PLDs are characterized by fixed interconnect and an AND-OR plane driving
flip-flops, which are routed to output pins. FPGAs, on the other hand, possess more
flexible interconnect and are comprised of an array of logic blocks, which can be
configured to perform various logic functions. FPLDs find application primarily in
lower complexity (fewer than 2000 gates) and low volume applications. However, a
number of manufacturers are readying offerings that will reportedly contain over
20,000 gates. PLDs are available in CMOS, bipolar, and GaAs, while FPGAs are
most often fabricated in CMOS.
Some of the advantages of FPLDs are:
• Shortest fabrication time. Since the devices are not actually fabricated for
personalization but are typically programmed using a PROM programmer,
a completed design may be implemented in a matter of hours or days
instead of weeks.
• Low cost in low volume. For very low volumes, FPLDs are very cost-
effective since there are no non-recurring engineering (NRE) charges.
• Changes easier and faster. If changes are likely to be made or
personalization is necessary, FPLDs are possibly the most effective
vehicle. This also makes them an effective functional verification tool.

Some of the disadvantages of FPLDs are:

• Highest cost per gate. Semicustom or full custom are more cost effective
in higher volumes.
• Least flexible. In terms of logic structure, FPLDs are more limited than
semicustom or full custom approaches.
• Lowest integration. For larger designs, FPLDs may require use of multiple
chips, where other ASIC approaches would allow the implementation in a
single chip.
Recently Field Programmable Gate Arrays (FPGAs) have become very
popular for implementing Application Specific Integrated Circuits (ASICs). As the
technology evolves the low and medium end ASICs are being implemented using
FPGAs. An effective design approach with FPGAs allows earlier market entry than
with other ASICs.
A graph depicting relative cost versus quantity for the design in the preceding
four product categories is shown in Fig. 5.2. The assumption is that the design
implemented can be realized in all four of the above products, and that each product
utilizes the same basic process technology, such as CMOS.

Fig. 5.2 Relative Cost Versus Quantity for ASIC products.

Like traditional gate arrays, FPGAs implement thousands of logic gates in
multiple structures. An FPGA manufacturer makes a single, standard device, like a
programmable logic device (PLD), that users program to carry out the desired functions.
Field programmability comes at a cost in logic density and performance. FPGA
capacity trails mask-programmed gate array capacity by a factor of 10, and FPGA
performance trails mask-programmed gate arrays by a factor of three [14].
The most significant advantage of using FPGA products is the ability to
produce a prototype logic design on the desktop. One can create a logic design,
implement it and verify it in minutes or hours, while a conventional gate array product
can take months to develop and produce working silicon. In addition, Xilinx logic
products are in-circuit programmable, so even while an FPGA device is soldered to a
board, one can reprogram the Xilinx part with a different FPGA design [15]. An FPGA
needs no custom mask tooling, saving thousands of dollars over mask-programmed
parts. The result is a low-risk design style, where the price of a logic error is small,
both in money and project delay. The reduced risk makes FPGAs useful for rapid
product development and prototyping. Moreover, FPGAs can be fully tested after
manufacture, so users' designs do not require test program generation or design for
testability [14].

5.2 FPGA Architecture

Many kinds of programmable logic products are called FPGAs. FPGAs are
categorized according to their combination of programming technology and device
architecture. Three programming technologies are commonly used for FPGAs. Each
has associated area and performance costs, and the device architectures reflect these
costs [14].

(a) SRAM FPGAs

In an SRAM-programmed FPGA, static memory cells hold the programming.
The architecture consists of three types of configurable elements: a perimeter of
input/output blocks (IOBs), a two-dimensional core array of configurable logic blocks
(CLBs), and resources for interconnection, as shown in Fig. 5.3(a) [14].

[Figure: array of logic blocks surrounded by I/O blocks; labels recoverable from the figure: Switch box, Wiring channel.]

Fig. 5.3(a) SRAM FPGA Structure.
The IOBs provide a programmable interface between the internal logic array
and the device package pins, CLBs perform user-specified logic functions, and the
interconnect resources carry signals among the blocks [15]. A configuration program
stored in internal static memory cells determines the logic functions and the
interconnect. Interconnect segments connect to CLB pins in the channels and to other
segments in the switch boxes through pass transistors controlled by configuration
memory cells. Because SRAM cells and pass transistors are comparatively expensive
in area and delay, the switch boxes are not full crossbar switches.
An SRAM FPGA program consists of a single long program word. On-chip
circuitry loads the program word, reading it serially out of an external memory every
time power is applied to the chip. The program bits set the values of all
configuration memory cells on the chip, selecting which segments connect to each
other. SRAM FPGAs are inherently re-programmable. They can be updated in the
system, providing designers with new design options and capabilities, such as logic
updates that do not require hardware modification and time-shared virtual logic [14].
All Xilinx FPGAs use CMOS SRAM technology.
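
To make the idea of configuration memory determining a logic function concrete, the sketch below models a generic 4-input look-up table in Python: sixteen stored bits define the output for each input combination. It is an illustration of the principle only; the name `make_lut4` and the bit ordering are assumptions, and nothing here reflects the actual XC4000 bitstream format.

```python
def make_lut4(config_bits):
    """Build a 4-input logic function from a 16-bit configuration word.

    Bit i of config_bits is the output for the input combination whose
    binary value (d c b a) equals i.
    """
    def lut(a, b, c, d):
        address = (d << 3) | (c << 2) | (b << 1) | a
        return (config_bits >> address) & 1
    return lut


# Two different "programs" loaded into the same structure
and4 = make_lut4(0x8000)   # output is 1 only when all four inputs are 1
xor4 = make_lut4(0x6996)   # output is 1 when an odd number of inputs are 1

if __name__ == "__main__":
    print(and4(1, 1, 1, 1), and4(1, 0, 1, 1))
    print(xor4(1, 0, 0, 0), xor4(1, 1, 0, 0))
```

Changing only the stored word changes the function, which is the sense in which an SRAM-based device can be reprogrammed without any hardware modification.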

(b) ANTIFUSE FPGAs

An antifuse is a two-terminal device that, when exposed to a high voltage,
forms a permanent short circuit between the nodes on either side. An antifuse FPGA
usually consists of rows of configurable logic elements with interconnection channels
between them, much like traditional gate arrays, as shown in Fig. 5.3(b). The pins on
the logic blocks extend into the channel. A logic block is a comparatively simple
gate-level network, which one programs by connecting its input pins to fixed values
or to interconnect nets [14].

[Figure: rows of logic blocks separated by wiring channels; labels recoverable from the figure: Logic block, Wiring channel.]

Fig. 5.3(b) Antifuse FPGA Structure.

(c) CPLD FPGAs

In the CPLD architecture, Fig. 5.3(c), the user creates logic and interconnections by
programming EPROM (or EEPROM) transistors to form wide fan-in gates. A CPLD
consists of a few function blocks, each containing a PLD AND-array that feeds its macrocells.
The user programs the AND-array by turning on EPROM transistors that allow
selected inputs to be included in a product term.

[Figure: block diagram showing function blocks (FB3, FB4) with their AND-arrays and macrocells (MC).]

MC   Macrocell
FB   Function block
UIM  Universal interconnect mechanism

Fig. 5.3(c) CPLD FPGA Structure.

A macrocell includes an OR gate to complete the two level AND-OR logic
and may also include registers and an I/O pad. Macrocell outputs are connected as
additional function block inputs or as inputs to a global universal interconnect
mechanism (UIM) that reaches all function blocks on the chip [14].

5.3 XILINX FPGA families

Xilinx markets and supports many product lines: XC2000, XC2000L,
XC3000, XC3000A, XC3000L, XC3100, XC3100A, XC4000, XC4000A, XC4000H,
XC4006E, XC4008E, XC4010E/L, XC4013E/L, XC4020E, XC4025E,
XC4028EX/XL, XC4036EX/XL, XC4044EX/XL, XC4052XL, and XC4062XL. The
primary difference between these products lies in the number of gates and the
architectural features of the individual devices, as shown in table 5.1 [15].

Table 5.1 Maximum logic capacities

Product        Logic Capacity       Maximum No.   Maximum No.
               (Gate Equivalent)    of CLBs       of IOBs
XC2000/L       1,200 - 1,800        100           74
XC3000/A/L     1,300 - 9,000        484           176
XC3100/A       1,300 - 9,000        484           176
XC4000         2,000 - 20,000       900           240
XC4000A        2,000 - 5,000        196           112
XC4000H        2,000 - 5,000        196           192
XC4006E        4,000 - 12,000       256           128
XC4008E        6,000 - 15,000       324           144
XC4010E/L      7,000 - 20,000       400           160
XC4013E/L      10,000 - 30,000      576           192
XC4020E        13,000 - 40,000      784           224
XC4025E        15,000 - 45,000      1,024         256
XC4028EX/XL    18,000 - 50,000      1,024         256
XC4036EX/XL    22,000 - 65,000      1,296         288
XC4044EX/XL    27,000 - 80,000      1,600         320
XC4052XL       33,000 - 100,000     1,936         352
XC4062XL       40,000 - 130,000     2,304         384
5.3.1 CMOS XC4000 series

This third-generation Xilinx FPGA architecture, introduced in 1990,
forms the basis of the XC4000 families of devices. The CLB structure and IOB
structure of the XC4000EX family are shown in Fig. 5.4(a) and 5.4(b) respectively.

Fig. 5.4(a) Simplified Block Diagram of XC4000EX Series CLB

Fig. 5.4(b) Simplified Block Diagram of XC4000EX Series IOB

Features :

1. It has logic densities up to 130,000 usable gates and supports system clock rates of
up to 66 MHz.
2. Compared to older Xilinx FPGA families, the XC4000EX/XL families are more
powerful, offering on-chip ultra-fast RAM with a synchronous write option and a
dual-port RAM option.
3. The XC4000EX/XL families are fully PCI compliant.
4. The XC4000EX/XL families have abundant flip-flops, flexible function
generators, dedicated high-speed carry logic, wide edge decoders on each edge,
internal 3-state bus capability, 8 global low-skew clock or signal distribution
networks, and a flexible array architecture.
5. The XC4000EX/XL families have generous routing resources to accommodate the
most complex interconnect patterns.
6. The XC4000EX/XL families are supported by powerful and sophisticated
software, covering every aspect of design from schematic entry, to simulation, to
automatic block placement and routing of interconnections, and finally the
creation of the configuration bit stream.
7. The schematic library for the XC4000EX/XL FPGAs contains 400 primitives and
macros, ranging from 2-input AND gates to 16-bit accumulators, and including
arithmetic functions, comparators, counters, data registers, decoders, encoders, I/O
functions, latches, Boolean functions, RAM and ROM memory blocks,
multiplexers, and shift registers.
8. Operational power consumption is totally dynamic.
9. Typical power consumption is between 100 mW and 2 W depending upon the size of
the device.
10. Buffered interconnect for maximum speed.
11. Flexible new high-speed clock network:
• 8 additional early buffers for shorter clock delays.
• 4 additional fastCLK buffers for fastest clock input.
• Virtually unlimited number of clock signals.
12. Optional multiplexer or 2-input function generator on device outputs.

CHAPTER-6
SIMULATION AND IMPLEMENTATION RESULTS

6.1 Simulation results

The simulation results are divided into three categories

(a) Simulation results for butterfly circuit.
(b) Timing diagram of FFT circuit.
(c) Gate level schematics generated by the synthesis tool.

(a) Simulation results for butterfly circuit.

The eight-point FFT is calculated for the eight samples x(0) = x(1) = x(2) = x(3) =
32512 × 2^-15 = 0.9921875 and x(4) = x(5) = x(6) = x(7) = 0.0. The signal flow graph for
the 8-point FFT (DIF) is shown in Fig. 6.1.

Fig. 6.1 Signal flow graph of 8-point FFT (DIF).

Fig. 6.2 Signal flow graph of 8-point FFT (DIF).

As mentioned in previous chapters, in fixed-point arithmetic, the input should
be normalized to a fraction and the input to each butterfly should be attenuated by a
factor of 2. Thus the signal flow graph for 8-point FFT is modified and is shown in
Fig. 6.2.
Both signal flow graphs give identical results (intermediate values will be
different but the final output will be identical). The results are calculated (using signal
flow graph in Fig. 6.2) by hand as follows.
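Here,

\[
W_8^0 = 1, \qquad
W_8^1 = \frac{1}{\sqrt{2}} - j\frac{1}{\sqrt{2}}, \qquad
W_8^2 = -j, \qquad
W_8^3 = -\frac{1}{\sqrt{2}} - j\frac{1}{\sqrt{2}}
\]

First stage:

\[
\begin{aligned}
x'(0) &= \frac{x(0)}{2} + \frac{x(4)}{2} = \frac{32512 \times 2^{-15}}{2} + \frac{0}{2} = \frac{1}{2}(0.9921875) = 0.49609375 \\
x'(1) &= \frac{x(1)}{2} + \frac{x(5)}{2} = \frac{32512 \times 2^{-15}}{2} + \frac{0}{2} = \frac{1}{2}(0.9921875) = 0.49609375 \\
x'(2) &= \frac{x(2)}{2} + \frac{x(6)}{2} = \frac{32512 \times 2^{-15}}{2} + \frac{0}{2} = \frac{1}{2}(0.9921875) = 0.49609375 \\
x'(3) &= \frac{x(3)}{2} + \frac{x(7)}{2} = \frac{32512 \times 2^{-15}}{2} + \frac{0}{2} = \frac{1}{2}(0.9921875) = 0.49609375 \\
x'(4) &= \left\{\frac{x(0)}{2} - \frac{x(4)}{2}\right\} W_8^0 = \left\{\frac{32512 \times 2^{-15}}{2} - \frac{0}{2}\right\} \times 1 = \frac{1}{2}(0.9921875) = 0.49609375 \\
x'(5) &= \left\{\frac{x(1)}{2} - \frac{x(5)}{2}\right\} W_8^1 = \left\{\frac{32512 \times 2^{-15}}{2} - \frac{0}{2}\right\}\left(\frac{1}{\sqrt{2}} - j\frac{1}{\sqrt{2}}\right) \\
      &= \frac{1}{2}(0.701582509 - j0.701582509) = 0.350791254 - j0.350791254 \\
x'(6) &= \left\{\frac{x(2)}{2} - \frac{x(6)}{2}\right\} W_8^2 = \left\{\frac{32512 \times 2^{-15}}{2} - \frac{0}{2}\right\}(-j) = \frac{1}{2}(-j0.9921875) = -j0.49609375 \\
x'(7) &= \left\{\frac{x(3)}{2} - \frac{x(7)}{2}\right\} W_8^3 = \left\{\frac{32512 \times 2^{-15}}{2} - \frac{0}{2}\right\}\left(-\frac{1}{\sqrt{2}} - j\frac{1}{\sqrt{2}}\right) \\
      &= \frac{1}{2}(-0.701582509 - j0.701582509) = -0.350791254 - j0.350791254
\end{aligned}
\]

Second stage:

\[
\begin{aligned}
x''(0) &= \frac{x'(0)}{2} + \frac{x'(2)}{2} = \frac{1}{4}\left(32512 \times 2^{-15} + 32512 \times 2^{-15}\right) = \frac{1}{4}(1.984375) = 0.49609375 \\
x''(1) &= \frac{x'(1)}{2} + \frac{x'(3)}{2} = \frac{1}{4}\left(32512 \times 2^{-15} + 32512 \times 2^{-15}\right) = \frac{1}{4}(1.984375) = 0.49609375 \\
x''(2) &= \left\{\frac{x'(0)}{2} - \frac{x'(2)}{2}\right\} W_8^0 = \frac{1}{4}\left(32512 \times 2^{-15} - 32512 \times 2^{-15}\right) \times 1 = \frac{1}{4}(0.0) = 0.0 \\
x''(3) &= \left\{\frac{x'(1)}{2} - \frac{x'(3)}{2}\right\} W_8^2 = \frac{1}{4}\left(32512 \times 2^{-15} - 32512 \times 2^{-15}\right) \times (-j) = \frac{1}{4}(0.0) = 0.0 \\
x''(4) &= \frac{x'(4)}{2} + \frac{x'(6)}{2} = \frac{1}{4}\left(32512 \times 2^{-15} - j\,32512 \times 2^{-15}\right) \\
       &= \frac{1}{4}(0.9921875 - j0.9921875) = 0.248046875 - j0.248046875 \\
x''(5) &= \frac{x'(5)}{2} + \frac{x'(7)}{2} = \frac{1}{4}\left(-j\,32512 \times 2^{-15} \times \sqrt{2}\right) = \frac{1}{4}(-j1.403165019) = -j0.350791254 \\
x''(6) &= \left\{\frac{x'(4)}{2} - \frac{x'(6)}{2}\right\} W_8^0 = \frac{1}{4}\left(32512 \times 2^{-15} + j\,32512 \times 2^{-15}\right) \times 1 \\
       &= \frac{1}{4}(0.9921875 + j0.9921875) = 0.248046875 + j0.248046875 \\
x''(7) &= \left\{\frac{x'(5)}{2} - \frac{x'(7)}{2}\right\} W_8^2 = \frac{1}{4}\left(32512 \times 2^{-15} \times \sqrt{2}\right) \times (-j) = \frac{1}{4}(-j1.403165019) = -j0.350791254
\end{aligned}
\]

Third stage (the result is multiplied by 8 to compensate for the three attenuations by 2):

\[
\begin{aligned}
X(0) &= \left\{\frac{x''(0)}{2} + \frac{x''(1)}{2}\right\} \times 8 = \frac{1}{8}\left\{130048 \times 2^{-15}\right\} \times 8 = 3.96875 \\
X(1) &= \left\{\frac{x''(4)}{2} + \frac{x''(5)}{2}\right\} \times 8 = \frac{1}{8}\left\{32512 \times 2^{-15} - j\,32512 \times 2^{-15}\left(1 + \sqrt{2}\right)\right\} \times 8 = 0.9921875 - j2.395352519 \\
X(2) &= \left\{\frac{x''(2)}{2} + \frac{x''(3)}{2}\right\} \times 8 = (0 + 0) \times 8 = 0.0 \\
X(3) &= \left\{\frac{x''(6)}{2} + \frac{x''(7)}{2}\right\} \times 8 = \frac{1}{8}\left\{32512 \times 2^{-15} + j\,32512 \times 2^{-15}\left(1 - \sqrt{2}\right)\right\} \times 8 = 0.9921875 - j0.410977518 \\
X(4) &= \left\{\frac{x''(0)}{2} - \frac{x''(1)}{2}\right\} W_8^0 \times 8 = \frac{1}{8}\left\{65024 \times 2^{-15} - 65024 \times 2^{-15}\right\} \times 1 \times 8 = 0.0 \\
X(5) &= \left\{\frac{x''(4)}{2} - \frac{x''(5)}{2}\right\} W_8^0 \times 8 = \frac{1}{8}\left\{32512 \times 2^{-15} + j\,32512 \times 2^{-15}\left(\sqrt{2} - 1\right)\right\} \times 1 \times 8 = 0.9921875 + j0.410977518 \\
X(6) &= \left\{\frac{x''(2)}{2} - \frac{x''(3)}{2}\right\} W_8^0 \times 8 = (0 - 0) \times 1 \times 8 = 0.0 \\
X(7) &= \left\{\frac{x''(6)}{2} - \frac{x''(7)}{2}\right\} W_8^0 \times 8 = \frac{1}{8}\left\{32512 \times 2^{-15} + j\,32512 \times 2^{-15}\left(1 + \sqrt{2}\right)\right\} \times 1 \times 8 = 0.9921875 + j2.395352519
\end{aligned}
\]
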
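
The hand calculation can be reproduced with a short floating-point model. The sketch below is an assumption-laden illustration rather than the processor's arithmetic: it uses Python complex numbers instead of 16-bit fixed point, and the function name `scaled_dif_fft` is hypothetical. It applies the same rule as the modified signal flow graph, halving both inputs of every butterfly, and returns the outputs of each stage.

```python
import cmath


def scaled_dif_fft(x):
    """Radix-2 decimation-in-frequency FFT with inputs halved at every butterfly.

    Returns a list with one entry per stage; each entry is the list of
    values at the output of that stage. The last stage is in bit-reversed
    order and is attenuated by len(x) overall.
    """
    n = len(x)
    data = list(x)
    stages = []
    span = n // 2                      # distance between butterfly inputs
    while span >= 1:
        out = list(data)
        for start in range(0, n, 2 * span):
            for i in range(span):
                a = data[start + i] / 2
                b = data[start + i + span] / 2
                # Twiddle factor for this butterfly position
                w = cmath.exp(-2j * cmath.pi * i / (2 * span))
                out[start + i] = a + b
                out[start + i + span] = (a - b) * w
        data = out
        stages.append(data)
        span //= 2
    return stages


if __name__ == "__main__":
    a = 32512 * 2 ** -15
    stages = scaled_dif_fft([a] * 4 + [0.0] * 4)
    print("x'(5)  =", stages[0][5])
    print("x''(4) =", stages[1][4])
    # Position 4 of the last stage holds X(1) because of bit reversal
    print("X(1)   =", 8 * stages[2][4])
```

The three printed values can be compared with x'(5), x''(4) and X(1) in the derivation above.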
The values shown in bold face correspond to the signal flow graph shown in Fig. 6.1.
The butterfly circuit is simulated and the results obtained are tabulated in table 6.1.

SAMPLE   CALCULATED VALUE              SIMULATION RESULT (HEX)   SIMULATION RESULT (DECIMAL)
         REAL          IMAGINARY       REAL    IMAGINARY         REAL         IMAGINARY

Inputs (0.9921875 = 32512 × 2^-15)
x(0)     0.9921875     0.0             7F00    0000
x(1)     0.9921875     0.0             7F00    0000
x(2)     0.9921875     0.0             7F00    0000
x(3)     0.9921875     0.0             7F00    0000
x(4)     0.0           0.0             0000    0000
x(5)     0.0           0.0             0000    0000
x(6)     0.0           0.0             0000    0000
x(7)     0.0           0.0             0000    0000

x'(0)    0.49609375    0.0             3F80    0000              0.4960937    0.0
x'(1)    0.49609375    0.0             3F80    0000              0.4960937    0.0
x'(2)    0.49609375    0.0             3F80    0000              0.49609375   0.0
x'(3)    0.49609375    0.0             3F80    0000              0.49609375   0.0
x'(4)    0.49609375    0.0             3F80    0000              0.49609375   0.0
x'(5)    0.350791254   -0.350791254    2CE6    D31A              0.3507690    -0.3507690
x'(6)    0.0           -0.49609375     0000    C080              0.0          -0.4960937
x'(7)    -0.350791254  -0.350791254    D31A    D31A              -0.3507690   -0.3507690

x''(0)   0.49609375    0.0             3F80    0000              0.4960937    0.0
x''(1)   0.49609375    0.0             3F80    0000              0.4960937    0.0
x''(2)   0.0           0.0             0000    0000              0.0          0.0
x''(3)   0.0           0.0             0000    0000              0.0          0.0
x''(4)   0.248046875   -0.248046875    1FC0    E040              0.24804687   -0.24804687
x''(5)   0.0           -0.350791254    0000    D31A              0.0          -0.3507690
x''(6)   0.248046875   0.248046875     1FC0    1FC0              0.24804687   0.24804687
x''(7)   0.0           -0.350791254    0000    D31A              0.0          -0.3507690

X(0)     3.96875       0.0             3F80    0000              3.96875      0.0
X(1)     0.9921875     -2.395352519    0FE0    D9AD              0.9921875    -2.395263672
X(2)     0.0           0.0             0000    0000              0.0          0.0
X(3)     0.9921875     -0.410977518    0FE0    F96D              0.9921875    -0.410888671
X(4)     0.0           0.0             0000    0000              0.0          0.0
X(5)     0.9921875     0.410977518     0FE0    0693              0.9921875    0.410888671
X(6)     0.0           0.0             0000    0000              0.0          0.0
X(7)     0.9921875     2.395352519     0FE0    2653              0.9921875    2.395263672

Table 6.1 Outputs of butterfly

Results agree with the calculated values to within about 0.0001 (roughly four decimal
places). The error is due to the right shifting of the input to the butterfly (to
attenuate it by a factor of 2) and also to
rounding-off the output of the butterfly to 16-bit.
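
The decimal simulation results in table 6.1 follow from the hexadecimal words if each word is read as a 16-bit two's complement fraction (value divided by 2^15), with the final outputs additionally multiplied by 8. The sketch below illustrates that reading; the helper name `q15_to_float` is hypothetical and the interpretation is an assumption consistent with the tabulated numbers.

```python
def q15_to_float(hex_word):
    """Interpret a 4-digit hex word as a two's complement fraction in [-1, 1)."""
    n = int(hex_word, 16)
    if n >= 0x8000:
        n -= 0x10000          # negative values wrap around in two's complement
    return n / 32768.0


if __name__ == "__main__":
    # Intermediate value x'(5) from table 6.1
    print(q15_to_float("2CE6"), q15_to_float("D31A"))
    # Final output X(1): scaled back up by 8
    real = 8 * q15_to_float("0FE0")
    imag = 8 * q15_to_float("D9AD")
    print(real, imag)
    # Difference from the hand-calculated imaginary part
    print(abs(imag - (-2.395352519)))
```

The last line gives the size of the discrepancy between the simulated and calculated value of X(1), which is the error discussed above.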

(b) Timing diagram of FFT circuit.

The pipeline FFT circuit is also simulated and the waveforms are attached. The
outputs available at a particular time are shown in table 6.2.

Time             Output
1600ns-1700ns    X(0) = (3F80, 0000)    X(4) = (0000, 0000)
1800ns-1900ns    X(2) = (0000, 0000)    X(6) = (0000, 0000)
2000ns-2100ns    X(1) = (0FE0, D9AD)    X(5) = (0FE0, 0693)
2200ns-2300ns    X(3) = (0FE0, F96D)    X(7) = (0FE0, 2653)

Table 6.2 Outputs of FFT available at particular time.
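
The order in which the output pairs appear in table 6.2 matches the bit-reversed index order that a decimation-in-frequency FFT produces. The following Python sketch shows this; the function name `bit_reverse` is hypothetical and the snippet is only an illustration of the ordering, not of the circuit timing.

```python
def bit_reverse(index, bits):
    """Reverse the lowest 'bits' bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Output order of an 8-point DIF FFT and the pairs delivered together
order = [bit_reverse(i, 3) for i in range(8)]
pairs = [(order[i], order[i + 1]) for i in range(0, 8, 2)]

if __name__ == "__main__":
    print(order)
    print(pairs)
```

Each pair corresponds to one row of table 6.2, since the last butterfly delivers two outputs at a time.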

(c) Gate level schematics generated by the synthesis tool.

The gate level schematics are generated using Viewlogic's synthesis tool. Some
important schematics are attached, such as plff (pipeline FFT), bfl (butterfly circuit),
adsb16 (add/subtract circuit), sh4_1_16 (four stage shift register), sh2_2_16 (two
stage shift register), sh1_2_16 (one stage shift register), Sw4_1 (first switch), Sw4_2
(second switch), cn (counter) and wfg (weight factor generator).

6.2 Implementation results

The butterfly circuit is implemented in a Xilinx FPGA (xc4052xl) and its
implementation report is shown in table 6.3.

IMPLEMENTATION REPORT
Item                          Value                  Remarks
Target device                 xc4052xl
Target package                -1-bg432
Number of CLBs                1897 out of 1936       97%
Total CLB Flipflops           0 out of 3872          0%
Total CLB Latches             0 out of 3872          0%
4 input LUTs                  3488 out of 3872       90%
3 input LUTs                  60 out of 1936         3%
Number of bonded IOBs         164 out of 352         46%
IOB Flops                     0
IOB Latches                   0
Number of TBUFs               0
Total equivalent gate count   32748
Maximum delay                 350.29ns
Maximum net delay             233.706ns

Table 6.3. Implementation report.

Schematic for plff (pipeline FFT).

Schematic for bfl (butterfly circuit).

Schematic for adsb16 (adder/subtractor).

Schematic for sh4_1_16 (four stage shift register).

Schematic for sh2_2_16 (two stage shift register).

Schematic for sh1_2_16 (one stage shift register).

Schematic for Sw4_1 (first switch).

Schematic for Sw4_2 (second switch).

Schematic for cn (counter).

Schematic for wfg (weight factor generator).

Design Information

Command Line      map -p xc4052xl-1-bg432 -o map.ncd ../xc4000x1.ngd bfl.pcf
Target Device     x4052xl
Target Package    bg432
Target Speed      -1
Mapper Version    xc4000xl -- M1.4.12
Mapped Date       Thu Oct 07 17:43:27 1999

Design Summary

Number of errors: 0
Number of warnings: 107
Number of CLBs: 1897 out of 1936 97%
CLB Flip Flops: 0
CLB Latches: 0
4 input LUTs: 3488
3 input LUTs: 60
Number of bonded IOBs: 164 out of 352 46%
IOB Flops: 0
IOB Latches: 0
Total equivalent gate count for design: 32748
Additional JTAG gate count for IOBs: 7872

LOGIC LEVEL TIMING REPORT

Xilinx TRACE, Version M1.4.12

Design file: map.ncd
Physical constraint file: bfl.pcf
Device,speed: xc4052xl,-1 (xl_0.17 1.16 PRELIMINARY)
Report level: summary report

WARNING:bastw:170 - No timing constraints found, doing default enumeration.

WARNING:xiltw:1 - The current connection evaluation limit of 1000 caused the
truncation of timing analysis for paths through 26.29% of your constrained
connections, which may limit the accuracy of this analysis. You may specify
a larger limit with the XILINX_PATHLIMIT environment variable to increase the
accuracy of this analysis.

Timing constraint: Default period analysis

12983036 items analyzed, 0 timing errors detected.
Maximum delay is 78.000ns.

Timing constraint: Default net enumeration

4512 items analyzed, 0 timing errors detected.
Maximum net delay is 0.984ns.

All constraints were met.

Timing summary:

Timing errors: 0 Score: 0

Constraints cover 12983036 paths, 4512 nets, and 10682 connections (100.0% coverage)

Design statistics:
Maximum combinational path delay: 78.000ns
Maximum net delay: 0.984ns

Analysis completed Thu Oct 07 20:25:45 1999

Thu Oct 07 20:26:18 1999

par -w -ol 2 -d 0 map.nod bfl.ncd bfl.pcf

Constraints file: bfl.pcf

Loading device database for application par from file "map.ncd".

"BFL" is an NCD, version 2.27, device xc4052xl, package bg432,
speed -1
Loading device for application par from file '4052x1.nph' in
environment
C:/Xilinx.
Device speed data version: xl_0.17 1.16 PRELIMINARY.

Overall effort level (-ol): 2 (set by user)

Placer effort level (-pi): 2 (default)
Placer cost table entry (-t): 1
Router effort level (-rl): 2 (default)

Device utilization summary:

Number of External IOBs        164 out of 352     46%
   Flops:                        0
   Latches:                      0

Number of CLBs                1897 out of 1936    97%
   Total Latches:                0 out of 3872     0%
   Total CLB Flops:              0 out of 3872     0%
   4 input LUTs:              3488 out of 3872    90%
   3 input LUTs:                60 out of 1936     3%

Starting initial Placement phase. REAL time: 30 secs

Finished initial Placement phase. REAL time: 36 secs

Starting Constructive Placer. REAL time: 43 secs

Placer score = 3013710
Placer score = 2503530
Placer score = 2057190
Placer score = 1709070
Placer score = 1455150
Placer score = 1322250
Placer score = 1252950
Placer score = 1210440
Placer score = 1188870
Placer score = 1175160
Placer score = 1161600
Placer score = 1150290
Placer score = 1144590
Placer score = 1138200

Placer score = 1134120
Placer score = 1128480
Placer score = 1124940
Placer score = 1122210
Placer score = 1119570
Placer score = 1117560
Placer score = 1115940
Placer score = 1113990
Finished Constructive Placer. REAL time: 2 mins 28 secs

Dumping design to file "bfl.ncd".

Starting Optimizing Placer. REAL time: 2 mins 36 secs

Optimizing
Swapped 124 comps.
Xilinx-Placer [1] 1109850 REAL time: 3 mins 13 secs

Finished Optimizing Placer. REAL time: 3 mins 13 secs

Dumping design to file "bfl.ncd".

Total REAL time to Placer completion: 3 mins 21 secs

Total CPU time to Placer completion: 45 secs

0 connection(s) routed; 10682 unrouted active, 98 unrouted PWR/GND.

Starting router resource preassignment
Completed router resource preassignment. REAL time: 3 mins 48 secs
Starting iterative routing.
End of iteration 1
10682 successful; 0 unrouted active,
98 unrouted PWR/GND; (0) REAL time: 4 mins 56 secs
End of iteration 2
10682 successful; 0 unrouted active,
98 unrouted PWR/GND; (0) REAL time: 5 mins 1 secs
Constraints are met.
Power and ground nets completely routed.
Total REAL time: 5 mins 20 secs
Total CPU time: 42 secs
Completely routed.
End of route. 10780 routed (100.00%); 0 unrouted.
No errors found.

Total REAL time to Router completion: 5 mins 32 secs

Total CPU time to Router completion: 41 secs

Generating PAR statistics.

The Delay Summary Report

The Score for this design is: 6395

The Number of signals not completely routed for this design is: 0

The Average Connection Delay for this design is: 23.327 ns

The Average Connection Delay on critical nets is: 0.000 ns

The Average Clock Skew for this design is: 0.000 ns
The Maximum Pin Delay is: 233.706 ns
The Average Connection Delay on the 10 Worst Nets is: 203.143 ns

Listing Pin Delays by value: (ns)

 d <= 10   < d <= 20   < d <= 30   < d <= 40   < d <= 50   d > 50
 -------   ---------   ---------   ---------   ---------   ------
    5553        1849         920         521         489     1448

Timing Score: 0

Dumping design to file "bfl.ncd".

All signals are completely routed.

Total REAL time to PAR completion: 7 mins 42 secs

Total CPU time to PAR completion: 47 secs

PAR done.
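
The pin-delay listing in the PAR report groups the 10780 routed connections into 10 ns bins, with a final bin for everything above 50 ns. The binning rule can be sketched in C as below. The function `delay_bin` is a hypothetical illustration; the source does not show how PAR computes the listing.

```c
#include <assert.h>

/* Index of the 10 ns bin a pin delay falls into: 0 for d <= 10,
 * 1 for 10 < d <= 20, ..., 4 for 40 < d <= 50, 5 for d > 50.
 * Hypothetical helper for illustration only. */
int delay_bin(double delay_ns)
{
    assert(delay_ns >= 0.0);
    for (int bin = 0; bin < 5; bin++) {
        if (delay_ns <= 10.0 * (bin + 1))
            return bin;                  /* first upper bound that holds */
    }
    return 5;                            /* everything above 50 ns */
}
```

Under this rule the reported average connection delay of 23.327 ns falls in the third bin, while the maximum pin delay of 233.706 ns falls in the last bin, which holds 1448 of the 10780 connections (roughly 13%).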

Xilinx PAD Specification File


Input file: map.ncd

Output file: bfl.ncd
Part type: xc4052xl
Speed grade: -1
Package: bg432

Thu Oct 07 20:32:19 1999

Pinout by Component Name:

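+----------------+-------------+
| Component Name | Pin Number  |
+----------------+-------------+
| AILZ           | AG31        |
| ARLZ           | F4          |
| BILZ           | R1          |
| BRLZ           | AC29        |
| INAI0          | AF29        |
| INAI1          | U28         |
| INAI10         | K28         |
| INAI11         | K29         |
| INAI12         | B23         |
| INAI13         | C24         |
| INAI14         | L29         |
| INAI15         | J28         |
| INAI2          | W29         |
| INAI3          | V30         |
| INAI4          | R30         |
| INAI5          | R2e         |
| INAI6          | T29         |
| INAI7          | W30         |
| INAI8          | N31         |
| INAI9          | N30         |
| INAR0          | E1          |
| INAR1          | A13         |
| INAR10         | B11         |
| INAR11         | D15         |
| INAR12         | D13         |
| INAR13         | A12         |
| INAR14         | C15         |
| INAR15         | D14         |
| INAR2          | C14         |
| INAR3          | C13         |
| INAR4          | C10         |
| INAR5          | B14         |
| INAR6          | B10         |
| INAR7          | B15         |
| INAR8          | B12         |
| INAR9          | A15         |
| INBI0          | R2          |
| INBI1          | AK24        |
| INBI10         | A24         |
| INBI11         | AJ24        |
| INBI12         | D23         |
| INBI13         | A26         |
| INBI14         | K31         |
| INBI15         | J29         |
| INBI2          | U29         |
| INBI3          | V29         |
| INBI4          | U30         |
| INBI5          | R29         |
| INBI6          | P29         |
| INBI7          | R31         |
| INBI8          | M28         |
| INBI9          | H31         |
| INBR0          | AC30        |
| INBR1          | C22         |
| INBR10         | A20         |
| INBR11         | C21         |
| INBR12         | C20         |
| INBR13         | C23         |
| INBR14         | A22         |
| INBR15         | D12         |
| INBR2          | C18         |
| INBR3          | 04          |
| INBR4          | C19         |
| INBR5          | D19         |
| INBR6          | A17         |
| INBR7          | A16         |
| INBR8          | C17         |
| INBR9          | B21         |
| INWI0          | F30         |
| INWI1          | AK8         |
| INWI10         | AK4         |
| INWI11         | P3          |
| INWI12         | RK5         |
| INWI13         | AK15        |
| INWI14         | K1          |
| INWI15         | G29         |
| INWI2          | P4          |
| INWI3          | AH1         |
| INWI4          | AA3         |
| INWI5          | AH25        |
| INWI6          | AK25        |
| INWI7          | AJ26        |
| INWI8          | AJ21        |
| INWI9          | M1          |
| INWR0          | AL12        |
| INWR1          | AF1         |
| INWR10         | AK3         |
| INWR11         | D9          |
| INWR12         | K2          |
| INWR13         | B7          |
| INWR14         | D28         |
| INWR15         | E29         |
| INWR2          | AL17        |
| INWR3          | W1          |
| INWR4          | AH23        |
| INWR5          | AJ5         |
| INWR6          | AL10        |
| INWR7          | H4          |
| INWR8          | AG3         |
| INWR9          | D26         |
| ODWI0          | N3          |
| ODWI1          | M4          |
| ODWI10         | D10         |
| ODWI11         | K3          |
| ODWI12         | J2          |
| ODWI13         | B9          |
| ODWI14         | J4          |
| ODWI15         | H2          |
| ODWI2          | J3          |
| ODWI3          | M3          |
| ODWI4          | N4          |
| ODWI5          | M2          |
| ODWI6          | M1          |
| ODWI7          | K4          |
| ODWI8          | N2          |
| ODWI9          | L3          |
| ODWR0          | AK20        |
| ODWR1          | AJ19        |
| ODWR10         | AH19        |
| ODWR11         | AH18        |
| ODWR12         | AL13        |
| ODWR13         | AH14        |
| ODWR14         | AK14        |
| ODWR15         | AJ14        |
| ODWR2          | AK21        |
| ODWR3          | AL19        |
| ODWR4          | AH17        |
| ODWR5          | AK19        |
| ODWR6          | AL20        |
| ODWR7          | AK18        |
| ODWR8          | AK17        |
| ODWR9          | [illegible] |
| OSI0           | U31         |
| OSI1           | T31         |
| OSI10          | N29         |
| OSI11          | M31         |
| OSI12          | B24         |
| OSI13          | L30         |
| OSI14          | [illegible] |
| OSI15          | K30         |
| OSI2           | V28         |
| OSI3           | [illegible] |
| OSI4           | T30         |
| OSI5           | N28         |
| OSI6           | P28         |
| OSI7           | [illegible] |
| OSI8           | P30         |
| OSI9           | J30         |
| OSR0           | JJ20        |
| OSR1           | C16         |
| OSR10          | [illegible] |
| OSR11          | P22         |
| OSR12          | B20         |
| OSR13          | [illegible] |
| OSR14          | B22         |
| OSR15          | [illegible] |
| OSR2           | [illegible] |
| OSR3           | [illegible] |
| OSR4           | [illegible] |
| OSR5           | A10         |
| OSR6           | [illegible] |
| OSR7           | [illegible] |
| OSR8           | [illegible] |
| OSR9           | [illegible] |
+----------------+-------------+
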
ASYNCHRONOUS DELAY REPORT
Thu Oct 07 20:31:51 1999

File: bfl.dly

The 20 Worst Net Delays are:

+----------------+------------------------------+
| Max Delay (ns) | Netname                      |
+----------------+------------------------------+
| 233.706        | CP1/VLX_PROCESS_OSIG_114 5   |
| 226.325        | CP1/VLX_PROCESS_OSIG_114 12  |
| 208.967        | CP1/VLX_PROCESS_OSIG_114 3   |
| 204.696        | CP1/VLX_PROCESS_OSIG_114 0   |
| 203.288        | CP1/VLX_PROCESS_OSIG_114 4   |
| 200.997        | CP1/SGEN_NODE_390            |
| 200.639        | CP1/SGEN_NODE_387            |
| 188.841        | CP1/VLX_PROCESS_OSIG_114 11  |
| 186.468        | CP1/SGEN_NODE_386            |
| 177.504        | CP1/VLX_PROCESS_OSIG_114 9   |
| 170.385        | CP1/VLX_PROCESS_OSIG_114 20  |
| 169.060        | CP1/VLX_PROCESS_OSIG_114 10  |
| 160.459        | CP4/SGEN_NODE_377            |
| 159.968        | CP1/SGEN_NODE_378            |
| 142.851        | CP1/VLX_PROCESS_OSIG_114 2   |
| 139.017        | CP4/VLX_PROCESS_OSIG_114 0   |
| 138.722        | CP4/SGEN_NODE_378            |
| 138.543        | CP4/VLX_PROCESS_OSIG_114 11  |
| 134.467        | CP1/SGEN_NODE_382            |
| 127.850        | CP4/VLX_PROCESS_OSIG_114 12  |
+----------------+------------------------------+

POST LAYOUT TIMING REPORT

Xilinx TRACE, Version M1.4.12

WARNING:bastw:170 - No timing constraints found, doing default enumeration.

WARNING:xiltw:1 - The current connection evaluation limit of 1000 caused the
truncation of timing analysis for paths through 26.29% of your constrained
connections, which may limit the accuracy of this analysis. You may specify
a larger limit with the XILINX_PATHLIMIT environment variable to increase the
accuracy of this analysis.

Timing constraint: Default period analysis

12983036 items analyzed, 0 timing errors detected.
Maximum delay is 350.290ns.

Timing constraint: Default net enumeration

4594 items analyzed, 0 timing errors detected.
Maximum net delay is 233.706ns.

All constraints were met.

Timing summary:

Timing errors: 0 Score: 0

Constraints cover 12983036 paths, 4594 nets, and 10682 connections (100.0%
coverage)

Design statistics:
Maximum combinational path delay: 350.290ns
Maximum net delay: 233.706ns

Analysis completed Fri Oct 08 11:23:09 1999
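
Because the mapped design contains no flip-flops, the reported maximum delay is a purely combinational delay through the butterfly. A rough upper bound on the rate at which new operands could be applied follows from its reciprocal. The C sketch below shows that conversion; the function `max_rate_mhz` is hypothetical, and it assumes the reported path delay is the only limiting factor.

```c
#include <assert.h>

/* Highest operand rate, in MHz, that a combinational delay given in ns allows.
 * Hypothetical helper; assumes the reported path delay is the only limit. */
double max_rate_mhz(double delay_ns)
{
    assert(delay_ns > 0.0);
    return 1000.0 / delay_ns;            /* a period of 1 ns corresponds to 1000 MHz */
}
```

For the post-layout figure of 350.290 ns this gives about 2.85 MHz, compared with about 12.8 MHz for the 78.000 ns logic-level estimate. Routing therefore lengthens the worst-case path by a factor of roughly 4.5.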

CHAPTER-7
CONCLUSION AND FUTURE SCOPE

A fast Fourier transform processor has been successfully coded at a high level
of design abstraction using VHDL. The design was simulated exhaustively at the
VHDL level using Viewlogic's Speedwave simulator. It was then synthesized with
Viewlogic's Aurora synthesis tool using the Xilinx FPGA library. The gate-level
schematics generated by the synthesis tool were verified with the Viewlogic
Viewsim gate-level simulator. The logically verified gate-level netlist was then
implemented in the Xilinx XC4052XL device. The worst-case static timing
information generated by the implementation tools indicates that the post-layout
delay of the butterfly is around 350.29 ns. The designed FFT processor calculates
the 8-point FFT, but the code can easily be modified for higher-point FFTs. The
multiplier used is an array multiplier, and the code is written so that it can be
reused for higher-order multipliers simply by changing the generic parameter
'SIZE'. The delay is expected to be drastically reduced if the same VHDL code is
implemented in the form of an ASIC.
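
The idea of a multiplier whose width is set by a single parameter can be illustrated in software. The following C sketch forms a two's complement product row by row, the way an array multiplier sums partial products, with the operand width passed as `size` in the role of the generic 'SIZE'. It is an assumption-laden analogue, not the dissertation's VHDL: the names `sign_extend` and `array_multiply` are hypothetical, and the sign handling shown (subtracting the last row) may differ from the correction circuit used in the actual design.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low `size` bits of v as a two's complement number. */
static int64_t sign_extend(uint32_t v, unsigned size)
{
    uint32_t mask = (uint32_t)((1ull << size) - 1u);
    int64_t x = (int64_t)(v & mask);
    if (x >> (size - 1))                 /* sign bit set */
        x -= (int64_t)1 << size;
    return x;
}

/* Row-by-row product of two `size`-bit two's complement operands.
 * Hypothetical software analogue of an array multiplier. */
int64_t array_multiply(uint32_t a, uint32_t b, unsigned size)
{
    assert(size >= 2 && size <= 31);
    int64_t multiplicand = sign_extend(a, size);
    int64_t acc = 0;
    for (unsigned i = 0; i < size; i++) {
        if ((b >> i) & 1u) {
            /* partial product: multiplicand weighted by 2^i */
            int64_t row = multiplicand * ((int64_t)1 << i);
            if (i == size - 1)
                acc -= row;              /* MSB of b has weight -2^(size-1) */
            else
                acc += row;
        }
    }
    return acc;
}
```

Changing only the `size` argument moves the same routine from, say, 8-bit to 16-bit operands; for example, `array_multiply(0x80u, 0x7Fu, 8)` evaluates -128 x 127.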
The concept of intellectual property (IP) has become extremely popular in the
design world. Most complex designs are now built by combining pre-designed
blocks called IPs. The objective of this dissertation is to create an IP for the
FFT. This IP is available in the form of verified VHDL code. Anybody can use this
IP in his/her complex ASIC design.

REFERENCES

1. J.W. Cooley and J.W. Tukey, "An Algorithms for the machine calculation of
Complex Fourier Series, ''Math Computation", Vol 19, 1965, pp. 297-301.
2. C. Runge, Z. Math. Physik, Vol. 48, 1903, p.443; also Vol. 53, 1905, p.l 17.
3. G. C. Danilson and C. Lanczos, "Some Improvements in Practical Fourier
Analysis and Their Application to X-Ray Scattering from Liquids," J.
Franklin Inst., Vol. 233, pp.365-380, 435-452.
4. J. W. Cooley, P. A. W. Lewis, and P. D. Welch, "Historical Notes on the Fast
Fourier Transform," IEEE Trans. Audio Electroacoust., Vol. AU-15, June
1967,pp.76-79.
5. W. T. Cochran et al., "What is the Fast Fourier Transform ?" IEEE Tram.
Audio Electroacoust., Vol. AU-15, June 1967, pp.45-55.
6. R. C. Singleton, "A Method for Computing the Fast Fourier Transform with
Auxiliary Memory and Limited High-Speed Storage," IEEE Trans. Audio
Electroacoust., Vol. AU-15, June 1967, pp.91-97.
7. B. Gold and C. M. Rader, Digital Processing of Signals, Mc Graw-Hill Book
Company, New York, 1969.
8. A. V. Oppenhiem and R. W. Schafer, Digital Signal Processing, Prentice Hall,
Englewood Cliffs, N. J., 1975
9. L. R. Rabiner and G. Gold, Theory and Application of Digital Signal
Processing, Prentice Hall, Englewood Cliffs, N. J., 1975
10. John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing,
Principles, Algorithms and Applications, Prentice Hall, Englewood Cliffs,
N. J., 1996
11. K. C. Chang, Digital Design and Modeling with VHDL and Synthesis, IEEE
Computer Society Press, Los Alamitos, California, 1997
12. Zainalabedin Navabi, VHDL Analysis and Modeling of Digital Systems,
McGraw-Hill Inc, New York, 1993
13. Jeffory, I. Hilbert, "ASIC Technology" pp. 217-219, Academic Press Inc.
1991.
14. Stephen Trimberger, "Manager, Advanced Development, Xilinx hic", "Field
Programmable Gate An-ays", Guest Editor's Introduction, pp. 3-5, IEEE, Sept.
1992.
15. Xilinx, "XACTUser Guide", pp. 1-1 to 1-3, April 1993.
16. Xilinx, "Programmable Logic Data Book", 1994.

APPENDIX

The program shown below, written in C, calculates the weight factors for a
1024-point FFT.
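/**********************************************************************/

/* EVALUATION OF WEIGHT FACTOR */

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#define PI 3.141592654
#define TA 32768.0

int main(void)
{
    FILE *fp;
    /* column separators; sized by the compiler so each keeps its '\0' */
    static char name[] = "|";
    static char mame[] = " |";
    static char lame[] = " |";
    float x, rw, iw, n, r, rmax, xd, rww, iww;

    fp = fopen("co", "w");
    if (fp == NULL)
        return EXIT_FAILURE;
    n = 1024.0;
    r = 0.0;
    /* table heading; ruled lines follow the widths of the data row below */
    fprintf(fp, " RESULTS \n");
    fprintf(fp, " ------------------------------------------------------------------------\n");
    fprintf(fp, " |          |               |               |             |             |\n");
    fprintf(fp, " | WT.FAC   |      RWF      |      IWF      |     RWW     |     IWW     |\n");
    fprintf(fp, " |          |               |               |             |             |\n");
    do
    {
        x = (PI * 2.0) * (r / n);    /* angle of weight factor r, in radians */
        xd = (180 / PI) * x;         /* same angle in degrees (not printed) */
        rw = cos(x);                 /* real part */
        rww = rw * TA;               /* real part scaled by 2^15 */
        iw = -sin(x);                /* imaginary part */
        iww = iw * TA;               /* imaginary part scaled by 2^15 */
        rmax = n / 2 - 1;            /* last index needed: 511 for n = 1024 */
        fprintf(fp, " |          |               |               |             |             |\n");
        fprintf(fp, " %s %-8.1f %s %12.7f %s %12.7f %s %10.2f %s %10.2f %s\n",
                name, r, name, rw, mame, iw, mame, rww, lame, iww, lame);
        fprintf(fp, " |          |               |               |             |             |\n");
        r = r + 1;
    }
    while (r <= rmax);
    fprintf(fp, "\n");
    fclose(fp);
    return 0;
}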

This program was compiled and run on a UNIX system. Each weight factor is
multiplied by 2^15 = 32768.0 to facilitate easy searching. The real and imaginary
parts of the scaled weight factor range from 32768.00 to -32767.38 and from 0.0 to
-32768.00, respectively. A portion of the output is shown below.
 RESULTS
 ------------------------------------------------------------------------
 |          |               |               |             |             |
 | WT.FAC   |      RWF      |      IWF      |     RWW     |     IWW     |
 |          |               |               |             |             |
 |          |               |               |             |             |
 | 254.0    |    0.0122715  |   -0.9999247  |     402.11  |  -32765.53  |
 300
 |          |               |               |             |             |
 | 255.0    |    0.0061359  |   -0.9999812  |     201.06  |  -32767.38  |
 |          |               |               |             |             |
 | 256.0    |    0.0000000  |   -1.0000000  |       0.00  |  -32768.00  |
 |          |               |               |             |             |
 | 257.0    |   -0.0061358  |   -0.9999812  |    -201.06  |  -32767.38  |
 |          |               |               |             |             |

The value 300 is not a weight factor; it is converted to binary,
(0000000100101100)2 = 012C (hex), and represented as 1.0 (decimal).
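
Scaling by 2^15 turns each weight factor into a value that can be stored in a 16-bit two's complement word, except for the real part at index 0, which scales to 32768.00 and is one more than the largest such value (32767). The C sketch below shows one way to round and store the scaled values. The function `to_q15` is hypothetical, and clamping +1.0 to 32767 is an assumption made for this illustration rather than the scheme used in the dissertation.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a weight factor in [-1.0, 1.0] by 2^15 and round to a 16-bit word.
 * Saturating +1.0 to 32767 is an assumption made here, not the source's rule. */
int16_t to_q15(double w)
{
    assert(w >= -1.0 && w <= 1.0);
    double scaled = w * 32768.0;
    /* round to nearest by adding half a unit before truncating */
    long v = (long)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
    if (v > 32767)
        v = 32767;                       /* +32768 does not fit in 16 bits */
    return (int16_t)v;
}
```

For the row with index 255 in the table above, `to_q15(0.0061359)` yields 201 and `to_q15(-0.9999812)` yields -32767, matching the RWW and IWW columns after rounding.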
