
ABSTRACT

Computation-intensive applications such as DSP, image processing, floating-point processors
and communication technologies today require efficient binary multiplication, which is usually
the most power- and time-consuming block. This paper proposes an efficient design for
unsigned binary multiplication that reduces delay. A 32×32-bit multiplier has been designed
based on the Vedic Karatsuba algorithm using reversible logic. The designs have been
coded in Verilog and synthesized in Xilinx Vivado.
Chapter 1
INTRODUCTION

Multipliers play an important role in today’s digital signal processing and various other
applications. With advances in technology, many researchers have tried and are trying to design
multipliers that offer one or more of the following design targets: high speed, low power
consumption, and regularity of layout (and hence less area), or a combination of these in one
multiplier, making them suitable for various high-speed, low-power and compact VLSI
implementations.
The common multiplication method is the “add and shift” algorithm. In parallel multipliers, the
number of partial products to be added is the main parameter that determines the performance of
the multiplier. Among the methods that reduce the number of partial products, the Vedic multiplier
is one of the most popular. In this chapter we introduce the multiplication algorithms and
architectures and compare them in terms of speed, area, power and combinations of these metrics.
The basic multiplication method is explained below.

Binary multiplication proceeds the same way as decimal digit multiplication, as shown in the
example below: partial products are generated using AND gates, and adders (half adders and
full adders) are used to add up the columns.
An example of the 4-bit multiplication method is shown below:
Although the method is simple, as can be seen from this example, the addition is done serially
as well as in parallel. To improve delay and area, the carry-ripple adders (CRAs) are replaced with
carry-save adders, in which every carry and sum signal is passed to the adders of the next stage. The
final product is then obtained in a final stage by a fast adder (usually a carry-ripple adder). In array
multiplication we need to add as many partial products as there are multiplier bits. This
arrangement is shown in the figure below, followed by a Verilog sketch of the underlying
shift-and-add scheme.
Fig: Array multiplier
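As an illustration of the basic add-and-shift scheme, the minimal Verilog sketch below (module and signal names are ours, not from the design) forms each partial product by ANDing the multiplicand with one multiplier bit at that bit's weight and then sums the columns:

module shift_add_mult4 (
  input  [3:0] a, b,   // multiplicand and multiplier
  output [7:0] p       // product
);
  // Each partial product is the multiplicand gated by one multiplier
  // bit, placed at that bit's weight; the columns are then summed.
  assign p = (b[0] ? {4'b0000, a}      : 8'b0)
           + (b[1] ? {3'b000, a, 1'b0} : 8'b0)
           + (b[2] ? {2'b00, a, 2'b00} : 8'b0)
           + (b[3] ? {1'b0, a, 3'b000} : 8'b0);
endmodule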
In applications like multimedia signal processing and data mining which can tolerate error, exact
computing units are not always necessary. They can be replaced with their approximate
counterparts. Research on approximate computing for error tolerant applications is on the rise.
Adders and multipliers form the key components in these applications. In, approximate full
adders are proposed at transistor level and they are utilized in digital signal processing
applications.
IMPLEMENTATION OF WALLACE MULTIPLIER
The Wallace tree has three steps:
1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other,
yielding n² results. Depending on the position of the multiplied bits, the wires carry different
weights; for example, a wire at bit position 5 carries weight 2^5 = 32.
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires into two numbers, and add them with a conventional adder.
The second phase works as follows: as long as there are three or more wires with the same weight,
add a following layer (a Verilog sketch of this 3:2 reduction step appears after the list):
 Take any three wires with the same weight and input them into a full adder. The result will be
an output wire of the same weight and an output wire with a higher weight for each three input
wires.
 If there are two wires of the same weight left, input them into a half adder.
 If there is just one wire left, connect it to the next layer.
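The 3:2 reduction that each Wallace layer performs can be sketched in Verilog as a carry-save adder row (an illustrative sketch; the module and port names are ours). Three same-weight rows go in; a sum row of the same weight and a carry row of the next higher weight come out:

module csa_row #(parameter W = 8) (
  input  [W-1:0] x, y, z,   // three rows of equal weight
  output [W-1:0] sum,       // full-adder sums, same weight
  output [W-1:0] carry      // full-adder carries, one weight higher
);
  // One independent full adder per column; no carry ripples sideways.
  assign sum   = x ^ y ^ z;
  assign carry = (x & y) | (y & z) | (x & z);
endmodule

Each layer of the tree instantiates such rows (shifting the carry row left by one before reuse) until only two rows remain for the final conventional addition.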
a) Steps involved in the Wallace tree multiplier algorithm:

 Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding N²
results. Depending on the position of the multiplied bits, the wires carry different weights.
 Reduce the number of partial products to two by layers of full adders.
 Group the wires into two numbers, and add them with a conventional adder.
Fig-2 Product terms generated by a collection of AND gates

b) Wallace Tree Multiplier Using Ripple Carry Adder

A ripple-carry adder is used when several additions must be performed with the carry-ins and
carry-outs chained together; thus multiple full adders are used in a ripple-carry adder. It is
possible to create a logic circuit using several full adders to add multi-bit numbers. Each full
adder takes a Cin, which is the Cout of the previous adder. This kind of adder is called a
ripple-carry adder, since each carry bit "ripples" to the next full adder. A sketch of such an adder
follows; the proposed architecture of the Wallace multiplier using the RCA is shown in the figure.
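A minimal Verilog sketch of such a ripple-carry adder (assuming a 4-bit width; module names are ours, chosen for illustration):

module full_adder (
  input  a, b, cin,
  output s, cout
);
  assign s    = a ^ b ^ cin;                     // sum bit
  assign cout = (a & b) | (b & cin) | (a & cin); // carry to the next stage
endmodule

module rca4 (
  input  [3:0] a, b,
  input        cin,
  output [3:0] s,
  output       cout
);
  wire c1, c2, c3;  // carries "rippling" between stages
  full_adder fa0 (a[0], b[0], cin, s[0], c1);
  full_adder fa1 (a[1], b[1], c1,  s[1], c2);
  full_adder fa2 (a[2], b[2], c2,  s[2], c3);
  full_adder fa3 (a[3], b[3], c3,  s[3], cout);
endmodule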
Take any three values with the same weight and give them as inputs to a full adder; the result is
an output wire of the same weight plus a carry wire of the next higher weight.
 The partial products obtained after multiplication are taken at the first stage. The data are taken
three wires at a time and added using adders, and the carry of each stage is added with the next
two data in the same stage.
 The partial products are reduced to two rows by the same procedure, applied in layers of full adders.
At the final stage, the ripple-carry addition is performed and the product
terms p1 to p8 are obtained.
Fig: 4×4 Wallace multiplier

3. Wallace tree multiplier operation

• A fast way to multiply two binary integers.
• Any multiplier has three stages:
Stage 1: partial products.
Stage 2: partial product addition.
Stage 3: final addition.
Stage 1: partial products
• Works exactly like “long hand” multiplication, except that the numbers are binary integers.
Each product (each blue square in the figure) is the result of a simple AND gate, and all products
are generated simultaneously, so this stage is fast (it takes only one AND-gate delay). The trick is
in adding up the columns.
Fig 5(a): Stage 1, partial products
Stage 2: partial product addition, step 1
To add up the columns, add three rows at a time. The result for each set of three rows is a set of
two rows. Each resulting set of two rows has a row for the sums and a row for the carry-outs. Odd
rows are left alone.
red: full adder output.
yellow: half adder output.
green: left alone.

Fig 5(b): partial product addition, step 1


Stage 2: partial product addition, step 2
• Repeat the same process. This time, there are two sets of three rows, resulting in two sets of two
rows. Gray boxes indicate that sum bits have been moved down to the carry-out row.

Fig 5(c): partial product addition, step 2


Stage 2: partial product addition, step 4
• Repeat the process one last time. The remaining three rows become two rows. In this example,
stage 2 has 4 steps, and hence 4 full-adder delays. The five LSBs have already been calculated.

Fig 5(d): partial product addition, step 4


Stage 3: final addition
• The final result is calculated by adding the final two rows. In this example, the 5 LSBs do not
need to be added. The saving from already having 5 bits offsets the delay of stage 2. The
result is that Wallace tree multiplication takes about the same amount of time as a 2N-bit
ripple-carry adder.
Fig 5(e): final addition
Booth multiplier

Booth's multiplication algorithm is a multiplication algorithm that multiplies two signed
binary numbers in two's complement notation. The algorithm was invented by Andrew Donald
Booth. Booth used desk calculators that were faster at shifting than adding and created the
algorithm to increase their speed. Booth's algorithm is of interest in the study of computer
architecture.
Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed
two's complement representation, including an implicit bit below the least significant bit, y(-1) = 0.
For each bit y(i), for i running from 0 to N-1, the bits y(i) and y(i-1) are considered. Where these two
bits are equal, the product accumulator P remains unchanged. Where y(i) = 0 and y(i-1) = 1, the
multiplicand times 2^i is added to P; and where y(i) = 1 and y(i-1) = 0, the multiplicand times 2^i is
subtracted from P. The final value of P is the signed product.
The representations of the multiplicand and product are not specified; typically, these are
both also in two's complement, like the multiplier, but any number system that supports addition
and subtraction will work as well. As stated here, the order of the steps is not determined.
Typically, the algorithm proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i is
then replaced by incremental shifting of the P accumulator to the right between steps; low bits can
be shifted out, and subsequent additions and subtractions can then be done just on the highest N
bits of P. There are many variations and optimizations of these details.
The algorithm is often described as converting strings of 1's in the multiplier to a high-
order +1 and a low-order –1 at the ends of the string. When a string runs through the MSB, there
is no high-order +1, and the net effect is interpretation as a negative of the appropriate value.
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned
binary addition) one of two predetermined values A and S to a product P, then performing a
rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, respectively, and
let x and y be the number of bits in m and r. The procedure is:
1. Determine the values of A and S and the initial value of P. All of these numbers should have a
length equal to (x + y + 1).
 A: Fill the most significant (leftmost) x bits with the value of m. Fill the remaining (y + 1) bits
with zeros.
 S: Fill the most significant x bits with the value of (−m) in two's complement notation. Fill the
remaining (y + 1) bits with zeros.
 P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the
least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P. If they are 01, find the value of
P + A, ignoring any overflow. If they are 10, find the value of P + S, ignoring any overflow. If they
are 00 or 11, do nothing and use P unchanged in the next step.
3. Arithmetically shift the value obtained in step 2 a single place to the right. Let P now equal
this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
 Example
Find 3 × (−4), with m = 3 and r = −4, and x = 4 and y = 4:
m = 0011, -m = 1101, r = 1100
A = 0011 0000 0
S = 1101 0000 0
P = 0000 1100 0
Perform the loop four times:
P = 0000 1100 0. The last two bits are 00.
P = 0000 0110 0. Arithmetic right shift.
P = 0000 0110 0. The last two bits are 00.
P = 0000 0011 0. Arithmetic right shift.
P = 0000 0011 0. The last two bits are 10.
P = 1101 0011 0. P = P + S.
P = 1110 1001 1. Arithmetic right shift.
P = 1110 1001 1. The last two bits are 11.
P = 1111 0100 1. Arithmetic right shift.
The product is 1111 0100, which is −12.
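The A/S/P procedure can be written as a short behavioral Verilog loop. This is a hedged sketch for the x = y = 4 case worked above (module and signal names are ours); note that -m overflows for m = -8 in 4 bits, an inherent limit of this example width:

module booth4 (
  input  signed [3:0] m, r,       // multiplicand and multiplier
  output reg signed [7:0] prod
);
  reg [8:0] A, S, P;              // x + y + 1 = 9 bits each
  integer i;
  always @* begin
    A = {m, 5'b0};                // m in the top 4 bits, zeros below
    S = {-m, 5'b0};               // (-m) in two's complement, zeros below
    P = {4'b0, r, 1'b0};          // zeros, r, and the implicit y(-1) = 0
    for (i = 0; i < 4; i = i + 1) begin
      case (P[1:0])
        2'b01:   P = P + A;       // end of a run of 1s: add m
        2'b10:   P = P + S;       // start of a run of 1s: subtract m
        default: ;                // 00 or 11: leave P unchanged
      endcase
      P = {P[8], P[8:1]};         // arithmetic shift right by one place
    end
    prod = P[8:1];                // drop the extra least significant bit
  end
endmodule

With m = 3 and r = -4 this loop reproduces the trace above and yields prod = 1111 0100, i.e. -12.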
Multipliers form the basic building block of many ASIPs as well as general-purpose processors.
Hence the performance of such systems depends on the processing speed, area and power
dissipation of the multipliers. Although conventional algorithms like Wallace tree and Dadda [1]-[4]
are time-tested, with advancement in technology and frequent change in VLSI architectures the
need for faster multiplication algorithms and architectures has emerged. The Vedic Multiplier
Algorithm (VMA) [5], proposed for decimal numbers, uses a novel method to reduce
the number of partial product terms. This paper proposes a multiplier based on the Vedic Karatsuba
algorithm, which is modified with the “adaptive” concept as well as by using the proposed
high-speed adders, and adapted to the binary system. The performance is validated and
benchmarked against conventional 16×16-bit Wallace and Dadda multipliers. The analyses are done
using 3 different adder designs, with the Square-Root Carry-Select Adder (SRCSA) giving the best
results in terms of delay. All designs are implemented on an ASIC platform and, based on the
analyses, the best design is proposed.

Chapter 2
LITERATURE REVIEW
Vijay Kumar Reddy, “Modified High Speed Vedic Multiplier Design and Implementation”: The
proposed research work specifies a modified version of the binary Vedic multiplier using the Vedic
sutras of ancient Vedic mathematics. It modifies a previously implemented Vedic multiplier. The
modified binary Vedic multiplier shows improvement in terms of time delay and device
utilization. The proposed technique was designed and implemented in Verilog HDL. For HDL
simulation the ModelSim tool is used, and for circuit synthesis Xilinx is used. The simulation has
been done for 4-bit, 8-bit and 16-bit multiplication operations; simulation results are shown only
for the 16-bit binary Vedic multiplier. This modified multiplication technique can be extended to
larger sizes. The outcomes of this multiplication technique are compared with existing Vedic
multiplier techniques.

A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and Analysis of Approximate
Compressors for Multiplication”: Inexact (or approximate) computing is an attractive paradigm
for digital processing at nanometric scales, and is particularly interesting for computer arithmetic
designs. This paper deals with the analysis and design of two new approximate 4-2 compressors
for utilization in a multiplier. These designs rely on different features of compression, such that
imprecision in computation (as measured by the error rate and the so-called normalized error
distance) can be traded off against circuit-based figures of merit of a design (number of
transistors, delay and power consumption). Four different schemes for utilizing the proposed
approximate compressors are proposed and analyzed for a Dadda multiplier. Extensive simulation
results are provided and an application of the multipliers to image processing is presented. The
results show that the proposed designs accomplish significant reductions in power dissipation,
delay and transistor count compared to an exact design; moreover, two of the proposed multiplier
designs provide excellent capabilities for image multiplication with respect to average normalized
error distance and peak signal-to-noise ratio (more than 50 dB for the considered image examples).
C. Liu, J. Han, and F. Lombardi, “A Low-Power, High-Performance Multiplier with
Configurable Partial Error Recovery”, Proc. of IEEE Design, Automation & Test in Europe
Conference & Exhibition (DATE): Approximate circuits have been considered for error-tolerant
applications that can tolerate some loss of accuracy with improved performance and energy
efficiency. Multipliers are key arithmetic circuits in many such applications such as digital signal
processing (DSP). In this paper, a novel multiplier with a lower power consumption and a shorter
critical path than traditional multipliers is proposed for high-performance DSP applications. This
multiplier leverages a newly-designed approximate adder that limits its carry propagation to the
nearest neighbors for fast partial product accumulation. Different levels of accuracy can be
achieved through a configurable error recovery by using different numbers of most significant
bits (MSBs) for error reduction. The multiplier has a low mean error distance, i.e., most of the
errors are not significant in magnitude. Compared to the Wallace multiplier, a 16-bit multiplier
implemented in a 28nm CMOS process shows a reduction in delay and power of 20% and up to
69%, respectively. It is shown that by utilizing an appropriate error recovery, the proposed
multiplier achieves similar processing accuracy as traditional exact multipliers but with
significant improvements in power and performance.
G. Zervakis, et al., “Design-Efficient Approximate Multiplication Circuits Through Partial
Product Perforation”: Approximate computing has received significant attention as a promising
strategy to decrease the power consumption of inherently error-tolerant applications. In this paper,
we focus on hardware-level approximation by introducing the partial product perforation
technique for designing approximate multiplication circuits. We prove in a mathematically
rigorous manner that in partial product perforation, the imposed errors are bounded and
predictable, depending only on the input distribution. Through extensive experimental
evaluation, we apply the partial product perforation method on different multiplier architectures
and expose the optimal architecture-perforation configuration pairs for different error constraints.
We show that, compared with the respective exact design, the partial product perforation delivers
reductions of up to 50% in power consumption, 45% in area, and 35% in critical delay. In
addition, the product perforation method is compared with the state-of-the-art approximation
techniques, i.e., truncation, voltage overscaling, and logic approximation, showing that it
outperforms them in terms of power dissipation and error.
T. Yang, T. Ukezono, and T. Sato, “A Low-Power High-Speed Accuracy-Controllable Multiplier
Design”: Multiplication is a key fundamental function for many error-tolerant applications.
Approximate multiplication is considered to be an efficient technique for trading off energy
against performance and accuracy. This paper proposes an accuracy-controllable multiplier
whose final product is generated by a carry-maskable adder. The proposed scheme can
dynamically select the length of the carry propagation to satisfy the accuracy requirements
flexibly. The partial product tree of the multiplier is approximated by the proposed tree
compressor. An 8×8 multiplier design is implemented by employing the carry-maskable adder
and the compressor. Compared with a conventional Wallace tree multiplier, the proposed
multiplier reduced power consumption by between 47.3% and 56.2% and critical path delay by
between 29.9% and 60.5%, depending on the required accuracy. Its silicon area was also 44.6%
smaller. In addition, results from an image processing application demonstrate that the quality of
the processed images can be controlled by the proposed multiplier design.
A. Cilardo, et al., “High-Speed Speculative Multipliers Based on Speculative Carry-Save
Tree”: Sacrificing exact calculations to improve digital circuit performance is at the foundation of
approximate computing. In this paper, an approximate multiply-and-accumulate (MAC) unit is
introduced. The MAC partial product terms are compressed by using simple OR gates as
approximate counters; moreover, to further save energy, selected columns of the partial product
terms are not formed. A compensation term is introduced in the proposed MAC, to reduce the
overall approximation error. A MAC unit, specialized to
perform 2D convolution, is designed following the proposed approach and implemented in
TSMC 40 nm technology in four different configurations. The proposed circuits achieve power
savings of more than 60% compared to a standard, exact MAC, with tolerable image-quality
degradation.
J. Liang, et al., “New Metrics for the Reliability of Approximate and Probabilistic Adders”:
Approximate/inexact computing has become an attractive approach for designing high-performance
and low-power arithmetic circuits. Floating-point (FLP) arithmetic is required in
many applications, such as digital signal processing, image processing and machine learning.
Approximate FLP multipliers with variable accuracy are proposed in this paper; the accuracy and
the circuit requirements of these designs are analyzed and assessed according to different
metrics. It is shown that the proposed approximate FLP multiplier designs further reduce delay,
area, power consumption and power-delay product (PDP) while incurring about half of the
normalized mean error distance (NMED) compared with the previous designs. The proposed
IFLPM24–15 is the most efficient design when considering both PDP and NMED. Case studies
with three error-tolerant applications show the validity of the proposed approximate designs.
Chapter 3
INTRODUCTION OF VLSI
Very-large-scale integration (VLSI) is the process of creating integrated circuits by
combining thousands of transistor-based circuits into a single chip. VLSI began in the 1970s
when complex semiconductor and communication technologies were being developed. The
microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have
increased in complexity into the hundreds of millions of transistors.
3.1 Overview
The first semiconductor chips held one transistor each. Subsequent advances added more
and more transistors, and, as a consequence, more individual functions or systems were
integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten
diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic
gates on a single device. Now known retrospectively as "small-scale integration" (SSI),
improvements in technique led to devices with hundreds of logic gates, known as large-scale
integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved
far past this mark and today's microprocessors have many millions of gates and hundreds of
millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But the huge
number of gates and transistors available on common devices has rendered such fine distinctions
moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use.
Even VLSI is now somewhat quaint, given the common assumption that all microprocessors are
VLSI or better.
As of early 2008, billion-transistor processors are commercially available, an
example of which is Intel's Montecito Itanium chip. This is expected to become more
commonplace as semiconductor fabrication moves from the current generation of 65 nm
processes to the next 45 nm generations (while experiencing new challenges such as increased
variation across process corners). Another notable example is NVIDIA’s GTX 280 series GPU.
This processor is unique in that its 1.4 billion transistors,
capable of a teraflop of performance, are almost entirely dedicated to logic (Itanium's transistor
count is largely due to its 24 MB L3 cache). Current designs, as opposed to the earliest devices,
use extensive design automation and automated logic synthesis to lay out the transistors,
enabling higher levels of complexity in the resulting logic functionality. Certain high-
performance logic blocks like the SRAM cell, however, are still designed by hand to ensure the
highest efficiency (sometimes by bending or breaking established design rules to obtain the last
bit of performance by trading stability).
3.2 What is VLSI?
VLSI stands for "Very Large Scale Integration". This is the field of packing
more and more logic devices into smaller and smaller areas.
VLSI
1. Simply put, an integrated circuit is many transistors on one chip.
2. Design/manufacture of extremely small, complex circuitry using modified semiconductor
material.
3. An integrated circuit (IC) may contain millions of transistors, each only micrometres (or less) in size.
4. Applications are wide-ranging: most electronic logic devices.
3.3 History of Scale Integration
 Late 1940s: transistor invented at Bell Labs
 Late 1950s: first IC (a J-K flip-flop, by Jack Kilby at TI)
 Early 1960s: Small Scale Integration (SSI), tens of transistors on a chip
 Late 1960s: Medium Scale Integration (MSI), hundreds of transistors on a chip
 Early 1970s: Large Scale Integration (LSI), thousands of transistors on a chip
 Early 1980s: VLSI, tens of thousands of transistors on a chip (later hundreds of thousands,
now millions)
 "Ultra LSI" is sometimes used for millions of transistors
By device count:
 SSI: Small-Scale Integration (up to 10^2)
 MSI: Medium-Scale Integration (10^2 to 10^3)
 LSI: Large-Scale Integration (10^3 to 10^5)
 VLSI: Very Large-Scale Integration (10^5 to 10^7)
 ULSI: Ultra Large-Scale Integration (10^7 or more)
3.4 Advantages of ICs over discrete components
While we will concentrate on integrated circuits, the properties of
integrated circuits (what we can and cannot efficiently put in an integrated circuit) largely
determine the architecture of the entire system. Integrated circuits improve system characteristics
in several critical ways. ICs have three key advantages over digital circuits built from discrete
components:
 Size. Integrated circuits are much smaller; both transistors and wires are shrunk to micrometre
sizes, compared to the millimetre or centimetre scales of discrete components. Small size leads to
advantages in speed and power consumption, since smaller components have smaller parasitic
resistances, capacitances, and inductances.
 Speed. Signals can be switched between logic 0 and logic 1 much more quickly within a chip
than between chips. Communication within a chip can occur hundreds of times faster than
communication between chips on a printed circuit board. The high speed of circuits on-chip is
due to their small size; smaller components and wires have smaller parasitic capacitances to slow
down the signal.
 Power consumption. Logic operations within a chip also take much less power. Once again,
lower power consumption is largely due to the small size of circuits on the chip; smaller parasitic
capacitances and resistances require less power to drive them.
3.5 VLSI and systems
These advantages of integrated circuits translate into advantages at the system level:
 Smaller physical size. Smallness is often an advantage in itself; consider portable televisions or
handheld cellular telephones.
 Lower power consumption. Replacing a handful of standard parts with a single chip reduces total
power consumption. Reducing power consumption has a ripple effect on the rest of the system: a
smaller, cheaper power supply can be used; since less power consumption means less heat, a fan
may no longer be necessary; and a simpler cabinet with less electromagnetic shielding may be
feasible, too.
 Reduced cost. Reducing the number of components, the power supply requirements, cabinet
costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that
the cost of a system built from custom ICs can be less, even though the individual ICs cost more
than the standard parts they replace.
Understanding why integrated circuit technology has such profound influence on the design of
digital systems requires understanding both the technology of IC manufacturing and the
economics of ICs and digital systems.
3.6 Applications
 Electronic systems in cars.
 Digital electronics controlling VCRs.
 Transaction processing systems, ATMs.
 Personal computers and workstations.
 Medical electronic systems.
3.7 Applications of VLSI
Electronic systems now perform a wide variety of tasks in daily life. Electronic
systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or
by other means; electronics are usually smaller, more flexible, and easier to service. In other
cases electronic systems have created totally new applications. Electronic systems perform a
variety of tasks, some of them visible, some more hidden:
 Personal entertainment systems such as portable MP3 players and DVD players perform
sophisticated algorithms with remarkably little energy.
 Electronic systems in cars operate stereo systems and displays; they also control fuel injection
systems, adjust suspensions to varying terrain, and perform the control functions required for
anti-lock braking (ABS) systems.
 Digital electronics compress and decompress video, even at high-definition data rates, on-the-fly
in consumer electronics.
 Low-cost terminals for Web browsing still require sophisticated electronics, despite their
dedicated function.
 Personal computers and workstations provide word-processing, financial analysis, and games.
Computers include both central processing units (CPUs) and special-purpose hardware for disk
access, faster screen display, etc.
 Medical electronic systems measure bodily functions and perform complex processing
algorithms to warn about unusual conditions. The availability of these complex systems, far from
overwhelming consumers, only creates demand for even more complex systems.
The growing sophistication of applications continually pushes the design and manufacturing of
integrated circuits and electronic systems to new levels of complexity.
And perhaps the most amazing characteristic of this collection of systems is its variety: as
systems become more complex, we build not a few general-purpose computers but an ever wider
range of special-purpose systems. Our ability to do so is a testament to our growing mastery of
both integrated circuit manufacturing and design, but the increasing demands of customers
continue to test the limits of design and manufacturing.
Chapter 4
VEDIC MULTIPLIER USING KARATSUBA ALGORITHM
Vedic mathematics:
Vedic Mathematics is one of the most ancient methodologies, used by the Aryans to
perform mathematical calculations. It consists of algorithms that can boil large
arithmetic operations down to simple mental calculations. This advantage stems from the fact
that the Vedic mathematics approach is entirely different and considered very close to the way a
human mind works. The efforts put in by Jagadguru Swami Sri Bharati Krishna Tirtha Maharaja to
introduce Vedic Mathematics to the commoners, as well as to streamline the Vedic algorithms into
16 categories or sutras, deserve to be acknowledged and appreciated. The Vedic algorithm is one
such multiplication algorithm, well known for its efficiency in reducing the calculations
involved.
With the advancement of VLSI technology, there is an ever-increasing demand
for portable and embedded Digital Signal Processing (DSP) systems. DSP is omnipresent in
almost every engineering discipline, and faster additions and multiplications are the order of the
day. Multiplication is one of the most basic and frequently used operations in a CPU: an
operation of scaling one number by another. Multiplication also forms the basis for
other complex operations such as convolution, the Discrete Fourier Transform, Fast Fourier
Transforms, etc. With the ever-increasing need for faster clock frequencies it becomes imperative
to have faster arithmetic units. Therefore, DSP engineers are constantly looking for new
algorithms and hardware to implement them. Vedic mathematics can be aptly employed here to
perform multiplication.
RECURSIVE KARATSUBA MULTIPLICATION
Recursive Karatsuba is based on applying the Karatsuba algorithm repeatedly at every
stage to improve speed when the bit size is large. The algorithm works by separating the
bits (N) into groups of half the number of bits (N/2) and then following the same Karatsuba
procedure on the segmented bits recursively. For a 16-bit multiplication, e.g., it breaks the
problem down to 8-bit multiplications, which are further divided into 4-bit and finally 2-bit
multiplications, the last stage at which normal multiplication is performed. At every stage we
have implemented adaptive Karatsuba for the third product term.
Standard Karatsuba Multiplier
Let X and Y be inputs of n bits. Assuming decomposition of X and Y into two equal parts, XH,
YH represent the higher-order bits and XL, YL the lower-order bits. Their product can be
computed as:

X·Y = XH·YH·2^n + (XH·YL + XL·YH)·2^(n/2) + XL·YL (1)

In the Karatsuba algorithm the middle term is rewritten as:

XH·YL + XL·YH = (XH + XL)(YH + YL) − XH·YH − XL·YL (2)

So 4 n/2-bit multiplications can be reduced to 3 n/2-bit multiplications: (XH + XL)(YH +
YL), XH·YH and XL·YL. Fig. 1 shows the standard Karatsuba multiplier at a stage where the
inputs are n bits.
The conventional multiplication algorithm has time complexity:
O(n^2) (3)
whereas the Karatsuba multiplication algorithm requires:
O(n^1.58) (4)
where n is the number of bits; the exponent 1.58 is log2(3), arising from the three recursive
half-size multiplications per stage. This shows analytically that the Karatsuba algorithm, with
complexity n^1.58, is faster than standard multiplication.
Fig. 1. Standard Karatsuba Multiplier for n-Bits
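One level of this decomposition can be sketched in Verilog as follows (a 16-bit illustration with our own module and signal names; in the fully recursive design the three 8-bit "*" operators would themselves be Karatsuba instances rather than behavioral multiplies):

module karatsuba16 (
  input  [15:0] x, y,
  output [31:0] p
);
  wire [7:0]  xh = x[15:8], xl = x[7:0];  // high and low halves
  wire [7:0]  yh = y[15:8], yl = y[7:0];
  wire [15:0] ph = xh * yh;               // XH*YH
  wire [15:0] pl = xl * yl;               // XL*YL
  wire [8:0]  sx = xh + xl;               // (n/2 + 1)-bit sums whose MSBs
  wire [8:0]  sy = yh + yl;               // are the carry-outs ZH, UH
  wire [17:0] pm = sx * sy;               // third product (XH+XL)(YH+YL)
  wire [17:0] mid = pm - ph - pl;         // XH*YL + XL*YH, by eq. (2)
  assign p = {ph, 16'b0}                  // weight 2^16 term
           + {6'b0, mid, 8'b0}            // weight 2^8 term
           + {16'b0, pl};                 // weight 2^0 term
endmodule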
We modify the Karatsuba algorithm such that the computation of the third product is performed
efficiently. Assuming inputs X and Y are of n bits, the arguments
of the third product are of (n/2 + 1) bits. Let Z and U be these
arguments, left-padded with (n/2 - 1) bits. ZH, UH represent the single higher-order bits and ZL, UL
represent the lower-order bits, where ZH and UH can each be either 0 or 1, because they are the
carry-outs of the third-product argument sums (XH + XL) and (YH + YL). Thus

Z·U = ZH·UH·2^n + (ZH·UL + UH·ZL)·2^(n/2) + ZL·UL

Depending on ZH, UH the above expression can be evaluated as shown in Table I.

Table I
It can be observed from Table I that the third-product computation of (n/2 + 1) bits requires one
multiplication of (n/2 - 1) bits plus additional shifting, adding and multiplexing
operations, instead of an (n/2 + 1)-bit multiplier at every stage. This makes the Karatsuba
implementation recursive without additional hardware. Fig. 2 shows the adaptive concept
for the (n/2 + 1)-bit computation at a stage where the inputs are n bits; the 'Shift and Add'
block applies in the appropriate cases, as listed in Table I.
Fig. 2. Adaptive Concept for 3rd Product Computation
Conventional Wallace and Dadda multipliers use 3:2 compressors for carrying out the addition of
the generated partial products. The analyses show that the proposed multiplier based on the
adaptive, recursive Karatsuba approach has lower delay than the conventional Wallace
and Dadda multipliers.
CARRY SELECT ADDERS
The performance of the Karatsuba multiplier has been further improved by the use of high
speed parallel adders at different stages. The optimizations that have been carried out
are as follows:
 Additions involving 16 bits and above are carried out using the proposed fast adders.
 The final addition in the third-product computation is carried out using carry-save adders, with
the final stage involving the fast adders.
Carry Select Adder
In electronics, a carry-select adder is a particular way to implement an adder with reduced
logic delay. A carry-select adder generally consists of ripple-carry
adders and a multiplexer. Adding two n-bit numbers with a carry-select adder is done with
two adders (two ripple-carry adders) that perform the calculation twice, one
time assuming the carry-in is zero and the other assuming it is one. After
the two results are calculated, the correct sum, as well as the correct carry-out, is selected
with the multiplexer once the actual carry-in is known.
A ripple-carry adder generates its carry-out bit through the rippling of the incoming carry from
the previous stage after receiving the carry-in (Cin) bit. Hence the speed of the RCA is low, as the
Cout of a stage depends on the Cout of the previous stage, i.e. the Cin of the current stage. This
linear dependency of Cout on Cin is overcome by assuming the possible values of Cin
to be 0 and 1. Using these possible values of Cin, the partial sum (S) and Cout are
generated in parallel. Then a multiplexer is employed to choose the final sum and carry based
on the actual carry input received. Though this feature helps in reducing the computation
delay, area efficiency suffers due to the use of the redundant adder circuit. Thus a carry-select
adder consists of two RCA blocks (a sketch follows). In this project, to reduce delay, we propose
the square-root carry-select adder.
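A minimal parameterized sketch of one carry-select block (module name ours; the two ripple additions are written behaviorally for brevity):

module cselN #(parameter W = 4) (
  input  [W-1:0] a, b,
  input          cin,
  output [W-1:0] s,
  output         cout
);
  // Two candidate results computed in parallel, one per assumed carry-in.
  wire [W:0] r0 = a + b;          // assuming cin = 0
  wire [W:0] r1 = a + b + 1'b1;   // assuming cin = 1
  // The actual carry-in selects the correct sum and carry-out.
  assign {cout, s} = cin ? r1 : r0;
endmodule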
Square Root Carry Select Adder
In the Square Root Carry Select Adder (SRCSA) [9]-[10], the block size can be variable. The
complete analysis is omitted here for brevity, but, for example, a 16-bit adder can be created using
block sizes of 2-2-3-4-5 instead of a uniform block size of four (as done before) [8]. This
break-up is ideal when the full-adder delay is equal to the MUX delay. Fig. 3 shows the block
diagram of the proposed SRCSA adder for 16 bits, where the inputs are A and B, the carry-in is
denoted Cin, and the outputs are the sum (S) and carry-out (Cout); a sketch built from the
carry-select block above follows.
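Reusing the cselN block above, the 2-2-3-4-5 break-up described in the text can be sketched like this (block boundaries per the paper; module name ours):

module srcsa16 (
  input  [15:0] a, b,
  input         cin,
  output [15:0] s,
  output        cout
);
  wire c1, c2, c3, c4;  // select carries passed between blocks
  // Block widths grow so each block's two candidate sums are ready
  // just as the selecting carry arrives from the previous block.
  cselN #(2) b0 (a[1:0],   b[1:0],   cin, s[1:0],   c1);
  cselN #(2) b1 (a[3:2],   b[3:2],   c1,  s[3:2],   c2);
  cselN #(3) b2 (a[6:4],   b[6:4],   c2,  s[6:4],   c3);
  cselN #(4) b3 (a[10:7],  b[10:7],  c3,  s[10:7],  c4);
  cselN #(5) b4 (a[15:11], b[15:11], c4,  s[15:11], cout);
endmodule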

Fig. 3. 16-bit SRCSA (proposed adder)


Fig. 4. Delay comparison of adder designs
Fig. 5. Layout of proposed multiplier (courtesy Genus)
Fig. 6. Physical verification of proposed layout

Fig. 7. Post-synthesis timing (upper waveform, courtesy Genus);
post-verification timing (lower waveform, courtesy Innovus) of the proposed multiplier.
Vedic algorithms have been used to construct functional circuits with high performance and a
simple architecture. However, because many of these methods are built on decimal number
systems, binarization frequently resulted in a trade-off between speed and simplicity of
architecture due to conversion-hardware overhead. Vedic algorithms have recently attracted
renewed interest, owing to innovative circuit realisations, mainly multipliers. The current
project is significant in that the authors have updated a well-known algorithm (Karatsuba) to
incorporate an adaptive aspect that allows recursive operation, reducing the order of complexity
from n^2 to n^1.58 in the bit-length.
To demonstrate the concept, a 16×16-bit multiplier has been conceived and designed with the
primary goal of decreasing delay so that it may be used in DSP, image processing, and
computation-intensive ASIPs. It is based on the Vedic Karatsuba algorithm, which produces
fewer partial product terms. To improve speed, the technique is further refined by employing an
adaptive notion for the computation of the third product term. Furthermore, integrating carry-save
adders with the proposed Square Root Carry Select Adder (SRCSA), as presented in
this paper, improves the compression speed of the partial product terms.
According to this benchmarking exercise against other candidate designs, the proposed multiplier
with high-speed compression provides a delay decrease of 18.19 percent compared to Dadda and
24.08 percent compared to Wallace tree. In addition, the latency is lower than when the proposed
design employs a CSA (with other parameters comparable). As a result, we believe it is an
excellent choice for fast multiplier blocks in ASIPs and general-purpose processors.

REVERSIBLE GATES
Reversible logic has received great attention in recent years due to its ability to
reduce power dissipation, the main requirement in low-power VLSI design. It has
wide applications in low-power CMOS, optical information processing, DNA computing,
quantum computation and nanotechnology. Irreversible hardware computation results in energy
dissipation due to information loss. According to Landauer's research, the amount of energy
dissipated for every irreversible bit operation is at least kT·ln2 joules, where
k = 1.3806505×10^-23 J/K is Boltzmann's constant and T is the absolute temperature at which the
operation is performed. The heat generated due to the loss of one bit of information is very
small at room temperature, but when the number of bits involved is large, as in high-speed
computational work, the heat dissipated is large enough to affect
performance and reduce the lifetime of the components. In 1973, Bennett
showed that kT·ln2 of energy need not be dissipated from a system as long as the system allows
the reproduction of the inputs from the observed outputs. Reversible logic supports running the
system both forward and backward: reversible computations can
regenerate inputs from outputs and can stop and go back to any point in the computation history.
A circuit is said to be reversible if the input vector can be uniquely recovered from the output
vector and there is a one-to-one correspondence between its input and output assignments, i.e.
not only can the outputs be uniquely determined from the inputs, but the inputs can also be
recovered from the outputs. Energy dissipation can be reduced or even eliminated if
computation becomes information-lossless.

THE CONCEPT

Reversibility in computing implies that no information about the computational states can ever
be lost, so any earlier stage can be recovered by computing backwards, or un-computing, the
results. This is termed logical reversibility. The benefits of logical reversibility can be
gained only by also employing physical reversibility. Physical reversibility is a process that
dissipates no energy to heat. Absolutely perfect physical reversibility is practically
unachievable. Computing systems give off heat when voltage levels change from positive to
negative, i.e. bits from zero to one; most of the energy needed to make that change is given off in
the form of heat. Rather than changing voltages to new levels, reversible circuit elements
gradually move charge from one node to the next, so one can expect to lose only a
minute amount of energy on each transition. Reversible computing strongly affects digital
logic design: reversible logic elements are needed to recover the state of inputs from the
outputs. It will affect instruction sets and high-level programming languages as well;
eventually, these will also have to be reversible to provide optimal efficiency.

MOTIVATION BEHIND REVERSIBLE LOGIC


High-performance chips releasing large amounts of heat impose a practical limitation on
how far we can improve the performance of the system. Reversible circuits that conserve
information, by uncomputing bits instead of throwing them away, may eventually offer the only
physically possible way to keep improving performance. Reversible computing will also lead
to improvements in energy efficiency. Energy efficiency fundamentally affects the speed of
circuits such as nanocircuits and therefore the speed of most computing applications. Reversible
computing is also required to increase the portability of devices: it will let circuit
element sizes shrink toward atomic limits, and hence devices will become more portable.
Although the hardware design costs incurred in the near future may be high, with power cost and
performance being more dominant than logic-hardware cost in today's computing era, the need
for reversible computing cannot be ignored.

REVERSIBLE LOGIC GATES


A reversible logic gate is an n-input, n-output logic device with a one-to-one mapping.
This allows the outputs to be determined from the inputs and also the inputs to be uniquely
recovered from the outputs. In the synthesis of reversible circuits, direct fan-out is not
allowed, as a one-to-many mapping is not reversible; however, fan-out in reversible circuits is
achieved using additional gates. A reversible circuit should be designed using a minimum
number of reversible logic gates. From the point of view of reversible circuit design, there are
several parameters for determining the complexity and performance of circuits:

The number of reversible gates (N): the number of reversible gates used in the circuit.
The number of constant inputs (CI): the number of inputs that must be held
constant at either 0 or 1 in order to synthesize the given logical function.
The number of garbage outputs (GO): the number of unused outputs present
in a reversible logic circuit. Garbage outputs cannot be avoided, as they are essential to achieve
reversibility.
Quantum cost (QC): the cost of the circuit in terms of the cost of a primitive
gate. It is calculated as the number of primitive reversible logic gates (1*1 or 2*2)
required to realize the circuit.
BASIC REVERSIBLE LOGIC GATES
Feynman Gate
The Feynman gate is a 2*2 one-through reversible gate, as shown in Figure 1. The input vector is
I(A, B) and the output vector is O(P, Q). The outputs are defined by P = A, Q = A XOR B.
The quantum cost of a Feynman gate is 1. The Feynman gate (FG) can be used as a copying gate:
since fan-out is not allowed in reversible logic, this gate is used to duplicate an output where
required.

Figure 1: Feynman Gate

Table 1: Truth table of Feynman gates
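As a sketch, the Feynman gate maps directly to two continuous assignments in Verilog (module name ours):

module feynman (
  input  A, B,
  output P, Q
);
  assign P = A;       // input A passes straight through
  assign Q = A ^ B;   // with B tied to 0, Q duplicates A (reversible fan-out)
endmodule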

Double Feynman Gate (F2G)


Fig. 2 shows a 3*3 Double Feynman gate. The input vector is I(A, B, C) and the output
vector is O(P, Q, R). The outputs are defined by P = A, Q = A XOR B, R = A XOR C. The quantum
cost of a Double Feynman gate is 2.

Fig 2: Double Feynman gate

Toffoli Gate:

Fig 3 shows a 3*3 Toffoli gate. The input vector is I(A, B, C) and the output vector is
O(P, Q, R). The outputs are defined by P = A, Q = B, R = AB XOR C. The quantum cost of a
Toffoli gate is 5.

Fig 3: Toffoli gate


Table 3: Truth table of Toffoli gate
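A corresponding Verilog sketch (module name ours); note that with C = 0 the gate computes R = AB, which is how an irreversible AND is embedded reversibly:

module toffoli (
  input  A, B, C,
  output P, Q, R
);
  assign P = A;
  assign Q = B;
  assign R = (A & B) ^ C;   // AB XOR C
endmodule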

Fredkin Gate

Fig 4 shows a 3*3 Fredkin gate. The input vector is I(A, B, C) and the output vector is
O(P, Q, R). The outputs are defined by P = A, Q = A′B XOR AC and R = A′C XOR AB. The
quantum cost of a Fredkin gate is 5.

Fig 4(a): Fredkin gate

Fig 4(b): fault tolerant Fredkin gate


Table 4: Truth table of fredkin gate
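In Verilog the Fredkin gate is simply a controlled swap (module name ours). Since the terms A′B and AC can never both be 1, Q = A′B XOR AC reduces to the multiplexer below, and likewise for R:

module fredkin (
  input  A, B, C,
  output P, Q, R
);
  assign P = A;
  assign Q = A ? C : B;   // A'B XOR AC: pass B when A = 0, C when A = 1
  assign R = A ? B : C;   // A'C XOR AB: the complementary selection
endmodule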

Peres Gate
Fig 5 shows a 3*3 Peres gate. The input vector is I(A, B, C) and the output vector is
O(P, Q, R). The outputs are defined by P = A, Q = A XOR B and R = AB XOR C. The quantum
cost of a Peres gate is 4. The Peres gate is used in the proposed design because of its low
quantum cost.

Fig 5: Peres gate

Table 5: Truth table of peres gate
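A Verilog sketch of the Peres gate (module name ours); with C = 0 it acts as a reversible half adder, Q being the sum and R the carry:

module peres (
  input  A, B, C,
  output P, Q, R
);
  assign P = A;
  assign Q = A ^ B;         // sum when C = 0
  assign R = (A & B) ^ C;   // carry when C = 0
endmodule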


TSG gate
Fig 7 shows a 4*4 TSG gate. The input vector is I(A, B, C, D) and the output vector is
O(P, Q, R, S). The outputs are defined by P = A, Q = A′C′ XOR B′, R = (A′C′ XOR B′) XOR D
and S = (A′C′ XOR B′)·D XOR (AB XOR C). It can be verified
that the input pattern corresponding to a particular output pattern can be uniquely determined.
The TSG gate is capable of implementing all Boolean functions and can also work
singly as a reversible full adder.

Fig 7: TSG gate

Fig 8: TSG Gate Working As Reversible Full Adder
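The TSG equations translate line for line into Verilog (module name ours). Setting C = 0 reduces Q to A XOR B, R to A XOR B XOR D and S to the majority carry, i.e. the reversible full adder of Fig 8 with D acting as carry-in:

module tsg (
  input  A, B, C, D,
  output P, Q, R, S
);
  wire t = (~A & ~C) ^ ~B;              // A'C' XOR B'
  assign P = A;
  assign Q = t;
  assign R = t ^ D;                     // sum when C = 0
  assign S = (t & D) ^ ((A & B) ^ C);   // carry when C = 0
endmodule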

Sayem gate
The Sayem gate (SG) is a one-through 4x4 reversible gate. The input and output vectors of this
gate are Iv = (A, B, C, D) and Ov = (A, A′B XOR AC, A′B XOR AC XOR D, AB XOR A′C XOR D).
The block diagram of this gate is shown in Fig 9.

Fig 9: Sayem gate

APPLICATIONS
Reversible computing may have applications in computer security and transaction
processing, but the main long-term benefit will be felt in those areas which require
high energy efficiency, speed and performance. These include areas like:

1. Low-power CMOS.
2. Quantum computing.
3. Nanotechnology.
4. Optical computing.
5. Design of low-power arithmetic and data paths for digital signal processing (DSP).
6. Field Programmable Gate Arrays (FPGAs) in CMOS technology for extremely low
power, high testability and self-repair.

We have presented an approach to realizing multipurpose binary reversible gates. Such
gates can be used in regular circuits realizing Boolean functions. In the same way it is possible
to construct multiple-valued reversible gates having similar properties. The proposed Vedic
multiplier with the Karatsuba algorithm has applications in digital circuits such as DSP
applications, DIP applications, and the building of reversible ALUs, reversible processors, etc.
This work forms an important step towards building large and complex reversible VLSI designs.
A carry-lookahead adder is used to increment the address; the sub-blocks of this adder are
AND, OR and XOR gates.

Fig: Fredkin gate used as AND, OR, NOT and buffer

Fig: Basic gates built from reversible logic gates


Chapter 5
VERILOG
5.1 Introduction:
 Verilog synthesis tools can create logic-circuit structures directly from Verilog behavioral
descriptions and target them to a selected technology for realization (i.e., translate Verilog to
actual hardware).
 Using Verilog, we can design, simulate and synthesize anything from a simple combinational
circuit to a complete microprocessor on a chip.
 Verilog HDL has evolved as a standard hardware description language and offers
many useful features for hardware design.
 Verilog HDL is a general-purpose hardware description language that is easy to learn and easy to
use. It is similar in syntax to the C programming language, so designers with C programming
experience will find it easy to learn.
 Verilog HDL allows different levels of abstraction to be mixed in the same model. Thus, a
designer can define a hardware model in terms of switches, gates, RTL, or behavioral code. Also,
a designer needs to learn only one language for stimulus and hierarchical design.
 Most popular logic synthesis tools support Verilog HDL, which makes it the language of choice
for designers.
 All fabrication vendors provide Verilog HDL libraries for post-logic-synthesis simulation. Thus,
designing a chip in Verilog HDL allows the widest choice of vendors.
 The Programming Language Interface (PLI) is a powerful feature that allows the user to write
custom C code to interact with the internal data structures of Verilog. Designers can customize a
Verilog HDL simulator to their needs with the PLI.
History Of Verilog HDL
 Verilog started as a proprietary hardware modeling language at Gateway Design
Automation Inc. around 1984. It is rumored that the original language was designed by taking
features from the most popular HDL of the time, called HiLo, as well as from
traditional computer languages such as C. At that time, Verilog was not standardized and the
language was modified in almost all the revisions that came out between 1984 and 1990.
 Verilog simulator was first used beginning in 1985 and was extended substantially through 1987.
The implementation was the Verilog simulator sold by Gateway. The first major extension was
Verilog-XL, which added a few features and implemented the infamous "XL algorithm" which
was a very efficient method for doing gate-level simulation.
 The time was late 1990. Cadence Design Systems, whose primary products at that time included
a thin-film process simulator, decided to acquire Gateway Design Automation. Along with other
Gateway products, Cadence now became the owner of the Verilog language, and continued to
market Verilog as both a language and a simulator. At the same time, Synopsys was marketing
a top-down design methodology using Verilog. This was a powerful combination.
 In 1990, Cadence recognized that if Verilog remained a closed language, the pressures of
standardization would eventually cause the industry to shift to VHDL. Consequently, Cadence
organized the Open Verilog International (OVI), and in 1991 gave it the documentation for the
Verilog Hardware Description Language. This was the event which "opened" the language.
 OVI did a considerable amount of work to improve the Language Reference Manual (LRM),
clarifying things and making the language specification as vendor-independent as possible.
 Soon it was realized that if there were too many companies in the market for Verilog, potentially
everybody would want to do what Gateway had done so far, changing the language for their own
benefit. This would defeat the main purpose of releasing the language to the public domain. As a
result, in 1994 the IEEE 1364 working group was formed to turn the OVI LRM into an IEEE
standard. This effort was concluded with a successful ballot in 1995, and Verilog became an
IEEE standard in December 1995.
 When Cadence gave OVI the LRM, several companies began working on Verilog simulators. In
1992, the first of these were announced, and by 1993 there were several Verilog simulators
available from companies other than Cadence. The most successful of these was VCS, the
Verilog Compiled Simulator, from Chronologic Simulation. This was a true compiler as opposed
to an interpreter, which is what Verilog-XL was. As a result, compile time was substantial, but
simulation execution speed was much faster.
 In the meantime, the popularity of Verilog and the PLI was rising exponentially. Verilog as an
HDL found more admirers than the well-formed and federally funded VHDL. It was only a matter
of time before people in OVI realized the need for a more universally accepted standard.
Accordingly, the board of directors of OVI requested IEEE to form a working committee for
establishing Verilog as an IEEE standard. The working committee 1364 was formed in mid-1993
and had its first meeting on October 14, 1993.
 The standard, which combined both the Verilog language syntax and the PLI in a single volume,
was passed in May 1995 and is now known as IEEE Std 1364-1995.
 After many years, new features were added to Verilog, and the new version is called
Verilog 2001. This version fixed many of the problems that Verilog 1995 had, and is
formally known as IEEE 1364-2001.
5.2 PROGRAM STRUCTURE:
 The basic unit of programming in Verilog is the MODULE (a text file containing statements and
declarations).
 A Verilog module has declarations that describe the names and types of the module's inputs and
outputs, as well as local signals, variables, constants and functions that are used internally to the
module and are not visible outside it.
 The rest of the module contains statements that specify the operation of the module's outputs and
internal signals.
 Verilog is a case-sensitive language like C. Thus sense, Sense, SENSE, sENse, etc. are all
treated as different entities/quantities in Verilog.

SYNTAX:
module module_name (port list);
  port declarations
  function declarations
  statements
endmodule
The keyword module signifies the beginning of a module definition; endmodule signifies its end.
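A minimal concrete example of this structure (a sketch; the 2-to-1 multiplexer is our own illustrative choice):

module mux2 (a, b, sel, y);
  input  a, b, sel;         // port declarations
  output y;
  assign y = sel ? b : a;   // statement specifying the output
endmodule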
IDENTIFIERS:
Any program requires blocks of statements, signals, etc. to be identified with an
attached name tag. Such name tags are identifiers.
There are some restrictions in assigning identifier names: any character of the alphabet or an
underscore can be used as the first character, and subsequent characters can be alphanumeric,
the underscore (_), or the dollar ($) sign.
For example:
 name, _name, Name, name1, name_$, . . . are all allowed as identifiers.
 name aa is not allowed as an identifier because of the blank (“name” and “aa” are interpreted as
two different identifiers).
 $name is not allowed as an identifier because of the presence of “$” as the first character.
 1_name is not allowed as an identifier, since the numeral “1” is the first character.
 @name is not allowed as an identifier because of the presence of the character “@”.
 a+b is not allowed as an identifier because of the presence of the character “+”.
An alternative format makes it possible to use any of the printable ASCII characters
in an identifier. Such identifiers are called “escaped identifiers”; they must start with the
backslash (\) character. The character set between the backslash and the first white
space encountered is treated as the identifier; the backslash itself is not treated as a character of
the identifier concerned.
Examples
\b=c
\control-signal
\&logic
\abc // Here the combination “abc” forms the identifier.
WHITE SPACE CHARACTERS
Blanks (\b), tabs (\t), newlines (\n), and form feeds form the white space characters in
Verilog. In any design description, white space characters are included to improve readability.
COMMENTS
It is a healthy practice to comment a design description liberally. A single-line
comment begins with “//” and ends with a newline; a multi-line (block) comment starts with “/*”
and ends with “*/”.
PORT DECLARATION:
A Verilog module declaration begins with the keyword module and ends with
endmodule. The input and output ports are the signals by which the module communicates
with other modules.
Syntax:
input identifier, ..., identifier;
output identifier, ..., identifier;
inout identifier, ..., identifier;
input [msb:lsb] identifier, ..., identifier;
output [msb:lsb] identifier, ..., identifier;
inout [msb:lsb] identifier, ..., identifier;
LOGIC SYSTEM:
Verilog uses a 4-value logic system; a 1-bit signal can take one of only four possible values:
0: logic 0, or false
1: logic 1, or true
x: an unknown logic value
z: high impedance
5.3 OPERATORS
Arithmetic Operators:
These perform arithmetic operations. The + and - can be used as either unary (-z) or binary (x-y)
operators.
+ (addition)
- (subtraction)
* (multiplication)
/ (division)
% (modulus)
Relational Operators:
Relational operators compare two operands and return a single bit, 1 or 0. These operators synthesize into comparators.
< (less than)
<= (less than or equal to)
> (greater than)
>= (greater than or equal to)
== (equal to)
!= (not equal to)
Bit-wise Operators:
Bit-wise operators operate bit by bit on two operands (for the single-operand forms, see "Reduction Operators" below).
~ (bitwise NOT)
& (bitwise AND)
| (bitwise OR)
^ (bitwise XOR)
~^ or ^~ (bitwise XNOR)
Logical Operators:
Logical operators return a single bit 1 or 0. They are the same as bit-wise operators
only for single bit operands. They can work on expressions, integers or groups of bits, and treat
all values that are nonzero as “1”. Logical operators are typically used in conditional (if ... else)
statements since they work with expressions.
! (logical NOT)
&& (logical AND)
|| (logical OR)
Reduction Operators
Reduction operators operate on all the bits of an operand vector and return a single-bit
value. These are the unary (one argument) form of the bit-wise operators above.
& (reduction AND)
| (reduction OR)
~& (reduction NAND)
~| (reduction NOR)
^ (reduction XOR)
~^ or ^~ (reduction XNOR)
Shift Operators
Shift operators shift the first operand by the number of bits specified by the second operand.
Vacated positions are filled with zeros for both left and right shifts (There is no sign extension).
<< (shift left)
>> (shift right)
Concatenation Operator:
The concatenation operator combines two or more operands to form a larger vector.
{} (concatenation)
Replication Operator:
The replication operator makes multiple copies of an item.
{n{item}} (n fold replication of an item)
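As a small illustrative fragment (signal names assumed), replication and concatenation can be combined to sign-extend a value:

// Sign-extend a 4-bit value to 8 bits (illustrative)
wire [3:0] nib = 4'b1010;
wire [7:0] ext = { {4{nib[3]}}, nib };  // replicate the MSB four times, then concatenate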
Literals:
Literals are constant-valued operands that can be used in Verilog expressions. The two
common Verilog literals are:
(a) String: A string literal is a one-dimensional array of characters enclosed in double quotes (" ").
(b) Numeric: constant numbers specified in binary, octal, decimal or hexadecimal.
Number syntax: n'Fddd..., where
n is an integer representing the number of bits, and
F is one of four possible base formats: b (binary), o (octal), d (decimal) or h (hexadecimal); the default is d.
Literals written without a size indicator default to 32 bits or the word width used by the simulator program. This may cause errors, so one should be careful with unsized literals.
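A few representative examples of sized literals:

8'b1010_0001   // 8-bit binary value (underscores improve readability)
12'habc        // 12-bit hexadecimal value
4'd9           // 4-bit decimal value
16'o777        // 16-bit octal value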
NET:
Verilog actually has two classes of signals
1. nets.
2. variables.
 Nets represent connections between hardware elements. Just as in real circuits, nets have values
continuously driven on them by the outputs of devices that they are connected to.
 The default net type is wire; any signal name that appears in a module input/output list, but not in a net declaration, is assumed to be of type wire.
 Nets are one-bit values by default unless they are declared explicitly as vectors. The terms wire
and net are often used interchangeably.
 Note that net is not a keyword but represents a class of data types such as wire, wand, wor, tri,
triand, trior, trireg, etc. The wire declaration is used most frequently.
 The syntax of a Verilog net declaration is similar to an input/output declaration.
Syntax:
wire identifier, ..., identifier;
wire [msb:lsb] identifier, ..., identifier;
tri identifier, ..., identifier;
tri [msb:lsb] identifier, ..., identifier;
 The keyword tri has a function identical to that of wire. When a net is driven by more than one
tri-state gate, it is declared as tri rather than as wire. The distinction is for better clarity.
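A short illustrative pair of net declarations (the names are assumed for the example):

wire [7:0] data_bus;   // 8-bit net connecting hardware elements
tri shared_line;       // net intended to be driven by more than one tri-state gate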
VARIABLE:
 Verilog variables store values during program execution, and they need not have physical significance in the circuit.
 They are used only in procedural code (i.e., behavioral design). A variable's value can be used in an expression, and can be combined and assigned to other variables, as in a conventional software programming language.
 The most commonly used variable types are reg and integer.
Syntax:
reg identifier, ..., identifier;
reg [msb:lsb] identifier, ..., identifier;
integer identifier, ..., identifier;
 A reg variable is a single bit or a vector of bits; the value of a 1-bit reg variable is always 0, 1, x or z. The main use of reg variables is to store values in Verilog procedural code.
 An integer variable holds a 32-bit or larger integer, depending on the word length used by the simulator. An integer variable is typically used to control repetitive statements, such as loops, in Verilog procedural code.
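A brief illustrative set of variable declarations (names assumed):

reg ready;             // 1-bit register variable
reg [7:0] shift_data;  // 8-bit register vector
integer i;             // typically used as a loop counter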
PARAMETER:
Verilog provides a facility for defining named constants within a module, to improve the readability and maintainability of the code. The parameter declaration is:
Syntax:
parameter identifier = value;
parameter identifier = value,
          ...,
          identifier = value;
 An identifier is assigned a constant value that will be used in place of the identifier throughout the current module.
 Multiple constants can be defined in a single parameter declaration using a comma-separated list.
 The value in a parameter declaration can be a simple constant, or it can be a constant expression: an expression involving operators and constants (including other parameters) that yields a constant result at compile time.
 The scope of a parameter is limited to the module in which it is defined.
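An illustrative fragment (the names WIDTH and acc are assumed for the example):

parameter WIDTH = 8;    // named constant for the data width
reg [WIDTH-1:0] acc;    // a register whose width follows the parameter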
ARRAYS:
Arrays are allowed in Verilog for the reg, integer, time and vector register data types. Arrays are not allowed for real variables. Arrays are accessed as <array_name>[<subscript>]. Multidimensional arrays are not permitted in Verilog-1995 (they were introduced in Verilog-2001).
Syntax:
reg identifier [start:end];
reg [msb:lsb] identifier [start:end];
integer identifier [start:end];
Example: integer count [0:7]; // an array of 8 count variables
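A common illustrative use of a reg array is a small memory (names assumed):

reg [7:0] mem [0:15];   // sixteen 8-bit words, accessed as mem[address]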
5.4 DATAFLOW DESIGN ELEMENTS:
Continuous assignment statements allow a combinational circuit to be described in terms of the flow of data and the operations performed on it. This style is called "dataflow design" or "dataflow description". The basic syntax of a continuous-assignment statement in Verilog is:
Syntax:
assign net-name = expression;
assign net-name[bit-index] = expression;
assign net-name[msb:lsb] = expression;
assign net-concatenation = expression;
 "assign" is the keyword carrying out the assignment operation. This type of assignment is called a continuous assignment.
 The keyword "assign" is followed by the name of a net, then an "=" sign, and finally an expression giving the value to be assigned.
 If a module contains the two statements "assign X=Y" and "assign Y=~X", then the simulation will loop "forever" (until the simulation times out).
For example:
assign c = a && b;
 a and b are operands, typically single-bit logic variables.
 "&&" is the logical AND operator; for the single-bit operands a and b it behaves like a bit-wise AND.
 "=" denotes the assignment activity carried out.
 c is a net representing the signal which is the result of the assignment.
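As a slightly larger sketch in the dataflow style (the module and signal names are assumed), a 2-to-1 multiplexer can be described with a single continuous assignment:

// 2-to-1 multiplexer in the dataflow style (illustrative)
module mux2 (a, b, sel, y);
  input a, b, sel;
  output y;
  assign y = (sel & b) | (~sel & a);  // select b when sel = 1, otherwise a
endmodule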
5.5 STRUCTURAL DESIGN (OR) GATE-LEVEL MODELING
 A structural design is a series of concurrent statements. The most important concurrent statements in a module are instance statements, continuous-assignment statements and always blocks. These give rise to three distinct styles of circuit design and description.
 Statements of these types, and the corresponding design styles, can be freely intermixed within a Verilog module declaration.
 Each concurrent statement in a Verilog module "executes" simultaneously with the other statements in the same module declaration.
 In a Verilog module, if the last statement updates a signal that is used by the first statement, then the simulator goes back to that first statement and updates its result.
 In fact, the simulator keeps propagating changes and updating results until the simulated circuit stabilizes.
 In the structural design style, the individual gates and other components of the circuit are instantiated and connected to each other using nets.
 Verilog has several built-in gate types whose names are reserved words; among them are and, nand, or, nor, xor, xnor, not and buf.
Syntax of Verilog instance statements:
component_name instance-identifier (expression, ..., expression);
component_name instance-identifier (.port-name(expression),
                                    ...,
                                    .port-name(expression));
Basic gate primitives in Verilog with details:
Table 5.1(a)
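As an illustrative structural sketch using two of these gate primitives (the module and instance names are assumed), a half adder can be built as follows:

// Half adder in the structural (gate-level) style (illustrative)
module half_adder (a, b, sum, carry);
  input a, b;
  output sum, carry;
  xor u1 (sum, a, b);    // sum = a XOR b
  and u2 (carry, a, b);  // carry = a AND b
endmodule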
5.6 BEHAVIORAL MODELING
Behavioral level modeling constitutes design description at an abstract level. One can
visualize the circuit in terms of its key modular functions and their behavior. The constructs
available in behavioral modeling aim at the system level description. Here direct description of
the design is not a primary consideration in the Verilog standard. Rather, flexibility and
versatility in describing the design are in focus [IEEE].
Verilog provides designers the ability to describe design functionality in an algorithmic
manner. In other words, the designer describes the behavior of the circuit. Thus, behavioral
modeling represents the circuit at a very high level of abstraction. Design at this level resembles
C programming more than it resembles digital circuit design. Behavioral Verilog constructs are
similar to C language constructs in many ways.
Structured Procedures:
There are two structured procedure statements in Verilog: always and initial. These
statements are the two most basic statements in behavioral modeling. All other behavioral
statements can appear only inside these structured procedure statements.
Verilog is a concurrent programming language, unlike the C programming language, which is sequential in nature. Activity flows in Verilog run in parallel rather than in sequence. Each always and initial statement represents a separate activity flow, and each activity flow starts at simulation time 0. The always and initial statements cannot be nested. The fundamental difference between the two statements is explained in the following sections.
Always:
 The key element of Verilog behavioral design is the always block. The always block contains one or more "procedural statements".
 Another type of procedural statement is the "begin-end" block. The always block is the one used most widely because of its simplicity.
 Procedural statements in an always block execute sequentially; the always block itself executes concurrently with the other concurrent statements in the same module.
Syntax:
1) always @(signal-name or ... or signal-name)
   procedural statement
2) always
   procedural statement
 In the first form of the always block, the @ sign and the parenthesized list of signal names are called the "sensitivity list".
 A Verilog concurrent statement such as an always block is either executing or suspended.
 A concurrent statement is initially suspended; when any signal in its sensitivity list changes value, it resumes execution, starting with its first procedural statement and continuing until the end.
 A properly written concurrent statement will suspend after one or more executions. However, it is possible to write a statement that never suspends (e.g., assign X = ~X); since X changes on every pass, the statement will execute forever in zero simulation time, which is not useful.
 As shown in the second form of the syntax, the sensitivity list of an always block is optional. An always block without a sensitivity list starts running at simulation time zero and keeps looping forever.
 Several types of procedural statements can appear within an always block: blocking-assignment statements, nonblocking-assignment statements, begin-end blocks, if, case, while and repeat.
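Two short illustrative always blocks (the signals y, q, d, a, b and clk are assumed to be declared elsewhere, with y and q of type reg):

// Combinational behavior: re-evaluated whenever a or b changes (illustrative)
always @(a or b)
  y = a & b;

// Sequential behavior: q is updated on every rising clock edge (illustrative)
always @(posedge clk)
  q <= d;   // nonblocking assignment, the usual choice for registers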
IF AND IF-ELSE BLOCK:
The if construct checks a specific condition and decides execution based on the result. In the segment below, after the execution of assignment1 the specified condition is checked. If it is satisfied, assignment2 is executed; if not, it is skipped. In either case execution continues with assignment3, assignment4, and so on. Only the execution of assignment2 depends on the condition; the rest of the sequence is unaffected.
Syntax:
...
assignment1;
if (condition) assignment2;
assignment3;
assignment4;
...
The if-else form chooses between two alternative blocks:
...
assignment1;
if (condition)
begin
  assignment2;
  assignment3;
end
else
begin
  assignment4;
  assignment5;
end
assignment6;
...
After the execution of assignment1, if the condition is satisfied, the first alternative is followed: assignment2 and assignment3 are executed, assignment4 and assignment5 are skipped, and execution proceeds with assignment6.
 If the condition is not satisfied, assignment2 and assignment3 are skipped and assignment4 and assignment5 are executed. Then execution continues with assignment6.
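A concrete illustrative fragment (y is assumed to be a reg, and a, b and sel existing signals):

// Select between a and b with if-else (illustrative)
always @(a or b or sel) begin
  if (sel)
    y = b;    // condition satisfied
  else
    y = a;    // condition not satisfied
end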
For Loops
Similar to for loops in C/C++, they are used to repeatedly execute a statement or block of
statements. If the loop contains only one statement, the begin ... end statements may be omitted.
Syntax:
for (count = value1; count </<=/>/>= value2; count = count +/- step)
begin
  ... statements ...
end
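An illustrative use (data is assumed to be an 8-bit signal and ones a reg wide enough to hold the count):

// Count the 1 bits of 'data' with a for loop (illustrative)
integer i;
always @(data) begin
  ones = 0;
  for (i = 0; i <= 7; i = i + 1)
    ones = ones + data[i];   // add each bit to the running total
end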
While Loops:
The while loop repeatedly executes a statement or block of statements until the expression in the while statement evaluates to false. To avoid combinational feedback during synthesis, a while loop must be broken with an @(posedge/negedge clock) statement. For simulation, a delay inside the loop will suffice. If the loop contains only one statement, the begin ... end statements may be omitted.
Syntax:
while (expression)
begin
  ... statements ...
end
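An illustrative simulation-style fragment (data is an assumed 8-bit signal; tmp and count are helper declarations for the example):

// Count how many right shifts empty 'tmp' (illustrative)
integer count;
reg [7:0] tmp;
always @(data) begin
  count = 0;
  tmp = data;
  while (tmp != 0) begin
    tmp = tmp >> 1;       // discard one bit per iteration
    count = count + 1;
  end
end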
CASE:
The case statement allows a multipath branch based on comparing the expression with a list of case choices. The statements in the default block execute when none of the case-choice comparisons is true. With no default, if no comparison is true, synthesizers will generate unwanted latches. Good practice is to make a habit of putting in a default whether you need it or not.
If the defaults are don't-cares, define them as 'x' and the logic minimizer will treat them as don't-cares and save area. A case choice may be a simple constant, an expression, or a comma-separated list of the same.
Syntax
case (expression)
case_choice1:
begin
... statements ...
end
case_choice2:
begin
... statements ...
end
... more case choices blocks ...
default:
begin
... statements ...
end
endcase
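An illustrative 4-to-1 multiplexer built with case (sel is assumed to be 2 bits wide and y a reg):

// 4-to-1 mux with a default branch (illustrative)
always @(sel or a or b or c or d) begin
  case (sel)
    2'b00:   y = a;
    2'b01:   y = b;
    2'b10:   y = c;
    default: y = d;   // covers 2'b11 and prevents unwanted latches
  endcase
end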
casex:
In casex the case-choice constants may contain z, x or ?, all of which act as don't-cares in the comparison. With plain case, the corresponding simulation variable would have to match a tri-state, unknown, or either signal exactly. In short, case uses x to compare with an unknown signal, whereas casex uses x as a don't-care, which can be used to minimize logic.
Casez:
Casez is the same as casex except that only ? and z (not x) are used in the case-choice constants as don't-cares. Casez is favored over casex since, in simulation, an inadvertent x signal will not be matched by a 0 or 1 in the case choice.
FOREVER LOOPS
The forever statement executes an infinite loop of a statement or block of statements. To avoid combinational feedback during synthesis, a forever loop must be broken with an @(posedge/negedge clock) statement. For simulation, a delay inside the loop will suffice. If the loop contains only one statement, the begin ... end statements may be omitted.
Syntax:
forever
begin
  ... statements ...
end
Example:
forever begin
  @(posedge clk);   // or use a = #9 a + 1;
  a = a + 1;
end
REPEAT:
The repeat statement executes a statement or block of statements a fixed number of times. The repeat count can be a number or an expression that evaluates to a number. As soon as the repeat statement is encountered, the count is evaluated, and the following block is executed that many times. If the count evaluates to 0, x or z, the block is not executed.
Syntax:
repeat (number_of_times)
begin
  ... statements ...
end
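An illustrative testbench-style fragment (clk and strobe are assumed declarations, strobe of type reg):

// Toggle 'strobe' on eight successive rising clock edges (illustrative)
initial begin
  strobe = 0;
  repeat (8) begin
    @(posedge clk);
    strobe = ~strobe;
  end
end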
Chapter 6
SOFTWARE USED
Xilinx
Xilinx ISE software is used by VHDL/Verilog designers for performing synthesis. Any simulated code can be synthesized and configured on an FPGA. Synthesis is the transformation of HDL code into a gate-level netlist; it is an integral part of current design flows.
6.1 Algorithm
Start the ISE software by clicking the Xilinx ISE icon. Create a new project; the properties shown below are then displayed. If the design needs a large number of LUTs, the family, device and package can be changed accordingly.
Fig 6.1: Create a new folder for the design
Fig 6.2: Set the family and device before designing a project
Fig 6.3: Finish creating the new folder with the chosen family and device
Fig 6.4: Ready to design a project
Create an HDL source, specifying all inputs, outputs and buffers if required; this opens a window in which the HDL code to be synthesized is written.
Fig 6.5: Create the module name (.v file name) for the design
Fig 6.6: Declaration of the input and output ports with their bit lengths
Fig 6.7: The schematic created from the declared ports
Fig 6.8: Ready to write the code for the design
Fig 6.9: Check syntax for the design
Fig 6.10: Check the RTL schematic view
The RTL (register transfer logic) view is also known as the designer's view, because it looks like what the designer pictures in his mind.
Fig 6.11: RTL schematic view of the design (AND gate)
Fig 6.12: Internal structure of the RTL schematic view of the design (AND gate)
Fig 6.13: Check the view-technology schematic of the project
Fig 6.14: Internal structure of the view-technology schematic
In the technology schematic of the design (AND gate), the LUTs are displayed; the LUT count is taken as the area of the design.
Fig 6.15: The truth table, schematic, Boolean equation and K-map of the design
The Xilinx tool can also display the truth table, schematic, Boolean equation and K-map of a design.
Fig 6.16: Simulation of the design to verify its logic
Fig 6.17: Apply inputs through force constant or force clock for the input signals
Fig 6.18: Apply force to value
Fig 6.19: Run the design after applying the inputs
Fig 6.20: Show all values (zoom to full view) for the design
Chapter 7
RESULTS
7.1 RTL SCHEMATIC:
RTL is an abbreviation of register transfer level. The RTL schematic denotes the blueprint of the architecture and is used to verify the designed architecture against the ideal architecture under development. The HDL (Verilog or VHDL) converts the description or summary of the architecture into a working description. The RTL schematic also shows the internal connections between blocks for better analysis. The figures below show the RTL schematics of the designed architectures.
Fig 1: RTL schematic of the existing Vedic multiplier
Fig 2: RTL schematic of the proposed Vedic multiplier
7.2 TECHNOLOGY SCHEMATIC:
The technology schematic represents the architecture in LUT form, where the LUT is taken as the area parameter used in VLSI to estimate an architecture. Each LUT is considered one square unit, and the memory allocated to the code is represented in the LUTs of the FPGA.
Fig: View-technology schematic of the existing Vedic multiplier
Fig: View-technology schematic of the proposed Vedic multiplier
7.3 SIMULATION:
Simulation is the final verification of the design's working, whereas the schematic verifies only its connections and blocks. The simulation window is launched by switching from implementation to simulation on the home screen of the tool, and it presents the outputs in the form of waveforms. It also offers the flexibility of displaying values in different radix number systems.
Fig: Simulated waveforms of the existing Vedic multiplier
Fig: Simulated waveforms of the proposed Vedic multiplier
Fig: Simulated waveforms of the proposed Vedic multiplier
7.4 PARAMETERS:
In VLSI the parameters considered are area, delay and power; on the basis of these parameters one architecture can be judged against another. Here the area (LUT count) and power are considered. The values are obtained using the Xilinx ISE 14.7 tool, with the designs written in the Verilog HDL.
Fig: Device used for synthesis
Table: Device utilization summary of the existing design
Table: Device utilization summary of the proposed design
Table: Power analysis
Table: Parameter comparison observed in Xilinx

Parameter               | Karatsuba Multiplier | Karatsuba Multiplier using Reversible Logic
Power (mW)              | 2.819                | 1.661
Number of 4-input LUTs  | 397                  | 234

Fig: Power comparison bar graph
Fig: LUT comparison bar graph
7.5 ADVANTAGES
The Karatsuba Vedic multiplier is a fast algorithm, rooted in methods devised by our ancestors, that is easier to compute than the traditional approach; its area and power are also lower than those of a conventional multiplier.
7.6 APPLICATIONS
1. Signal processing
2. Image processing
3. Amplifiers
4. Cryptography
Chapter 8
Conclusion And Future Scope
Vedic algorithms have been useful in designing functional circuits that achieve high speed with a simplified architecture. The challenge, however, has been that these algorithms are based on the decimal number system, so binarization often traded away the speed advantage and the simplified architecture because of conversion-hardware overhead. Yet renewed interest in Vedic algorithms has been observed recently, particularly due to clever circuit realizations, primarily multipliers. The present endeavor assumes importance in that context: a known algorithm (Karatsuba) has been modified by the authors to include an adaptive aspect allowing recursive operation, reducing the order of complexity from the square of the bit-length, O(n^2), to O(n^(log2 3)), approximately O(n^1.585).
A 16×16-bit multiplier using reversible logic has been proposed and designed to showcase the technique, with the primary objective of minimizing delay so that it can find application in DSP, image processing and computation-intensive ASIPs. It is based on the Vedic Karatsuba algorithm, which generates a smaller number of partial-product terms. The algorithm is further optimized using the adaptive concept for the computation of the third product term, to yield higher speed. Moreover, the compression speed of the partial-product terms is also enhanced by combining carry-save adders with the proposed Square Root Carry Select Adder (SRCSA) built with reversible logic, as discussed in this work.
The implementation, synthesis and simulation are performed in the Xilinx ISE tool using the Verilog HDL. In future, implementations of this multiplier that eliminate gate delays, together with the addition of approximation to the architecture, can enhance performance in DSP, image processing, filter and cryptographic applications, as well as in other area- and speed-critical applications.