VLSI System Design
Overview
Microelectronic history
the complexity of microelectronics
design steps
Goal: You are familiar with the history of microelectronics,
have an idea of its complexity, and
have an overview of the VLSI design steps.
MicroLab, VLSI-1 (1/28)
JMM v1.4
What’s expected of you
Class/Homework (50% in class, 50% homework)
    Readings from a Starter Guide to VHDL and some articles. Some
    problems to be worked at home. Self-study of the VHDL language
    with help of the CBT CD from Doulouse.
Project (40% of final grade)
    Some design exercises to be done in the lab. Specify, design and
    simulate a small VHDL design project using a data-path / finite
    state machine. Place & route it on an FPGA target technology
    (due date: July 19th, 2002 at 13h00).
Test (60% of final grade)
    One 70-minute in-class test. Meant to be duck soup if you’ve been
    coming to lectures and doing the lab and homework (date: Friday
    July 12th, 2002).
Timetable 4th Semester:
Introduction to VLSI System Design
Date        Topic                            Self-Study
11-15.3.    vlsi1: history & complexity      A VLSI tutorial
18-22.3.    vlsi8: micro technologies        How a silicon int.
25-29.3.    --
11-19.4.    vlsi8: micro technologies        article Hoff
22-26.4.    vlsi21: top-down design, VHDL    VHDL/CBT
29.4-3.5.   Ex400, 401                       VHDL/CBT
6-10.5.     --                               VHDL/CBT
13-17.5.    vlsi21 & Ex402                   VHDL
20-24.5.    vlsi21 & Ex404,405               VHDL
27-31.5.    vlsi21 & Ex406-408               VHDL
3-7.6.      vlsi21 & Ex409                   chapter 5
10-14.6.    vlsi21 & Ex410                   VHDL finish
17-21.6.    Ex450                            project
24-28.6.    Ex451                            project
1-5.7.      Ex452                            project
8-12.7.     Test                             project
15-19.7.    test discussion and outlook      project
19.7. at 13h00: project due
So, what’s VLSI Systems Design all about?
You’ll get a bottom-up tour of how integrated circuits are engineered. We’ll talk about
field-effect transistors: how they work, how they’re built, effects of new technologies
various design and layout techniques, from the ordinary to the bizarre, for creating combinational and sequential circuits, datapaths, memories, buffers, regular logic structures, …
how you tackle the problem of designing circuits with 1,000,000 gates -- you’re not in Digital Technique anymore!
Key Technology Microelectronics
What is a VLSI Circuit?
Course Outline/Brief history
Early integration
Jack Kilby, working at Texas Instruments, first dreamed up the idea
of a monolithic “integrated circuit” in July 1958. By the end of the
year, he had constructed several examples, including the flip-flop
shown in the patent drawing above. Components are connected by
hand-soldered wires and isolated by “shaping” and pn diodes used as
resistors.
Practice makes perfect...
1961: TI and Fairchild introduced the first logic ICs (cost ~$50 in
quantity!). This is a dual flip-flop with 4 transistors.
[Figure: die photos with dimension marks 1.5 mm, 0.97 mm and 3.81 mm.]
The Big Bang
2.87 mm
Exponential Growth
Today
AVP-III Video Codec from Lucent Technologies
CAD Tools #1
“Computer-Aided Design”
CAD Tools #2
Problem:
designing highly complex VLSI circuits (100K to xM fets)
classical, iterative procedures are unsuitable
precise transistor models are necessary for reliable predictions → data inflation
Solution:
new design methodologies
powerful design tools
high level design languages
silicon compiler would be useful
VLSI Design Challenge
Goal:
designing circuits with increasing complexity in
always shorter times
Chip Complexity #1
Chip Complexity #2
Architecture
(Multiple choice)
This is a picture of
ANSWER: _________
Circuit Design & Layout
Standard cell Full custom
RAM Generator
VLSI: The Ideal Implementation Medium?
VLSI
gives the designer control over almost everything: architecture, logic design, speed, area, power, …
densities are increasing, costs decreasing with each passing year
is used by almost everyone: “No one gets fired for building an ASIC”
was the enabling technology for much of the economic growth of the 80’s and 90’s. It will no doubt continue in its starring role for some time to come.
Is life really a bowl of cherries?
VLSI Fact-of-Life #1:
“So much to do, so little time”
You need a design methodology:
low-level building blocks,
high-level architecture
layout, verification
VLSI Fact-of-Life #2:
“You can’t reach in and fix it”
Notice that the word “verification” kept appearing in the previous slide.
Mistakes can be costly:
step             time        cost
find bug(s)      ?           ?
reverify         1 week      Ecu 10k
new masks        3 days      Ecu 25k
fab run          12 weeks    Ecu 1k/wafer
slip ship date               Ecu Ecu Ecu
VLSI Fact-of-Life #3:
“Verification is a tedious task”
VLSI Fact-of-Life #4:
“You can’t find all the bugs”
The key word here is “find”:
one can’t explore the behaviour of the circuit under all
possible conditions
some of the bugs arise from unanticipated interactions
which, by definition, one never thinks of testing
it’s not clear when one is “done” looking for bugs!
Time pressures mean that most searches stop too soon.
VLSI Fact-of-Life #5:
“Nobody’s perfect”
Microelectronics in 4th Semester
EXPERIENCE VHDL
exercises with data path / fsm
CAD tools project
synthesis
design flow
Course material
Textbook from Weste & Eshraghian for
4th and 5th semester (voluntary)
Copy of transparencies (placeholder for private notes)
Next topic…
Microelectronic technologies like standard cell, gate array, sea-of-gates, macro cell, FPGA, tiny micro-controllers.
VLSI Design I
The MOSFET model
Wow! Are device models as nice as Cindy?
Overview
The large signal MOSFET model and second order effects. MOSFET capacitances.
Introduction to FET process technology
Let’s build a MOSFET
There are lots of different recipes to choose from. Like most things in life, you get what you pay for: the ability to have good bipolar devices, radiation hardness, reduced latch-up and substrate noise, … are all extra cost options. We’ll consider a general process: bulk CMOS with a p-type substrate:
[Figure: p-type substrate. Back is metallized to provide a good ground connection.]
Good for n-channel fets, but p-channel fets will need an n-type “well” (or tub) to live in!
Next, a “thick” (0.4um) layer of silicon dioxide, called
field oxide, is formed on the surface by oxidation in wet
oxygen. This is then etched to expose the surface where we
want to make a mosfet:
On top of the thin oxide a 0.7um thick layer of
polycrystalline silicon, called polysilicon or poly for
short, is deposited by CVD. The poly layer is patterned
and plasma etched (thin ox not covered by poly is etched
away too!) exposing the surface where the source and
drain junctions will be formed:
The entire surface is doped, either by diffusion or ion
implantation, with phosphorus (an electron donor) which
creates two n-type regions in the substrate. The
phosphorus also penetrates the poly, reducing its resistance
and affecting the nfet’s threshold.
[Figure: two n+ regions in the p substrate. n+ wires: 20-30 ohms/sq.; poly: ???]
NFET Operation
Picture shows configuration when Vgs < Vto
S G D
Ids = 0
n+ n+
S D
FET = field effect transistor
The four terminals of a fet (gate, source, drain and bulk)
connect to conducting surfaces that generate a complicated
set of electric fields in the channel region which depend on
the relative voltages of each terminal.
[Figure: cross-section with gate, source, drain and bulk, showing the horizontal field Eh and the vertical field Ev. Picture shows configuration when Vgb > Vto; inversion happens under the gate.]
INVERSION: A sufficiently strong vertical field will attract enough electrons to the surface to create a conducting n-type channel between the source and drain.
CONDUCTION: If a channel exists, a horizontal field will cause a drift current from the drain to the source. Expect Ids proportional to Vds*(W/L)?
Threshold voltage
The gate voltage required to form the channel is called the threshold
voltage. Many factors affect the gate-source voltage at which the
channel becomes conductive. Threshold voltage for zero source-bulk voltage:
$$V_{T0} = V_{t\text{-}mos} + V_{fb}$$
$$V_{T0} = 2\phi_F + \frac{Q_b}{C_{ox}} + \phi_{ms} - \frac{Q_{fc}}{C_{ox}}, \qquad C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$$
Terms annotated on the slide:
$$2\phi_F = 2\,\frac{kT}{q}\ln\frac{N_A}{n_i}, \qquad Q_b = \sqrt{2\,\varepsilon_{si}\,q\,N_A\,2\phi_F}, \qquad \frac{kT}{q}\ln\frac{N_D N_A}{n_i^2}$$
VTO = 0.61V for n-channel, -0.61V for p-channel
Body effect (second order)
As Vsb increases, the depth of the depletion region
increases, exposing more of the fixed acceptor (i.e.
negative) ions in the substrate.
Thus the bulk charge in the second term of the threshold voltage
equation on the previous slide increases from
$$\sqrt{2\,\varepsilon_{si}\,q\,N_A\,2\Phi_F}$$
to
$$\sqrt{2\,\varepsilon_{si}\,q\,N_A\,(V_{sb} + 2\Phi_F)}$$
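Dividing the two bulk-charge expressions by Cox gives the usual threshold shift dVt = γ(√(Vsb + 2ΦF) − √(2ΦF)) with γ = √(2 ε_si q N_A)/Cox. A minimal Python sketch of that arithmetic follows, using the Alcatel 0.5 µm values quoted elsewhere in these slides (N_A = 4E16 cm^-3, tox = 10 nm, 2ΦF = 0.77 V). The function names and the relative permittivities 11.7 (silicon) and 3.9 (oxide) are assumptions of this sketch, not taken from the slide.

```python
import math

# Physical constants in cm-based units (assumed textbook values)
Q_E = 1.602e-19          # electron charge [C]
EPS0 = 8.854e-14         # vacuum permittivity [F/cm]
EPS_SI = 11.7 * EPS0     # silicon permittivity (assumed eps_r = 11.7)
EPS_OX = 3.9 * EPS0      # oxide permittivity (assumed eps_r = 3.9)

def body_factor(n_a_cm3, t_ox_cm):
    """Bulk threshold parameter gamma [V^0.5]."""
    c_ox = EPS_OX / t_ox_cm                      # oxide capacitance [F/cm^2]
    return math.sqrt(2 * EPS_SI * Q_E * n_a_cm3) / c_ox

def threshold_shift(v_sb, gamma, two_phi_f):
    """Increase of the threshold voltage caused by a source-bulk bias v_sb."""
    return gamma * (math.sqrt(v_sb + two_phi_f) - math.sqrt(two_phi_f))

gamma_n = body_factor(4e16, 1e-6)                # tox = 10 nm = 1e-6 cm
dvt_n = threshold_shift(2.2, gamma_n, 0.77)
print(f"gamma = {gamma_n:.3f} V^0.5, dVt = {dvt_n:.3f} V")
```

With these inputs the sketch lands on the values quoted as results for exercises vlsi2.1 and vlsi2.2 (γ of about 0.334 V^0.5 and dVtn of about 0.282 V).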
Basic DC equations
“Linear” operating region
Conditions: Vgs > Vt, 0 < Vds < Vdsat
$$I_{ds} = \frac{W}{L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\left[(V_{gs}-V_t)\,V_{ds} - \frac{V_{ds}^2}{2}\right]$$
max value at Vds = Vdsat, but then channel is pinched off (see next slide)
only linear when Vds is small, otherwise parabolic
Saturated operating region
Conditions: Vgs > Vt, Vdsat < Vds
$$I_{ds(sat)} = \frac{W}{2L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\,(V_{gs}-V_t)^2$$
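Channel-length modulation (second order)
Conditions: Vgs > Vt, Vdsat < Vds
[Figure: the pinch-off point moves towards the source by dL, so the effective channel length is L’ = L - dL and Ids rises slightly with Vds.]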
NFET Ids curves
“Put it together and what have you got?”
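Putting the cut-off, linear and saturated equations together gives the simple level-1 current model. The Python sketch below is one possible way to write it down; `ids_nfet` and its arguments are names chosen here, `beta` stands for (W/L)·µ·εox/tox, and applying the channel-length modulation factor (1 + λ·Vds) in both conducting regions is an assumption borrowed from the common SPICE level-1 convention, not something stated on the slides.

```python
def ids_nfet(vgs, vds, beta, vt, lam=0.0):
    """Level-1 drain current [A] of an nfet for vds >= 0.

    beta = (W/L) * mu * eps_ox / t_ox, vt = threshold voltage,
    lam  = channel-length modulation parameter (0 switches it off).
    """
    if vgs <= vt:
        return 0.0                                   # cut-off
    vdsat = vgs - vt
    if vds < vdsat:
        ids = beta * ((vgs - vt) * vds - vds ** 2 / 2)   # linear region
    else:
        ids = beta / 2 * vdsat ** 2                      # saturated region
    return ids * (1 + lam * vds)

# Example: beta = 200 uA/V^2 (W=1um, L=0.5um), Vt = 0.61 V
for vgs in (1.0, 2.0, 3.3):
    row = [ids_nfet(vgs, vds, 200e-6, 0.61) for vds in (0.5, 1.5, 3.3)]
    print(vgs, ["%.1f uA" % (i * 1e6) for i in row])
```

Sweeping Vds for a few fixed Vgs values with this function reproduces the familiar family of Ids curves.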
SPICE Models
There are different models used in circuit simulators:
level 1 is our simple model, including the most important second order effects described
level 2 is a model based on device physics
level 3 is a semi-empirical model that allows the equations to be matched to the real circuit; example: the BSIM model from Berkeley models subthreshold characteristics
.
M1 4 3 5 0 nfet W=1u L=0.5u AS=1p AD=1p PS=3u PD=3u
.
.
.MODEL nfet NMOS
+TOX=1E-8
+CGBO=345p CGSO=138p CGDO=138p
+CJ=775u CJSW=344p MJ=0.35 MJSW=0.26 PB=0.75
+. . . .
.
.
MOSFET Capacitance Estimation
The dynamic response of MOS systems strongly
depends on the parasitic capacitances associated with
the MOS device. The total load capacitance on the
output of a CMOS gate is the sum of:
gate capacitance (of other inputs connected to out)
diffusion capacitance (of drain/source regions)
routing capacitances (output to other inputs)
[Figure: MOSFET with its parasitic capacitances: Cgd (gate-drain), Cgs (gate-source), Cgb (gate-substrate), Cdb (drain-substrate) and Csb (source-substrate).]
MOSFET gate capacitances
Cg = Cgd + Cgs + Cgb
Oxide-related capacitances come in two forms:
channel-charge related capacitances (intrinsic):
region        Cgb         Cgs             Cgd
cut-off       Cox W L     0               0
linear        0           0.5 Cox W L     0.5 Cox W L
saturation    0           0.67 Cox W L    0
linear: Cgb is shielded by the channel; the channel capacitance is equally shared between S and D
note capacitive coupling of gate and drain/source
saturation: channel pinched off, channel shortened
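The intrinsic terms above can be tabulated per operating region in a few lines. The Python sketch below does that and adds the overlap terms CGSO·W, CGDO·W and CGBO·2L using the Alcatel 0.5 µm values from the process-parameter slide; how the overlap terms are combined with the intrinsic ones is an assumption of this sketch, as are all names in it.

```python
COX = 3.45e-3      # oxide capacitance [F/m^2] (3.45E-7 F/cm^2)
CGSO = 1.38e-10    # gate-source overlap capacitance per W [F/m]
CGDO = 1.38e-10    # gate-drain overlap capacitance per W [F/m]
CGBO = 3.45e-10    # gate-bulk overlap capacitance per 2L [F/m]

# Fractions of Cox*W*L assigned to (Cgb, Cgs, Cgd) in each region
INTRINSIC = {
    "cut-off":    (1.0, 0.0, 0.0),
    "linear":     (0.0, 0.5, 0.5),
    "saturation": (0.0, 0.67, 0.0),
}

def gate_caps(region, w, l):
    """Return Cgb, Cgs, Cgd [F] for a fet of width w and length l [m]."""
    c_channel = COX * w * l
    f_gb, f_gs, f_gd = INTRINSIC[region]
    return {
        "Cgb": f_gb * c_channel + CGBO * 2 * l,
        "Cgs": f_gs * c_channel + CGSO * w,
        "Cgd": f_gd * c_channel + CGDO * w,
    }

caps = gate_caps("cut-off", 1e-6, 0.5e-6)
c_gate = sum(caps.values())
print(f"Cg = {c_gate * 1e15:.2f} fF")
```

For W = 1 µm and L = 0.5 µm the total comes out close to the 2.35 fF quoted as the result of exercise vlsi2.4.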
MOSFET diffusion capacitances
Junction capacitances Cdb and Csb are a function of the
applied terminal voltages and diffusion dimensions:
source/drain diffusion
xj
channel
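The voltage and geometry dependence can be made concrete with the usual level-1 junction model, C = CJ·A·(1 + V/PB)^(-MJ) + CJSW·P·(1 + V/PB)^(-MJSW), where A is the diffusion area, P its perimeter and V the reverse bias. That formula is an assumption here (it is the common SPICE form, not copied from this slide); the parameter values are the nmos entries of the Alcatel 0.5 µm table and the function name is hypothetical.

```python
CJ = 7.75e-4      # zero-bias bottom capacitance per unit area [F/m^2]
CJSW = 3.44e-10   # zero-bias sidewall capacitance per unit perimeter [F/m]
MJ = 0.35         # grading coefficient, bottom
MJSW = 0.26       # grading coefficient, sidewall
PB = 0.7556       # built-in junction potential [V]

def junction_cap(area, perimeter, v_reverse):
    """Diffusion capacitance [F] of a reverse-biased source/drain region."""
    bias = 1 + v_reverse / PB
    bottom = CJ * area * bias ** (-MJ)
    sidewall = CJSW * perimeter * bias ** (-MJSW)
    return bottom + sidewall

c_db = junction_cap(1e-12, 3e-6, 3.0)   # A = 1 um^2, P = 3 um, Vdb = 3 V
print(f"Cdb = {c_db * 1e15:.2f} fF")
```

With the geometry of exercise vlsi2.4 this gives a value slightly above 1 fF, in the neighbourhood of the 1.2 fF quoted there.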
P-channel MOSFETs
[Figure: cross-section with S, G, D and B terminals; p+ source and drain regions in an n well inside the p substrate.]
threshold voltage is negative since we need to attract holes to form the inversion layer
PFET is built inside its own “substrate”: an n-type well or tub diffused into the p-type bulk substrate.
Don’t forget well contacts!
Other symbols: [Figure: pfet symbols with G, S and D.] Terminal with lower voltage is labelled D, the other is labelled S
Depletion-mode MOSFETs
S G D
n+ n+
Coming Up...
Next topic…
Static characteristics of MOS inverters: input
and output voltages, noise margins, power
dissipation.
CBT:
Study the chip fabrication text of the University of
Manchester, linked from the MicroLab VLSI course web page.
Useful Constants
Alcatel 0,5um Process Parameters
sym    param   nmos       pmos       units        description
Vt0    VTO     0.61       -0.61      V            threshold voltage
tox    TOX     1E-8       1E-8       m            thin oxide thickness
NA     NSUB    4E16       4E16       cm^-3        substrate doping density
µ      UO      290        72         cm^2/Vs      charge mobility
k      KP                            A/V^2        fet gain factor
γ      GAMMA                         V^0.5        bulk threshold param.
Cox    COX                           F/m^2        oxide capacitance
λ      α/L                           V^-1         channel length modulat.
α              1e-8       2e-8       V^-1 m^-1    channel length mod fact.
φ0     PB      0.7556     0.78469    V            built in junction potent.
2φF    PHI     0.77       0.77       V            surface inversion pot.
Cgb0   CGBO    3.45E-10   ditto      F/m          overlapping cap per 2L
Cgs0   CGSO    1.38E-10   ditto      F/m          overlapping cap per W
Cgd0   CGDO    1.38E-10   ditto      F/m          overlapping cap per W
Cj     CJ      7.75E-4    8.15E-4    F/m^2        zero-bias cap / unit A
Cjsw   CJSW    3.44E-10   3.54E-10   F/m          zero-bias cap per unit P
Mj     MJ      0.35       0.36                    grading coeff for bottom
Mjsw   MJSW    0.26       0.27                    grading coeff sidewall
Exercises: VLSI-2
Ex vlsi2.1 (difficulty: easy): Calculate the missing parameters on the previous transparency like intrinsic transconductance k, bulk threshold parameter γ and oxide capacitance Cox of an nfet (Alcatel 0.5µm process)
Result: kn=100µA/V^2, kp=24.9µA/V^2, γ=0.334V^0.5, Cox=3.45E-7 F/cm^2 (see Weste pp48ff)
Ex vlsi2.2 (difficulty: easy): Calculate the threshold voltage shift due to the body effect of an nfet at Vsb = 2.2V (Alcatel 0.5µm process)
Result: dVtn = 0.282V (see Weste pp55)
Ex vlsi2.3 (difficulty: easy): Calculate the transconductance βn of an nfet (Alcatel 0.5µm process), W=1µm, L=0.5µm
Result: βn=200µA/V^2 (see Weste pp53)
Ex vlsi2.4 (difficulty: easy): Calculate the capacitances of an nfet with Vsb=Vdb=3V, W=1µm, L=0.5µm, A=1µm^2, P=3µm (Alcatel 0.5µm process)
Result: Cgate=2.35fF, Cdrain=Csource=1.2fF (see Weste pp183-191)
VLSI Design I
Static characteristics of MOS inverter
Static characteristics?
Does that mean it’s not
going to move?
Overview
Static transfer characteristic of CMOS gates
NFET Review
[Figure: nfet symbols with G, D, S; Vgs is measured from gate to source, Vds >= 0 from drain to source.]
cut-off: Vgs < Vt
linear: Vgs >= Vt, Vds < Vdsat
saturation: Vgs >= Vt, Vds >= Vdsat
with Vdsat = Vgs - Vt
[Plot: Ids vs. Vds for several values of Vgs.]
PFET Review
[Figure: pfet symbols with G, D, S; Vgs is measured from gate to source, Vds <= 0 from drain to source.]
cut-off: Vgs > Vt
linear: Vgs <= Vt, Vds > Vdsat
saturation: Vgs <= Vt, Vds <= Vdsat
with Vdsat = Vgs - Vt
[Plot: -Ids vs. -Vds for several values of -Vgs.]
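The region conditions of the two review slides translate directly into code. In the Python sketch below (function names chosen here for illustration) the pfet case is handled by negating all voltages and reusing the nfet test, which is equivalent to the flipped inequalities listed for the PFET.

```python
def nfet_region(vgs, vds, vt):
    """Operating region of an nfet (vt > 0, vds >= 0)."""
    if vgs < vt:
        return "cut-off"
    vdsat = vgs - vt
    return "linear" if vds < vdsat else "saturation"

def pfet_region(vgs, vds, vt):
    """Operating region of a pfet (vt < 0, vds <= 0): mirror of the nfet."""
    return nfet_region(-vgs, -vds, -vt)

print(nfet_region(3.3, 0.5, 0.61))      # linear
print(nfet_region(1.5, 3.3, 0.61))      # saturation
print(pfet_region(-0.3, -1.0, -0.61))   # cut-off
```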
“Bipolar” Logic
Isn’t this a
CMOS course?
Vin Vout
Characterizing Inverters
What goals do we want to achieve with our inverter
implementation (and, more generally, other functions)?
fast propagation delay (next lecture!)
low power dissipation
compact layout
noise immunity
Draw voltage-transfer curve (VTC) for inverter.
Shade-in areas that VTC can’t enter.
What can we say about gain?
What is “ideal” inv. VTC?
[Figure: empty Vout vs. Vin axes with VOL and VOH marked on the Vout axis and VIL and VIH marked on the Vin axis.]
Noise Margin
Are there other ways of signalling?
[Figure: output characteristics next to input characteristics between Vss and Vdd. Output side: logical high output range from VOHmin to Vdd, logical low output range from Vss to VOLmax. Input side: logical high input range from VIHmin to Vdd, logical low input range from Vss to VILmax.]
Choosing signal voltages
This is a subject on which reasonable people
can disagree! One possible line of attack:
Step 1: pick VIL and VIH. We don’t want to amplify noise, so find the values of Vin where the VTC gain = 1 or -1, using the merged VTC for all process corners & devices. Choose the smallest VIL and largest VIH.
Step 2: pick VOL and VOH. Choose values so that (1) the VTC is in legal territory and (2) the desired noise margins NML and NMH are left.
[Figures: merged VTC (Vout vs. Vin) with VIL and VIH marked; VTC with VOL, VOH, VIL, VIH and the noise margins NML and NMH marked.]
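Once the four levels are chosen, the noise margins follow from the usual definitions NML = VIL − VOL and NMH = VOH − VIH. A small Python sketch (the function name is chosen here), evaluated with the levels quoted as the result of exercise vlsi3.2:

```python
def noise_margins(vil, vih, vol, voh):
    """Return (NML, NMH): how much noise a low / high level can tolerate."""
    nml = vil - vol     # low level: worst-case output low vs. input low limit
    nmh = voh - vih     # high level: worst-case output high vs. input high limit
    return nml, nmh

nml, nmh = noise_margins(vil=1.39, vih=1.91, vol=0.26, voh=3.04)
print(f"NML = {nml:.2f} V, NMH = {nmh:.2f} V")
```

For that symmetric inverter both margins come out equal, matching the NML quoted in the exercise.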
Inverter pulldown devices
The NFET makes an ideal pulldown device:
Ipd
Vin
VIL
Vt0
always > Vt0
Inverter pullup devices
Resistor. No load on input, VOH=Vdd.
Will dissipate static power; increasing R will reduce
power and increase noise margin, but low-to-high
transition gets slower. Only practical if process
supports undoped poly which has sheet resistance of 10M
Ohm/square.
Depletion-mode NFET. No load on input, VOH=Vdd.
Connecting gate to source sets Vgs = 0 so Ipu is
determined only by Vout. Layout can be compact since
pullup is in same well as pulldown; buried contact can be
used to connect gate to source. Only found in NMOS
processes.
Enhancement-mode NFET. VOH=Vdd-Vt unless gate of
pullup is driven above Vdd. If gate is not switched off,
pullup needs to be weak to avoid excessive power
dissipation, but this may entail larger layouts. Useful
where PFETs not wanted (e.g., some I/O structures).
Pseudo-NMOS using saturated PFET as load
device. VOH=Vdd. Useful for building large fan-in NOR
gates found in static ROMs and PLAs where static power
dissipation is okay.
Inverter with PFET pullup
[Figure: CMOS inverter. Pullup pfet: Vgs,pu = Vin-Vdd, Vds,pu = Vout-Vdd. Pulldown nfet: Vgs,pd = Vin, Vds,pd = Vout.]
negligible steady-state power dissipation
VOL = 0V, VOH = Vdd
VTC transition very sharp
switching point can be adjusted by fet sizing
[Plot: VTC (Vout vs. Vin, both from 0 to Vdd) with the line Vin = Vout and the operating regions n=off/p=lin, n=sat/p=lin, n=sat/p=sat, n=lin/p=sat, n=lin/p=off; curves for Wn/Wp>1 and Wn/Wp<1. The transition is non-vertical only because of channel-length modulation.]
Build your own VTC
In the steady state:
Ids,pd(Vin,Vout) = -Ids,pu(Vin-Vdd,Vout-Vdd)
[Figure: load-line plots of Ids,pd and -Ids,pu vs. Vout for Vin = 0.5V, 1.5V, 2.5V, 3.5V and 4.5V; the intersections give the points of the VTC (Vout vs. Vin).]
When both fets are saturated, small changes
in Vin produce large changes in Vout
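When both fets are saturated, the steady-state condition reduces to (βn/2)(Vin − Vtn)² = (βp/2)(Vin − Vdd − Vtp)², which can be solved for the switching point Vinv where Vin = Vout. The Python sketch below is a minimal illustration of that step; the closed form and the function name are derived here rather than quoted from the slide, Vtp is taken as a negative number, and βn/βp is formed from the Alcatel mobilities (290 and 72 cm^2/Vs) times the W/L ratios.

```python
import math

def v_inv(vdd, vtn, vtp, beta_n, beta_p):
    """Inverter switching point: both fets saturated, equal currents."""
    r = math.sqrt(beta_n / beta_p)
    return (vdd + vtp + r * vtn) / (1 + r)

VDD, VTN, VTP = 3.3, 0.61, -0.61
MU_N, MU_P = 290.0, 72.0          # Cox cancels in the beta ratio

cases = {"a": (1, 1), "b": (10, 1), "c": (1, 10)}   # (Wn/Ln, Wp/Lp)
results = {}
for name, (wl_n, wl_p) in cases.items():
    results[name] = v_inv(VDD, VTN, VTP, MU_N * wl_n, MU_P * wl_p)
    print(f"{name}) Vinv = {results[name]:.3f} V")
```

The three cases are the configurations of exercise vlsi3.1, and the sketch reproduces the results quoted there (about 1.30 V, 0.893 V and 1.88 V).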
Ben Bitdiddle’s Buffer!
Vin Vout
Coming Up...
Next topic…
Dynamic characteristics of MOS inverters:
propagation delay, effects of rise and fall times.
Transistor sizing, interconnect issues, estimating
performance.
Readings for next time…
Weste:
Sections 2.3 through 2.3.2
Exercises: VLSI-3
Ex vlsi3.1 (difficulty: easy): Calculate the CMOS inverter threshold values for the following configurations (Alcatel 0,5µm process, VDD=3,3V)
a) Wn = Ln, Wp = Lp
b) Wn = 10 Ln, Wp = Lp
c) Wn = Ln, Wp = 10 Lp
Result: a) Vinv = 1.30V, b) Vinv = 0.893V, c) Vinv = 1.88V (see Weste pp66)
Ex vlsi3.2 (difficulty: medium, time consuming): Calculate the noise margin and VIL, VIH, VOL, VOH, for a CMOS inverter operating at 3.3V with βn=βp, Utn=-Utp=0.61V.
Result: VIL = 1.39V, VIH = 1.91V, VOL = 0.26V, VOH = 3.04V, NML = 1.13V
Weste pp99: 2.10 ex 5 (difficulty: medium, time consuming): Design an input buffer that may be used to interface with a TTL driver (Vdd=3.3V, VOL=0.8V, VOH=2.0V). Show full derivations of DC conditions. Assume Wn = 1µm and Ln = Lp = 0.5µm
Result: Wp = 1.51µm
VLSI Design I
Dynamic characteristics of MOS inverters
Overview
gate delay modeling
power dissipation
JMM v1.3
Static properties reviewed
sharp transition: inverter good receiver for voltage-based signalling
[Plot: Vout and Ids,n vs. Vin; the transition shifts one way with increasing Wn / decreasing Wp and the other way with increasing Wp / decreasing Wn.]
VOH=Vdd, VOL=0, sharp transition => good noise margins
VOH=Vdd => pfet off when Vin=VOH => no static power
VOL=0 => nfet off when Vin=VOL => no static power
Choosing what to measure
[Figure: Vin and Vout waveforms vs. t with the 10% and 90% levels, showing tr, tf and td; the reference level for td is marked ???.]
Rise time, tr = time for a waveform to rise from 10% to
90% of its steady-state value
Fall time, tf = time for a waveform to fall from 90% to
10% of its steady-state value
Delay time, td = time between input transition (when Vin
= ???) and output transition (when Vout = ???).
If ??? = Vinv, can delay be negative?
does Vinv differ for each gate?
so does td(seq. of gates) = sum(td)?
should we choose 50% instead of Vinv?
Signal delay time
Signal delay time is composed as follows
gate delay time
[Figure: switch level model of a fet (R, C) and switch level model of an inverter (Rp to UCC, Rn to ground, input capacitance Cin).]
Fall time analysis #1
[Plot: inverter VTC (Vout vs. Vin up to Vdd) with the line Vin = Vout and the operating regions of both fets, comparing the static transition with the dynamic transition at speed.]
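Fall time analysis #2
Saturated: Vout >= Vdd - Vt,n
[Figure: saturated nfet discharging CL with the current Idsat,n.]
$$C_L\,\frac{dV_{out}}{dt} = -\frac{\beta_n}{2}\,(V_{dd} - V_{t,n})^2$$
So, time to fall from 0.9Vdd to Vdd - Vt,n is given by
$$\frac{2\,C_L}{\beta_n\,(V_{dd} - V_{t,n})^2}\int_{V_{dd}-V_{t,n}}^{0.9\,V_{dd}} dV_{out}$$
Including the linear region, with n = Vt,n/Vdd:
$$t_f = \frac{C_L}{\beta_n\,V_{dd}}\cdot\frac{2}{1-n}\left[\frac{n-0.1}{1-n} + 0.5\,\ln(19-20n)\right]$$
The factor multiplying CL/(βn Vdd) equals 3 to 4 for Vdd=3V-5V and Vt,n=.5V-1V; it equals 3.6 for C05M.
tr is similar.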
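The fall-time expression is easy to evaluate numerically. The Python sketch below (names chosen here) computes the dimensionless factor multiplying CL/(βn·Vdd) and the resulting fall time, assuming n = Vt,n/Vdd; the 50 fF load and βn = 200 µA/V^2 in the example are illustrative inputs, not values from this slide.

```python
import math

def fall_time_factor(n):
    """Factor k in tf = k * CL / (beta_n * Vdd), with n = Vtn / Vdd."""
    return 2 / (1 - n) * ((n - 0.1) / (1 - n) + 0.5 * math.log(19 - 20 * n))

def fall_time(c_load, beta_n, vdd, vtn):
    """90% -> 10% output fall time [s] for a step input."""
    return fall_time_factor(vtn / vdd) * c_load / (beta_n * vdd)

k = fall_time_factor(0.61 / 3.3)
tf = fall_time(50e-15, 200e-6, 3.3, 0.61)
print(f"k = {k:.2f}, tf = {tf * 1e12:.0f} ps")
```

For Vtn = 0.61 V and Vdd = 3.3 V the factor evaluates to about 3.6, in line with the value stated for C05M.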
Estimating delays
In most CMOS circuits, the delay of a single gate is
dominated by the output rise and fall time. Thus:
$$t_{dr} = \frac{t_r}{2}, \qquad t_{df} = \frac{t_f}{2}$$
Input rise/fall & delay
How do input rise and fall times affect delay?
fast inputs will quickly turn off one mosfet and provide
maximum Vgs to the driving mosfet for most of the output
transition
slow inputs will leave both mosfets on longer, reducing
effective current to/from load capacitance and Vgs will be
lower than above.
So we might expect slower input transitions to lead to
longer output delay times.
One rule of thumb (Weste, p. 216ff):
$$t_{dr} = t_{dr\text{-}step} + \frac{1 + 2n}{6}\,t_{f,in}$$
$$t_{df} = t_{df\text{-}step} + \frac{1 - 2p}{6}\,t_{r,in}$$
n is ~0.2 for Vtn = 0.61V, Vdd = 3.3V
Bootstrapping & delay
CGD
Multiple inputs & delay
[Figure: series stack with inputs A, B, C, D, output capacitance Cout and intermediate node capacitances Cab, Cbc, Ccd.]
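Driving large loads #1
If large loads have to be driven, the delay may increase
drastically. Large loads are output capacitances, clock trees,
etc.
Single inverter (size 1, input capacitance CG) driving CL = 1000 CG:
$$t_d = t_{inv}\,\frac{C_L}{C_G} = 1000 \cdot t_{inv}$$
Three inverters of size 1, 40 and 200 driving CL = 1000 CG:
$$t_d = \frac{40}{1}\,t_{inv} + \frac{200}{40}\,t_{inv} + \frac{1000}{200}\,t_{inv} = 50 \cdot t_{inv}$$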
Driving large loads #2
To drive a large load capacitance one might
employ a sequence of n inverters, each a factor “a” larger
than the previous one:
[Figure: n=4 inverters of relative size 1, a, a^2, a^3 between CG and CL.]
[Plot with the annotation: in practice a=3...5]
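The delay bookkeeping for such buffer chains is a sum of load-to-drive ratios, one per stage. The Python sketch below (function names chosen here) expresses stage sizes and the load in units of CG and the delay in units of t_inv; it reproduces the single-inverter and 1-40-200 figures for a 1000 CG load and lets other tapering factors be tried.

```python
def chain_delay(sizes, c_load):
    """Delay of an inverter chain in units of t_inv.

    sizes  : relative size (= input capacitance in CG) of each stage
    c_load : final load capacitance in CG
    """
    loads = list(sizes[1:]) + [c_load]     # each stage drives the next one
    return sum(load / size for size, load in zip(sizes, loads))

def geometric_chain(a, c_load):
    """Stage sizes 1, a, a^2, ... kept below the load capacitance."""
    sizes = [1.0]
    while sizes[-1] * a < c_load:
        sizes.append(sizes[-1] * a)
    return sizes

print(chain_delay([1], 1000))              # single inverter
print(chain_delay([1, 40, 200], 1000))     # hand-picked three-stage chain
for a in (3, 4, 5, 10):
    sizes = geometric_chain(a, 1000)
    print(a, len(sizes), round(chain_delay(sizes, 1000), 1))
```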
Power dissipation #1
Power dissipation #2
dc power dissipation: short circuit current (power to
ground) due to switching
ac power dissipation: capacitor current (charging, re-charging)
due to switching
$$I_0 = I_S\left(e^{qV/kT} - 1\right)$$
$$P_S = \sum I_0 \cdot V_{DD}$$
Dynamic power dissipation #1
Comparison of dynamic short circuit current vs.
capacitive current.
As expected, the short circuit current makes a less
important contribution when the load gets large.
Slower input transitions would increase the short circuit
current.
[Figure: three inverters (fets with W/L=4 and W/L=2) driven by Uin, with outputs Uout-A, Uout-B (50fF load) and Uout-C (200fF load) and the currents Idsn and Idsp.]
Dynamic power dissipation #2
Average dynamic power for switching a square-wave input
with a repetition frequency of fp = 1/tp is (capacitor
current)
$$P_d = \frac{1}{t_p}\int_0^{t_p/2} i_n(t)\,V_{out}\,dt + \frac{1}{t_p}\int_{t_p/2}^{t_p} i_p(t)\,(V_{DD} - V_{out})\,dt$$
$$P_d = \frac{C_L\,V_{DD}^2}{t_p} = C_L\,V_{DD}^2\,f_p$$
proportional to switching frequency but independent
of device parameters
Dynamic power dissipation #3
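[Figure: input ramp crossing Vtn and VDD+Vtp within the period tp, and the resulting short-circuit current pulse between t1 and t3 with peak Imax at t2 and mean value Imean.]
$$P_{sc} = \frac{\beta}{12}\,(V_{DD} - 2V_t)^3\,\frac{t_{rf}}{t_p}$$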
Total power dissipation
Ptotal = Ps + Pd + Psc
switching activity:
nswitching = percentage of switching gates
there exist simulators estimating power dissipation
using the switching activity
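The dynamic and short-circuit terms can be estimated with two one-line formulas, Pd = CL·VDD²·fp and Psc = (β/12)(VDD − 2Vt)³·trf/tp. The Python sketch below strings them together with a switching-activity factor; all names are chosen here, the static term is passed in as a number, and the way the activity factor scales only the switching-related terms is an assumption of this sketch.

```python
def dynamic_power(c_load, vdd, f_p):
    """Capacitive switching power [W]: Pd = CL * VDD^2 * fp."""
    return c_load * vdd ** 2 * f_p

def short_circuit_power(beta, vdd, vt, t_rf, t_p):
    """Short-circuit power [W]: Psc = beta/12 * (VDD - 2Vt)^3 * trf/tp."""
    return beta / 12 * (vdd - 2 * vt) ** 3 * t_rf / t_p

def total_power(p_static, p_dynamic, p_short, activity=1.0):
    """Ptotal = Ps + activity * (Pd + Psc); activity = fraction of gates switching."""
    return p_static + activity * (p_dynamic + p_short)

# Illustrative numbers: 50 fF load, 3.3 V, 100 MHz, 1 ns input edges
p_d = dynamic_power(50e-15, 3.3, 100e6)
p_sc = short_circuit_power(200e-6, 3.3, 0.61, 1e-9, 10e-9)
print(f"Pd = {p_d * 1e6:.2f} uW, Psc = {p_sc * 1e6:.2f} uW")
print(f"Ptotal = {total_power(0.0, p_d, p_sc, activity=0.25) * 1e6:.2f} uW")
```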
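Build your own power meter
[Figure: a voltage source Vs = 0 in the supply line senses the supply current Is of the device or circuit (load CL, periodic input Vin(t) = Vin(t+T)). A linear current-controlled current source g*Is drives RY in parallel with CY, giving the voltage Vy with Vy(0) = 0V.]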
Power and ground bounce
Metal power-carrying conductors have to be sized
for three reasons:
metal migration
power supply noise
RC delay
general rule:
limit current density $J_{AL} \approx 0.4 \ldots 1\,\mathrm{mA}/\mu\mathrm{m}$
contact replication
“It’s the wires, stupid”
As process dimensions shrink, wiring capacitances
start to dominate the mosfet capacitances.
To estimate wiring capacitances, consider the
following figure:
[Figure: wire cross-section at height h above the plane below, showing the parallel-plate capacitance Cpp and the fringing-field capacitance.]
Parallel-plate capacitance given in process
files. Fringing capacitance is significant
when t is comparable to h.
Fringing Capacitance
Figure 6.11 from CMOS Digital Integrated Circuits:
Analysis and Design, by Kang and Leblebici:
Wire model?
Today, the longest wire on a VLSI chip might be 2cm
which has “time of flight” of ~130ps assuming εSiO2
= 3.9 ε0
If the signal rise/fall time is longer than the time of
flight we can model wires as a distributed RC network.
Longer wires or shorter rise/fall times require the wire
to be modelled as a transmission line.
For short wires, a lumped RC model is sufficient. For
longer wires, we use the distributed RC model where
signal propagation can be shown to obey the diffusion
equation:
$$r\,c\,\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$$
where r = R/unit length, c = C/unit length and x = distance from driver
Diffusion Eq. in “real life”
$$t = 2.2\,\frac{r\,c\,l^2}{2}$$
(Weste, Eq. 4.28, but 10% to 90% rise/fall time)
Ex vlsi4.3: clock with 50pF load distributed by 1µm-wide metal wire running from clock buffer in corner of 10mm x 10mm chip.
r = 0.05 ohm/square
c = 50pF/20mm
l = 20mm
a) t = ? b) t = ?
r = 0.0025 ohm/square
c = 50pF/10mm
l = 10mm
c) t = ?
whew!
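Both wire estimates can be checked with a few lines of Python. The sketch below (names chosen here) computes the time of flight l·√εr/c0 for a 2 cm wire in SiO2 and the 10% to 90% distributed-RC time 2.2·r·c·l²/2. For the RC example it uses the first parameter set listed above and assumes that the sheet resistance of a 1 µm wide wire converts to r = 0.05 ohm per µm of length; that conversion is an assumption of this sketch.

```python
import math

C0 = 2.998e8      # speed of light in vacuum [m/s]

def time_of_flight(length_m, eps_r):
    """Propagation time of an EM wave along a wire embedded in a dielectric."""
    return length_m * math.sqrt(eps_r) / C0

def rc_rise_time(r_per_m, c_per_m, length_m):
    """10%-90% rise/fall time of a distributed RC line: 2.2 * r*c*l^2 / 2."""
    return 2.2 * r_per_m * c_per_m * length_m ** 2 / 2

tof = time_of_flight(0.02, 3.9)                 # 2 cm wire in SiO2
r = 0.05 / 1e-6                                 # 0.05 ohm/square, 1 um wide
c = 50e-12 / 20e-3                              # 50 pF spread over 20 mm
t_rc = rc_rise_time(r, c, 20e-3)
print(f"time of flight = {tof * 1e12:.0f} ps, RC rise time = {t_rc * 1e9:.0f} ns")
```

The RC time comes out orders of magnitude longer than the time of flight, which is the regime where the distributed RC model applies.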
Inductance
Bond-wire inductance can cause deleterious effects
in large, high speed I/O buffers
package inductance: 3 .. 15 nH
with process shrinking, on-chip inductance has to be
taken into account
on-chip inductance: 10 .. 50pH/mm
[Figure: inductance L between Vdd and the chip, carrying the current i(t).]
$$dV = L\,\frac{dI}{dt}$$
design techniques:
separate power pins for I/O pads and chip core
multiple power and ground pins
careful selection of the position of the power and
ground pins on the package
adding decoupling capacitances on the board
increase the rise and fall times
use advanced package technologies (SMD, etc)
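A rough feel for the size of the inductive spike comes from dV = L·dI/dt with a crude current estimate. The Python sketch below (names chosen here) assumes each buffer needs an average current C·Vdd/t to charge its load in time t and that this current builds up over the same time t; that simplification is an assumption of this sketch, not a statement from the slide. The example numbers are those of exercise vlsi4.1.

```python
def inductive_spike(n_buffers, c_load, vdd, t_switch, l_bond):
    """Estimate of the supply spike dV = L * dI/dt for simultaneously switching buffers."""
    i_buffer = c_load * vdd / t_switch          # average charging current per buffer [A]
    di_dt = n_buffers * i_buffer / t_switch     # assumed linear current build-up [A/s]
    return l_bond * di_dt

dv = inductive_spike(n_buffers=8, c_load=50e-12, vdd=3.3,
                     t_switch=4e-9, l_bond=15e-9)
print(f"dV = {dv:.2f} V")
```

Under these assumptions the estimate agrees with the 1.24 V quoted as the result of exercise vlsi4.1, and it shows directly why more supply pins (smaller L) or slower edges (larger t) help.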
Coming Up...
Next topic…
Combinational logic: series/parallel switch
networks, transmission gates. Performance
optimisation.
Exercises: VLSI-4
Ex vlsi4.1 (difficulty: easy): Calculate the inductive spike
at the power supply provoked by 8 output buffers, each
driving 50pF in 4ns, Vdd=3.3V, total bonding
inductance 15nH
Result: dVtot = 1.24V (see Weste pp 205)
VLSI Design I
CMOS Combinational Logic
Overview
Euler rules for complex CMOS gates
Layout and stick diagram
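How ‘bout more than 1 input?
Logic recipe:
pullup: make this connection when we want F(A1,…,An) = 1
pulldown: make this connection when we want F(A1,…,An) = 0
[Figure: pullup network between Vdd and the output F(A1,…,An), pulldown network between the output and ground, both controlled by the inputs A1 ... An.]
Finally! I was getting tired of inverters...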
Complementary logic
Now you know what the “C” in CMOS stands for!
We want complementary pullup and pulldown
logic, i.e., the pulldown should be “on” when
the pullup is “off” and vice versa.
pullup   pulldown   F(A1,…,An)
on       off        driven “1”
off      on         driven “0”
on       on         driven “X”
off      off        no connection
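CMOS complements
What a nice VOH you have...
Thanks. It runs in the family...
[Figure: pulldown nfet blocks next to the matching pullup pfet blocks for inputs A and B: a series connection in one block corresponds to a parallel connection in the other.]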
Development of CMOS gates /1
Karnaugh diagram of F:
       A=0   A=1
B=0     1     1
B=1     1     0
Step 1: development of nfet block. Logic minimization of “0” in Karnaugh diagram:
$\overline{F} = A * B$
Development of CMOS gates /2
Step 2: development of pfet block. Logic minimization of “1” in Karnaugh diagram:
$F = \overline{A} + \overline{B}$
Development of CMOS gates /3
Step 3: put nfet and pfet block together:
$F = \overline{A * B}$
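The three steps can be checked mechanically: describe the nfet block as a series/parallel network, form the pfet block as its dual (series and parallel swapped), and confirm that for every input combination exactly one of the two blocks conducts. The Python sketch below is one way to do this; the nested-tuple representation and all function names are choices made for this sketch.

```python
from itertools import product

def conducts(net, inputs, on_level):
    """Evaluate a switch network.

    net is an input name, ("series", ...) or ("parallel", ...);
    a fet conducts when its input equals on_level (1 for nfets, 0 for pfets).
    """
    if isinstance(net, str):
        return inputs[net] == on_level
    kind, *parts = net
    results = [conducts(p, inputs, on_level) for p in parts]
    return all(results) if kind == "series" else any(results)

def dual(net):
    """Swap series and parallel connections (nfet block -> pfet block)."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    other = "parallel" if kind == "series" else "series"
    return (other, *[dual(p) for p in parts])

def is_complementary(pulldown, names):
    """True if exactly one of pulldown / dual pullup conducts for every input."""
    pullup = dual(pulldown)
    for values in product([0, 1], repeat=len(names)):
        inputs = dict(zip(names, values))
        if conducts(pulldown, inputs, 1) == conducts(pullup, inputs, 0):
            return False
    return True

nand_pulldown = ("series", "A", "B")               # F is low when A*B = 1
print(dual(nand_pulldown))                         # ('parallel', 'A', 'B')
print(is_complementary(nand_pulldown, ["A", "B"]))
```

The same check works for more complex gates, for example a pulldown of two series pairs in parallel, whose dual is two parallel pairs in series.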
NAND & NOR
2-input NAND (inputs A, B). When output is low,
two nfets are in series. So to keep
output fall time equivalent to that
of an inverter, the nfets have to be
twice as wide. Pfet widths can be
same as those in the inverter (but
remember there were already 2x nfet
widths!). Can be extended to large
fan-in but practical limit is 4 inputs.
Why?
Pseudo-NMOS NOR gates (inputs A1 … An) are
used to build high fan-in NOR
gates for PLA’s to save area
(at some cost in static power).
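Layout of simple gates
[Figure: inverter layout on a p-type substrate with an n-type well. VDD rail at the top, GND rail at the bottom, IN and OUT in between; pfet with dimensions Wp and Lp, nfet with dimensions Wn and Ln; metal/pdiff contacts (one shown with detail removed) and a contact from metal to ndiff.]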
Layout Rules #1
Layout Rules #2
Stick Diagram
NAND & NOR (again)
Large Fan-In CMOS Gates
[Figure: large fan-in gate built from several AND (&) gates feeding an OR (≥1) gate.]
CMOS Gate Recipe
Step 1. Figure out pulldown
network that does what you
want, e.g., $\overline{F} = A*(B+C)$
[Figure: nfet A in series with the parallel nfets B and C.]
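Complex CMOS Gates /1
Example: $F = \overline{A * B + C * D}$
Step 2: generation of pfet block (logic “1”)
$F = (\overline{A} + \overline{B}) * (\overline{C} + \overline{D})$
[Figure: pfet block with C and D in parallel, in series with A and B in parallel.]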
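Complex CMOS Gates /2
[Figure: nfet block with A in series with B, in parallel with C in series with D, next to the equivalent gate symbol (two & gates feeding a ≥1 gate).]
Where is this signal in the transistor schema?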
Complex CMOS Gates Layout /1
Complex CMOS Gates Layout /2
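[Figure: transistor schematic. Pullup between VDD and F: pfets C and D in parallel (VDD to N1), in series with pfets A and B in parallel (N1 to F). Pulldown between F and VSS: nfet A in series with B (via N2), in parallel with nfet C in series with D (via N3). Next to it the two graphs with nodes VDD, N1, N2, N3, F, VSS and the start points of the Euler paths.]
Euler path: A -> B -> D -> C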
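Finding a gate ordering that is an Euler path in both the pulldown and the pullup graph can be done by brute force for small gates. In the Python sketch below each transistor is an edge labelled with its input, connecting two circuit nodes; the node names follow the figure labels (N1, N2, N3, F, VDD, VSS), while the exact edge lists are an interpretation of that figure and the function names are chosen here.

```python
from itertools import permutations

# Each transistor is an edge between two circuit nodes, labelled by its gate input.
PULLDOWN = {"A": ("F", "N2"), "B": ("N2", "VSS"),
            "C": ("F", "N3"), "D": ("N3", "VSS")}
PULLUP = {"C": ("VDD", "N1"), "D": ("VDD", "N1"),
          "A": ("N1", "F"), "B": ("N1", "F")}

def is_trail(edges, order):
    """True if the edges can be walked in `order`, each edge starting where
    the previous one ended (an Euler path when all edges are used)."""
    first_a, first_b = edges[order[0]]
    # Try both directions for the first edge; afterwards the walk is forced.
    for pos in (first_b, first_a):
        for label in order[1:]:
            a, b = edges[label]
            if pos == a:
                pos = b
            elif pos == b:
                pos = a
            else:
                break
        else:
            return True
    return False

def common_euler_orders(pulldown, pullup):
    """All input orderings that are Euler paths in both graphs."""
    labels = sorted(pulldown)
    return [p for p in permutations(labels)
            if is_trail(pulldown, p) and is_trail(pullup, p)]

orders = common_euler_orders(PULLDOWN, PULLUP)
print(("A", "B", "D", "C") in orders)
print(orders)
```

An ordering found this way lets both diffusion strips be drawn without breaks, with the poly gates crossing them in the same sequence.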
Complex CMOS Gates /3
C D
A B
F
A B D C
Complex CMOS Gates /4
C
A
B
B C
A Quiz! /1
A Quiz! /2
Karnaugh diagram of F:
         CD=00   CD=01   CD=11   CD=10
AB=00      1       1       1       1
AB=01      0       0       0       0
AB=11      0       0       0       0
AB=10      1       0       0       0
Quiz : Solution
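$F = \overline{A}*\overline{B} + \overline{B}*\overline{C}*\overline{D}$
[Figure: stick diagram between the VDD and VSS rails with the inputs A, C, D, internal nodes P1, P2, N1 and output F; the Euler path start points are marked.]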
Transmission Gates
S
CMOS nMOS
A B A B
S
S
CMOS TG Electrical Model
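[Figure: transmission gate between A and B in its two states, with the control inputs at VDD/0 (switch is off) and at 0/VDD (switch is “on”).]
[Plot: resistance R vs. VB from 0V to VDD, showing Req,n, Req,p and Req,TG = Req,n || Req,p, with |VT,p| and VDD-VT,n marked on the VB axis.]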
TG Circuits: MUX
A
Y=A*S+B*S
B
Is this node
always the “output”
S of this gate?
inverter
not drawn
TG Circuits: 4 to 1 MUX
B
F
C
S1
S2
Best XOR in Town
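[Figure: XOR symbol (=1) with inputs A, B and output F, and its gate-level equivalent built from ≥1 and & gates: 12 transistors.]
[Figure: transmission-gate XOR with inputs A, B and output $A*\overline{B} + \overline{A}*B$: 8 transistors. Is this node always the “output” of this gate?]
[Figure: XOR with inputs A, B and output $A*\overline{B} + \overline{A}*B$: 6 transistors. Is this node always the “output” of this gate?]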
TG Quiz
TG Circuits: Problems
Uin Uout
R R R R R
Uin Uout
C C C C C
τ = 2.2 ⋅ (RC )2
Coming Up...
Next topic…
Dynamic (precharge/evaluate) logic circuits:
CMOS domino logic, NP domino logic, CVSL logic.
Charge sharing.
Exercises: VLSI-5 #1
A
B
Z
GND
Exercises: VLSI-5 #2
VLSI Design I
Dynamic Logic Gates
Overview
Dynamic logic gates, Domino, NORA, CVSL structure
Goal: You are familiar with dynamic logic gates and their
different families. You can handle dynamic
logic problems like charge sharing and timing.
Tinkering with Logic Gates
Things to like about CMOS gates:
easy to translate logic to fets
rail-to-rail switching
good noise margins
no static power since fets are in cutoff
sizing not critical to correct operation
Dynamic CMOS Gates
[Figure: dynamic gate with a “precharge” switch, a pulldown network with inputs A and B, and an “evaluate” switch, both switches driven by CLK.]
inputs must be stable before CLK goes high because once output has
been discharged it won’t go high again until next cycle
for same reason, noise/glitches on inputs cannot exceed nfet
threshold, a much more stringent requirement than for static CMOS
gates.
[Timing diagram: clock and output during the precharge phase and the evaluate phase.]
There’s good news & bad news
The good news:
Dynamic gates are faster than static gates despite the extra
“evaluate” fet in the pulldown path because of the reduction in
self-loading and the elimination of the pullup short-circuit current during
the first part of the output transition.
The bad news:
Dynamic gates cannot be cascaded.
CLK
CMOS Domino Logic
[Figure: two cascaded domino stages, each an nfet pulldown network clocked by CLK and followed by a buffer. Dynamic node: precharge: high, evaluate: falls (maybe). Buffer output: precharge: low, evaluate: rises (maybe). The buffer might be needed in any case for high fan-out circuits.]
When CLK goes high, dynamic node is conditionally discharged and the
buffer output will conditionally go high. Since discharge can only
happen once, buffer output can only make one low-to-high transition.
Domino--style Circuits
More Domino
weak pfet “keeper” keeps dynamic node pulled high during
evaluate phase if it’s not being pulled down through nfets Ö
gate is static in both clock phases.
CLK
nfets
“latching” pfet acts like keeper above unless dynamic node
gets pulled down during evaluate phase. When buffer output
goes high it switches keeper off saving static power. Good
for leakage current problems...
! Be careful of capacitive coupling to dynamic node (see later slide).
JMM v1.3
Optimising Domino Logic (I)
nfets nfets
JMM v1.3
Optimising Domino Logic (II)
In domino logic circuits we want evaluate
to happen as quickly as possible. We can
size fets to optimise evaluate speed.
small large
large small
nfets
JMM v1.3
“All that glitters is not gold”
JMM v1.3
Charge Sharing (I)
[Figure: pulldown chain with inputs F=0→1, E=1, D=1, C=1, B=1, A=1→0 above the CLK evaluate fet; the dynamic node has capacitance 3C and the intermediate nodes have capacitances of C or 1.5C, 6C in total]
Suppose the dynamic node has been discharged during the previous
evaluate cycle. Then during precharge, all the intermediate nodes
in the pulldown chain will remain discharged while the dynamic node
is precharged. Calculate the voltage on the dynamic node when CLK
goes high. When CLK goes high, the voltage on the dynamic node goes to
$V = \frac{3C}{3C + 6C}\,V_{DD} = 1.1\,\mathrm{V}$ for $V_{DD} = 3.3\,\mathrm{V}$
which is low enough to switch the output inverter.
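The charge-sharing estimate can be checked numerically. The sketch below applies charge conservation to the capacitances quoted on the slide; the function name and the per-node list are illustrative assumptions, only the 3C dynamic node and the 6C total come from the slide.

```python
def shared_voltage(vdd, c_dynamic, c_intermediate):
    """Voltage on the dynamic node after its charge has been shared with
    initially discharged intermediate nodes (charge conservation)."""
    return vdd * c_dynamic / (c_dynamic + sum(c_intermediate))

# Capacitances in units of C. The individual values are an assumed reading
# of the figure labels; what matters is that they add up to 6C.
intermediate = [1.0, 1.5, 1.5, 1.0, 1.0]
v = shared_voltage(3.3, 3.0, intermediate)
print(f"dynamic node settles at {v:.2f} V")
```

Increasing the dynamic node capacitance, or precharging the intermediate nodes, raises the result towards VDD.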
JMM v1.3
Charge Sharing (II)
n-logic
n-logic n-logic
n-logic
CLK
JMM v1.3
Capacitive Coupling
OUT
CLK
OUT
t
Coupling can also occur between other signal wires and long dynamic
nodes (e.g., ones that span multiple bits in a datapath).
Solutions: on long routes add “twists” to avoid continuous routes or
route dynamic signals between mutually exclusive or complementary
signals.
JMM v1.3
Domino Logic Design
To convert to Domino-style design we need to
create a schematic that uses non-inverting gates:
(1) look for CMOS gates followed by inverter
(2) use De Morgan’s Law to create non-inverting gates
C
D
E Y
F
G
H
Convert to Domino OR gate
Domino AND
A
B X
D
E Y
F
G
H Domino AND-OR
Domino OR
JMM v1.3
Domino Logic Design (II)
X Y
A E G H
B D F
C nfet W/L = 4
pfet W/L = 8
CLK
s = static
d = domino (W/L = 4)
dd = domino (W/L = 8)
JMM v1.3
Dual-rail Domino Logic
Domino circuits that generate both polarities of output
CLK CLK
A A
A B A B
B B
CLK CLK
CLK
A A
A
B B
CLK
JMM v1.3
Multiple-output Domino
Why stop at complementary outputs? There are interesting
multiple-output functions where there is a lot of sharing of nfets in
the evaluate logic. For example, in a carry-lookahead adder
Gi = Ai Bi
Pi = Ai + Bi
C1 = G1 + P1C0
C2 = G2 + P2G1 + P2P1C0
C3 = G3 + P3G2 + P3P2G1 + P3P2P1C0
C4 = G4 + P4G3 + P4P3G2 + P4P3P2G1 + P4P3P2P1C0
CLK
C4
P4 G4
C3
P3 G3
C2
P2 G2
C1
P1 G1
C0
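The sharing of nfets works because each carry can be written in terms of the previous one, Ci = Gi + Pi·Ci-1. A small sketch, with hypothetical function names, that checks this nested form against the expanded equations on the slide for all input combinations:

```python
from itertools import product

def carries_expanded(g, p, c0):
    """Expanded sum-of-products carries C1..C4; g and p are indexed 1..4."""
    c1 = g[1] | (p[1] & c0)
    c2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & c0)
    c3 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & c0)
    c4 = (g[4] | (p[4] & g[3]) | (p[4] & p[3] & g[2])
          | (p[4] & p[3] & p[2] & g[1]) | (p[4] & p[3] & p[2] & p[1] & c0))
    return [c1, c2, c3, c4]

def carries_shared(g, p, c0):
    """Nested form: every carry reuses the one below it, which is what the
    shared pulldown stack of a multiple-output gate exploits."""
    out, c = [], c0
    for i in range(1, 5):
        c = g[i] | (p[i] & c)
        out.append(c)
    return out

def forms_agree():
    """Compare both forms for every A, B and C0 combination."""
    for bits in product((0, 1), repeat=9):
        a = dict(zip(range(1, 5), bits[0:4]))
        b = dict(zip(range(1, 5), bits[4:8]))
        g = {i: a[i] & b[i] for i in a}
        p = {i: a[i] | b[i] for i in a}
        if carries_expanded(g, p, bits[8]) != carries_shared(g, p, bits[8]):
            return False
    return True

print(forms_agree())
```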
JMM v1.3
Dual-rail “Keeper” Circuit
CLK
A
A B
B
CLK
The cross-coupled pfets serve as “keepers”
for the output which is high, making the gate
static rather than dynamic! During precharge
both keepers are off; during the evaluate
phase, the output that goes low switches
on the keeper for the output that is staying
high. Really solves capacitive coupling
problems with dynamic logic in datapaths.
JMM v1.3
Cascade voltage switch logic (CVSL)
Q Q clock Q
Q
nmos nmos
combinatorial combinatorial
network network
clock
The static version might be quite slow due to the nfet/pfet
“fight” during switching.
[Figure label: dynamic CVSL]
Q
Q
d e
d a
b a
e b c
c
JMM v1.3
CMOS NORA Logic (NP Domino)
p blocks n blocks n blocks p blocks
If we turn a dynamic gate “upside down” and use pfets to build the
logic block, we get a logic gate that “precharges” low and
“discharges” high. By using these gates in an alternating sequence
with regular nfet dynamic gates we can eliminate the race problem
we had with nfet-only dynamic gate sequences and hence we don’t
need the buffer inverter present in domino gates.
JMM v1.3
Domino Life Cycle
Actively precharging
Actively evaluating
JMM v1.3
Self-timed Pipelines
0 = precharged
1 = evaluation done
[Figure: pipeline stages F1, F2, F3, each with a P/E (precharge/evaluate) control input and a “done?” output]
JMM v1.3
Muller C-Element
Add weak feedback inverter if we’re worried about dynamic
storage for precharge/eval signal
P/E
Pdone
Sdone
The Muller C-Element is the “AND” gate for self-timed
logic because it changes its output only after both inputs
have changed. As shown above, it’s an elegant
implementation for both sets of rules on the previous
slide.
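The behaviour of a C-element is easy to capture in a behavioural model: the output copies the inputs when they agree and holds its value otherwise. The class below is a minimal sketch with assumed names, not a circuit-level description.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Output follows the inputs only when both agree; otherwise it
        # keeps the previously stored value.
        if a == b:
            self.out = a
        return self.out

c = CElement()
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)
```

The output rises only after both inputs have risen and falls only after both have fallen, which is the property the handshake logic relies on.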
JMM v1.3
Completion Detectors
Self-timed logic
use dual-rail signalling (i.e., two wires) to encode
reset (not yet evaluated) 00
ready with value 0 01
ready with value 1 10
and then build handshake logic that starts
next stage when current stage is done and next
stage has completed its previous computation
and delivered its values...
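A completion detector for this encoding only has to check that every wire pair has left the 00 reset state. The sketch below assumes each signal is a pair (t, f) in which the first wire carries “value 1” and the second “value 0”, matching the codes 10 and 01 above; the function names are illustrative.

```python
def encode(value):
    """Dual-rail code as a (t, f) pair; None stands for the reset state."""
    if value is None:
        return (0, 0)
    return (1, 0) if value else (0, 1)

def decode(pair):
    """Return 0 or 1 for a valid code word, None while still in reset."""
    if pair == (0, 0):
        return None
    if pair == (1, 1):
        raise ValueError("11 is not a legal dual-rail code")
    return 1 if pair == (1, 0) else 0

def done(word):
    """Completion detector: true once every bit shows exactly one wire high."""
    return all(t != f for t, f in word)

word = [encode(1), encode(0), encode(None)]
print(done(word), [decode(p) for p in word])
```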
JMM v1.3
Self-timed Pipeline Latency
1 = precharged
0 = evaluation done
[Figure: pipeline stages F1, F2, F3, each with a P/E control input and a “done?” output, linked by C-elements]
JMM v1.3
Further Improvements
We don’t have to delay evaluation until successor has finished
its precharge (signalling that it’s finished with our values). We
can just check that successor has started precharging… Even
with this improvement, the correct sequencing will still happen
for any combination of precharge and evaluate times for all the
gates.
We can modify the control element like so:
S P/E
Sdone
JMM v1.3
Dynamic Logic Summary
Advantages of dynamic logic:
smaller area than fully static gates
smaller parasitic capacitances hence higher
speed
reliable operation if correctly designed
Concerns:
capacitive coupling to dynamic nodes
charge sharing with dynamic nodes
subthreshold leakage currents in eval logic
minority carrier injection and latchup
alpha particle immunity
vdd/gnd noise and resistance
JMM v1.3
Coming Up...
Next topic…
CMOS sequential logic.
JMM v1.3
Exercises: VLSI-6
JMM v1.3
VLSI Design I
Clocking Strategies
Generator
Clock
Today’s handouts:
(1) Lecture Slides
JMM/ESA v1.0
VLSI Systems Design
Microelectronic Technologies
Overview
microelectronic technologies, ASIC, FPGA, µC
JMM v1.4
Microelectronic Technologies
What is microelectronics?
Does a microelectronics design engineer only need
good knowledge of silicon, layout, etc.?
application specific
integrated circuit
macro cell full custom
standard cell
gate array microprocessors
PIC, COP
FPGA RISC
uController
PAL CPLD signal processor
JMM v1.4
Gate Array Technology #1
prefabricated wafers
I/O stages predefined
regular array of fets and interconnection channels
interconnection defines functionality
features
size: 100 - 1M gates
short turn around time
cheap at medium quantities
unsuitable for regular structures like RAM, PLA, ALU
JMM v1.4
Gate Array Technology #2
JMM v1.4
Sea-of-Gates Technology
prefabricated wafers
I/O stages predefined
regular array of fets, no reserved interconnection
channels
interconnection defines functionality
features
size: 100 - 1M gates
short turn around time
cheap at medium quantities
regular structures like RAM, PLA, ALU can be used
JMM v1.4
SOG Example
nwell contacts
INV NOR2
GND
3 nfets
2 small, 1 large horizontal
mosfets with wiring tracks
common gate in metal-1
3 pfets
GND
substrate
contacts
vertical wiring tracks
in metal-1 or metal-2
MicroLab, VLSI-8 (6/20)
JMM v1.4
Standard Cell Technology
complete fabrication process
predefined library of base functions
modular similar to TTL families
features
chip size limits complexity
long turn around time
cheap at high quantities
standardized cell height
unsuitable for regular structures
more flexible and compact (1:4) than gate array
JMM v1.4
Standard Cell Example
Create a library of pre-laid-out cells, e.g., boolean gates,
registers, muxes, adders, I/O pads, … A data sheet for
each cell describes the cell’s function, area, power,
propagation delay, output rise/fall time as function of
load, etc.
JMM v1.4
Full Custom Technology
JMM v1.4
Macrocell Technology #1
JMM v1.4
Macrocell Technology #2
2-dim array of standard cell block
full custom block
JMM v1.4
FPGA Technology #1
field programmable device
no fabrication needed for customizing
predefined logic blocks
unsuitable for regular structures
features
size: up to 2,000,000 logic gates (see Virtex from Xilinx)
large silicon area necessary (72 million fets, 10x Pentium2)
short design and customize time
cheap for small quantities
compared to ASICs, FPGAs have a reduced clock speed
circuit configuration downloadable (RAM or PROM)
JMM v1.4
FPGA Technology #2
configurable
logic block (CLB)
I/O buffers
switching
matrix
I/O buffers
I/O buffers
routing
channels
I/O buffers
configuration
- mask programmable
- one time programmable
- downloading of configuration from host into internal RAM
- downloading of configuration from on board serial ROM
JMM v1.4
FPGA Technology #3
CLB from Xilinx series XC5200
[Figure: CLB block diagram. Inputs G1...G4, F1...F4 and C1...C4 (H1, Din/H2, SR/H0, EC) feed the logic function generators F’, G’ and H’; two D flip-flops with S/R control, clock K and enable EC drive the outputs XQ and YQ; X and Y are the combinational outputs. The CLBs are connected through PSM blocks.]
JMM v1.4
uC Technology
PIC
36 mm
MicroLab, VLSI-8 (16/20)
JMM v1.4
How to select a technology
Selection arguments
- cost
- speed
- size
- time to market
cost
units ASIC
FPGA
NRE
units
design
design
JMM v1.4
Coming Up...
Next topic…
Hardware description language VHDL, top-
top-down
design.
JMM v1.4
Exercises: VLSI-8 #1
JMM v1.4
Exercises: VLSI-8 #2
[Figure: curve over time showing a delayed market introduction d and the product life L]
JMM v1.4
VLSI Design I
Regular Logic Structures
Today’s handouts:
(1) Lecture Slides
JMM v1.2
Goals for Regular Logic Structures
JMM v1.2
Useful Logic Forms
Truth tables
  • direct implementation as muxes, ROMs
  • good when you have many outputs and few inputs since cost of
    “decoding” inputs is fixed
  • ECO-tolerant but often not efficient use of logic
Minimum Sum-of-Products (SOP, AND-OR)
  • minimize no. of literals (small fan-in ANDs) or no. of products
    (small fan-in ORs)
  • maximum sharing of product terms for multiple-output functions
  • if fan-ins are small: direct implementation as complex gates or
    as 2 levels of ANDs then ORs
  • if fan-ins aren’t small: multiple levels of gates (e.g., parity,
    “Achilles heel” = 2^(n-1) minterms)
  • efficient use of logic, but not very ECO-tolerant
But how do we minimize the number of literals or
minterms? Yeah, we know about Karnaugh maps, but
they aren’t so good for more than 4 inputs or for
maximizing minterm sharing.
JMM v1.2
Logic Manipulation
Start with two-level minimization
  • by inspection, searching for terms that are logically
    adjacent:
    $p \cdot x + p \cdot \bar{x} = p \cdot (x + \bar{x}) = p \cdot 1 = p$
  • Karnaugh maps for simple situations
  • Quine-McCluskey otherwise
Then try to generate multiple levels:
  • factoring. Choose literal that appears in most product
    terms (>1) and factor it out.
    F = a⋅c + a⋅d + b⋅c + b⋅d + a⋅e
      = a⋅(c + d) + b⋅(c + d) + a⋅e
  • factor again with or-terms that appear in multiple places
    F = (a + b)⋅(c + d) + a⋅e
  • find common subexpressions (multiple output
    decomposition)
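The factoring steps above can be verified by brute force over all 32 input combinations. This is only a checking sketch; the function names are made up for illustration.

```python
from itertools import product

def f_sop(a, b, c, d, e):
    """Two-level form: F = ac + ad + bc + bd + ae (10 literals)."""
    return (a & c) | (a & d) | (b & c) | (b & d) | (a & e)

def f_factored(a, b, c, d, e):
    """Multi-level form: F = (a + b)(c + d) + ae (6 literals)."""
    return ((a | b) & (c | d)) | (a & e)

equivalent = all(f_sop(*v) == f_factored(*v) for v in product((0, 1), repeat=5))
print(equivalent)
```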
JMM v1.2
Muxes as “lookup tables”
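A B C | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
[Figure: F implemented with an 8:1 mux selected by A,B,C, or with a 4:1 mux selected by A,B whose data inputs are 0, C, C, 1]
[Figure: 4:1 mux selected by A,B with data inputs OP0...OP3 and output F; transistor-level version between Vcc and gnd]
OP<3:0> | F
0 0 0 0 | ZERO
1 0 0 0 | AND
1 1 1 0 | OR
0 1 1 0 | XOR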
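Both uses of a mux as a lookup table can be sketched in a few lines. The model assumes that the select value 2·A + B picks data input OP0...OP3 in that order, which is consistent with the OP<3:0> codes listed on the slide; function and dictionary names are illustrative.

```python
def mux(select, data):
    """Ideal multiplexer: returns the data input picked by the select value."""
    return data[select]

def lut2(op, a, b):
    """Two-input function defined by the 4-bit code OP<3:0>."""
    return mux(2 * a + b, [(op >> i) & 1 for i in range(4)])

def f_mux4(a, b, c):
    """The 3-input function from the truth table, built from a 4:1 mux
    with data inputs 0, C, C, 1 selected by A,B."""
    return mux(2 * a + b, [0, c, c, 1])

OPS = {"ZERO": 0b0000, "AND": 0b1000, "OR": 0b1110, "XOR": 0b0110}
for name, code in OPS.items():
    print(name, [lut2(code, a, b) for a in (0, 1) for b in (0, 1)])
```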
MicroLab, VLSI-9 (5/12)
JMM v1.2
Read-only Memories
[Figure: ROM with inputs A, B, C, an address decoder selecting rows 0 to 7, and outputs F1, F0. In both planes a dot marks where a connection or mosfet is present; blank otherwise.]
Address decoder implemented as AND (= NOR).
Note: all but one row pulled down for given input.
For each Fi, OR together all rows for which output is 1
(actually use NOR then invert).
JMM v1.2
PLAs
In fact, the optimizations from the previous
slide are so worthwhile that we have a
name for the resulting “optimized” ROM:
Programmable Logic Array, or PLA for short.
[Figure: PLA with inputs A, B, C, an “AND” plane whose rows cover the minterm groups 4,5,6,7 / 2,3 / 1, and an “OR” plane producing F1 and F0]
Hint: for greater ECO-tolerance, add a few extra empty rows!
What are the logic equations for F1 and F0?
A B C F1 F0
JMM v1.2
PLA Folding
PLAs can be sparse, i.e., only a few of the
possible connections in either plane may
be made. (AND plane can only have 50%!)
A A B B C C D D F1 F2
1
2
3
4
5
6
JMM v1.2
Multiple-input encoding
On the previous slide, it was noted that
the AND plane can have at most 50% of
its connections programmed. Why?
AB AB AB AB
A A B B
AIN BIN
AIN BIN
You get extra computing oomph: for example, it’s now
possible to compute (A xor B) using
a single row rather than the two rows it took
with the old encoding.
MicroLab, VLSI-9 (9/12)
JMM v1.2
Datapath Operators
Most digital functions can be divided into the
following categories:
• datapath operators
• memory elements
• control structures
• I/O cells
Datapath operators form an important subclass of
VLSI design that benefits from the structured design
principles of hierarchy, regularity, modularity and
locality.
• N-bit data is generally processed by the use of N
  identical subcircuits.
• Data operations may be sequenced in time or space.
JMM v1.2
Datapath Operator Example
Magnitude operator example:
• data may be arranged to flow in one direction
• control signals are introduced in an orthogonal direction
  to the dataflow
less than or equal
B m
m Z
- =0
m
A
ctrl
Am
Bm - =0 if Zm
Am-1
B m-1 - =0 if Zm-1
m bits
A1
B1 - =0 if Z1
A0
B0 - =0 if Z0
subtractor equal-zero mux
metal1 control flow
JMM v1.2
Coming Up...
Next topic…
Sequential logic: state elements, latches and
registers. Static vs. dynamic storage. Single and
multiphase clocking strategies. Setup and hold
times; propagation delays.
Readings for next time…
Weste:
• Sections 8.1 thru 8.2 (data operators)
• 8.3.2, 8.4.2 (just read, don’t study)
JMM v1.2
VLSI Design I
CMOS Sequential Logic
Clocking Strategies
Overview
single and double phase clock systems
Latch and FF timing
JMM v1.4
Sequential Logic
Use #1: Get better utilization from
idle combinational logic blocks.
Pipeline the system so that new
computations start before the old ones
complete. Add registers to keep
computations separate.
[Figure: block computing C from the 8-bit inputs A and B]
Use #2: Convert parallel operations
to a sequence of (faster, smaller)
serial operations.
[Figure: adder computing C from the 1-bit serial inputs A and B]
JMM v1.4
Latches and Flip-Flops
Q follows D
D Q D
G G
Q
level sensitive latch
Q stable
D Q D
clk clk
Q
edge sensitive flip-flop
Q stable
JMM v1.4
Latch Timing Constraints #1
latch a latch b
D Q CLa D Q CLb D Q
G G G
CLK
t1a
t2b
H S
CLK H S
Do I have to check ALL these constraints?
$t_{1a} = t_{nqa} + t_{nla} > t_{hb}$
$t_{1b} = t_{nqb} + t_{nlb} > t_{ha}$
$t_{2a} = t_{xqa} + t_{xla} < t_{c0} - t_{sb}$
$t_{2b} = t_{xqb} + t_{xlb} < t_{c1} - t_{sa}$
th = hold time
ts = setup time
tn = min delay from invalid input to invalid output
tx = max delay from valid input to valid output
tl = delay for combinatorial logic from input to output
tq = delay for memory element from G to Q
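Each of the four inequalities has the same shape, so a checker needs only one function applied per latch-to-latch path. The sketch below assumes that t_c0 and t_c1 are the time available in the respective clock phase; the function name, argument names and the numbers in the example are hypothetical.

```python
def check_path(tn_q, tn_l, tx_q, tx_l, t_hold_next, t_setup_next, t_phase):
    """Check one latch-to-latch path.

    hold:  shortest path (tn_q + tn_l) must exceed the hold time of the
           receiving latch.
    setup: longest path (tx_q + tx_l) must fit into the phase time minus
           the setup time of the receiving latch.
    """
    return {
        "hold_ok": tn_q + tn_l > t_hold_next,
        "setup_ok": tx_q + tx_l < t_phase - t_setup_next,
    }

# Hypothetical numbers in ns, for illustration only.
ok = check_path(0.2, 0.5, 0.4, 3.0, t_hold_next=0.3, t_setup_next=0.5, t_phase=5.0)
too_slow = check_path(0.2, 0.5, 0.4, 4.5, t_hold_next=0.3, t_setup_next=0.5, t_phase=5.0)
print(ok, too_slow)
```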
JMM v1.4
Latch Timing Constraints #2
t1a
t2b
H S
CLK H S
JMM v1.4
Static Latches
Basic idea:
Need gain around this loop to make latch static.
Want storage node to be isolated from whatever user does to Q.
Would like fast CLK-to-Q, small setup and zero hold times.
[Figure: latch with data input D, clock CLK, output Q and a feedback loop; mux inputs labelled 0 and 1]
CLK
Obvious implementation:
[Figure: latch built with D, CLK and CLKN]
Oops… feedback not isolated from Q. Could add additional
output inverters...
Should we buffer CLK 0, 1 or 2 times?
JMM v1.4
Latch Timing
1 2
CLK
JMM v1.4
Dynamic Latches
Suppose in the interest of speed we were
willing to give up the “static guarantee”
and take our chances with dynamic latches,
i.e., remove feedback path...
[Figure: dynamic latch from D to Q, clocked by CLK, with these annotations]
Eliminate when fanout is small (1)
Can combine other logic with inverter
CLK: local or global clock inverter?
CLKN D Q
D Q
CLK
CLK
JMM v1.4
Flip-flops (registers)
Using alternating positive and negative dynamic latches with
a single clock gives great speed and small area, but…
lots of worries about clock skew
must balance logic delays to minimize wastage
need latch size checks (check optimisations!)
D D Q D Q Q D D Q Q
master slave
G G CLK
CLK
D
CLK
Q
!
MicroLab, VLSI-10 (9/23)
JMM v1.4
Flip-flop Implementations
Obvious implementation:
Q
D
CLK
D Q
CLK
JMM v1.4
Flip-Flop Timing
D Q CL D Q
clk clk
CLK
t1
t2
CLK
JMM v1.4
Dynamic Flip-Flops
I’ll have the Christer Svensson
special please!
2
CLK QN
CLK is low:
node 1 follows not(D)
node 2 pulled up
QN is “floating” with its old value
CLK is high:
node 2 = “0” if node 1 = “1”,
otherwise it stays “1”
⇒ node 2 = not(node 1) shortly after CLK↑
QN = not(node 2) ⇒ stable soon after CLK↑
node 1 can be pulled down if D goes to “0” (capacitive
coupling), but node 2 won’t change!
MicroLab, VLSI-10 (12/23)
JMM v1.4
Single-Phase Clocked Systems
RTL #1:
D Q D Q D Q
CLK
latch #2:
D Q D Q D Q
G G G
CLK
clk
MicroLab, VLSI-10 (13/23)
JMM v1.4
Clock Skew
D Q D Q D Q
CLK delay
D Q D Q
clk clk
delay CLK
MicroLab, VLSI-10 (14/23)
JMM v1.4
Two-Phase Clocked Systems (latch)
D Q D Q D Q
G G G
PHI1
PHI2
phi1
“non-overlapping two phase clocks”
phi2
≥1 phi2
JMM v1.4
Two-Phase Clocked Systems (FF)
D Q D Q D Q
CLK
CLK
“non-overlapping two edge clocks”
JMM v1.4
Clock Distribution
Two main techniques for clock distribution exist:
a single large buffer (see Alpha processor)
[Figure: clk distributed to a stack of n-bit datapath rows]
delays have to match between stages
JMM v1.4
Phase Locked Loop Clock Technique
Phase locked loops (PLL) are used to generate
internal clocks on chips for two main reasons:
to synchronize the internal clock of a chip with an
external clock
to operate the internal clock at a higher rate than
the external clock input
clock clock
PLL
clock clock
route route
dclk dclk
dclk+dpad dpad
clock clock
dclk dclk
JMM v1.4
PLL #2
[Block diagram: fosc → Phase Detector → up/down → Charge Pump → Filter → VCO (voltage controlled oscillator) → n x fosc, with a Divider by n feeding ffeed back to the phase detector]
[Waveforms: fosc, ffeed, up, down, Ufilter]
The phase detector produces a sequence of up/down
pulses, which are used to switch a charge pump.
The charge pump charges/discharges a capacitor
with voltage or current pulses.
A filter is used to limit the rate of change of the
capacitor voltage. The result is a slowly changing
voltage that depends on the frequency difference
between fosc and ffeed.
The VCO increases/decreases its frequency of
operation depending on its input voltage.
MicroLab, VLSI-10 (19/23)
JMM v1.4
Static Timing Analysis
Do I have to check ALL the constraints?
Yup, for every pair of connected register/latches AND for all
possible data values!
JMM v1.4
Stage Delay Computation
Look at each gate and use knowledge of input timing and rise/fall
timing to compute earliest and latest time output could change for
both rising and falling output transitions.
[Figure: clocked stage with input IN, output OUT, clocks CLK and CLKN, internal nodes 1 and 2, and capacitances C1, C2 and COUT between VDD and GND]
IN↑ ⇒ OUT↓: min ⇒ node 1 = 0V, fast; max ⇒ node 1 = VDD, slow
IN↓ ⇒ OUT↑: min ⇒ node 2 = VDD, fast; max ⇒ node 2 = 0V, slow
Other transitions: CLK↑, CLK↓, CLKN↑, CLKN↓
Use Penfield-Rubinstein model to compute
$t_{d,in-out} = \sum_i R_i C_i$ over all nodes “i” in the stage, where Ri is
total “effective resistance” to power rail and Ci is non-zero if node
capacitor needs to be charged/discharged. Multiply by degrading
factor to account for rise/fall time of input.
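The delay sum is straightforward to evaluate once the per-node resistances and capacitances are known. The function below is a sketch of that sum only; the component values and the degrading factor in the example are invented for illustration.

```python
import math

def stage_delay(nodes, degrade=1.0):
    """Sum R_i * C_i over the nodes of a stage.

    nodes   -- iterable of (R_i, C_i): effective resistance from node i to
               the power rail, and the node capacitance (0 if the node does
               not need to be charged or discharged)
    degrade -- factor accounting for the rise/fall time of the input
    """
    return degrade * sum(r * c for r, c in nodes)

# Hypothetical values: 1 kOhm / 10 fF internal node, 2 kOhm / 20 fF output.
t_fast = stage_delay([(1e3, 0.0), (2e3, 20e-15)])     # internal node already at its final value
t_slow = stage_delay([(1e3, 10e-15), (2e3, 20e-15)])  # internal node must also switch
print(t_fast, t_slow)
```

Evaluating the sum twice, once with the internal node capacitance counted and once without, gives the min and max delays listed for each transition.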
JMM v1.4
Coming Up...
Next topic…
Data operators
Self-study…
Weste:
PLL section 9.3.5.3
JMM v1.4
Exercises: VLSI-10
JMM v1.4
Intro to VLSI Systems
Finite State Machines
Today’s handouts:
(1) Lecture Slides
JMM/ESA v1.0
Excuse me… Is there such a thing
as unclocked
sequential logic?
Wave pipelining
just assert new inputs to logic after waiting “long enough” to
ensure that previous values won’t be corrupted. Requires very
careful design of each level of logic to ensure consistent
propagation delay along all paths with all possible data values.
Hard to do in the face of manufacturing variations (“fast N, slow
P” and vice versa)
Self-timed logic
use dual-rail signaling (i.e., two wires) to encode
reset (not yet evaluated) 00
ready with value 0 01
ready with value 1 10
and then build handshake logic that starts
next stage when current stage is done and next
stage has completed its previous computation
and delivered its values. Dual-rail logic works well
with precharge-evaluate gates… more on this
in a later lecture.
JMM/ESA v1.0
Finite State Machines
JMM/ESA v1.0
Correct State Diagrams
in/out 1/0
S1 S2
1/0 1/0 1/0
0/0 0/0
S3 S8 S9 S4
1/1
-/0 1/0 -/0
0/0 0/0
S5 S6
1/1 -/1 1/0 0/0
S7
JMM/ESA v1.0
Merge Equivalent States
Two states are equivalent if for each
possible combination of inputs
(1) they have identical outputs
(2) they transition to equivalent states
0/0 0/1 1/0 1/1 0/0
S1 S2 S3 S4 S5
0/1 1/1 1/1 0/1
1/1
Compatibility table: start by putting “X” in square (Si,Sj) if Si
produces different output from Sj for some input.
[Table: triangular grid with rows S2 to S5 (all but first state) and columns S1 to S4, with the “X” marks entered]
JMM/ESA v1.0
0/0 0/1 1/0 1/1 0/0
S1 S2 S3 S4 S5
0/1 1/1 1/1 0/1
1/1
S2
S3
S4
X
S5 X S1,S5
S1 S2 S3 S4
Next: for non-X square (Si,Sj) write in pairs of states that have to be
equivalent in order for Si and Sj to be equivalent.
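The two equivalence rules can also be applied mechanically. The sketch below uses partition refinement, a different but equivalent route to the compatibility table: states are first grouped by their outputs and the groups are then split until successors always fall into the same group. The example machine is hypothetical and is not the one drawn on the slide.

```python
def merge_equivalent_states(fsm):
    """fsm maps state -> {input: (next_state, output)}.
    Returns the classes of equivalent states as sorted lists."""
    states = sorted(fsm)
    inputs = sorted(next(iter(fsm.values())))

    def renumber(keys):
        ids = {}
        return {s: ids.setdefault(keys[s], len(ids)) for s in states}

    # Rule (1): equivalent states must produce identical outputs.
    block = renumber({s: tuple(fsm[s][i][1] for i in inputs) for s in states})
    while True:
        # Rule (2): their successors must lie in the same class.
        refined = renumber({
            s: (block[s],) + tuple(block[fsm[s][i][0]] for i in inputs)
            for s in states
        })
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined

    groups = {}
    for s in states:
        groups.setdefault(block[s], []).append(s)
    return sorted(groups.values())

# Hypothetical 4-state machine with one binary input.
example = {
    "S1": {0: ("S2", 0), 1: ("S3", 0)},
    "S2": {0: ("S1", 0), 1: ("S3", 1)},
    "S3": {0: ("S1", 0), 1: ("S3", 1)},
    "S4": {0: ("S3", 0), 1: ("S2", 0)},
}
print(merge_equivalent_states(example))
```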
JMM/ESA v1.0
Perform State Encoding
Given a minimized symbolic state diagram,
assign binary codes to the states. We need to predict the
effects of logic minimization and find the state encoding that
produces the smallest logic implementation.
0 01 01 1 1 00 10 1
1 01 00 0 “Q-M” 0 0- 01 1
S1=01 S3=10
0 00 01 1 - 10 11 0
S2=00 S4=11
1 00 10 1 - 11 01 1
- 10 11 0
- 11 01 1
0 00 00 1 0 0- 00 1
1 00 01 0 “Q-M” 1 -0 01 0
S1=00 S3=10
0 01 00 1 1 01 10 1
S2=01 S4=11
1 01 10 1 - 10 11 0
- 10 11 0 - -1 00 1
- 11 00 1
JMM/ESA v1.0
FSM Logic Implementation
Multi-level
Logic
ROM
“One hot”
Registers
PLA
JMM/ESA v1.0
Coming Up...
Next topic…
Arithmetic circuits: adders and multipliers.
JMM/ESA v1.0
VLSI Design I
Datapath Operators: Addition and Multiplication
Overview
Carry propagate, carry lookahead, carry save, carry skip
and carry select adder
JMM v1.4
Addition/Subtraction
Adder architectures:
carry-propagate adder (CPA)
ripple carry adder
carry-lookahead adder (CLA)
Manchester carry adder
hierarchical carry-lookahead adder
carry-save adder (CSA)
carry-skip adder
carry-select adder
parallel adder
serial adder ...
“Why can’t we just add?”
JMM v1.4
Binary Addition
Here’s an example of binary addition as
one might do it by “hand”:
 1101     carries from previous column
 01101
+00101
 10010
If we use a two’s-complement representation
for signed integers, the same procedure will
work for adding both signed and unsigned
numbers.
Besides the sum, one often wants two other
bits of information from an adder:
carry-out: indicates that add in the most significant position
produced a carry; used when implementing multi-word arithmetic,
e.g., “1 + (-1)”
$C = a_{n-1} \cdot b_{n-1} + \overline{s_{n-1}} \cdot (a_{n-1} + b_{n-1})$
overflow: indicates that the answer has too many bits to be
represented correctly by the result width (2’s complement),
e.g., “$(2^{N-1} - 1) + (2^{N-1} - 1)$”
$V = a_{n-1} \cdot b_{n-1} \cdot \overline{s_{n-1}} + \overline{a_{n-1}} \cdot \overline{b_{n-1}} \cdot s_{n-1}$
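Both flags depend only on the most significant bits of the operands and of the sum, which can be confirmed by exhaustive comparison with ordinary integer arithmetic. The function name and the 4-bit width in the example are assumptions made for this sketch.

```python
def add_with_flags(a, b, n):
    """Add two n-bit words; return (sum, carry_out, overflow), with the
    flags computed from the most significant bits only."""
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    s = (a + b) & mask
    an, bn, sn = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1, (s >> (n - 1)) & 1
    carry = (an & bn) | ((1 - sn) & (an | bn))
    overflow = (an & bn & (1 - sn)) | ((1 - an) & (1 - bn) & sn)
    return s, carry, overflow

print(add_with_flags(1, -1, 4))   # "1 + (-1)": carry-out, no overflow
print(add_with_flags(7, 7, 4))    # "(2^(N-1) - 1) + (2^(N-1) - 1)": overflow
```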
MicroLab, VLSI-12 (3/29)
JMM v1.4
Adder with “ripple” carry chain
To convert the simple addition procedure to hardware, we’ll
need “full adder” module:
[Figure: full adder block with inputs A, B, CIN and outputs COUT, S]
A B CIN | COUT S
0 0 0   | 0    0
0 0 1   | 0    1
0 1 0   | 0    1
0 1 1   | 1    0
1 0 0   | 0    1
1 0 1   | 1    0
1 1 0   | 1    0
1 1 1   | 1    1
One-bit adders are sometimes called “counters” since they
count the number of 1’s on their inputs and encode the answer
on their outputs. Thus a full adder is a 3:2 counter.
$S = \bar{A}\,\bar{B}\,C_{in} + \bar{A}\,B\,\bar{C}_{in} + A\,\bar{B}\,\bar{C}_{in} + A\,B\,C_{in}$
$C_{out} = A\,B + A\,C_{in} + B\,C_{in}$
[Figure: N full adders chained from CIN at bit 0 (A0, B0, S0) up to COUT at bit N-1 (AN-1, BN-1, SN-1)]
Carry “ripples” from one stage to the next
propagation delay: _______________
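A bit-level model makes the ripple explicit: each stage has to wait for the carry of the stage below it. The sketch uses the XOR form of the sum, which is equivalent to the four-term equation; the function names are illustrative.

```python
def full_adder(a, b, cin):
    """One-bit full adder (3:2 counter)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a, b, n, cin=0):
    """Add two n-bit words by chaining n full adders, LSB first."""
    carry, total = cin, 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

total, cout = ripple_carry_add(0b01101, 0b00101, 5)
print(bin(total), cout)
```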
MicroLab, VLSI-12 (4/29)
JMM v1.4
Faster carry logic (CLA)
Let’s see if we can improve the speed by
rewriting the equations for COUT:
COUT = AB + ACIN + BCIN
= AB + (A + B)CIN
= G + P CIN where G = AB and P = A + B
generate propagate
So if we had (N+1)-input gates and didn’t mind a lot of
loading on the P signals, the propagation delay of an adder
built using this equation for the carries would be (count
1 delay unit per fan-in; ripple carry: 5N delays):
____________________________________
Of course, this is impractical but it does lead to some
interesting ideas:
faster ripple-carry implementations
hierarchical carry-lookahead adders
JMM v1.4
Manchester carry chain (CLA)
The plan: first generate carry-in for each adder bit as fast
as we can, then compute the sum. Delay still proportional
to size of adder, but “constant” is pretty small.
static Manchester stages
P=A+B PN PN
GN CN-1
PN
PN CN
CN-1 CN
GN
GN
PN
PN
CN-1 CN
PN GN
When CLK is high, if GN
is high, CN is asserted,
CLK i.e., driven low.
JMM v1.4
Manchester Adder Block (CLA)
PNPN+1PN+2PN+3 link in Manchester
carry chain
JMM v1.4
Hierarchical carry-lookahead adders
The linear growth of adder carry-delay with size of the input word may
be improved by calculating the carries to each stage in parallel:
7 6 5 4 3 2 1 0
0,7
AK SK BK GJ+1,K CJ PJ+1,K
GIJ
K I,K C I- 1
PIJ
JMM v1.4
Carry-skip adders
Since computing P_{I,K} is simpler than computing G_{I,K}, let’s try just
computing P_{I,K} and apply the “skip” optimisation from Manchester
adders.
[Figure: carry chain from C0 to C12 in which the block propagate signals P4,7 and P8,11 let the carries C4 and C8 skip a block]
Suppose it takes 1 time unit for a signal to pass thru
two logic levels, then
time to ripple thru block of k bits = k time units
time to skip a block = 1 time unit
Consider a 24-bit carry-skip adder organized as
6 blocks of four bits each. So the worst case propagation time is
4 + 1 + 1 + 1 + 1 + 4 = 12 time units
(ripple, skip, ripple)
But now reorganize the adder with the least significant 3 bits in the
first block, the next 4 bits in the second block, followed by blocks of
5, 5, 4, and 3. Now the worst case propagation time is
3 + 1 + 1 + 1 + 1 + 3 = 10 time units
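The two block organizations can be compared with a small delay model. The sketch assumes the timing rules stated on the slide (k units to ripple through a k-bit block, 1 unit per skipped block) and searches every pair of start and end blocks for the slowest carry path; the function name is hypothetical.

```python
def worst_case_delay(blocks):
    """Worst-case carry propagation time of a carry-skip adder.

    blocks -- block sizes in bits, least significant block first.
    A carry generated in block i ripples through it, skips the blocks in
    between and ripples through block j, where it is absorbed.
    """
    worst = max(blocks)  # carry that starts and ends inside one block
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            worst = max(worst, blocks[i] + (j - i - 1) + blocks[j])
    return worst

print(worst_case_delay([4, 4, 4, 4, 4, 4]))   # six equal blocks
print(worst_case_delay([3, 4, 5, 5, 4, 3]))   # small blocks at both ends
```

Making the end blocks small helps because the longest paths pay for a full ripple only in their first and last block.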
JMM v1.4
Late-arriving inputs
Is there a general way to reorganize a
logic equation to accommodate a
late-arriving input?
JMM v1.4
Carry-select adders
Building on the idea from the previous slide: perform two
additions in parallel, one assuming the carry-in is zero and
the other assuming the carry-in is one. When the carry-in
is finally known, the correct result is selected from the two
precomputed results.
0 0
... ...
1 1
CIN
>=1 1 0 ... 1 0
>=1 1 0 ... 1 0 ...
& &
Is this a “mux”?
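The select step can be sketched behaviourally: every block computes both candidate sums, and the incoming carry only chooses between them. The function name and the block sizes in the example are assumptions for illustration.

```python
def carry_select_add(a, b, block_sizes, cin=0):
    """Carry-select addition, least significant block first."""
    result, shift, carry = 0, 0, cin
    for k in block_sizes:
        mask = (1 << k) - 1
        x, y = (a >> shift) & mask, (b >> shift) & mask
        sum0 = x + y        # precomputed assuming carry-in = 0
        sum1 = x + y + 1    # precomputed assuming carry-in = 1
        chosen = sum1 if carry else sum0   # the "mux"
        result |= (chosen & mask) << shift
        carry = chosen >> k
        shift += k
    return result, carry

print(carry_select_add(0b10110111, 0b01101001, [2, 3, 3]))
```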
JMM v1.4
Adder layouts
[Layouts: 32-bit carry-select adder; 32-bit carry-lookahead adder]
JMM v1.4
Even-Odd Arrays
Abstract carry-save picture from previous page:
M-2
...
CSA
CSA
CSA
CSA
CSA
CPA
Rewire so that first two adders work in parallel.
Feed results into third and fourth adders which
also work in parallel, etc.
M-4 2
...
CSA
CSA
CSA
CSA
CSA
CSA
CPA
JMM v1.4
Wallace Trees
$O(\log_{1.5} M)$
CSA
CSA
CSA
CSA
CPA
CSA
CSA
JMM v1.4
Bit-Serial Adder
• bit-serial adders are very slow and have a high data
latency, but are extremely compact
• typical applications are in signal processing
cout
FF
clk clr
A
result
n-bit register
B n-bit register
JMM v1.4
CSA Adder (pipelining)
• pipelined adders are extremely fast, but suffer from
high data latency (CSA structure of slide #13)
nc
FF
S=A+B+C+D Carry
FF
FF
D(3)
FF
FF
A(3) S(3)
B(3)
C(3) FF
FF
D(2)
FF
FF
A(2) S(2)
B(2)
C(2) FF
FF
D(1)
FF
FF
A(1) S(1)
B(1)
C(1) FF
FF
0
D(0)
FF S(0)
FF
A(0)
clk B(0) CPA adder
C(1) 0 clk
MicroLab, VLSI-12 (17/29)
CSA adders
JMM v1.4
CPA Adder (pipelining)
• the CPA structure on slide #13 can also be used in
a pipeline structure. Useful in signal processing
applications.
FF Carry
B(3) FF FF FF FF FF S(3)
A(3) FF FF FF FF
FF
B(2) FF FF FF FF FF S(2)
A(2) FF FF FF
FF
B(1) FF FF FF FF FF S(1)
A(1) FF FF
FF
B(0) FF FF FF FF FF S(0)
A(0) FF
Cin FF
Thus multiplication of an N-bit number by an M-bit
number boils down to the addition of M N-bit partial
products, each of which is formed by a simple Boolean
operation. Any of the techniques from the previous slides
can be used to accomplish the required additions.
JMM v1.4
Array multipliers
Example 3x3 array multiplier nc P5
using CSAs to sum partial A2B2
products: P4
0
A2B1 A1B2 0
P3
0
0 A1B1 A0B2
P2
A2B0
0
0 A0B1
P1
A1B0
0
0
P0
A0B0
JMM v1.4
Higher Radix Multiplication
Array multipliers are nice, but we get one column of adders (which are
big/slow) for each partial product, i.e., one column for each bit of the
multiplier. If we could use, say, 2 bits of the multiplier in generating
each partial product we would halve the number of columns and double
the speed of the multiplier!
This looks the same as before except we have half as many partial
products to sum. Generating each partial product is now more
complicated since B_{K+1,K} can now be 0, 1, 2 or 3. The only
troublesome value here is 3 since that would seem to require more
adder inputs than we have (3*A = A + 2*A).
But…
we can also write 3*A = 4*A - A. We’ll do the -A in this partial
product stage and signal the next stage that it needs to add 4*A. To
keep the signalling simple we’ll also rewrite 2*A = 4*A - 2*A
“Profs go crazy nowadays, why can’t he just multiply like
everybody else does?”
JMM v1.4
Booth Recoding (Radix-4)
$A \cdot B = B_{1,0} \cdot A \cdot 2^{0} + B_{3,2} \cdot A \cdot 2^{2} + \dots + B_{M-1,M-2} \cdot A \cdot 2^{M-2}$
[Figure: A_{N-1} … A0 times B_{M-1} … B0 arranged as M/2 partial products, each shifted by 2 bits]
BK+1,K*A = 0*A ⇒ 0
         = 1*A ⇒ A
         = 2*A ⇒ 4*A - 2*A
         = 3*A ⇒ 4*A - A
BK+1 BK BK-1 | action  | N  x1 x2
0    0  0    | add 0   | -- 0  0
0    0  1    | add A   | 0  1  0
0    1  0    | add A   | 0  1  0
0    1  1    | add 2*A | 0  0  1
1    0  0    | sub 2*A | 1  0  1
1    0  1    | sub A   | 1  1  0
1    1  0    | sub A   | 1  1  0
1    1  1    | add 0   | -- 0  0
[Figure: partial product bit PPi. x1 selects Ai and x2 selects Ai<<1 through an AND-OR gate; the result is XORed with N, and N also drives the carry-in.]
Not cheaper than an ADD but all recodes
can be done in parallel so we only pay
time penalty once (for first column)!
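The recoding table can be reproduced with one arithmetic expression per bit pair: digit = BK + BK-1 - 2·BK+1, with B-1 = 0. The sketch below assumes a two’s-complement multiplier with an even number of bits; the function names are made up.

```python
def booth_radix4_digits(b, m):
    """Recode an m-bit two's-complement multiplier (m even) into m/2
    digits from {-2, -1, 0, 1, 2}, least significant digit first."""
    digits, prev = [], 0          # prev holds B(K-1); B(-1) is 0
    for k in range(0, m, 2):
        bk = (b >> k) & 1
        bk1 = (b >> (k + 1)) & 1
        digits.append(bk + prev - 2 * bk1)
        prev = bk1
    return digits

def booth_multiply(a, b, m):
    """Sum the recoded partial products; each digit has weight 4**i."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_radix4_digits(b, m)))

print(booth_radix4_digits(0b0111, 4), booth_multiply(5, 7, 4))
```

Here 7 is recoded as -1 + 2·4, so only two partial products (-A and 8·A) have to be summed.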
MicroLab, VLSI-12 (22/29)
JMM v1.4
16x32 Booth Multiplier
JMM v1.4
Serial Multiplication
• bit-serial multipliers are very compact, but suffer from
high data latency and are very slow
• simplest form of serial multiplier: successive addition
cout
& FF
reset
clk result
clr
A
&
B N-1 bit register
clk
JMM v1.4
Serial/parallel and Pipelined
Multiplication
• serial/parallel multiplier: very modular structure
Y0 Y1 Y2 Y3
X1 X0 0 0
&
&
&
&
P
M+N bit product -> td=M+N time intervals, but time intervals are larger
Yn
Xj Xj+1
&
PPin PPout
JMM v1.4
Shifters
• Shifters are very important for microprocessor
architectures:
– arithmetic shifting
– logical shifting
– rotation functions
• barrel shifter constructed by transmission gates
shift3 shift2 shift1 shift0
result3
result2
input6 result1
input5
result0
input4
input3
input2
input1
input0
Operation              | input
logical right shift    | 0,0,0,A(3:0)
logical left shift     | A(3:0),0,0,0
right rotate           | A(2:0),A(3:0)
left rotate            | A(3:0),A(2:0)
arithmetic right shift | A3,A3,A3,A(3:0)
arithmetic left shift  | A(3:0),A0,A0,A0
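The table can be read as a recipe for building the 7-bit input word of the shifter. The sketch below assumes that control line shiftK connects result3...result0 to inputK+3...inputK, which is a plausible reading of the figure but is not stated on the slide; all function names are hypothetical.

```python
def barrel(word7, shift):
    """4-bit window of the 7-bit input word, starting at bit 'shift' (0..3)."""
    return (word7 >> shift) & 0xF

def logical_right(a, n):
    return barrel(a & 0xF, n)                          # 0,0,0,A(3:0)

def rotate_right(a, n):
    return barrel(((a & 0x7) << 4) | (a & 0xF), n)     # A(2:0),A(3:0)

def arithmetic_right(a, n):
    sign = 0x70 if (a >> 3) & 1 else 0                 # A3,A3,A3,A(3:0)
    return barrel(sign | (a & 0xF), n)

a = 0b1011
print(bin(logical_right(a, 1)), bin(rotate_right(a, 1)), bin(arithmetic_right(a, 2)))
```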
MicroLab, VLSI-12 (26/29)
JMM v1.4
Coming Up...
Next topic…
VLSI fabrication: processing steps, basic
structures, self-aligned processes, P and N devices.
JMM v1.4
Exercises: VLSI-12
A B A A B C B
B C
C Carry Sum
B C
A B A A B C B
JMM v1.4
Exercises: VLSI-12 (con’t)
Ex vlsi12.2 (difficulty: easy): A 32-bit adder is built as a
carry-select adder. Each adder as well as the muxes have
one delay unit. Find the optimal structure with respect to
speed.
Result: The maximum speed is 9 time units for a structure
with stages 4-4-4-5-6-7-6 (see Weste p. 532)
Ex vlsi12.3 (difficulty: easy): A hierarchical
carry-lookahead adder (see slide 8) is given. Show
algebraically that C3 = G0,3 + P0,3 Cin corresponds to the
equation C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3
P2 P1 P0 Cin (note that Gi,i = Gi and Pi,i = Pi)
Ex vlsi12.4 (difficulty: easy, time consuming): Write
VHDL code for a 32-bit hierarchical carry-lookahead
adder (see slide 8). If one block has a delay of 1 time
unit, what is the overall delay?
Result: The total delay is 9 time units
Ex vlsi12.5 (difficulty: medium): Consider X1 as a
late-arriving input which needs to be sped up. Develop the
circuit for the function: f = X1⋅X2 + X1⋅X3 + X4⋅X5
JMM v1.4
VLSI Systems Design
Design Project: Practical Aspects
“I am a VHDL expert.
But how do I apply it
in real life, for my MP3 player?”
Overview
applying the “description-synthesis” design
method in practice
JMM v1.4
Project Goal
Goal:
design of an electronic system from specification
down to ASIC/FPGA
Problem:
one of the most difficult tasks in a VLSI project
design is to find the starting design point
Basic Steps:
in order to proceed in a structured manner, you
should perform the following steps
block diagram
HW/SW co-design (hardware/software co-design)
IP cores (intellectual property cores)
chip test
Initial System Design Steps
System design steps
1. identify your chip in the overall system
block diagram
7. identify IP cores
8. obtain as many IP cores as possible (tools, core
generators, old designs, internet)
IP cores
Project MP3 Player: step 1
(block diagram)
Step 1: identify your chip in the overall system
[Block diagram: the MP3 Player ASIC/FPGA connects to USB, LCD, Keyboard, MP3 Decoder, Power, Flash Memory and DAC]
Project MP3 Player: steps 2-4
(block diagram)
Step 2: define the chip IO and group them into
blocks
Step 3: identify functional units of your chip
Step 4: find the interconnections between your
units
[Block diagram: MP3 Player ASIC/FPGA containing an I2C interface, a keyboard interface, a decoder interface and an I2S interface]
Project MP3 Player: steps 5-6
(HW/SW Co-Design)
Step 5: identify speed and control sensitive tasks
Step 6: define the “intelligence” of each
functional unit
[Block diagram: MP3 Player ASIC/FPGA with I2C interface, keyboard interface, decoder interface and I2S interface; control sensitive units are marked add “intelligence”, one unit is marked add “intelligence” ?]
Project MP3 Player: steps 7-8
(Hardware Design)
Step 7: identify IP cores
Step 8: obtain as many IP cores as possible
(tools, core generator, old designs, internet)
[Block diagram: I2C interface, decoder interface, keyboard interface, USB core and I2S interface]
Project MP3 Player: steps 9-11
(Hardware Design)
Step 9: update design if necessary according to
available IP cores
Step 10: define inter-process communication
Step 11: define the interconnection between units
[Block diagram: decoder interface, “intelligent” keyboard interface, DAC interface and USB core, attached to Port A, Port B, Port C and Port D]
Hardware/Software Design Steps
Hardware design project steps:
I. imagine your chip working in the target system, identify
and describe its basic functional units in a data-flow view
FSMD architecture model
engineering courses
structured
Project MP3 Player: step I
(Hardware design project steps)
Step I: imagine your chip working in the target
system, identify and describe its basic functional
units in a data-flow view
download MP3 song from host to flash
memory (flow 1):
– generate flash command, generate flash address
– load byte from USB into register
– use byte to execute ECC (Hamming code)
– update flash address
– store byte into flash
– write ECC code after 512 bytes
– generate write-to-flash after 512 bytes
– use pipeline structure to speed up data transfer
[Block diagram: MP3 Player ASIC/FPGA with main PIC core, LCD interface, power management control, decoder interface, “intelligent” keyboard interface, DAC interface, USB core, Ports A to D, “intelligent” USB interface, “intel.” flash interface, “intel.” I2S interface and “intel.” I2C interface]
Project MP3 Player: step II
(hardware design project steps)
Step II: find the RTL structure of each of the
previous data-flow functions and update your
block diagram by allocating your RTL
structure to one or more functional units
download MP3 song from host to flash
memory (flow 1):
[RTL diagram: clocked registers with count/enable and in/out ports and a mux with sel input, allocated to the USB interface and the flash interface, which drives the pads to the flash memory]
Project MP3 Player: step III
(hardware design project steps)
Step III: fix in detail the function of your
functional units (local intelligence or data-path
only) and add FSMs if required, fix the detailed
interconnections between your units
[Block diagram: hardware (IP core) next to “intelligent” flash, I2S, I2C and LCD interfaces; the “intelligent” interfaces use the FSMD architecture]
Project MP3 Player: step IVa
Step IVa: design all FSMs, define clock strategy, use
colored data-flow, be careful with the inter-process
communications
Clock strategy: Rising edge for data-paths, falling edge for IP
cores and FSMs. All handshake signals between FSMDs and IP
cores on falling edge.
Colors: make a lot of copies of your RTL data path
Colors: for each data-flow step, color the old active data-paths
leaving a register blue, the new active data-paths leaving a
register green, and data-paths treated with a combinatorial
function in the corresponding dark color. Active control signals
and their blocks are orange. All other data signals are red. Red
signals are dominant. Be sure that no red signals enter an FSM,
and no darkened or red signals reach the asynchronous set/reset of
FFs.
[Figure: register with count/enable, in/out and clk; handshake between process 1 and process 2 using request, acknowledge, data and data valid]
Project MP3 Player: step V
Step V: VHDL coding of your RTL design
use one process for each data-path manipulation and its
succeeding register
use 2 processes for an FSM:
one process for transition table (VHDL case)
one process for next state (state register)
continuous assignment for output function
[Figure: register with count/enable, in/out and clk, labeled Process 2, driving the pads to the flash memory]
Project MP3 Player: step VI
Step VI: test bench design
the design of a test bench is one of the most time-consuming
and important tasks. A test bench will be
re-used several times during the different design
steps as well as for chip test (have a look at vlsi21)
[Figure: test bench with blocks for control, stimulus generation, response generation and verification]
Final System Design Steps
Hardware design project steps:
12. system test bench design
simulation
system
Block diagram of a general system
GECKO Design Environment
Design entry:
C-code software
manual RTL hardware
algorithms
All three design entry elements will be converted
to VHDL and thus can be implemented in an SoC
SoC Design Methodology
The specify-explore-refine design flow is extended
to a specify-explore-refine-prototype-analyse
design flow for SoC designs with real-time
constraints
SoC with GECKO Environment
[Figure: SoC built from real-time software, signal processing hardware, a microprocessor IP core, hardware IP blocks, power blocks, analog blocks and sensor blocks]
The GECKO system
Hardware-in-the-Loop
hardware-in-the-loop
hardware-in-the-software-loop
Homework: MyProject
MyProject 2002: speed controlled dc motor
Matlab/Simulink with speed controller
GECKO main board with dc-motor electronics
use hardware-in-the-simulation-loop
Implementation constraints:
microprocessor with C code for “administrative” tasks
pulse-width modulation for driving dc motor (hardware)
A/B signal encoder for speed sensing (hardware)
driving circuitry (expansion board) as simple as possible
Technical data:
dc motor has 6000 turns/minute at 5V
speed sensor has 12 pulses per turn
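The technical data fixes the pulse rate that the speed-sensing hardware has to handle. The sketch below is only a back-of-the-envelope check; the function names and the measurement window are hypothetical, and it counts one pulse per encoder period (an A/B decoder that evaluates every edge would see a multiple of this rate).

```python
def encoder_pulse_rate_hz(turns_per_minute, pulses_per_turn):
    """Pulse frequency delivered by the speed sensor."""
    return turns_per_minute / 60.0 * pulses_per_turn


def speed_from_count(pulse_count, window_s, pulses_per_turn):
    """Estimate the speed in turns/minute from pulses counted in a time window."""
    return pulse_count / pulses_per_turn / window_s * 60.0


FULL_SPEED_RPM = 6000   # dc motor at 5V (from the technical data)
PULSES_PER_TURN = 12    # speed sensor resolution (from the technical data)

if __name__ == "__main__":
    rate = encoder_pulse_rate_hz(FULL_SPEED_RPM, PULSES_PER_TURN)
    print("pulse rate at full speed:", rate, "Hz")
    # hypothetical 10 ms counting window
    print("estimated speed:", speed_from_count(12, 0.01, PULSES_PER_TURN), "turns/minute")
```

A short counting window gives a fast but coarse speed estimate; a longer window improves resolution at the cost of controller delay.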
VLSI Design II
CMOS Processing
Overview
Processing steps
Introduction
Complementary MOS (CMOS) technology is
becoming the dominant candidate for VLSI
applications
CMOS provides both n-channel and p-channel MOS
transistors on one chip
cheap chips are produced in extremely expensive fabs
each chip passes through hundreds of different processing
steps
random process disturbances cause electrical
parameter variations of the chips
elements are never identical
VLSI Circuit Fabrication
oxidize silicon to form thin and thick layers of SiO2 to serve as
insulators.
deposit thin layers of material and etch into desired pattern
Overview
Overview of Processing Steps
making the wafers
photolithography
oxidation
layer deposition
etching
diffusion
implantation
Overview of Processing Step Sequence
n-well
active
poly
n-diffusion
p-diffusion
contacts
metal1
via1
metal2
passivation
Processing Steps:
Making the wafers
Processing Steps:
Photolithography #1
Processing Step:
Photolithography #2
Processing Steps:
Oxidation #1
Thermal oxidation is a process in which silicon (Si)
reacts with oxygen to form a continuous layer of
high-quality silicon dioxide (SiO2)
oxidation of the silicon surface
oxidation through a window in the oxide
selective oxide growth
[Figure: oxidation of the silicon surface]
Processing Steps:
Oxidation #2
[Figure: oxidation through a window; selective oxide growth, showing the bird's beak]
Processing Steps:
Layer Deposition - General
Thin layers of both conducting substances and
insulating materials constitute an important part of
any semiconductor device.
epitaxy (single crystal deposition)
PVD and CVD process (polycrystalline deposition)
Processing Steps:
Vapour Deposition
PVD
CVD
Processing Steps: Etching
wet etching
dry etching
Processing Steps:
Diffusion
Solid state diffusion is a process which allows
atoms to move within a solid at elevated
temperatures.
Processing Steps:
Implantation
The alternative to the diffusion technique of dopant
introduction used in IC manufacturing is ion
implantation.
N-Well Implant & Drive-in
In p substrate only n-channel fets can be processed.
Therefore an n-well has to be implanted in order to hold
the p-channel fets.
Channel-stop Implant
Grow Field Oxide
Grow Thin Oxide
Now grow a “thin” (0.01um = 100 Angstroms) layer of
silicon dioxide, called gate oxide, on the surface by
exposing the wafer to dry oxygen.
Deposit & Etch Polysilicon
Implant Nfet Drain & Source
Effective Nfet Dimensions
Parasitic Fets
Implant Pfet Drain & Source
Deposit SiO2 insulator
Etch contact cuts
Deposit & Etch Metal1
Voila: a CMOS Inverter!
Planarize
Deposit & Etch Metal2
N-well, Double-level Metal CMOS
Process Steps
1. Grow barrier oxide
2. Mask/Etch n-well window
3. P n-well implant
4. Thermal drive-in to deepen n-well
5. Remove barrier oxide
6. Grow “pad” oxide
7. Deposit Si3N4
8. Mask/Etch leaving active region
9. B channel-stop implant
10. Grow field oxide (more drive-in!)
11. Remove Si3N4
12. Remove pad oxide
13. B or P implant to adjust VTH
14. Grow thin (gate) oxide
15. Deposit P-doped polysilicon
16. Mask/Etch leaving poly wires
17. Etch exposed thin oxide
18. Mask off p-diffusion regions
19. Sb or As nfet source/drain implant, n-well contact too
20. Mask all but p-diffusion regions
21. B pfet source/drain implant
22. Thermal source/drain annealing
23. Deposit SiO2 using CVD
24. Mask/Etch contacts through SiO2
25. Deposit first Al using PVD
26. Mask/Etch leaving metal1 wires
27. Grow thick layer of SiO2
28. Spin on thick, flat layer of photoresist
29. Etch SiO2 and photoresist at same rate until only flat SiO2 remains
30. Mask/Etch vias through SiO2
31. Deposit second Al using PVD
32. Mask/Etch leaving metal2 wires
33. Deposit overglass to passivate circuit
34. Mask/Etch pad windows
Coming Up...
Next time:
Mask layout: design rules, layout examples,
structured and symbolic layout techniques,
retargetable layouts. CAD tools for layout: design
capture, design rule checking, extraction, network
comparison.
Exercises: VLSI-14
VLSI Design II
CMOS Layout
Overview
CMOS Layout and Design Rules
Sources of Error
Line registration errors
resist exposure and development
over/under etching, lateral diffusion
uneven topography
⇒ systematic errors corrected by bloating/shrinking mask
⇒ random errors increase minimum widths and spacing
Mask misalignment
⇒ random errors increase extensions and surrounds
Other fab difficulties
⇒ contacts and vias only on “flat” surfaces
⇒ no devices near boundaries of well
⇒ no poly contacts over diffusion
⇒ “gate” metal must connect to diffusion
⇒ minimum metal coverage requirements
Electrical properties
⇒ current density limitations
⇒ latch-up prevention
Process instabilities
mobility variations (why?)
thin-oxide thickness variations
sheet resistances
⇒ use of “process corners” in analysis
Design vs. Actual IC
Line Registration Errors
Mask Alignment Errors (I)
Mask Alignment Errors (II)
Maly, Figure 2-9
Design Rules
exclusion rules
extension rules
enclosure rules (overlapping)
width rules
spacing rules
We can specify the design rules using some convenient
units, e.g., microns, but what happens if we want to
manufacture the chip using different manufacturers?
One suggestion: use an abstract unit, the lambda, and scale
the design to the appropriate actual dimensions when the
chip is to be manufactured.
Usually all edges must be “on grid”, e.g., in the MOSIS
scalable rules, all edges must be on a half lambda grid; in
the 0.5µm Alcatel process all edges must be on a 0.05µm grid.
Lambda-based Rules
One lambda (λ) = one half of the “minimum” mask
dimension, typically the length of a transistor channel.
Under the assumption that the worst case alignment is
better than 0.75λ, the maximum relative misalignment
between any two masks is better than 1.5λ. This can be
used to derive design rules and to estimate minimum
dimensions of a junction area and perimeter before a
transistor has to be laid out.
[Figure: example layout with diffusion (active), poly, metal1 and contact, annotated with dimensions from 1λ to 6λ, including a 3λ x 3λ square]
For 0.5µm Alcatel process: λ = 0.25µm
Lambda vs. Micron Rules
Lambda-based design rules are based on the assumption
that one can scale a design to the appropriate size before
manufacture. The assumption is that all manufacturing
dimensions scale equally, an assumption that “works” only
over some modest span of time. For example: if a design
is completed with a poly width of 2λ and a metal width of
3λ then minimum width metal wires will always be 50%
wider than minimum width of poly wires.
Consider the following data from Alcatel 0.5µm process
(compare with Weste, Table 3.2 pp145):
contacted metal pitch     lambda rule   lambda = 0.25µ   micron rule
1/2 * contact size        1.5λ          0.375µ           0.3µ
contact surround          1λ            0.25µ            0.25µ
metal-to-metal spacing    4λ            1.0µ             0.8µ
contact surround          1λ            0.25µ            0.25µ
1/2 * contact size        1.5λ          0.375µ           0.3µ
total                     9λ            2.25µ            1.9µ
+40% in area
Scaled design is legal but much larger than it needs to be!
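The pitch totals and the area penalty follow directly from the table entries. The sketch below only redoes this arithmetic; the variable names are illustrative, and the area figure assumes that the pitch penalty applies equally in both layout dimensions.

```python
LAMBDA_UM = 0.25  # lambda for the 0.5um process

# (rule, lambda rule in units of lambda, micron rule in um), taken from the table
RULES = [
    ("1/2 * contact size", 1.5, 0.3),
    ("contact surround", 1.0, 0.25),
    ("metal-to-metal spacing", 4.0, 0.8),
    ("contact surround", 1.0, 0.25),
    ("1/2 * contact size", 1.5, 0.3),
]

pitch_lambda = sum(lam for _, lam, _ in RULES)
pitch_scaled_um = pitch_lambda * LAMBDA_UM
pitch_micron_um = sum(um for _, _, um in RULES)

# area grows with the square of the pitch ratio
area_overhead = (pitch_scaled_um / pitch_micron_um) ** 2 - 1.0

if __name__ == "__main__":
    print("lambda pitch:", pitch_lambda, "lambda =", pitch_scaled_um, "um")
    print("micron pitch:", round(pitch_micron_um, 3), "um")
    print("area overhead: %.0f%%" % (100 * area_overhead))
```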
Retargetable Layouts?
0.5µm CMOS Alcatel Mietec Process
Layers and mask definition: C05M-D
layer name   drawn   mask name
active       yes     active
nwell        yes     n-well
pwell        no      (p-well)
poly         yes     poly
nplus        no      (n+ implant)
pplus        yes     p+ implant
contact      yes     contact
metal_1      yes     metal 1
via_1        yes     via 1
metal_2      yes     metal 2
via_2        yes     via 2
metal_3      yes     metal 3
nitride      yes     nitride
dractext     yes     -
nldd         no      (no low doped drain, Zener)
nlddprot     yes     -
nplusprot    yes     -
C05M-D: some logical descriptions
logical name       used masks
nwell            = nwell
pwell            = not nwell
n+diffusion      = active and not pplus and not poly
p+diffusion      = active and pplus and not poly
n+source/drain   = active and not pplus and not poly and not nwell
p+source/drain   = active and pplus and not poly and nwell
gate             = active and poly
[Figure: pfet example relating the logical masks (nwell, n+diffusion) to the drawn masks (nwell, active, poly)]
Layout Rules (C05M-D) #1
n-well, active
[Figure: design-rule dimensions for n-well, active, n strap and p strap, with values between 0.5µm and 3µm; one spacing is given as 2µm (3µm) on same (different) potential]
Layout Rules (C05M-D) #2
poly, fets
[Figure: design-rule dimensions for poly and fets, with values between 0.35µm and 1.1µm]
Layout Rules (C05M-D) #3
abutting straps
[Figure: design-rule dimensions for abutting straps, with values between 0.6µm and 1.6µm]
Layout Rules (C05M-D) #4
metal, contacts, via1, via2
via1 need to be covered by metal2
contacts need to be covered by metal1
[Figure: design-rule dimensions for metal, contact, via1 and via2, with values between 0.2µm and 1.1µm]
Sticks and Compaction
Digital Layout: Choosing a “style”
What about routing signals between gates? Note that both layouts block
metal/poly routing inside the cell. Choices: metal2 routing over the cell or
routing above/below the cell.
avoid long (> 50 squares) poly runs
don’t “capture” white space in a cell
don’t obsess over the layout, instead make a
second pass, optimizing where it counts
Digital Layout:
Optimising Connections
considering “composability” with
neighbouring gates?
Digital Layout: Big vs. Parallel
can’t make gates too
long because of poly
resistance! Eventually
really large transistors
have to be broken into
smaller transistors
wired in parallel.
[Figure: two layouts of the same transistor, area = 94µm2 and area = 73µm2]
Digital Layout: Eliminating Gaps
[Figure: two layouts of the same gate with inputs A to E; the input order A B C D E leaves a gap, the order C B A E D eliminates it]
Analog Layout: Large Transistors
[Figure: large transistor split into four parallel fets Q1 to Q4 with junctions J1 to J5; the gates are tied together and the junctions connect to node 1 and node 2]
Analog Layout: Matching
Using lithography techniques a variety of two-dimensional
effects can cause effective sizes of
components to differ from the sizes of the glass
layout masks.
lateral diffusion
overetching
[Figure: cross-sections showing SiO2 protection]
Matching Transistor Layouts:
Common-Centroid Layout
– use interdigitated finger structures for keeping the effect of temp and oxide thickness gradients low
– use one outside finger for M1, one for M2
– symmetry in x & y
– fets in analog circuitry are typically much wider than in digital circuits
[Figure: common-centroid layout of interdigitated M1 and M2 fingers with common source SM1,M2, gate connections (G, GM1) and drain connections (DM1, DM2)]
Capacitor Matching #1
material
preferable poly1 - poly2 structures (only C05M-A)
if not available: poly1 - diffusion (C05M-D), but
nonlinear due to voltage dependency
sandwich structures with poly - metal1
Capacitor Matching #2
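[Figure: capacitor plate of drawn size x by y; overetching by ∆e on every edge leaves the actual size xa by ya; two capacitors C1 and C2]
$x_a = x - 2\Delta e$, $y_a = y - 2\Delta e$
$C = \frac{\varepsilon_{ox}}{t_{ox}} A = C_{ox}\, x y$
$C_a = C_{ox}\,(x - 2\Delta e)(y - 2\Delta e)$
$\varepsilon = \frac{\Delta C}{C} \cong \frac{-2\Delta e\,(x + y)}{x y}$
$\frac{C_{2a}}{C_{1a}} = \frac{C_2 (1 + \varepsilon_2)}{C_1 (1 + \varepsilon_1)}$
ideally $\varepsilon_1 = \varepsilon_2$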
Capacitor Matching #3
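$\frac{P_2}{A_2} = \frac{P_1}{A_1}$
$\frac{P_2}{P_1} = \frac{A_2}{A_1} = K$
$K = \frac{x_2 + y_2}{2 x_1}$
$y_2 = x_1 \left( K \pm \sqrt{K^2 - K} \right)$, K = 1 ... 2
[Figure: capacitor array, 4 units]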
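The side lengths of a non-unit capacitor with the same perimeter-to-area ratio as a square unit capacitor can be computed from the last equation. The sketch assumes a square unit capacitor with side x1; the function name and the example values are illustrative.

```python
import math


def nonunit_capacitor_sides(x1, k):
    """Sides (x2, y2) of a capacitor with K times the unit area
    and the same perimeter/area ratio as an x1 by x1 unit capacitor."""
    if not 1.0 <= k <= 2.0:
        raise ValueError("K is expected in the range 1 ... 2")
    y2 = x1 * (k + math.sqrt(k * k - k))   # '+' root of the quadratic
    x2 = k * x1 * x1 / y2                  # area condition: x2 * y2 = K * x1^2
    return x2, y2


if __name__ == "__main__":
    x1, k = 10.0, 1.5                      # example values, e.g. in um
    x2, y2 = nonunit_capacitor_sides(x1, k)
    print("x2 = %.3f, y2 = %.3f" % (x2, y2))
    print("unit P/A     =", 4 * x1 / (x1 * x1))
    print("non-unit P/A =", 2 * (x2 + y2) / (x2 * y2))
```

Choosing the other sign of the root simply swaps x2 and y2.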
Analog Layout: Resistor #1
resistor value: $R = R_{sq} \frac{L}{W}$ with $R_{sq} = \frac{\rho}{t}$
Analog Layout: Resistor #2
Examples of possible resistor layout
0.14 Rsq
2.11 Rsq
matched resistors
Analog Layout:
Noise Considerations #1
Where does noise coupling occur?
every time a digital gate changes its state a glitch
is injected on the digital power supply and in the
surrounding substrate
direct ohmic connections (power supply line)
[Figure: three ways of connecting the analog part and the digital part to the supply pads]
Analog Layout:
Noise Considerations #2
Use of shields
[Figure: cross-section of a p- substrate with an analog region and a digital region; analog and digital interconnect, ground shield, n-wells, n+ and p+ regions, and a depletion region used as bypass capacitor]
Summary of Analog Layout Rules
Checking Layouts
Design Rule Checker (DRC). This is a program that checks each
piece of the layout against the process design rules. This is a
slow process:
– canonicalize layout into a set of leading and trailing non-overlapping mask edges. Some Boolean mask operations may be needed.
– determine electrical connectivity and label each edge with the node it belongs to.
– test each edge end point against neighboring edges to check for spacing (leading edges) and width (trailing edges) violations.
Layout vs. Schematic (LVS). First a netlist is extracted from the
layout. Use the electrical info generated by the DRC and then
recognize transistors as juxtapositions of channel with
diffusion. Then see if the extracted netlist is isomorphic to the
schematic netlist. This is done by a coloring algorithm:
– initialize all nodes to the same color
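The slide only gives the first step of the coloring algorithm. One common way to continue, shown here as an assumption rather than as the method of any particular LVS tool, is iterative color refinement: every node repeatedly receives a new color derived from its own color and the colors of its neighbors, until the partition stops changing. The netlists below are hypothetical, and the sketch ignores terminal types (gate versus source/drain), which a real checker would encode.

```python
from collections import Counter


def refine_colors(adjacency):
    """Color refinement: start with one color, split classes by neighbor colors."""
    colors = {node: 0 for node in adjacency}
    for _ in range(len(adjacency)):
        signatures = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adjacency[node])))
            for node in adjacency
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {node: palette[signatures[node]] for node in adjacency}
        stable = len(set(new_colors.values())) == len(set(colors.values()))
        colors = new_colors
        if stable:
            break
    return colors


def netlists_look_equal(layout, schematic):
    """Compare color histograms; equal histograms are necessary, not sufficient."""
    union = {("L", n): [("L", m) for m in nbs] for n, nbs in layout.items()}
    union.update({("S", n): [("S", m) for m in nbs] for n, nbs in schematic.items()})
    colors = refine_colors(union)
    hist_l = Counter(c for (side, _), c in colors.items() if side == "L")
    hist_s = Counter(c for (side, _), c in colors.items() if side == "S")
    return hist_l == hist_s


# hypothetical inverter: devices and nets are both graph nodes
SCHEMATIC = {"MP": ["in", "out", "vdd"], "MN": ["in", "out", "gnd"],
             "in": ["MP", "MN"], "out": ["MP", "MN"], "vdd": ["MP"], "gnd": ["MN"]}
LAYOUT_OK = {"T1": ["n1", "n2", "n3"], "T2": ["n1", "n2", "n4"],
             "n1": ["T1", "T2"], "n2": ["T1", "T2"], "n3": ["T1"], "n4": ["T2"]}
LAYOUT_OPEN = {"T1": ["n1", "n2", "n3"], "T2": ["n1", "n5", "n4"],
               "n1": ["T1", "T2"], "n2": ["T1"], "n3": ["T1"], "n4": ["T2"], "n5": ["T2"]}

if __name__ == "__main__":
    print(netlists_look_equal(LAYOUT_OK, SCHEMATIC))
    print(netlists_look_equal(LAYOUT_OPEN, SCHEMATIC))
```

Both netlists are colored together so that equal colors mean the same thing on both sides; nodes that end up with a color that has no counterpart in the other netlist point to the location of the mismatch.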
Coming Up...
Next topic:
Small signal fet model
Optional
have a look at Alcatel CMOS C05M-D design rules
manual
Exercises: VLSI-15 #1
Ex vlsi15.1 (difficulty: easy): Assume the 0.5µm
Alcatel Mietec process. Use the λ rules to
calculate the minimal area and perimeter of the
following layout structure.
Result: a) AJ1=4.5µm2, AJ2=3.188µm2,
AJ3=2.25µm2, PJ1=6µm, PJ2=6µm,
PJ3=1.5µm (see Johns&Martin pp99)
[Figure: two fets Q1 and Q2 with junctions J1, J2 and J3]
Exercises: VLSI-15 #2
2.314 units
4 units
Intro to VLSI Systems
CMOS Layout (replicating)
Today’s handouts:
(1) Lecture Slides
(2) Problem Set #5
(3) Inverter Layout Tutorial
JMM/ESA v1.0
Design for Re-use
Replicating Cells
Vertical Replication
Vertical Intercell Routing
carry-out to
cell above
S’pose we have a signal
that will run vertically from
one cell to the next, e.g., the
carry-out from one cell becomes
the carry-in for the cell above.
Building a Datapath
It’s often the case that we want to operate on many bits in parallel. A
sensible way to arrange the layout of this sort of logic is as a datapath
where data signals run horizontally between functional units and
control signals run vertically to all the bits of a particular functional
unit:
control
bit #3
bit #2
bit #1
bit #0 data
Logic that generates the control signals can be placed at the bottom of
the datapath. If control logic is complicated or irregular, it might be
placed in a separate standard cell block and only the control signal
buffers placed just below the datapath. Although it’s tempting
to run control signals in poly (so they can control fets) this is unwise
for tall datapaths because of poly resistance (e.g., 32 bits x 20u/bit
= 640u = ~1000 squares = ~20k ohms!)
Datapath Bit Pitch
How tall should we make each bit of the datapath?
That depends on
– the width of the nfets and pfets
– how much in-cell routing there is
– how much over-the-cell global routing there is
Global routes can be determined from datapath schematic:
[Datapath schematic: RESULT, OP1 and OP2 buses run across the SHIFTER, ADDER, BOOLE and MULT units; control inputs OP, EN and CIN; gnd in m2]
Adder Datapath
Shifter Datapath
Design for Re-use
Breaking the Rules
BIT BIT
word line
Coming Up...
Next time:
Scaling effects, fundamental limits. Submicron
design issues. Power dissipation and packaging.
Readings for next time…
Weste: 6.3.7 through 6.3.9
Intro to VLSI Systems
Predicting the Future
Today’s handouts:
(1) Lecture Slides
(2) Mead and Conway, Chapter 9
(1981)
Scaling
w/α tox/α
l/α
t/α
xj/α
α NA
What happens?
Often, different dimensions will scale at different
rates. But for an overall picture of what the future
portends, there are two major scaling models:
First, let’s consider constant field scaling, and use
basic MOSFET models to predict the effect of
scaling by α
Parameters Effect
W/L
Cg = Cox W L
Id ∝ Cox (W/L) (Vgs-Vt)^2
device power = V I
Area = W L
device power / Area
Rdiff
Rmetal
Rpoly
Speedup!
L
e-
τ = L/(µE)
Transit time τ scales as ___________
delay=Cg V/I
Interconnect
R = ___________
Scaled R = R ________
C = ____________
Scaled C = C ________
Scaling Table
Resulting Influence          constant field   constant voltage   lateral
DC Power dissipation         1/α^2            α                  α
Dynamic Power Dissipation    1/α^2            α                  α
Power-delay product          1/α^3            1/α                1/α
Gate Area                    1/α^2            1/α^2              1/α
Power-density (VI/A)         1                α^3                α^2
Current Density              α                α^3                α^2
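The entries of the scaling table can be reproduced from the basic device relations used on the earlier slides (Cg = Cox W L, Id ∝ Cox (W/L) V^2, delay = Cg V / I). The sketch below is a rough first-order model under stated assumptions: all quantities are normalized to 1 before scaling, Cox is taken as 1/tox, the threshold voltage is assumed to scale with the supply, and "lateral" is taken to mean that only the channel length is scaled. The model names are labels chosen here.

```python
def scale_device(alpha, model):
    """Return first-order scaled device figures, normalized to the unscaled device."""
    if model == "constant_field":
        w = l = tox = v = 1.0 / alpha
    elif model == "constant_voltage":
        w = l = tox = 1.0 / alpha
        v = 1.0
    elif model == "lateral":
        l = 1.0 / alpha
        w = tox = v = 1.0
    else:
        raise ValueError("unknown scaling model: %s" % model)

    cox = 1.0 / tox                  # oxide capacitance per unit area
    current = cox * (w / l) * v * v  # Id ~ Cox (W/L) V^2
    area = w * l                     # gate area
    cg = cox * area                  # gate capacitance
    power = v * current              # DC power
    delay = cg * v / current         # gate delay
    return {
        "power": power,
        "power_delay": power * delay,
        "gate_area": area,
        "power_density": power / area,
        "current_density": current / area,
    }


if __name__ == "__main__":
    for model in ("constant_field", "constant_voltage", "lateral"):
        print(model, scale_device(2.0, model))
```

With α = 2 the printed numbers are the table entries evaluated at α = 2, e.g. a power density of 1 for constant field scaling and of α^3 = 8 for constant voltage scaling.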
Die Size
With basic scaling of the same system, we’d just end
up with smaller and smaller chips.
Global Interconnect Scaling
W/α t/α
d
scaled R = R * α^2
scaled C = C
scaled delay = delay * α^2
Power Scaling
θja = 2 °C/W
Junction temp = ___________
heat sink
In the submicron domain, it’s difficult to scale VDD,
so power faces the “constant voltage” scaling of α^2
3. Adiabatic logic.
Problems with scaling theory
Some limits
Xd
Subthreshold leakage
Threshold Variations
VDD
Vout
Threshold Variations, cont.
Lithographic Scaling Limits
Ultraviolet: λ = 0.3 µm
X-Ray Lithography, λ = __________
Synchrotron lithography?
Wavelength of an electron?
Cost of FABs.
Optical tricks.
Fundamental Physical Limits
Thermodynamic
How much entropy change to set a bit?
Reversibility
Quantum Limits
Tunnelling: For Eb of 1eV, the gate oxides and
depletion layers must be thicker than 1 nm. In the
IBM 0.4um process, the gate thickness is 7
nm.
Thermal Limits
Is this the beginning of the end?
Not really.
Coming Up...
Next topic…
MOS memories. Static and dynamic RAM cells.
Single and double-ended bit line sensing.
Multiport register files.
Readings for next time…
Weste: 4.13
Intro to VLSI Systems
CMOS Memories
Today’s handouts:
(1) Lecture Slides
Semiconductor Memories
Usually the majority of transistors found in a modern system are devoted
to data storage in the form of random-access memories. The need for
increased densities and lower prices has driven the development of
improved VLSI technology.
Uses:
“main” memory ⇒ high capacity, low cost
cache memories, TLB’s ⇒ fast access
programming info (eg, FPGA) ⇒ non-volatile
Design Tradeoffs
density: bits/unit area. Usually higher density
also means lower cost per bit. Improvements due
to finer lithography, better capacitor structures,
new materials with higher dielectric constants.
Memory Architecture
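[Figure: memory array of 2^N rows (word lines) by 2^M columns (bit lines), one memory cell (one bit) at each crossing; N address bits drive the row address decoder, M address bits drive the column decoder, which delivers DATA; N+M address bits in total]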
ROM Circuits
NOR-based ROM array
[Figure: ROM array with word lines R1 to R4 and bit lines C1 to C4, shared ground and shared bit line contact]
R1 R2 R3 R4 | C1 C2 C3 C4
1  0  0  0  | 0  1  0  1
0  1  0  0  | 0  0  1  1
0  0  1  0  | 1  0  0  1
0  0  0  1  | 0  1  1  0
ROM Layout
VDD
GND
shared shared
contact ground
no
pulldown
pulldown
ground and
word line
refresh
ROM Performance
tACCESS = tROW DECODE + tCOLUMN + tCOL DECODE
tROW DECODE :
If ROM is large, row decode logic is just a small percentage of
total area. So we can make the driver for the word line large and
thus fast. Note that we need to strap the poly word line to
eliminate slow down due to poly resistance.
tCOL DECODE:
As with the row decode logic, we can increase speed by
increasing size of transistors in this section.
t COLUMN:
We want small program transistors to keep the total area of
ROM as small as possible. Also increasing size of pulldowns
increases load on both word and bit lines. This means we’re
limited in the speed we can achieve in pulling down the column.
If CPD,DRAIN = 10fF and we have 128 rows:
tCOLUMN = C ∆V / I AV
= (10fF)(128)(2.5V)/(30uA)
= 110ns
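The column delay estimate is a plain C ∆V / I calculation and can be re-run for other array sizes. The sketch below uses the numbers from the slide; the function name is illustrative and the reduced swing in the second call is a hypothetical value, chosen only to show why sensing a small bit line change helps.

```python
def column_discharge_time(c_drain_per_row, rows, delta_v, i_avg):
    """t = C * dV / I for a bit line loaded by one pulldown drain per row."""
    return c_drain_per_row * rows * delta_v / i_avg


if __name__ == "__main__":
    # numbers from the slide: 10fF per drain, 128 rows, 2.5V swing, 30uA
    t_full = column_discharge_time(10e-15, 128, 2.5, 30e-6)
    print("full swing:    %.1f ns" % (t_full * 1e9))
    # hypothetical 0.25V swing, as seen by a sense amplifier
    t_small = column_discharge_time(10e-15, 128, 0.25, 30e-6)
    print("reduced swing: %.1f ns" % (t_small * 1e9))
```

The exact value for the full swing is about 107 ns, which the slide rounds to 110 ns.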
Sense Amplifiers
Let’s speed things up by sensing small changes in the bit line voltage
using a sense amplifier:
R1
R2
C1
column C1
(tree)
decoder
C0
C0
tenths of a volt
amplified to full SENSE AMP
rail-to-rail swing
Single-ended Sense Amp
Choose fet sizes so that
M2, MD >> MC >> M1
M3 >> M4
voltage M1
reference 1
(fets sized M3
to produce
VREF = 3V) M2
M4
2
series fets in
column decoder MD bit line
(pullup built into
sense amp)
MC
When bit line is not pulled down, V1 = VDD and V2 = VREF - Vth = 2V, so M3
is off and M4 is on and the output is pulled low.
SRAM Circuits
precharge or VDD
static
bistable 6-T SRAM Cell
storage access fet
element
word line
Differential Sense Amp
rdata
bit bit
tie bulk to
source if
possible long-channel
precharge fet used as
or VDD
current source
clocked
cross-coupled Use CLK if
sense amp possible to
clk reduce power
and improve
write speed
wdata
6-T SRAM Cell Layout
VDD
inverter
pullup
inverter
pulldown
GND
SRAM Read Cycle
VDD
VDD
6-T SRAM Cell bit
word
1
word data
Differential Sense Amp
rdata
4.8/0.6 4.8/0.6
bit 2 V2 1 V1 bit
4.8/0.6 4.8/0.6
3 long-channel
VDD 0.9/7.2
fet used as
current “source”
VCS
Fast Address Decoding
A2 A1 A0
Multiport SRAM (Reg File)
One can increase the number of SRAM ports by
adding access transistors. Writes are usually
double-ended; single-ended reads can be used
to save space.
write
read0
read1
rd0 wd wd rd1
An alternative design that can be easily expanded
without worrying about unintentionally flipping the
cell on reads is shown below.
wd rd0 rd1
PU = 2/1 2/1
PD = 4/1
4/1 2/1
5/1
PU = 2/2
PD = 2/3
write
read0
read1
Content-addressable RAM
By adding two transistors to the 6-T SRAM cell one can form an
XOR gate to compare the cell contents to data on the bit lines.
The output of this logic can drive a pulldown in a distributed NOR
gate to form a word “match” signal for a content-addressable
memory (CAM).
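The matching logic described above can be modelled behaviourally: each cell XORs its stored bit with the search bit, and any mismatch pulls the word's match line low (distributed NOR). This is only a functional sketch with made-up names, not a circuit description.

```python
def cell_mismatch(stored_bit, search_bit):
    """XOR gate in each cell: 1 when the stored bit differs from the bit line data."""
    return stored_bit ^ search_bit


def word_match(stored_word, search_word):
    """Distributed NOR: the match line stays high only if no cell reports a mismatch."""
    return not any(cell_mismatch(s, d) for s, d in zip(stored_word, search_word))


def cam_lookup(memory, search_word):
    """Return the indices of all words whose match line stays high."""
    return [row for row, word in enumerate(memory) if word_match(word, search_word)]


if __name__ == "__main__":
    memory = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1]]
    print(cam_lookup(memory, [1, 0, 1, 1]))
    print(cam_lookup(memory, [1, 1, 1, 1]))
```

In hardware all rows are compared at the same time, which is what distinguishes a CAM from searching an ordinary RAM word by word.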
word
xor gate
match
CAM Architecture
3-T Dynamic RAM
[Figure: 3-T DRAM cell with precharge devices, read and write word lines, bit line capacitances CW and CR, storage capacitance CC, and wdata/rdata lines]
Precharge happens before each r/w cycle. READ/WRITE and
PRECHARGE don’t overlap.
Data is stored on CC. It’s not destroyed on read, but will leak
away through the write transistor. CW >> CC
WRITE:
After precharge, CW is charged high. When WRITE is asserted CW
shares charge with CC and dominates since CW >> CC. If WDATA is
asserted, both CW and CC will be discharged, writing a “0” into
the cell; otherwise a “1” will be written.
READ:
After precharge, CR is charged high. When READ is asserted CR is
pulled low if there’s a stored “1” or remains unchanged if there’s
a stored “0”. A sense amp is usually used to speed up the
availability of read data.
1-T Dynamic Ram
1-T DRAM Cell
Explicit storage capacitor (fet gate, trench, stack) = 30fF
to 100fF. If we want higher C:
– better dielectric
– more area
– thinner film
C = εA/d
[Figure: 1-T DRAM cell with word line, access fet, bit line and VREF; “Stack” DRAM cell with poly word line, access fet and W bottom electrode]
1-T DRAM Read Cycle
DSL PC DSR
lbit rbit
R2 R1 R 129 R 130
C C C/2 C/2 C C
VDD CS VDD
lbit, rbit
precharge (PC)
Coming Up...
Next time:
Driving large loads:
I/O circuits (edge rates, ESD protection, latch up)
Clock generation and distribution (skew)
Readings for next time…
Weste: 5.4.2, 5.5, 5.6
VLSI Design I
Defect Mechanisms and Fault Models
Overview
Defects
Fault models
Goal: You know the difference between design and
fabrication defects. You know sources of defects
and you can estimate yield. You can handle fault
models at different abstraction levels.
Design Defects
?
Design Specification
JMM v1.4
Manufacturing Defects
Goal: verify every gate is operating as expected
Defects from misalignment, dust and other particles, “stacking”
faults, pinholes in dielectrics, mask scratches & dirt, thickness
variations ⇒ layer-to-layer shorts, discontinuous wires (“opens”),
circuit sensitivities (VTH, LCHANNEL).
Find during wafer probe.
Production defects in CMOS circuits
VLSI fabrication process
[Figure: layout and wafer enter the process steps; a process monitor controls the tolerances; disturbances and a changing environment act on the process; the wafer either goes on to further processing steps or is not further processed]
VLSI fabrication process (con‘t)
[Figure: test flow. Layout and wafer fabrication; measuring of process parameters and of test chips; parameter and function test of chips on wafer; bonding and packaging; parameter and function test of packaged chips]
JMM v1.4
VLSI fabrication process (con‘t)
parameter test
test of electrical parameters: current consumption,
quiescent currents, voltage levels, delay times, etc.
function test
test for logical faults: binary test sequences are applied
to the device under test (DUT)
JMM v1.4
Defect classification
JMM v1.4
Defects at wafer fabrication
50% of all defects
reason:
changes in fabrication environment
substrate inhomogeneities, mask misalignment
dust particles, photolithography defects
local or global effects
electrical effects depend on layout topology
changes in delay, current consumption
shorts, opens
JMM v1.4
Defect at chip packaging
reasons:
bonding problems
mechanical stress
effect:
normally occur at primary inputs or outputs
easy to detect
JMM v1.4
Defects during lifetime
[Figure: defect rate versus time with three phases: early defects, middle life phase, wear defects]
JMM v1.4
Yield modeling
with n to infinity and p to zero (np = λ) we find
$$\Pr\{K = k\} = \frac{\lambda^k}{k!} \, e^{-\lambda}$$
JMM v1.4
Yield modeling (con‘t)
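expectation value
$$E\{K\} = \sum_{k=0}^{\infty} k \, \frac{\lambda^k}{k!} \, e^{-\lambda} = \lambda$$
yield = Pr{K = 0} with λ = A D0 (chip area A, defect density D0)
$$Y = e^{-A D_0}$$ (for low yield)
$$Y_2 = \left( \frac{1 - e^{-A D_0}}{A D_0} \right)^2$$
[Figure: defect density distributions f2 and f3 over the range 0 to 2D0 around D0; f3 has height 1/(2D0)]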
JMM v1.4
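A minimal sketch in Python (not part of the original slides) that evaluates the two yield expressions Y = e^(-A D0) and Y2 = ((1 - e^(-A D0)) / (A D0))^2. The die area and defect density are made-up illustration values, and the function names are hypothetical.

```python
import math

def poisson_yield(area, d0):
    """Y = exp(-A*D0): probability of zero defects on a die of area A."""
    return math.exp(-area * d0)

def y2_yield(area, d0):
    """Y2 = ((1 - exp(-A*D0)) / (A*D0))**2."""
    ad = area * d0
    return ((1.0 - math.exp(-ad)) / ad) ** 2

# Hypothetical numbers: 1 cm^2 die, 0.5 defects per cm^2
A, D0 = 1.0, 0.5
y_poisson = poisson_yield(A, D0)
y_2 = y2_yield(A, D0)
print(f"Y = {y_poisson:.3f}, Y2 = {y_2:.3f}")
```

For the same A and D0 the Y2 expression predicts a slightly higher yield than the pure Poisson expression.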
Yield modeling (con‘t)
JMM v1.4
VLSI fabrication process: conclusion
JMM v1.4
Fault models for integrated circuits
JMM v1.4
Fault models: Testing approaches
Plan: supply a set of test vectors that specify an input or output value for every pin on every cycle. Tester will load the program into the pin cards, run it and report any discrepancies between an observed output value and the expected value.
2^n input vectors are required to exhaustively test a combinational circuit with n inputs
JMM v1.4
Fault models: abstraction level
JMM v1.4
Fault models (con‘t)
fault dependencies
faults are layout dependent
faults are technology dependent
JMM v1.4
Hard to detect faults
JMM v1.4
Logic level fault models
historical perspective
In 1959 Eldred proposed methods for testing computers built from relays, diodes and tubes, which behaved like switches
this stimulated the development of fault models on the logic level
stuck-at fault model
signal can be stuck at "0" or "1"
independent of process technology
does not model technology-dependent characteristics
mathematical calculus exists
very useful for TTL technology (or other old "current" technologies, but not for "charge" technologies like CMOS)
JMM v1.4
Logic level fault models (con‘t)
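stuck at “0” = S-A-0 = node@0
stuck at “1” = S-A-1 = node@1
Example: gate with inputs A, B, C, D and output Z, fault location X on input B
  fault free: Z = ABCD
  B stuck at 1: Z(B@1) = ACD
  B stuck at 0: Z(B@0) = 0
[Figure: transistor-level gate schematic with inputs I1, I2, output O, transistors T1 to T4 and resistors R1 to R4]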
JMM v1.4
Fault reduction
fault collapsing
  fault equivalence
  fault dominance
single faults, multiple faults
fault detection
  fault free function: f(x)
  with fault α: fα(x)
  a test vector x detects the fault if the condition is fulfilled:
  $$f(x) \oplus f_\alpha(x) = 1$$
fault equivalence
  $$f_\beta(x) = f_\alpha(x)$$
fault dominance
  $$T_\beta \subset T_\gamma$$
  fault β dominates γ
fault classes (α/1 = A stuck-at-1)
  α/1 <=> β/1 <=> γ/1
  β/0 => γ/0
  α/0 => γ/0
  <=> equivalence
  => dominance
[Figure: gate with inputs A, B and output C, and its truth table]
JMM v1.4
Logic level fault models
fault dominance
Tα represents test vector set to detect fault α
fault α dominates fault γ under condition
Tα ⊂ Tγ
for test generation only tests for fault α are
necessary
multiple faults: fault masking problems
JMM v1.4
Transistor level fault models
JMM v1.4
Transistor level fault models (con‘t)
[Figure: transistor-level gate with inputs A, B and output Y; stuck-open fault locations asop, bsop, vddsop]
         fault free    stuck-at              stuck-open
A  B     Y             α/0   β/0   γ/0       a    b    vdd
0  0     1
0  1     0
1  0     0
1  1     0
JMM v1.4
Functional level fault models
JMM v1.4
Functional level fault models:
example
[Figure: 8-to-1 MUX with data inputs A0 to A7, select inputs S0, S1, S2 and output Y]
JMM v1.4
Fault models summary
JMM v1.4
Coming Up...
Next topic…
Test pattern generation and fault simulation
JMM v1.4
Exercises: VLSI-19 #1
F = (A+C)(B+D)
F(X=OPEN) = __________
[Figure: network of elements A, B, C, D with a fault location X]
JMM v1.4
Exercises: VLSI-19 #2
JMM v1.4
VLSI Design I
Test Pattern Generation and Fault Simulation
Overview
Test pattern generation
Fault simulation
Goal: Design for testability terms like
controllability and observability are known. You are
familiar with test pattern algorithms as well as
with testability measure metrics.
MicroLab, VLSI-20 (1/26)
JMM v1.4
Testers
The device under test (DUT) can be a site on a wafer or a packaged part.
[Figure: tester with 100’s of pins of pin circuitry; pin data formats within one tCYCLE]
non-return-to-zero (NRZ): data
return-to-zero (RTZ): data
return-to-one (RTO): data
surround-by-complement (SBC): ~data data ~data
JMM v1.4
Test pattern generation
JMM v1.4
Algorithms for test pattern
generation
JMM v1.4
Boolean difference
algebraic method: Boolean difference
circuit function with input vector x
$$f(x) = f(x_1, \ldots, x_n)$$
for the i-th component of vector x fixed to a value we define
$$f_i(1) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n)$$
$$f_i(0) = f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$$
definition of Boolean difference
$$\frac{\partial f(x)}{\partial x_i} = f(x_1, \ldots, x_i, \ldots, x_n) \oplus f(x_1, \ldots, \bar{x}_i, \ldots, x_n)$$
$$\frac{\partial f(x)}{\partial x_i} = f_i(0) \oplus f_i(1)$$
circuit with fault α: stuck-at-1 at input x_i
$$f_\alpha(x) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) = f_i(1)$$
to detect s-a-1 faults the two functions f(x) and fα(x) must produce different results, so the test vector set is defined by T = 1:
$$T = f(x) \oplus f_\alpha(x) = \bar{x}_i \, \frac{\partial f}{\partial x_i}$$
for s-a-0 faults:
$$T = f(x) \oplus f_\alpha(x) = x_i \, \frac{\partial f}{\partial x_i}$$
JMM v1.4
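A minimal Python sketch of the Boolean difference method. The function `f` is a hypothetical three-input example, not the circuit from the slides, and the helper names are assumptions. A vector is a test for a stuck-at fault on input x_i when x_i carries the opposite of the stuck value and the Boolean difference with respect to x_i is 1.

```python
from itertools import product

def f(x1, x2, x3):
    """Hypothetical example function f = x1*x2 + x3."""
    return (x1 & x2) | x3

def boolean_difference(func, i, x):
    """df/dx_i at point x: f with x_i = 0 XOR f with x_i = 1."""
    lo = list(x)
    hi = list(x)
    lo[i] = 0
    hi[i] = 1
    return func(*lo) ^ func(*hi)

def stuck_at_tests(func, i, stuck, n):
    """Vectors with T = 1 for input i stuck at the given value."""
    return [x for x in product((0, 1), repeat=n)
            if x[i] != stuck and boolean_difference(func, i, x)]

sa1_x1 = stuck_at_tests(f, 0, 1, 3)  # x1 stuck-at-1
sa0_x1 = stuck_at_tests(f, 0, 0, 3)  # x1 stuck-at-0
print("x1 s-a-1:", sa1_x1)
print("x1 s-a-0:", sa0_x1)
```

For this `f` the Boolean difference with respect to x1 is x2 AND NOT x3, so both faults need x2 = 1 and x3 = 0 to make x1 observable at the output.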
Boolean difference: Rules
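$$\frac{\partial \overline{f(x)}}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} \qquad \frac{\partial f(x)}{\partial \bar{x}_i} = \frac{\partial f(x)}{\partial x_i}$$
$$\frac{\partial}{\partial x_i} \cdot \frac{\partial f(x)}{\partial x_j} = \frac{\partial}{\partial x_j} \cdot \frac{\partial f(x)}{\partial x_i}$$
$$\frac{\partial [f(x) g(x)]}{\partial x_i} = f(x) \frac{\partial g(x)}{\partial x_i} \oplus g(x) \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i} \frac{\partial g(x)}{\partial x_i}$$
$$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \overline{f(x)} \frac{\partial g(x)}{\partial x_i} \oplus \overline{g(x)} \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i} \frac{\partial g(x)}{\partial x_i}$$
with g(x) independent of x_i:
$$\frac{\partial [f(x) g(x)]}{\partial x_i} = g(x) \frac{\partial f(x)}{\partial x_i}$$
$$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \overline{g(x)} \frac{\partial f(x)}{\partial x_i}$$
$$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \frac{\partial [\overline{f(x)} \cdot \overline{g(x)}]}{\partial x_i}$$
$$\frac{\partial [f(x) \oplus g(x)]}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial g(x)}{\partial x_i}$$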
JMM v1.4
Boolean difference: example
[Figure: example circuit with inputs x1 to x4, gates G1 to G5 (AND "&" and OR "≥1"), internal nodes s1, s2, s3 and output y]
JMM v1.4
Test generation: D-algorithm
Basics:
fault sensitisation (provoke error)
fault propagation
line justification
D-notation
D: the signal is 1 in the fault-free circuit and 0 in the faulty circuit
D̄: the signal is 0 in the fault-free circuit and 1 in the faulty circuit
JMM v1.4
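One common way to compute with the D-notation is to store each signal as a pair (value in the fault-free circuit, value in the faulty circuit). The following Python sketch uses that representation; it is an illustration, and all names in it are hypothetical.

```python
# Each signal is a pair: (fault-free value, faulty value)
ZERO, ONE = (0, 0), (1, 1)
D, D_BAR = (1, 0), (0, 1)

def d_and(a, b):
    """AND evaluated separately in the fault-free and in the faulty circuit."""
    return (a[0] & b[0], a[1] & b[1])

def d_or(a, b):
    """OR evaluated separately in the fault-free and in the faulty circuit."""
    return (a[0] | b[0], a[1] | b[1])

def d_not(a):
    """Inverter: D becomes D-bar and vice versa."""
    return (1 - a[0], 1 - a[1])

NAMES = {ZERO: "0", ONE: "1", D: "D", D_BAR: "D'"}

print("D and 1 =", NAMES[d_and(D, ONE)])   # the fault effect passes an AND gate
print("D and 0 =", NAMES[d_and(D, ZERO)])  # the fault effect is blocked
print("D or 0  =", NAMES[d_or(D, ZERO)])   # the fault effect passes an OR gate
print("not D   =", NAMES[d_not(D)])
```

The pairs make the propagation rule visible: a D passes an AND gate only if the other inputs are 1, and an OR gate only if the other inputs are 0.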
Test generation: Path sensitisation
Example Ex20.2 (easy): circuit with stuck-at-1 fault at x3. Find test vectors by means of the D-algorithm.
sensitization
fault propagation
line justification
Step 1: Sensitize circuit. Find input values that produce a value on the faulty node that’s different from the value forced by the fault. For our S-A-1 fault above, want output of AND gate to be 0.
Is this always possible? What would it mean if no such input values exist?
Is the set of sensitizing input values unique? If not, which should one choose?
What’s left to do?
[Figure: example circuit with inputs x1 to x4, gates G1 to G5 (AND "&" and OR "≥1"), internal nodes s1, s2, s3 and output y; S-A-1 fault marked X at s3]
JMM v1.4
Test generation: Fault propagation
sensitization
fault propagation
line justification
Step 2: Fault propagation. Select a path that propagates the faulty value to an observed output (y in our example).
[Figure: example circuit with inputs x1 to x4, gates G1 to G5 (AND "&" and OR "≥1"), internal nodes s1, s2, s3 and output y; S-A-1 fault marked X at s3]
JMM v1.4
Test generation: Line justification
sensitization
fault propagation
line justification
Step 3: Line justification. Find a set of input values that enables the selected path (backtracking).
Is this always possible? What would it mean if no such input values exist?
Is the set of enabling input values unique? If not, which should one choose?
[Figure: example circuit with inputs x1 to x4, gates G1 to G5 (AND "&" and OR "≥1"), internal nodes s1, s2, s3 and output y; S-A-1 fault marked X at s3]
JMM v1.4
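The three steps (sensitization, fault propagation, line justification) can be sketched in Python on a small circuit. The netlist below, s3 = x3 AND x4 and y = (x1 AND x2) OR s3 with s3 stuck-at-1, is a hypothetical stand-in because the exact wiring of the slide circuit is not recoverable from the text. The result of the three conditions is cross-checked against a brute-force comparison of the fault-free and the faulty circuit.

```python
from itertools import product

def circuit(x1, x2, x3, x4, s3_stuck=None):
    """Hypothetical circuit: s3 = x3 AND x4, y = (x1 AND x2) OR s3."""
    s3 = x3 & x4
    if s3_stuck is not None:
        s3 = s3_stuck          # inject the stuck-at fault on node s3
    return (x1 & x2) | s3

tests = []
for x in product((0, 1), repeat=4):
    x1, x2, x3, x4 = x
    sensitized = (x3 & x4) == 0   # step 1: fault-free s3 must be 0 (fault forces 1)
    propagated = (x1 & x2) == 0   # step 2: other OR input at its non-controlling value
    if sensitized and propagated:  # step 3: x itself justifies both conditions
        tests.append(x)

# Cross-check: vectors where the fault-free and faulty outputs differ
brute_force = [x for x in product((0, 1), repeat=4)
               if circuit(*x) != circuit(*x, s3_stuck=1)]

print(len(tests), "test vectors, first one:", tests[0])
```

Exhaustive enumeration replaces the backtracking search here, which is only practical for a handful of inputs.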
Test generation: PODEM, FAN
JMM v1.4
PODEM
JMM v1.4
PODEM: Example
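branch-and-bound tree
nodes represent decisions
branches represent PI's
marked nodes represent a faulty 1st decision
example: x1 stuck-at-1
[Figure: decision tree starting at "start" with decisions x1=0 and x2=1; example circuit with inputs x1 to x4, gates G1 to G5 (AND "&" and OR "≥1"), internal nodes s1, s2, s3 and output y]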
JMM v1.4
Test pattern generation: Heuristics
JMM v1.4
Controllability and observability
measure
heuristics are often used to solve NP-complete problems
heuristics do not guarantee to find a solution in a given time
testability measure methods: TEMAS, TESTSCREEN, VICTOR, CAMELOT, SCOAP
Sandia controllability/observability analysis program (SCOAP)
  each node in a circuit gets values for its controllability, observability and testability
  high values indicate nodes which are hard to control or to observe
  distinguish between "1" and "0" controllability
  distinguish between combinational and sequential values
JMM v1.4
Observability & Controllability
When propagating faulty values to observed outputs we are often faced with several choices for which should be the next gate in our path.
[Figure: faulty node X with several candidate paths, each marked "?"]
JMM v1.4
Testability measurement:
Scoap algorithm
combinational "1" and "0" controllability of a logic gate output y, dependent on inputs x1..x3
OR gate:
  CC0(y) = CC0(x1) + CC0(x2) + CC0(x3) + 1
  CC1(y) = min{CC1(x1), CC1(x2), CC1(x3)} + 1
AND gate:
  CC0(y) = min{CC0(x1), CC0(x2), CC0(x3)} + 1
  CC1(y) = CC1(x1) + CC1(x2) + CC1(x3) + 1
combinational observability of a logic gate input x1, dependent on output y and inputs x2, x3
OR gate:
  CO(x1) = CO(y) + CC0(x2) + CC0(x3) + 1
AND gate:
  CO(x1) = CO(y) + CC1(x2) + CC1(x3) + 1
initialization (N are internal nodes, X, Y are PI's, PO's)
  CC0(X) = 1    CC0(N) = ∞    CO(Y) = 0
  CC1(X) = 1    CC1(N) = ∞    CO(N) = ∞
JMM v1.4
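A short Python sketch that applies the SCOAP rules for AND and OR gates to a hypothetical circuit, s = x1 AND x2 and y = s OR x3. The circuit and the helper names are assumptions for illustration. Controllabilities are computed from the primary inputs towards the output, observabilities from the output back towards the inputs.

```python
def and_cc(inputs):
    """inputs: list of (CC0, CC1) pairs; returns (CC0, CC1) of the AND output."""
    return (min(c0 for c0, _ in inputs) + 1, sum(c1 for _, c1 in inputs) + 1)

def or_cc(inputs):
    """inputs: list of (CC0, CC1) pairs; returns (CC0, CC1) of the OR output."""
    return (sum(c0 for c0, _ in inputs) + 1, min(c1 for _, c1 in inputs) + 1)

def and_co(co_y, others):
    """CO of one AND input: the other inputs must be set to 1."""
    return co_y + sum(c1 for _, c1 in others) + 1

def or_co(co_y, others):
    """CO of one OR input: the other inputs must be set to 0."""
    return co_y + sum(c0 for c0, _ in others) + 1

# Forward pass: primary inputs start at CC0 = CC1 = 1
cc = {name: (1, 1) for name in ("x1", "x2", "x3")}
cc["s"] = and_cc([cc["x1"], cc["x2"]])
cc["y"] = or_cc([cc["s"], cc["x3"]])

# Backward pass: the primary output starts at CO = 0
co = {"y": 0}
co["s"] = or_co(co["y"], [cc["x3"]])
co["x3"] = or_co(co["y"], [cc["s"]])
co["x1"] = and_co(co["s"], [cc["x2"]])
co["x2"] = and_co(co["s"], [cc["x1"]])

for node in ("x1", "x2", "x3", "s", "y"):
    print(node, "CC0,CC1 =", cc[node], "CO =", co[node])
```

In this example x1 and x2 get the largest observability value because they sit behind two gates, which matches the reading that larger numbers mean harder to observe.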
Testability measurement:
Scoap algorithm (con‘t)
hmmm. I guess smaller numbers are better...
[Figure: example circuit with every node annotated by a triple CC0,CC1,CO; primary inputs start at 1,1,- and primary outputs at -,-,0]
JMM v1.4
Fault simulation
JMM v1.4
Fault simulation (con‘t)
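[Figure: OR gate (≥1) with fault-injection masks MA and MB on the inputs and MC on the output]
A = [00000]   A' = [01000]   (mask MA)
B = [00000]   B' = [00100]   (mask MB)
C = [01100]   C' = [01110]   (mask MC)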
JMM v1.4
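One plausible reading of the bit vectors on the fault simulation slide is bit-parallel fault simulation: every bit position of a word is a separate copy of the circuit, and a mask forces a stuck-at value into exactly one copy. The Python sketch below reproduces the numbers of the slide under that assumption; the `inject` helper and the choice of stuck-at-1 masks are hypothetical.

```python
WIDTH = 5                      # five circuit copies simulated in one word
ALL = (1 << WIDTH) - 1

def inject(word, sa1=0, sa0=0):
    """Force stuck-at-1 (set bits) and stuck-at-0 (cleared bits) into a word."""
    return ((word | sa1) & ~sa0) & ALL

A = 0b00000
B = 0b00000
A_f = inject(A, sa1=0b01000)   # mask MA: A stuck-at-1 in one copy
B_f = inject(B, sa1=0b00100)   # mask MB: B stuck-at-1 in another copy
C = A_f | B_f                  # the OR gate is evaluated for all copies at once
C_f = inject(C, sa1=0b00010)   # mask MC: C stuck-at-1 in a third copy

for name, word in (("A'", A_f), ("B'", B_f), ("C", C), ("C'", C_f)):
    print(f"{name} = [{word:0{WIDTH}b}]")
```

A single bitwise OR evaluates the fault-free circuit and several faulty circuits together, which is the point of the parallel method.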
Fault Grading
So, you’ve constructed a set of test vectors
using the techniques described here. Will
they detect all the faulty parts?
JMM v1.4
Conclusion
JMM v1.4
Coming Up...
Next topic…
Design for Testability
Design
JMM v1.4
Exercises: VLSI-20 #1
[Figure: circuit of AND gates G1 to G6 with inputs x1 to x4, output y and a fault location α]
JMM v1.4
Exercises: VLSI-20 #2
[Figure: circuit detail with gate G4 (≥1), an AND gate (&), input x4, node s3 and output y]
JMM v1.4
VLSI Systems Design
Top-down Design and HDLs
It seems I
have to
hurry up!
Overview
Top-down design flow, VHDL hardware description language, test-bench methodology
JMM v1.4
chapter 1
JMM v1.4
The Need for HDLs cont.
JMM v1.4
Hardware Description Languages
textual HDLs
  VHDL, Verilog-HDL, HardwareC, etc.
graphic HDLs
  SpecCharts, etc. (control & dataflow graphs)
tabular HDLs
  BIF, etc. (FSMD models in tabular forms)
time-diagram HDLs
  WAVES, etc.
Standardization
  VHDL: IEEE Std 1076-1987 & 1993
  std_logic package: IEEE Std 1164-1993
  Verilog-HDL: IEEE Std 1364-1995
JMM v1.4
A Tale of Two HDLs
VHDL
  VHSIC HDL, Very High Speed Integrated Circuits. ADA-like verbose syntax, lots of redundancy.
  Extensible types and simulation engine. Logic representations are not built in and have evolved with time (IEEE-1164).
Verilog-HDL
  C-like concise syntax.
  Built-in types and logic representations. Oddly, this has led to slightly incompatible simulators from different vendors.
JMM v1.4
Introduction to VHDL & Verilog
VHDL
  rich & powerful language
  data type driven language
  goal: documentation of large complex systems
Verilog-HDL
  simple & efficient language
  hardware driven language
  goal: automatic synthesis
JMM v1.4
Introduction to VHDL & Verilog cont.
language features
  signal data types (in, out, bidir, signal-strength ...)
  hardware structures (memory, register-files, ...)
  logic operators (shift, rotation, masking, ...)
  asynchronous structures (set, reset of memories)
  parallel or synchronous structures
  constraints (pin, technology, area, delays, ...)
  inter-process communications (shared medium, message passing, ...)
JMM v1.4
chapter 2
[Figure: waveforms of sum and carry, time axis 5 to 40 ns]
JMM v1.4
Signal Values
value interpretation
U un-initialized
X forcing unknown
0 forcing 0
1 forcing 1
Z high impedance
W weak unknown
L weak 0
H weak 1
- don’t care
JMM v1.4
Resolved Signals
unresolved signal
JMM v1.4
chapter 3
Entity
a sum
+
b carry
entity HalfAdder is
port (a,b : in bit;
sum,carry : out bit);
end HalfAdder;
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
port (a,b : in std_ulogic;
sum,carry : out std_ulogic);
end HalfAdder;
JMM v1.4
Exercises Ex401: Entity
[Figure: blocks rNot (signals sel, n, z; 8 bit data) and Alu32 (signals a, b, c, op; 32 bit data, 6 bit op-code)]
Naming convention: start a component name with a capital letter and a signal name with a lower-case letter.
JMM v1.4
Architecture
end behavior;
JMM v1.4
Entity-Architecture: Hierarchy
(VHDL vs. Verilog)
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder is
port (a,b,ci: in std_logic; co,s:out std_logic);
end FullAdder; VHDL
JMM v1.4
Concurrency
The operation of digital systems is inherently concurrent.
Within VHDL, signals are assigned values using signal assignment statements (<=).
Multiple signal assignment statements are executed concurrently.
architecture concurrent_behavior of HalfAdder is
begin
  sum <= (a xor b) after 5 ns;      -- concurrent signal assignment
  carry <= (a and b) after 5 ns;    -- concurrent signal assignment
end concurrent_behavior;
[Figure: waveforms of sum and carry, time axis 5 to 40 ns]
JMM v1.4
Dataflow Model #1
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
port (a,b: in std_logic;
carry,sum:out std_logic);
end HalfAdder;
JMM v1.4
Dataflow Model #2
s1
L1
L4
+
L2 s2
L5
L3 s3
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder is
port (a,b,cIn: in std_logic;
cOut,sum: out std_logic);
end FullAdder;
JMM v1.4
Signal Assignments #1
sig <= '0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;
[Figure: resulting waveforms, time axis 5 to 40 ns]
JMM v1.4
Conditional Signal Assignment #2
library IEEE;
use IEEE.std_logic_1164.all;
entity Mux4to1 is
port (i0,i1,i2,i3: in std_logic_vector(7 downto 0);
sel : in std_logic_vector(1 downto 0);
z : out std_logic_vector(7 downto 0));
end Mux4to1;
JMM v1.4
Exercises Ex402: Conditional Signal
Assignment
Ex402 (difficulty: easy): Define the VHDL code of a 1-bit ALU with the operations AND, OR and FullAdder. Use the resolved 9-value logic of the IEEE 1164 package. The file Simple1bitALU.vhd has to be analyzed and simulated with the Synopsys commands gvan and vhdldbx.
[Figure: ALU block (labelled Alu32 in the figure) with inputs a, b, carryIn, opcode and outputs result, carry]
JMM v1.4
Delays: Delta Delay Model
The VHDL language distinguishes between three delay models:
  Delta delay model
  Inertial delay model (default)
  Transport delay model
Delta delay model
  If no delay is specified, a delta delay is assumed. A delta delay is an infinitesimally small delay that the simulator uses to order events; any number of delta delays sums to zero simulation time.
architecture delta_delay of Comb is
  signal s1,s2,s3,s4: std_logic := '0';
begin
  s1 <= not(in1);
  s2 <= not(in2);
  s3 <= not(s1 and in2);
  s4 <= not(s2 and in1);
  z  <= not(s3 and s4);
end delta_delay;
[Figure: circuit with inputs in1, in2, internal signals s1 to s4 and output z; waveforms from 0 to 70 ns; zoom at 10 ns showing in2, s2, s3 and z changing at 10 ns, 10 ns + Δ, + 2Δ, + 3Δ]
JMM v1.4
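The delta delay mechanism can be imitated with a small Python sketch that replays the five concurrent assignments of the delta_delay architecture. It is a simplified model, not a VHDL simulator: values are 0 or 1 only, and the starting state (in1 = 1, in2 = 0, all signals settled) is an assumption chosen for illustration. In each delta cycle every assignment that reads a changed signal is evaluated with the old values, and all updates are applied together.

```python
def nand(a, b):
    return 1 - (a & b)

# target signal -> (signals it reads, function of the current values)
ASSIGNMENTS = {
    "s1": (("in1",), lambda v: 1 - v["in1"]),
    "s2": (("in2",), lambda v: 1 - v["in2"]),
    "s3": (("s1", "in2"), lambda v: nand(v["s1"], v["in2"])),
    "s4": (("s2", "in1"), lambda v: nand(v["s2"], v["in1"])),
    "z": (("s3", "s4"), lambda v: nand(v["s3"], v["s4"])),
}

def settle(values, changed):
    """Run delta cycles until nothing changes; return the updates per delta."""
    trace = []
    changed = set(changed)
    while changed:
        updates = {}
        for target, (sources, fn) in ASSIGNMENTS.items():
            if changed.intersection(sources):
                new = fn(values)            # evaluated with the old values
                if new != values[target]:
                    updates[target] = new
        values.update(updates)              # all updates take effect together
        changed = set(updates)
        if updates:
            trace.append(updates)
    return trace

# Assumed settled starting state for in1 = 1, in2 = 0
values = {"in1": 1, "in2": 0, "s1": 0, "s2": 1, "s3": 1, "s4": 0, "z": 1}
values["in2"] = 1                           # input event at some simulation time
trace = settle(values, {"in2"})
for n, updates in enumerate(trace, start=1):
    print(f"delta {n}: {updates}")
```

The output z changes three delta cycles after the input event, yet no simulation time has passed.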
Delays: Inertial Delay Model
Digital circuits have a certain amount of inertia. For example, it takes a finite amount of time and a certain amount of energy for the output of a gate to respond to a change on the input.
Inertial delay model (default)
  a pulse shorter than the propagation delay will not propagate to the output
[Figure: waveforms, time axis 5 to 40 ns]
VHDL’93:
  sum <= reject 2 ns inertial (a xor b) after 5 ns;
JMM v1.4
Delays: Transport Delay Model
[Figure: transport delay of 8 ns: waveforms of input and out1 (output for delay: 8 ns), time axis 5 to 40 ns]
JMM v1.4
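The difference between the transport and the inertial delay model can be sketched in Python on a list of (time, value) events. This is a deliberately simplified model, not the full VHDL driver semantics: a value is dropped when it is held for less than the rejection limit, which defaults to the delay. The function names and the example waveform are assumptions.

```python
def transport(events, delay):
    """Transport delay: every event appears at the output, shifted by delay."""
    return [(t + delay, v) for t, v in events]

def inertial(events, delay, reject=None, initial=0):
    """Simplified inertial delay: values held shorter than `reject` are dropped."""
    if reject is None:
        reject = delay
    out = []
    last = initial
    for i, (t, v) in enumerate(events):
        is_last = i + 1 == len(events)
        if not is_last and events[i + 1][0] - t < reject:
            continue                 # pulse too short, it is swallowed
        if v == last:
            continue                 # no visible change at the output
        out.append((t + delay, v))
        last = v
    return out

# Input: a 3 ns glitch at 10 ns, then a 10 ns pulse at 20 ns (time in ns)
events = [(10, 1), (13, 0), (20, 1), (30, 0)]
out_transport = transport(events, 5)
out_inertial = inertial(events, 5)
out_reject2 = inertial(events, 5, reject=2)
print("transport:", out_transport)
print("inertial: ", out_inertial)
```

With a rejection limit of 2 ns, as in the VHDL'93 reject clause, the 3 ns glitch is wide enough to pass and the output matches the transport case.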
Delay Model in Practice
Accurate delay modeling of wire delays is possible, although in practice it is difficult to obtain accurate estimates of the wire delay without proceeding through physical design and layout of the circuit.
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a,b: in std_logic;
        carry,sum: out std_logic);
end HalfAdder;
architecture transport_delay of HalfAdder is
  signal s1,s2: std_logic := '0';
begin
  s1 <= (a xor b) after 2 ns;
  s2 <= (a and b) after 2 ns;
  sum <= transport s1 after 4 ns;
  carry <= transport s2 after 4 ns;
end transport_delay;
[Figure: half adder with internal signals s1, s2 between the gates and the outputs sum, carry; waveforms of a, b, sum, carry, s1, s2 from 0 to 14 ns, annotated "inertial" and "transport"]
JMM v1.4
Exercises vlsi21: Conditional
Assignments
Ex403 (difficulty: easy): Write and simulate a VHDL model of a 2-bit comparator (compare on equality, filename: Comp2.vhd).
[Figure: block Comp2 with inputs a, b and output c]
Ex405 (difficulty: easy): Construct and test a VHDL module for generating the following waveforms.
[Figure: waveforms of a, b, c, time axis 0 to 60 ns]
JMM v1.4
chapter 4
JMM v1.4
Example: Process Statement
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity Memory is
port (addr,wrData: in std_logic_vector(31 downto 0);
wr,rd: in std_logic;
rdData :out std_logic_vector(31 downto 0));
end Memory;
JMM v1.4
The Process Construct #2
identical behavior
process
MicroLab, VLSI-21 (28/94)
JMM v1.4
VHDL vs. Verilog: Events
VHDL Verilog-HDL
whow!
everything is
event driven like
in real life
JMM v1.4
Conditional Programming Constructs
If-then-else statement
  if condition then sequential statements
  [ elsif condition then sequential statements ]
  [ else sequential statements ]
  end if;
case statement
  case expression is
  { when choices => sequential statements }
  [ when others => sequential statements ]
  end case;
JMM v1.4
Example: Condition Statements
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
port (a,b: in std_logic;
sum,carry: out std_logic);
end HalfAdder;
Case_Process: process(a,b)
begin
  case a is
    when '0' => carry <= a after 5 ns;
    when '1' => carry <= b after 5 ns;
    when others => carry <= 'X' after 5 ns;
  end case;
end process;
JMM v1.4
VHDL vs. Verilog:
Combinational Logic Example
VHDL
entity Multiplexer4to1 is
  port (sel: in std_logic_vector (1 downto 0);
        a,b,c,d: in std_logic_vector (15 downto 0);
        z: out std_logic_vector (15 downto 0));
end Multiplexer4to1;
architecture DemoExample of Multiplexer4to1 is
begin
  process (a,b,c,d,sel)      -- 4 to 1 multiplexer (no inferred memory)
  begin
    case sel is
      when "00" => z <= a;
      when "01" => z <= b;
      when "10" => z <= c;
      when "11" => z <= d;
      when others => z <= (others => '-');
    end case;
  end process;
end DemoExample;
Verilog-HDL
module Multiplexer4to1(sel,a,b,c,d,z);
  input [1:0] sel;
  input [15:0] a,b,c,d;
  output [15:0] z;
  assign z = (sel == 2'd0) ? a:
             (sel == 2'd1) ? b:
             (sel == 2'd2) ? c:
             (sel == 2'd3) ? d:
             16'bx;
endmodule
JMM v1.4
Loop Programming Constructs
JMM v1.4
Example: Loop Statements
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity Multiplier is
  port (a,b: in std_logic_vector(31 downto 0);
        m: out std_logic_vector(63 downto 0));
end Multiplier;
[Figure: 32 bit Multiplier block with inputs a, b and output m]
JMM v1.4
Exercises vlsi21: Loops
shiftLeft shiftRight
shiftNum
JMM v1.4
More on Processes
process A: y <= '0';
process B: y <= '1';
conflict
  - two drivers!
  - not synthesisable!
JMM v1.4
The Wait Statement
wait on signal;
example: wait on clk,reset,status;
wait;
JMM v1.4
Example: Wait Statements
library IEEE; library IEEE;
use IEEE.std_logic_1164.all; use IEEE.std_logic_1164.all;
JMM v1.4
Latch vs. Flip-Flop
Latch
process(clk,reset,d)     -- d is in the sensitivity list so that q follows d while clk = '1'
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk = '1') then
    q <= d;
  end if;
end process;
[Figure: latch with inputs d, clk, reset and output q]
Flip-Flop
process(clk,reset)
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk'event and clk = '1') then
    q <= d;
  end if;
end process;
[Figure: flip-flop with inputs d, clk, reset and output q]
Register with enable
process(clk,reset)
begin
  if (reset = '0') then
    q <= "00000000";
  elsif rising_edge(clk) then
    if (enable = '1') then
      q <= d;
    end if;
  end if;
end process;
[Figure: register with a mux in front of the D input, inputs d, enable, clk, reset and output q]
JMM v1.4
Exercises vlsi21: Synchronous
[Figure: block with inputs d, enable, clk, reset and output q]
JMM v1.4
More on Wait: Inter-Process Comm.
transmitData
request
acknowledge
receiveData time
entity Handshake is
port(inputData: in std_logic_vector(31 downto 0));
end Handshake;
end behavioral;
JMM v1.4
Exercises vlsi21: Handshake
inputData outputData
input process output process
JMM v1.4
Attributes
attribute            function
signal'event         function returning a Boolean value signifying a change in value on this signal
signal'active        function returning a Boolean value signifying an assignment made to this signal (may not be a new value)
signal'last_event    function returning the time since the last event
signal'last_active   function returning the time since the signal was last active
signal'last_value    function returning the previous value of this signal
signal'left          returns the leftmost value of signal in its defined range
signal'right         returns the rightmost value of signal in its defined range
signal'high          returns the highest value of signal in its defined range
signal'low           returns the lowest value of signal in its defined range
signal'ascending     returns true if signal has an ascending range of values
signal'length        returns the number of elements in the array signal
JMM v1.4
Generating Periodic Waveforms
library IEEE;
use IEEE.std_logic_1164.all;
entity Periodic is
  port(Z: out std_logic);
end Periodic;
[Figure: periodic waveform of Z, time axis 0 to 50 ns]
JMM v1.4
Exercises vlsi21: FSM
[Figure: state diagram of a traffic light controller with the states GreenState, OrangeState, RedState1 and RedState2; transitions labelled carPresent; reset input]
Outputs per state (columns: red orange green):
state         main      second
GreenState    0 0 1     1 0 0
OrangeState   0 1 0     1 0 0
RedState1     1 0 0     0 0 1
RedState2     1 0 0     0 1 0
JMM v1.4
chapter 5
Modeling Structure
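[Figure: structural model of a full adder. Two HalfAdder3 components (component labels H1, H2) and one OR2 component (label O3) are connected by the internal signals s1, s2, s3; inputs in1, in2, cIn; outputs sum, cOut. Annotations: component label, component, interconnection]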
JMM v1.4
Example: Structural Model
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder3 is
  port (in1,in2,cIn: in std_logic;
        sum,cOut: out std_logic);
end FullAdder3;
architecture structural of FullAdder3 is
  -- component declarations (component behavior described elsewhere)
  component HalfAdder3
    port(a,b: in std_logic;
         sum,carry: out std_logic);
  end component;
  component OR2
    port(a,b: in std_logic;
         z: out std_logic);
  end component;
JMM v1.4
Exercises vlsi21: Structural Model
JMM v1.4
VHDL vs. Verilog: Structural Description
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder4 is
  port (a,b,cIn: in std_logic;
        cOut,sum: out std_logic);
end FullAdder4;
JMM v1.4
VHDL vs. Verilog:
Data Flow Description
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity FullAdder5 is
  port (a,b,cIn: in std_logic;
        sum,cOut: out std_logic);
end FullAdder5;
JMM v1.4
Hierarchy, Abstraction, and Accuracy
OR2 HalfAdder3
JMM v1.4
Generics
The VHDL language provides the ability to construct parameterized models using the concept of generics.
library IEEE;
use IEEE.std_logic_1164.all;
entity AND2 is
  generic(andDelay: Time);
  port(a,b: in std_logic; z: out std_logic);
end AND2;
JMM v1.4
More on Generics
Within a structural model there are two ways in which the values of generic constants of lower level components can be specified:
  in the component declaration
  in the component instantiation
If both are specified, then the value provided by the generic map() takes precedence.
If neither is specified, then the default value defined in the model is used.
library IEEE;
use IEEE.std_logic_1164.all;
entity GenericOR is
generic(n: positive:=2);
port(in1: in std_logic_vector((n-1) downto 0); z: out std_logic);
end GenericOR;
JMM v1.4
Exercises vlsi21: Hierarchy, Generic
JMM v1.4
Configuration
Configuration associates an architecture description to each component:
  - behavioral or
  - structural
[Figure: design hierarchy for FullAdder3: FullAdder3 consists of OR2 and HalfAdder3; HalfAdder3 consists of AND2 and XOR2]
JMM v1.4
Configuration: Component Binding
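[Figure: circuit with component C1 (Comb, combinational logic; ports a, b, carryIn, sum, carry) and component C2 (Dff; ports d, q, clk, rst), with clock and reset inputs]
Candidate architectures for Comb:
  architecture lowPower of Comb is - - -
  architecture highSpeed of Comb is - - -
  architecture behavioral of Comb is - - -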
JMM v1.4
Configuration: Default Binding Rules
JMM v1.4
Example: Configuration
[Figure: SerialAdder with component C1 (Comb, architecture highSpeed; ports in1, in2, carryIn, sum, carry) and component C2 (MyDff, architecture behavioral; ports d, q, clk, rst), with clock and reset inputs]
configuration CFG_HighSpeed of SerialAdder is       -- configuration name (used for simulation), entity name
  for structural                                    -- architecture name
    for C1: Comb use entity WORK.Comb(highSpeed);   -- library name, entity name, architecture name
    end for;
JMM v1.4
Exercises vlsi21: Configuration
Ex413 (difficulty: easy): Write the VHDL code of the bit-serial adder shown in the previous transparency (SerialAdder.vhd).
a) Construct a model for the two components Comb and MyDff and place them both in your WORK library (don't use the library MyLibrary yet).
b) Adapt the configuration, compile and simulate it.
[Figure: circuit with inputs i1, i2, i3, two AND gates (&), one OR gate (≥1) and output o1]
JMM v1.4
chapter 6
JMM v1.4
Functions
Functions are used to compute a value based on the values of the input parameters. Functions are placed in declarative parts. Example of function definition:
  rising_edge(clk)
Functions execute in zero simulation time, thus wait statements cannot exist in functions. Parameters are restricted to be of mode in.
function rising_edge (signal clock: std_logic) return boolean is   -- mode not necessary
  variable edge: boolean := false;
begin
  edge := (clock = '1' and clock'event);
  return (edge);
end rising_edge;
JMM v1.4
Example: Type Conversion Function
with Functions
As VHDL is a type sensitive language, type conversions are quite often necessary.
JMM v1.4
Procedures
Procedures are subprograms that can modify one or more of the input parameters. Example of procedure declaration reading from a file f:
JMM v1.4
Example: Procedure
library IEEE;
use IEEE.std_logic_1164.all;
entity CPU is
port(di: out std_logic_vector(31 downto 0);
addr: out std_logic_vector(2 downto 0);
r,w: out std_logic;
do: in std_logic_vector(31 downto 0);
s: in std_logic);
end CPU;
JMM v1.4
Overloading
Dff(clk,d,q,qbar);
Dff(clk,d,q,qbar,reset,clear);
From the type and number of arguments we can tell which procedure we meant to use.
Note that in std_logic_1164.vhd the boolean functions and, or, etc. have been defined for std_logic types; the functions +, *, etc. have been defined for certain predefined types of the language such as integer. See also the std_logic_arith package.
function "*"(arg1,arg2: std_logic_vector) return std_logic_vector;
function "+"(arg1,arg2: signed) return signed;
JMM v1.4
Packages
Logically related functions and procedures can be grouped into packages, and thus easily be shared among designs and people.
package MyLibraryPackage is      -- package declaration: similar to VHDL entity, defines interfaces
  --
  -- type declarations
  -- function declarations
  -- procedure declarations
end MyLibraryPackage;
JMM v1.4
Example: Package Declaration
package std_logic_1164 is
end std_logic_1164;
JMM v1.4
Libraries
JMM v1.4
Synopsys tools on unix workstations
DEFAULT: ./WORK
MyLibrary : ./lib
use = . ./src
timebase = ns
MyVHDLdesign.vhd components
library MyLibrary; can also be
use MyLibrary.MyPackage.all; placed into
library
-- use MyLibrary.all; libraries
library MyLibrary
WORK
entity MyVHDLdesign is /home/MyHome/VHDLdesign/lib
... /home/MyHome/VHDLdesign/WORK
MicroLab, VLSI-21 (70/94)
JMM v1.4
Exercises vlsi21: Libraries & Packages
Ex415 (difficulty: medium): The small circuit ConfigExample from exercise Ex414 shall be rewritten by using the components OR2 and ANDn from the library MyLibrary.
a) Write the VHDL file MyComponents.vhd holding the two components OR2 and ANDn and compile it into the library MyLibrary.
b) Rewrite the ConfigExample circuit using only library elements and call it LibraryExample.vhd, compile and simulate it.
[Figure: circuit with inputs i1, i2, i3, two ANDn gates (&), one OR2 gate (≥1) and output o1]
JMM v1.4
Exercises vlsi21: Packages
Ex417 (difficulty: medium): The bit-serial adder of exercise Ex413 shall be rewritten using a procedure call for the Dff instead of a component (SerialAdder2.vhd). Place the procedure into a package MyPackage and analyze it into the library MyLibrary. Verify the functionality.
[Figure: serial adder with component C1 (Comb, architecture highSpeed; ports in1, in2, carryIn, sum, carry) and a Dff (behavioral, taken from library MyLibrary; ports d, q, clk, rst), with clock and reset inputs]
JMM v1.4
VHDL vs. Verilog: Data Types
VHDL Verilog-HDL
JMM v1.4
VHDL vs. Verilog: Operators
Operator type    function                    VHDL        Verilog
arithmetic       a+b                         +           +
                 a-b                         -           -
                 a*b                         *           *
                 a/b                         /           /
                 a-b*n                       mod         %
                 a-(a/b)*b                   rem
logical          a and b                     and         &
                 a or b                      or          |
                 not(a and b)                nand        ~&
                 a exor b                    xor         ^
shift            logic                       srl, sll    >>
                 arith.                      sra, sla
                 rotate                      ror, rol
reduction,       concatenation               &           {a,b}
concatenation    replication                             {4{a}}
relational       >                           >           >
                 >=                          >=          >=
                 /=                          /=
JMM v1.4
VHDL vs. Verilog:
Sequential Structures
VHDL
-- inside an architecture
...
process (clk)
  -- variables are declared in the process that uses them
  variable inp: std_logic_vector (7 downto 0);
  variable outp,cout: std_logic_vector (7 downto 0);
begin
  if (clk'event and clk = '1') then
    outp := outp + inp;      -- sequentially executed statements
    cout := outp + 1;
  end if;
end process;
...
Verilog-HDL
/* inside a module */
...
wire [7:0] inp;
reg [7:0] outp, cout;
...
always @(posedge clk)
begin
  outp = outp + inp;         // sequentially executed statements
  cout = outp + 1;
end
...
JMM v1.4
VHDL vs. Verilog:
Parallel Structures
VHDL
-- in an architecture
...
signal inp: std_logic_vector (7 downto 0);
signal outp,cout: std_logic_vector (7 downto 0);
p1: process (clk)
begin
  if (clk'event and clk = '1') then
    outp <= outp + inp;      -- parallel executed statements
    cout <= outp + 1;
  end if;
end process;
...
p2: process (reset)          -- parallel executed blocks: two drivers for outp
begin
  if (reset = '0') then
    outp <= "00000000";
  end if;
end process;
...
Verilog-HDL
/* in a module */
...
wire [7:0] inp;
reg [7:0] outp, cout;
...
always @(posedge clk)
  fork                       // parallel executed statements
    outp = outp + inp;
    cout = outp + 1;
  join
always @(reset)              // parallel executed blocks: two drivers for outp
  if (!reset)
    outp = 8'b0;
...
JMM v1.4
VHDL vs. Verilog: Assignments
VHDL
architecture ex1 of AssignExample is
  signal x1, y1, y2, z1, z2: std_logic_vector (7 downto 0);
  variable x2: std_logic_vector (7 downto 0);
  ...
begin
  p1: process (clk)
  begin
    if (clk'event and clk = '0') then
      x1 <= y1;                  -- signal assignment
      y1 <= x1;
      z1 <= y1 after 12 ns;
    end if;
  end process;
  p2: process (y2)
  begin
    x2 := y2;                    -- variable assignment
    y2 <= x2;
    z2 <= y2 after 12 ns;
  end process;
end ex1;
Verilog-HDL
module AssignExample
  wire [7:0] v,y2,z2;
  reg [7:0] x1,y1,z1,x2;
  ...
  always @(posedge clk)
    fork
      x1 = y1;
      y1 = x1;
      z1 #(12) = y1;
    join
  assign x2 = y2;
  assign y2 = x2;
  assign #(12) z2 = y2;
endmodule
before the falling edge of clk: x=1, y=2, z=3
12 ns after the falling edge of clk: x= y= z= ?
JMM v1.4
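The question on the assignments slide can be explored with a small Python model. It assumes that x, y, z stand for x1, y1, z1 of process p1 and that all delayed updates have taken effect 12 ns after the edge; both function names are hypothetical. Signal assignments (<=) read the values from before the edge and update afterwards, while variable assignments (:=), shown for contrast, take effect immediately.

```python
def signal_assignments(x, y, z):
    """x <= y; y <= x; z <= y after 12 ns; every right-hand side reads old values."""
    new_x = y
    new_y = x          # still the old x, so x and y swap
    new_z = y          # old y, scheduled 12 ns later
    return new_x, new_y, new_z

def variable_assignments(x, y, z):
    """The same three statements with := ; each one takes effect at once."""
    x = y
    y = x              # x was already overwritten, so y keeps its value
    z = y
    return x, y, z

sig = signal_assignments(1, 2, 3)
var = variable_assignments(1, 2, 3)
print("signals:  ", sig)
print("variables:", var)
```

With signals the two values are exchanged, which is why a register swap needs no temporary signal in VHDL.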
VHDL vs. Verilog:
Sequential Logic
register with asynchronous reset
VHDL
library IEEE;
use IEEE.std_logic_1164.all;
package MyDefinition is
  type vector16 is array (15 downto 0) of std_logic;
end MyDefinition;
library IEEE;
use IEEE.std_logic_1164.all;
use work.MyDefinition.all;
entity AsynRegister is
  port (clk,rst: in std_logic;
        a: in vector16; z: out vector16);
end AsynRegister;
architecture DemoExample of AsynRegister is
begin
  process (clk, rst)
  begin
    if (rst = '0') then
      z <= vector16'(others => '0');
    elsif (clk'event and clk = '1') then
      z <= a;
    end if;
  end process;
end DemoExample;
Verilog-HDL
module AsynRegister(clk,rst,a,z);
  input clk,rst;
  input [15:0] a;
  output [15:0] z;
  reg [15:0] z;                           // z is assigned in an always block
  always @(posedge clk or negedge rst)    // rst in the event list makes the reset asynchronous
    if (rst == 1'b0)
      z = 16'b0;
    else
      z = a;
endmodule
JMM v1.4
“Dataflow” Modeling
library IEEE;
use IEEE.std_logic_1164.all;
JMM v1.4
VHDL Example: Behavioral Modeling
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
JMM v1.4
Synthesis
JMM v1.4
Logic Synthesis
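Z <= (A and B) or C;
[Figure: synthesized gates with inputs A, B, C and output Z]
if (SEL = '1') then Z <= B;
else Z <= A;
end if;
[Figure: synthesized multiplexer with data inputs A (0) and B (1), select SEL and output Z, and its gate-level equivalent]
process(word)
  variable result: std_logic;
begin
  result := '0';
  for j in 0 to 3 loop
    result := result xor word(j);
  end loop;
  parity <= result;
end process;
[Figure: synthesized XOR chain with inputs WORD(3), WORD(2), WORD(1), WORD(0) and output PARITY]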
JMM v1.4
FSM Example
architecture behavioral of Moore is
type StateType is (S0,S1,S2,S3);
signal current, next_state: StateType;  -- "next" is a reserved word in VHDL
begin
JMM v1.4
Further Reading
Also:
JMM v1.4
Test Bench /1
I can relax
my test bench does
everything automatically
JMM v1.4
Test Bench /2
compare a test bench with MicroLab-I3S:
  there are chips and PCBs that need to be tested
  there is nice measurement equipment
  there are skilled and hard-working people
  there are no signals coming from or going to the outside of the lab
[figure: test bench containing "control and stimulus generation" and "response generation and verification" blocks]
Test Bench in Design Flow
[figure: design flow with the steps design of VHDL model, synthesis of logic model, simulation of logic model, FPGA test (debugger), ASIC fabrication, prototype test (ASIC); the test bench drives inp/out of the model in simulation, and a test machine drives inp/out of the FPGA or ASIC chip]
VHDL Test Bench
library IEEE;
use IEEE.std_logic_1164.all;
entity TestBench is          -- test bench has no inputs, no outputs
end TestBench;
architecture sample of TestBench is
  signal clk, a: bit;
  signal b: bit;
  component MyCircuit
    port(clk,a: in bit; b: out bit);
  end component;
begin
  -- call of device under test (DUT)
  DUT: MyCircuit port map (clk,a,b);
  process                    -- clk generation
  begin
    clk <= '0', '1' after 20 ns, '0' after 70 ns;
    wait for 100 ns;
  end process;
  TestPatternGenerator: block
  begin
    process                  -- test pattern generation on a cycle by cycle basis
    begin
      a <= '0'; -- test cycle 1
      wait for 100 ns;
      a <= '1'; -- test cycle 2
      wait for 100 ns;
      ...
    end process;
    -- response pattern verification not yet implemented!
  end block;
end sample;
Test Bench - Test Cycle
test cycle
clock
input valid
JMM v1.4
ProTest Test Machine
JMM v1.4
Conclusions
JMM v1.4
Exercises: VLSI-21: Test-Bench
test bench
inp VHDL out
model
JMM v1.4
Coming Up...
Next topic:
CAD exercises and mini FPGA projects (PWM, blackjack dealer, simple microprocessor, etc.)
JMM v1.4
Exercises: VLSI-21 #1
JMM v1.4
Exercises: VLSI-21 #2
JMM v1.4
Exercises: VLSI-21 #3
JMM v1.4
Exercises: VLSI-21 #4
JMM v1.4
Exercises: VLSI-21 #5
JMM v1.4
VLSI Design I
Automatic Synthesis of Digital Circuits
Overview
design abstraction domains
architectural models
Goal: You are familiar with the design abstraction
domains, the description-synthesis design method,
the design strategies as well as the three synthesis
steps. You know the FSMD architectural model as
well as the interprocess communication models.
Introduction
JMM v1.4
Design Methodology
budget ($, speed, area,
power, schedule, risk)
low-level building blocks, spice
high-level architecture, paper & pencil
JMM v1.4
Capture-Simulation Method
bottom-up approach
structure of a system is described
knowledge of an experienced designer is difficult to
automate
[figure: schematic capture example with an AND gate and a D flip-flop; signals data, clk, ena]
JMM v1.4
Description-Synthesis Method
top-down approach
behaviour of a system is described
technology independent
CAD algorithms can search the solution space very
quickly
if data-ready then
  bus := data;
else
  bus := high-Z;
end if;
[figure: synthesized result with an AND gate and a D flip-flop clocked by clk]
JMM v1.4
Design methods for VLSI circuits
what it is now
top-down or bottom-up?
JMM v1.4
Abstraction Domains
JMM v1.4
Abstraction Domains: Y-Chart
[figure: Y-chart with the behavioural, structural and physical domains as axes, abstraction levels along each axis, and synthesis as the step from the behavioural to the structural domain]
Behavioural Domain
  programs
  subroutines, boolean equations
  instructions
Structural Domain
  processors
  ALUs, registers
  logic gates
  transistors
Physical Domain
  layout, transistors
  cells
  chips, modules
  chips, MCMs, boards
Abstraction Levels
JMM v1.4
Abstraction: System Level
JMM v1.4
Abstraction: Microarchitecture Level
output
JMM v1.4
Abstraction: Logic Level
cin sel
a
b
mux s
ALU
cout
JMM v1.4
Abstraction: Circuit Level
c y
b
JMM v1.4
Design Strategies
a strategy?
why not ad-hoc?
JMM v1.4
Design Strategies: Hierarchy
cin
a sum
b
cin sum
a
b adder cout
cout
JMM v1.4
Design Strategy: Regularity
JMM v1.4
Design Strategies: Modularity
JMM v1.4
Design Strategies: Locality
JMM v1.4
Automatic Synthesis /1
synthesis
Behavioural Domain Structural Domain
silicon compilation
Physical Domain
JMM v1.4
Automatic Synthesis /2
JMM v1.4
Automatic Synthesis: Allocation
[figure: candidate allocations s1 ... s22 plotted as delay versus area]
Allocation: Example
RTL example
  x = a + b;
  y = a * c;
  z = x + d;
  x = y - d;
  x = x + c;
allocation: 1 adder, 1 multiplier, 1 subtractor
[figure: data-flow graph of the example with inputs a, b, c, d, intermediate values y, z, x2 and final value x3]
Automatic Synthesis: Scheduling
JMM v1.4
Scheduling: Example
  cycle 1:  x1 = a + b;   y = a * c
  cycle 2:  x2 = y - d;   z = x1 + d
  cycle 3:  x3 = x2 + c
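The cycle assignment of this example can be reproduced with an as-soon-as-possible (ASAP) rule: every operation is placed one cycle after its latest operand becomes available. The following is a minimal Python sketch under that assumption; the slides do not name the scheduling algorithm, and the names `ops` and `asap_schedule` are illustrative only.

```python
# Each operation: result name -> (operator, operand names).
# x1, x2, x3 stand for the successive values of x in the RTL example.
ops = {
    "x1": ("+", ("a", "b")),
    "y":  ("*", ("a", "c")),
    "z":  ("+", ("x1", "d")),
    "x2": ("-", ("y", "d")),
    "x3": ("+", ("x2", "c")),
}

def asap_schedule(ops):
    """Place every operation one cycle after its latest operand is ready."""
    cycle = {}

    def ready(name):
        if name not in ops:          # primary input, available before cycle 1
            return 0
        if name not in cycle:
            cycle[name] = 1 + max(ready(src) for src in ops[name][1])
        return cycle[name]

    for name in ops:
        ready(name)
    return cycle

schedule = asap_schedule(ops)
for c in sorted(set(schedule.values())):
    print("cycle", c, [n for n in ops if schedule[n] == c])
```

In this schedule no cycle needs more than one adder, one multiplier and one subtractor, which agrees with the allocation given for the example.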
JMM v1.4
Automatic Synthesis: Binding
JMM v1.4
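Binding: Example
  cycle 1:  adder: x1 = a + b;   multiplier: y = a * c
  cycle 2:  adder: z = x1 + d;   subtractor: x2 = y - d
  cycle 3:  adder: x3 = x2 + c
Binding: Example cont.
[figure: bound datapath with registers for a, b, d, y, for z, x1, x3 (shared) and for x2; multiplexers select the operands of one multiplier (mult), one adder (add) and one subtractor (sub)]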
JMM v1.4
Architecture Models
JMM v1.4
Architecture Models:
Microarchitecture
microarchitecture components
functional units
  adder, multiplier, comparator, ALU, etc.
memory elements
  latch, flip-flop, register, register-file, RAM, ROM ...
interconnection units
  bus, multiplexer
JMM v1.4
Architectural Models:
Combinational Logic
combinational logic:
non subdividable units
  encoder, decoder, carry-lookahead adder ...
subdividable units
  ripple-carry adder, selector, ALUs, ...
implementation forms
  ROM (table lookup)
  PLA structures (2 stage logic)
  multistage logic
  bit-slice, systolic array, etc
JMM v1.4
Architectural Models:
Finite State Machines
JMM v1.4
Architectural Models:
Control Unit / Data Path
[figure: FSM (transfer logic, state register) and datapath (transfer logic, functional unit, datapath register) linked by control and status signals]
JMM v1.4
Architectural Models:
System Architecture
[figure: process 1 (clock1) and process 2 (clock2), each an FSM plus datapath with control inputs, control outputs, control and status signals, connected through a databus]
Architectural Models:
Interprocess Communication
process 1
request
acknowledge
process 2
JMM v1.4
Architectural Models:
Implementation Constraints
JMM v1.4
Conclusions
description-synthesis method
system design with HDLs (parallel constructions,
RTL level)
top-down and bottom-up design
abstract models are not precise
races, hazards, delays, signal strength, ...
silicon compiler does not exist
JMM v1.4
Coming Up...
Next time...
Hardware description languages
Reading
Weste:
Sections 6 thru 6.2.7 (design strategy)
6.4 thru 6.4.5 (design methods)
JMM v1.4
VLSI Design I
Design for Test
Overview
design for test architectures
ad-hoc, scan based, built-in
Goal: You are familiar with testability metrics and
you know ad-hoc test structures as well as scan-based
test structures. Built-in test structures such as
BILBO and boundary scan can be applied.
Design For Test
What can we do to increase testability?
increase observability
  - add more pins (?!)
  - add small "probe" bus, selectively enable different values onto bus
  - use a hash function to "compress" a sequence of values (e.g., the values of a bus over many clock cycles) into a small number of bits for later read-out
  - cheap read-out of all state information
increase controllability
  - use muxes to isolate sub-modules and select sources of test data as inputs
  - provide easy setup of internal state
JMM v1.3
Ad-hoc testing #1
Ad-hoc test techniques are a collection of ideas
aimed at reducing the test time. Common
techniques are:
partitioning large sequential circuits
adding test points
adding multiplexers
[figure: three-bit counter built from half-adder stages (outputs Q0..Q2, carries co0..co2), shown plain, with load inputs, and with test multiplexers in the carry chain]
Ad-hoc testing #2
[figure: modules A and B with multiplexers controlled by test1 and test2, giving direct control of each module's inputs and direct observation of A out and B out]
Scan-based test techniques #1
Idea: have a mode in which all registers are chained
into one giant shift register which can be loaded/
read-out bit serially. Test remaining (combinational)
logic by
(1) in "test" mode, shift in new values for all
register bits thus setting up the inputs to the
combinational logic
(2) clock the circuit once in "normal" mode, latching
the outputs of the combinational logic back into
the registers
(3) in "test" mode, shift out the values of all
register bits and compare against expected
results. One can shift in new test values at the
same time (i.e., combine steps 1 and 3).
[figure: scan flip-flops, each a D flip-flop with a 2-to-1 multiplexer on D selected by normal/test; input 0 comes from the combinational logic (CL), input 1 from the previous flip-flop (shift in); the chain runs from scan-in to scan-out]
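Steps (1) to (3) of the scan procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of a particular chip: the chain is a plain list of bits, `comb_logic` is a hypothetical stand-in for the combinational logic between the registers, and the last flip-flop in the list is taken as scan-out.

```python
def scan_test(comb_logic, patterns):
    """Apply each pattern through a serial scan chain; return the captured responses."""
    n = len(patterns[0])
    chain = [0] * n                      # chain[0] is next to scan-in

    def shift(bits_in):
        """Test mode: one clock per bit; returns the bits seen at scan-out."""
        out = []
        for b in bits_in:
            out.append(chain[-1])        # scan-out is the last flip-flop
            chain.insert(0, b)
            chain.pop()
        return out

    responses = []
    shift(reversed(patterns[0]))         # step 1: load the first pattern
    for k in range(len(patterns)):
        chain[:] = comb_logic(list(chain))   # step 2: one clock in normal mode
        nxt = patterns[k + 1] if k + 1 < len(patterns) else [0] * n
        out = shift(reversed(nxt))       # step 3, combined with step 1 for the next pattern
        responses.append(out[::-1])      # restore register order
    return responses

# Hypothetical 3-bit combinational block used only for this illustration.
logic = lambda s: [s[0] ^ s[1], s[1] & s[2], s[2] | s[0]]
patterns = [[1, 0, 1], [0, 1, 1]]
responses = scan_test(logic, patterns)
print(responses)
```

Each test pattern costs n shift clocks plus one normal clock, which is why long chains make scan testing slow.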
Scan-based test techniques #2
serial scan
[figure: serial scan chain from scan-in to scan-out through the flip-flops around combinational blocks CL1 and CL2]
Scan registers
[figure: scan registers R1..R6 partitioning several combinational logic (CL) blocks]
Level sensitive scan design
A popular approach is the level sensitive scan
design technique from T.W. Williams (LSSD)
the circuit is level sensitive (steady state response is
independent of circuit and wire delays within a circuit):
hazard free
each register may be converted to a serial shift register
[figure: LSSD shift-register latch made of latches L1 (data D, clock C, scan input I, shift clock A, output T1) and L2 (clock B, output T2); registers reg A and reg B built from such latches around a combinational logic block and chained from serial data in to serial data out]
[figure: timing of c1, shift-clk and c2 for normal operation, for shifting data into reg A, and for shifting reg B out]
Scan Elements
[figure: LSSD element: gate-level implementation of latches L1 and L2 with inputs D, C, I, A, B and outputs T1, T2]
[figure: scan FF: D flip-flop with input multiplexer (data D on 0, test input TI on 1, select TE), clock clk, output Q, and its implementation with clocks clka/clkb]
Self-Test Techniques: BILBO
Problem: Scan-based approach is great for testing combinational logic
but can be impractical when trying to test memory blocks, etc. because
of the number of separate test values required to get adequate fault
coverage.
[figure: circuit under test with an input multiplexer controlled by normal/test; a pattern generator on the input side, a response checker on the output side]
Generate pseudo-random data for most circuits by using, e.g., a linear
feedback shift register (LFSR). Memory tests use more systematic FSMs
to create ADDR and DATA patterns.
For pseudo-random input data simply compute some hash of output values
and compare against expected value ("signature") at end of test.
Memory data can be checked cycle-by-cycle.
Linear Feedback Shift Register (LFSR)
If Ci’s are not programmable, can eliminate
AND gates and some XOR gates...
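[figure: chain of n D flip-flops; each stage output is ANDed with its coefficient c1 ... cn and the results are combined by XOR (=1) gates into the feedback]
  $f(x) = 1 + c_1 x + c_2 x^2 + c_3 x^3 + \dots + c_{n-1} x^{n-1} + c_n x^n$
=> pseudo-random sequence generator (PRSG)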
Signature Analysis
signature analysis is used to compact a data stream
into a so-called signature
different responses for different ci; many well-known
CRC (cyclic redundancy check) polynomials
correspond to a specific choice of ci's.
serial in
[figure: LFSR with coefficients c1 ... cn whose feedback is XORed with the serial input stream]
parallel in
[figure: LFSR with coefficients c1 ... cn-1 in which the parallel inputs z1 ... zn are XORed into the D input of each stage; outputs q1 ... qn]
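A serial-input signature register can be sketched in Python as follows. This is a minimal illustration, assuming the register starts at zero and that the tap numbering (tap i reads stage i, counted from the input side) matches the coefficients ci; the function name `signature` and the bit streams are made up for the example.

```python
def signature(bits, taps, n):
    """Compact a bit stream into an n-bit signature with a serial-input LFSR."""
    state = 0
    mask = (1 << n) - 1
    for b in bits:
        fb = b                               # serial input enters the feedback XOR
        for t in taps:                       # taps = stages i with c_i = 1
            fb ^= (state >> (t - 1)) & 1
        state = ((state << 1) | fb) & mask   # shift by one stage
    return state

good = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
bad = list(good)
bad[5] ^= 1                                  # single-bit error in the response stream

sig_good = signature(good, taps=(1, 4), n=4)
sig_bad = signature(bad, taps=(1, 4), n=4)
print(sig_good, sig_bad)
```

Because the register is linear and stage n is always tapped, a single-bit error can never cancel itself, so the two signatures differ. Multi-bit errors can alias to the good signature with a probability of roughly 2^-n.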
LFSR Polynomials
polynomials for maximal-length sequences for n from 1 up to 32
  n                     f(x)
  1,2,3,4,6,7,15,22     1 + x + x^n
  5,11,21,29            1 + x^2 + x^n
  10,17,20,25,28,31     1 + x^3 + x^n
  9                     1 + x^4 + x^n
  23                    1 + x^5 + x^n
  18                    1 + x^7 + x^n
  8                     1 + x^2 + x^3 + x^4 + x^n
  12                    1 + x + x^4 + x^6 + x^n
  13                    1 + x + x^3 + x^4 + x^n
  14,16                 1 + x^3 + x^4 + x^5 + x^n
  19,27                 1 + x + x^2 + x^5 + x^n
  24                    1 + x + x^2 + x^7 + x^n
  26                    1 + x + x^2 + x^6 + x^n
  30                    1 + x + x^2 + x^23 + x^n
  32                    1 + x + x^2 + x^22 + x^n
examples of CRCs
  n     CRC
  8     1 + x + x^4 + x^5 + x^7 + x^8
  16    1 + x^2 + x^15 + x^16
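That these polynomials give maximal-length sequences can be checked by counting the states an LFSR visits before it repeats. The sketch below is a hedged Python illustration: it uses one common shift-register convention (tap i reads stage i), which may number the stages differently from the figure, and `lfsr_period` is an illustrative name. The period is the same for a polynomial and its mirror image, so the check does not depend on that choice.

```python
def lfsr_period(n, taps, seed=1):
    """Number of clocks until an n-stage LFSR returns to its seed state."""
    mask = (1 << n) - 1
    state = seed
    steps = 0
    while True:
        fb = 0
        for t in taps:                       # exponents of f(x) other than 0
            fb ^= (state >> (t - 1)) & 1
        state = ((state << 1) | fb) & mask
        steps += 1
        if state == seed:
            return steps

# n = 3 and n = 4 use 1 + x + x^n, n = 5 uses 1 + x^2 + x^n (from the table).
for n, taps in [(3, (1, 3)), (4, (1, 4)), (5, (2, 5))]:
    print(n, lfsr_period(n, taps), 2**n - 1)
```

A maximal-length sequence visits every non-zero state once, so the period is 2^n - 1; the all-zero state is a lock-up state and must be avoided as seed.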
BILBO #1
A very popular built-in test structure is the built-in
logic block observation (BILBO) from Koenemann
BILBO operates in 4 different modes
[figure: circuit in normal operation placed between two BILBO registers; the input-side BILBO runs in PRSG mode, the output-side BILBO in signature analysis mode]
BILBO #2
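[figure: 4-bit BILBO register with parallel inputs D0..D3, outputs Q1..Q4, control inputs c1 and c0, scan in, scan out, AND/XOR gating in front of each D flip-flop and an XOR feedback path selected by a multiplexer]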
mode c1 c0 function
A 0 0 scan mode
B 1 0 reset
C 0 1 PRSG or signature analyzer
D 1 1 parallel registers
JMM v1.3
IDDQ Testing
[figure: A-meter in the VDD supply line of the circuit (measures IDD); VDD, GND]
System-Level Test: Boundary Scan
The IEEE 1149.1 boundary scan architecture
provides a standardized serial scan path through the
I/O pins of a chip (also called JTAG)
at the board level, chips obeying the standard may
be connected in a variety of series and parallel
combinations for board testing (replacing the bed of
nails)
standardized tests:
connectivity tests between components
sampling and setting chip I/Os
IO pad and
boundary cell
JMM v1.3
Boundary Scan: Test Access Port
clocks/control
TCK TAP
TMS controller
(TRST)
MicroLab, VLSI-23 (17/24)
JMM v1.3
Boundary Scan: TAP controller
[figure: TAP controller state diagram with transitions labelled by the TMS value (0 or 1). States: test-logic reset, run-test/idle; select-DR-scan, capture-DR, shift-DR, exit1-DR, pause-DR, exit2-DR, update-DR; select-IR-scan, capture-IR, shift-IR, exit1-IR, pause-IR, exit2-IR, update-IR]
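The state diagram can be written down as a transition table indexed by the TMS value sampled on each rising TCK edge. The Python sketch below follows the IEEE 1149.1 state diagram; the dictionary `TAP` and the helper `walk` are illustrative names, not part of any tool.

```python
# state -> (next state for TMS = 0, next state for TMS = 1)
TAP = {
    "test-logic-reset": ("run-test/idle", "test-logic-reset"),
    "run-test/idle":    ("run-test/idle", "select-DR-scan"),
    "select-DR-scan":   ("capture-DR", "select-IR-scan"),
    "select-IR-scan":   ("capture-IR", "test-logic-reset"),
}
# The DR and IR branches have the same shape, so build them in a loop.
for r in ("DR", "IR"):
    TAP["capture-" + r] = ("shift-" + r, "exit1-" + r)
    TAP["shift-" + r] = ("shift-" + r, "exit1-" + r)
    TAP["exit1-" + r] = ("pause-" + r, "update-" + r)
    TAP["pause-" + r] = ("pause-" + r, "exit2-" + r)
    TAP["exit2-" + r] = ("shift-" + r, "update-" + r)
    TAP["update-" + r] = ("run-test/idle", "select-DR-scan")

def walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK cycle, and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state

print(walk("test-logic-reset", [0, 1, 0, 0]))      # path into shift-DR
print(walk("test-logic-reset", [0, 1, 1, 0, 0]))   # path into shift-IR
```

Holding TMS at 1 for five TCK cycles returns the controller to test-logic reset from any state, which is why TRST is optional.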
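Boundary-scan: IR
[figure: instruction register bit cell: a multiplexer (data on 0, output of the last cell on 1, selected by shiftIR) feeds a shift flip-flop clocked by clockIR, whose output goes to the next IR bit and to an update flip-flop clocked by updateIR and reset via TRST]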
Boundary-scan: DR
TAP data register (DR)
boundary scan input and output cells
[figure: input cell (PAD to chip) and output cell (from chip to PAD); each has a capture multiplexer selected by shiftDR, a shift flip-flop clocked by clockDR that links last cell to next cell, an update flip-flop clocked by updateDR, and an output multiplexer selected by mode]
Boundary scan: instructions
Minimum 3 instructions
Bypass (all 1 in IEEE 1149.1): it is used to bypass any serial data
registers in a chip with a 1 bit register. This allows
specific chips to be tested in a serial-scan chain without
having to shift through the accumulated SR stages in all
the chips
Extest (all 0 in IEEE 1149.1-1990): testing of off-chip circuitry
Sample/Preload: sampling and setting chip I/Os
Coming Up...
Next time:
Top down design. Hardware description languages,
logic synthesis.
Readings …
Weste:
7.3 through 7.3.3.3 (ad-hoc & scan-based testing)
7.3.4 through 7.3.4.1 (BILBO)
7.3.5 (Iddq testing)
7.5 (boundary scan)
Exercises: VLSI-22
VLSI Design II
Small Signal FET Model
and Diode Models
Overview
small signal equivalent circuit for fet and diodes
JMM v1.3
Summary: Large Signal Model
MOS FETs have 3 regions of operation
cutoff region (subthreshold): $V_{GS} \le V_{th}$
  $I_{DS} = 0$
linear region (triode region): $V_{GS} > V_{th}$; $0 < V_{DS} < V_{DSsat}$
  $I_{DS} = \mu C_{ox} \frac{W}{L}\left[(V_{GS} - V_{th})V_{DS} - \frac{V_{DS}^2}{2}\right]$
active region, with channel length modulation
  $I_{DS(sat)} = \frac{\mu C_{ox}}{2}\frac{W}{L}(V_{GS} - V_{th})^2\left[1 + \lambda(V_{DS} - V_{eff})\right]$
  $\lambda = \frac{k_{rds}}{2L\sqrt{V_{DS} - V_{eff} + \Phi_0}}$,  $k_{rds} = \sqrt{\frac{2\varepsilon_{Si}}{qN_A}}$
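The three regions can be collected into one function. The Python sketch below is only an illustration: the parameter values (`vth`, `mu_cox`, `w_over_l`, `lam`) are invented for the example and do not belong to any process mentioned in the lecture, and $\lambda$ is treated as a constant although the slide gives it as a function of $V_{DS}$.

```python
def ids(vgs, vds, vth=0.8, mu_cox=100e-6, w_over_l=10.0, lam=0.05):
    """Drain current of an nfet from the large-signal summary (illustrative values)."""
    veff = vgs - vth
    if veff <= 0:
        return 0.0                                   # cutoff region
    if vds < veff:                                   # linear (triode) region
        return mu_cox * w_over_l * (veff * vds - vds**2 / 2)
    # active region with channel length modulation
    return 0.5 * mu_cox * w_over_l * veff**2 * (1 + lam * (vds - veff))

for vds in (0.2, 0.5, 1.0, 2.0):
    print(vds, ids(1.8, vds))
```

At $V_{DS} = V_{eff}$ the linear and active expressions give the same current, so the model is continuous at the boundary between the two regions.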
Advanced Large Signal Modeling:
Short Channel Effects
As device dimensions are scaled down, short-channel
effects degrade the operation of MOS FETs
mobility degradation: short channels and large
electric fields provoke more electron collisions.
Carrier velocity saturates, as it is no longer
proportional to the electric field:
  $\nu_d \cong \frac{\mu_n E}{1 + E/E_c}$
  $I_D = \frac{\mu_n C_{ox}}{2(1 + \theta V_{eff})}\frac{W}{L}V_{eff}^2$,  $\theta = \frac{1}{L E_c}$  (where is the square law?)
  $R_{sx} \cong \frac{1}{\mu_n C_{ox} W E_c}$
[figure: $I_d$ versus $U_{GS}$, with $U'_{GS}$ and the equivalent source resistance $R_{sx}$ marked]
Advanced Large Signal Modeling:
Leakage Currents
An important second-order device limitation is the
leakage current of the junctions (e.g. sample-and-hold
time)
the intrinsic concentration is a strong function of
temperature, so the leakage current is also strongly
dependent on temperature (approx. doubles for every 11 °C)
leakage current of a reverse-biased junction:
  $I_{lk} \cong \frac{q A_j n_i}{2\tau_0} x_d$,  $\tau_0 \cong \frac{1}{2}(\tau_n + \tau_p)$
  $x_d = \sqrt{\frac{2\varepsilon_{si}}{qN_A}(\Phi_0 + V_R)}$
Small Signal Equivalent Circuits
Why do we love them?
Find Id of a transistor in active region when the gate
is driven with a voltage source $V_{gs} = V_0 \sin(\omega t)$
=> It is handy to use simple linear equations!
Taylor: $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n$
approximation: $f(x) \approx f(x_0) + \frac{df(x_0)}{dx}(x - x_0)$
  (first term: operating point; second term: small signal)
– small signal parameters are denoted with small letters
– small signal parameters are very handy for building
simple equivalent circuits
Transconductance #1
MicroLab, VLSI-24(7/22)
JMM v1.3
Transconductance #2
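  $I_{DS(sat)} = \frac{\mu C_{ox}}{2}\frac{W}{L}(V_{GS} - V_{th})^2\left[1 + \lambda(V_{DS} - V_{eff})\right]$
  $g_m = \frac{\partial I_D}{\partial V_{GS}}$
  $g_m = \mu_n C_{ox}\frac{W}{L}V_{eff} = \frac{2I_D}{V_{eff}} = \sqrt{2\mu_n C_{ox}\frac{W}{L}I_D}$
  $g_s = \frac{\partial I_D}{\partial V_{SB}} = \frac{\partial I_D}{\partial V_{tn}} \cdot \frac{\partial V_{tn}}{\partial V_{SB}}$
  $g_{ds} = \frac{\partial I_D}{\partial V_{DS}}$
  $g_{ds} = \frac{1}{r_{ds}} = \lambda I_{Dsat} \approx \lambda I_D$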
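The three expressions for $g_m$ are the same quantity written in terms of different knowns, which a short numerical check makes visible. The Python sketch below assumes the square law with channel length modulation neglected; the values of `mu_cox`, `w_over_l` and `veff` are illustrative only.

```python
import math

def gm_forms(mu_cox, w_over_l, veff):
    """Return g_m computed from Veff, from I_D and Veff, and from I_D alone."""
    i_d = 0.5 * mu_cox * w_over_l * veff**2          # square law, lambda neglected
    gm_veff = mu_cox * w_over_l * veff
    gm_ratio = 2 * i_d / veff
    gm_sqrt = math.sqrt(2 * mu_cox * w_over_l * i_d)
    return gm_veff, gm_ratio, gm_sqrt

forms = gm_forms(mu_cox=100e-6, w_over_l=20.0, veff=0.3)
print(forms)   # three equal values in A/V
```

Which form is most convenient depends on what the designer fixes: the overdrive voltage, the bias current, or both.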
Small-Signal Modeling in the Active
Region (Low Frequency)
the low-frequency model
[figure: low-frequency small-signal model between nodes vg, vd and vs with current sources gm vgs and gs vs in parallel with rds; currents id and is]
Depending on the terminal voltages, and the relative size of
the parameters, some of the components may be ignored.
This helps to reduce the complexity of hand calculations.
[figure: simplified model with rds and source resistance rs = 1/gm]
MOSFET Capacitance Estimation
in Active Region
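The dynamic response of MOS systems strongly depends on
the parasitic capacitance associated with the MOS transistor.
  $C_{gs} = W C_{ox}\left(\frac{2}{3}L + L_{ov}\right) = \frac{2}{3}W L C_{ox} + W C_{GS0}$
  $C_{gd} = W L_{ov} C_{ox} = W C_{GD0}$
  $C'_{sb} = (A_s + A_{ch}) C_{js}$,  $C'_{db} = A_d C_{jd}$,  with $C_{jx} = \frac{C_{j0}}{(1 + V_{XB}/\Phi_0)^{M_j}}$
  $C_{sb} = C'_{sb} + C_{s-sw}$,  $C_{s-sw} = P_s C_{j-sw,s}$
  $C_{db} = C'_{db} + C_{d-sw}$,  $C_{d-sw} = P_d C_{j-sw,d}$,  with $C_{j-sw,x} = \frac{C_{j-sw0}}{(1 + V_{XB}/\Phi_0)^{M_{jsw}}}$
[figure: nfet cross-section (poly gate, Al contacts, SiO2, n+ source and drain, p+ field implant, p- substrate) showing Cgs, Cgd, C'sb, C'db, Cs-sw, Cd-sw and the overlap Lov, for $V_{GS} > V_{th}$, $V_{DG} > -V_{th}$, $V_{SB} = 0$, $V_B = 0$]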
Small-Signal Modeling
in the Active Region
the small signal model
[figure: small-signal model between nodes vg, vd and vs with gm vgs, gs vs and rds, plus the capacitances Cgs, Cgd, Csb and Cdb]
Small-Signal Modeling
in the Triode region
a simplified triode-region model for small VDS
  $C_{gs} = C_{gd} = \frac{1}{2}A_{ch}C_{ox} + W L_{ov} C_{ox} = \frac{1}{2}W L C_{ox} + W C_{GX0}$
  $C_{xb} = \left(A_x + \frac{1}{2}A_{ch}\right)C_{jx} + P_x C_{j-sw,x}$
Small-Signal Modeling
in cut-off region
a simplified cut-off region model
[figure: cut-off model with gate node vg and the junction capacitances Csb and Cdb]
Diodes
p+/nwell diode and n+/pwell diode
Note that the metal contacts to the diode are connected
to a heavily doped region
[figure: cross-section of a p+/nwell diode (Al, SiO2, p+ anode, n+ cathode contact, n well, p- substrate) and of an n+/pwell diode (n+ cathode, p+ anode contact, p well, n- substrate), each with its pn junction and diode symbol]
Diode Modeling
If a diode is reverse-biased, current flow is
extremely small and primarily due to thermally or
optically generated carriers.
  depletion capacitance: $C_j = \frac{C_{j0}}{(1 + V_R/\Phi_0)^{M_j}}$
  $\Phi_0 = V_T \ln\frac{N_A N_D}{n_i^2}$,  $C_{j0} = \sqrt{\frac{q\varepsilon_{si}}{2\Phi_0}\frac{N_D N_A}{N_A + N_D}}$
[figure: p+/n junction with depletion region and electric field]
Large-signal model for forward-biased junction
  $I_D = I_S e^{V_D/V_T}$,  $I_S \propto A_D\left(\frac{1}{N_A} + \frac{1}{N_D}\right)$
  $C_T = C_d + A C_j$,  diffusion capacitance $C_d = \tau_T\frac{I_D}{V_T}$ (dominant for large currents)
  ($C_d = 0$ for forward-biased Schottky diodes)
Small-signal model for a forward-biased diode
  $\frac{1}{r_d} = \frac{dI_D}{dV_D} = \frac{I_D}{V_T}$
[figure: $r_d$, $C_j$ and $C_d$ in parallel]
Coming Up...
Next time:
Basic current mirrors and single stage amplifiers.
MicroLab, VLSI-24(16/22)
JMM v1.3
Exercises: VLSI-24 #1
Johns&Martin 1.1 pp7: Ex1.4 (difficulty: easy):
Assuming process C05M-D. a) Calculate the total
zero-bias depletion capacitance CT-j0 of a p+nwell
diode with an area of 5µm times 5µm. Do not use
the Spice parameter CJ. b) At 3V reverse-bias the
capacitance Cj has to be calculated again.
Result: a) CT-j0=16.3fF, b) CT-j=8.98fF
Johns&Martin 1.1 pp10: Ex1.6 (difficulty: medium):
Assuming process C05M-D and Mj=0.5 (use Spice
parameter CJ). A reverse-biased p+nwell diode is
charged from 0V to 3.3V through a 10kΩ resistor.
Calculate the time to charge the diode to 2/3 of its
end value.
Result: t66%=130ps (Johns: see eq. 1.36 pp10)
Johns&Martin 1.2 pp31: 1.9 (difficulty: easy):
Assuming process C05M-D. a) Derive the low-frequency
parameters for an nfet with W=10µm and
L=0.5µm at Vgs=1.1V, Vds=Veff, Vsb=0.55V.
b) What is the new value of rds if the drain-source
voltage is increased by 0.55V?
Result: a) gm=0.98mA/V, gs=0.143mA/V,
rds=208kΩ, b) rds=12.8kΩ ????
Exercises: VLSI-24 #2
Exercises: VLSI-24 #3
Exercises: VLSI-24 #4
[figure: layout with dimensions 0.5µm, 0.6µm, 3µm and 0.6µm]
Exercises: VLSI-24 #5
[figure: chain of junctions J1..J5 and transistors Q1..Q4; dimension 27λ; node 2; gates]
Exercises: VLSI-24 #6
VLSI Design II
Basic Current Mirrors and
Single Stage Amplifiers He !
That‘s me !
JMM v1.2
Outline
Current mirrors
Single stage amplifiers with active loads
Johns&Martin
nodal analysis method
simple CMOS current mirror (chap 3.1)
common-source amplifier (chap 3.2)
source-follower or common drain amplifier (chap 3.3)
common gate amplifier (chap 3.4)
source degenerated current mirror (chap 3.5)
high-output-impedance current mirrors (chap 3.6)
cascode gain stage (chap 3.7)
Exercises
hand calculations
spice simulations
JMM v1.2
Simple CMOS Current Mirror
Used as bias current source
Used to multiply currents
Used as high output impedance
2L
Iin Iout
Id active
V1 rout
Q1 Q2
linear
Vds
JMM v1.2
Simple CMOS Current Mirror
(Q1 model)
small signal model (low frequency)
[figure: low-frequency small-signal model with gm vgs, gs vs and rds between nodes vg, vd and vs; current mirror Q1, Q2 with Iin, Iout, V1 and rout]
small signal model for Q1
[figure: Q1 with gm1 vgs1, gs1 vs1 and rds1, driven by a test source vy with current iy]
small signal model of diode connected transistor
  Q1 reduces to a resistance $1/g_{m1} = r_{s1}$
Simple CMOS Current Mirror
(small signal analysis)
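Small signal model of overall CMOS current mirror
[figure: Q1 replaced by $1/g_{m1}$; Q2 modelled by $g_{m2}v_{gs2}$ in parallel with $r_{ds2}$, driven by a test source $v_x$ with current $i_x$; since $v_{gs2} = 0$ only $r_{ds2}$ remains]
  $r_{out} = r_{ds2}$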
Common Source Amplifier
the common source topology is the most popular
gain stage, especially when high input impedance is
required
a common use of simple current mirrors is in a
single-stage amplifier with an active load
active loads represent high-impedance output loads
without using high impedance resistors or large
power supply voltages.
for a given supply voltage a larger gain can be
achieved using active loads.
for example, if a 1MΩ load were required with a
100µA bias current, a 100µA x 1MΩ = 100V
power supply would be necessary
[figure: common source amplifier stage Q1 (input Vin, output Vout, output resistance rout) with active load Q2, Q3 biased by Ibias]
Common Source Amplifier
(small signal analysis)
it is assumed, that the bias current is such that
both transistors Q2 and Q3 are in active region.
[figure: common source stage Q1 with active load Q2, Q3 and its small-signal model: source vin with Rin, gm1 vgs1, and R2 formed by rds1 in parallel with rds2 (active load)]
  $v_{gs1} = v_{in}$
  $A_v = \frac{v_{out}}{v_{in}} = -g_{m1}R_2 = -g_{m1}(r_{ds1} \parallel r_{ds2})$
Source-Follower or
Common-Drain Amplifier
common-drain amplifiers are commonly used as
voltage buffers and are thus called source followers
ideally the small signal voltage gain is close to
unity
although the circuit has no voltage gain, it does have
current gain
dc level of the output voltage is not the same as
the dc level of the input voltage
note that the body effect is the major limitation on
the small-signal gain
[figure: common-drain amplifier stage Q1 (input Vin, output Vout at the source) with active load Q2, Q3 biased by Ibias]
Source-Follower
(small signal analysis)
Note that the voltage controlled current source that
models the body effect of the nfet has been
included
[figure: source follower Q1 with active load Q2, Q3 and its small-signal model: $v_{in} = v_{g1}$, sources $g_{m1}v_{gs1}$ and $g_{s1}v_{s1}$, $r_{ds1}$, output $v_{out} = v_{s1}$ loaded by $r_{ds2}$ (active load)]
Nodal Equation Methodology
In order to minimize circuit equation errors, a
consistent methodology should be maintained when
writing nodal equations:
the first term is always the node at which the currents
are being summed
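Source-Follower
(small signal analysis, cont.)
[figure: small-signal model of the source follower: $v_{in} = v_{g1}$, sources $g_{m1}v_{gs1}$ and $g_{s1}v_{s1}$, $r_{ds1}$, output $v_{out} = v_{s1}$ loaded by $r_{ds2}$]
  $A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1} + g_{s1} + g_{ds1} + g_{ds2}}$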
Common-Gate Amplifier
Common-gate stage with active load is used when
relatively small input impedance is desired
Application examples: input impedance of 50Ω to
terminate a transmission line, or first stage of
amplifier to amplify current instead of voltage
[figure: common-gate amplifier stage Q1 (gate at Vbias, input Vin at the source with input resistance rin, output Vout at the drain) with active load Q2, Q3 biased by Ibias]
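Common-Gate Amplifier
(small signal analysis)
  $v_{s1} = -v_{gs1}$, thus the two controlled sources combine into $(g_{m1} + g_{s1})v_{s1}$
[figure: small-signal model with source $v_{in}$ and $R_S$ at node $v_{s1}$ (input resistance $r_{in}$), current source $(g_{m1} + g_{s1})v_{s1}$, $r_{ds1}$, and load $R_L = r_{ds2}$ at $v_{out}$ (only active load present)]
nodal analysis for nodes $v_{out}$ and $v_{s1}$:
  $A_v = \frac{v_{out}}{v_{in}} = \left[\frac{G_s}{G_s + \frac{g_{m1} + g_{s1} + g_{ds1}}{1 + g_{ds1}/G_L}}\right]\left[\frac{g_{m1} + g_{s1} + g_{ds1}}{G_L + g_{ds1}}\right]$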
Summary: Gain Stages
common source amplifier: gain stage with high
input impedance.
  $A_v = \frac{v_{out}}{v_{in}} = -g_{m1}(r_{ds1} \parallel r_{ds2})$
common drain amplifier (source follower): used as
voltage buffers with small signal voltage gain close
to 1, but can produce current gain.
  $A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1} + g_{s1} + g_{ds1} + g_{ds2}}$
Source-Degenerated Current Mirror
General consequence of finite output resistance:
deviation in large signal behavior
difficulties as active load
[figure: current mirror Q1, Q2 with a resistor Rs in each source (V1, rout), and its small-signal model: $1/g_{m1}$ for Q1 (gate at 0V), $g_{m2}v_{gs}$, $g_s v_s$ and $r_{ds2}$ for Q2, test source $v_x$ with current $i_x$ flowing through Rs]
impedance increase
  $r_{out} = \frac{v_x}{i_x} = r_{ds2}\left[1 + R_s(g_{m2} + g_{s2} + g_{ds2})\right]$
High-Output Impedance Current Mirrors
Cascode Current Mirror
the output impedance of a cascode current mirror is
increased by a factor 10 to 100 compared to a basic
current mirror
a disadvantage is the reduced output voltage swing
because transistors may enter triode region
Q3 Q4
Q1 Q2
JMM v1.2
Cascode Current Mirror
Q1 Q2
JMM v1.2
Cascode Current Mirror (con‘t)
Q3 Q4
Q1 Q2
impedance
impedance vgs4=-vs4
vg4 iout
JMM v1.2
High-Output-Impedance Current Mirrors
Wilson Current Mirror
performance very similar to the cascode current
mirror, but with 1/2 of its output impedance
shunt-series feedback to increase output impedance
[figure: Wilson current mirror with transistors Q1..Q4, Iin, Iout, rin and rout]
Cascode Gain Stage
cascode configuration for single stage amplifiers is
commonly used in modern IC design
quite large gain for single stage due to large impedance
at the output
to enable the large gain, high quality cascode current
mirrors at the output are necessary
large gain normally without any speed degradation
voltage across input drive fet is limited
minimizing short channel effects in modern technologies
configuration: common-source-connected transistor
feeding into a common-gate-connected transistor
JMM v1.2
Cascode Gain Stage
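telescopic cascode amplifier
[figure: telescopic cascode: input transistor Q1 (Vin), cascode transistor Q2 (gate at Vbias), current source Ibias, output Vout with load capacitance CL]
output impedance of cascode stage:
  $r_x \cong g_{m2} r_{ds1} r_{ds2}$
[figure: small-signal model with test source $v_x$ and current $i_x$; the sources $g_{m2}v_{gs2}$ and $g_{s2}v_{s2}$ of Q2 combine into $v_{s2}(g_{s2} + g_{m2})$ in parallel with $r_{ds2}$, on top of $r_{ds1}$]
  $r_{d2} \cong g_{m2} r_{ds1} r_{ds2}$
for high impedance Ibias with $R_L \cong g_{m-p} r_{ds-p}^2$:
  $r_{out} \cong \frac{g_m r_{ds}^2}{2}$,  $A_v \cong -\frac{1}{2}\left(\frac{g_m}{g_{ds}}\right)^2$  for $g_{dsn} = g_{dsp}$ and $g_{mn} = g_{mp}$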
Summary: Cascode and Source Deg.
  $A_v = \frac{v_{out}}{v_{in}} \cong -\frac{1}{2}\left(\frac{g_m}{g_{ds}}\right)^2$
Coming Up...
Next topic…
Frequency response of single stage amplifiers
Exercises:
Have a look at the exercises in Johns&Martin.
CAD exercise Ex601
JMM v1.2
Exercises VLSI-25 #1
Johns&Martin chap 3.1 pp127: 3.1 (difficulty: easy):
Consider the current mirror shown on transparency
vlsi25/3 where Iin=100µA and each transistor has
W=10µm and L=2µm. Given rds=88000 [L (µm)]/[ID (mA)],
find rout for the current mirror
and the value of gm1. Also estimate the change in Iout
for a 0.5V change in the output voltage.
Result: rout=1.76MΩ, gm1=0.45mA/V,
dIout=0.28µA
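The output resistance and the current change of exercise #1 follow directly from the given $r_{ds}$ formula and $r_{out} = r_{ds2}$. The Python sketch below only reproduces that arithmetic; the function name `r_ds` is illustrative, and $g_{m1}$ is left out because it needs the process parameter $\mu_n C_{ox}$, which the exercise text does not state.

```python
def r_ds(l_um, i_d_ma, k=88000.0):
    """Output resistance in ohms from the given rule r_ds = k * L[um] / I_D[mA]."""
    return k * l_um / i_d_ma

rout = r_ds(l_um=2.0, i_d_ma=0.1)   # simple mirror: rout = rds2, with Iout = Iin = 100 uA
d_iout = 0.5 / rout                 # current change for a 0.5 V output voltage change

print(rout / 1e6, "MOhm")
print(d_iout * 1e6, "uA")
```

Both numbers agree with the stated results (1.76MΩ and about 0.28µA).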
Exercises VLSI-25 #2
Johns&Martin chap 3.3 pp131: 3.3 (difficulty: easy):
Consider the source follower shown on transparency
vlsi25/8 where Ibias=100µA and all transistors
designed with Alcatel 0.5µm process have
W=10µm and L=2µm. Given γn=0.45V^(1/2),
Vsb=2V, and rds-n=88000 [L (µm)]/[ID (mA)].
What is the gain of the stage?
Result: Av=0.88
Exercises VLSI-25 #3
Johns&Martin chap 3.6 pp138: 3.5 (difficulty: easy):
Consider the cascode current mirror shown on
transparency vlsi25/15 where Iin=100µA and all
transistors have W=10µm and L=2µm. Given
VSB4=1V and rds-n=50000 [L (µm)]/[ID (mA)].
What is the output impedance and the minimal
output voltage?
Result: rout=527kΩ, Vout(min)=1.5V
VLSI Design II
Frequency Response of
Single Stage Amplifiers
[figure: Bode magnitude plot, gain in dB (0 to 40) versus frequency in Hz (10^3 to 10^9)]
Circuit Analysis
the precise way: solving complex equations
the approximate way: find the dominant pole
the handy way: let Spice do it precisely
JMM v1.2
Outline
Frequency response
common-source amplifier
source-follower amplifier
source-follower amplifier with compensation technique
cascode gain stage
Johns&Martin
frequency response (chap 3.11)
Gray&Meyer
estimation of dominant poles
zero-Value Time Constant Analysis (pp500 ff)
(Analysis and Design of Analog Integrated Circuits, 3rd
edition, Wiley and Sons, ISBN-0471-59984-0)
Exercises
hand calculations
spice simulations
JMM v1.2
Frequency Response
Dominant Pole Approximation
precise calculation of frequency response is a
complex task and thus different approximation
methods exist
one method is the zero-value time constant analysis
first some ideas about dominant-pole approximation
are developed
JMM v1.2
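Dominant Pole Approximation
(cont. 2)
  $b_1 = \sum_{i=1}^{n}\left(-\frac{1}{p_i}\right)$
an important practical case occurs when one pole is dominant
  $|p_1| \ll |p_2|, |p_3|, \dots$  so that  $\left|\frac{1}{p_1}\right| \gg \left|\sum_{i=2}^{n}\left(-\frac{1}{p_i}\right)\right|$
  thus $b_1 \cong \left|\frac{1}{p_1}\right|$
  $|A(j\omega)| = \frac{K}{\sqrt{\left[1 + \left(\frac{\omega}{p_1}\right)^2\right]\left[1 + \left(\frac{\omega}{p_2}\right)^2\right] \cdots \left[1 + \left(\frac{\omega}{p_n}\right)^2\right]}}$
with a dominant pole we simply get
  $|A(j\omega)| \cong \frac{K}{\sqrt{1 + \left(\frac{\omega}{p_1}\right)^2}}$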
Dominant Pole Approximation
(con’t 3)
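this approximation will be quite accurate at least up to $\omega \cong |p_1|$
[figure: s plane with axes $\sigma$ and $j\omega$; poles $p_3$, $p_2$, $p_1$ on the negative real axis, $p_1$ closest to the origin]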
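How good the dominant-pole approximation is can be checked numerically. The Python sketch below compares the full magnitude expression with the single-pole one; the gain and the pole magnitudes are made-up values chosen so that the first pole is well separated from the others.

```python
import math

def gain_exact(w, k, poles):
    """|A(jw)| for real poles with magnitudes given in `poles` (rad/s)."""
    return k / math.sqrt(math.prod(1 + (w / p) ** 2 for p in poles))

def gain_dominant(w, k, p1):
    """|A(jw)| keeping only the dominant pole p1."""
    return k / math.sqrt(1 + (w / p1) ** 2)

K = 100.0
poles = [1e6, 1e8, 1e9]          # illustrative values, p1 is dominant

def rel_error(w):
    exact = gain_exact(w, K, poles)
    return abs(gain_dominant(w, K, poles[0]) - exact) / exact

print(rel_error(1e6))            # at w = p1: very small
print(rel_error(1e8))            # at w = p2: the approximation breaks down
```

Around the -3dB frequency the error is negligible, while near the second pole the single-pole expression overestimates the gain considerably.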
Zero-Value Time Constant
  $\tau_x = R_x C_x$
Frequency Response
Zero-Value Time Constant
[figure: amplifier with source $v_{in}$, $R_{in}$ and load $R_L$, and its small-signal equivalent with $r_b$, $r_\pi$, $C_\pi$, $C_\mu$, $C_x$, $g_m v_1$ and $R_L$; port variables $(v_1, i_1)$ at $C_\pi$, $(v_2, i_2)$ at $C_\mu$ and $(v_3, i_3)$ at $C_x$]
We can show that with this choice of variables the circuit equations are of the form:
  $i_1 = (g_{11} + sC_\pi)v_1 + g_{12}v_2 + g_{13}v_3$
  $i_2 = g_{21}v_1 + (g_{22} + sC_\mu)v_2 + g_{23}v_3$
  $i_3 = g_{31}v_1 + g_{32}v_2 + (g_{33} + sC_x)v_3$
Zero-Value Time Constant
(cont. 1)
The poles of the transfer function are the zeros of the determinant $\Delta$ of the
circuit equations, which can be written in the form:
  $\Delta(s) = K_0 + K_1 s + K_2 s^2 + K_3 s^3$
  $\Delta(s) = K_0(1 + b_1 s + b_2 s^2 + b_3 s^3)$
If all capacitors are zero:
  $K_0 = \Delta|_{C_\pi = C_\mu = C_x = 0} \equiv \Delta_0$
Consider now the term $K_1 s$; this is the sum of the terms involving s that are
obtained when the system determinant is evaluated. However, it is apparent
that s only occurs when associated with a capacitance:
  $K_1 s = h_1 sC_\pi + h_2 sC_\mu + h_3 sC_x$
The terms $h_1$, $h_2$, $h_3$ are constants. $h_1$ can be evaluated by expanding the determinant
about the first row:
  $\Delta(s) = (g_{11} + sC_\pi)\Delta_{11} + g_{12}\Delta_{12} + g_{13}\Delta_{13}$
with cofactors $\Delta_{xx}$ of the determinant. The coefficient of $sC_\pi$ is found by evaluating
$\Delta_{11}$ with $C_\mu$ and $C_x$ equal to zero:
  $h_1 = \Delta_{11}|_{C_\mu = C_x = 0}$
Zero--Value Time Constant
Zero
(con’t 2)
∆ ( s ) = g 21 ∆ 21 + (g 22 + sC µ )∆ 22 + g 23 ∆ 23
With cofactors ∆xx of the determinant. The term sCµ is found by evaluating
∆22 with Cπ and Cx equal zero
h 2 = ∆ 22 C π =C x =0
similarly
h 3 = ∆ 33 C µ = Cπ = 0
K 1 = ∆ 11 Cµ =C x =0 C π + ∆ 22 C π =C x =0 C µ + ∆ 33 C µ =C π = 0 Cx
and:
K 1 ∆ 11 Cµ =C x =0 ∆ 22 C π =C x =0 ∆ 33 Cµ =C π =0
b1 = = Cπ + Cµ + Cx
K0 ∆0 ∆0 ∆0
Zero-Value Time Constant
(con’t 3)
Now consider putting i2 = i3 = 0 and solving for v1:
$$\frac{v_1}{i_1} = \frac{\Delta_{11}}{\Delta(s)}$$
The driving-point resistance at the Cπ node pair with all capacitors equal to zero is:
$$\frac{\Delta_{11}\big|_{C_\mu = C_x = 0}}{\Delta_0} = \frac{\Delta_{11}}{\Delta}\bigg|_{C_\mu = C_\pi = C_x = 0}$$
We now define
$$R_{\pi 0} = \frac{\Delta_{11}\big|_{C_\mu = C_x = 0}}{\Delta_0}$$
and Rµ0 and Rx0 in the same way, so that
$$b_1 = R_{\pi 0}C_\pi + R_{\mu 0}C_\mu + R_{x0}C_x$$
Thus:
$$\omega_{-3dB} \cong \frac{1}{b_1} = \frac{1}{\sum T_0}$$
Thus the sum of the zero-value time constants leads to the -3dB frequency.
JMM v1.2
Summary: Frequency Analysis Methods
The precise way:
Add the parasitic capacitors to the equivalent circuit. Use
nodal analysis for evaluating the transfer function.
The approximate way:
if there exists a pole |p1| << |p2|, |p3|, ..., and the transfer function is already given in the form
A(s)=N(s)/D(s)
with $D(s) = 1 + b_1 s + b_2 s^2 + \cdots + b_n s^n$, then ω-3dB ≅ |p1| ≅ 1/b1
JMM v1.2
Frequency Response
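Common-Source Amplifier
- precise calculation of the frequency response is most often left to computer simulation
- much insight can be obtained by finding the dominant frequency effects (dominant poles, zeros)
[Figure: small-signal model of the common-source amplifier: source vin, gate capacitance Cgs1 (voltage vgs1), controlled source gm1·vgs1, output resistance R2 (rds of Q1 and Q2) and output capacitance C2 (Cdb of Q1 and Q2 and load CL)]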
Frequency Analysis (con’t)
$$\frac{v_{out}}{v_{in}} = \frac{-g_{m1}R_2\left(1 - s\frac{C_{gd1}}{g_{m1}}\right)}{1 + sa + s^2 b}$$
$$a = R_{in}[C_{gs1} + C_{gd1}(1 + g_{m1}R_2)] + R_2(C_{gd1} + C_2)$$
$$b = R_{in}R_2(C_{gd1}C_{gs1} + C_{gs1}C_2 + C_{gd1}C_2)$$
At frequencies where the gain has just started to decrease:
$$\omega_{-3dB} \cong \frac{1}{a} = \frac{1}{R_{in}[C_{gs1} + C_{gd1}(1 + g_{m1}R_2)] + R_2(C_{gd1} + C_2)}$$
The term Cgd1(1 + gm1R2) is the Miller capacitance; the first term dominates for Rin >> R2.
$$\omega_{p2} \cong \frac{g_{m1}C_{gd1}}{C_{gs1}C_{gd1} + C_{gs1}C_2 + C_{gd1}C_2}$$
JMM v1.2
Frequency Response
Source-Follower Amplifier
- source followers can have complex poles and thus exhibit overshoot
- a compensation technique resulting in only real-axis poles, and therefore no overshoot, is shown
[Figure: source follower Q1 driven by Iin with Rin and Cin, biased by Ibias, with load CL at vout; Cgd1 and the drain node vd1 are marked]
Source-Follower Amplifier
(con’t 1)
[Figure: small-signal model: iin with Rin and C’in at the gate node vg1, Cgs1 (voltage vgs1) between gate and source, controlled source gm1·vgs1, and Rs1 in parallel with Cs at the source node vs1 = vout; Yg is the admittance looking into the gate]
$$C'_{in} = C_{in} + C_{gd1}$$
$$R_{s1} = r_{ds1} \parallel r_{ds2} \parallel (1/g_{s1})$$
Source-Follower Amplifier
(con’t 2)
1. The gain from vg1 to vout is found.
2. The admittance Yg looking into the gate of Q1, without considering Cgd1, is found:
$$Y_g = \frac{i_{g1}}{v_{g1}} = \frac{sC_{gs1}(sC_s + G_{s1})}{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}$$
3. The gain from iin to vg1 is found:
$$\frac{v_{g1}}{i_{in}} = \frac{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}{a + sb + s^2 c}$$
4. The overall gain from iin to vout is found and the results are interpreted:
$$A(s) = \frac{v_{out}}{i_{in}} = \frac{sC_{gs1} + g_{m1}}{a + sb + s^2 c}$$
Source-Follower Amplifier
(con’t 3)
$$A(s) = A(0)\frac{N(s)}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}$$
where ω0 is the pole frequency and Q is the Q factor.
There is no peaking, and the maximum of the transfer function is at dc, if:
$$Q < 1/\sqrt{2} \cong 0.707$$
ω0 is the -3dB frequency if Q = 1/√2.
Step input function:
- no overshoot for Q ≤ 0.5
- overshoot for Q > 0.5 (complex conjugate poles):
$$\%\,\text{overshoot} = 100\,e^{-\pi/\sqrt{4Q^2 - 1}}$$
For the source follower:
$$\omega_z = -\frac{g_{m1}}{C_{gs1}}$$
$$\omega_0 = \sqrt{\frac{G_{in}(g_{m1} + G_{s1})}{C_{gs1}C_s + C'_{in}(C_{gs1} + C_s)}}$$
$$Q = \frac{\sqrt{G_{in}(g_{m1} + G_{s1})[C_{gs1}C_s + C'_{in}(C_{gs1} + C_s)]}}{G_{in}C_s + C'_{in}(g_{m1} + G_{s1}) + C_{gs1}G_{s1}}$$
Source follower circuits can exhibit large amounts of overshoot under certain conditions. In practical microelectronic circuits the parasitic capacitances and the output capacitance result in only moderate overshoot for worst-case conditions.
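The relation between the Q factor and the step-response overshoot can be tabulated directly from the formula above. This is a minimal sketch; the function name `percent_overshoot` is chosen here for illustration.

```python
import math

def percent_overshoot(q):
    """Step-response overshoot in percent of a second-order system.

    For Q <= 0.5 the poles are real and there is no overshoot.
    """
    if q <= 0.5:
        return 0.0
    return 100.0 * math.exp(-math.pi / math.sqrt(4.0 * q * q - 1.0))

for q in (0.5, 1.0 / math.sqrt(2.0), 1.0, 2.0):
    print(f"Q = {q:.3f}  overshoot = {percent_overshoot(q):.1f} %")
```

A maximally flat magnitude response (Q = 1/√2) therefore still overshoots by a few percent in the time domain.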
Source-Follower Amplifier
Compensation Technique
- source followers can have complex poles and thus exhibit overshoot
- overshoot may be reduced by:
  - increasing Cin or Cs or both
  - adding a compensation network
[Figure: source follower Q1 driven by Iin with Rin and Cin, biased by Ibias, load CL at vout, with the compensation network C1, R1]
$$C_1 = \frac{C_{gs1}(C_s g_{m1} - C_{gs1}G_{s1})}{(g_{m1} + G_{s1})(C_{gs1} + C_s)} \cong \frac{g_{m1}C_{gs1}C_s}{(g_{m1} + G_{s1})(C_{gs1} + C_s)}$$
$$R_1 = \frac{(C_{gs1} + C_s)^2}{C_{gs1}(C_s g_{m1} - C_{gs1}G_{s1})} \cong \frac{(C_{gs1} + C_s)^2}{C_{gs1}C_s g_{m1}}$$
$$C_2 = \frac{C_{gs1}C_s}{C_{gs1} + C_s}$$
(see Johns/Martin pp160-162)
Frequency Response
Common-Gate Amplifier
The frequency response of the common-gate stage is usually superior to that of the common-source stage due to the low impedance, rin, at the source node, assuming GL=(sCL+gds2) is not considerably smaller than gds1.
[Figure: common-gate stage Q1 with gate bias Vbias, current source Ibias and load capacitor CL at vout]
JMM v1.2
Frequency Response
High-Output Impedance Mirrors
Both the Wilson and the cascode current mirrors introduce high-frequency poles into the signal transfer function.
The approximate time constant of these poles is Cgs/gm; the proof of this statement can be found by doing a high-frequency, small-signal analysis.
[Figure: Wilson and cascode current mirrors, each built from Q1, Q2, Q3 and Q4]
JMM v1.2
Frequency Response
Cascode Gain Stage
The exact high-frequency analysis of a cascode gain stage is usually left to simulation on a computer.
At high frequencies, the time constant due to the output node almost always dominates since the impedance is so large at that node:
Cout=(Cgd2+Cdb2)+CL+Cbias
CL is normally the major contributor
[Figure: cascode gain stage with input fet Q1 (Vin), cascode fet Q2 (Vbias), current source Ibias and load CL at Vout]
$$\omega_{-3dB} \cong \frac{1}{R_{out}C_L} \cong \frac{2g_{ds}^2}{g_m C_L}$$
Cascode Gain Stage
(con’t 1)
Zero-value time constant analysis method used
$$C_{d2} = C_{gd2} + C_{db2} + C_L + C_{bias}$$
$$C_{s2} = C_{db1} + C_{sb2} + C_{gs2}$$
[Figure: cascode gain stage (Q1, Q2, Ibias, Vbias, CL) and its small-signal model with gm1·vg1, Rd1, gm2·vs2 and test source vx, ix]
Cascode Gain Stage
(con’t 2)
The admittance looking into the source of a cascode transistor is Ys2:
$$G_{d1} = g_{ds1} + Y_{s2}$$
$$\tau_{Cgd1} = C_{gd1}R_{d1}(1 + R_{in}[G_{d1} + g_{m1}])$$
Cascode Gain Stage
(con’t 4)
Node vs2: the resistance seen by the capacitor Cs2 is rds1 in parallel with the impedance seen looking into the source of Q2, which is approximately rds, thus:
$$\tau_{Cs2} \cong C_{s2}\frac{r_{ds}}{2}$$
Node vout: the resistance seen by Cd2 is the output impedance of the cascode amplifier, thus:
$$\tau_{Cd2} \cong C_{d2}\frac{g_m r_{ds}^2}{2}$$
$$\tau_{total} \cong C_{gs1}R_{in} + C_{gd1}\frac{r_{ds}}{2}(1 + R_{in}[G_{d1} + g_{m1}]) + C_{s2}\frac{r_{ds}}{2} + C_{d2}\frac{g_m r_{ds}^2}{2}$$
Cascode Gain Stage
Comments
High-frequency considerations
[Figure: cascode gain stage with Q1 (Vin), Q2 (Vbias) and CL at Vout]
$$A(s) = \frac{A_v}{1 + s/\omega_{-3dB}}$$
At frequencies substantially larger than ω-3dB:
$$A(s) \cong \frac{A_v}{s/\omega_{-3dB}} \cong -\frac{g_{m1}}{sC_L}$$
JMM v1.2
Coming Up...
Next topic…
Basic OpAmp design and compensation
Readings
for next time…
Johns&Martin: Sections 3.11
Exercises:
Have a look at the exercises in Johns&Martin.
JMM v1.2
Exercises VLSI-26 #1
Johns&Martin chap 3.11 pp156: 3.8 (difficulty: easy):
Consider the common-source amplifier shown on transparency vlsi-26/6 where Iin=100µA and all transistors have W=100µm and L=1.6µm. Given Rin=180kΩ, CL=0.3pF, Cgs1=0.2pF, Cgd1=15fF, Cdb1=20fF, Cdb2=36fF, µnCox=90µA/V2, µpCox=30µA/V2, rds-n=8000 [L (µm)]/[ID (mA)] and rds-p=12000 [L (µm)]/[ID (mA)].
Estimate the -3dB frequency.
Result: f-3dB = 554kHz
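The result of this exercise can be reproduced from the common-source formula ω-3dB ≅ 1/a. This is a minimal sketch under stated assumptions: the 100µA is taken as the drain current of both fets, and C2 is taken as CL + Cdb1 + Cdb2 (the slide describes C2 as "Cdb of Q1 and Q2 and load CL"). The variable names are illustrative.

```python
import math

# Values from the exercise text
I_D = 100e-6            # drain current in A (assumed equal to the given 100 uA)
W_OVER_L = 100.0 / 1.6
UN_COX = 90e-6          # A/V^2
R_IN = 180e3            # ohm
C_L, C_GS1, C_GD1 = 0.3e-12, 0.2e-12, 15e-15
C_DB1, C_DB2 = 20e-15, 36e-15

# Small-signal parameters
gm1 = math.sqrt(2.0 * UN_COX * W_OVER_L * I_D)
rds_n = 8000.0 * 1.6 / 0.1      # 8000 * L(um) / ID(mA), in ohm
rds_p = 12000.0 * 1.6 / 0.1     # 12000 * L(um) / ID(mA), in ohm
r2 = 1.0 / (1.0 / rds_n + 1.0 / rds_p)
c2 = C_L + C_DB1 + C_DB2        # assumption: all output-node capacitance

# a = Rin[Cgs1 + Cgd1(1 + gm1 R2)] + R2(Cgd1 + C2)
a = R_IN * (C_GS1 + C_GD1 * (1.0 + gm1 * r2)) + r2 * (C_GD1 + c2)
f_3db = 1.0 / (2.0 * math.pi * a)

print(f"gm1 = {gm1 * 1e3:.3f} mA/V, R2 = {r2 / 1e3:.1f} kOhm, f-3dB = {f_3db / 1e3:.0f} kHz")
```

Most of `a` comes from the Miller term Rin·Cgd1·(1 + gm1·R2), which is why the large source resistance limits the bandwidth here.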
JMM v1.2
Exercises VLSI-26 #2
Johns&Martin chap 3.11 pp166: 3.11 (difficulty: easy):
Assume that for the input transistors and the cascode transistors, gm=1mA/V, rds=100kΩ, Rin=180kΩ, CL=5pF, Cgs=0.2pF, Cgd=15fF, Csb=40fF, Cdb=20fF, Cbias=20fF. Estimate the -3dB frequency of the cascode amplifier (transparency 19).
Result: ω-3dB = 2π 6.3kHz
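The zero-value time constants of the cascode stage can be summed numerically. This is a minimal sketch, assuming Ys2 ≅ gds (so Rd1 ≅ rds/2), as in the node-vs2 discussion of the cascode slides; the variable names are illustrative. With these values the output-node time constant dominates and the estimate comes out near 2π·6.25kHz.

```python
import math

# Values from the exercise text
GM, RDS, R_IN = 1e-3, 100e3, 180e3
C_L, C_GS, C_GD = 5e-12, 0.2e-12, 15e-15
C_SB, C_DB, C_BIAS = 40e-15, 20e-15, 20e-15

g_ds = 1.0 / RDS
g_d1 = g_ds + g_ds              # Gd1 = gds1 + Ys2, assuming Ys2 ~ gds
r_d1 = 1.0 / g_d1               # = rds / 2

c_d2 = C_GD + C_DB + C_L + C_BIAS   # capacitance at the output node
c_s2 = C_DB + C_SB + C_GS           # capacitance at the source of Q2

tau_cgs1 = C_GS * R_IN
tau_cgd1 = C_GD * r_d1 * (1.0 + R_IN * (g_d1 + GM))
tau_cs2 = c_s2 * RDS / 2.0
tau_cd2 = c_d2 * GM * RDS ** 2 / 2.0

tau_total = tau_cgs1 + tau_cgd1 + tau_cs2 + tau_cd2
f_3db = 1.0 / (2.0 * math.pi * tau_total)

print(f"tau_total = {tau_total * 1e6:.2f} us, f-3dB = {f_3db / 1e3:.2f} kHz")
```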
JMM v1.2
Analog Microelectronics
Basic OpAmp Design
and Compensation
Today’s handouts:
(1) Lecture Slides
MicroLab, vlsi27 (1/34)
JMM v1.0
Outline
u Johns&Martin
u MOS differential pair and gain stage (chap 3.8)
u two-stage CMOS OpAmp (chap 5.1)
u gain
u frequency response
u systematic offset voltage
u n- or p-channel input stage
u Exercises (5.3-5.5)
u hand calculations
u spice simulations
JMM v1.0
MOS Differential Pair
and Gain Stage
u most integrated amplifiers have differential input,
realized with a differential transistor pair
ID2 ID2
V+ V-
Q1 Q2
Ibias
id1=is1 id2=is2
v+ v-
is1 is2
rs1 rs2
gate current is
zero in T model
JMM v1.0
MOS Differential Pair
(con’t 1)
To simplify the analysis, the output impedance of the transistors is ignored.
Definition:
$$v_{in} \equiv v^+ - v^-$$
$$i_{d1} = i_{s1} = \frac{v_{in}}{r_{s1} + r_{s2}} = \frac{v_{in}}{1/g_{m1} + 1/g_{m2}}$$
With gm1 = gm2:
$$i_{d1} = \frac{g_{m1}}{2}v_{in} \qquad i_{d2} = -\frac{g_{m1}}{2}v_{in}$$
Definition:
$$i_{out} \equiv i_{d1} - i_{d2}$$
thus:
$$i_{out} = g_{m1}v_{in}$$
JMM v1.0
MOS Differential Pair
(con’t 2)
If a differential pair has a current mirror as an active load, a complete
differential-input, single-ended-output gain stage can be realized.
JMM v1.0
MOS Differential Pair
(con’t 3)
The output resistance rout is determined by using the small-signal equivalent circuit and applying a test voltage vx at the output node. Note that the T-model is used for Q1, Q2 and Q3, and the hybrid-π model is used for Q4.
$$r_{out} \equiv \frac{v_x}{i_x}$$
[Figure: differential pair Q1, Q2 with current-mirror load Q3, Q4 and bias current Ibias, and its small-signal equivalent circuit with test source vx, ix]
$$r_{out} = r_{ds2} \parallel r_{ds4}$$
$$A_v = g_{m1}(r_{ds2} \parallel r_{ds4})$$
JMM v1.0
MOS Differential Pair
(con’t 4)
The large-signal transfer characteristic is determined by using the large-signal transistor model in the active region of the fets.
[Figure: differential pair Q1, Q2 with current-mirror load Q3, Q4, bias current Ibias, input VIN and output current IOUT]
$$I_D = \frac{\mu_0 C_{ox}}{2}\frac{W}{L}(V_{GS} - V_{tn})^2 = \frac{\beta}{2}(V_{GS} - V_{tn})^2$$
$$I_{OUT} = I_{D1} - I_{D2} = I_{bias}\sqrt{\frac{\beta V_{IN}^2}{I_{bias}} - \frac{\beta^2 V_{IN}^4}{4I_{bias}^2}}$$
[Figure: plot of IOUT/Ibias (vertical axis, -1.5 to 1.5) versus VIN·sqrt(β/Ibias) (horizontal axis, -3 to 3)]
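The large-signal characteristic can be evaluated numerically. This is a minimal sketch, assuming the square-law model above; the sign handling for negative inputs and the clipping at ±Ibias once one fet is fully off (|VIN|·sqrt(β/Ibias) ≥ √2) are added here for completeness. The function name and the numeric values are illustrative.

```python
import math

def diff_pair_iout(v_in, beta, i_bias):
    """Differential output current ID1 - ID2 of a square-law MOS pair."""
    x = v_in * math.sqrt(beta / i_bias)      # normalized input voltage
    if abs(x) >= math.sqrt(2.0):
        # One fet is off: the whole bias current flows in the other one.
        return math.copysign(i_bias, v_in)
    # Same as Ibias*sqrt(x^2 - x^4/4), with the sign of the input kept.
    return i_bias * x * math.sqrt(1.0 - x * x / 4.0)

# Illustrative values: beta = 92 uA/V^2 * (100/1.6), Ibias = 200 uA
BETA = 92e-6 * 100.0 / 1.6
I_BIAS = 200e-6

# For small inputs the slope equals gm1 = sqrt(beta * Ibias).
gm_small_signal = diff_pair_iout(1e-6, BETA, I_BIAS) / 1e-6
print(f"small-signal gm = {gm_small_signal * 1e3:.4f} mA/V")
print(f"Iout at 0.5 V   = {diff_pair_iout(0.5, BETA, I_BIAS) * 1e6:.1f} uA")
```

The small-signal slope agrees with iout = gm1·vin, since gm1 = sqrt(2β·ID1) with ID1 = Ibias/2.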
JMM v1.0
Two-Stage CMOS OpAmp
- Basic OpAmp design is discussed:
  - OpAmp gain
  - frequency response
  - slew rate
  - systematic offset voltage
  - n-channel or p-channel input stage
The capacitor CC ensures stability when the OpAmp is used in feedback. CC is often called the Miller capacitance to illustrate its effect on the input.
[Figure: block diagram Vin → differential stage A1 → gain stage -A2 (bridged by CC) → unity-gain buffer (1) → Vout]
JMM v1.0
CMOS realization of a
two-stage OpAmp
p-well process
necessary
VDD
25 25 Q 11 Q5300 Q6 300
Q 10 500
Q8
300 300
25 25
Q1 Q2
Vin- Vin+ Vout
Q 14 Q 12
Q 16 CC
100 25
Q 15 Q 13 150 150 300 500
Q4
Q3
Q7 Q9
Rb VSS
JMM v1.0
Two-Stage OpAmp
Gain
- overall gain for low-frequency applications is the most critical parameter of an OpAmp
Gain of the first stage (differential stage):
$$A_{v1} = g_{m1}(r_{ds2} \parallel r_{ds4})$$
$$g_{m1} = \sqrt{2\mu_p C_{ox}\left(\frac{W}{L}\right)_1 I_{D1}} = \sqrt{2\mu_p C_{ox}\left(\frac{W}{L}\right)_1\frac{I_{bias}}{2}}$$
Approximation to the finite output resistance, ignoring short-channel effects, where α is a technology-dependent parameter of about 5e6 V1/2/m:
$$r_{dsi} \cong \alpha\frac{L_i}{I_{Di}}\sqrt{V_{DGi} + V_{ti}}$$
Gain of the third stage (common-drain stage):
$$A_{v3} = \frac{g_{m8}}{G_L + g_{m8} + g_{ds8} + g_{ds9}}$$
Gain of the third stage with body effect (bulk not connected to source):
$$A_{v3} = \frac{g_{m8}}{G_L + g_{m8} + g_{s8} + g_{ds8} + g_{ds9}}$$
with body-effect constant γ=0.5V1/2 and 2φF=0.7V:
$$g_s = \frac{g_m\gamma}{2\sqrt{V_{SB} + 2\phi_F}}$$
Two-Stage OpAmp
Frequency Response
- frequency response in the range where the capacitor CC causes the magnitude of the gain to decrease, but still well below the unity-gain frequency (open-loop gain = 1) ⇒ midband frequency
- only the compensation capacitor CC is considered
- assume Q16 is not present (resistor for lead compensation, effect only near the unity-gain frequency)
- discuss simplified circuit:
[Figure: simplified circuit: differential stage Q1 to Q5 (Vbias, vin+, vin-) delivering i = gm1·vin into node v1, followed by the gain stage -A2 bridged by CC (node v2) and the buffer A3 driving vout]
Midband gain:
$$A_v(s) \cong \frac{g_{m1}}{sC_C}$$
Unity-gain frequency:
$$\omega_{ta} \cong \frac{g_{m1}}{C_C}$$
JMM v1.0
Two-Stage OpAmp
Slew Rate
- the slew rate SR is the maximum rate at which the output changes when the input signals are large
- at slew-rate limitation all the current of Q5 goes either into Q1 or into Q2 ⇒ this current has to go through CC
$$SR \equiv \frac{dv_{out}}{dt}\bigg|_{max}$$
$$SR = \frac{2I_{D1}}{C_C} = V_{eff1}\,\omega_{ta}$$
[Figure: simplified circuit as for the frequency response, with the current I flowing through CC]
JMM v1.0
Two-Stage OpAmp
Systematic Offset Voltage Cancelation
- two-stage OpAmps may have a systematic input offset voltage if not properly designed
- the differential input is zero: vin+ = vin-
- ID6 = ID7, which requires a well-defined VGS7 value
[Figure: input stage Q1, Q2 with current source Q5 (Vbias) and second-stage current source Q6 at Vout]
$$\frac{(W/L)_7}{(W/L)_4} = 2\frac{(W/L)_6}{(W/L)_5}$$
Two-Stage OpAmp
n- or p- channel input stage
- comparison between n- and p-channel input stage OpAmps
  - overall dc gain is largely unaffected since both designs have one stage with n-channel and one stage with one or more p-channel driving fets.
  - for a given power dissipation, and therefore bias current, having a p-channel input-pair stage maximizes the slew rate.
  - having a p-channel input first stage implies that the second stage has an n-channel input drive fet. This arrangement maximizes the transconductance of the drive fet of the 2nd stage, which is critical when high-frequency operation is important.
  - output stage: an n-channel source follower is preferable because it has less of a voltage drop (if a separate p-well is used). Its higher transconductance reduces the effect of the load capacitance on the second pole. There is also less degradation of the gain when small load resistances are being driven.
⇒ p-channel input fets for the first stage are almost always the best choice
JMM v1.0
Feedback and OpAmp Compensation
JMM v1.0
First Order Model of
Closed-Loop Amplifier
- First-order model of the transfer function of a dominant-pole compensated OpAmp (real-axis dominant pole ωp1):
$$A(s) = \frac{A_0}{1 + s/\omega_{p1}}$$
- Definition of the unity-gain frequency:
$$|A(j\omega_{ta})| \equiv 1 \cong \frac{A_0}{\omega_{ta}/\omega_{p1}}$$
- Unity-gain frequency of the first-order OpAmp model:
$$\omega_{ta} \cong A_0\,\omega_{p1}$$
JMM v1.0
Linear Settling Time
- the settling time performance is an important design parameter of OpAmps
  - the charge transfer in SC circuits is closely related to the OpAmp step response
  - settling time is defined as the time it takes for an OpAmp to reach a specified percentage of its final value when a step input is applied
  - the linear settling time portion is due to the finite unity-gain frequency (independent of the output step size)
  - the nonlinear settling time portion is due to the slew-rate limit (dependent on the output step size)
⇒ unity-gain frequency estimation for the linear settling time portion
Step response for a closed-loop OpAmp:
$$v_{out}(t) = V_{step}(1 - e^{-t/\tau})$$
JMM v1.0
OpAmp Compensation
(second order model)
- for compensating OpAmps the first-order model is insufficient, because it ignores poles and zeros at high frequencies which may cause instability.
- a more accurate open-loop transfer model adds one additional pole (real-axis poles and zeros), where ωp1 is the first dominant pole and ωeq represents the higher-frequency poles:
$$A(s) = \frac{A_0}{(1 + s/\omega_{p1})(1 + s/\omega_{eq})}$$
- ωeq may be approximated with a set of real-axis poles and zeros:
$$\frac{1}{\omega_{eq}} \cong \sum_{i=2}^{m}\frac{1}{\omega_{pi}} - \sum_{i=1}^{n}\frac{1}{\omega_{zi}}$$
- the phase margin PM is an often-used measure of how far an OpAmp with feedback is from becoming unstable (ωt is the unity-gain frequency of the loop gain LG):
$$\angle LG(j\omega) = -90^\circ - \tan^{-1}(\omega/\omega_{eq})$$
$$PM \equiv \angle LG(j\omega_t) - (-180^\circ) = 90^\circ - \tan^{-1}(\omega_t/\omega_{eq})$$
$$\omega_t = \tan(90^\circ - PM)\,\omega_{eq} \quad \text{(independent of } \beta\text{)}$$
JMM v1.0
OpAmp Compensation
(second order model con’t)
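- Closed-loop gain if β is frequency independent (if ωt is far away from the high-frequency poles and zeros):
$$A_{CL}(s) = \frac{A_{CL0}}{1 + \frac{s(1/\omega_{p1} + 1/\omega_{eq})}{1 + \beta A_0} + \frac{s^2}{(1 + \beta A_0)\,\omega_{p1}\omega_{eq}}}$$
$$A_{CL0} = \frac{A_0}{1 + \beta A_0} \cong \frac{1}{\beta}$$
- General equation for a second-order transfer function:
$$H_2(s) = \frac{K}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}$$
- Comparing:
$$\omega_0 = \sqrt{(1 + \beta A_0)(\omega_{p1}\omega_{eq})} \cong \sqrt{\beta\,\omega_{ta}\,\omega_{eq}}$$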
JMM v1.0
OpAmp Compensation
(2nd order transfer function)
u Relationship between Q factor and phase margin
u transfer function: Q=sqrt(1/2):
u no peaking
u widest passband
u ω0 = ω -3dB
JMM v1.0
Compensating the Two-Stage OpAmp
Vbias
Q 16 CC
JMM v1.0
Compensating the Two-Stage OpAmp
small-signal model
u simplified small-signal model of two-stage OpAmp
for compensation analysis
v1 CC RC vout2
gm1vin1 gm7v1
R1 C1 R2 C2
JMM v1.0
Compensating the Two-Stage OpAmp
(discussion)
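$$D(s) = \left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)$$
[Figure: s-plane sketch of ωp2, ωp1 and ωz and their dependence on CC, gm7 and RC]
$$\omega_{p1} \cong \frac{1}{g_{m7}R_1R_2C_C} \qquad \omega_z = -\frac{g_{m7}}{C_C}$$
$$\omega_{p2} \cong \frac{g_{m7}}{C_1 + C_2}$$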
JMM v1.0
Compensating the Two-Stage OpAmp
(lead compensation)
- with a non-zero RC, a third pole is introduced, but it is at high frequency and has almost no effect
- however, the zero opens a number of possibilities:
$$\omega_z = \frac{-1}{C_C(1/g_{m7} - R_C)}$$
  - one could eliminate the right-half-plane zero:
$$R_C = 1/g_{m7}$$
  - one could choose RC to be even larger and thus move the right-half-plane zero into the left half plane to cancel the nondominant pole ωp2:
$$R_C = \frac{1}{g_{m7}}\left(1 + \frac{C_1 + C_2}{C_C}\right)$$
  - one could choose RC even larger to move the now left-half-plane zero to a frequency slightly greater than the unity-gain frequency that would result without the resistor, say 20% larger (recommended), ωz = 1.2 ωt:
$$R_C \cong \frac{1}{1.2\,g_{m1}}$$
Lead Compensation
Design Procedure
1. Start by choosing, somewhat arbitrarily, C'C ≅ 5pF.
2. Using Spice, find the frequency at which a -125° phase shift exists. Let the gain at this frequency be denoted A’ and the frequency ωt.
3. Choose a new CC so that ωt becomes the unity-gain frequency of the loop gain, thus resulting in a 55° phase margin. This can be achieved by taking CC according to the equation (iterations possible):
$$C_C = C'_C\,A'$$
4. Choose RC according to:
$$R_C = \frac{1}{1.2\,\omega_t C_C}$$
5. The resulting phase margin is approximately 85° (leaving 5° for process variations). It may be necessary to iterate on RC to optimize the phase margin.
6. If after step 4 the phase margin is not adequate, then increase CC while leaving RC constant.
7. Replace RC by a fet with the following size:
$$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\left(\frac{W}{L}\right)_{16}V_{eff16}}$$
Compensation Independent of
Process and Temperature
- Making lead compensation process and temperature insensitive
- the ratios of all transconductances remain relatively constant over process and temperature variations as all fets depend on the same biasing network:
$$\omega_{p2} \cong \frac{g_{m7}}{C_1 + C_2} \qquad \omega_{ta} \cong \frac{g_{m1}}{C_C}$$
- when a resistor is used to realize lead compensation, RC can also be made to track the inverse of the transconductance (1/gm7), and thus the lead compensation will be mostly independent of process and temperature variations:
$$\omega_z = \frac{-1}{C_C(1/g_{m7} - R_C)}$$
JMM v1.0
Compensation Independent of
Process and Temperature (con’t 2)
Making RC proportional to 1/gm7
$$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\left(\frac{W}{L}\right)_{16}V_{eff16}}$$
$$g_{m7} = \mu_n C_{ox}(W/L)_7 V_{eff7}$$
$$R_C\,g_{m7} = \frac{(W/L)_7 V_{eff7}}{(W/L)_{16}V_{eff16}}$$
Therefore, all that remains is to ensure that Veff16/Veff7 is independent of process and temperature variations. The ratio can be made constant by deriving Vgs16 from the same biasing circuit used to derive Vgs7.
JMM v1.0
Compensation Independent of
Process and Temperature (con’t 3)
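[Figure: bias branch Q11, Q12, Q13 (node Va) next to the output stage Q6, Q7 with Q16 and CC (node Vb, Vbias)]
If Veff13 = Veff7, then Va = Vb, and then (gates connected) Veff16 = Veff12, thus:
$$\frac{V_{eff7}}{V_{eff16}} = \frac{V_{eff13}}{V_{eff12}}$$
The condition Veff7 = Veff13 requires:
$$\sqrt{\frac{2I_{D7}}{\mu_n C_{ox}(W/L)_7}} = \sqrt{\frac{2I_{D13}}{\mu_n C_{ox}(W/L)_{13}}} \quad\Rightarrow\quad \frac{I_{D7}}{I_{D13}} = \frac{(W/L)_7}{(W/L)_{13}}$$
However, the current ratio is set by Q6 and Q11:
$$\frac{I_{D7}}{I_{D13}} = \frac{(W/L)_6}{(W/L)_{11}}$$
Condition to be satisfied:
$$\frac{(W/L)_6}{(W/L)_7} = \frac{(W/L)_{11}}{(W/L)_{13}}$$
As ID12 and ID13 are equal:
$$R_C\,g_{m7} = \frac{(W/L)_7}{(W/L)_{16}}\sqrt{\frac{(W/L)_{12}}{(W/L)_{13}}}$$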
JMM v1.0
Biasing an OpAmp
to Have Stable Transconductances
- fet transconductances are probably the most important parameters in OpAmps to be stabilized
- the following approach matches the transconductances to the conductance of a resistor
- as a result, the fet transconductances are independent of power-supply voltage as well as process and temperature variations
[Figure: bias circuit with Q10 to Q15 and resistor Rb]
Assuming (W/L)10 = (W/L)11:
$$g_{m13} = \frac{2\left[1 - \sqrt{\frac{(W/L)_{13}}{(W/L)_{15}}}\right]}{R_b}$$
For (W/L)15 = 4(W/L)13:
$$g_{m13} = \frac{1}{R_b}$$
$$g_{mi} = \sqrt{\frac{\mu_i (W/L)_i I_{Di}}{\mu_n (W/L)_{13} I_{D13}}} \times g_{m13}$$
Exercises VLSI-27
Ex ana3.9 (difficulty: easy): Consider a differential
pair amplifier shown on transparency vlsi-27/3
where Ibias=200µA and all transistors have
W=100µm and L=1.6µm. Given
µnCox=92µA/V2 and rds-n=8000 [L (µm)]/[ID
(mA)]. Find the output impedance and the gain.
Result: Av =68.6V/V, rout=64kΩ (see
Johns/Martin pp146)
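The stated result can be reproduced with the formulas of the differential-pair slides. This is a minimal sketch, assuming each fet carries Ibias/2 and that the p-channel load has the same rds as the n-channel fet (only rds-n is given in the exercise); the variable names are illustrative.

```python
import math

# Values from the exercise text
I_BIAS = 200e-6
W_OVER_L = 100.0 / 1.6
UN_COX = 92e-6

i_d = I_BIAS / 2.0                                   # current per branch
gm1 = math.sqrt(2.0 * UN_COX * W_OVER_L * i_d)
rds = 8000.0 * 1.6 / (i_d * 1e3)                     # 8000 * L(um) / ID(mA), ohm

# Assumption: rds2 = rds4 = rds
r_out = 1.0 / (1.0 / rds + 1.0 / rds)                # rds2 || rds4
a_v = gm1 * r_out

print(f"gm1 = {gm1 * 1e3:.3f} mA/V, rout = {r_out / 1e3:.0f} kOhm, Av = {a_v:.1f} V/V")
```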
JMM v1.0
Exercises VLSI-27 (con’t 2)
Ex ana5.2 (difficulty: easy): Find the unity gain
frequency of the OpAmp shown on transparency vlsi-
27/9, with CC=5pF . Assume ID5=100µA, first
stage VDG=0.5V, 2nd and 3rd stage VDG=1V and
bulk of Q8 connected to VSS. Given µnCox
=3µpCox=96µA/V2, VDD=-VSS=2.5V,
RL=10kΩ, γ=0.5V1/2, φF=0.35V,
α=5e6V1/2/m, Vtn=-Vtp=0.8V.
Result: fta = 24.7MHz (see Johns/Martin pp227)
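The unity-gain frequency follows from ωta ≅ gm1/CC. This is a minimal sketch, assuming a p-channel input pair with W = 300µm (as annotated in the schematic) and L = 1.6µm (the length used in the other exercises), each fet carrying ID5/2; the slew rate from the slew-rate slide is evaluated as a cross-check.

```python
import math

# Values from the exercise text
I_D5 = 100e-6
C_C = 5e-12
UP_COX = 96e-6 / 3.0            # from unCox = 3 upCox = 96 uA/V^2

# Assumptions: input fet size taken from the schematic annotation
W_OVER_L_1 = 300.0 / 1.6

i_d1 = I_D5 / 2.0
gm1 = math.sqrt(2.0 * UP_COX * W_OVER_L_1 * i_d1)
w_ta = gm1 / C_C
f_ta = w_ta / (2.0 * math.pi)

# Slew rate, computed both ways: 2*ID1/CC and Veff1*w_ta
v_eff1 = math.sqrt(2.0 * i_d1 / (UP_COX * W_OVER_L_1))
sr_current = 2.0 * i_d1 / C_C
sr_veff = v_eff1 * w_ta

print(f"gm1 = {gm1 * 1e3:.3f} mA/V, fta = {f_ta / 1e6:.1f} MHz, SR = {sr_current / 1e6:.0f} V/us")
```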
JMM v1.0
Exercises VLSI-27 (con’t 3)
Ex ana5.4 (difficulty: easy): Consider the OpAmp shown on transparency vlsi-27/9, where Q3 and Q4 are each changed to widths of 120µm and we want the output stage to have a bias current of 150µA. Find the new sizes of Q6 and Q7 such that there is no systematic offset voltage.
Result: W6 = 450µm, W7 = 360µm (see Johns/Martin pp231)
JMM v1.0
Exercises VLSI-27 (con’t 4)
Ex ana5.7 (difficulty: medium): An OpAmp has an open-loop transfer function given by:
$$A(s) = \frac{A_0(1 + s/\omega_z)}{(1 + s/\omega_{p1})(1 + s/\omega_2)}$$
Assume that ω2=2π 50MHz and A0=10^4
a) Assuming ωz=inf, find ωp1 and the unity-gain frequency ωt‘ so that the OpAmp has a unity-gain phase margin of 55°
b) Assuming ωz=1.2 ωt‘ (use ωt‘ from a), what is the unity-gain frequency ωt? Also find the new phase margin.
Result: a) ωt‘=2π 35MHz, ωp1=2π 4.27kHz, b) ωt=2π 46.6MHz, phase shift -85°, i.e. PM=95° (see Johns/Martin pp245)
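Both parts of this exercise can be checked numerically from the given transfer function. This is a minimal sketch; frequencies are handled in Hz, the function names are illustrative, and the zero is taken in the left half plane as written in A(s).

```python
import math

A0 = 1.0e4
F2 = 50.0e6          # second pole in Hz

# Part a: PM = 90deg - atan(ft/f2) = 55deg, then |A(j ft)| = 1 fixes fp1
ft_a = F2 * math.tan(math.radians(90.0 - 55.0))
g = A0 / math.sqrt(1.0 + (ft_a / F2) ** 2)
fp1 = ft_a / math.sqrt(g * g - 1.0)

# Part b: left-half-plane zero at 1.2 * ft_a
fz = 1.2 * ft_a

def magnitude(f):
    """|A(j 2 pi f)| including the zero."""
    num = A0 * math.sqrt(1.0 + (f / fz) ** 2)
    den = math.sqrt(1.0 + (f / fp1) ** 2) * math.sqrt(1.0 + (f / F2) ** 2)
    return num / den

lo, hi = 1.0e6, 1.0e9          # |A| > 1 at lo, |A| < 1 at hi
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if magnitude(mid) > 1.0:
        lo = mid
    else:
        hi = mid
ft_b = 0.5 * (lo + hi)

phase_deg = math.degrees(
    math.atan(ft_b / fz) - math.atan(ft_b / fp1) - math.atan(ft_b / F2))
pm_deg = 180.0 + phase_deg

print(f"a) ft' = {ft_a / 1e6:.1f} MHz, fp1 = {fp1 / 1e3:.2f} kHz")
print(f"b) ft = {ft_b / 1e6:.1f} MHz, phase = {phase_deg:.1f} deg, PM = {pm_deg:.1f} deg")
```

The zero adds phase lead near the unity-gain frequency, so the phase shift at ωt is about -85° and the margin to -180° is about 95°.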
JMM v1.0
Coming Up...
u Next topic…
Advanced Current Mirrors and OpAmps
u Readings
for next time…
Johns&Martin: Sections 3.8 and 5
u Exercises:
Have a look at the exercises in Johns&Martin.
JMM v1.0
Analog Microelectronics
Advanced Current Mirrors and OpAmp Design
Today’s handouts:
(1) Lecture Slides
MicroLab, vlsi28 (1/12)
JMM v1.0
Outline
u Johns&Martin
u advanced current mirrors (chap 6.1)
u wide-swing current mirrors
u wide-swing constant-transconductance bias circuit
u enhanced output-impedance current mirrors (not yet)
u wide-swing current mirror with enhanced output
impedance (not yet)
u folded-cascode OpAmp (chap 6.2)
u small signal analysis
u slew rate
JMM v1.0
Advanced current mirrors
wide-swing current mirrors
- The classical two-stage OpAmp was discussed in vlsi27.
- Recently a number of alternative OpAmp designs have been gaining in popularity. They make use of more advanced current mirrors.
JMM v1.0
Wide-swing current mirrors
JMM v1.0
Wide-swing constant-
transconductance bias circuit
Q 18
10/1
10/1
10/1.6 10/1.6
Q 15
10/1.6
Q4 Q 13
Q1 Q 16
JMM v1.0
Enhanced output-impedance
current mirror
- Another variation of the cascode current mirror is the enhanced output-impedance current mirror, shown as a simplified version
- basic idea: use of a feedback amplifier to keep the drain-source voltage across Q2 stable, irrespective of the output voltage
⇒ the additional amplifier increases the output impedance (see classical cascode current mirror, vlsi-25 slides 16, 17)
$$R_{out} \cong g_{m1}r_{ds1}r_{ds2}(1 + A)$$
[Figure: enhanced output-impedance current mirror: input current Iin into Q3, mirror fet Q2, cascode fet Q1 driven by the amplifier A (inputs Vbias and the drain of Q2), output current Iout with output resistance Rout]
JMM v1.0
Folded-cascode OpAmp
JMM v1.0
Folded-cascode OpAmp con’t
[Figure: folded-cascode OpAmp schematic (Q1 to Q13, Ibias1, Ibias2, VB1, VB2, Vin, Vout, CL) with the annotations: differential-input single-ended output; current mirror; folded cascode fets (see vlsi-25 slide 19); wide-swing cascode current mirror; CL acts as compensation]
The bias circuit may be replaced by a wide-swing constant-transconductance bias network, and thus VB1, VB2 would be Vcasc-n, Vcasc-p.
Purpose of Q12, Q13:
- increase slew-rate performance
- improve recovery from slew-rate limiting
Design hints:
- Ibias1 and Ibias2 should be derived from a single bias network
- any current mirrors should be designed by parallel combination of unit-size fets
JMM v1.0
Folded-cascode OpAmp
small-signal analysis
Assumption: gm5 and gm6 are much larger than gds3 and gds4
- the differential output current from the drains of the differential pair Q1 and Q2 is applied to the load capacitance
- the small-signal current from Q1 passes directly from source to drain of Q6 and thus to CL (indirectly for Q2 via Q5 and the current mirror to CL)
$$A_v = \frac{V_{out}(s)}{V_{in}(s)} = g_{m1}Z_L(s) \quad \text{(for } g_{m1} = g_{m2}\text{)}$$
$$A_v(s) = \frac{g_{m1}r_{out}}{1 + s\,r_{out}C_L} \qquad r_{out} \cong \frac{g_m r_{ds}^2}{2}$$
(see vlsi-25 slide 20)
JMM v1.0
Folded-cascode OpAmp
slew-rate
- The diode-connected fets Q12 and Q13 are turned off during normal operation and have almost no effect
- slew-rate limiting behavior:
  - assume there is a large differential input voltage that causes Q1 to be turned on hard and Q2 to be turned off
  - since Q2 is off, all of the bias current of Q4 will be directed through the cascode fet Q5, through the n-channel current mirror, and out of the load capacitance
  - the output voltage will decrease linearly with a slew rate given by:
$$SR \cong \frac{I_{D4}}{C_L}$$
  - Q1 and the current source Ibias will go into the triode region, moving the drain voltage of Q1 towards the negative power supply
  - Q12 and Q13 clamp the drain voltages so they don’t change as much during slew-rate limitation
  - in addition Q12 and Q13 increase the bias currents for Q3 and Q4 and thus for CL
JMM v1.0
Exercises VLSI-28
Ex ana6.2 (difficulty: medium): find reasonable fet sizes for the folded-cascode OpAmp. Assume a pos/neg 2.5V power supply, a maximum power dissipation of 2mW, a current ratio of 4:1 between input and cascode fets, a bias current of Q11 that is 1/30 of that of Q3 (thus ignoring it for power dissipation), a maximum fet width of 300um, L=1.6um and Veff=0.25V for all except the input fets, W1=W2=300um, rounding widths to 10um, CL=10pF, unCox=3upCox=96uA/V2
a) find all fet sizes and the unity-gain frequency,
b) slew rate with and without clamp fets
c) reasonable lead compensation RC
Result: a) Q1 to Q4=300um, Q5, Q6=60um, Q7 to Q10=20um, Q11 to Q12=10um, ωt=2π 38MHz
b) SR=32V/us,
c) RC=347Ω (see Johns/Martin pp271-273)
JMM v1.0
Coming Up...
u Next topic…
Comparators
u Readings
for next time…
Johns&Martin: Sections 6.1 and 6.2
u Exercises:
Have a look at the exercises in Johns&Martin.
JMM v1.0
VLSI Systems Design
FSM-D Architecture Model
[Figure: FSM-D block diagram. The data-path (RTL logic) receives data inputs (sensors) and drives data outputs (actuators); the control path (finite state machine) exchanges control signals with the data-path and with the outside]
Goal: You are able to use logic gates and flip-flops wisely
and not only in an ad-hoc manner. You master the finite
state machine data path model.
JMM v1.4
Architecture Philosophy
- FSM characteristics
  - manager
  - controlling, taking decisions, initiating sub-tasks
- Data-path characteristics
  - worker, specialist
  - executing, calculating, storing & moving data
JMM v1.4
FSM Structures
- Mealy machine
  - outputs depend on the state and on the inputs
[Figure: i[k] → transition logic → s[k+1] → state register → s[k] → output logic → o[k]]
- Moore machine
  - outputs depend on states only (functionally restricted)
  - outputs are hazard-free
[Figure: i[k] → transition logic → s[k+1] → state register → s[k] → output logic → o[k]]
JMM v1.4
Data-Path Elements
bus[31:0] bus[31:0]
32 32 32 mux
bus[31:16] 32 32
16 32
32
2
a 1
32 cout enable
ADD
result register
32 32
b cin 32
32 1
JMM v1.4
Data-Path Memory Element
clock
enable
di data
do data
JMM v1.4
Design Steps
- design steps:
  - step 1: definition of the algorithm
  - step 2: FSM-D interface definition
  - step 3: data-path design
  - step 4: data-path interface definition
  - step 5: FSM interface definition
  - step 6: FSM state definition
  - step 7: FSM design
  - step 8: VHDL coding
  - step 9: test-bench design and simulation
JMM v1.4
Design Step 1:
Algorithm Definition
- goal of the Black Jack game:
  - get as close as possible to 21 points
  - the game is lost if 21 points are exceeded
- game restrictions:
  - the cards have the following values: 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11 (Ace), as well as jack, queen and king, all three representing 10 points
- game rules:
  - ask for as many cards as needed
  - the Ace can be treated as 11 points or as 1 point
- our player's behavior:
  - ask for cards as long as the summed-up points are below 16
  - always treat an Ace as 11 points at first
  - when 21 points are exceeded, treat a possible Ace as 1 point to get a second chance
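Before designing hardware, the player's behavior can be captured in a small software reference model, which is also handy later for checking test-bench responses. This is a minimal sketch in Python, not the VHDL design of these slides; the function name `play`, the status strings and the handling of several Aces are assumptions made here for illustration.

```python
def play(cards):
    """Play the given card values (2..11, 11 = Ace) with the slide's strategy.

    Returns (score, status) where status is "hold", "lost" or "waiting"
    (more cards are needed but the list is exhausted).
    """
    score = 0
    aces_as_11 = 0                      # Aces currently counted as 11
    for value in cards:
        if score >= 16:                 # enough points: stop asking for cards
            break
        score += value
        if value == 11:
            aces_as_11 += 1
        # Second chance: re-value an Ace from 11 to 1 when over 21.
        while score > 21 and aces_as_11 > 0:
            score -= 10
            aces_as_11 -= 1
        if score > 21:
            return score, "lost"
    status = "hold" if score >= 16 else "waiting"
    return score, status

print(play([10, 6]))          # holds at 16
print(play([10, 5, 10]))      # exceeds 21 without an Ace
print(play([11, 4, 10, 6]))   # Ace re-valued as 1, then reaches 21
```

The subtraction of 10 corresponds to the "subtracting 10" step of the data-path, and the comparisons against 11, 16 and 21 correspond to the cmp11, cmp16 and cmp21 signals.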
JMM v1.4
Design Step 2
FSM-D Interface Definition
- defining the interface of the overall FSM-D architecture model
- defining the edge sensitivity of the clock and the active level of the control signals
[Figure: FSM-D block "BlackJack Player" with the ports cardReady, cardValue(3:0), clk and start on one side and newCard, score(4:0), lost and finished on the other]
JMM v1.4
Design Step 3:
Data-Path Definition
- the data-path has to be able to execute all functional operations of the algorithm
- clearly separate control-path and data-path tasks, as in the manager/worker analogy
- use memory elements, buses and multiplexers for storing and moving data
- use combinational logic for functional operations like adding, comparing, etc.
JMM v1.4
Design Step 3:
Data-Path Definition: loading & comparing
[Figure: register regLoad (di = cardValue(3:0), enable = enaLoad, clk, rst) whose output do feeds the comparator "A=B?" against the constant 11, producing cmp11]
Design Step 3:
Data-Path Definition: accumulating
[Figure: as before, with an adder ADD (inputs a, b) whose result is stored in register regAdd (enable = enaAdd, clk, rst)]
Design Step 3:
Data-Path Definition: comparing sum
[Figure: data-path extended by the comparison of the sum]
Design Step 3:
Data-Path Definition: subtracting 10
[Figure: data-path extended by the subtraction of 10]
JMM v1.4
Design Step 4
Data-Path Interface Definition
- defining the interface of the data-path block
- defining the edge sensitivity of the clock and the active level of the control signals
[Figure: block "DataPath" with the ports cardValue(3:0), score(4:0), clk, rst, sel, enaLoad, enaAdd, enaScore, cmp11, cmp16 and cmp21]
JMM v1.4
Design Step 5
FSM Interface Definition
- defining the inputs and outputs of the FSM block
Design Step 5:
Interface Definition
Completed FSM-D Hierarchy
[Figure: block "BlackJack Player" (FSM-D) containing the DataPath block (cardValue(3:0), score(4:0), rst, clk) and the FSM block (lost, finished, rst, start), connected by the signals enaScore, enaLoad, enaAdd, sel, cmp11, cmp16 and cmp21]
JMM v1.4
Design Step 6
FSM State Definition
- draw a skeleton state with placeholders for the state name and the output signals.
[Figure: skeleton state bubble with a field for the state name and a field for the output signals enaScore, newCard, enaLoad, finished, enaAdd, lost and sel]
JMM v1.4
Design Step 7
FSM Design – FSMD Timing
- single clock cycle scheme
- Moore-type FSM
- FSM-D timing diagram
  - registered values are available in the next state or when leaving the next state
  - combinational values are available in the current state or when leaving the current state
[Figure: timing diagram with the signals clock, enable (FSM), inform (D) and select (FSM)]
JMM v1.4
Design Step 7
FSM Design
- design the Moore-type state diagram
  - conditions on arrows are FSM inputs
  - output values are defined in states
  - use a lightning-bolt arrow for the asynchronous reset
[Figure: Moore state diagram with reset arrow and states including AddCard and Handshake; transitions are labelled with cardReady, cmp11, cmp16 and cmp21; each state lists its values for the output signals enaScore, newCard, enaLoad, enaAdd, broke, hold and sel]
JMM v1.4
Design Step 8:
Coding – Data-Path
- all registers with associated logic are placed in one process (same clock and asynchronous reset)
- loosely coupled combinatorial logic can be coded with conditional signal assignments

-- process: registers with asynchronous, active-low reset
process(clk, rst)
begin
  if (rst = '0') then
    regLoad  <= "00000";
    regAdd   <= "00000";
    regScore <= "00000";
  elsif (clk'event and clk = '0') then
    if (enaAdd = '1') then
      regAdd <= regAdd + regLoad;
    end if;
    ...
  end if;
end process;

-- continuous conditional assignments: comparators
cmp11 <= '1' when (regLoad = "01011") else '0';
cmp16 <= '1' when (regAdd > "10000") else '0';
cmp21 <= '1' when (regAdd > "10101") else '0';
Design Step 8:
Coding – FSM
- one clocked process is used for the state transition
- one combinatorial process is used for the state-dependent output assignment
[Figure: i[k] → transition logic → s[k+1] → state register → s[k] → output logic → o[k]]

-- state transition (clocked process)
process(clk, rst)
begin
  if (rst = '0') then
    state <= StartState;
  elsif (clk'event and clk = '0') then
    case state is
      when StartState =>
        state <= CallCard;
      when CallCard =>
        if (cardReady = '1') then
          state <= LoadCard;
        end if;
      when others =>
        state <= IllegalState;
        -- used for VHDL analysis
        -- "null" for synthesis
    end case;
  end if;
end process;

-- state-dependent output assignment (combinatorial process)
process(state)
begin
  case state is
    when StartState =>
      outvec <= "000--00";
    when CallCard =>
      outvec <= "001--00";
    when others =>
      outvec <= "UUUUUUU";
      -- used for VHDL analysis
      -- "null" for synthesis
  end case;
end process;

finished <= outvec(6);
lost     <= outvec(5);
newCard  <= outvec(4);
...
JMM v1.4
Design Step 9:
Test-Bench Design
- compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is nice measurement equipment
  - there are skilled and hard-working people
  - there are no signals coming from or going to the outside of the lab
[Figure: test bench containing control and stimulus generation and response generation and verification]
JMM v1.4
Design Step 9:
Test-Bench Design – Test Cycle
- cycle-based test
  - apply input patterns at the beginning of the test cycle
  - capture the response after the rising or falling clock edge
[Figure: timing diagram of one test cycle with clock and (synchronous) inputs, marking "apply stimuli" and "capture response"]
JMM v1.4
Design Step 9:
Test-Bench Design – Simulation
- cycle-based test
  - apply input patterns at the beginning of the test cycle
  - observe the response after the rising or falling clock edge
  - visualize data-path registers and FSM state
JMM v1.4
Errors and Pitfalls
s[k] FSM
JMM v1.4
Summary and Conclusion
JMM v1.4
MicroLab, VLSI-30 (28/27)
JMM v1.4