EE6306 Slides (W9-13)

Uploaded by

邱梁栋
Copyright
© All Rights Reserved
EE6306 Digital Integrated Circuit Design

• Topics:
  - Sub-System Design in Digital Circuits
  - Design Methodology

Lecturer: Ivan C C Jong
Office: S1-B1c-76
Tel: 6790 4788
Email: [email protected]

Sub-System Design in Digital Circuits

• Topics:
  - Datapaths in Digital Systems
  - Designing Arithmetic Building Blocks
    > Shifters
    > Adders
    > Multipliers
  - Power Considerations in Datapath Structures

Books

• Text Book:
  - Digital Integrated Circuits, Jan M Rabaey, Prentice Hall
• References:
  - Introduction to VLSI Systems: A Logic, Circuit, and System Perspective, Ming-Bo Lin, CRC Press
  - Modern VLSI Design: System-on-Chip Design, Wayne Wolf, Prentice Hall

Generic Digital System

• A generic digital system consists of:
  - Datapath
  - Control Unit
  - Memory
  - Input/Output Modules

[Figure: block diagram of a generic digital system, including the datapath, control unit and input]

Datapath

• The datapath is the core of the system
• All computations are performed in the datapath
• A typical datapath consists of an interconnection of basic combinational functions such as logic gates (NAND, XOR, etc.) and arithmetic modules (adder, multiplier, etc.)
• Results from the datapath are stored in memory
• The control unit determines the sequence of executions in the datapath
• Intermediate results are stored in registers

Control Unit

• A control unit determines what actions happen in the datapath at any given point in time
• It can be viewed as a Finite State Machine (FSM)
• It consists of registers (flip-flops) and logic, and hence it is a sequential circuit

Memory

• The memory serves as the centralized data storage area
• Memory in digital systems can be registers, ROM, RAM, etc.
• Registers store individual data items, which can all be accessed at the same time
• ROM and RAM store a collection of data, of which only one or two items can be accessed at a time

Input/Output

• Input and output circuitry connects the internal circuit to the outside world
• Input circuitry may include a protection circuit
• The two diodes clamp the input voltage to between $V_{DD} + V_D$ and $-V_D$, where $V_D$ is the forward voltage drop of the diode, usually 0.4 to 0.7 V

[Figure: input protection circuit with two clamping diodes at the pad]

Interconnect

• The interconnect network joins the modules to one another
• Interconnect can follow different styles, such as buses or mesh (point-to-point) interconnect
• Buses allow many modules to be connected together, but only one module can write to the bus at a time
• Tri-state buffers are required at the outputs of all the modules connected to a bus

Tri-state Buffer

• Inverting tri-state buffer
• En = 1 enables the buffer (Out = NOT In)

Arithmetic Modules

• The datapath in digital systems consists mainly of arithmetic modules
• Common arithmetic modules include adders (subtractors), multipliers and shifters

Shifters

• Combinational shifters are very useful for arithmetic operations, as shifting is equivalent to multiplication (left shift) or division (right shift) by powers of two
• A shift register is simple, but it shifts only 1 bit per clock cycle and is not suitable for shifting several bits in one cycle
• A simple shift with a fixed direction and number of bits can be implemented easily by hardware rewiring
• Shifter types: hard-wired shifter, programmable shifter, barrel shifter, logarithmic shifter

Logical Shift vs Arithmetic Shift

• Logical shift: (a) logical left shift, (b) logical right shift
• Arithmetic shift: (c) arithmetic left shift, (d) arithmetic right shift

[Figure: bit movement for the four shift types]

Hard-Wired Shifter

• For shifts with a fixed number of bits
• The shifter is inflexible and suitable only for connecting the output of one module to the input of another module
• E.g.
$a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0 \gg 3 = 0\,0\,0\,a_7 a_6 a_5 a_4 a_3$

Programmable Shifter

• A 1-bit left-right shifter can shift the data by 1 bit to the left or to the right, or leave it unchanged (control signals: Right, Nop, Left)
• Multi-bit shifts can be done by cascading a number of 1-bit shifters, but the result could be large and/or slow

Barrel Shifter

• Built from an array of transistors
• Number of rows = word length
• Number of columns = maximum shift width
• Advantage: the signal passes through only 1 transmission gate
• So the delay is constant for any shift
• A decoder is usually needed for the control signals (n signals)

[Figure: barrel shifter schematic (data wires and control wires) and barrel shifter layout]

Logarithmic Shifter

• The total shift value is decomposed into shifts over powers of 2, i.e. 1, 2, 4, ...
• A shifter with a maximum shift width of M consists of $\log_2 M$ stages
• E.g. shift by 3: $Sh_1$ and $Sh_2$ in shift mode, $Sh_4$ in pass mode

[Figure: logarithmic shifter with maximum shift = 7 (to the right), 4 LSBs shown; data wires and control wires]

Advantages of Logarithmic Shifter

• Advantages are:
  - Control signals are in encoded form, hence fewer control signals
  - No decoder is needed
  - More effective than the barrel shifter for large shift values
  - Smaller in area and faster than the barrel shifter

Addition

• Addition is the most commonly used arithmetic operation
• It is often the speed-limiting element too
• Optimization of the adder is very important
• Optimization can be at the logic or the circuit level
• Logic-level optimization tries to rearrange the Boolean equations so that a faster and/or smaller circuit is achieved
• Circuit-level optimization manipulates transistor sizes and circuit topology to optimize the speed

Full Adder

• A Full Adder (FA) has 2 adder inputs (a, b), 1 carry input ($c_i$), a sum output (s) and a carry output ($c_o$)
• Boolean equations are:
  - $s = a \oplus b \oplus c_i$
  - $c_o = ab + bc_i + ac_i$

[Figure: ripple-carry adder with initial carry-in and sum outputs]

Delay in Ripple-Carry Adder

• For an n-bit ripple-carry adder, the worst-case delay occurs when a carry generated at the LSB propagates all the way to the
MSB
• Delay: $t_{adder} = (n - 1) \times t_{carry} + t_{sum}$
  - $t_{carry}$ = propagation delay from $c_i$ to $c_o$
  - $t_{sum}$ = propagation delay from $c_i$ to $s$
• For 8-bit addition, one of the worst-case delays occurs when a = 01111111 and b = 00000001:

    01111111
  + 00000001
  = 10000000

Inverting Property of Adder Cell

• From the delay equation, the critical path is on the carry chain
• It is essential to minimize the carry delay of each full adder cell
• Interestingly, the adder cell has an inverting property:
  - $\bar{s}(a, b, c_i) = s(\bar{a}, \bar{b}, \bar{c}_i)$
  - $\bar{c}_o(a, b, c_i) = c_o(\bar{a}, \bar{b}, \bar{c}_i)$

[Table: full-adder truth table illustrating the inverting property]

Complementary Static CMOS Full Adder Cell

• $c_o = ab + bc_i + ac_i$
• $s = a \oplus b \oplus c_i = abc_i + \bar{c}_o(a + b + c_i)$
• 28 transistors

Complementary Static CMOS Full Adder Cell - Drawbacks

• The complementary static CMOS full adder cell has 28 transistors
• It is large and slow
• PMOS transistor stacks appear in both the carry and the sum parts
• The intrinsic capacitance of the carry node is large: 6 gate + 2 diffusion + interconnect capacitances
• The carry propagates through 2 inverting stages
• The sum output requires extra logic (although it is not on the critical path)

Complementary Static CMOS Full Adder Cell - Improvement

• Remove an inverter from the carry chain
• E.g. 4-bit adder

[Figure: 4-bit adder with the inverter removed from the carry chain]

Transmission-Gate-Based Adder

• $P = A \oplus B$
• $S = P \oplus C_i = P\bar{C}_i + \bar{P}C_i$
• $C_o = AB + PC_i = \bar{P}A + PC_i$

[Figure: transmission-gate-based full adder circuit]

Manchester Carry Gates

• The carry gate can be further simplified
• E.g. Manchester carry gate:
  - $D = \bar{A} \cdot \bar{B}$ (carry delete)
  - $P = A \oplus B$ (carry propagate)
  - $G = A \cdot B$ (carry generate)
• Uses pass transistors for the carry chain
• All pass transistors are precharged to $V_{DD}$ ($\phi = 0$)

[Figure: static and dynamic Manchester carry gates]

4-bit Manchester Carry Chain

• $P_i = A_i \oplus B_i$
• $G_i = A_i B_i$
[Figure: 4-bit Manchester carry chain circuit]

Carry-Bypass Adder

• Also called carry-skip
• If BP = 1, $C_{o,3} = C_{i,0}$

[Figure: carry-bypass structure; panel (b): adding a bypass]

Manchester Carry-Bypass Adder

• The carry is either bypassed or generated in the chain
• In either case, the delay is smaller
• $BP = P_0 P_1 P_2 P_3$

Delay in Carry-Bypass Adder

• An N-bit adder is divided into N/M stages
• $t_p = t_{setup} + M t_{carry} + (N/M - 1) t_{bypass} + (M - 1) t_{carry} + t_{sum}$
  - $t_{setup}$ = fixed delay to obtain the P and G signals
  - $t_{carry}$ = carry delay through 1 bit; the worst case is M bits in 1 stage
  - $t_{bypass}$ = delay through the MUX of 1 stage
  - $t_{sum}$ = delay of the sum in the final stage

Carry-Select Adder

• The results for both possible carry-in values (0 and 1) are generated in advance
• Once the carry from the previous stage is known, the carry of this stage is selected with a MUX

[Figure: carry-select stage with multiplexer and carry vector]

Linear Carry-Select Adder

• An N-bit adder built from M-bit carry-select stages
• $t_{add} = t_{setup} + M t_{carry} + (N/M) t_{mux} + t_{sum}$

[Figure: linear carry-select adder built from 4-bit stages, each with setup, 0-carry, 1-carry, multiplexer and sum generation blocks]

N-bit Square-Root Carry-Select Adder

• The initial stage has fewer bits
• Each subsequent stage is 1 bit wider (number of stages $P = \sqrt{2N}$)
• $t_{add} = t_{setup} + M t_{carry} + \sqrt{2N}\, t_{mux} + t_{sum}$

Carry-Look-Ahead Adder (CLA)

• The delay in the adder is dominated by the carry
• To speed up the addition, reduce the carry delay
• $c_{i+1} = g_i + p_i c_i$
• $c_1 = g_0 + p_0 c_0$
• $c_2 = g_1 + p_1 c_1 = g_1 + p_1(g_0 + p_0 c_0) = g_1 + p_1 g_0 + p_1 p_0 c_0$
• $c_3 = g_2 + p_2 c_2 = g_2 + p_2(g_1 + p_1 g_0 + p_1 p_0 c_0) = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
• $c_{k+1} = g_k + p_k(g_{k-1} + p_{k-1}(\ldots + p_1(g_0 + p_0 c_0)))$

Carry-Look-Ahead Generator

• $c_1 = g_0 + p_0 c_0$
• $c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$
• $c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
• $c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$

Carry-Look-Ahead Conceptual Diagram

• The delay due to carry rippling is eliminated at the cost of extra logic

[Figure: conceptual diagram of the carry-look-ahead adder]

4-bit Look-ahead Carry Generator

• $C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$

Dynamic Implementation of P and G

• Precharge when Clk = 0
• A keeper compensates for the charge lost through the pull-down leakage paths

[Figure: dynamic gates with keepers generating $P_i = a_i + b_i$ and $G_i = a_i b_i$]

Binary Multiplication

• Consider two unsigned binary numbers, an M-bit X and an N-bit Y:
  $X = \sum_{i=0}^{M-1} x_i 2^i$, $Y = \sum_{j=0}^{N-1} y_j 2^j$
• $Z = X \times Y = \sum_{k=0}^{M+N-1} z_k 2^k = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} x_i y_j 2^{i+j}$
• E.g. 42 x 11 = 462

Binary Multiplication Example

        101010     Multiplicand
  x       1011     Multiplier
  ------------
        101010
       101010      Partial
      000000       products
  +  101010
  ------------
     111001110     Result

Dot Diagram

• 4-bit multiplication

[Figure: dot diagram showing the multiplicand Y, the multiplier X, the partial products and the product]

Partial Products

• Partial products (pp) can be generated by logical AND

Bit-Serial Multiplier

• Q = M x A

[Figure: bit-serial multiplier with multiplicand, m-bit adder, multiplier and partial product]

Array Multiplier

• Partial products are added to produce the result
• It requires N-1 M-bit adders
• E.g.
4 bits: (first two rows may be combined)

[Figure: 4 x 4 array multiplier]

Delay in Array Multiplier

• Critical paths have the same length
• The delay is long:
• $t_{mult} = [(M - 1) + (N - 2)] t_{carry} + (N - 1) t_{sum} + t_{and}$

Carry-Save Multiplier

• The carry is fed to the FA of the next level
• Delay reduced:
• $t_{mult} = t_{and} + (N - 1) t_{carry} + t_{merge}$

[Figure: carry-save multiplier with the critical path marked]

Carry-Save Multiplier Floorplan

• Rectangular floorplan of the carry-save multiplier

[Figure: floorplan including the vector-merging cells]

Wallace and Dadda CSA Trees

• 4 x 4 bit multiplier dot diagram
• Wallace tree:
  - Stage 1: 3 FAs + 1 HA
  - Stage 2: 2 FAs + 2 HAs
  - 4-bit CPA

8 x 8 Wallace and Dadda CSA Trees

[Figure: 8 x 8 Wallace tree and Dadda tree dot diagrams]

• How many HAs and FAs are used in each case?

Wallace vs. Dadda CSA Trees

• Comparison of the Wallace tree and the Dadda tree:

                         | Wallace Tree        | Dadda Tree
  Strategy for adding PP | As soon as possible | As late as possible
  CSA tree               | More complex        | Simpler
  CPA                    | Shorter             | Longer

Signed Multiplication

• Multiplication of 2's complement numbers
• The MSB has negative weight; for a single bit a, $-a = \bar{a} - 1$

[Figures: 4 x 4 signed multiplication worked over three slides, showing the multiplicand Y, the multiplier X, the partial products and the product]

Power Considerations in Datapath

• In CMOS designs, power consumption is a function of the power supply ($V_{DD}$), operating frequency (f), total load capacitance ($C_L$) and switching activity ($\alpha$)
• $P = f \times C_L \times V_{DD}^2 \times \sum \alpha$
• The equation shows that reducing $V_{DD}$ leads to quadratic power savings
• However, reducing $V_{DD}$ also increases delay and so reduces performance

Design Considerations for Low Power

• Design techniques can be used to compensate for the loss of speed
• Parallel functional blocks operating at a lower frequency and a lower $V_{DD}$ can process data in parallel to compensate for the loss of performance
• A pipelined structure of functional blocks can also save power, as a lower $V_{DD}$ suffices to charge the smaller capacitance of each stage at the same speed

Other Low Power Techniques

• Other techniques include:
  - Multiple supply voltages: modules operating at different speeds can use different power supplies. Slow modules can operate at a lower $V_{DD}$ while fast modules operate at a higher $V_{DD}$
  - Dynamic Voltage (V) and Frequency (F) Scaling (DVFS): multiple F/V pairs are allowed
  - Power-down mode: put the modules that are not needed for processing into standby mode

Clock Gating

• Clock gating is used for power reduction

[Figure: clock-gated module with isolation logic; (a) bad circuit]
[Figure, continued: (b) good circuit, (c) waveforms]

Power Gating

• Power gating is effective for power reduction

[Figure: power gating of a logic block with a sleep control; (a) fine grain, (b) coarse grain; labels: Sleep = 0 to sleep, Sleep = 1 to sleep]

Design Methodology

• Topics:
  - Design Complexity
  - Design Flow
  - Analysis and Verification
  - Implementation Approaches
  - Design Synthesis
  - Testing

Design Complexity

[Figure: Moore's Law plot of transistors per chip for Intel processors]

Intel Microprocessors

Year | µP          | #Transistors | Speed   | Technology
1971 | 4004        | 2,300        | 108 kHz | 10 µm
1974 | 8080        | 4,500        | 2 MHz   | 6 µm
1978 | 8086        | 29,000       | 5 MHz   | 3 µm
1982 | 80286       | 134,000      | 6 MHz   | 1.5 µm
1985 | 80386       | 275,000      | 16 MHz  | 1.5 µm
1989 | 80486       | 1,200,000    | 25 MHz  | 1 µm
1993 | Pentium     | 3,100,000    | 66 MHz  | 0.8 µm
1995 | Pentium Pro | 5,500,000    | 200 MHz | 0.6 µm
1997 | Pentium II  | 7,500,000    | 300 MHz | 0.25 µm
1999 | Pentium III | 9,500,000    | 500 MHz | 0.18 µm
2000 | Pentium IV  | 42,000,000   | 1.5 GHz | 0.18 µm
2002 | Pentium M   | 55,000,000   | 1.7 GHz | 90 nm
2005 | Pentium D   | 291,000,000  | 3.2 GHz | 65 nm
2007 | Quad-Core   | 820,000,000  | 3.0 GHz | 45 nm

Moore's Law

• Gordon Moore:
  - Fairchild Corporation
  - Cofounder of Intel
• Prediction in 1965: "The number of transistors incorporated in a chip will approximately double every 24 months."

Design Abstraction Levels

[Figure: design abstraction levels. Labels: Systems; Algorithms; Register Transfers; Transfer Functions; ALUs / MUXs / Registers; Gates / Flip-Flops / Cells; Transistors / Contacts / Wires; Physical Domain]

Objects In Each Domain

[Table: objects in each design domain]
Design Hierarchy

[Figure: design hierarchy down to the gate level]

Design Flow

[Figure: design flow chart]

Design Entry

• Text-based Hardware Description Languages
  - VHDL
  - Verilog
• Schematics
• State Diagram
• Flow Chart
(Schematics, state diagrams and flow charts are the visual graph entry methods.)

Synthesis, Partitioning & Simulation

• Logic Synthesis
  - Use synthesis tools to translate HDL into a netlist of logic gates and their connections
• System Partitioning
  - Divide a large system into smaller blocks
  - Use several ICs if necessary
• Prelayout Simulation
  - Verify the functionality of the system

Floorplanning, Placement & Routing

• Floorplanning
  - Arrange the blocks of the design to optimize the size and the interconnections
• Placement
  - Decide the locations of modules and logic cells
• Routing
  - Make the connections between cells and modules

Extraction & Postlayout Simulation

• Extraction
  - Convert the layout back to circuits
  - Determine the resistance and capacitance of the interconnect
• Postlayout Simulation
  - Verify the functionality again, with the added loads of the interconnect
  - Verify the timing

EDA Tools

• 4 main categories:
  - Design Entry
  - Analysis and Verification
  - Synthesis and Implementation
  - Testing

Design Entry Tool

• Schematic Editor
  - Cell library: contains the components to be used
  - Editing functions: Place, Move, Delete, Connect, Rotate/Flip, Copy/Paste
  - Hierarchical design: modules contain lower-level schematics
  - Netlist description language:
    > EDIF: Electronic Design Interchange Format

Hierarchical Schematic

• (a) HADD schematic
• (b) Schematic symbol
• (c) Higher-level schematic that makes use of HADD
• (d) Hierarchy of HADD

Analysis and Verification

• Circuit Simulation (Transistor-Level Simulation)
• Timing Simulation
  - Switch-Level Simulation
  - Gate-Level Simulation
• Static Timing Analysis
• Functional Simulation
  - Behavioural Simulation

Circuit Simulation

• Circuits are represented at the transistor level
• Transistor models are required to describe their
nonlinear voltage and current characteristics
• Simulation accuracy depends on the quality and complexity of the models
• The resulting voltage and current signals are represented as continuous waveforms
• Accurate, but complex and time consuming
• Impractical for large circuits

Timing Simulation

• Uses simpler transistor models
• Complexity is reduced
• Simulation time is decreased
• Accuracy is compromised

Switch-Level Simulation

• Transistors are represented with switch-level models
  - Nonlinear characteristics are approximated with a linear resistance
  - In off mode, the resistance is infinite
  - In on mode, the average on-resistance is used
• The resulting network is a time-variant, linear network of resistors and capacitors
• Less complex and less accurate

[Figure: CMOS inverter and its switch-level model]

Gate-Level Simulation

• Also called logic simulation
• The netlist is a set of logic gates and their interconnects
• Models for the logic gates are used
• A gate model includes the function, input pin capacitance, output pin capacitance and a delay model (for calculating delays)
• Simulation results are waveforms of logic values, with or without delays
• Fast compared to circuit simulation

Logic Strength

• In logic simulation, a logic signal has a level and a strength
• Logic levels are 0 and 1
• Logic strength can be strong or weak
• A strong 0 has logic level 0 with forcing strength
• A weak 0 has logic level 0 with resistive strength
• A signal that is not at any logic level has a strength of high impedance

IEEE Std 1164-1993

• IEEE Std 1164-1993 defines a 9-value logic system

  Logic Value | Logic State
  '0'         | Strong Low
  '1'         | Strong High
  'L'         | Weak Low
  'H'         | Weak High
  'X'         | Strong Unknown
  'W'         | Weak Unknown
  'Z'         | High Impedance
  '-'         | Don't Care
  'U'         | Uninitialized

Signal Resolution Table

• A signal resolution table defines the operation of a logic function
• Inversion (NOT) resolution table:

[Table: NOT signal resolution table]

AND Signal Resolution Table

[Table: AND signal resolution table]

OR Signal Resolution Table

[Table: OR signal resolution table]

Delay Models

• Delay models describe the delays inside a logic
cell
• Delays include:
  - Pin-to-pin delay: between an input pin and an output pin. It represents the delay without interconnect
  - Pin delay: delay lumped at an input pin
  - Net delay (wire delay): delay of the interconnect

Static Timing Analysis

• Static timing analysis calculates:
  - Entry delay: paths that start at an input point and end at the data input of a sequential logic cell (e.g. the D input of a D flip-flop)
  - Stage delay: paths that start at the clock input of a sequential logic cell and end at the data input of another sequential logic cell
  - Exit delay: paths that start at a sequential logic cell output and end at an output point

Functional Simulation

• An extension of logic simulation
• The circuit can contain elements of arbitrary complexity, e.g. AND gates, registers, multipliers, RAM
• The functionality of each element can be described using an HDL (VHDL or Verilog)
• The output results are logic signals
• A zero-delay model is usually used to speed up the simulation

Behavioural Simulation

• Difference between functional and behavioural descriptions:
  - Functional: represents the intended hardware structure, i.e. blocks that are connected together
  - Behavioural: describes only the input-output functionality
• Timing is ignored and the clock cycle is used
• E.g. simulating the instruction set of a µP
• VHDL and Verilog can be used for behavioural description and simulation
• Outputs can be logic or numerical values

Design Verification

• Simulation results do not guarantee the correctness and functionality of a design
• Simulation only tells how the design reacts to a given set of input excitations
• Design verification is used to detect design errors in the circuits
• Three types of design verification:
  - Electrical Verification
  - Timing Verification
  - Formal Verification

Electrical Verification

• Electrical verification takes a transistor-level schematic and checks it against rules such as:
  - The number of inversions between two C²MOS gates should be even
  - In a pseudo-NMOS gate, a well-defined ratio between the PMOS pull-up and NMOS pull-down devices is necessary to guarantee a good noise margin low ($NM_L$)
  - To ensure rise and fall times, minimum bounds should be set on the sizes of the driver transistors as a function of the fanout

Timing Verification

• It is difficult to identify the critical delay path in a complex circuit
• A timing verifier traverses an electrical network and calculates the delays of various paths
• A smart verifier can detect false paths
• E.g.
output z = 1, there is no delay

[Figure: false path example circuit]

Formal Verification

• A formal verifier tries to prove, in the mathematical sense, that two representations of a circuit are equivalent
• In formal verification, components are described behaviourally as a function of their inputs and internal state
• A formal verifier compares a derived circuit with its initial specification
• The two circuits need not be identical, only equivalent
• It reports any discrepancy

Implementation Approaches

[Figure: implementation approaches tree. Top level: off-the-shelf standard parts, possibly including a µP (PCB & hybrid); semi-custom solution (IC); full-custom (specialised) solution (IC). Next level: gate array, programmable logic device, standard cell. Leaves: PAL, PLA, PROM, ROM, CPLD, FPGA]

Full-Custom IC

• Hand-crafted by experts
• Designers design the circuits and layouts
• High density & silicon utilization
• High flexibility & performance
• Long design time & high cost
• Adopted only for high-volume production or high-performance (area, power, speed) circuits
• Tools:
  - Layout Editor
  - Design Rule Checker
  - Circuit Extractor

Layout Editor

• A layout editor allows designers to draw layouts
• Most editing functions are available
• Layouts are stored as electronic data in formats such as:
  - CIF: Caltech Intermediate Format
  - GDSII: from Cadence
• Layouts are sent to the manufacturer for IC fabrication

Design Rule Checker

• Layouts must be drawn to satisfy the design rules
• The design rules specify constraints on the layout dimensions
• Examples of rules:
  - Metal strips must be separated by at least 0.25 µm
  - Transistor size must be at least 0.25 x 0.25 µm²
• The design rule checker checks these rules and reports any violations

Circuit Extractor

• A circuit extractor converts layouts back to circuits
• The transistor network is reconstructed, including the sizes of the devices and interconnects
• The extracted circuits can be compared with the original circuit schematics to verify the functionality
• Accurate simulation (post-layout simulation) and analysis can be performed

Semi-Custom IC

• Standard Cell
• Gate Array

Standard Cell

• The cells are predesigned, including layout and characterisation (delay, power)
• A library of standard cells is used
• Cells can be logic gates or functional modules
• Only functional and logic design are required
• Automatic placement & routing
• Less efficient in silicon area
• Shorter design time
• Lower development cost

Compiled Cells and Module Generators

• A cell compiler takes a gate-level schematic and transistor sizes
• It automatically generates layouts
• Module generators automatically create functional modules such as adders, multipliers, registers, memories, etc.
• The generators are typically parameterizable
• E.g.
adders of 8 bits or 16 bits can be generated

Gate Array

• Mask Programmable Gate Arrays (MPGA)
• Primitive cells or transistors are manufactured by the vendors
• Logic gates can be configured from the cells or transistors by one or more connection layers (masks)
• Design time is the same as for standard cells because EDA tools are used
• Fabrication time is shorter than for the standard cell approach

Programmable Logic Devices

• PLDs consist of AND gates and OR gates
• There are programmable connections
• Three main types:
  - PROM: Programmable Read Only Memory
  - PLA: Programmable Logic Array
  - PAL: Programmable Array Logic
• The type depends on whether the AND array and the OR array are programmable

PROM

• Fixed AND array
• Programmable OR array
• The AND plane provides all minterms

PAL

• Programmable AND array
• Fixed OR array
• Any product term can be generated
• The number of product terms is fixed by the OR array

PLA

• Programmable AND array
• Programmable OR array
• Any product term can be generated
• The number of product terms is NOT fixed by the OR array

Field Programmable Gate Array

• No mask is customized
• Fuse-programmable or RAM-based programmable logic cells and interconnects
• The core consists of logic cells
• Logic cells contain combinational logic & sequential logic (flip-flops)
• Programmable interconnects surround the cells
• Programmable I/O cells surround the core

Implementation Approach Comparison

Characteristic | FPGA/CPLD  | Gate Arrays | Standard Cells | Full Custom
Design Time    | Short      | Short       | Short          | Long
Fabrication    |            | Short       | Long           | Long
Chip Area      | Very Large | Large       | Intermediate   | Small
Cost           | Very Low   | Low         | Intermediate   | High
Versatility    | Very Low   | Low         | Intermediate   | High
Design Cycle   | Very Short | Short       | Intermediate   | Long

Design Flexibility

[Figure: design flexibility versus ease of implementation; labels include MPGAs and user's logic]

Production Volume

[Figure: OEM choice among standard off-the-shelf ICs, standard cells and full custom versus production quantity required (1,000 to 100,000)]

Design Synthesis

• Design synthesis can be defined as the transformation between two different design views (abstractions)
• Typically it is a transformation from a behavioural specification into a structural description
• Three kinds of synthesis are discussed:
  - Circuit Synthesis
  - Logic Synthesis
  - Architectural Synthesis (High-Level Synthesis)

Overview of Design Synthesis

[Figure: behavioural and structural views at the architectural, logic and circuit levels]

Circuit Synthesis

• Translates a logic description into a network of transistors
• Must meet the timing constraints
• Two stages in the process:
  - Derivation of the transistor netlist from the logic equations
  - Transistor sizing to meet the performance constraints

Derivation of Transistor Netlist

• Select the circuit style
  - Complementary static
  - Pass-transistor
  - Dynamic
  - DCVSL (Differential Cascode Voltage Switch Logic)
• Construct the logic network
• Designers choose the circuit style
• Computer algorithms have been developed to generate the logic circuit

CMOS Logic Cell Netlist

• Use De Morgan's theorem to push the inversion bubbles to the inputs
• Build the NMOS and PMOS networks from series and parallel combinations of transistors

[Figure: OR and AND terms mapped to transistor networks]

Transistor Sizing

• Set the size ratio of the NMOS and PMOS transistors so that they have the same drive strength (same gain factor, $\beta_n = \beta_p$)
• Sizing rules:
  - Any string of transistors connected between a power
supply and the output with 1X drive should have 1X inverter size
  - Two parallel transistors with $W_1/L_1$ and $W_2/L_2$ are equivalent to one transistor with $W_1/L_1 + W_2/L_2$
  - Two series transistors with $W_1/L_1$ and $W_2/L_2$ are equivalent to one transistor with $1/(L_1/W_1 + L_2/W_2)$

Compute Transistor Size

• E.g. a gate with a ratio of 2/1

[Figure: transistor sizing example with an equivalent ratio of 2/1]

Logic Synthesis

• Logic synthesis generates a gate-level implementation (structural view) of a logic function (logic-level view)
• The logic function can be specified in terms of state transition diagrams, schematics, Boolean equations, truth tables or HDL descriptions
• Synthesis results depend on the implementation architecture: multilevel logic, PLA, FPGA/CPLD

Logic Synthesis Tasks

• The objective of logic synthesis is to optimize area, speed, power, or a combination of them
• The task has two main stages:
  - Logic minimization: a technology-independent phase, where the logic is optimized using a number of Boolean algebraic manipulation techniques
  - Technology mapping: a phase that considers the implementation architecture, such as standard cells, PLA, FPGA/CPLD, etc.

Combinational Logic Synthesis

• Logic equations:
  - $S = A \oplus B \oplus C_i$
  - $C_o = A \cdot B + A \cdot C_i + B \cdot C_i$
  - $C_o = A \cdot (B + C_i) + B \cdot C_i$

Truth Table For Full Adder

  A B C_i | S C_o
  0 0 0   | 0 0
  0 0 1   | 1 0
  0 1 0   | 1 0
  0 1 1   | 0 1
  1 0 0   | 1 0
  1 0 1   | 0 1
  1 1 0   | 0 1
  1 1 1   | 1 1

[Figure: multi-level logic implementation of $C_o$ using OR-AND-NOT gates, with the cell boundary marked]

Sequential Logic Synthesis

• Sequential logic synthesis includes:
  - State minimization: two states are equivalent if their output sequences are the same for any input sequence
  - State encoding: different code assignments for the states can result in different logic implementations
  - State machine decomposition: dividing a large state machine into 2 or more small ones. The logic becomes simpler and easier to minimize. Speed can be increased too.
  - Retiming: the clock speed is limited by the delay between any two stages of a sequential circuit. By rearranging the sequential logic elements (flip-flops), the delay can be reduced.

Retiming

[Figure: sequential circuit before and after retiming]

Retiming of Sequential Logic Circuit

• Moving the AND gate reduces the delay on one path but increases the delay on another

Architecture Synthesis

• Also referred to as behavioural synthesis or high-level synthesis
• It generates a structural view of an architecture design from a behavioural description of the task to be executed
• It optimizes one or more performance factors such as area, speed and power
• Two main tasks:
  - Operation Scheduling
  - Data Path Allocation

Behavioural Representation

• In the architecture synthesis process, a behavioural view of a design is usually represented by a Data Flow Graph (DFG)
• A data flow graph consists of operation nodes and directed arcs indicating the operation sequence
• E.g. y = a + b + c

[Figure: data flow graph for y = a + b + c]

Operation Scheduling

• The task is to assign the operation nodes of a DFG to control steps (or clock cycles) under certain constraints and subject to the precedence constraints
• The constraints can be:
  - Hardware resources
  - Operation speed (number of clock cycles)
• A high-level synthesiser can:
  - Optimize the hardware resources under the speed constraint
  - Optimize the speed under the hardware constraint

Operation Scheduling Algorithms

• There are many algorithms for operation scheduling:
  - As Soon As Possible (ASAP)
  - As Late As Possible (ALAP)
  - List Scheduling
  - Force Directed Scheduling
  - Integer Linear Programming Scheduling
• Reference: "Synthesis and Optimization of Digital Circuits", Giovanni De Micheli, McGraw-Hill International

A DFG Example

[Figure: example data flow graph]

ASAP Scheduling

[Figure: ASAP schedule over time steps 1 to 4, annotated with the resources required (e.g. 4 *, 1 +, 1 <)]

ALAP Scheduling

[Figure: ALAP schedule over time steps 1 to 4, annotated with the resources required (e.g. 1 +, 1 -, 1 <)]

List Scheduling

[Figure: list schedule over time steps 1 to 5]
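The ASAP and ALAP strategies listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the DFG below is a hypothetical example (not the one drawn in the slides), every operation is assumed to take exactly one control step, and hardware resources are assumed to be unconstrained. The names `DFG`, `asap`, `alap` and `mobility` are illustrative only.

```python
# Hypothetical DFG: each operation maps to the list of operations it depends on.
# Assumption: every operation takes exactly one control step.
DFG = {
    "m1": [], "m2": [], "m3": [],
    "m4": ["m1", "m2"],
    "s1": ["m4", "m3"],
    "a1": [],
    "c1": ["a1"],
}

def asap(dfg):
    """Place each operation in the earliest step allowed by its predecessors."""
    step = {}

    def visit(op):
        if op not in step:
            # One step after the latest predecessor; step 1 if there is none.
            step[op] = 1 + max((visit(p) for p in dfg[op]), default=0)
        return step[op]

    for op in dfg:
        visit(op)
    return step

def alap(dfg, latency):
    """Place each operation in the latest step that still finishes by `latency`."""
    successors = {op: [] for op in dfg}
    for op, preds in dfg.items():
        for p in preds:
            successors[p].append(op)
    step = {}

    def visit(op):
        if op not in step:
            # One step before the earliest successor; `latency` if there is none.
            step[op] = min((visit(s) for s in successors[op]),
                           default=latency + 1) - 1
        return step[op]

    for op in dfg:
        visit(op)
    return step

asap_steps = asap(DFG)
latency = max(asap_steps.values())      # shortest possible schedule length
alap_steps = alap(DFG, latency)
# Mobility (slack) = ALAP step - ASAP step; zero mobility marks the critical path.
mobility = {op: alap_steps[op] - asap_steps[op] for op in DFG}

for op in DFG:
    print(op, asap_steps[op], alap_steps[op], mobility[op])
```

Operations with zero mobility have no scheduling freedom. Resource-constrained methods such as list scheduling commonly use this kind of slack as a priority when deciding which ready operation to place first.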
Data Path Allocation
• The task is to assign the operands (values) to storage elements and the scheduled operations to physical functional modules
• The objective is to minimize the number of storage elements and the amount of interconnections
• The sub-tasks are:
  - Register Allocation
  - Module Allocation
  - Interconnect Allocation

Register Allocation
• Intermediate values have to be stored for the next operation
• The task of register allocation is to assign the values to registers, aiming at minimizing the number of registers and interconnections
• Values that do not exist at the same period can share a single register
• Well-known register allocation algorithms include the Left-Edge Algorithm and the Clique Partitioning Technique

Register Allocation Example
[Figure: scheduled DFG over Time Steps 1-4 with edges e1-e8]
• Edge lifetimes:
  e1: t1 -> t2    e2: t1 -> t2    e3: t2 -> t3    e4: t3 -> t4
  e5: t2 -> t3    e6: t3 -> t4    e7: t1 -> t2    e8: t3 -> t4

Left-Edge Algorithm
• Lifetime Table
[Figure: lifetime table of the edges over t1-t4]
• Register Allocation:
  R1 = {e1, e3, e4}
  R2 = {e2, e5, e6}
  R3 = {e7, e8}

Register Allocation Implementation
• A scheduled DFG
[Figure: scheduled DFG]

Register Allocation - Data Path
• Data path with:
[Figure: data path for a first register allocation, in which one register holds {e3, e5, e6}; multiplexers feed the functional modules and there are two outputs]

Register Allocation - Data Path
• Data path with:
[Figure: data path for a second register allocation, with inputs in1-in3, a multiplexer and outputs out0 and out1]

Module Allocation
• Operations are bound to physical functional modules during the Module Allocation stage
• The number of modules has been determined in the scheduling stage
• Different bindings will result in different interconnects (including MUX, bus, wires)
• Register Allocation and Module Allocation are interdependent
  - They influence the interconnection allocation results

Module Allocation Example
• A Scheduled DFG
[Figure: scheduled DFG with edges e0-e7]

Module Allocation - Data Path
• Data Path with:
[Figure: data path with inputs in0-in3, two multiplexers, two adders (adder1 and adder2, with adder2 = {+3, +4}) and outputs out0 and out1]

Scheduling Inputs
• A system with 3 inputs and 1 output
• The 3 inputs can be read within 1 clock cycle or 3 clock cycles
• The inputs can be scheduled in the same way as the operation nodes
[Figure: two schedules of the same DFG, with the inputs read in one clock cycle and in three clock cycles]

Architectures
• Four-Cycle vs One-Cycle Implementation with Pipeline

Interconnect Allocation
• The interconnect allocation is highly influenced by the register allocation and module allocation stages as well as the connection styles in the architecture
• Connection styles can be:
  - Point-to-point: registers and functional modules are directly connected with wires and multiplexers
  - Bus: wire connections are collected to form buses
  - Mixture of the above two styles

Difficulties in Architectural Synthesis
• A lack of understanding of what a behavioural description means at the architecture level for some domains, such as general-purpose µPs
• Behavioural synthesis assumes the availability of an established synthesis approach at the register transfer level
• Lack of physical layout information at the behavioural level, especially on interconnections and power consumption

Testing
• Why test?
  - A correct design does not guarantee that the manufactured circuit will be operational
  - Manufacturing defects can occur during fabrication
    > impurities in the silicon crystal
    > misalignment
    > etching accuracy
  - Faults may be introduced during the stress tests
  - The later a fault is detected, the higher the cost
  - It is always cheaper to find a fault in a component than to find it in a system

Defect Level
• Product quality is measured by Defect Level
• Defect Level is the number of defective parts in one million parts
• e.g. If there are 10 defective components in a production of 100,000 components, the defect level is 0.01% or 100 ppm.
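The defect-level arithmetic can be checked with a short calculation. This is only a sketch; the function name `defect_level_ppm` is an illustrative choice and does not come from the slides.

```python
def defect_level_ppm(defective, total):
    """Defect level as the number of defective parts per million parts."""
    return defective * 1_000_000 / total

# The slide's case: 10 defective components out of 100,000 produced.
ppm = defect_level_ppm(10, 100_000)
# 1% corresponds to 10,000 ppm, so divide by 10,000 to get a percentage.
percent = ppm / 10_000
```

Here `ppm` evaluates to 100 and `percent` to 0.01, i.e. 10 defective parts in 100,000 correspond to 100 ppm, or one hundredth of one percent.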
Design For Testability
• During the design phase, the designer has unlimited access to all the nodes in the circuit
  - Observation can be done at any desired node
• This is not the case once the circuit is fabricated
• A complex circuit such as a µP contains millions of transistors and uncountable states
• It is impossible to go into a particular node and observe the circuit response
• Design For Testability is very important

Test Categories
• Diagnostic test: to identify and locate the offending fault
• Functional test (also called go/no-go test): to determine whether or not a manufactured component is functional. This is simpler than the diagnostic test since the only answer expected is YES or NO
• Parametric test: to check a number of nondiscrete parameters, such as noise margins, propagation delays and maximum clock frequencies, under a variety of conditions (e.g. temperature, supply voltage)

Testing Issues
• Reducing the test time increases the throughput of the tester and reduces the test cost
• Considering testing early in the design phase will simplify the testing process
• Exhaustive testing is impossible, e.g.
[Figure: a combinational logic module with N inputs and K outputs, and a sequential module with N inputs, K outputs and M state registers]
  - No. of test patterns: 2^N for the combinational module, 2^(N+M) for the sequential module

Testing Approach Premises
• Exhaustive testing contains a substantial amount of redundancy, i.e. a single fault is covered by a number of test patterns
• The number of test patterns can be reduced by relaxing the condition that all faults must be detected, e.g. detecting the last 1% of possible faults may require many more patterns and hence a high cost. The replacement cost may be lower
• A typical test only attempts 95-99% coverage

Controllability
• Controllability measures the ease of bringing a circuit node to a given condition using only the input pins.
• A node is easily controllable if it can be brought to any condition by only a single input vector.
• A node with low controllability needs a long sequence of vectors to be brought to a desired state.
• High controllability is desirable

Observability
• Observability measures the ease of observing the value of a node at the output pins
• A node with high observability can be monitored directly on the output pins
• A node with low observability needs a number of cycles before its state appears on the outputs
• A testable circuit should have high observability

Scan-Based Test
• To avoid the sequential-test problem, make all registers externally loadable and readable
• To control a node, a vector is loaded into the registers and propagated through the logic.
• The result of the excitation propagates to the registers and is transferred to the output
[Figure: serial scan path from ScanIn through the registers to ScanOut, around logic modules A and B]

Scan-Based Test Procedure
• An excitation vector for logic module A (and/or B) is entered through pin ScanIn and shifted into the registers under the control of a test clock.
• The excitation is applied to the logic and propagates to the output of the logic module. The result is latched into the registers by issuing a single system-clock event.
• The result is shifted out through ScanOut and compared with the expected data.
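The three steps of the scan-based test procedure can be mimicked with a small behavioural model. This is a sketch under simplifying assumptions: a single 3-bit register bank both drives and captures one logic block, and the function `logic` is a hypothetical stand-in for the logic under test rather than a circuit from the slides.

```python
def shift(chain, scan_in_bits):
    """Shift bits through the scan chain, one test-clock cycle per bit.

    chain[0] sits next to ScanIn and chain[-1] drives ScanOut.
    Returns the new register contents and the bits observed on ScanOut.
    """
    chain = list(chain)
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(chain[-1])   # last register drives ScanOut
        chain = [bit] + chain[:-1]   # each register takes its neighbour's value
    return chain, scan_out

def logic(regs):
    """Hypothetical 3-bit combinational block standing in for the logic under test."""
    a, b, c = regs
    return [a ^ b, b & c, a | c]

excitation = [1, 1, 0]

# 1. Shift the excitation vector in under the test clock (last bit first).
loaded, _ = shift([0, 0, 0], reversed(excitation))
# 2. A single system-clock event latches the logic response into the registers.
captured = logic(loaded)
# 3. Shift the response out through ScanOut and compare with the expected data.
_, scan_out = shift(captured, [0, 0, 0])
response = scan_out[::-1]            # the first bit out came from the last register
expected = [0, 0, 1]                 # worked out by hand for the fault-free logic
test_passed = response == expected
```

The model shifts zeros in while reading the response out. On a real tester the next excitation vector is normally shifted in during that same pass, which keeps the test time per vector close to one chain length.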
Register with Serial-Scan Chain
• 4-bit register extended with a scan chain:
[Figure: 4-bit register with a serial scan chain and four data outputs]

Boundary-Scan Test
• A standard test method started by the Joint Test Action Group (JTAG) in 1986
• IEEE standard (1149.1b) in 1994
• BST was used to test board-level designs
• BST is also used for ASIC test now
• The basic idea of BST is to connect the input and output pins into a serial scan chain
• During test mode, vectors can be scanned in and out, providing controllability and observability at the boundary

Boundary Scan Chain
[Figure: application logic blocks whose I/O pins are linked by boundary-scan cells into one serial chain ending at ScanOut]

BST Operation
• To use BST, a BST cell is added to each I/O
• The cells are joined together to form a boundary-scan shift register
• BST uses 4 wires at the interface
  - TDI: Test Data Input to the shift register
  - TDO: Test Data Output of the shift register
  - TCK: Test Clock
  - TMS: Test Mode Select for operation control

BST Basic Functions
• Parallel capture of test result vectors into the Boundary Scan cells (capture)
• Serial shifting in of test vectors and simultaneous shifting out of captured test results (shift)
• Parallel connection of loaded test vectors to the circuit node that has to be tested (update)
• Test / stimulation of the internal connections of an integrated circuit (internal test)
• Test / stimulation of pin connections on a board- or system-level circuit (external test)

BST Cell
• A Data Register Cell for BST: 4 modes
  - Normal: data_out = data_in
  - Scan: scan_out = scan_in
  - Capture: q1 = data_in; data_out = data_in or q2
  - Update: q2 = q1; data_out = q2
[Figure: BST data register cell]

Built-In Self-Test (BIST)
• The idea of BIST is to have additional circuits to test the design
• The additional circuits include a sequence generator and a response analyzer

Fault Models
• Manufacturing faults can be of various types
  - short-circuits between signals
  - short-circuits to the supply rails
  - floating nodes
• Fault models relate faults to the circuit model
• The most popular fault model is the Stuck-At Model
• In the Stuck-At Model, only short circuits to the supplies are considered
• These are called the Stuck-At-Zero (SA0) and Stuck-At-One (SA1) faults

IDDQ Test
• Measurement of the supply current provides a quick way of finding a bad chip
• A good chip should not have any short between VDD and GND
• A supply current of more than a few mA indicates a bad chip
• IDDQ test is based on the measurement of supply current to find the bad chips

Fault Simulation
• A fault simulator measures the quality of a test program
• It determines the fault coverage
• In the simulation, a fault is inserted
• If the simulation shows that the outputs of the faulty circuit are different from those of the correct one, the fault is detected. Otherwise, the fault is not detected.

Test Pattern
• A test pattern, or test vector, is a set of values applied at the inputs to detect one or more faults in a circuit
• To cover more faults, more test patterns are required
• Exhaustive test, i.e. applying all the combinations of the inputs, is costly, impractical and not necessary.

Test Concept
• e.g. Test an AND gate with inputs a, b and output y
• To test whether the output y has a fault SA0, apply a test pattern ab = 11 at the inputs. If the output gives 1, y is not Stuck-at-0.
• If the output gives 0, there is a fault at y, Stuck-at-0.
• Note that ab = 11 does not test y Stuck-at-1.

Exhaustive Test
• e.g. Test a NOR gate with inputs a, b and output y

  Test pattern   Fault-free   Faulty   Detectable faults
  a  b           y            y
  0  0           1            0        a SA1, b SA1, y SA0
  0  1           0            1        b SA0, y SA1
  1  0           0            1        a SA0, y SA1
  1  1           0            1        y SA1

• It is not necessary to apply all four test patterns.
• Applying the test patterns ab = 00, 01 and 10 would be sufficient to test both SA0 and SA1 at all the nodes.
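The fault-simulation idea (insert a fault, compare the faulty output with the fault-free one) can be tried on the two-input gates discussed above. This is a minimal sketch of single stuck-at fault simulation; the helper names `simulate`, `detected` and `fault_coverage` are illustrative and not taken from any tool.

```python
# All six single stuck-at faults of a 2-input gate: (node, stuck value).
FAULTS = [(node, value) for node in ("a", "b", "y") for value in (0, 1)]

def and_gate(a, b):
    return a & b

def nor_gate(a, b):
    return 1 - (a | b)

def simulate(gate, a, b, fault=None):
    """Evaluate a 2-input gate with an optional single stuck-at fault."""
    node, value = fault if fault else (None, None)
    if node == "a":
        a = value          # input a is stuck
    if node == "b":
        b = value          # input b is stuck
    y = gate(a, b)
    return value if node == "y" else y

def detected(gate, a, b):
    """Faults whose faulty output differs from the fault-free output."""
    good = simulate(gate, a, b)
    return [f for f in FAULTS if simulate(gate, a, b, f) != good]

def fault_coverage(gate, patterns):
    """Fraction of the six stuck-at faults detected by a set of test patterns."""
    found = {f for a, b in patterns for f in detected(gate, a, b)}
    return len(found) / len(FAULTS)

and_11 = detected(and_gate, 1, 1)
nor_three = fault_coverage(nor_gate, [(0, 0), (0, 1), (1, 0)])
nor_one = fault_coverage(nor_gate, [(1, 1)])
```

For the AND gate, ab = 11 detects the SA0 faults on a, b and y but no SA1 fault. For the NOR gate, the three patterns 00, 01 and 10 already reach full coverage of the six faults, while 11 alone detects only y SA1.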
Automatic Test Pattern Generation
• An ATPG program determines a minimum set of excitation vectors that cover a sufficient portion of the fault set
• Consider a SA0 at U
• To detect this, AB must be 11, X must be 1 too and E must be 0
[Figure: example logic network with a SA0 fault at node U]

D Algorithm For ATPG
• A value called D marks a node that is 1 in the good circuit and 0 in the bad (faulty) circuit
• This is written as D = 1/0, or in general g/b (good/bad)
  - g/b is a composite logic value
• D̄ (D-bar) = 0/1 means 0 is good and 1 is bad.

D Values of Logic Gates
• AND: set the other inputs to 1 to propagate D
• OR: set the other inputs to 0 to propagate D
• NAND: set the other inputs to 1 to propagate D to D̄
• NOR: set the other inputs to 0 to propagate D to D̄
• NOT: propagates D to D̄

ATPG Example
• Choose a fault, here a SA1
• Activate the fault: work backward to justify a 0 at the fault site
• Propagate the fault: set the enabling inputs of the NAND gates on the path to 1
• Work backward to justify each enabling 1
• The resulting input values form the test vector
[Figure: example circuit showing fault activation, justification and propagation]
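The gate propagation rules of the D notation can be checked by treating each composite value as a (good, bad) pair and evaluating the gate on both halves. This is only a sketch of the value algebra, not of the full D algorithm; the tuple encoding and the two-gate NAND path at the end are assumptions made for illustration.

```python
# Composite logic values encoded as (value in good circuit, value in bad circuit).
ZERO, ONE = (0, 0), (1, 1)
D, DBAR = (1, 0), (0, 1)     # D = 1/0, D-bar = 0/1

def _apply(fn, x, y):
    """Evaluate a 2-input Boolean function on the good and bad halves separately."""
    return (fn(x[0], y[0]), fn(x[1], y[1]))

def NOT(x):
    return (1 - x[0], 1 - x[1])

def AND(x, y):
    return _apply(lambda p, q: p & q, x, y)

def OR(x, y):
    return _apply(lambda p, q: p | q, x, y)

def NAND(x, y):
    return NOT(AND(x, y))

def NOR(x, y):
    return NOT(OR(x, y))

# Hypothetical path: a fault effect D passes through two NAND gates in series.
# Both side inputs are held at 1 so that the effect reaches the output.
observable = NAND(NAND(D, ONE), ONE)
# With the second side input at 0 the fault effect is blocked.
blocked = NAND(NAND(D, ONE), ZERO)
```

With both side inputs at 1 the output carries D again (two inversions), so the fault is observable. With a side input at 0 the output is a plain 1 in both the good and the bad circuit, which is why the ATPG steps justify a 1 on every enabling input of a NAND gate along the propagation path.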
