DFT Architecture in Multimedia Design

Gil Bouganim

DSPG
Herzlia, Israel

www.dspg.com
Table of Contents
1. Introduction
2. The IC
3. OCC concept
4. Integration of @ speed test in multi frequency, timing & power efficient design
5. Integration of 2 "scan inserted" IPs
6. Targeting for high coverage while enforced to work with many 3rd party cores
7. Lessons learned
8. References

Table of Figures
Figure 1 - block diagram
Figure 2 - IC floorplan
Figure 3 - OCC shift & capture waveforms
Figure 4 - clock scheme
Figure 5 - Flip flop with uncontrollable synchronous reset
Figure 6 - Flip flop with synchronous reset, fixed with scan_enable

Table of Tables
Table 1 - 1 OCC Vs multi OCCs

SNUG 2011 2 DFT Architecture in Multimedia Design


1. Introduction

DSPG provides a variety of wireless chipset solutions for converged communications at home.
At the end of 2010, DSPG taped out a 65 nm LP multimedia IC, the DMW96. The DFT structures
integrated into DMW96 include OCC structures for @ speed testing, and structures for scan
compression. In this paper, I will cover the main aspects of our methodology, highlight the DFT
challenges we faced, and present the lessons we learned from this project.
The DFT challenges of this project were:
1. Integration of @ speed test in multi frequency, timing & power efficient design.
In such circumstances, the DFT structure must align with the complex clock scheme
of the design. This means that while planning the clock scheme architecture, @
speed considerations must be given the same high priority as other functional
considerations, and should not be deferred to the implementation stages. On the other
hand, the DFT structures that are used must align with the design, not vice versa.
2. Integration of 2 "scan inserted" IPs.
There are 2 common approaches for integration of scan inserted IPs: the conservative
"isolation approach", and the progressive "full integration approach". We
decided to use both approaches at the same time.
3. Targeting for high coverage while enforced to work with many 3rd party cores.
3rd party cores are imported "as is" into the design, and as such do not allow any RTL
changes for DFT purposes. In these cases, we used the "autofix" feature of dft-compiler
to compensate for poor testability.


2. The IC
The design, a 65 nm multimedia IC, includes a variety of interfaces and blocks, for example:
- DDR interface
- USB2 interfaces
- WiFi interface
- Video engines
- MII Ethernet interface
- High frequency processors

The following figure shows the IC's block diagram:


[Figure 1 image not reproduced; only its labels survived extraction. Blocks named in the diagram
include the Cortex-A8 CPU core (32KB I-cache, 32KB D-cache, 256K L2 cache, Coresight), the GC400
2D/3D engine, the G1 video decoder and 7280 video encoder, LCDC, OSDM, CIU, the WiFi MAC and
WiFi BB, two USB OTG 2.0 PHY/MAC pairs, the 802.3 MAC (MII/RMII), a security accelerator, a
generic DMA engine, the communication subsystem, the external memory interface (LPDDRII), the
AXI fabric and AHB bus matrices, and peripherals such as TDM, I2C, SPI, UART, GPIO, SDMMC,
NAND Flash/LCD, timers, RTC, WDG and JTAG.]

Figure 1: block diagram


The following figure shows the physical aspect – the IC's floorplan.

Figure 2 – IC floorplan


3. OCC concept

An on chip clock (OCC) controller basically multiplexes 2 free running clocks: the ATE clock
and the PLL clock.
The purpose of using an OCC is @ speed testing.
The OCC guarantees that shift is done with the ATE clock, and that capture (which is composed
of launch and capture cycles) is done with the PLL clock.

Figure 3: OCC shift & capture waveforms
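
The multiplexing behaviour described above can be sketched as a small Python model. This is only an illustration of the concept, not the logic of the inserted OCC cell: the function name, the pulse_enable input and the sampled 0/1 clock levels are assumptions, and a real controller also synchronizes the switch-over so that no glitches reach the clock tree.

```python
def occ_clock_out(scan_enable: int, ate_clk: int, pll_clk: int, pulse_enable: int) -> int:
    """Level of the OCC output clock at one sampling instant (hypothetical model)."""
    if scan_enable:
        # Shift: the slow ATE clock drives the scan flip flops.
        return ate_clk
    # Capture: PLL pulses pass only while the controller enables them.
    return pll_clk & pulse_enable


# Shift phase: the output follows the ATE clock regardless of the PLL clock.
print([occ_clock_out(1, ate, 1, 0) for ate in (0, 1, 0, 1)])   # [0, 1, 0, 1]
# Capture phase: two enabled PLL pulses (launch, capture), then the output stays low.
print([occ_clock_out(0, 0, 1, en) for en in (1, 1, 0, 0)])     # [1, 1, 0, 0]
```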

Basic concepts of OCC insertion used in our project:


- The OCC is automatically inserted by dft-compiler as part of the "insert_dft" operation.
- OCC is instantiated in the top level hierarchy of the IC. We used 2 buffer instances in the
RTL to mark each point for OCC insertion (OCC was inserted between the buffers). This
meant that clocks that passed through OCC, had to be routed from the clock unit to the top
level hierarchy, and back.
- TetraMAX requirement: the driver of the "fast_clk" pin in the OCC should be defined as a
black-box. We defined the first of the 2 buffers surrounding the OCC as a black-box.
- In the next chapter, I will also present the commands for OCC insertion in dft-compiler.


4. Integration of @ speed test in multi frequency, timing & power efficient design

The DMW96 design clock scheme includes:

- 4 PLLs.
- More than 30 different clock domains in various frequencies.
- Clock gaters for power saving purposes.
- More than 15 clock dividing units and several glitchless clock muxes.

The implementation process was initiated after a few decisions were made.
The primary decision we had to make was which clocks should be controlled by an OCC and
which should not. Those not controlled by an OCC could still be checked "@ speed", but not
at their specific working frequency. The goal, then, was to use a reasonable number of OCC
structures and to cover as much of the design as possible with OCCs. We decided that any clock
with a frequency higher than 125 MHz should be controlled by an OCC. This assured that the OCC
structures fan out to more than 80% of the flip flops in the design, and limited the number of
OCC structures to a reasonable 11. We distributed them as follows:
- PLL1 – system pll: 6 OCCs (500 MHz, 250 MHz, 125 MHz clocks).
- PLL2 – cpu pll: 2 OCCs (850 MHz, 425 MHz clocks).
- PLL3 – dram pll: 1 OCC (266 MHz clocks).
- PLL4 – comm. pll: 2 OCCs (300MHz, 150MHz clocks).
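
As a quick cross-check of the distribution above, the list can be tallied in a few lines of Python. The dictionary layout is only an illustrative way to hold the figures quoted in the list; the numbers themselves come from the text.

```python
# OCC distribution per PLL, as listed above (frequencies in MHz).
occ_plan = {
    "PLL1 (system)": {"occs": 6, "clocks_mhz": [500, 250, 125]},
    "PLL2 (cpu)":    {"occs": 2, "clocks_mhz": [850, 425]},
    "PLL3 (dram)":   {"occs": 1, "clocks_mhz": [266]},
    "PLL4 (comm)":   {"occs": 2, "clocks_mhz": [300, 150]},
}

total_occs = sum(entry["occs"] for entry in occ_plan.values())
fastest_mhz = max(f for entry in occ_plan.values() for f in entry["clocks_mhz"])

print(total_occs)   # 11
print(fastest_mhz)  # 850
```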


The following shows the complexity of DMW96 clock scheme.
[Figure 4 image not reproduced; only its labels survived extraction. The diagram, titled
"DMW96 CMU scheme", shows the oscillators, the system, CPU, DRAM and comm PLLs, the clock
dividers, clock gaters (CG), clock muxes, bypass muxes and SYNC2 synchronization points, and the
OCC insertion points on clocks such as clk_cortex_out_occ, clk_axidiv_out_occ,
clk_viddecdiv_out_occ, clk_videncdiv_out_occ, clk_gpudiv_out_occ, clk_sysbusdiv_out_occ,
clk_dram_occ and clk_com_occ. Its legend distinguishes clock gaters, on chip clock (scan)
controllers, blocks in the top level, blocks in the digital core, clock muxes, sync points,
dividers and bypass muxes.]

Figure 4: clock scheme

After the 11 OCCs were placed at the output segments of the relevant clocks in the clock scheme,
we had to decide how to generate the fast functional clock and connect it to the OCC structures
during @ speed test. The first option was to bypass the complexity of the clock generation unit
(its dividers, gaters and muxes). The second option was to force the functional architecture to
work in a manner that assures that the desired functional clocks are free running at the
input pins of the OCCs. We chose the second option. The benefit of this approach is that our
system clocks work in test mode exactly as they work in functional mode. In order to force the
clocks to be free running in test mode, we added combinational logic in the RTL code of the
clock generation module, along the combinational path of the desired clocks. The RTL code used
the scan_mode signal to assure that during test mode:
- The selected clock gaters are open.
- The selected clock dividers are continuously dividing in the desired rate.
- Glitchless clock muxes function properly and select the desired clocks.
- PLLs:
i. The reference pin of PLL is connected to its primary input. The path is
unblocked, and allows free running clock at the PLL's reference pin.

ii. Configuration inputs of PLLs are constant and set to the proper values, in
order to generate clocks at the desired frequency.
iii. PLL is OFF during stuck @ test. This requires the use of an additional signal
in the RTL code: the pll_bypass signal.
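
The kind of override logic listed above can be illustrated with a small Python model of the combinational behaviour. This is a sketch only: the function names, the active-high polarity of scan_mode and the OR/mux structure are assumptions made for illustration, since the paper does not show the actual RTL of the clock generation module.

```python
def gater_enable(functional_enable: int, scan_mode: int) -> int:
    """Clock gater enable: forced open in test mode (assumed OR override)."""
    return functional_enable | scan_mode


def divider_rate(functional_rate: int, test_rate: int, scan_mode: int) -> int:
    """Divider rate: a constant test value replaces the register value in test mode."""
    return test_rate if scan_mode else functional_rate


def mux_select(functional_select: int, test_select: int, scan_mode: int) -> int:
    """Clock mux select: pinned to the desired source in test mode."""
    return test_select if scan_mode else functional_select


# Functional mode: the register values are used unchanged.
print(gater_enable(0, 0), divider_rate(7, 1, 0), mux_select(2, 0, 0))  # 0 7 2
# Test mode: gater open, fixed divide rate, fixed clock source.
print(gater_enable(0, 1), divider_rate(7, 1, 1), mux_select(2, 0, 1))  # 1 1 0
```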

Before moving on to the implementation stage with dft-compiler, we had to decide how many
ATE clocks are required, and whether we want to implement 1 big OCC unit, or several OCC
units. The following table compares the pros and cons of each method:

Implementation aspect: DFT
  1 OCC:
    - Easier generation and maintenance.
    - 1 simple shift clock.
    - snps_clk_chain: just 1 special clock chain for the OCC clk_enable.
  Multiple OCCs:
    - Generation of several structures is needed. Several structures must be verified
      throughout all stages of implementation.
    - Several shift clocks, which results in lockup latch insertion etc.
    - Several snps_clk_chain segments. Each segment is pulsed by a different trailing edge
      shift clock. Thus, high level sensitive lockup latches are placed between these
      segments. This is not the recommended flow, and introduces some complications in ATPG.

Implementation aspect: Backend - CTS
  1 OCC:
    - Because PLLs and their clock domains are located in different locations, this 1
      hierarchical structure needs to be scattered around them. On the other hand, if it is
      placed in one square location, it will be too far from some of the clock structures,
      and this might harm the integrity of clock signals.
  Multiple OCCs:
    - Each OCC structure is placed and correlated with its relevant functional clock
      structures (PLLs, dividers etc.). This means optimal placement.

Implementation aspect: Backend - power & IR drop
  1 OCC:
    - 1 shift clock results in higher peak power during shift.
  Multiple OCCs:
    - A number of shift clocks with considerable skew between them results in lower peak power.
Table 1: 1 OCC Vs multi OCCs


Because we gave special attention to clock structures (shielding, non default rules etc.), and
due to the importance of clock integrity at such high frequencies, we decided that backend
considerations outweigh DFT simplicity, and implemented 11 different OCC instances: one OCC
per functional clock domain.
We decided to have 5 ATE clocks: one per PLL, and one for all flip flops that are not in
the fanout of the OCCs. The number of ATE clocks was derived from the number
of PLLs, and from clock balancing considerations:
- Since OCCs were used as clock sources for CTS, and most clocks that were generated
from the same PLL were synchronous to each other, we could reuse
the functional clock tree balancing for ATE clock tree balancing.
- Having several ATE clocks should guarantee skew between clocks, which results
in lower peak power during shift.

Implementation in dft-compiler:
We followed the documented commands for OCC insertion:
set_dft_configuration -clock_controller enable
set_dft_signal -view existing -type refclock -port osc_port -test_mode all
set_dft_signal -view existing -type Oscillator -port gpio0 -test_mode all ;# ATE clock
set_dft_signal -view existing -type Oscillator -hookup_pin [get_pins "buff/Z"] -test_mode all ;# PLL clock
set_dft_signal -view spec -type pll_reset -active_state 0 -test_mode all
set_dft_signal -view spec -type pll_bypass -active_state 0 -test_mode all
set_dft_clock_controller -cell_name XXX_occ \
-design_name snps_clk_mux \
-pllclocks [get_pins "buff/Z"] \
-ateclocks gpio0 \
-test_mode_port test_mode_port \
-cycles_per_clock 2 -chain_count 1

#test_mode_port is the same port that is defined as type TestMode for scan compression.
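
Since the design uses 11 OCC instances, the set_dft_clock_controller command has to be repeated once per instance. One possible way to keep the repetitions consistent is to generate them from a table, as in the Python sketch below. The script only reuses the options shown in the listing above; the cell names, hookup pins and ATE ports in occ_table are hypothetical placeholders, and the related set_dft_signal definitions would still have to be written separately.

```python
def occ_command(cell_name: str, pll_pin: str, ate_port: str) -> str:
    """Build one set_dft_clock_controller command from the template in the listing."""
    parts = [
        f"set_dft_clock_controller -cell_name {cell_name}",
        "    -design_name snps_clk_mux",
        f'    -pllclocks [get_pins "{pll_pin}"]',
        f"    -ateclocks {ate_port}",
        "    -test_mode_port test_mode_port",
        "    -cycles_per_clock 2 -chain_count 1",
    ]
    # Join with Tcl line continuations.
    return " \\\n".join(parts)


# Hypothetical entries: (OCC cell name, PLL clock hookup pin, ATE clock port).
occ_table = [
    ("cpu_occ", "u_cpu_buf/Z", "gpio1"),
    ("dram_occ", "u_dram_buf/Z", "gpio2"),
]

script = "\n\n".join(occ_command(*row) for row in occ_table)
print(script)
```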


5. Integration of 2 "scan inserted" IPs

Scan inserted IPs are digital cores with complete layout, that already include routed scan
chains. The dft characteristics of the IP are determined by the IP vendor.
In DMW96, we integrated 2 IPs that were "scan inserted" by their vendors - wifi_afe &
usb_phy.
The common approaches for scan integration of "scan inserted" IPs are:
- The conservative, "full isolation" approach:
There are several reasons why a backend team would consider it risky to fully integrate a
"scan inserted" IP with the IC's scan chains as if it were a common hard macro design:
o DFT aspect: since the backend team did not implement the DFT structure of the IP,
it must rely on the correctness of the ATPG netlist that is delivered by the IP vendor.
o Timing aspect: the backend team cannot run STA on the internal parts of the IP
in "test mode", and cannot verify that it meets "test mode" timing requirements.
o Usually these IPs are mixed signal designs. There is always a risk that the analog
design interferes with the proper function of the digital design during test mode.
Another reason for following this approach is that these IPs' digital cores are usually very
small compared to the IC's digital core. Thus, the integration effort, and the fact that such a
small digital core can interfere with the functionality of the entire IC scan architecture,
make the "full integration" approach (see below) not advisable.
In order to account for the above risks, dedicated scan ports (scan in/out, scan clock) are
assigned to the IP, and its scan chains are not mixed with any of the IC's scan chains.
If the IP's scan chains don't work on the tester, this will not have any effect on the rest
of the IC.
- The progressive, "full integration" approach:
This approach gives credit to the IP provider and to its validation of the IP's scan structure.
It is safe to take this approach especially when the IP is silicon proven.
Unlike the conservative approach, in this case the IP is integrated as if it were a common
hard macro in the IC. It is best to use an ILM view for its integration, but a Liberty model
(for timing) and a CTL test model should be sufficient. Using a test model allows
connecting the scan chains of the IP with the IC's scan chains, and making better use
of the IC's scan resources (primary inputs & outputs).
One more advantage this approach has over the conservative one is that it allows better
controllability and observability of the IP's input/output pins, and thus better test
coverage.
After implementation, it is required to verify that the integrated IP meets the IC's test
mode STA requirements. To do so, a Liberty model that represents the IP's characteristics
in test mode should be read into PrimeTime.

Our decision was to mix the above 2 approaches. This mixture gave us the following benefits:
- We made the best usage of scan signal resources we had.
- In case the IP's scan chains don't work, we can remove their scan chains from the
overall set of scan chains, and still check the rest of the IC on the tester.


Implementation:
1. We asked the IP vendors to deliver test models (in ASCII format), and Liberty models
that are characterized for test mode.
2. We read the test models into dft-compiler.
3. We defined dedicated scan chains in dft-compiler for these IPs in Internal_scan &
ScanCompression_mode. We prevented mixing of the IPs' scan chains with other scan flip
flops by using the "-complete" option of the set_scan_path command.
4. We integrated the two IPs. I'll refer to them as ip0 & ip1 from this point forward.
a. ip0 was composed of 4 scan chains: 2 leading edge chains (SC1 & SC3), and 2
chains that started with trailing edge flip flops and ended with leading edge flip
flops (SC0 & SC2). This meant we were forced to have at least 2 scan chains for
ip0: the {SC0-SC1} chain and the {SC2-SC3} chain.
b. ip1 had 2 instances, which were grouped under 1 shell module. Each instance was
composed of 9 scan chains: 1 trailing edge scan chain, and 8 leading edge scan
chains. In this case we were able to stitch all 18 scan chains of ip1 into 1 long scan
chain in the IC.
5. The cost for scan chain balancing:
a. Internal_scan: 3 scan chains out of 104, with {830, 830, 2578} flip flops. The
average scan chain length in this test mode is 4800 flip flops.
b. ScanCompression_mode: 12 scan chains out of 1248, with {414, 414, 414, 414,
353, 317, 318, 318, 318, 318, 318, 318} flip flops. The average scan chain length in
this test mode is 400 flip flops.
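
One way to read the balancing cost in step 5 is as the gap between each dedicated IP chain and the average chain length of the mode. The short Python calculation below does this with the figures quoted above; treating the quoted average as the reference point is an interpretation, not something the flow reports.

```python
def deltas(chain_lengths, average_length):
    """Difference between each dedicated chain and the average chain length."""
    return [n - average_length for n in chain_lengths]


internal_scan = [830, 830, 2578]                      # average quoted: 4800
compression = [414, 414, 414, 414, 353, 317,
               318, 318, 318, 318, 318, 318]          # average quoted: 400

print(deltas(internal_scan, 4800))                    # [-3970, -3970, -2222]
comp = deltas(compression, 400)
print(max(comp), min(comp))                           # 14 -83
```

In Internal_scan the three dedicated chains are much shorter than the average, so their tester channels are underused. In ScanCompression_mode the dedicated chains sit close to the average, with the four ip0 chains 14 flip flops above it.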

Implementation in dft-compiler:
#Internal_scan
set_scan_path chain65 -view spec -scan_data_in [get_ports gpioXX] \
    -scan_data_out [get_ports gpioYY] -test_mode Internal_scan \
    -ordered_elements [list u_ip0/PHY_SC0 u_ip0/PHY_SC1] -complete true
set_scan_path chain66 -view spec -scan_data_in [get_ports gpioKK] \
    -scan_data_out [get_ports gpioLL] -test_mode Internal_scan \
    -ordered_elements [list u_ip0/PHY_SC2 u_ip0/PHY_SC3] -complete true
set_scan_path chain94 -view spec -scan_data_in [get_ports gpioZZ] \
    -scan_data_out [get_ports gpioTT] -test_mode Internal_scan \
    -ordered_elements [list u_ip1_shell/inst_2_chain8 u_ip1_shell/inst_1_chain8 \
        u_ip1_shell/inst_2_chain0 u_ip1_shell/inst_2_chain1 \
        u_ip1_shell/inst_2_chain2 u_ip1_shell/inst_2_chain3 \
        u_ip1_shell/inst_2_chain4 u_ip1_shell/inst_2_chain5 \
        u_ip1_shell/inst_2_chain6 u_ip1_shell/inst_2_chain7 \
        u_ip1_shell/inst_1_chain0 u_ip1_shell/inst_1_chain1 \
        u_ip1_shell/inst_1_chain2 u_ip1_shell/inst_1_chain3 \
        u_ip1_shell/inst_1_chain4 u_ip1_shell/inst_1_chain5 \
        u_ip1_shell/inst_1_chain6 u_ip1_shell/inst_1_chain7] -complete true
#ScanCompression_mode
set_scan_path 10 -view spec -ordered_elements [list u_ip0/PHY_SC0] -complete true \
    -test_mode ScanCompression_mode

set_scan_path 11 -view spec -ordered_elements [list u_ip0/PHY_SC1] -complete true \
    -test_mode ScanCompression_mode
set_scan_path 12 -view spec -ordered_elements [list u_ip0/PHY_SC2] -complete true \
    -test_mode ScanCompression_mode
set_scan_path 13 -view spec -ordered_elements [list u_ip0/PHY_SC3] -complete true \
    -test_mode ScanCompression_mode
set_scan_path 14 -view spec -ordered_elements [list u_ip1_shell/inst_2_chain8 \
    u_ip1_shell/inst_1_chain8 u_ip1_shell/inst_1_chain0 u_ip1_shell/inst_1_chain1] \
    -complete true -test_mode ScanCompression_mode
set_scan_path 15 -view spec -ordered_elements [list u_ip1_shell/inst_2_chain0 \
    u_ip1_shell/inst_2_chain1] -complete true -test_mode ScanCompression_mode
set_scan_path 16 -view spec -ordered_elements [list u_ip1_shell/inst_1_chain2 \
    u_ip1_shell/inst_1_chain3] -complete true -test_mode ScanCompression_mode
set_scan_path 17 -view spec -ordered_elements [list u_ip1_shell/inst_1_chain4 \
    u_ip1_shell/inst_1_chain5] -complete true -test_mode ScanCompression_mode
set_scan_path 18 -view spec -ordered_elements [list u_ip1_shell/inst_1_chain6 \
    u_ip1_shell/inst_1_chain7] -complete true -test_mode ScanCompression_mode
set_scan_path 19 -view spec -ordered_elements [list u_ip1_shell/inst_2_chain2 \
    u_ip1_shell/inst_2_chain3] -complete true -test_mode ScanCompression_mode
set_scan_path 20 -view spec -ordered_elements [list u_ip1_shell/inst_2_chain4 \
    u_ip1_shell/inst_2_chain5] -complete true -test_mode ScanCompression_mode
set_scan_path 21 -view spec -ordered_elements [list u_ip1_shell/inst_2_chain6 \
    u_ip1_shell/inst_2_chain7] -complete true -test_mode ScanCompression_mode


6. Targeting for high coverage while enforced to work with many 3rd
party cores

It is common, and advised, to modify the RTL code in order to make it DFT friendly. Most
design modifications are made to allow controllability of clock and reset signals.
RTL code that is imported from a 3rd party vendor is integrated "as is" in the IC, and cannot be
modified. This restricts the DFT engineer from improving the design's testability.
Even though dft-compiler has supported "autofix" commands for several years, many engineers
choose not to use them, because they modify the design and introduce a risk of losing control
over the design's architecture.
Since we had many 3rd party cores in our design, and some did not meet our testability
criteria, we had to find a way to improve testability without making any direct RTL modifications.
The autofix commands proved to be very efficient for our needs, which were mainly focused
on controlling clock signals and making as many flip flops as possible scannable.
One of the cases we had to solve was a minority group of flip flops with synchronous reset in
an asynchronous reset design. This group of hundreds of flip flops was disturbed during
shift because their synchronous reset was uncontrollable (it was generated by a scan element
and passed through logic), and thus they were not able to function as scan elements (see Figure 5).

Figure 5: Flip flop with uncontrollable synchronous reset


We used autofix commands to bypass the problem with the scan_enable signal. Figure 6
illustrates the usage of the scan_enable signal:

Figure 6: Flip flop with synchronous reset – fixed with scan_enable

Implementation in dft-compiler:
#Fix sync reset
set_dft_configuration -fix_reset enable
set_dft_signal -view existing -type ScanEnable -port scan_en
set_dft_signal -view spec -type ScanEnable -port scan_en
#fix sync reset
set_autofix_configuration -type reset -control_signal scan_en -method gate


7. Lessons learned

Some of the interesting lessons we've learned:


1. The most important lesson in an OCC based flow: OCC planning is an integral part of the
clock scheme planning. Treating it that way avoids complications and patches in the RTL.
2. No dividing unit should be allowed to function in the fanout cone of the OCC output. If
such a clock division unit exists, bypassing it with a mux will introduce a capture procedure
that operates at a higher frequency than the synthesized frequency, and thus lead to failure.
3. We have come to the conclusion that it is best to use 4 clock enable bits for the
OCC. 4 bits allow up to 4 edge triggered PLL clock pulses during capture. This improves @
speed robustness, allows fast sequential detection, and allows checking functional multicycle
paths (this is achieved by editing the STIL file: modifying the 1111 shift vector on the
clk_enable chain to a 1010 vector). This technique can solve the problem of lesson 2 when the
dividing unit is "/2".
4. Simulation:
Simulation at early stages of implementation proved to be very efficient and important.
Since TetraMAX treats the fast_clk driver of the OCC as a black-box, the only way to verify
that a valid PLL clock passes through the OCC is by simulation, especially in our case, where
propagation from the PLLs to the OCCs passed through several levels of logic.
5. STA environment: creating an STA environment for such a DFT architecture can be a
complicated task. Since functional clocks are involved, the STA @ speed environment
should mix both functional constraints (mainly exceptions) and test mode related
constraints. This must be reflected in the project's plan, and not postponed to the last stages
of STA closure.
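
The clk_enable vector edit mentioned in lesson 3 can be illustrated with a few lines of Python. The sketch assumes that each bit of the vector enables one consecutive PLL cycle during capture, read left to right; the actual bit ordering on the clock chain depends on the OCC implementation and on the pattern file.

```python
def pulse_cycles(clk_enable_bits: str):
    """PLL cycle indices in which a capture pulse is released."""
    return [i for i, bit in enumerate(clk_enable_bits) if bit == "1"]


def pulse_spacing(clk_enable_bits: str):
    """Distance, in PLL periods, between consecutive released pulses."""
    cycles = pulse_cycles(clk_enable_bits)
    return [later - earlier for earlier, later in zip(cycles, cycles[1:])]


print(pulse_cycles("1111"), pulse_spacing("1111"))  # [0, 1, 2, 3] [1, 1, 1]
print(pulse_cycles("1010"), pulse_spacing("1010"))  # [0, 2] [2]
```

With the 1010 vector the launch and capture pulses are two PLL periods apart, which corresponds to the period of a "/2" divided clock.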


8. References

[1] SolvNet article:
https://solvnet.synopsys.com/retrieve/022826.html?otSearchResultSrc=advSearch&otSearchResultNumber=7&otPageNum=1
