PSV Error Detection
ECE 7502 S2015
[Figure: design and test flow. Labels: Customer, Requirements, Specification, Architecture, Logic / Circuits, Physical Design, Fabrication, Manufacturing Test, Packaging Test, System Test; PCB Architecture, PCB Circuits, PCB Physical Design, PCB Fabrication, PCB Test; Validate, Verify, Test; Design and Test Development; Post Silicon Verification]
Post-Silicon Verification
AFTER fabrication, make sure you built it right
Find BUGS, not DEFECTS
Identify the cause of a bug and determine a fix
Steps:
Detect problem
Localize problem (hardest part?)
Find cause (Scan helps with this)
Fix / bypass (survivability)
Post-Silicon Verification
Challenges: complex chips, short schedules, complicated designs, diverse techniques
Pros: at speed (orders of magnitude faster than simulation); real system (no model error); real context
Cons: less controllability and observability; costly equipment and techniques (e.g., BIST)
Approaches
Design in features
Better pre-Si verification; emulation; esp. IO and
mixed signal; CANNOT SEPARATE PRE- / POST-SI
Build tools for post-Si verification; EDA is key
The new EDA challenge??
Formal (standardized?) interfaces
Formal coverage methods; assertions
SW: e.g. trace analysis, QED
Codesign verification/test with survivability
Instruction Footprint Recording (HW or SW)
Error resilience
Challenges for Post-Si Verification
Long error detection latency (i.e., the delay between error occurrence and error detection): need faster solutions
HW solutions require a priori design; SW solutions can be retrofitted
Low bug coverage: need to define it and increase it
Failure reproduction
QED observations
Some bugs arise from multiple instructions in
processor
Some bugs arise across multiple instructions
outside processor, in uncore
Bugs affected by random events: electrical
activity, asynchronous triggers, etc.
Augmenting code for validation can obscure the
bugs (intrusiveness)
Conventional methods can take billions of cycles to identify bug events
Example:
Accesses to memory locations A and B end up creating an error in cached C
Self-checking A, B doesn’t find it
Long latency to find it
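The scenario above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the memory model, the bit-flip side effect, and the names `run_workload`, `bug_at`, and `use_c_at` are all hypothetical and do not come from [1].

```python
def run_workload(num_ops, bug_at, use_c_at):
    """Toy model: a bug at step `bug_at` corrupts C while A and B stay correct."""
    mem = {"A": 0, "B": 0, "C": 100}
    golden_c = 100  # the program never writes C, so this is its expected value
    self_check_failures = 0
    for step in range(num_ops):
        # Normal program work: write A and B.
        mem["A"] = step
        mem["B"] = 2 * step
        if step == bug_at:
            mem["C"] ^= 0x1  # hypothetical bug side effect: flips one bit of C
        # Self-check covers only A and B, so it never sees the corrupted C.
        if mem["A"] != step or mem["B"] != 2 * step:
            self_check_failures += 1
        # C is consumed much later; only then does the error surface.
        if step == use_c_at and mem["C"] != golden_c:
            return {"detected_at": step,
                    "latency": step - bug_at,
                    "self_check_failures": self_check_failures}
    return {"detected_at": None, "latency": None,
            "self_check_failures": self_check_failures}


result = run_workload(num_ops=100_000, bug_at=10, use_c_at=90_000)
print(result)
```

In this sketch the self-checks on A and B never fail, and the detection latency equals the distance between the bug and the first use of C.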
QED principles / techniques (2)
PLC:
Why? Find bugs in uncore
How? Loads/consistency checks on variables from all threads
Principle?
Tradeoff: number of instructions between checks; number of variables checked
CFCSS-V / CFTSS-V:
Why? Find bugs in control flow
How? Confirm flow of instruction blocks matches intent
Principle?
Tradeoff: number of instructions between checks
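The PLC tradeoff above can be illustrated with a small sketch. It assumes each variable has a duplicated shadow copy to compare against, and it uses a single-threaded toy loop instead of real threads. The function `run_with_plc` and its parameters are hypothetical names for illustration, not the instrumentation used in [1].

```python
def run_with_plc(num_ops, bug_at, check_interval):
    """Return detection latency (in operations) for a given check interval."""
    # Assumption: every variable has a primary value and a shadow copy.
    primary = {"A": 0, "B": 0, "C": 100}
    shadow = dict(primary)
    for step in range(num_ops):
        # Normal work updates both copies of A and B.
        primary["A"] = shadow["A"] = step
        primary["B"] = shadow["B"] = 2 * step
        if step == bug_at:
            primary["C"] ^= 0x1  # hypothetical bug: corrupts only the primary C
        # Proactive check: every `check_interval` operations, load every
        # variable and compare it with its shadow copy.
        if (step + 1) % check_interval == 0:
            if any(primary[k] != shadow[k] for k in primary):
                return step - bug_at
    return None  # bug never detected within num_ops


latencies = {n: run_with_plc(10_000, bug_at=1234, check_interval=n)
             for n in (10, 100, 1000)}
print(latencies)
```

Shorter intervals bound the latency more tightly but insert more checks, which adds runtime and makes the instrumentation more intrusive.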
CFCSS from [2]
“Map” flow of code blocks; generate signatures
for each block; store those signatures and check
at runtime
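A simplified sketch of signature-based control-flow checking, in the spirit of [2], is shown below. It replays a block trace in software instead of instrumenting real code. The CFG, the signature values, and the names `instrument` and `first_violation` are made up for illustration. The sketch also assumes each block has at most one kind of fan-in successor, so it ignores the aliasing cases a full implementation has to handle.

```python
def instrument(cfg, sig):
    """Compile-time step: derive per-block signature differences."""
    preds = {b: [] for b in cfg}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds[dst].append(src)
    # d[j] = signature of j's base (first) predecessor XOR signature of j.
    d = {b: sig[ps[0]] ^ sig[b] for b, ps in preds.items() if ps}
    # adjust[b]: value block b stores in D so that a fan-in successor
    # (a block with several predecessors) can correct G at run time.
    adjust = {}
    for blk, succs in cfg.items():
        fan_in = [s for s in succs if len(preds[s]) > 1]
        adjust[blk] = sig[blk] ^ sig[preds[fan_in[0]][0]] if fan_in else 0
    return preds, d, adjust


def first_violation(trace, cfg, sig):
    """Run-time step: replay a block trace, return the first failing block."""
    preds, d, adjust = instrument(cfg, sig)
    G = sig[trace[0]]      # run-time signature register
    D = adjust[trace[0]]   # run-time adjusting signature
    for blk in trace[1:]:
        G ^= d.get(blk, 0)
        if len(preds[blk]) > 1:
            G ^= D
        if G != sig[blk]:
            return blk     # signature mismatch: illegal control flow
        D = adjust[blk]
    return None


# Hypothetical CFG: entry branches to B1 or B2, both join at B3, then B4.
cfg = {"entry": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": ["B4"], "B4": []}
sig = {"entry": 0b0001, "B1": 0b0010, "B2": 0b0011, "B3": 0b0100, "B4": 0b0101}

print(first_violation(["entry", "B2", "B3", "B4"], cfg, sig))  # legal path
print(first_violation(["entry", "B1", "B4"], cfg, sig))        # skips B3
```

The legal trace passes every check, while the trace that jumps from B1 directly to B4 fails the signature comparison on entry to B4.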
QED in action (2)
Simulations on a multicore with 80 bug classes and 1368 logic bug scenarios
QED catches bugs way earlier!
Runtime is way longer (Table IV)
by 32000X
Modeling Approach
Simulate “all” effects, generate characteristic curves, simplify curves (e.g. to piecewise linear, PWL), create bounds, trim stored points
Principles: SIMPLIFY
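One way to picture "simplify to PWL, create bounds, trim stored points" is the greedy trimming sketch below. The slides do not say how [3] simplifies its curves, so this is a swapped-in illustration. The sample curve is made-up data, and `interp` and `trim_to_pwl` are hypothetical helper names.

```python
def interp(points, x):
    """Linear interpolation on sorted (x, y) points; x must lie in range."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside curve range")


def trim_to_pwl(points, tol):
    """Greedily drop samples while the PWL curve stays within `tol`
    of every original sample (greedy, not guaranteed minimal)."""
    kept = [points[0]]
    start = 0
    while start < len(points) - 1:
        end = start + 1
        # Extend the current segment as far as the tolerance allows.
        while end + 1 < len(points):
            seg = [points[start], points[end + 1]]
            span = points[start:end + 2]
            if all(abs(interp(seg, x) - y) <= tol for x, y in span):
                end += 1
            else:
                break
        kept.append(points[end])
        start = end
    return kept


# Made-up characteristic curve: y = x^2 sampled at integer x.
samples = [(x, x * x) for x in range(11)]
pwl = trim_to_pwl(samples, tol=1.5)
max_err = max(abs(interp(pwl, x) - y) for x, y in samples)
print(len(samples), "->", len(pwl), "points, max error", max_err)
```

The tolerance plays the role of the bound: fewer stored points are traded against a known worst-case deviation from the simulated curve.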
Discussion questions
1. How does the concept of fault coverage relate to
the QED techniques?
2. For each of EDDI-V, PLC, CFxSS-V, what underlying
principles are at work? What are alternative ways
to apply those principles?
3. How does SoC testing differ from testing a
monolithic circuit?
4. In [1], Section V.A, how does the new test
determine deadlock if no additional instructions
are run beyond deadlock?
5. Writing: how could the order of the paper be
changed to improve the paper?
Bonus Discussion Questions
Are there HW equivalents to QED methods?
Papers
[1] Lin, D.; Hong, T.; Li, Y.; Eswaran, S.; Kumar, S.; Fallah, F.; Hakim, N.; Gardner, D.S.; Mitra, S., "Effective Post-Silicon Validation of System-on-Chips Using Quick Error Detection," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 10, pp. 1573-1590, Oct. 2014.
[2] Oh, N.; Shirvani, P.P.; McCluskey, E.J., "Control-flow checking by software signatures," IEEE Transactions on Reliability, vol. 51, no. 1, pp. 111-122, Mar. 2002.
[3] Das, P.; Gupta, S.K., "Gate delay modeling for pre- and post-silicon timing related tasks for ultra-low power CMOS circuits," 2013 IEEE 31st International Conference on Computer Design (ICCD), pp. 227-234, 6-9 Oct. 2013.
[4] Keshava, J.; Hakim, N.; Prudvi, C., "Post-silicon validation challenges: How EDA and academia can help," 2010 47th ACM/IEEE Design Automation Conference (DAC), pp. 3-7, 13-18 June 2010.
[5] Mitra, S.; Seshia, S.A.; Nicolici, N., "Post-silicon validation opportunities, challenges and recent advances," 2010 47th ACM/IEEE Design Automation Conference (DAC), pp. 12-17, 13-18 June 2010.
Paper Map
[1] Lin, D.; …"Effective Post-Silicon Validation of …," ICASICS’14.
[2] Oh, N.; …"Control-flow checking by software …," ITR’02.
[3] Das, P.; …"Gate delay modeling for pre- and …," ICCD’13.
[4] Keshava, J.; … "Post-silicon validation challenges: …," DAC’10.
[5] Mitra, S.; … "Post-silicon validation …," DAC’10.