
PSV

Here are the key terms defined in the document: - Post-Silicon Verification: Testing done after fabrication to ensure the design was implemented correctly and identify bugs. - Bugs vs Defects: Bugs are issues with the design logic, defects are physical flaws introduced during fabrication. - QED: Quick Error Detection techniques that aim to reduce the latency between an error occurring and being detected to find bugs faster.

Post-Silicon Verification using Quick Error Detection

ECE 7502 Class Discussion


Ben Calhoun
Thursday January 22, 2015

[Figure: chip and PCB design flow. Chip stages: Customer Requirements, Specification, Architecture, Logic / Circuits, Physical Design, Fabrication, Manufacturing Test, Packaging Test, PCB Test, System Test. Parallel PCB track: PCB Architecture, PCB Circuits, PCB Physical Design, PCB Fabrication. Labeled activities: Validate, Verify, Test, Design and Test Development, Post-Silicon Verification.]

Post-Silicon Verification
• AFTER fabrication, make sure you built it right
• Find BUGS, not DEFECTS
• Identify problem of bug and determine a fix
• Test in context, prevent bugs from going to field
• Issues often from design interacting with electrical conditions
• Steps:
  - Detect problem
  - Localize problem (hardest part?)
  - Find cause (Scan helps with this)
  - Fix / bypass (survivability)
• NB: ambiguity w/ verification vs validation

Post-Silicon Verification
• Challenges: complex chips, short schedules, complicated designs, diverse techniques
• Pros: at speed (orders of magnitude faster than pre-Si simulation); real system (no model error); real context
• Cons: less controllability, observability; costly equipment, techniques (e.g., BIST)
• NB: ambiguity w/ verification vs validation

Approaches
• Design in features
• Better pre-Si verification; emulation; esp. IO and mixed signal; CANNOT SEPARATE PRE- / POST-SI
• Build tools for post-Si verification; EDA is key
• The new EDA challenge??
• Formal (standardized?) interfaces
• Formal coverage methods; assertions
• SW: e.g. trace analysis, QED
• Codesign verification/test with survivability
• Instruction Footprint Recording (HW or SW)
• Error resilience

Challenges for Post-Si Verification
• Long error detection latency (i.e., delay between error occurrence and error detection) → need faster solutions
• HW solutions require a priori design → SW solutions can retrofit
• Low bug coverage → need to define, increase
• Failure reproduction
• How do you know you’re done?

QED observations
• Some bugs arise from multiple instructions in processor
• Some bugs arise across multiple instructions outside processor, in uncore
• Bugs affected by random events: electrical activity, asynchronous triggers, etc.
• Augmenting code for validation can obscure the bugs (intrusiveness)
• Conventional methods can take billions of cycles to identify bug events

Example:
• Accesses to memory locations A and B end up creating error in cached C
• Self checking A, B doesn’t find it
• Long latency to find it

[1] Lin et al, TCADICS’14


QED principles / techniques
• Start with existing tests and transform them to improve bug detection
• Trade-off detection latency and intrusiveness
• EDDI-V:
  - Why? Find bugs in processor core
  - How? Replicate code blocks and run both copies
  - Principle?
  - Tradeoff: different lengths of instruction list
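
The EDDI-V bullets above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the transformation from [1]: the register-dictionary machine model, the names `run_eddi_v`, `block_len` and `fault_at`, and the single-bit fault are all hypothetical. It runs each block of instructions on an original and a duplicate register set and compares the two after every block.

```python
import operator

def run_eddi_v(ops, init_regs, block_len, fault_at=None):
    """Run ops in blocks on an original and a duplicate register set.

    ops       : list of (dst, fn, src_a, src_b) register operations
    block_len : instructions per block between checks (the tradeoff knob)
    fault_at  : index of the op whose original result gets one bit flipped
                (hypothetical stand-in for a bug; None means no fault)
    Returns the index of the last instruction executed before the first
    failing check, or None if every check passes.
    """
    main, shadow = dict(init_regs), dict(init_regs)
    for start in range(0, len(ops), block_len):
        block = ops[start:start + block_len]
        # Original copy of the block (the fault is injected here).
        for offset, (dst, fn, a, b) in enumerate(block):
            main[dst] = fn(main[a], main[b])
            if start + offset == fault_at:
                main[dst] ^= 1
        # Duplicate copy of the same block on the shadow registers.
        for dst, fn, a, b in block:
            shadow[dst] = fn(shadow[a], shadow[b])
        # Check inserted after each block: both copies must agree.
        if main != shadow:
            return start + len(block) - 1
    return None

ops = [("r0", operator.add, "r0", "r1")] * 8
regs = {"r0": 0, "r1": 3}
short_blocks = run_eddi_v(ops, regs, block_len=2, fault_at=2)
long_blocks = run_eddi_v(ops, regs, block_len=8, fault_at=2)
```

With the fault at instruction 2, two-instruction blocks flag it at the check after instruction 3, while one eight-instruction block flags it only after instruction 7. Shorter blocks cut detection latency but insert more checks, which is the intrusiveness side of the tradeoff.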

QED principles / techniques (2)
• PLC:
  - Why? Find bugs in uncore
  - How? Loads/consistency checks on variables from all threads
  - Principle?
  - Tradeoff: different lengths of instructions between checks; different numbers of variables checked
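
A rough Python sketch of the PLC idea, reusing the A/B/C scenario from the Example slide. Everything here is an assumption for illustration: the slides do not say what each variable is checked against, so the sketch assumes every variable has a duplicate copy (`name + "_dup"`), and the bug in `buggy_store`, the single-threaded memory model, and the function names are hypothetical.

```python
def buggy_store(memory, name, value, prev_name):
    """Store value to a variable and to its duplicate copy.

    Hypothetical uncore bug: a store to B right after a store to A
    silently corrupts the stored copy of C.
    """
    memory[name] = value
    memory[name + "_dup"] = value
    if prev_name == "A" and name == "B":
        memory["C"] ^= 0xFF

def plc_check(memory, variables):
    """Load each variable and its duplicate; return the names that disagree."""
    return [v for v in variables if memory[v] != memory[v + "_dup"]]

def run_with_plc(stores, checked_vars, check_every):
    """Return (step, mismatching variables) at the first failing check, else None."""
    memory = {v + suffix: 0 for v in "ABC" for suffix in ("", "_dup")}
    prev = None
    for step, (name, value) in enumerate(stores, start=1):
        buggy_store(memory, name, value, prev)
        prev = name
        # Proactive check: every check_every stores, load and compare.
        if step % check_every == 0:
            bad = plc_check(memory, checked_vars)
            if bad:
                return step, bad
    return None

stores = [("A", 1), ("B", 2), ("A", 3), ("A", 4)]
```

Checking only A and B never fires, which mirrors "Self checking A, B doesn’t find it". Adding C to the checked set catches the corruption, and `check_every` sets how many instructions run between checks.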

• CFCSS-V / CFTSS-V:
  - Why? Find bugs in control flow
  - How? Confirm flow of instruction blocks matches intent
  - Principle?
  - Tradeoff: different lengths of instructions between checks

CFCSS from [2]
• “Map” flow of code blocks; generate signatures for each block; store those signatures and check at runtime
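
The signature idea can be sketched as follows. This is a simplified model in the spirit of [2], written from my understanding of the XOR signature-difference scheme and not a reproduction of the paper's algorithm. The block names, signature values, and the helpers `instrument` and `first_violation` are hypothetical.

```python
def instrument(preds, sig):
    """Precompute per-block values from the control-flow graph.

    preds : block -> list of legal predecessor blocks
    sig   : block -> unique compile-time signature (int)
    """
    # d[b] turns the first predecessor's signature into b's signature.
    d = {b: (sig[p[0]] ^ sig[b]) if p else 0 for b, p in preds.items()}
    fan_in = {b for b, p in preds.items() if len(p) > 1}
    # Each predecessor of a fan-in block sets an adjusting value that
    # maps its own signature onto the first predecessor's signature.
    adjust = {b: 0 for b in sig}
    for b in fan_in:
        for p in preds[b]:
            adjust[p] = sig[p] ^ sig[preds[b][0]]
    return d, fan_in, adjust

def first_violation(path, preds, sig):
    """Replay an executed block sequence; return the first block whose
    runtime signature check fails, or None if the path is legal."""
    d, fan_in, adjust = instrument(preds, sig)
    g = sig[path[0]]            # runtime signature register
    d_reg = adjust[path[0]]     # runtime adjusting register
    for block in path[1:]:
        g ^= d[block]
        if block in fan_in:
            g ^= d_reg
        if g != sig[block]:     # check inserted at block entry
            return block
        d_reg = adjust[block]
    return None

PREDS = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
SIG = {"B0": 0b0001, "B1": 0b0010, "B2": 0b0011, "B3": 0b0100}
```

Both legal paths through the diamond-shaped graph pass every check, while an illegal jump from B0 straight to B3 leaves the runtime signature different from B3's stored signature and is flagged at B3.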

[2] Oh et al, ITR’02


QED in action
• Multicore with bug: deadlock (no execution)
• Before: 10 s watchdog timer: ~15B cycles
• Is this a fair base case?
• After: locate code causing bug after ~9-14 cycles
• How was it located? Deadlock stops function…
• “measured” intrusiveness with EDDI-V
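
As a quick sanity check on the numbers above, the short Python calculation below works out what they imply. The clock frequency is inferred from the two figures on the slide (it is not stated there), and the function names are hypothetical.

```python
def implied_clock_hz(cycles, seconds):
    """Clock rate implied by a cycle count over a wall-clock interval."""
    return cycles / seconds

def latency_reduction(before_cycles, after_cycles):
    """Factor by which detection latency shrinks."""
    return before_cycles / after_cycles

# ~15B cycles in a 10 s watchdog window implies roughly a 1.5 GHz clock.
clock = implied_clock_hz(15e9, 10)

# Going from ~15B cycles to 9-14 cycles is a reduction of about 1e9x.
best = latency_reduction(15e9, 9)
worst = latency_reduction(15e9, 14)
```

The reduction is about nine orders of magnitude, although the slide's own question about whether a watchdog timeout is a fair baseline still applies.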

QED in action (2)
• Sims on multicore with 80 bug classes, 1368 logic bug scenarios
• QED catches bugs way earlier!
• Runtime is way longer (Table IV) by 32000X
• Detect ALL bugs from original tests
• Detect up to 2X MORE bugs than original tests
• Intel HW
  - Similar results, 2X slower tests
• Orthogonal to other techniques!

[1] Lin et al, TCADICS’14
[3] Delay modeling
• Model captures delay bounds; used for timing closure in design; pre-Si verification
• Delay testing: measuring delays on paths in Si
• Post-Si testing intimately tied to pre-Si models: identify paths, generate vectors, analyze vectors
• [3]: Problem: near / sub VT delay variation, poorly modeled. Multiple input switching (MIS) effect of 30-40% is ignored.

Modeling Approach
• Simulate “all” effects, generate characteristic curves, simplify curves (e.g. to PWL), create bounds, trim stored points
• Principles: SIMPLIFY
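
The "simplify to PWL, create bounds, trim stored points" steps can be illustrated with a generic greedy trimming routine. This is not the method of [3]. The sample curve, the tolerance, and the function names are assumptions chosen only to show the principle.

```python
def interp(points, x):
    """Linear interpolation on sorted (x, y) breakpoints."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside curve range")

def max_error(points, samples):
    """Largest deviation of the PWL curve from the simulated samples."""
    return max(abs(interp(points, x) - y) for x, y in samples)

def trim_to_pwl(samples, tol):
    """Greedily drop interior points while the PWL curve stays within
    tol of every simulated sample. Endpoints are always kept."""
    kept = list(samples)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(kept) - 1):
            trial = kept[:i] + kept[i + 1:]
            if max_error(trial, samples) <= tol:
                kept = trial
                changed = True
                break
    return kept

def bounds(points, samples, x):
    """Lower/upper bound at x: the PWL value +/- its worst-case error."""
    err = max_error(points, samples)
    mid = interp(points, x)
    return mid - err, mid + err

# Hypothetical characteristic curve: 11 simulated points of y = x^2.
samples = [(float(x), float(x * x)) for x in range(11)]
pwl = trim_to_pwl(samples, tol=2.0)
```

Only the retained breakpoints and one error value need to be stored, yet the resulting band still contains every simulated point. That is the sense in which the model is simplified without losing its bounds.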

[3] Das et al, ICCD’13


Conclusion
• Post-Si verification is critical but tricky
• Ad hoc approach can work, but very costly
• Make use of solid verification principles to get best results
• QED techniques are effective for multicore SoCs, relatively easy to implement in code

Discussion questions
1. How does the concept of fault coverage relate to the QED techniques?
2. For each of EDDI-V, PLC, CFxSS-V, what underlying principles are at work? What are alternative ways to apply those principles?
3. How does SoC testing differ from testing a monolithic circuit?
4. In [1], Section V.A, how does the new test determine deadlock if no additional instructions are run beyond deadlock?
5. Writing: how could the order of the paper be changed to improve the paper?

Bonus Discussion Questions
• Are there HW equivalents to QED methods?
• Were the results for QED convincing?

Papers
• [1] Lin, D.; Hong, T.; Li, Y.; Eswaran, S.; Kumar, S.; Fallah, F.; Hakim, N.; Gardner, D.S.; Mitra, S., "Effective Post-Silicon Validation of System-on-Chips Using Quick Error Detection," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 10, pp. 1573-1590, Oct. 2014.
• [2] Oh, N.; Shirvani, P.P.; McCluskey, E.J., "Control-flow checking by software signatures," IEEE Transactions on Reliability, vol. 51, no. 1, pp. 111-122, Mar. 2002.
• [3] Das, P.; Gupta, S.K., "Gate delay modeling for pre- and post-silicon timing related tasks for ultra-low power CMOS circuits," 2013 IEEE 31st International Conference on Computer Design (ICCD), pp. 227-234, 6-9 Oct. 2013.
• [4] Keshava, J.; Hakim, N.; Prudvi, C., "Post-silicon validation challenges: How EDA and academia can help," 2010 47th ACM/IEEE Design Automation Conference (DAC), pp. 3-7, 13-18 June 2010.
• [5] Mitra, S.; Seshia, S.A.; Nicolici, N., "Post-silicon validation opportunities, challenges and recent advances," 2010 47th ACM/IEEE Design Automation Conference (DAC), pp. 12-17, 13-18 June 2010.

Paper Map
• [1] Lin, D.; …"Effective Post-Silicon Validation of …," TCADICS’14.
• [2] Oh, N.; …"Control-flow checking by software …," ITR’02.
• [3] Das, P.; …"Gate delay modeling for pre- and …," ICCD’13.
• [4] Keshava, J.; …"Post-silicon validation challenges: …," DAC’10.
• [5] Mitra, S.; …"Post-silicon validation …," DAC’10.

One approach: SW method
• [1] summary work on QED (2 prior conf papers)
• [1] builds on [2] for 1 technique
• [2] is 1st work on control flow checking

Alternative approach: modeling method
• [3] 1st work on alternative post-Si method

[4] and [5] are broad, foundational reviews of the post-Si verification topic area
Glossary
• Blocking bug: prevents testing/discovery of further issues
• Electrical bugs: from electrical state; subtle
• Intrusiveness: test changes design so as to obscure/prevent the original bug
• Logic bugs: from design errors
• Survivability features: ways to fix bugs post-fab; chicken switches, µcode updates, fuses, etc.
• Uncore: anything on the chip that is not a processor core

