
Reliability testing of VCSELs, Transceivers and ASICs.

History, status and plans

Opto Mini-Workshop, CERN 21/3/14

Tony Weidberg Opto mini workshop March '14 1


Outline
• VCSEL failures in ATLAS
– Reminder TL failures
– Controlled experiments to determine cause of damage
– Outstanding mysteries
• TL and AOC VCSELs
• Plans for future reliability testing
– VCSEL
– Transceiver
– ASICs



Failure Rates in ATLAS Operation
[Figure: CERN field failure data, lognormal probability plot of cumulative percentage failed (%) vs. time in service (hours), with 1-year and 2-year markers. Curves: original unprotected parts (600 failures); current dielectric-encapsulated parts (20 failures); anticipated failure rate for dielectric-encapsulated VCSELs.]


STEM Failed Channel
TL VCSEL array after FIB cut

[Figure: STEM image. Labels: DBR, oxide, MQW. Annotation: defects at edge of oxide → DBR → active MQW region.]

Analysis by EAG

More Controlled Tests
• Aged VCSEL array in 70 °C/85% RH with regular power measurements and EL imaging.
• Stopped as soon as significant decrease in power detected.
• EL image shows 4% of area is dark.
• Subsequent TEM analysis (next slides).


Plan View TEM
• Dislocations in dark region from EL
– Two dislocations emanating from tip of Oxide.

[Figure: plan-view TEM images, including a zoomed view.]

X-Section TEM

• X-section views
– after thinning to ~1.8 µm (“thick”).
– after further thinning to ~0.8 µm. This allows tracing of defects.


Tracing Defects
• Line dislocations starting from oxide tip (crack?).
• Traveled down from oxide aperture → active region below, and started the DLD network.
• Note lines travel up before looping down (follow current wind).
[Figure: top view (PV) and side views of the thin (0.84 µm) and thick (1.84 µm) sections. Labels: delamination, oxide aperture, p-DBR, active region, n-DBR. Scale bar 0.5 µm.]


Remaining Mysteries -1
• Compare lifetime data from TL VCSELs in ATLAS USA-15 with accelerated ageing tests (ULM).
– MTTF in USA-15 is lower than predicted by model fitting ULM data by a factor of 4 to 6.
– Null hypothesis that ULM and USA-15 data are described by common parameters for the acceleration model is excluded at 90%.
• Compare controlled experiment in SR1 with USA-15.
– 4 TL arrays operated in SR1 for more than 500 days.
– Only 1 channel died.
– Inconsistent with observed MTTF in USA-15: null hypothesis of same MTTF in SR1 as USA-15 gives p-value 8.3 × 10⁻⁶.
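
The SR1 versus USA-15 comparison is a counting test: how likely is it to see so few failures if the channels really had the USA-15 MTTF? The sketch below shows one way such a p-value could be computed. It assumes independent channels with exponential lifetimes and 12 channels per array, and the MTTF value is a placeholder, so it is not expected to reproduce the quoted 8.3 × 10⁻⁶.

```python
from math import comb, exp

def prob_at_most(k_obs, n_channels, t_days, mttf_days):
    """P(K <= k_obs) failures among n_channels after t_days.

    Assumes independent channels with exponential lifetimes (constant
    failure rate), which is a simplification of real VCSEL wear-out.
    """
    p_fail = 1.0 - exp(-t_days / mttf_days)  # per-channel failure probability
    return sum(
        comb(n_channels, k) * p_fail**k * (1.0 - p_fail) ** (n_channels - k)
        for k in range(k_obs + 1)
    )

# Hypothetical inputs: 4 arrays x 12 channels, 500 days, placeholder MTTF.
N_CHANNELS = 4 * 12   # assumes 12 channels per array
MTTF_DAYS = 150.0     # placeholder, NOT the measured USA-15 value
p_value = prob_at_most(1, N_CHANNELS, 500.0, MTTF_DAYS)
print(f"p-value for <= 1 failure: {p_value:.3g}")
```

A small p-value means the SR1 outcome would be very unlikely under the assumed MTTF, which is the sense in which the two data sets are inconsistent.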


Remaining Mysteries - 2
• Decrease in power for AOC arrays in USA-15.
• Measure power using current in p-i-n diode on detector.
– Note we do expect significant decrease in responsivity from radiation damage.
– See similar decrease for all barrel layers → see slide
– Incompatible with radiation damage?


p-i-n Diode Radiation Damage
• Decrease in responsivity ~30% at relatively low fluence, then plateaus.
– 24 GeV protons
• Fluence seen by inner barrel ~0.06 × 10¹⁴ n cm⁻²


Will Kalderon
AOC arrays in USA-15
Current measured by p-i-n diode on detector

Layer 3 at largest radius → smallest fluence


Remaining Mysteries - 3
• Long term monitoring of optical power for AOC TXs in SR1 using LAPD (measure power from all 12 channels).
• Do not reproduce decrease of 10%/year seen in USA-15 → slides.


Temperature Correlation (Steve McMahon)
[Figure: optical power (mV) vs. T (°C) for AOC TX in Bat 161, with T correction fit.]

AOC TX in Bat 161 (Steve McMahon)
[Figure: optical power (mV) vs. time (days) for AOC TX. Days with ΔT > 1 in a day are excluded (hence missing days).]

AOC in SR1 (Steve McMahon)
[Figure: optical power (mV) vs. time (days).]

VCSEL Testing Plans
• Standard damp heat tests
– 1000 hours, 85 °C/85% RH.
– Drive current 10 mA dc.
– Measure optical power continuously.
– Aim for much higher statistics than we have done in the past → learn about infant mortality and random failure rates as well as lifetime.
– So far we have tested 2 VCSELs; would like to do 200 devices?
• Have equipment to do batches of 80 devices.

Valencia Feb 2014 Tony Weidberg 17


Transceiver Tests
• Monitor link performance while operating at elevated temperatures.
• Look for evidence of degradation using
– Eye diagrams
– BER scans


Eye Diagrams
• Use Digital Communication Analyser (DCA) to measure eye diagrams
– We are getting our DCA firmware upgraded to allow testing at a bit rate of 4.8 Gbit/s.
– Determine many parameters, e.g. horizontal and vertical eye opening, rise and fall times, noise, random and deterministic jitter.

Equipment for BER Scan
• FPGA
– Generates PRBS data
– Measures BER
• Loopback test, e.g. transceiver VTRx to receiver VTRx.
• Computer-controlled optical attenuator to allow scan of BER vs OMA. Has a 10% and 90% tap to allow for power measurement during BER scan.
• Optical switch to allow many channels to be measured.
• We are getting a copy of the CERN VL system so we can use their firmware and software.


BER System

[Block diagram: FPGA, two VTRx, optical switches, optical attenuator, optical power meter.]

Loopback tests
Optical switches allow many VTRx to be tested in an environmental chamber.

BER Scans
• Measure BER vs OMA (optical modulation amplitude).
• Define minimum OMA to achieve BER = 10⁻¹².
• Measure this during continuous operation at elevated temperature.
• Curves show example BER scans with and w/o beam.
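
Since a BER of 10⁻¹² takes a long time to measure directly, the minimum OMA is usually obtained by extrapolating a scan taken at higher BER. The sketch below shows one simple way to do this, assuming log10(BER) is roughly linear in OMA over the scanned range. The function name and the scan points are hypothetical, and a real analysis might fit a more physical curve shape.

```python
import math

def min_oma_for_ber(oma_dbm, ber, target_ber=1e-12):
    """Extrapolate the OMA at which BER reaches target_ber.

    Fits a straight line to log10(BER) vs OMA by least squares; this is a
    local approximation only, since real BER curves are not exactly linear.
    """
    ys = [math.log10(b) for b in ber]
    n = len(oma_dbm)
    x_mean = sum(oma_dbm) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in oma_dbm)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(oma_dbm, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return (math.log10(target_ber) - intercept) / slope

# Hypothetical scan points (OMA in dBm, measured BER); not real data.
scan_oma = [-20.0, -19.0, -18.0]
scan_ber = [1e-6, 1e-8, 1e-10]
print(min_oma_for_ber(scan_oma, scan_ber))
```

Tracking this extrapolated minimum OMA over time at elevated temperature gives a single number per scan in which degradation would show up as a drift.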


Chip Reliability
• What is there to worry about?
• Failure Mechanisms
• Statistical analysis PoF
• Plans for testing GBTx (similar study for ABC130).

Why worry?
• Traditionally failures in HEP not dominated by ASIC reliability
– Connectors, solder, wire bonds, cracks in tracks and vias, capacitors, power supplies
– Non-ideal scaling in DSM processes
• Aggressive designs target optimal performance
• Voltage decreases insufficient to compensate density increase → higher T → lower reliability.

FA Webinar- Cheryl Tulkoff
(slide from J. Bernstein)

ASIC Reliability
• Lifetime tests at different T (low and high) and elevated V
• Fit model parameters → extrapolate MTTF to use case (see backup slides for details).
• Start with ATLAS pixel FE-I4
• Test GBTx when large numbers available


Summary & Outlook
• “If you think safety is expensive, try having an accident”
• Plenty of painful experience in ATLAS → must perform rigorous testing before production.
• Still trying to understand VCSEL failures in ATLAS
• Plan rigorous campaign to understand reliability for phase II upgrades for ATLAS/CMS
– VCSELs
– Transceivers
– ASICs


BACKUP SLIDES



Chip Reliability
AUW: ITK Opto-electronics, Electrical Services and DCS: 14/5/13

Steve McMahon & Tony Weidberg

Physics of Failure (PoF)
• Assumption of single dominant damage mechanism can lead to wrong extrapolation of lifetimes from accelerated tests.
• PoF aims to understand different failure mechanisms
– Fit model parameters to data for each damage mechanism
– Combine results to predict reliability at operating conditions
– Health warning: competing models for some damage mechanisms can give very different extrapolations to operating conditions.

Time Dependent Dielectric Breakdown (TDDB)
• In DSM processes E fields over gate oxides ~5 MV/cm cf breakdown fields of > ~10 MV/cm.
– Gradual degradation → later failures
• Acceleration model
– Mean Time to Failure (MTTF)
– $\mathrm{MTTF} = A \times 10^{-\beta E} \exp(E_a/kT)$
– Example fits look ok but activation energy not constant? → next slide
– → can’t fit to single failure mechanism!
[Figure annotation: holes injected into oxide → stress-induced leakage currents by tunnelling → breakdown.]

TDDB Fits
• Fits to Voltage (E field) and T look ok but estimated value of Ea depends on E?
[Figures: MTTF vs E; MTTF vs 1/T. Annotation: fitted Ea not constant!]

Hot Carrier Injection (HCI)
• Non-ideal scaling → larger E fields → “hot” carriers can overcome barrier between Si and gate oxide
– Trapped charges lead to changes in VTh and gm
– Eventually lead to failure
– $t = c\,(I_{\mathrm{sub}})^{-m}$
– T dependence because at low T electron mfp longer → acquire more energy in E field → impact ionization.

HCI
• Shift Min Vcc
• Example fits to threshold shifts.
• Typical fit values
– m ~ 3
• Also need to consider T variation.

Electro-migration (EM)
• High current densities: force exerted by electrons large enough to cause diffusion of metal ions in the direction of the e flow.
– Creates voids → increases R → thermal runaway → open circuit
– Excess build-up of ions at the anode can give short circuit
• Very sensitive to material, doping, grain boundaries etc.
• EM is thermally activated, T gradients → flux divergences.
• Best model: $\mathrm{MTTF} = A\,(j_e)^{-n} \exp(E_a/kT)$
– Typical values: Ea = 0.6 eV and n ~ 2.

Other Mechanisms
• NBTI (Negative Bias Temperature Instability)
– Degradation (Vth/Gm shift) occurring due to negative biased BT (bias temperature) stress in PMOS FETs
• Stress migration
– CTE mismatch can cause stress even with no current.
• Assembly & packaging

Combining Failure Rates
• Common method is just to assume exponential distributions
– Total failure rate: $\lambda_{\mathrm{TOTAL}} = \sum_i \lambda_i$
– But we know that failure distributions aren’t exponential!
• Failure distributions better modelled by Weibull or log-normal distributions.
• Finally, we don’t actually want MTTF; we need MTT01 (1% failure) or MTT10 (10% failure).
– Need to combine distributions correctly from different failure mechanisms.
– Determine MTT0X numerically
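
One way to determine MTT0X numerically can be sketched as follows. It assumes the failure mechanisms are independent and compete, so that the survival probabilities multiply, and that each one follows a Weibull distribution. The shape and scale values are hypothetical placeholders, not fitted results.

```python
import math

def combined_cdf(t, mechanisms):
    """Fraction failed by time t for independent competing mechanisms.

    mechanisms: list of (shape m, scale lam) Weibull parameters.
    Survival probabilities multiply, so the exponents add.
    """
    exponent = sum((t / lam) ** m for m, lam in mechanisms)
    return -math.expm1(-exponent)  # 1 - exp(-exponent), accurate for small values

def time_to_fraction(frac, mechanisms):
    """Solve combined_cdf(t) = frac for t by bisection."""
    lo, hi = 0.0, min(lam for _, lam in mechanisms)
    while combined_cdf(hi, mechanisms) < frac:  # expand until bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if combined_cdf(mid, mechanisms) < frac:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical (shape, scale in hours) values for three mechanisms.
example = [(2.0, 1.0e5), (1.5, 2.0e5), (3.0, 8.0e4)]
mtt01 = time_to_fraction(0.01, example)
mtt10 = time_to_fraction(0.10, example)
print(f"MTT01 = {mtt01:.0f} h, MTT10 = {mtt10:.0f} h")
```

With all shapes set to 1 this reduces to the simple sum of exponential failure rates, which is a convenient sanity check.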
Weibull Distribution (from Wiki)
$$f(x; m, \lambda) = \frac{m}{\lambda}\left(\frac{x}{\lambda}\right)^{m-1} e^{-(x/\lambda)^{m}}$$
• Commonly used distribution in reliability theory
• m < 1 indicates that the failure rate decreases over time → significant “infant mortality”.
• m = 1 → failure rate is constant over time, i.e. random failure.
• m > 1 → failure rate increases with time. This happens if there is an "aging" process.
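
The three regimes of the shape parameter can be checked directly from the Weibull hazard (instantaneous failure rate), h(x) = (m/λ)(x/λ)^(m−1). The snippet below is a small illustration with the scale set to 1; the function name is chosen here for illustration.

```python
def weibull_hazard(x, m, lam):
    """Instantaneous failure rate h(x) = (m/lam) * (x/lam)**(m - 1)."""
    return (m / lam) * (x / lam) ** (m - 1)

# Compare the hazard early (x = 0.5) and late (x = 2.0) for three shapes.
for m in (0.5, 1.0, 2.0):
    early = weibull_hazard(0.5, m, 1.0)
    late = weibull_hazard(2.0, m, 1.0)
    if late < early:
        trend = "decreasing"
    elif late == early:
        trend = "constant"
    else:
        trend = "increasing"
    print(f"m={m}: h(0.5)={early:.3f}, h(2.0)={late:.3f} -> {trend}")
```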
Compare Distributions
• Compare exponential, Weibull and log-normal
• Note Weibull and log normal totally different from exponential for small x
– This is just the region we are interested in!
[Figure: exp(-x), Weibull and log normal distributions for x from 0 to 6.]

Example Weibull Distributions
[Figure: Weibull distributions for m=2 and m=10, x from 0 to 6.]

Measuring MTTF
• How well can we determine MTTF in an AL (Accelerated Lifetime) test? Depends on
– Sample size
– Weibull shape parameter n.
• Example fits
– Assume n=2 (pessimistic)
    Sample size   % error t_m
    10            18.9
    30            10.0
    50            8.2
– Assume n=10 (optimistic)
    Sample size   % error t_m
    10            3.8
    30            2.0
    50            1.7

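
A rough cross-check of how the error scales with sample size and shape parameter is the relative standard error of the sample mean of a Weibull distribution. This is only a sketch: it is not the fit procedure behind the numbers above, so it should give errors of similar magnitude rather than identical values.

```python
import math

def weibull_cv(shape):
    """Coefficient of variation (std/mean) of a Weibull distribution."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / g1**2 - 1.0)

def relative_error_of_mean(shape, sample_size):
    """Relative standard error of the sample mean lifetime."""
    return weibull_cv(shape) / math.sqrt(sample_size)

for shape in (2, 10):
    for size in (10, 30, 50):
        err = 100.0 * relative_error_of_mean(shape, size)
        print(f"n={shape}, sample size={size}: {err:.1f}% error on mean")
```

The error falls as one over the square root of the sample size, and a larger shape parameter (narrower distribution) reduces it further.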
Determining Model Parameters
• Brute force: run ALT for matrix of different T and V and fit data to get model parameters.
– Too many tests → too slow/expensive.
• Smarter approach
– High T/High V → TDDB
  • Vary T → Ea, vary V → exponent c
– Low T/High V → HCI
  • Vary T → Ea2, vary V → g2
– High T/low V → EM dominates
  • Vary T → Ea3
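
The "vary T → Ea" step amounts to a straight-line fit of ln(MTTF) against 1/(k_B T). The sketch below illustrates this for a single mechanism, assuming an Arrhenius dependence MTTF = A exp(Ea/k_B T). The data are synthetic, generated with Ea = 0.6 eV, and the function name is hypothetical.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K

def fit_activation_energy(temps_c, mttf_hours):
    """Least-squares fit of ln(MTTF) = ln(A) + Ea / (k_B * T); returns (Ea, A)."""
    xs = [1.0 / (K_B * (t + 273.15)) for t in temps_c]
    ys = [math.log(m) for m in mttf_hours]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    ea = sxy / sxx
    return ea, math.exp(y_mean - ea * x_mean)

# Synthetic MTTF values generated with Ea = 0.6 eV at three temperatures.
temps = [85.0, 95.0, 110.0]
synthetic = [1e-6 * math.exp(0.6 / (K_B * (t + 273.15))) for t in temps]
ea_fit, a_fit = fit_activation_energy(temps, synthetic)
print(f"Fitted Ea = {ea_fit:.3f} eV")
```

With real data the points scatter around the line, and the resulting uncertainty on Ea propagates into the acceleration factor.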
Determining (V,T) Grid
• Use case assumed: V = 1.2 V, T = 20 °C.
• Assumed 3 damage mechanisms have equal rates at use condition (pessimistic)
• (V,T) matrix designed to determine model parameters with minimum number of tests.
– EM: Temp values 85, 95, 110
– TDDB: Voltage values: 1.5, 1.6, 1.7
– HCI: Temp (°C) values −20, −10, 0 at voltage 1.5, plus one point each at voltages 1.55 and 1.45

(V,T) Grid
• Simplify analysis
– Can we factorise different damage mechanisms in fits?
– Look at purity
– Not perfect?
    EM     TDDB   HCI
    88.3   11.7   0.0
    0.0    95.1   4.9
    0.0    11.6   88.4
– Acceleration rates:
  • High, so that tests last no longer than ~1000 hours
  • Not too high, otherwise other mechanisms become dominant and the extrapolation to the use case is too large.
  • AF in range 10³ to 2 × 10⁵.

Errors on Acceleration Factors from Fits
• EM fits for Ea in exp(Ea/kT)
    % error MTTF   % error AF
    2              7.5
    4              15.0
    8              12.7
    10             30.8
    20             95.8
• TDDB fits for c in V^c
    % error MTTF   % error AF
    2              4.5
    4              8.8
    8              17.9
    10             21.9
    20             47.6
• HCI fits
– T variation → Ea2
    MTTF error %   AF error %
    2              3.8
    4              7.6
    8              15.5
    10             19.0
    20             42.3
– V variation → g
    MTTF error %   AF error %
    2              4.9
    4              9.8
    8              20.1
    10             24.7
    20             59.9
Next Steps
• Global Fits:
– Use all (V,T) data in one fit
– Build reliability model → plot predicted cumulative failure rates at some reference point.
– Predict MTT10 and MTT01 failure
– Note: eventually this type of information will be used to decide whether we need redundancy.

Practical issues
• Can we use this (V,T) range? (TBD with Paulo.)
• Need minimum 11 grid points and between 10 and 30 chips per point.
• Also need to do quick tests with fewer chips to determine centres of the grids.
– Check that MTTF is in reasonable range (1 to 1000 hours).
• Number of chips required in range 150 to 400.
• Use several environmental chambers
– Combine tests at same T but different V conditions → need between 3 and 7 environmental chambers depending on whether all tests are done in parallel or some in series.
– Hope to find new collaborators …

References
• Bernstein, Physics of Failure Based Handbook of Microelectronic Systems, RIAC.
• Srinivasan et al., The Impact of Technology Scaling on Lifetime Reliability, DSN-04.
• Semiconductor Reliability Handbook, www.renesas-electoronics.com

LTx in SR1
• LTx optical power.
• No T correction
• Initial decrease ~1%.
– No burn-in performed for this array → probably ok but should run longer

Accelerated Aging Tests
• Measure Mean Time To Failure at several elevated temperatures/currents and RH. Use Arrhenius equation for Acceleration Factor from (I2,T2) to (I1,T1), with activation energy EA, and exponential for relative humidity (RH).
$$AF = \left(\frac{I_2}{I_1}\right)^{2} \frac{\exp\left(-E_A/k_B T_2\right)}{\exp\left(-E_A/k_B T_1\right)}, \qquad AF = \exp(a \cdot RH)$$


Fit Results
EA=0.72 eV
a = 0.059 (/%)
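
With these fitted values the acceleration factor between a stress condition and a reference condition can be evaluated directly. The sketch below assumes that the current, temperature and humidity terms multiply and that the humidity term depends on the RH difference. The reference operating point (20 °C, 50% RH) is hypothetical.

```python
import math

K_B = 8.617333e-5   # Boltzmann constant, eV/K
E_A = 0.72          # activation energy from the fit, eV
A_RH = 0.059        # humidity coefficient from the fit, per % RH

def acceleration_factor(i2_ma, t2_c, rh2, i1_ma, t1_c, rh1):
    """AF from stress condition 2 to reference condition 1."""
    t1_k, t2_k = t1_c + 273.15, t2_c + 273.15
    current_term = (i2_ma / i1_ma) ** 2
    arrhenius_term = math.exp((E_A / K_B) * (1.0 / t1_k - 1.0 / t2_k))
    humidity_term = math.exp(A_RH * (rh2 - rh1))  # assumes RH difference
    return current_term * arrhenius_term * humidity_term

# Stress: 10 mA, 85 C, 85% RH (damp heat test conditions).
# Reference: 10 mA, 20 C, 50% RH (hypothetical operating point).
print(f"AF = {acceleration_factor(10, 85, 85, 10, 20, 50):.3g}")
```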



VCSELs in air show decrease in width with time and then plateau
VCSELs in dry N2 show no decrease in width with time

EBIC comparison of working & failed channels, TL VCSEL array
[Figure: EBIC images of working and dead channels.]

• All taken with same SEM settings: 10 kV, spot 5 (roughly same mag: 4700x and 5000x)
• Original image LUTs stretched to accentuate EBIC changes across VCSELs
• Only Ch 10 shows distinct EBIC minima (dark spots) within the emission region
• Ch 06 & 08 show some inhomogeneity but no distinct minima
• Small dark speckles are surface topography

Analysis by EAG

STEM Unused Channel
TL VCSEL array after FIB cut

[Figure: STEM image. Labels: top DBR, oxide, MQW (active region), bottom DBR.]

Analysis by EAG

Example Spectra
• Air ~ 50% RH
– Loss of higher order modes visible
• Dry N2
– Higher order modes very similar