Confidential
HBM: Memory Solution for Bandwidth-Hungry Processors
Joonyoung Kim and Younsu Kim
SK hynix Inc.
August 2014
© 2014 SK hynix Inc.
This material is proprietary to SK hynix Inc. and subject to change without notice.
205.132.242.85 / 2014. 07. 18 16 : 50 / B34047 / 2057897
Memory requirement
HBM: Memory Solution for Density & Bandwidth-Hungry Processors
< Exa-scale Roadmap >
High-End Graphics
40G/100G Ethernet
Exa-scale HPC
Source: SciDAC, www.scidacreview.org
Memory bottleneck & solution - Speed, Density, Power & SFF
TSV is a revolutionary technology for overcoming the bottleneck
- High Bandwidth: solution beyond DDR4/GDDR5/LPDDR4
- High Capacity: solution to overcome DRAM scaling limit
- Power Efficiency: solution to meet system power budget
- Small Form Factor: solution to meet size & cost limitation
TSV (Through-Silicon Via)
TSV is the technology for 3D stacking
(High Density / Small size PKG / High speed)
[Figure: < Wire bonding PKG > versus < TSV PKG >. Labels: Die Pad, Bond wire, PKG Pad, TSV, Die #1, Die #2.]
Bottleneck 1) Bandwidth
Item | DDR3 | TSV (HBM)
Config. | [figure] | [figure]
IO | 64 DQ | 1024 DQ
Speed | 1.6Gbps | 1~2Gbps
Bandwidth | 64 x 1.6Gbps = 12.8GBps | 1024 x 1~2Gbps = Max 256GBps
Compare | Long length, RLC increase | Short length, RLC decrease
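The bandwidth figures in this comparison follow from the pin count and the per-pin data rate. A minimal sketch of that arithmetic, assuming peak bandwidth is simply pins times rate divided by 8 bits per byte (the function name is illustrative, not from the slides):

```python
def peak_bandwidth_gbytes(dq_pins: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for dq_pins data pins at gbps_per_pin Gbit/s each."""
    return dq_pins * gbps_per_pin / 8  # 8 bits per byte


# Figures taken from the comparison: DDR3 with 64 DQ at 1.6 Gbps,
# HBM with 1024 DQ at 1 to 2 Gbps.
ddr3 = peak_bandwidth_gbytes(64, 1.6)
hbm_min = peak_bandwidth_gbytes(1024, 1.0)
hbm_max = peak_bandwidth_gbytes(1024, 2.0)

print(f"DDR3: {ddr3:.1f} GB/s")
print(f"HBM:  {hbm_min:.0f} to {hbm_max:.0f} GB/s")
```

The HBM gain comes from the 16x wider interface rather than from a faster pin, which is why the per-pin speed can stay at or near the DDR3 rate.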
Bottleneck 2) Technology Limit
1. Capacity Limit
2. Scale Down Limit
Bottleneck 3) Small Form Factor
Item | Wire bonding - DDR3 | TSV - HBM
Image | [figure] | [figure]
PKG Size @ die | 100% (117mm2) | 36% (42mm2)
mm2 @ 128GB/s | 100% (3744mm2) | 1.1% (42mm2)
Power Consumption* @ 128GB/s | 100% (6.4W) | 51% (3.3W)
* Power Cal = IMPT
Image source: http://www.shmj.or.jp/english/packaging/pac90s.html
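The percentages in this comparison can be recomputed from the absolute figures. A small sketch, assuming each percentage is the HBM value divided by the DDR3 value (the dictionary keys and helper name are illustrative):

```python
def percent_of(baseline: float, value: float) -> float:
    """Return value as a percentage of baseline."""
    return 100.0 * value / baseline


# (DDR3 wire-bonding figure, HBM TSV figure) as listed on the slide.
comparison = {
    "PKG size @ die (mm2)": (117.0, 42.0),
    "Area @ 128GB/s (mm2)": (3744.0, 42.0),
    "Power @ 128GB/s (W)": (6.4, 3.3),
}

ratios = {name: percent_of(b, v) for name, (b, v) in comparison.items()}
for name, pct in ratios.items():
    print(f"{name}: HBM is {pct:.1f}% of DDR3")

# The DDR3 area at 128GB/s is an exact multiple of the single-package area.
ddr3_packages = 3744.0 / 117.0
print(f"DDR3 packages implied at 128GB/s: {ddr3_packages:.0f}")
```

Under this reading, the board-area row compares one HBM stack against a whole array of DDR3 packages, which is why its ratio is far smaller than the single-package ratio.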
Bottleneck 4) Low Power
Lower speed per pin and x1024 wide I/O
Lower Cio (0.6pF), no termination
Low power consumption per pin
Small I/O current consumption
I/O power consumption is 42% lower than GDDR5
I/O Power Efficiency @ IDD4R, ratio of [mW/Gbps/Pin]
DDR3 x16 | DDR4 x16 | GDDR5 x32 | HBM
1.0 | 0.63 | 0.55 | 0.32
(GDDR5 to HBM: -42%)
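The 42% figure can be checked against the chart values. This sketch assumes the four ratios map to the four devices in the order they are listed (DDR3 1.0, DDR4 0.63, GDDR5 0.55, HBM 0.32); that pairing is inferred from the chart order, not stated explicitly:

```python
# Relative I/O power [mW/Gbps/pin], normalized to DDR3 x16 = 1.0.
# The value-to-device pairing is an assumption based on chart order.
io_power_ratio = {
    "DDR3 x16": 1.0,
    "DDR4 x16": 0.63,
    "GDDR5 x32": 0.55,
    "HBM": 0.32,
}


def reduction_percent(baseline: float, value: float) -> float:
    """Percentage by which value is lower than baseline."""
    return 100.0 * (1.0 - value / baseline)


vs_gddr5 = reduction_percent(io_power_ratio["GDDR5 x32"], io_power_ratio["HBM"])
vs_ddr3 = reduction_percent(io_power_ratio["DDR3 x16"], io_power_ratio["HBM"])

print(f"HBM vs GDDR5: {vs_gddr5:.0f}% lower")
print(f"HBM vs DDR3:  {vs_ddr3:.0f}% lower")
```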
HBM Overall Architecture
4 Core DRAM + 1 Base logic die (Chip on Wafer)
Items | Target
# of Stack | 4 (Core) + 1 (Base)
Ch./Slice | 2
Total Ch. for KGSD | 8
IO/Ch. | 128
Total IO/KGSD | 1024 (=128 x 8)
Address/CMD | Dual CMD
Data Rate | DDR
[1] D.U Lee, SK hynix, ISSCC 2014
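The channel and I/O totals follow from the stack organization. A short sketch, assuming 2 channels per core die as stated on the core architecture slide (the constant names are illustrative):

```python
CORE_DIES = 4          # DRAM slices in the stack (the base die holds no DRAM)
CHANNELS_PER_DIE = 2   # assumption taken from the core architecture slide
IO_PER_CHANNEL = 128   # data I/O per channel

total_channels = CORE_DIES * CHANNELS_PER_DIE
total_io = total_channels * IO_PER_CHANNEL

print(f"Channels per KGSD: {total_channels}")
print(f"Total I/O per KGSD: {total_io}")
```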
HBM, What are the differences?
KGSD* memory supporting System in Package (SiP)
[Figure: HBM in 2.5D SiP. Labels: FBGA, Interposer, SoC, PHY (SoC side and HBM side), Base Die, four DRAM Slices, TSV, Side Molding, DA ball, KGSD.]
[Figure: Base Die floorplan. Labels: DFT Area (Test Logic), TSV Area (TSV overlap), DA Ball (Direct Access), PHY (ADD/CMD, DQ RX/TX), DRAM power supply, signal connectivity, test.]
* KGSD (Known Good Stacked Die)
HBM Market Segments
HBM market will scale out to various segments
(Over 21 Design-Ins In Progress)
- Post GDDR5: 128GBps BW, JEDEC std
- Capacity & BW: density flexibility, >256GBps BW
- Low Power: VDD=1.XV, good I/O power efficiency; Digital Consumer, Automotive, etc.
- Improve RAS: ECC, Post-package Repair, MBIST
HBM Overall specification
1st Gen HBM
- 2Gb per DRAM die
- 1Gbps speed/pin
- 128GB/s bandwidth
- 4-Hi stack (1GB)
- x1024 IO
- 1.2V VDD
2nd Gen HBM
- 8Gb per DRAM die
- 2Gbps speed/pin
- 256GB/s bandwidth/stack
- 4/8-Hi stack (4GB/8GB)
[Figure labels: Base Die, KGSD w/ Bump, Interposer]
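The stack capacities listed for each generation follow from the die density and the stack height. A minimal sketch of that conversion (the function name is illustrative):

```python
def stack_capacity_gbytes(die_density_gbit: int, dies: int) -> float:
    """Capacity of a stack in GB, given per-die density in Gbit."""
    return die_density_gbit * dies / 8  # 8 bits per byte


gen1_4hi = stack_capacity_gbytes(2, 4)   # 1st Gen: 2Gb dies, 4-Hi
gen2_4hi = stack_capacity_gbytes(8, 4)   # 2nd Gen: 8Gb dies, 4-Hi
gen2_8hi = stack_capacity_gbytes(8, 8)   # 2nd Gen: 8Gb dies, 8-Hi

print(f"1st Gen 4-Hi: {gen1_4hi:.0f} GB")
print(f"2nd Gen 4-Hi: {gen2_4hi:.0f} GB")
print(f"2nd Gen 8-Hi: {gen2_8hi:.0f} GB")
```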
HBM Gen1 5mKGSD Structure
5mKGSD (molded Known Good Stacked Die)
- 1 Base + 4 Core (DRAM) with Side Mold -
<Cross Section View>: height 490um ± 25um
<Top View>: 5.48mm ± 25um x 7.29mm ± 25um, Exposed Silicon, Side Mold (190um)
<Bottom View>: uBump array, items (a), (b), (c), (d)
Item | Value | Remark
(a) uBump Diameter | 25um (± 3um) |
(b) uBump Height | 35um (± 3.5um) | Cu/Ni/SnAg (17/3/15um)
(c) uBump Pitch | 55um |
(d) uBump Array (MPGA) | | JEDEC JC11-2.883, JC11-4.884
Comparison of HBM and other DRAMs
Item | DDR3 (x8) | GDDR5 (x32) | 4-Hi HBM (x1024)
I/O | 8 | 32 | 1024
Prefetch (per I/O) | 8 | 8 | 2
Access Granularity (=I/O x Prefetch) | 8Byte | 32Byte | 256Byte
Max. Bandwidth | 2GB/s | 28GB/s | 128~256GB/s
tRC | 40~48ns | 40ns (=1.6V, 1.5V), 48ns (=1.35V) | 40~48ns
tCCD | 4ns (=4tCK) | 2ns (=4tCK) | 2ns (=1tCK)
VPP | Internal VPP | Internal VPP (Opt. Ext. VPP) | Ext. VPP
VDD | 1.5V, 1.35V | 1.6V, 1.5V, 1.35V | 1.2V
CMD Input | Single CMD | Single CMD | Dual CMD
Refresh Single Bank | - | - | O
DBI mode | - | O (DBI_DC) | O (DBI_AC)
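The access granularity row is defined as I/O width times prefetch. The sketch below reproduces the 8Byte, 32Byte and 256Byte figures, assuming a prefetch of 8 for DDR3 and GDDR5 (general knowledge about those parts, not stated in the slides) and the 2n prefetch that the core architecture slide gives for HBM:

```python
def access_granularity_bytes(io_width: int, prefetch: int) -> int:
    """Bytes moved per column access: I/O width (bits) x prefetch depth / 8."""
    return io_width * prefetch // 8


# (I/O width in bits, prefetch depth). The prefetch of 8 for DDR3 and GDDR5
# is an assumption; 2 for HBM comes from the "2n prefetch" statement.
devices = {
    "DDR3 (x8)": (8, 8),
    "GDDR5 (x32)": (32, 8),
    "4-Hi HBM (x1024)": (1024, 2),
}

granularity = {
    name: access_granularity_bytes(io, pf) for name, (io, pf) in devices.items()
}
for name, size in granularity.items():
    print(f"{name}: {size} bytes per access")
```

Note that the 256-byte HBM figure is across all eight channels; a single 128-bit channel with 2n prefetch moves 32 bytes per access.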
Dual CMD interface
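Command efficiency is increased by semi-independent row/column command input.
- Row and column commands are input through different pins.
- Conventional DRAMs share the same pins for RAS and CAS commands.
[Figure: conventional DRAM with a single CMD decoder (RAS/CAS/WE/CS) and a shared ADD input (RA, CA) feeding the bank, versus HBM with separate RAS CMD and CAS CMD decoders.]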
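A toy model can illustrate why separate row and column command pins help. The sketch below only counts command-issue slots and ignores every DRAM timing constraint (tRCD, tRRD and so on), so it gives a lower bound, not a real schedule; the command strings and function names are hypothetical:

```python
from typing import Sequence


def min_slots_shared_bus(row_cmds: Sequence[str], col_cmds: Sequence[str]) -> int:
    """One command per slot on a single bus: every command needs its own slot."""
    return len(row_cmds) + len(col_cmds)


def min_slots_dual_bus(row_cmds: Sequence[str], col_cmds: Sequence[str]) -> int:
    """Row and column commands issue in parallel on separate pins."""
    return max(len(row_cmds), len(col_cmds))


# Hypothetical command mix: row commands (activate, precharge, single-bank
# refresh) and column commands (read, write).
row_cmds = ["ACT b0", "ACT b1", "PCG b2", "REFSB b7"]
col_cmds = ["RD b4", "RD b5", "WT b2", "WT b3", "WT b6"]

shared = min_slots_shared_bus(row_cmds, col_cmds)
dual = min_slots_dual_bus(row_cmds, col_cmds)
print(f"Shared CMD bus: at least {shared} slots")
print(f"Dual CMD bus:   at least {dual} slots")
```

In this simplified view, row commands no longer take issue slots away from reads and writes, which is the efficiency gain the slide refers to.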
REF Single Bank
Single bank refresh and programmable tRAS
Concurrent read/write operation with single bank refresh allows data bus to remain active.
[Figure: single bank refresh. Label: Refresh.]
REF Single Bank
Command bus efficiency can be maximized
- Dual Command & REF single bank -
[Timing diagram: CLK cycles T+1 through T+8 across BANK0 through BANK7, showing ACT, WT, RD, PCG and REFSB commands, with tRCD and tRRD marked.]
HBM Core Architecture
A single HBM die has 2 channels
1 channel consists of 128 TSV I/O with 2n prefetch
[Figure: die floorplan with two channels, CH-Left and CH-Right. Each channel shows banks B0 to B7 with YCTRL blocks, four DWORDs (DWORD 0 to DWORD 3, 32 I/O each) and one AWORD.]
1 bank: 2 sub-banks (64Mb), non-shared I/O between sub-banks
[2] D.U Lee, SK hynix, ISSCC 2014
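The 64Mb sub-bank figure is consistent with the rest of the organization. A sketch of the arithmetic, assuming the 2Gb 1st Gen die from the specification slide and 8 banks per channel (B0 to B7) as labeled in the floorplan:

```python
DIE_DENSITY_MBIT = 2 * 1024   # 2Gb per DRAM die (1st Gen), in Mbit
CHANNELS_PER_DIE = 2
BANKS_PER_CHANNEL = 8         # B0..B7 in the floorplan (assumed per channel)
SUB_BANKS_PER_BANK = 2

channel_mbit = DIE_DENSITY_MBIT // CHANNELS_PER_DIE
bank_mbit = channel_mbit // BANKS_PER_CHANNEL
sub_bank_mbit = bank_mbit // SUB_BANKS_PER_BANK

# Data I/O per channel: four DWORDs of 32 I/O each.
DWORDS_PER_CHANNEL = 4
IO_PER_DWORD = 32
data_io_per_channel = DWORDS_PER_CHANNEL * IO_PER_DWORD

print(f"Per channel:  {channel_mbit} Mb")
print(f"Per bank:     {bank_mbit} Mb")
print(f"Per sub-bank: {sub_bank_mbit} Mb")
print(f"Data I/O per channel: {data_io_per_channel}")
```

Under these assumptions the 64Mb figure corresponds to one sub-bank, and the four DWORDs account for the 128 TSV I/O per channel.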
HBM Base Die Architecture
Base die consists of 3 areas: PHY, TSV, and Test Port
HBM ballout area: 6,050 x 3,264 um
[3] D.U Lee, SK hynix, ISSCC 2014
Base Die Customization: Future HBM Concept
Logic Layer
Host I/F + Memory I/F + Base Logic/IP Block
Memory Die
Overcome memory scaling
Customization to meet various requirements
- Timing
- Refresh
Parallel-to-Serial (P2S)/S2P
JTAG, PMBIST
Configuration Registers
Error Handling
HBM Thermal Management
Thermal dummy bumps, together with a well-designed device architecture, help thermal dissipation.
Thermal dummy bumps cause no mechanical reliability issues.
[Figure: temperature plot (axis: Temp.) showing temperature saturation; cross-section showing TSVs and thermal dummy bumps.]
HBM Long-term Roadmap(2) (Preliminary)
HBM product longevity is critical in several applications.
SK hynix plans to address the longevity requirement.
[Figure: roadmap timeline, 2014 through 2023, with footnote marker (1)]
Bandwidth: starts at 128GB/s, with plans for higher
Note 1: anticipated future HBM density
Note 2: roadmap is subject to change without prior notice
HBM Summary
Perfect memory solution for various application requirements
- High Bandwidth: ~256GB/s
- High Density PKG: up to 8GB
- Smaller Form Factor: -65%
- Good Power Efficiency: 68%
- 1 ~ 4 Cubes per GPU
- 4 ~ 8 Stacks KGSD
Thank You !