The MD6 Hash Function
(aka Pumpkin Hash)
Ronald L. Rivest
MIT CSAIL
CRYPTO 2008
MD6 Team
Dan Bailey
Sarah Cheng
Christopher Crutchfield
Yevgeniy Dodis
Elliott Fleming
Asif Khan
Jayant Krishnamurthy
Yuncheng Lin
Leo Reyzin
Emily Shen
Jim Sukha
Eran Tromer
Yiqun Lisa Yin
Juniper Networks
Cilk Arts
NSF
Outline
Introduction
Design considerations
Mode of Operation
Compression Function
Software Implementations
Hardware Implementations
Security Analysis
MD5 was designed in
1991
Same year WWW announced
Clock rates were 33MHz
Requirements:
{0,1}* -> {0,1}^d for digest size d
Collision-resistance
Preimage resistance
Pseudorandomness
What's happened since then?
Lots
What should a hash function (MD6) look like today?
NIST SHA-3 competition!
Input: 0 to 2^64 - 1 bits, size not known
in advance
Output sizes 224, 256, 384, 512 bits
Collision-resistance, preimage
resistance, second preimage
resistance, pseudorandomness
Simplicity, flexibility, efficiency
Due Halloween '08
Design Considerations /
Responses
Wang et al. break MD5
(2004)
Differential cryptanalysis
(re)discovered by Biham and Shamir
(1990). Considers step-by-step
"difference" (XOR) between two
computations
Applied first to block ciphers (DES)
Used by Wang et al. to break
collision-resistance of MD5
Many other hash functions broken
similarly; others may be vulnerable
So MD6 is
provably resistant to differential
attacks (more on this later)
Memory is now "plentiful"
Memory capacities have increased
60% per year since 1991
Chips have 1000 times as much
memory as they did in 1991
Even "embedded" processors
typically have at least 1KB of RAM
So MD6 has
Large input message block size:
512 bytes (not 512 bits)
This has many advantages
Parallelism has arrived
Uniprocessors have hit the wall
Clock rates have plateaued, since power
usage is quadratic or cubic with clock
rate:
P = VI = V^2/R = O( freq^2 ) (roughly)
Instead, number of cores will double
with each generation: tens, hundreds
(thousands!) of cores coming soon
(Figure: 16, 64, 256 cores)
So MD6 has
Bottom-up tree-based mode of
operation (like Merkle-tree)
4-to-1 compression ratio at each node
Which works very well in
parallel
Height is log_4( number of nodes )
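As a rough illustration of how slowly the tree grows, the sketch below counts levels for a message of a given length, assuming 512-byte leaf blocks and 4-to-1 compression as on the slides. The name `tree_height` and the convention that a single-node tree has height 1 are assumptions made for this example.

```python
def tree_height(message_bytes, block_bytes=512, fan_in=4):
    """Number of levels in a bottom-up 4-to-1 tree over 512-byte blocks."""
    # Ceiling division; an empty message is still treated as one node.
    nodes = max(1, -(-message_bytes // block_bytes))
    height = 1
    while nodes > 1:
        nodes = -(-nodes // fan_in)   # each parent covers up to 4 children
        height += 1
    return height


if __name__ == "__main__":
    for size in (512, 2048, 1 << 20, 1 << 30):
        print(size, "bytes ->", tree_height(size), "levels")
```

Under these assumptions a 1 MiB message needs only 7 levels, which is why the storage cost of the hierarchical mode grows so slowly.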
But most CPUs are
small
Most biomass is bacteria
Storage proportional to tree height
may be too much for some
CPUs
So MD6 has
Alternative sequential mode
IV
(Fits in 1KB RAM)
Actually, MD6 has
a smooth sequence of alternative
modes: from purely sequential to
purely hierarchical. L parallel
layers followed by a sequential
layer, 0 ≤ L ≤ 64
Example: L=1:
IV
Hash functions often "keyed"
Salt for password, key for MAC,
variability for key derivation,
theoretical soundness, etc
Current modes are post-hoc
So MD6 has
Key input K
of up to 512 bits
K is input to every compression
function
Generate-and-paste
attacks
Kelsey and Schneier (2004), Joux
(2004)
Generate sub-hash and fit it in
somewhere
Has advantage proportional to size
of initial computation
So MD6 has
1024-bit intermediate (chaining)
values
root truncated to desired final
length
(2,0)
(2,3)
(2,2)
(2,1)
Location (level,index) input to each
node
Extension attacks
Hash of one message useful to
compute hash of another message
(especially if keyed):
H( K || A || B ) = H( H( K || A ) || B )
So MD6 has
"Root bit" (aka "z-bit" or
"pumpkin bit") input to each
compression function:
True only at the root
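The pieces introduced so far (4-to-1 tree, 1024-bit chaining values, location input, root bit, truncated root) can be put together in a toy sketch. The compression function here is a stand-in built from SHAKE-256, not the real MD6 compression function; the names `toy_compress` and `toy_tree_hash`, the header layout, and the choice to keep the last bytes of the root output are all assumptions for illustration.

```python
import hashlib

BLOCK_BYTES = 512   # one node input: 64 words of 8 bytes
CHAIN_BYTES = 128   # one chaining value: 1024 bits


def toy_compress(data, level, index, is_root):
    """Stand-in for the compression function (SHAKE-256, NOT real MD6)."""
    pad = BLOCK_BYTES - len(data)
    # Location, padding amount and root bit are fed to every node.
    header = (bytes([level]) + index.to_bytes(7, "big")
              + pad.to_bytes(2, "big") + bytes([is_root]))
    block = data + b"\x00" * pad
    return hashlib.shake_256(header + block).digest(CHAIN_BYTES)


def toy_tree_hash(message, digest_bytes=32):
    """Hash bottom-up: leaves are message blocks, each parent covers 4 children."""
    blocks = [message[i:i + BLOCK_BYTES]
              for i in range(0, len(message), BLOCK_BYTES)] or [b""]
    level = 1
    while True:
        is_root = len(blocks) == 1
        outputs = [toy_compress(b, level, i, is_root)
                   for i, b in enumerate(blocks)]
        if is_root:
            # Truncate the root output (which end is kept is an assumption).
            return outputs[0][-digest_bytes:]
        # Four 128-byte chaining values form one 512-byte parent input.
        blocks = [b"".join(outputs[i:i + 4])
                  for i in range(0, len(outputs), 4)]
        level += 1
```

Because the root bit changes the output of the final node, the value published as the hash of a message never equals the chaining value an inner node would produce for the same data, which is the property that blocks the extension identity above.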
Side-channel attacks
Timing attacks, cache attacks
Operations with data-dependent
timing or data-dependent resource
usage can produce vulnerabilities.
This includes data-dependent
rotations, table lookups (S-boxes),
some complex operations (e.g.
multiplications)
So MD6 uses
Operations on 64-bit words
The following operations only:
XOR
AND
SHIFT by fixed amounts:
x >> r
x << l
Security needs vary
Already recognized by having
different digest lengths d (for
MD6: 1 ≤ d ≤ 512)
But it is useful to have reduced-strength
versions for analysis,
simple applications, or different
points on speed/security curve.
So MD6 has
A variable number r of rounds.
( Each round is 16 steps. )
Default r depends on digest size d : r = 40 + (d/4)
d    160  224  256  384  512
r     80   96  104  136  168
But r is also an (optional) input.
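The default-round rule can be checked against the table with a few lines. This is a minimal sketch; the name `default_rounds` and the use of integer (floor) division for digest sizes that are not multiples of 4 are assumptions.

```python
def default_rounds(d):
    """Default number of rounds for digest size d (in bits): r = 40 + d/4."""
    if not 1 <= d <= 512:
        raise ValueError("digest size must satisfy 1 <= d <= 512")
    return 40 + d // 4   # floor division is an assumption for d not divisible by 4


if __name__ == "__main__":
    for d in (160, 224, 256, 384, 512):
        print(d, default_rounds(d))
```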
MD6 Compression function
Compression function
inputs
64 word (512 byte) data block
message, or chaining values
8 word (512 bit) key K
1 word U = (level, index)
1 word V = parameters:
Data padding amount
Key length (0 ≤ keylen ≤ 64 bytes)
z-bit (aka "root bit" aka "pumpkin bit")
L (mode of operation height-limit)
digest size d (in bits)
Number r of rounds
74 words total
Prepend Constant + Map + Chop
Prepend: const (15 words) + key+UV (8+2 words) + data (64 words) = 89 words
Map: 1-1 map, 89 words -> 89 words
Chop: 89 words -> 16 words
Simple compression function:
Input: A[ 0 .. 88 ] of A[ 0 .. 16r + 88 ]
for i = 89 to 16r + 88 :
    x = S_i ⊕ A[ i-17 ] ⊕ A[ i-89 ]
          ⊕ ( A[ i-18 ] ∧ A[ i-21 ] )
          ⊕ ( A[ i-31 ] ∧ A[ i-67 ] )
    x = x ⊕ ( x >> r_i )
    A[i] = x ⊕ ( x << l_i )
return A[ 16r + 73 .. 16r + 88 ]
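The loop above translates almost line for line into code. The following is a sketch only: it starts from an already-assembled 89-word input, masks every word to 64 bits, and updates the round constant with the recurrence given on the round-constants slide. The shift tables `R_SHIFT` and `L_SHIFT` are taken to be the published MD6 shift amounts; treat them, and the helper names, as assumptions of this example, not as a verified implementation.

```python
MASK64 = (1 << 64) - 1

# Per-step shift amounts, repeated every 16-step round (assumed values).
R_SHIFT = (10, 5, 13, 10, 11, 12, 2, 7, 14, 15, 7, 13, 11, 7, 6, 12)
L_SHIFT = (11, 24, 9, 16, 15, 9, 27, 15, 6, 2, 29, 8, 15, 5, 31, 9)

S0 = 0x0123456789abcdef
S_MASK = 0x7311c2812425cfa0


def next_round_constant(s):
    """S_{j+1} = (S_j <<< 1) XOR (S_j AND mask), on 64-bit words."""
    rotated = ((s << 1) | (s >> 63)) & MASK64
    return rotated ^ (s & S_MASK)


def compress_loop(a_init, rounds):
    """Run 16*rounds steps over an 89-word input; return the last 16 words."""
    if len(a_init) != 89:
        raise ValueError("expected 89 input words")
    a = [w & MASK64 for w in a_init]
    s = S0
    for _ in range(rounds):
        for step in range(16):
            i = len(a)                       # index of the word being computed
            x = s ^ a[i - 17] ^ a[i - 89]
            x ^= a[i - 18] & a[i - 21]
            x ^= a[i - 31] & a[i - 67]
            x ^= x >> R_SHIFT[step]
            x ^= (x << L_SHIFT[step]) & MASK64
            a.append(x)
        s = next_round_constant(s)           # constant changes once per round
    return a[-16:]
```

Only XOR, AND and fixed shifts appear, so nothing in the loop has data-dependent timing.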
Constants
Taps 17, 18, 21, 31, 67 optimize diffusion
Constants S_i defined by simple recurrence; change at end of each 16-step round
Shift amounts r_i, l_i repeat each round (best diffusion of 1,000,000 such tables):
step   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
r_i   10   5  13  10  11  12   2   7  14  15   7  13  11   7   6  12
l_i   11  24   9  16  15   9  27  15   6   2  29   8  15   5  31   9
Large Memory (sliding window)
Array of 16r + 89 64-bit words.
Each computed as function of
preceding 89 words.
Last 16 words computed are output.
Small memory (shift
register)
89 words
Si
Shifts
Shift-register of 89 words (712
bytes)
Data moves right to left
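The two memory layouts compute the same values. The sketch below demonstrates this with a simplified feedback that uses only the five taps (round constants and shifts are deliberately omitted), so it shows the storage trade-off and is not the MD6 step function. The function names are hypothetical.

```python
from collections import deque


def feedback(back):
    """Taps-only feedback; back(k) must return the word k positions back."""
    return back(17) ^ back(89) ^ (back(18) & back(21)) ^ (back(31) & back(67))


def sliding_window(init, steps):
    """Large-memory variant: the array grows to 89 + steps words."""
    a = list(init)
    for _ in range(steps):
        a.append(feedback(lambda k: a[-k]))
    return a[-16:]


def shift_register(init, steps):
    """Small-memory variant: always exactly 89 words are kept."""
    reg = deque(init, maxlen=89)
    for _ in range(steps):
        new_word = feedback(lambda k: reg[-k])
        reg.append(new_word)          # the oldest word drops off the left end
    return list(reg)[-16:]


if __name__ == "__main__":
    start = list(range(1, 90))
    print(sliding_window(start, 160) == shift_register(start, 160))
```

With 64-bit words the register holds 89 * 8 = 712 bytes regardless of the number of rounds.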
Software Implementations
Software implementations
Simplicity of MD6:
Same implementation for all digest
sizes.
Same implementation for SHA-3
Reference or SHA-3 Optimized
Versions.
Only optimization is loop-unrolling
(16 steps within one round).
NIST SHA-3 Reference Platforms
            32-bit       64-bit
MD6-160     44 MB/sec    97 MB/sec
MD6-224     38 MB/sec    82 MB/sec
MD6-256     35 MB/sec    77 MB/sec
MD6-384     27 MB/sec    59 MB/sec
MD6-512     22 MB/sec    49 MB/sec
SHA-512     38 MB/sec    202 MB/sec
Multicore efficiency
MD6-256
SHA-256
Cilk!
Efficiency on a GPU
Standard $100 NVidia GPU
375 MB/sec on one card
8-bit processor (Atmel)
With L=0 (sequential mode), uses
less than 1KB RAM.
20 MHz clock
110 msec/comp. fn for MD6-224
(gcc actual)
44 msec/comp. fn for MD6-224
(assembler est.)
Hardware Implementations
FPGA Implementation (MD6-512)
Xilinx XUP FPGA (14K logic slices)
5.3K slices for round-at-a-time
7.9K slices for two-rounds-at-a-time
100MHz clock
240 MB/sec (two-rounds-at-a-time)
(Independent of digest size due to
memory bottleneck)
Security Analysis
Generate and paste attacks
(again)
Because compression functions
are location-aware, attacks that
do speculative computation hoping
to cut and paste it in somewhere
don't work.
Property-Preservations
Theorem. If f is collision-resistant, then MD6^f is collision-resistant.
Theorem. If f is preimage-resistant, then MD6^f is preimage-resistant.
Theorem. If f is a FIL-PRF, then MD6^f is a VIL-PRF.
Theorem. If f is a FIL-MAC and root node effectively uses distinct random key (due to z-bit), then MD6^f is a VIL-MAC.
(See thesis by Chris Crutchfield.)
Indifferentiability (Maurer et al. '04)
Variant notion of
indistinguishability appropriate
when distinguisher has access to
inner component (e.g. mode of
operation MD6^f / comp. fn f).
MD6^f
FIL RO
VIL RO
? or ?
Indifferentiability (I)
Theorem. The MD6 mode of
operation is indifferentiable from a
random oracle.
Proof: Construct simulator for
compression function that makes it
consistent with any VIL RO and MD6
mode of operation
Advantage: 2q^2 / 2^1024
where q = number of calls (measured
in terms of compression function calls).
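To get a feel for the size of this bound, the sketch below evaluates it for a given number of queries. It only restates the arithmetic of the advantage expression; the function names are assumptions.

```python
from fractions import Fraction


def mode_advantage(q):
    """Exact value of the bound 2*q^2 / 2^1024 for q compression calls."""
    return Fraction(2 * q * q, 2 ** 1024)


def log2_mode_advantage(log2_q):
    """Base-2 logarithm of the same bound when q = 2^log2_q."""
    return 1 + 2 * log2_q - 1024


if __name__ == "__main__":
    for log2_q in (64, 128, 256):
        print(f"q = 2^{log2_q}: advantage = 2^{log2_mode_advantage(log2_q)}")
```

Even with q = 2^64 compression calls the bound is 2^-895, so the distinguishing advantage stays negligible for any feasible amount of work.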
Indifferentiability (II)
Theorem. MD6 compression function
f is indifferentiable from a FIL
random oracle (with respect to
random permutation π).
Proof: Construct simulator S for
π and π^-1 that makes it consistent with
FIL RO and comp. fn. construction.
Advantage: q / 2^1024 + 2q^2 / 2^4672
SAT-SOLVER attacks
Code comp. fn. as set of clauses,
try to find inverse or collision with
Minisat
With many days of computing:
Solved all problems of 9 rounds or less.
Solved some 10- or 11-round ones.
Never solved a 12-round problem.
Note: 11 rounds ≈ 2 rotations
(passes over data)
Statistical tests
Measure influence of an input bit
on all output bits; use Anderson-Darling
A*^2 test on set of influences.
Can't distinguish from random
beyond 12 rounds.
Differential attacks don't
work
Theorem. Any standard
differential attack has less chance
of finding collision than standard
birthday attack.
Proof. Determine lower bound on
number of active AND gates in 15
rounds using sophisticated
backtracking search and days of
computing. Derive upper bound on
probability of differential path.
Differential attacks (cont.)
Compare birthday bound BB with our lower bound LB on
work for any standard differential attack.
(Gives adversary fifteen rounds for message modification, etc.)
These bounds can be improved.
d     r     BB      LB
160   80    2^80    2^104
224   96    2^112   2^130
256   104   2^128   2^150
384   136   2^192   2^208
512   168   2^256   2^260
Choosing number of
rounds
We don't know how to break any
security properties of MD6 for more
than 12 rounds.
For digest sizes 224 to 512, MD6 has
96 to 168 rounds.
Current defaults probably
conservative.
Current choice allows proof of
resistance to differential cryptanalysis.
Summary
MD6
is:
Arguably secure against known
attacks (including differential
attacks)
Relatively simple
Highly parallelizable
Reasonably efficient
THE END
MD6
03744327e1e959fbdcdf7331e959cb2c28101166
Round constants S_i
Since they only change every 16
steps, let S_j be the round constant
for round j .
S_0 = 0x0123456789abcdef
S_{j+1} = (S_j <<< 1) ⊕ (S_j ∧ mask)
mask = 0x7311c2812425cfa0
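The recurrence is easy to run directly. This is a minimal sketch, assuming `<<<` denotes a 64-bit left rotation; the function name `round_constants` is hypothetical.

```python
MASK64 = (1 << 64) - 1
S0 = 0x0123456789abcdef
S_MASK = 0x7311c2812425cfa0


def round_constants(count):
    """Return the first `count` round constants S_0, S_1, ..."""
    constants = []
    s = S0
    for _ in range(count):
        constants.append(s)
        rotated = ((s << 1) | (s >> 63)) & MASK64   # 64-bit rotate left by 1
        s = rotated ^ (s & S_MASK)
    return constants


if __name__ == "__main__":
    for j, s in enumerate(round_constants(4)):
        print(f"S_{j} = 0x{s:016x}")
```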