06 Hashtables
ADMINISTRIVIA
COURSE STATUS
DATA STRUCTURES
Internal Meta-data
Core Data Storage
Temporary Data Structures
Table Indexes
DESIGN DECISIONS
Data Organization
→ How we lay out the data structure in memory/pages and what
information to store to support efficient access.
Concurrency
→ How to enable multiple threads to access the data
structure at the same time without causing problems.
HASH TABLES
STATIC HASH TABLE
ASSUMPTIONS
HASH TABLE
TODAY'S AGENDA
Hash Functions
Static Hashing Schemes
Dynamic Hashing Schemes
HASH FUNCTIONS
CRC-64 (1975)
→ Used in networking for error detection.
MurmurHash (2008)
→ Designed to be a fast, general-purpose hash function.
Google CityHash (2011)
→ Designed to be faster for short keys (<64 bytes).
Facebook XXHash (2012)
→ From the creator of zstd compression.
Google FarmHash (2014)
→ Newer version of CityHash with better collision rates.
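As a rough illustration of how any of these functions is typically used, the sketch below hashes a key and reduces the result modulo the table size to pick a slot. It uses `zlib.crc32` from the Python standard library purely as a stand-in (it is CRC-32, not one of the functions listed above); `slot_for` and `NUM_SLOTS` are hypothetical names.

```python
import zlib

NUM_SLOTS = 8  # hypothetical table size, chosen only for illustration


def slot_for(key: bytes, num_slots: int = NUM_SLOTS) -> int:
    """Hash the key's bytes, then reduce the hash modulo the table size."""
    return zlib.crc32(key) % num_slots


if __name__ == "__main__":
    for k in (b"A", b"B", b"C"):
        print(k, slot_for(k))
```

Swapping in a different hash function only changes the one call inside `slot_for`; the modulo step stays the same.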
CMU 15-445/645 (Fall 2019)
[Figure: hash function throughput (MB/sec) versus key size (bytes), for keys of 1 to 8 bytes. Source: Fredrik Widlund]
[Figure: the same benchmark versus key size (bytes), for keys of 1 to 251 bytes, with key sizes 32, 64, and 192 marked. Source: Fredrik Widlund]
STATIC HASHING SCHEMES
[Figure: step-by-step build of a static hash table. hash(key) maps keys A through F to slots holding <key>|<value> entries, inserted in the order A, B, C, D, E, F. The final frames show Delete C followed by Find D.]
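The frames above walk through inserts, a delete, and a find on a fixed-size table. A minimal sketch of that sequence, assuming the scheme shown is linear probing and that deletes leave a tombstone marker (both are assumptions, as are the names `LinearProbeTable`, `EMPTY`, and `TOMBSTONE`):

```python
EMPTY = object()      # slot that has never held an entry
TOMBSTONE = object()  # slot whose entry was deleted


class LinearProbeTable:
    def __init__(self, num_slots, hash_fn=hash):
        self.slots = [EMPTY] * num_slots
        self.hash_fn = hash_fn

    def _probe(self, key):
        """Yield slot positions starting at hash(key), wrapping around once."""
        start = self.hash_fn(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def insert(self, key, value):
        first_free = None
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                if first_free is None:
                    first_free = pos
                break  # key cannot appear past a never-used slot
            if slot is TOMBSTONE:
                if first_free is None:
                    first_free = pos  # reusable, but keep scanning for the key
            elif slot[0] == key:
                self.slots[pos] = (key, value)  # overwrite existing entry
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def find(self, key):
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                return None
            if slot is not TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot[0] == key:
                self.slots[pos] = TOMBSTONE  # keep the probe chain intact
                return True
        return False
```

The tombstone is what lets a later `find("D")` keep scanning past the slot that used to hold C instead of stopping early.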
NON-UNIQUE KEYS
Choice #1: Separate Linked List
→ Store values in separate storage area for each key.
[Figure: value lists. XYZ → value1, value2, value3; ABC → value1, value2]
Choice #2: Redundant Keys
→ Store duplicate key entries together in the hash table.
[Figure: entries XYZ|value1, ABC|value1, XYZ|value2, XYZ|value3, ABC|value2]
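The two choices can be contrasted with a small sketch. It uses the XYZ/ABC pairs from the slide and plain Python containers as stand-ins for the hash table and its storage areas; `value_lists`, `redundant_entries`, and `lookup_redundant` are hypothetical names.

```python
from collections import defaultdict

pairs = [("XYZ", "value1"), ("ABC", "value1"), ("XYZ", "value2"),
         ("XYZ", "value3"), ("ABC", "value2")]

# Choice #1: one entry per key, pointing at a separate list of its values.
value_lists = defaultdict(list)
for key, value in pairs:
    value_lists[key].append(value)

# Choice #2: every (key, value) pair is stored as its own entry,
# so a lookup has to collect all entries whose key matches.
redundant_entries = list(pairs)


def lookup_redundant(entries, key):
    return [v for k, v in entries if k == key]


if __name__ == "__main__":
    print(value_lists["XYZ"])
    print(lookup_redundant(redundant_entries, "ABC"))
```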
[Figure: keys A through F hashed into a table; each stored entry, e.g. A | val [0], records its number of "jumps" from its first position.]
CUCKOO HASHING
[Figure: walk-through with Hash Table #1 and Hash Table #2.
Insert A: hash1(A) and hash2(A) are computed; A|val is stored in Hash Table #1.
Insert B: hash1(B) and hash2(B) are computed; B|val is stored in Hash Table #2.
Insert C: hash1(C) and hash2(C) are computed; C|val takes B's slot in Hash Table #2, B moves to its hash1(B) slot in Hash Table #1 and displaces A, and A moves to its hash2(A) slot in Hash Table #2.]
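The eviction chain in the frames above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `CuckooTable`, `max_kicks`, and the choice to start evictions in the second table are assumptions made to mirror the walk-through.

```python
class CuckooTable:
    def __init__(self, num_slots, hash1, hash2, max_kicks=16):
        self.tables = [[None] * num_slots, [None] * num_slots]
        self.hashes = [hash1, hash2]
        self.num_slots = num_slots
        self.max_kicks = max_kicks  # hypothetical bound on the eviction chain

    def _pos(self, which, key):
        return self.hashes[which](key) % self.num_slots

    def find(self, key):
        # A key can only live in one of its two candidate slots.
        for which in (0, 1):
            entry = self.tables[which][self._pos(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        # Update in place if the key is already stored.
        for which in (0, 1):
            pos = self._pos(which, key)
            entry = self.tables[which][pos]
            if entry is not None and entry[0] == key:
                self.tables[which][pos] = (key, value)
                return
        # Otherwise use a free candidate slot if there is one.
        for which in (0, 1):
            pos = self._pos(which, key)
            if self.tables[which][pos] is None:
                self.tables[which][pos] = (key, value)
                return
        # Both candidates are taken: evict, then re-home each displaced
        # entry in its slot in the other table.
        entry, which = (key, value), 1
        for _ in range(self.max_kicks):
            pos = self._pos(which, entry[0])
            entry, self.tables[which][pos] = self.tables[which][pos], entry
            if entry is None:
                return
            which = 1 - which
        # A real table would rebuild here; this sketch just gives up and
        # the entry still held in `entry` is not stored.
        raise RuntimeError("eviction chain too long; rebuild with new hash functions")
```

With hash functions chosen so that A, B, and C collide as in the frames, inserting them in order leaves B in the first table and both C and A in the second.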
OBSERVATION
CHAINED HASHING
[Figure: hash(key) selects one of a list of buckets.]
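A minimal sketch of the bucket lookup pictured above, assuming each bucket is a chain of key/value entries that is scanned linearly. A Python list stands in for the chain, and `ChainedTable` is a hypothetical name.

```python
class ChainedTable:
    def __init__(self, num_buckets, hash_fn=hash):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def _bucket(self, key):
        """hash(key) picks the bucket; the bucket's chain is then scanned."""
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))       # otherwise extend the chain

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```

Because a chain can grow without bound, an insert never fails for lack of space, but a lookup degrades toward a linear scan when many keys land in the same bucket.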
EXTENDIBLE HASHING
[Figure: extendible hashing walk-through.
Start: global depth 2; the directory entries 00…, 01…, 10…, 11… point to three buckets: {00010…, 01110…} with local depth 1, {10101…, 10011…} with local depth 2, and {11010…} with local depth 2.
Find A: hash(A) = 01110…
Insert B: hash(B) = 10111… is added to the bucket holding 10101… and 10011….
Insert C: hash(C) = 10100… does not fit in that bucket. The global depth becomes 3, the directory grows to eight entries (000… through 111…), and the bucket is split into {10011…} and {10101…, 10111…, 10100…}, each with local depth 3.]
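The directory doubling and bucket split shown above can be sketched as below. This is a simplified illustration under stated assumptions: hashes are 5-bit integers indexed by their leading bits (matching the bit prefixes in the figure), each bucket holds at most three entries, and `ExtendibleTable`, `Bucket`, `HASH_BITS`, and `BUCKET_CAPACITY` are hypothetical names.

```python
HASH_BITS = 5        # assumption: hash values are 5-bit integers
BUCKET_CAPACITY = 3  # assumption: a bucket holds at most three entries


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}  # key -> value


class ExtendibleTable:
    def __init__(self, hash_fn, global_depth=1):
        self.hash_fn = hash_fn
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _index(self, key):
        # The leading global_depth bits of the hash select the directory entry.
        return self.hash_fn(key) >> (HASH_BITS - self.global_depth)

    def find(self, key):
        return self.directory[self._index(key)].items.get(key)

    def insert(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
                bucket.items[key] = value
                return
            self._split(bucket)  # make room, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            if self.global_depth == HASH_BITS:
                raise RuntimeError("out of hash bits")
            # Double the directory: old entry i becomes entries 2i and 2i+1.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        # Keys whose newly examined bit is 1 move to the sibling bucket.
        shift = HASH_BITS - bucket.local_depth
        for k in list(bucket.items):
            if (self.hash_fn(k) >> shift) & 1:
                sibling.items[k] = bucket.items.pop(k)
        # Repoint the directory entries whose matching bit is 1.
        dir_shift = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> dir_shift) & 1:
                self.directory[i] = sibling
```

Only the overflowing bucket is split; every other bucket keeps its local depth and is simply referenced by more directory entries after the directory doubles.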
LINEAR HASHING
[Figure: linear hashing walk-through.
Start: buckets 0 to 3 hold the keys {8, 20}, {5, 9, 13}, {6}, and {7, 11} under hash1(key) = key % n, with the split pointer at bucket 0.
Find 6: hash1(6) = 6 % 4 = 2
Insert 17: hash1(17) = 17 % 4 = 1. Bucket 1 overflows, so bucket 4 is added, hash2(key) = key % 2n is introduced, and key 20 moves from bucket 0 to bucket 4.
Find 20: hash1(20) = 20 % 4 = 0, then hash2(20) = 20 % 8 = 4
Find 9: hash1(9) = 9 % 4 = 1
In the closing frames, key 20 and then hash2 no longer appear.]
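The walk-through above can be reproduced with a short sketch. It assumes integer keys, a bucket capacity of three, that any overflow triggers a split of the bucket at the split pointer, and that buckets below the split pointer are addressed with hash2. `LinearHashTable` and `BUCKET_CAPACITY` are hypothetical names, and deletion is not covered.

```python
BUCKET_CAPACITY = 3  # assumption: more than three keys in a bucket is an overflow


class LinearHashTable:
    def __init__(self, n=4):
        self.n = n              # bucket count at the start of the current round
        self.split_pointer = 0  # next bucket to split
        # Each list stands in for a bucket plus any overflow pages chained to it.
        self.buckets = [[] for _ in range(n)]

    def _bucket_index(self, key):
        idx = key % self.n                # hash1(key) = key % n
        if idx < self.split_pointer:      # that bucket was already split
            idx = key % (2 * self.n)      # hash2(key) = key % 2n
        return idx

    def find(self, key):
        return key in self.buckets[self._bucket_index(key)]

    def insert(self, key):
        bucket = self.buckets[self._bucket_index(key)]
        bucket.append(key)
        if len(bucket) > BUCKET_CAPACITY:  # overflow anywhere triggers a split
            self._split()

    def _split(self):
        old = self.buckets[self.split_pointer]
        # Rehash the split bucket's keys with hash2; they either stay put
        # or move to the new bucket at index split_pointer + n.
        keep = [k for k in old if k % (2 * self.n) == self.split_pointer]
        moved = [k for k in old if k % (2 * self.n) != self.split_pointer]
        self.buckets[self.split_pointer] = keep
        self.buckets.append(moved)
        self.split_pointer += 1
        if self.split_pointer == self.n:   # every bucket split: start a new round
            self.n *= 2
            self.split_pointer = 0


if __name__ == "__main__":
    table = LinearHashTable(n=4)
    for key in (8, 20, 5, 9, 13, 6, 7, 11, 17):
        table.insert(key)
    print(table.buckets, table.split_pointer)
```

Note that the bucket being split (bucket 0) is not the one that overflowed (bucket 1); the overflowing bucket keeps its extra key until the split pointer reaches it.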
CONCLUSION
NEXT CLASS
B+Trees
→ aka "The Greatest Data Structure of All Time!"