hashing-2 (1)
Types of Hashing
• What does H(K) point to:
– A cell of a table in memory where K* is stored
(internal hashing)
– A bucket on disk where K* is stored (external
hashing)
• A bucket consists of 1 or more pages.
• Hash file maintenance:
– Static hashing
• File size is fixed
– Dynamic & extensible hashing
• File size can grow
Hashing to a File
[Figure: a key K is passed through the hash function H(K), which maps the file's r records onto N slots]
Hashing Function 1
Student id Name address
0234134 John 4
0349423 Mary 3
0428421 Jean 1
1324532 Sandy 2
2374734 Randy 4
Let some digits of the key, for example the last digit
of the student id, represent the location.
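A minimal sketch of this digit-selection scheme, assuming the ids are stored as strings so that leading zeros survive; the name `last_digit_hash` is illustrative and not from the slides.

```python
def last_digit_hash(student_id: str) -> int:
    """Use the last digit of the student id as the record location."""
    return int(student_id[-1])


# Student ids and names copied from the table above.
students = {"0234134": "John", "0349423": "Mary", "0428421": "Jean",
            "1324532": "Sandy", "2374734": "Randy"}
locations = {name: last_digit_hash(sid) for sid, name in students.items()}
```

John and Randy both end up at location 4, so even this five-row table contains two keys competing for the same address.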
Hash Function 2
• Key is a six-digit student id; the file has
100,000 record positions (0 – 99,999)
• H(K) = student_id mod 99999, which yields
addresses 0 – 99,998
085768 → 085768 mod 99999 = 85768
134281 → 134281 mod 99999 = 34282
101004 → 101004 mod 99999 = 1005
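The division method above can be sketched in a few lines of Python. The function name `mod_hash` and the default divisor argument are assumptions made for illustration.

```python
def mod_hash(student_id: int, m: int = 99999) -> int:
    """Map a student id to a record position with H(K) = K mod m."""
    return student_id % m


# The three ids from the slide; leading zeros are dropped when ids are integers.
for sid in (85768, 134281, 101004):
    print(f"{sid:06d} -> {mod_hash(sid)}")
```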
More Hash Functions
• Folding
– Replace the key by a numeric code (here, each letter's position in the alphabet)
• ALBERT = 01 12 02 05 18 20
– Fold and add
• 0112 + 0205 + 1820 = 2137
– Take the result modulo the size of the address space
• 2137 mod 101 = 16
• Midsquare: square the key and take the middle digits
– 453^2 = 205209 → 52
• Radix transformation
– (453)_10 = (382)_11 → 382 mod 99 = 85
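The three techniques can be sketched as small Python functions. All names and parameters here are illustrative assumptions; in particular, `radix_transform` simply concatenates the digits of the new base, which is only unambiguous while every digit is below 10.

```python
def fold_and_add(digits: str, width: int, n: int) -> int:
    """Split a digit string into width-sized parts, add them, reduce mod n."""
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % n


def midsquare(key: int, n_digits: int) -> int:
    """Square the key and return the middle n_digits digits of the square."""
    squared = str(key * key)
    start = (len(squared) - n_digits) // 2
    return int(squared[start:start + n_digits])


def radix_transform(key: int, base: int, n: int) -> int:
    """Write key in another base, read those digits as decimal, reduce mod n."""
    digits = []
    while key:
        key, d = divmod(key, base)
        digits.append(d)
    as_decimal = int("".join(str(d) for d in reversed(digits)) or "0")
    return as_decimal % n
```

With these definitions, `midsquare(453, 2)` returns 52 and `radix_transform(453, 11, 99)` returns 85, matching the slide.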
Hashing Function 3
• Concatenate the alphabetic positions of all letters,
partition the result into equal parts, multiply each
part by its position, fold and add, divide the result
by the size of the address space (a prime number)
and take the remainder.
Name   Letter codes     Address
John   10 15 08 14      (1015*1 + 0814*2) mod 43 = 20
Mary   13 01 18 25      (1301*1 + 1825*2) mod 43 = 6
Jean   10 05 01 14      (1005*1 + 0114*2) mod 43 = 29
Sandy  19 01 14 04 25   (1901*1 + 1404*2 + 0025*3) mod 43 = 11
Randy  18 01 14 04 25   (1801*1 + 1404*2 + 0025*3) mod 43 = 40
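A sketch of this weighted fold in Python, assuming names contain only the letters A to Z and that parts are two letters (four digits) wide, as in the table. The function name `name_hash` is illustrative.

```python
def name_hash(name: str, n: int = 43) -> int:
    """Weighted fold-and-add over two-letter parts, reduced mod n."""
    codes = [ord(c) - ord("a") + 1 for c in name.lower()]  # a=1 ... z=26
    total = 0
    for position, i in enumerate(range(0, len(codes), 2), start=1):
        pair = codes[i:i + 2]
        # Two letters form a four-digit part; a trailing single letter
        # stands alone (e.g. 25 is treated as 0025).
        part = pair[0] * 100 + pair[1] if len(pair) == 2 else pair[0]
        total += part * position
    return total % n


addresses = {name: name_hash(name)
             for name in ("John", "Mary", "Jean", "Sandy", "Randy")}
```

Unlike the last-digit scheme, all five names receive different addresses here.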
Hash Function Design Issues
• Key space
– The set of all possible values for keys
• Address space (N)
– The set of all storage units
– Physical location of file
• In general
– Address space must accommodate all records in
file
– Address space is usually much smaller than key
space
Features of Hashing
• Randomizing
– Records are randomly spread over the whole
storage space
• Collision
– Two different keys may be hashed into the
same address (synonyms)
– Two ways to deal with collisions:
• choose hashing functions that reduce collisions
• rearrange the storage of records to reduce collisions
Good and Bad Functions
[Figure: three mappings of keys A–G onto addresses 1–10]
A Hashing Function
1. Convert the key to a number (if it is not one already)
key → K
2. Compute an address from the number
address = K mod M
• Suggestion: Choose M to be a prime
number (why?).
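One common answer to the "why?" is that a prime M shares no factor with regular patterns in the keys. The sketch below uses hypothetical keys that are all multiples of 10 and counts how many distinct addresses they occupy for several values of M; the helper name `distinct_addresses` is an assumption.

```python
def distinct_addresses(keys, m: int) -> int:
    """Count how many different addresses the keys occupy under K mod m."""
    return len({k % m for k in keys})


keys = range(0, 1000, 10)  # hypothetical keys: 0, 10, 20, ..., 990
for m in (10, 11, 12):
    print(m, distinct_addresses(keys, m))
```

With M = 10 every key lands on address 0, with M = 12 only 6 of the 12 addresses are used, and with the prime M = 11 all 11 addresses are used.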
Collisions
• A key is mapped to an address that is full.
• Collision Resolution: Where to store the
overflow key?
– Static methods
• Linear probing
• Double hashing
• Separate overflow
– Dynamic methods
• Extendable hashing
• Linear hashing
Linear Probing
• For each key, generate a sequence of
addresses A_0, A_1, A_2, …
A_0 = hash(key) mod M
A_(i+1) = (A_i + step) mod M
Example
M = 7, step = 1
Key          hash(key) = A_0  A_1  A_2  A_3  A_4
Mozart       1                2    3    4    5
Tchaikovsky  1                2    3    4    5
Ravel        3                4    5    6    0
Beethoven    5                6    0    1    2
Mendelssohn  5                6    0    1    2
Bach         3                4    5    6    0
Greig        3                4    5    6    0
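The example can be replayed with a short Python sketch. It assumes the keys are inserted in the order listed and takes the A_0 values directly from the table; the function names are illustrative.

```python
def probe_sequence(a0: int, m: int, step: int = 1):
    """Yield m probe addresses A_0, A_1, ... with A_(i+1) = (A_i + step) mod m."""
    a = a0 % m
    for _ in range(m):
        yield a
        a = (a + step) % m


def insert(table: list, key: str, a0: int, step: int = 1) -> int:
    """Store key in the first free slot along its probe sequence; return the slot."""
    for address in probe_sequence(a0, len(table), step):
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("no free slot found along the probe sequence")


# A_0 values copied from the example table (M = 7, step = 1).
keys = [("Mozart", 1), ("Tchaikovsky", 1), ("Ravel", 3), ("Beethoven", 5),
        ("Mendelssohn", 5), ("Bach", 3), ("Greig", 3)]
table = [None] * 7
placed = {key: insert(table, key, a0) for key, a0 in keys}
```

Under these assumptions Greig, the last key, needs four probes (3, 4, 5, 6 are taken) before it settles in slot 0, which illustrates how probe sequences lengthen as the table fills.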
Linear Probing - Problems
• Performance degradation as more rows are
added.
• Waste of space as more rows are deleted.
• These are problems for all static methods
• Solutions
– Reorganization
– Use a dynamic method
Extendable Hashing
• The address space is changed dynamically.
• The hash function is adjusted to
accommodate the change.
• A common family of hash functions
– h_k(key) = h(key) mod 2^k (use the last k bits of
h(key))
– At any given time exactly one hash function, h_k, is in use
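Taking a value mod 2^k is the same as keeping its last k bits, which a bit mask expresses directly. A minimal sketch, with `h_k` written as a Python function over an already computed h(key):

```python
def h_k(h_value: int, k: int) -> int:
    """Return the last k bits of h_value, i.e. h_value mod 2**k."""
    return h_value & ((1 << k) - 1)


# The same hash value seen through h_2 and h_3.
value = 0b10101
print(format(h_k(value, 2), "02b"))  # last two bits
print(format(h_k(value, 3), "03b"))  # last three bits
```

Moving from h_2 to h_3 never reshuffles keys arbitrarily: a key at address `01` under h_2 can only go to `001` or `101` under h_3.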
Extendable Hashing - Example
v      h(v)
pete   11010
mary   00000
jane   11110
bill   00000
john   01001
vince  10101
karen  10111
• Location mechanism: a directory whose entries (00, 01, 10, 11) point to buckets
• The size of the directory corresponds to the currently active hash function h_k
• h_k(key) = h(key) mod 2^k
– k = 2, directory size = 2^2 = 4 (use the last k = 2 bits of h(key))
Example (cont'd)
• Current hash function: h_2 (k = 2)
Directory  Bucket  Contents
00         B0      mary, bill
01         B1      john, vince
10         B2      pete, jane
11         B3      karen
• Next action: insert ‘sol’, where h(sol) = 10001.
– The last 2 bits of h(sol) are 01, so sol belongs in B1, which already holds two keys
Example (cont'd)
Solution:
1. Split the overfilled bucket
2. Switch to h_3 (double the directory)
– h_k(key) = h(key) mod 2^k
– k = 3, directory size = 2^3 = 8 (use the last k = 3 bits of h(key))
3. Update the pointers
Directory  Bucket  Contents
000        B0      mary, bill
001        B1      john, sol
010        B2      pete, jane
011        B3      karen
100        B0      mary, bill
101        B4      vince
110        B2      pete, jane
111        B3      karen
• Current hash = 3: current_hash identifies the current hash function.
Example (cont'd)
• Next action: insert judy, where h(judy) = 00110
• B2 overflows, but the directory need not be extended
Bucket  Contents
B0      mary, bill
B1      john, sol
B2      pete, jane
B3      karen
B4      vince
• Current hash = 3
Example (cont'd)
• Each bucket has a bucket level: the number of hash bits used to address it
• B2 is split on a third bit: jane moves from B2 to the new bucket B5, which also receives judy
Directory  Bucket  Contents     Bucket level
000        B0      mary, bill   2
001        B1      john, sol    3
010        B2      pete         3
011        B3      karen        2
100        B0      mary, bill   2
101        B4      vince        3
110        B5      judy, jane   3
111        B3      karen        2
• Current hash = 3
v      h(v)
pete   11010
mary   00000
jane   11110
bill   00000
john   01001
vince  10101
karen  10111
sol    10001
judy   00110
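The whole example can be replayed with a compact Python sketch. It is an illustration under stated assumptions, not a definitive implementation: buckets are assumed to hold at most 2 keys, hash values are assumed to be 5 bits wide, and the class and attribute names (`Bucket`, `ExtendableHash`, `level`) are invented here.

```python
class Bucket:
    def __init__(self, level: int):
        self.level = level   # bucket level: number of hash bits addressing this bucket
        self.keys = {}       # key -> h(key)


class ExtendableHash:
    def __init__(self, k: int = 2, capacity: int = 2, bits: int = 5):
        self.k = k                  # current hash: h_k uses the last k bits
        self.capacity = capacity    # assumed maximum number of keys per bucket
        self.bits = bits            # assumed width of h(key); bounds the splitting
        self.directory = [Bucket(k) for _ in range(2 ** k)]

    def insert(self, key: str, h: int) -> None:
        bucket = self.directory[h % 2 ** self.k]
        if len(bucket.keys) < self.capacity:
            bucket.keys[key] = h
            return
        if bucket.level == self.bits:
            raise OverflowError("bucket cannot be split any further")
        if bucket.level == self.k:
            # The bucket already uses every directory bit: double the directory.
            self.directory = self.directory + self.directory
            self.k += 1
        # Split the bucket on one more bit and update the pointers.
        bucket.level += 1
        sibling = Bucket(bucket.level)
        split_bit = 1 << (bucket.level - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & split_bit:
                self.directory[i] = sibling
        old_keys, bucket.keys = bucket.keys, {}
        for old_key, old_h in old_keys.items():
            self.directory[old_h % 2 ** self.k].keys[old_key] = old_h
        self.insert(key, h)         # retry; splits again if still overfull


# Hash values copied from the slides, inserted in the order shown.
hashes = {"pete": "11010", "mary": "00000", "jane": "11110", "bill": "00000",
          "john": "01001", "vince": "10101", "karen": "10111",
          "sol": "10001", "judy": "00110"}
table = ExtendableHash(k=2, capacity=2)
for name, code in hashes.items():
    table.insert(name, int(code, 2))
```

Inserting sol doubles the directory because B1 already uses both directory bits, while inserting judy only splits B2, whose bucket level (2) is still below the current hash (3). Doubling copies pointers rather than buckets, which is why entries 000 and 100 keep sharing B0.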
Indexing in Oracle
(un-clustered index)
• Create an un-clustered index on author:
CREATE TABLE book (
callno char(10),
author char(20),
title char(30),
year char(4),
PRIMARY KEY (callno)
);
Indexing in Oracle
(clustered index on primary key)
• Create a clustered index on callno:
CREATE TABLE book (
callno char(10),
author char(20),
title char(30),
year char(4),
PRIMARY KEY (callno)
)
ORGANIZATION INDEX;
• This syntax allows a clustered index on the
primary key of the table only.
Indexing in Oracle
(clustered index on non-primary key columns)
• Create a clustered index on author:
CREATE TABLE book (
callno char(10),
author char(20),
title char(30),
year char(4),
PRIMARY KEY (callno)
)
cluster authcl(author);
Indexing in DB2
• Create un-clustered indexes on callno and author:
Choosing an Index
Ex 1 SELECT E.Id
FROM Employee E
WHERE E.Salary < :upper AND E.Salary > :lower
Choosing an Index
Ex 2 SELECT T.StudId
FROM Transcript T
WHERE T.Grade = :grade
Choosing an Index
Ex 3 SELECT T.CrsCode, T.Grade
FROM Transcript T
WHERE T.StudId = :id AND T.Semester = 'F2000'