hashing-2 (1)

The document discusses hashing techniques for efficient data retrieval, focusing on equality searches that can be performed in a single disk access. It covers types of hashing, hash functions, collision resolution methods, and the design considerations for effective hashing functions. Additionally, it includes examples of indexing in databases like Oracle and DB2, highlighting the importance of choosing the right index for different types of queries.

Uploaded by

bhaivipin283

Hashing

• Given a search key, can we guess its location in the file?
• Goal:
– Support equality searches in one disk access!
• Method: hash keys into addresses
key → page

Types of Hashing
• What does H(K) point to:
– A cell of a table in memory where K* is stored
(internal hashing)
– A bucket on disk where K* is stored (external
hashing)
• A bucket consists of 1 or more pages.
• Hash file maintenance:
– Static hashing
• File size is fixed
– Dynamic & extensible hashing
• File size can grow

Hashing to a File
[Figure: H(K) maps a key to one of N slots*; the file holds r records.]

*Slots store either the actual records (clustered index) or (key, ptr) pairs (unclustered index)

Hash Function

• Input: a field of a record; usually its key K (student id, name, …)
• Compute index function H(K)
H(K): K → A
to find the address of K*.
H(K) = A is the address of the record (or index entry) with key K

Hashing Function 1
Student id   Name    Address
0234134      John    4
0349423      Mary    3
0428421      Jean    1
1324532      Sandy   2
2374734      Randy   4

Let some digits of the key, for example the last digit of the student id, represent the location.
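
A minimal Python sketch of the last-digit scheme, using the ids from the table. The function name `last_digit_hash` is hypothetical. Grouping the names by address shows that John and Randy both land on address 4.

```python
from collections import defaultdict

def last_digit_hash(student_id: str) -> int:
    """Use the last digit of the id as the address (0-9)."""
    return int(student_id[-1])

# Ids and names copied from the table.
students = {"0234134": "John", "0349423": "Mary", "0428421": "Jean",
            "1324532": "Sandy", "2374734": "Randy"}

by_address = defaultdict(list)
for sid, name in students.items():
    by_address[last_digit_hash(sid)].append(name)

for address in sorted(by_address):
    print(address, by_address[address])
```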

Hash Function 2
• Key is student id (six digits); we have 100,000 record positions (0 – 99,999)
• H(K): student_id mod 99999
(the results fall in 0 – 99,998, so position 99,999 is never used)
085768 → 085768 mod 99999 = 85768
134281 → 134281 mod 99999 = 34282
101004 → 101004 mod 99999 = 1005

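The three mappings above can be checked with a short Python sketch. The name `mod_hash` and its default modulus are illustrative choices, not part of the slides.

```python
def mod_hash(student_id: int, m: int = 99999) -> int:
    """Address = student id modulo m."""
    return student_id % m

for sid in ("085768", "134281", "101004"):
    # int() drops the leading zero of the id before the division.
    print(sid, "->", mod_hash(int(sid)))
```
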
More Hash Functions
• Folding
– Replace the key by a numeric code (here, each letter's position in the alphabet)
• ALBERT = 01 12 02 05 18 20
– Fold and add
• 0112 + 0205 + 1820 = 2137
– Take the modulo relative to the size of the address space
• 2137 mod 101 = 16
• Midsquare: square the key and take the middle digits
– (453)^2 = 205209 → 52
• Radix transformation
– (453)_10 = (382)_11 → 382 mod 99 = 85
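
A hedged Python sketch of the three techniques. The function names are hypothetical. The folding code assumes the numeric code of a letter is its position in the alphabet (A = 01, ..., L = 12, ..., Z = 26), under which ALBERT folds to 2137 and lands on address 16.

```python
def fold_and_add(word: str, chunk: int = 4) -> int:
    """Concatenate two-digit letter positions, cut into chunks, add them."""
    digits = "".join(f"{ord(c) - ord('A') + 1:02d}" for c in word.upper())
    return sum(int(digits[i:i + chunk]) for i in range(0, len(digits), chunk))

def midsquare(key: int, width: int = 2) -> int:
    """Square the key and return the middle `width` digits."""
    squared = str(key * key)
    start = (len(squared) - width) // 2
    return int(squared[start:start + width])

def to_radix(n: int, base: int) -> int:
    """Write n in `base` and read the digit string back as a decimal number.
    Assumes every digit is below 10, as in the 453 example."""
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(str(d))
    return int("".join(reversed(digits)) or "0")

print(fold_and_add("ALBERT") % 101)
print(midsquare(453))
print(to_radix(453, 11) % 99)
```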

Hashing Function 3
• Concatenate the alphabetic positions of all letters, partition the result into equal parts, multiply each part by its position, fold and add, divide the result by the size of the address space (a prime number, here 43), and take the remainder.
Name    Letter codes      Address
John    10 15 08 14       (1015*1 + 0814*2) mod 43 = 20
Mary    13 01 18 25       (1301*1 + 1825*2) mod 43 = 6
Jean    10 05 01 14       (1005*1 + 0114*2) mod 43 = 29
Sandy   19 01 14 04 25    (1901*1 + 1404*2 + 0025*3) mod 43 = 11
Randy   18 01 14 04 25    (1801*1 + 1404*2 + 0025*3) mod 43 = 40

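The table above can be reproduced with a small Python sketch. The function name `positional_fold_hash` and the four-digit part size are assumptions read off the worked examples.

```python
def positional_fold_hash(name: str, m: int = 43, chunk: int = 4) -> int:
    """Weighted fold-and-add: part i (1-based) is multiplied by i."""
    # Two-digit alphabet position per letter, e.g. JOHN -> "10150814".
    digits = "".join(f"{ord(c) - ord('A') + 1:02d}" for c in name.upper())
    parts = [int(digits[i:i + chunk]) for i in range(0, len(digits), chunk)]
    return sum(part * pos for pos, part in enumerate(parts, start=1)) % m

for name in ("John", "Mary", "Jean", "Sandy", "Randy"):
    print(name, positional_fold_hash(name))
```
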
Hash Function Design Issues
• Key space
– The set of all possible values for keys
• Address space (N)
– The set of all storage units
– Physical location of file
• In general
– Address space must accommodate all records in the file
– Address space is usually much smaller than key space

Features of Hashing
• Randomizing
– Records are randomly spread over the whole
storage space
• Collision
– Two different keys may be hashed into the
same address (synonyms)
– To deal with it, two ways:
• choose hashing functions that reduce collisions
• rearrange the storage of records to reduce collisions

Good and Bad Functions
[Figure: three mappings of keys A to G onto addresses 1 to 10, labeled Best, Worst, and Acceptable.]

Choice of Hash Function

• Perfect hash function
– One-to-one: No synonyms
– Onto: Key space = Address space
– Not feasible for large and active files
• Desirable hashing function
– Minimize collisions
– Relatively smaller address space
• Tradeoff
– The larger the address space, the easier it is to avoid collisions
– The larger the address space, the worse the storage utilization becomes

A Hashing Function
1. Convert the key to a number (if it is not one already)
key → K
2. Compute an address from the number
address = K mod M
• Suggestion: Choose M to be a prime number (why?).
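
One common argument for a prime M, sketched in Python under an assumed key pattern: when keys share a factor with M, only a fraction of the addresses is ever used. The keys below (multiples of 4) and the helper name `addresses_used` are hypothetical.

```python
from collections import Counter

def addresses_used(keys, m):
    """Count how many keys fall on each address under K mod M."""
    return Counter(k % m for k in keys)

keys = range(0, 400, 4)  # hypothetical keys: 100 multiples of 4

print(len(addresses_used(keys, 8)))  # M = 8 shares the factor 4 with every key
print(len(addresses_used(keys, 7)))  # M = 7 is prime
```

With M = 8 the keys reach only 2 of the 8 addresses; with M = 7 they reach all 7.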

Collisions
• A key is mapped to an address that is full.
• Collision Resolution: Where to store the
overflow key?
– Static methods
• Linear probing
• Double hashing
• Separate overflow
– Dynamic methods
• Extendable hashing
• Linear hashing

Linear Probing
• For each key, generate a sequence of addresses A_0, A_1, A_2, …
A_0 = hash(key) mod M
A_{i+1} = (A_i + step) mod M

M: file size (max # of addresses)
step: a constant

Example (M = 7, step = 1)
Key           A0 = hash(key)   A1   A2   A3   A4
Mozart        1                2    3    4    5
Tchaikovsky   1                2    3    4    5
Ravel         3                4    5    6    0
Beethoven     5                6    0    1    2
Mendelssohn   5                6    0    1    2
Bach          3                4    5    6    0
Grieg         3                4    5    6    0

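A minimal Python sketch of linear probing with M = 7 and step = 1, inserting the keys in the order listed. It assumes each address holds a single key (the slides do not state the slot capacity), and the function name `probe_insert` is hypothetical.

```python
def probe_insert(table, key, a0, step=1):
    """Store key at the first free address in the sequence A0, A1, A2, ..."""
    m = len(table)
    address = a0 % m
    for _ in range(m):
        if table[address] is None:
            table[address] = key
            return address
        address = (address + step) % m  # A_{i+1} = (A_i + step) mod M
    raise RuntimeError("file is full")

# A0 values taken from the example table.
A0 = {"Mozart": 1, "Tchaikovsky": 1, "Ravel": 3, "Beethoven": 5,
      "Mendelssohn": 5, "Bach": 3, "Grieg": 3}

table = [None] * 7  # M = 7
for key, a0 in A0.items():
    probe_insert(table, key, a0)

print(table)
```

Under these assumptions the last key needs five probes (addresses 3, 4, 5, 6, 0) before it finds a free slot, which illustrates how probe sequences lengthen as the file fills.
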
Linear Probing - Problems
• Performance degradation as more rows are
added.
• Waste of space as more rows are deleted.
• These are problems for all static methods
• Solutions
– Reorganization
– Use a dynamic method

Extendable Hashing
• The address space is changed dynamically.
• The hash function is adjusted to accommodate the change.
• A common family of hash functions
– h_k(key) = h(key) mod 2^k (use the last k bits of h(key))
– At any given time a unique hash, h_k, is used

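As a small illustration in Python, taking h(key) mod 2^k is the same as keeping the last k bits. The helper name `h_k` and the 5-bit sample values are assumptions for the sketch.

```python
def h_k(h_bits: str, k: int) -> str:
    """Return h(key) mod 2^k as a k-bit string, given h(key) as a bit string."""
    value = int(h_bits, 2) % (2 ** k)
    return format(value, f"0{k}b")

print(h_k("11010", 2))  # last 2 bits
print(h_k("10001", 3))  # last 3 bits
```
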
Extendable Hashing - Example
v       h(v)
pete    11010
mary    00000
jane    11110
bill    00000
john    01001
vince   10101
karen   10111

• Location mechanism: a directory whose entries point to buckets
• The size of the directory corresponds to the currently active hash function h_k
• h_k(key) = h(key) mod 2^k
k = 2, directory size = 2^2 = 4 (use the last k = 2 bits of h(key))
• Directory entries: 00, 01, 10, 11

Example (con’t)
Next action: insert ‘sol’, where h(sol) = 10001.

Directory (h2)   Bucket
00               B0: mary, bill
01               B1: john, vince
10               B2: pete, jane
11               B3: karen

sol can’t be stored in B1, since the bucket is full.

Example (con’t)
Solution:
1. Split the overfilled bucket
2. Switch to h3 (double the directory)
h_k(key) = h(key) mod 2^k
k = 3, directory size = 2^3 = 8 (use the last k = 3 bits of h(key))
3. Update the pointers

Directory (h3)   Bucket
000, 100         B0: mary, bill
001              B1: john, sol
010, 110         B2: pete, jane
011, 111         B3: karen
101              B4: vince

current_hash = 3 (current_hash identifies the current hash function)

Example (con’t)
• Next action: insert judy, where h(judy) = 00110
• B2 overflows, but the directory need not be extended

Directory (h3)   Bucket
000, 100         B0: mary, bill
001              B1: john, sol
010, 110         B2: pete, jane
011, 111         B3: karen
101              B4: vince

current_hash = 3

Need a mechanism for deciding whether the directory has to be doubled.

Example (con’t)
Bucket   Contents     Bucket level
B0       mary, bill   2
B1       john, sol    3
B2       pete, jane   2
B3       karen        2
B4       vince        3

current_hash = 3

Add a bucket level: if current_hash > bucket_level[i], then do not enlarge the directory.

Example (con’t)
Directory (h3)   Bucket            Bucket level
000, 100         B0: mary, bill    2
001              B1: john, sol     3
010              B2: pete          3
011, 111         B3: karen         2
101              B4: vince         3
110              B5: judy, jane    3

current_hash = 3

v       h(v)
pete    11010
mary    00000
jane    11110
bill    00000
john    01001
vince   10101
karen   10111
sol     10001
judy    00110
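
The whole example can be traced with a compact Python sketch. This is an illustration under stated assumptions, not the slides' own implementation: the class and method names are hypothetical, h(v) is supplied as a lookup table, each bucket holds 2 keys, and the sketch assumes colliding keys eventually separate on some bit (keys with identical hashes, beyond bucket capacity, would split forever).

```python
class Bucket:
    def __init__(self, level):
        self.level = level  # bucket level: number of low-order bits its keys share
        self.keys = []

class ExtendableHashFile:
    def __init__(self, h, k=2, capacity=2):
        self.h = h                    # key -> integer hash value
        self.current_hash = k         # index k of the active function h_k
        self.capacity = capacity
        self.directory = [Bucket(k) for _ in range(2 ** k)]

    def lookup(self, key):
        # len(directory) == 2^k, so this takes the last k bits of h(key).
        return self.directory[self.h[key] % len(self.directory)]

    def insert(self, key):
        bucket = self.lookup(key)
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.level == self.current_hash:
            # Double the directory: entry i + 2^k shares the bucket of entry i.
            self.directory = self.directory + self.directory
            self.current_hash += 1
        # Split the overfilled bucket on its next bit and update the pointers.
        bucket.level += 1
        sibling = Bucket(bucket.level)
        high_bit = 1 << (bucket.level - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        pending, bucket.keys = bucket.keys + [key], []
        for k in pending:
            self.insert(k)

h = {"pete": 0b11010, "mary": 0b00000, "jane": 0b11110, "bill": 0b00000,
     "john": 0b01001, "vince": 0b10101, "karen": 0b10111,
     "sol": 0b10001, "judy": 0b00110}

f = ExtendableHashFile(h)
for name in h:  # inserts in the order listed, ending with sol and judy
    f.insert(name)

for i, b in enumerate(f.directory):
    print(format(i, "03b"), b.keys, "level", b.level)
```

Inserting sol doubles the directory because its bucket level equals current_hash; inserting judy only splits a bucket because that bucket's level is below current_hash.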

Hash Indices - Summary

• Range search is not supported.
– Since adjacent elements in range might hash to
different buckets
• Partial key search is not supported.
– Entire key must be provided
• But, an equality search on average takes
only 1 disk access

Indexing in Oracle
(un-clustered index)
• Create an un-clustered index on author:
CREATE TABLE book (
callno char(10),
author char(20),
title char(30),
year char(4),
PRIMARY KEY (callno)
);

CREATE INDEX authidx ON book (author);

• Result: an un-clustered dense index on author.

Indexing in Oracle
(clustered index on primary key)
• Create a clustered index on callno:
CREATE TABLE book (
callno char(10),
author char(20),
title char(30),
year char(4),
PRIMARY KEY (callno)
)
ORGANIZATION INDEX;
• This syntax allows a clustered index on the
primary key of the table only.

Indexing in Oracle
(clustered index on non-primary key columns)
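• Create a clustered index on author:
-- The cluster must exist before a table can be placed in it.
CREATE CLUSTER authcl (author char(20));

CREATE TABLE book (
callno char(10),
author char(20),
title char(30),
year char(4),
PRIMARY KEY (callno)
)
CLUSTER authcl (author);

CREATE INDEX authidx ON CLUSTER authcl;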

• An Oracle cluster may contain rows from more
than one table.

Indexing in DB2
• Create un-clustered indexes on callno and author:

CREATE INDEX callno_idx on book (callno)

CREATE INDEX auth_idx on book (author)

• Can make (only) one index clustered:

CREATE INDEX auth_idx on book (author) cluster

Data must be (preferably) sorted on clustering column(s) in the OS file.

Choosing an Index

Ex 1 SELECT E.Id
FROM Employee E
WHERE E.Salary < :upper AND E.Salary > :lower

- a range search on Salary.

- Suppose the primary key is employee id; it is likely that
there is a main, clustered index on that attribute that is
of no use for this query.
- Choose a secondary, B+ tree index with search key Salary

Choosing an Index
Ex 2 SELECT T.StudId
FROM Transcript T
WHERE T.Grade = :grade

- an equality search on Grade.

- Suppose the primary key is (StudId, Semester, CrsCode); it is
likely that there is a main, clustered index on these attributes
that is of no use for this query.
- Choose a secondary, B+ tree or hash index with search key
Grade

Choosing an Index
Ex 3 SELECT T.CrsCode, T.Grade
FROM Transcript T
WHERE T.StudId = :id AND T.Semester = 'F2000'

- Equality search on StudId and Semester.

- If the primary key is (StudId, Semester, CrsCode) it is
likely that there is a main, clustered index on this
sequence of attributes.
- If the main index is a B+ tree it can be used for this search.
- If the main index is a hash it cannot be used for this
search. Choose B+ tree or hash with search key StudId
or (StudId, Semester)

Choosing an Index
Ex 3 (con’t)
SELECT T.CrsCode, T.Grade
FROM Transcript T
WHERE T.StudId = :id AND T.Semester = 'F2000'

- Suppose Transcript has primary key (CrsCode, StudId, Semester).

Can this index be useful (independent of being hash or B+ tree)?
