0% found this document useful (0 votes)

75 views

Hash Table PDF

The document summarizes hashing techniques. Hashing maps keys to table slots using a hash function to enable fast lookups. Collisions occur when distinct keys hash to the same slot. Separate chaining resolves collisions by storing keys in linked lists at slots. Open addressing resolves collisions by probing alternate slots using techniques like linear probing, quadratic probing, and double hashing. Hashing enables fast average-case lookups but can cause clustering which degrades performance. Prime table sizes and well-designed hash functions minimize collisions.

Uploaded by

John Titor

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

75 views

Hash Table PDF

Uploaded by

John Titor

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 25

Hashing

K.Suganthi
Assistant professor(Sr)
School of Electronics Engineering
VIT,Chennai.
Hash table
• Support the following operations
– Find
– Insert
– Delete. (deletions may be unnecessary in some
applications)
• Unlike binary search tree, the following functions
cannot be done:
– Minimum and maximum
– Successor and predecessor
– Report data within a given range
– List out the data in order
Unrealistic solution
• Each position (slot) corresponds to a key in the universe of
keys
– T[k] corresponds to an element with key k
– If the set contains no element with key k, then T[k]=NULL
Unrealistic solution
• insert, delete and find all take O(1) (worst-
case) time
• Problem:
– The scheme wastes too much space if the
universe is too large compared with the actual
number of elements to be stored.
• E.g. student IDs are 8-digit integers, so the universe size
is 108, but we only have about 7000 students
Hashing

Usually, m << N.

h(Ki) = an integer in [0, …, m-1] called the hash value of Ki

Example applications
• Compilers use hash tables (symbol table) to keep track of
declared variables.

• On-line spell checkers. After prehashing the entire dictionary,

one can check each word in constant time and print out the
misspelled word in order of their appearance in the
document.

• Useful in applications when the input keys come in sorted

order. This is a bad case for binary search tree. AVL tree and
B+-tree are harder to implement and they are not necessarily
more efficient.
Hashing
• With hashing, an element of key k is stored in T[h(k)]

• h: hash function
– maps the universe U of keys into the slots of a hash table T[0,1,...,m-1]
– an element of key k hashes to slot h(k)
– h(k) is the hash value of key k
Hashing
• Problem: collision
– two keys may hash to the same slot
– can we ensure that any two distinct keys get different
cells?
• No, if |U|>m, where m is the size of the hash table
 Design a good hash function
– that is fast to compute and
– can minimize the number of collisions
 Design a method to resolve the collisions when
they occur
Hash Function
• The division method
– h(k) = k mod m
– e.g. m=12, k=100, h(k)=4
Requires only a single division operation (quite fast)

• Certain values of m should be avoided

– e.g. if m=2p, then h(k) is just the p lowest-order bits of k; the hash function
does not depend on all the bits
– Similarly, if the keys are decimal numbers, should not set m to be a power of
10

• It’s a good practice to set the table size m to be a prime number

• Good values for m: primes not too close to exact powers of 2

– e.g. the hash table is to hold 2000 numbers, and we don’t mind an average of
3 numbers being hashed to the same entry
• choose m=701
Hash Function...
• Can the keys be strings?
• Most hash functions assume that the keys are natural
numbers
– if keys are not natural numbers, a way must be found to interpret
them as natural numbers
• Method 1
– Add up the ASCII values of the characters in the string
– Problems:
• Different permutations of the same set of characters would have the
same hash value
• If the table size is large, the keys are not distribute well. e.g. Suppose
m=10007 and all the keys are eight or fewer characters long. Since ASCII
value <= 127, the hash function can only assume values between 0 and
127*8=1016
Hash Function...
• Method 2 a,…,z and space 272

– If the first 3 characters are random and the table size is 10,0007 => a
reasonably equitable distribution
– Problem
• English is not random
• Only 28 percent of the table can actually be hashed to (assuming a table size of
10,007)
• Method 3
– computes
– involves all characters in the key and be expected to distribute well


KeySize1
i 0
Key[ KeySize  i  1] * 37i
Collision Handling:
(1) Separate Chaining
• Instead of a hash table, we use a table of linked list
• keep a linked list of keys that hash to the same value

h(K) = K mod 10
Separate Chaining
• To insert a key K
– Compute h(K) to determine which list to traverse
– If T[h(K)] contains a null pointer, initiatize this entry to
point to a linked list that contains K alone.
– If T[h(K)] is a non-empty list, we add K at the
beginning of this list.

• To delete a key K
– compute h(K), then search for K within the list at
T[h(K)]. Delete K if it is found.
Separate Chaining
• Assume that we will be storing n keys. Then we should make
m the next larger prime number. If the hash function works
well, the number of keys in each linked list will be a small
constant.

• Therefore, we expect that each search, insertion, and deletion

can be done in constant time.

• Disadvantage: Memory allocation in linked list manipulation

will slow down the program.

• Advantage: deletion is easy.

Collision Handling:
(2) Open Addressing
• Open addressing:
– relocate the key K to be inserted if it collides with an existing key. That
is, we store K at an entry different from T[h(K)].

• Two issues arise

– what is the relocation scheme?
– how to search for K later?

• Three common methods for resolving a collision in open

addressing
– Linear probing
– Quadratic probing
– Double hashing
Open Addressing
• To insert a key K, compute h0(K). If T[h0(K)] is empty, insert it
there. If collision occurs, probe alternative cell h1(K), h2(K), ....
until an empty cell is found.

• hi(K) = (hash(K) + f(i)) mod m, with f(0) = 0

– f: collision resolution strategy
Linear Probing
• f(i) =i
– cells are probed sequentially (with wraparound)
– hi(K) = (hash(K) + i) mod m

• Insertion:
– Let K be the new key to be inserted. We compute hash(K)

– For i = 0 to m-1
• compute L = ( hash(K) + I ) mod m
• T[L] is empty, then we put K there and stop.

– If we cannot find an empty entry to put K, it means that the table

is full and we should report an error.
Linear Probing
• hi(K) = (hash(K) + i) mod m
• E.g, inserting keys 89, 18, 49, 58, 69 with hash(K)=K mod 10

To insert 58, probe

T[8], T[9], T[0], T[1]

To insert 69, probe

T[9], T[0], T[1], T[2]
Primary Clustering
• We call a block of contiguously occupied table entries a cluster

• On the average, when we insert a new key K, we may hit the middle of a cluster.
Therefore, the time to insert K would be proportional to half the size of a cluster.
That is, the larger the cluster, the slower the performance.

• Linear probing has the following disadvantages:

– Once h(K) falls into a cluster, this cluster will definitely grow in size by one. Thus, this
may worsen the performance of insertion in the future.

– If two cluster are only separated by one entry, then inserting one key into a cluster can
merge the two clusters together. Thus, the cluster size can increase drastically by a
single insertion. This means that the performance of insertion can deteriorate
drastically after a single insertion.

– Large clusters are easy targets for collisions.

Quadratic Probing
• f(i) = i2
• hi(K) = ( hash(K) + i2 ) mod m
• E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10

To insert 58, probe

T[8], T[9], T[(8+4)
mod 10]

To insert 69, probe

T[9], T[(9+1) mod
10], T[(9+4) mod 10]
Quadratic Probing
• Two keys with different home positions will have different probe sequences
– e.g. m=101, h(k1)=30, h(k2)=29
– probe sequence for k1: 30,30+1, 30+4, 30+9
– probe sequence for k2: 29, 29+1, 29+4, 29+9

• If the table size is prime, then a new key can always be inserted if the table is
at least half empty (see proof in text book)

• Secondary clustering
– Keys that hash to the same home position will probe the same alternative cells
– Simulation results suggest that it generally causes less than an extra half probe per
search
– To avoid secondary clustering, the probe sequence need to be a function of the
original key value, not the home position
Double Hashing
• To alleviate the problem of clustering, the sequence of probes
for a key should be independent of its primary position => use
two hash functions: hash() and hash2()

• f(i) = i * hash2(K)
– E.g. hash2(K) = R - (K mod R), with R is a prime smaller than m
Double Hashing
• hi(K) = ( hash(K) + f(i) ) mod m; hash(K) = K mod m
• f(i) = i * hash2(K); hash2(K) = R - (K mod R),
• Example: m=10, R = 7 and insert keys 89, 18, 49, 58, 69
To insert 49,
hash2(49)=7, 2nd
probe is T[(9+7) mod
10]

To insert 58,
hash2(58)=5, 2nd
probe is T[(8+5) mod
10]

To insert 69,
hash2(69)=1, 2nd
probe is T[(9+1) mod
10]
Choice of hash2()
• Hash2() must never evaluate to zero

• For any key K, hash2(K) must be relatively prime to the table size m.
Otherwise, we will only be able to examine a fraction of the table entries.
– E.g.,if hash(K) = 0 and hash2(K) = m/2, then we can only examine the entries
T[0], T[m/2], and nothing else!

• One solution is to make m prime, and choose R to be a prime smaller than

m, and set
hash2(K) = R – (K mod R)

• Quadratic probing, however, does not require the use of a second hash
function
– likely to be simpler and faster in practice
Deletion in open addressing
• Actual deletion cannot be performed in open addressing hash
tables
– otherwise this will isolate records further down the probe sequence

• Solution: Add an extra bit to each table entry, and mark a

deleted slot by storing a special value DELETED (tombstone)

1995 DS Gas Electric 101805301
No ratings yet
1995 DS Gas Electric 101805301
72 pages
WEC
No ratings yet
WEC
50 pages
11-Hashing-Hong Kong (1)
No ratings yet
11-Hashing-Hong Kong (1)
25 pages
11 Hashing
No ratings yet
11 Hashing
60 pages
3 Hashing
No ratings yet
3 Hashing
20 pages
Cse373 10 Hashing
No ratings yet
Cse373 10 Hashing
36 pages
Hashing
No ratings yet
Hashing
20 pages
Hashing PDF
No ratings yet
Hashing PDF
65 pages
Cs 218 - Data Structures: Hashing
No ratings yet
Cs 218 - Data Structures: Hashing
18 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
27 pages
Hashing Updated
No ratings yet
Hashing Updated
26 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
26 pages
Hashing
No ratings yet
Hashing
35 pages
Hashing
No ratings yet
Hashing
35 pages
Hashing PPT
No ratings yet
Hashing PPT
39 pages
Lecture 13 - Hash Tables
No ratings yet
Lecture 13 - Hash Tables
51 pages
Lecture 27 - Hashing
No ratings yet
Lecture 27 - Hashing
48 pages
unit 1 Hashing
No ratings yet
unit 1 Hashing
61 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
32 pages
Hashing PPT For Student
No ratings yet
Hashing PPT For Student
53 pages
Hash Table v2
No ratings yet
Hash Table v2
34 pages
Hashing
No ratings yet
Hashing
10 pages
Study_Material_on_Hashing
No ratings yet
Study_Material_on_Hashing
4 pages
Hashing - Datastructures and Algorithms
No ratings yet
Hashing - Datastructures and Algorithms
32 pages
ADS M TECH MID 2
No ratings yet
ADS M TECH MID 2
26 pages
Lect Hashing
No ratings yet
Lect Hashing
36 pages
CH 4
No ratings yet
CH 4
58 pages
Hashing PDF
No ratings yet
Hashing PDF
56 pages
Hashing
No ratings yet
Hashing
56 pages
Module 5
No ratings yet
Module 5
22 pages
Hashing Algorithms
No ratings yet
Hashing Algorithms
22 pages
Hashing new
No ratings yet
Hashing new
48 pages
Hashing and Indexing
No ratings yet
Hashing and Indexing
28 pages
HASHING
No ratings yet
HASHING
63 pages
Hash Tables
No ratings yet
Hash Tables
21 pages
Chapter10_HashTables
No ratings yet
Chapter10_HashTables
49 pages
Struktur Data: By: Sri Rezeki Candra Nursari
No ratings yet
Struktur Data: By: Sri Rezeki Candra Nursari
34 pages
Search vs. Hashing
No ratings yet
Search vs. Hashing
55 pages
DSA LABTASK 12
No ratings yet
DSA LABTASK 12
5 pages
Hashing
No ratings yet
Hashing
30 pages
MODULE-5
No ratings yet
MODULE-5
33 pages
Group 15 Hash Tables
No ratings yet
Group 15 Hash Tables
42 pages
Hash Table: Didih Rizki Chandranegara
No ratings yet
Hash Table: Didih Rizki Chandranegara
33 pages
15 HashTables
No ratings yet
15 HashTables
27 pages
CS2040 Summary
No ratings yet
CS2040 Summary
16 pages
Theory PDF
No ratings yet
Theory PDF
18 pages
Hashing: Amar Jukuntla
No ratings yet
Hashing: Amar Jukuntla
22 pages
Hashing Techniques
No ratings yet
Hashing Techniques
13 pages
Chapter One - Hashing PDF
No ratings yet
Chapter One - Hashing PDF
30 pages
Chapter 28 Hashing: Hash Table. The Function That Maps A Key To An Index in The Hash Table Is
No ratings yet
Chapter 28 Hashing: Hash Table. The Function That Maps A Key To An Index in The Hash Table Is
4 pages
Dsa 4
No ratings yet
Dsa 4
55 pages
Collision
No ratings yet
Collision
24 pages
05 Hashing
No ratings yet
05 Hashing
47 pages
09 Hashtable
No ratings yet
09 Hashtable
53 pages
Hash Functions
No ratings yet
Hash Functions
60 pages
Dsa Lecture 13 Hash Tables
No ratings yet
Dsa Lecture 13 Hash Tables
15 pages
Lab 09 - Hashing
No ratings yet
Lab 09 - Hashing
47 pages
AST20105 Data Structure and Algorithms: Chapter 9 - Hash Table
No ratings yet
AST20105 Data Structure and Algorithms: Chapter 9 - Hash Table
39 pages
Topic 1: Hashing - Introduction: Hashing Is A Method of Storing and Retrieving Data From A Database Efficiently
No ratings yet
Topic 1: Hashing - Introduction: Hashing Is A Method of Storing and Retrieving Data From A Database Efficiently
31 pages
L04 Hashing
No ratings yet
L04 Hashing
63 pages
The Green Book of Mathematical Problems
From Everand
The Green Book of Mathematical Problems
Kenneth Hardy
4.5/5 (3)
An Introduction to Linear Algebra and Tensors
From Everand
An Introduction to Linear Algebra and Tensors
M. A. Akivis
1/5 (1)
Digital Empowerment Project (1)
No ratings yet
Digital Empowerment Project (1)
3 pages
4 Lab Izot Test
No ratings yet
4 Lab Izot Test
11 pages
Interview Preparation
No ratings yet
Interview Preparation
4 pages
Impact of Technology On Strategy
No ratings yet
Impact of Technology On Strategy
37 pages
Jake S Resume-2
No ratings yet
Jake S Resume-2
1 page
Future of SEO Updated
No ratings yet
Future of SEO Updated
2 pages
The Open Orthopaedics Journal: A Review of Handgrip Strength and Its Role As A Herald of Health
No ratings yet
The Open Orthopaedics Journal: A Review of Handgrip Strength and Its Role As A Herald of Health
6 pages
Atul Kumar Anuragi
No ratings yet
Atul Kumar Anuragi
2 pages
silsN@20242023-11-081699443130Reading Non-Fiction Fizzy Reaction
No ratings yet
silsN@20242023-11-081699443130Reading Non-Fiction Fizzy Reaction
5 pages
Main Alloys PDF
No ratings yet
Main Alloys PDF
6 pages
Amphenol Musician Range PDF
No ratings yet
Amphenol Musician Range PDF
8 pages
Peel Paper
No ratings yet
Peel Paper
2 pages
DS8000快速維修手冊
No ratings yet
DS8000快速維修手冊
78 pages
Risk Register Template
No ratings yet
Risk Register Template
10 pages
7. Thermal Properties of Matter 11th
No ratings yet
7. Thermal Properties of Matter 11th
62 pages
WLP Q4 English WK1
No ratings yet
WLP Q4 English WK1
7 pages
Paper Hybrid_work_OSH_en_0
No ratings yet
Paper Hybrid_work_OSH_en_0
27 pages
Xt3su3225dff000xxx Acc
No ratings yet
Xt3su3225dff000xxx Acc
57 pages
Get Clock Wise Workbook
100% (1)
Get Clock Wise Workbook
25 pages
Launch Vodafone in Pakistan Strategies
No ratings yet
Launch Vodafone in Pakistan Strategies
45 pages
CV Norsaiful
No ratings yet
CV Norsaiful
6 pages
Fabrication and Study of Laboratory Scale Rainfall Simulator For Soil Erosion Assessment
No ratings yet
Fabrication and Study of Laboratory Scale Rainfall Simulator For Soil Erosion Assessment
5 pages
IGNOU MCA MCS-031 Solved Assignment 2010
No ratings yet
IGNOU MCA MCS-031 Solved Assignment 2010
13 pages
Why Homosexuality Is Prohibited in Islam
No ratings yet
Why Homosexuality Is Prohibited in Islam
260 pages
New MRP LIST
No ratings yet
New MRP LIST
67 pages
The Canterbury Tales: General Prologue & Frame Story Theme of Literature and Writing
No ratings yet
The Canterbury Tales: General Prologue & Frame Story Theme of Literature and Writing
9 pages
R2V4PX310R: Electrical Specifications
No ratings yet
R2V4PX310R: Electrical Specifications
6 pages
Samsung Icr18650 22fm
No ratings yet
Samsung Icr18650 22fm
17 pages

Hash Table PDF

Uploaded by

Hash Table PDF

Uploaded by

Hashing

h(Ki) = an integer in [0, …, m-1] called the hash value of Ki

• On-line spell checkers. After prehashing the entire dictionary,

• Useful in applications when the input keys come in sorted

• Certain values of m should be avoided

• It’s a good practice to set the table size m to be a prime number

• Good values for m: primes not too close to exact powers of 2

• Therefore, we expect that each search, insertion, and deletion

• Disadvantage: Memory allocation in linked list manipulation

• Advantage: deletion is easy.

• Two issues arise

• Three common methods for resolving a collision in open

• hi(K) = (hash(K) + f(i)) mod m, with f(0) = 0

– If we cannot find an empty entry to put K, it means that the table

To insert 58, probe

To insert 69, probe

• Linear probing has the following disadvantages:

– Large clusters are easy targets for collisions.

To insert 58, probe

To insert 69, probe

• One solution is to make m prime, and choose R to be a prime smaller than

• Solution: Add an extra bit to each table entry, and mark a

You might also like