19hashing

This lecture covers hashing, including hash tables, hash functions, and collision resolution. It emphasizes the efficiency of hashing for data storage and retrieval, achieving O(1) time complexity for operations. Additionally, it discusses the importance of understanding AVL trees and their balancing mechanisms in the context of data structures.


Data Structures

<Lecture 19: Hashing>

Sungmin Cha
New York University

11.13.2024
Outline
• Notice

• Review the previous lecture

• Hashing
– Hash Table
– Hash Function
– Collision Resolution

Notice
• Final exam
– Date: Dec 16th from 14:00 to 15:30
– Location: 60 Fifth Ave 110
– The exam scope covers all topics
▪ Questions in the final exam may include content related to topics
learned before the midterm (e.g., Linked List, Time Complexity,
Sorting, etc.)
▪ However, questions directly asking about contents learned before
the midterm will not be included
▪ Main topics covered in the final exam: Trees (BST and AVL Tree), Hash Table, Graph

Notice
• Changed Schedule

– Lecture 25 will be a pre-recorded video lecture


– HW4 is about Hash and Graph
▪ Implementation of Hash
▪ Solving problems using Graph algorithms
o 3-4 problems?

Notice
• Questions about HW3 (BST) and midterm exam grading
– I am checking them now, and I will post an announcement about them on Ed Discussion this week.

AVL Tree
• Repairing a tree
– So far, we have learned about cases of imbalance in subtree t
▪ And how to resolve it using rotations
– Sometimes, resolving the imbalance of a subtree t can lead to a new imbalance in the parent tree of t

AVL Tree
• Example: repairing a tree
– 1) Balanced tree
[Figure: a balanced AVL tree rooted at 60; the full node layout did not survive extraction]

AVL Tree
• Example: repairing a tree
– 2) Call delete(9)
[Figure: the same tree rooted at 60, with the node 9 about to be deleted]

AVL Tree
• Example: repairing a tree
– 2) Imbalance in the subtree of the node with 20
[Figure: the tree after delete(9); the subtree rooted at 20 is unbalanced and is labeled "Left rotation"]

AVL Tree
• Example: repairing a tree
– 3) Apply left rotation to the subtree
[Figure: the tree after the left rotation; 23 is now the root of the repaired subtree, with 20 as its left child, and the subtree is labeled "Balanced!"]

AVL Tree
• Example: repairing a tree
– 4) Determine whether there is an imbalance in the upper subtree
[Figure: the same tree with a "Left rotation" label near the top, marking an upper subtree that is checked next]

AVL Tree
• The impact of recursive implementation of insert() and delete()
– After inserting or deleting a specific node, check for imbalance in the subtree where the node is located
▪ And perform repairing if needed
– As the recursive function calls return, sequentially check for imbalance in the parent node’s subtree and proceed with repairing
– Finally, check up to the root node of the entire tree and return the root node

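The repair-on-return behavior described above can be sketched in Python. This is a minimal illustration, not the course's implementation: the names Node, repair, rotate_left, rotate_right, and insert are assumptions, and only insert() is shown. A delete() written the same way would call repair() on each node as its recursion returns.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1          # height of the subtree rooted here


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance(node):
    # Positive: left-heavy. Negative: right-heavy.
    return height(node.left) - height(node.right)


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x                     # x is the new root of this subtree


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y                     # y is the new root of this subtree


def repair(node):
    """Recompute the height and rotate if this subtree is unbalanced."""
    update(node)
    if balance(node) > 1:                    # left-heavy
        if balance(node.left) < 0:           # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance(node) < -1:                   # right-heavy
        if balance(node.right) > 0:          # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    # Runs once per ancestor as the recursive calls return,
    # so the check proceeds from the new node up to the root.
    return repair(node)


root = None
for k in range(1, 8):            # sorted input would degrade a plain BST
    root = insert(root, k)
print(root.key, root.height)     # 4 3
```

Because insert() returns the (possibly new) root of each subtree, the caller must always reassign, as in `root = insert(root, k)`.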
Towards the Most Efficient Data Structure

• Array and Linked List
– Array List
– Linked List
– Time complexity of search, insert and delete
▪ In any case, at least one operation is O(n)

Towards the Most Efficient Data Structure

• Binary Search Tree
– Time complexity of search, insert and delete
▪ In the average and best case, the time complexity is O(log n), while in the worst case, it is O(n)

Towards the Most Efficient Data Structure

• AVL Tree
– Time complexity of search, insert and delete
▪ In all cases, the time complexity is O(log n)

Towards the Most Efficient Data Structure

• Improvement of time complexity
– For insert, delete and search operations

Linked and Array List             BST                       AVL Tree
At least one operation is O(n)    Average case: O(log n)    For all cases: O(log n)
                                  Worst case: O(n)

Is it possible to consider faster data structures?
Can operations be performed in constant average time (O(1)) regardless of the amount of stored data?

Towards the Most Efficient Data Structure

• Think different - the core limitation of BST
– Why can’t BST avoid an average time complexity of O(log n)?
▪ It is based on key comparisons between the given key and the keys in the tree
▪ Because it compares keys to find their positions, it cannot have a time complexity better than O(log n)

Towards the Most Efficient Data Structure

• Think different – Hashing
– Can we determine the position directly based on the key?

[Figure: a key (12) with its item (a) is passed to an algorithm or function, which returns position = 3; the pair (12, a) is stored at position 3 of a table with positions 0 to 4]

– This is the key idea of Hashing!

Hashing

• Hashing
– A data structure where the position of a key is determined by
the key’s value
– In other words, the goal is to find the position for storing a
key based on its value without comparing it with the stored
keys
▪ It also aims to compute this position in a single step, in O(1) time

Hashing

• Components for Hashing


– Hash table and function

Key (12) → Hash function → hash value = 3

Hash table
Hash value   Key
0
1
2
3            12
4

– Hash table
: A table capable of storing m keys. Each slot has a hash value ranging from 0 to m-1
– Hash function
: Receives an arbitrary key and returns one of the hash values

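The two components can be sketched in Python as follows. The lecture does not specify a hash function at this point, so hash_function below is a stand-in: it assumes Python's built-in hash() reduced modulo m, which only illustrates the contract "any key in, one of 0 to m-1 out".

```python
def hash_function(key, m: int) -> int:
    """Map an arbitrary hashable key to one of the hash values 0..m-1.

    Assumption: built-in hash() plus a modulo stands in for the
    lecture's (unspecified) hash function.
    """
    return hash(key) % m


m = 5
table = [None] * m                      # hash table: slots 0..m-1

for key in (12, "apple", 3.5):
    slot = hash_function(key, m)        # always in range(m)
    table[slot] = key
    print(repr(key), "->", slot)
```

The printed slots for strings vary between runs because Python randomizes string hashing per process. What stays fixed is that every result falls in the range 0 to m-1.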
Hashing

Key (x)

Hash table
Hash value   Key
0
1
2
3
4

Hashing

• Hashing example
– 1) Insert a key value of 1

Key (1): h(1) = 1

Hash table
Hash value   Key
0
1            1
2
3
4

▪ Time complexity: O(1)

Hashing

• Hashing example
– 2) Insert a key value of 15

Key (15): h(15) = 0

Hash table
Hash value   Key
0            15
1            1
2
3
4

▪ Time complexity: O(1)

Hashing

• Hashing example
– 3) Insert a key value of 24

Key (24): h(24) = 4

Hash table
Hash value   Key
0            15
1            1
2
3
4            24

▪ Time complexity: O(1)

Hashing

• Hashing example
– 4) Search for a key value of 15

Key (15): h(15) = 0

Hash table
Hash value   Key
0            15
1            1
2
3
4            24

▪ Time complexity: O(1)

Hashing

• Hashing example
– 5) Delete a key value of 1

Key (1): h(1) = 1

Hash table
Hash value   Key
0            15
1            1
2
3
4            24

▪ Time complexity: O(1)

Because the hash function calculates the location for searching, storing, or deleting a key in one step, a time complexity of O(1) is achievable.

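Steps 1 to 5 can be replayed with a short Python sketch. The slides do not state the formula for h here, so the code assumes h(key) = key mod 5, which reproduces the values shown (h(1) = 1, h(15) = 0, h(24) = 4). The function names are illustrative, and collisions are deliberately ignored at this stage.

```python
M = 5                       # five slots, as in the slides
table = [None] * M


def h(key: int) -> int:
    # Assumed hash function: key mod 5 matches the slide values.
    return key % M


def insert(key: int) -> None:
    table[h(key)] = key     # one hash computation, one write: O(1)


def search(key: int) -> bool:
    return table[h(key)] == key     # one hash computation, one read: O(1)


def delete(key: int) -> None:
    if table[h(key)] == key:
        table[h(key)] = None        # one hash computation, one write: O(1)


insert(1)
insert(15)
insert(24)
print(table)        # [15, 1, None, None, 24]
print(search(15))   # True
delete(1)
print(table)        # [15, None, None, None, 24]
```

None of the three operations loops over the table, so their cost does not depend on how many keys are stored.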
Hashing

• Hashing example – hash collision


– 6) Insert a key value of 46

Key (46): h(46) = 1

Hash table
Hash value   Key
0            15
1            46
2
3
4            24

Hashing

• Hashing example - hash collision


– 7) Insert a key value of 91

Key (91): h(91) = 1

Hash table
Hash value   Key
0            15
1            46
2
3
4            24

▪ Because the slot with hash value 1 already holds 46, the key 91 cannot be inserted into the slot its hash value points to. This is a hash collision.

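A small Python sketch of steps 6 and 7 makes the collision visible. It starts from the table state after steps 1 to 5 and again assumes h(key) = key mod 5, which matches h(46) = 1 and h(91) = 1. The helper try_insert is a hypothetical name. It only detects the collision and does not resolve it.

```python
M = 5
table = [15, None, None, None, 24]   # state after steps 1-5 in the slides


def h(key: int) -> int:
    # Assumed hash function: key mod 5 matches the slide values.
    return key % M


def try_insert(key: int) -> bool:
    """Store key if its slot is free; return False on a hash collision."""
    slot = h(key)
    if table[slot] is not None and table[slot] != key:
        return False                 # a different key already lives here
    table[slot] = key
    return True


print(try_insert(46))   # True: slot 1 is empty after delete(1)
print(try_insert(91))   # False: slot 1 already holds 46
print(table)            # [15, 46, None, None, 24]
```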
Hashing

• Three factors that can cause a hash collision


– 1) Limited hash value range
Key (91): h(91) = 1

Hash table
Hash value   Key
0            15
1            46
2
3
4            24

▪ When the hash function maps a large number of keys into a smaller range of hash values, collisions are more likely due to multiple keys being assigned to the same hash value.

Hashing

• Three factors that can cause a hash collision


– 2) High load factor
Key (28): h(28) = 3

Hash table
Hash value   Key
0            15
1            46
2
3            28
4            24

Load factor = 4/5

▪ As the number of keys stored in the hash table approaches or exceeds the number of available slots (that is, as the load factor grows), collisions become more frequent
o Load factor = number of stored keys / size of the table

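The load factor of the table on this slide can be checked with a few lines of Python. The function name load_factor is an assumption for illustration, and the table is represented as a list with None marking an empty slot.

```python
def load_factor(table) -> float:
    """Number of stored keys divided by the size of the table."""
    stored = sum(1 for slot in table if slot is not None)
    return stored / len(table)


table = [15, 46, None, 28, 24]   # the table shown on the slide
print(load_factor(table))        # 0.8, i.e. 4/5
```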
Hashing

• Three factors that can cause a hash collision


– 3) Poor hash function design
Key (91): h(91) = 1

Hash table
Hash value   Key
0            15
1            46
2                    (available slot)
3                    (available slot)
4            24


▪ A hash function that doesn't distribute keys evenly across the hash table
can lead to clusters of keys in certain slots
▪ Therefore, the goal of hash function design is to evenly distribute
input keys across the entire hash table
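The effect of an uneven hash function can be seen by counting how many keys land in each slot. The sketch below is illustrative only: poor_hash is a made-up function that only ever produces the values 0 and 1, and it is compared with key mod m on the keys 0 to 19.

```python
def slot_counts(hash_fn, keys, m):
    """Count how many of the given keys fall into each of the m slots."""
    counts = [0] * m
    for key in keys:
        counts[hash_fn(key)] += 1
    return counts


M = 5


def poor_hash(key: int) -> int:
    # Hypothetical poor design: uses only slots 0 and 1 of the table.
    return key % 2


def even_hash(key: int) -> int:
    # Spreads consecutive integer keys over all M slots.
    return key % M


keys = range(20)
print(slot_counts(poor_hash, keys, M))   # [10, 10, 0, 0, 0]
print(slot_counts(even_hash, keys, M))   # [4, 4, 4, 4, 4]
```

With the poor function, three of the five slots stay empty while the other two each receive ten keys, so collisions occur even though the table has room.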
Hashing

• Three factors that can cause a hash collision


– 1) Limited hash value range
– 2) High load factor
▪ Solution: collision resolution
– 3) Poor hash function design
▪ Solution: consider a more efficient hash function

– Hash collisions can prevent operations from achieving a time complexity of O(1)
▪ Therefore, preventing them becomes a fundamental concept in hashing

• Key concepts in hashing


– 1) Hash function: division and multiplication method
– 2) Collision resolution
▪ Chaining
▪ Open Addressing
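The slide only names the two collision resolution strategies. As a rough preview, chaining can be sketched in Python as below, under the assumption that each slot holds a list of all keys sharing that hash value and that h(key) = key mod 5 as in the earlier examples. The function names are illustrative, and open addressing is not shown.

```python
M = 5
buckets = [[] for _ in range(M)]     # one chain (list) per hash value


def h(key: int) -> int:
    # Assumed hash function, consistent with the earlier slide examples.
    return key % M


def insert(key: int) -> None:
    chain = buckets[h(key)]
    if key not in chain:             # avoid storing duplicates
        chain.append(key)


def search(key: int) -> bool:
    return key in buckets[h(key)]    # scans only one chain


def delete(key: int) -> None:
    chain = buckets[h(key)]
    if key in chain:
        chain.remove(key)


for key in (15, 46, 24, 91):
    insert(key)
print(buckets)        # [[15], [46, 91], [], [], [24]]
print(search(91))     # True
```

Here 46 and 91 share slot 1 without either being rejected. The cost of an operation now depends on the length of one chain, which stays short when the load factor is low and the hash function spreads keys evenly.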
Hashing

• ADT of Hashing

– Table[] can be an array list or a linked list
– The search(), insert(), and delete() operations in hashing are almost identical to those in arrays or linked lists.
▪ However, we need to consider the hash function and collision resolution

Hash Function

• The goal of designing a hash function


– The input keys should be evenly distributed across the entire
hash table for storage

• Representative hash functions


– Division method
– Multiplication method

Hash Function

Key (46): h(46) = 1

Hash table
Hash value   Key
0            15
1            46
2
3
4            24

Hash Function

• Division method
– Pros
▪ The most basic hash function
▪ Simple yet allows for fast computation

– Cons
▪ Excess space can be left relatively unused, leading to memory inefficiency

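The division method is commonly written as h(key) = key mod m, where m is the table size. The slide's own formula did not survive extraction, so treat this form as an assumption. It is consistent with the example values h(15) = 0, h(46) = 1, and h(24) = 4 for m = 5. A minimal Python sketch, with h_division as an illustrative name:

```python
def h_division(key: int, m: int) -> int:
    """Division method: the remainder of key divided by the table size m."""
    return key % m


for key in (15, 46, 24):
    print(f"h({key}) = {h_division(key, 5)}")
# h(15) = 0
# h(46) = 1
# h(24) = 4
```

The whole computation is a single remainder operation, which is why the method is fast.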
Hash Function

• Multiplication method
– Example
▪ m = 65,536 and A = 0.6180339887
▪ Key = 1,025,390 is given
▪ Key × A = 1,025,390 × 0.6180339887 = 633,725.871673093
▪ Fractional part × m = 0.871673093 × 65,536 ≈ 57,125.97, which is rounded down to 57,125
▪ h(1,025,390) = 57,125, so the key is stored in slot 57,125 of a hash table whose slots run from 0 to 65,535

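The arithmetic in this example follows the usual statement of the multiplication method, h(key) = floor(m × (key × A mod 1)). That formula is an assumption here, since the slide that defined it did not survive extraction, but the example's numbers are consistent with it. A short Python check, with h_multiplication as an illustrative name:

```python
import math


def h_multiplication(key: int, m: int, a: float) -> int:
    """Multiplication method: scale the fractional part of key * a by m."""
    fractional = (key * a) % 1.0      # keep only the part after the decimal point
    return math.floor(m * fractional)


print(h_multiplication(1_025_390, 65_536, 0.6180339887))   # 57125
```

The result always falls in the range 0 to m-1 because the fractional part is at least 0 and strictly less than 1.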
Hash Function

• Multiplication method
– Pros
▪ Distribute hash values evenly
▪ Simple Implementation
▪ Efficient performance
: when the hash table size is appropriately chosen, operations can often be done in constant time, O(1), on average

– Cons
▪ Sensitive to constant value (A)
: if it is poorly chosen, it can lead to increased collisions
▪ Limited flexibility
: it may not be as flexible or adaptive to changing data distributions or
hash table sizes
Concluding Remarks

• Hashing
– The motivation of hashing
– Hash table and function
– Hash collision

Thank you!

E-mail: [email protected]
