Unit2 Hashing DSA

The document outlines the syllabus for an Advanced Data Structures and Algorithms course, focusing on hashing techniques. It covers objectives, outcomes, and various hashing methods, including hash functions and collision resolution techniques. Additionally, it discusses the advantages of hashing, its applications in real-world scenarios, and the importance of efficient data retrieval.

Department Name: Computer Engineering

Course Name: Advanced Data Structures and Algorithms

Course Code: C0PCC403

Name of faculty member: N. S. Patil

Unit 2: Hashing: Syllabus
Unit 2: Hashing: Objective and outcome

OBJECTIVE:

• To introduce advanced data structures such as hash tables and skip lists to solve complex problems in various domains.
• To explain various hash functions.
• To introduce various collision resolution techniques.

OUTCOME:

• Identify and articulate the complexity goals and benefits of a good hashing scheme for real-world applications.
• Analyze algorithmic solutions for resource requirements and optimization.
Need of hashing
Suppose we want to design a system for storing student records and we
want to perform the following operations efficiently:
• Insert, search and delete operations on the basis of student Id.

class Studentinfo
{
    long Id;          // Unique student Id
    String name;      // Student name
    String className; // Student class ("class" is a reserved word in Java, so it cannot be a field name)
}

Possible data structure options and their respective time complexity

• A sorted array takes O(log n) time to search if binary search is used, but insertion and deletion take O(n) time because elements have to be shifted.
• A linked list takes O(n) time to search.

Is there an alternative that gives O(1) access time?


• Hashing is a technique used to uniquely identify a specific object from a group of similar objects.
• Some examples of how hashing is used in our lives include:
  • In schools and colleges, each student is assigned a unique roll number that can be used to retrieve information about them.
  • In libraries, each book is assigned a unique number that can be used to determine information about the book, such as its exact position in the library or the users it has been issued to.
• In both these examples, the students and books were hashed to a unique number.

• Assume that you have an object and you want to assign a key to it to make
searching easy. To store the key/value pair, you can use a simple array-like
data structure where the keys (integers) are used directly as an index to store
values. However, when the keys are large and cannot be used directly
as an index, you should use hashing.

• In hashing, large keys are converted into small keys by using hash functions.
• The values are then stored in a data structure called a hash table.
• The idea of hashing is to distribute entries (key/value pairs) uniformly across an array.
• Each element is assigned a key (the converted key).
• Using that key, you can access the element in O(1) time on average. From the key, the hash function computes an index that suggests where an entry can be found or inserted.

• Hashing is implemented in two steps:
  • The key of an element is converted into an integer by using a hash function. This integer, reduced to the table size, is used as the index at which the original element is stored in the hash table.
  • The element is stored in the hash table, where it can be quickly retrieved using the hashed key.
• hash = hashfunc(key)
  index = hash % array_size
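
The two steps above can be sketched in Java as follows. This is a minimal illustration, not the course's reference code: the class name `HashIndexSketch`, the method names and the sample key are hypothetical, and Java's built-in `String.hashCode()` stands in for `hashfunc`.

```java
class HashIndexSketch {
    // Step 1: turn a key into an integer hash code.
    // String.hashCode() is used here as a stand-in for "hashfunc".
    static int hashFunc(String key) {
        return key.hashCode();
    }

    // Step 2: reduce the hash code to a valid array index.
    // Math.floorMod keeps the result in [0, arraySize) even if the hash code is negative.
    static int indexFor(String key, int arraySize) {
        return Math.floorMod(hashFunc(key), arraySize);
    }

    public static void main(String[] args) {
        String[] table = new String[16];
        String key = "S1024"; // hypothetical student Id
        table[indexFor(key, table.length)] = "record for " + key;
        System.out.println("stored at index " + indexFor(key, table.length));
    }
}
```

Plain `hash % array_size` can produce a negative index in Java when the hash code is negative, which is why the sketch uses `Math.floorMod`.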

• Advantages
• The main advantage of hash tables over other table data structures is speed.
• This advantage is more apparent when the number of entries is large
(thousands or more).
• Hash tables are particularly efficient when the maximum number of entries can
be predicted in advance, so that the bucket array can be allocated once with the
optimum size and never resized.

Hashing
Hashing is the process of indexing and retrieving elements in a data structure so that an element can be found quickly using its hash key.

(With hashing we get O(1) search time on average, under reasonable assumptions, and O(n) in the worst case.)
• A hash function is any function that maps data of arbitrary size to values of a fixed size, which are used as positions in the hash table.
• The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

Basic concepts: hash table
• A hash table is a data structure used for storing and retrieving data quickly.
• All data is inserted into the hash table based on the hash key value, which maps the data to an index in the hash table.
Other basic concepts of hashing
• Bucket: A bucket in a hash file is a unit of storage (typically a disk block) that can hold one or more records. The hash table consists of b buckets and each bucket consists of s slots (usually s = 1).

• Collision: A collision is the situation in which the hash function returns the same address for more than one record.

• Probe: Each alternative location examined after a collision occurs. The list of such locations is the probe sequence.

• Synonyms: Keys that hash to the same location are called synonyms.

• Overflow: An overflow occurs when we hash a new identifier into a bucket that is already full. When s = 1, every collision is also an overflow.

• Perfect hash function: A function that maps distinct keys into the hash table with no collisions.

• Load factor (load density) of a hash table:
  load factor = n / m
  n = number of elements stored in the table
  m = size of the table
Hash Functions
• A good hash function should:
  · Minimize collisions.
  · Be easy and quick to compute.
  · Distribute key values evenly in the hash table.
  · Use all the information provided in the key.

Division Method
Idea:
• Computes the hash value from the key using the % operator.
• Map a key k into one of the m slots by taking the remainder of k divided by m:
  h(k) = k mod m

Example: k = 1276, m = 10
  h(1276) = 1276 mod 10 = 6

• Advantage: fast, requires only one operation.

• Disadvantage: certain values of m are not good choices, e.g.:
  • Powers of 2: a table size such as 32 or 1024 should be avoided. If m = 2^p, then k mod m is just the lowest p bits of k, so keys that differ only in their higher bits collide.
  • Non-prime numbers: a prime number is generally the best choice because it spreads keys more evenly. Prime numbers not close to powers of 2 are better table sizes.
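
A small sketch of the division method may help here. It reproduces the example above and then compares a power-of-2 table size with a prime one; the class name `DivisionHash` and the four sample keys are assumptions chosen for illustration.

```java
class DivisionHash {
    // h(k) = k mod m; assumes k >= 0 and m > 0.
    static int hash(int k, int m) {
        return k % m;
    }

    public static void main(String[] args) {
        System.out.println(hash(1276, 10)); // the example above

        // Sample keys whose lowest 3 bits are all zero.
        int[] keys = {8, 24, 40, 56};
        for (int k : keys) {
            System.out.println(k + ": m=8 -> " + hash(k, 8) + ", m=7 -> " + hash(k, 7));
        }
    }
}
```

With m = 8 all four sample keys fall into slot 0, while with m = 7 they land in slots 1, 3, 5 and 0.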
Multiplication Method
Idea:
• Multiply key k by a constant A, where 0 < A < 1.
• Extract the fractional part of kA and multiply it by m.
• Take the floor of the result.
  h(k) = ⌊m (kA mod 1)⌋
  where kA mod 1 is the fractional part of kA, i.e. kA - ⌊kA⌋
Example: k = 123, m = 100, A = 0.618033
  h(123) = ⌊100 (123 × 0.618033 mod 1)⌋
         = ⌊100 (76.018059 mod 1)⌋
         = ⌊100 (0.018059)⌋ = ⌊1.8059⌋ = 1
• Advantage: the value of m is not critical, and the method works with any value of A.
• Disadvantage: slower than the division method.
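
The multiplication method can be sketched as below, using the values from the example. The class and method names are hypothetical, and the sketch uses floating-point arithmetic for readability rather than an integer-only formulation.

```java
class MultiplicationHash {
    // h(k) = floor(m * (k*A mod 1)), with 0 < A < 1 and k >= 0.
    static int hash(int k, int m, double a) {
        double product = k * a;
        double fractional = product - Math.floor(product); // k*A mod 1
        return (int) Math.floor(m * fractional);
    }

    public static void main(String[] args) {
        // Values from the example: k = 123, m = 100, A = 0.618033
        System.out.println(hash(123, 100, 0.618033));
    }
}
```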
Digit Extraction Method
Idea:
• Selected digits are extracted from the key and used as the address.

  Address = selected digits from key

Example: If a six-digit employee number is 379245 and the first three digits are selected, the address is 379.

• Disadvantage:
  · May not distribute key values evenly in the hash table.
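
As a rough sketch of digit extraction, the snippet below selects the leading digits of a key, as in the employee-number example. The names `DigitExtraction` and `firstDigits` are hypothetical, and taking the leading digits is only one possible selection.

```java
class DigitExtraction {
    // Returns the first `count` decimal digits of a non-negative key.
    static int firstDigits(long key, int count) {
        String digits = Long.toString(key);
        int end = Math.min(count, digits.length());
        return Integer.parseInt(digits.substring(0, end));
    }

    public static void main(String[] args) {
        System.out.println(firstDigits(379245L, 3)); // employee number from the example
    }
}
```

All employee numbers that share the same leading digits get the same address, which illustrates the uneven-distribution disadvantage.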
Folding/Digit Folding
Idea:
• It involves splitting the key into two or more parts and then combining the parts to form the hash address.

• To map the key 25936715 to a range between 0 and 9999, we can:
  · split the number into two parts, 2593 and 6715, and
  · add the two parts to obtain 9308 as the hash value.

• Very useful when keys are very large.

• Advantages:
  · Fast and simple, especially with bit patterns.
  · Can transform non-integer keys into integer values.
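
The folding example can be sketched as follows. The class and method names are hypothetical. Discarding any carry with a final modulo, so that the result stays in the target range, is an assumption of this sketch and is not stated in the slides.

```java
class FoldingHash {
    // Splits a non-negative key into groups of `partDigits` decimal digits
    // (starting from the right), adds the groups, and keeps the result
    // within `partDigits` digits by discarding any carry.
    static int fold(long key, int partDigits) {
        long modulus = 1;
        for (int i = 0; i < partDigits; i++) {
            modulus *= 10;
        }
        long sum = 0;
        while (key > 0) {
            sum += key % modulus; // take the next group of digits
            key /= modulus;
        }
        return (int) (sum % modulus);
    }

    public static void main(String[] args) {
        System.out.println(fold(25936715L, 4)); // 2593 + 6715
    }
}
```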
Mid-square method
Idea:
• The key is squared and the middle part of the result is taken as the hash value.

• To map the key 3121 into a hash table of size 1000, we square it, 3121² = 9740641, and extract the middle three digits, 406, as the hash value.

• Advantage: works well if the keys do not contain a lot of leading or trailing zeros.

• Disadvantages:
  · The middle part to extract has to be chosen.
  · Non-integer keys have to be pre-processed to obtain corresponding integer values.
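
A minimal sketch of the mid-square method, assuming the "middle" is taken from the decimal digits of the square. The names are hypothetical. When the number of leftover digits is odd, this sketch takes the window that sits one digit closer to the left, which is one of several possible conventions.

```java
class MidSquareHash {
    // Squares the key and returns the middle `digits` decimal digits of the square.
    // Assumes key >= 0 and that key * key fits in a long.
    static int midSquare(long key, int digits) {
        long square = key * key;
        String text = Long.toString(square);
        if (text.length() <= digits) {
            return (int) square; // too short to cut a middle part out of
        }
        int start = (text.length() - digits) / 2;
        return Integer.parseInt(text.substring(start, start + digits));
    }

    public static void main(String[] args) {
        System.out.println(midSquare(3121L, 3)); // 3121 * 3121 = 9740641
    }
}
```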
Collision handling techniques
• Separate chaining

• Open addressing
  - Linear probing
  - Quadratic probing
  - Double hashing
Separate chaining
• Maintain an array of linked lists.
• A separate list is maintained for all elements that map to the same hash value.
Separate chaining: pros and cons
PROS:
• Collision resolution is simple.
• The load factor is not a hard limit: the table can hold more elements than it has slots.
• Table size need not be a prime number.

CONS:
• A separate data structure (linked list) has to be implemented for the chains.
• The main cost of chaining is the extra space required for the linked lists.
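
Separate chaining can be sketched as an array of linked lists, as described above. This is an illustrative sketch only: the class `ChainedTable` and its method names are hypothetical, keys are plain integers, and h(x) = x mod m is assumed as the hash function.

```java
import java.util.LinkedList;

class ChainedTable {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // h(x) = x mod m, kept non-negative with floorMod
    private int h(int key) {
        return Math.floorMod(key, buckets.length);
    }

    void insert(int key) {
        if (!contains(key)) {
            buckets[h(key)].add(key); // colliding keys share one chain
        }
    }

    boolean contains(int key) {
        return buckets[h(key)].contains(key);
    }

    boolean delete(int key) {
        return buckets[h(key)].remove(Integer.valueOf(key));
    }

    int chainLength(int slot) {
        return buckets[slot].size();
    }
}
```

For example, with m = 7 the keys 76, 55 and 48 all hash to slot 6 and end up in the same chain, so a search for any of them scans at most three nodes.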
Open addressing: Linear probing
• The table remains a simple array of size m.

• On insert(x):
  First compute h(x) = x mod m.
  If a collision occurs, find another location by sequentially searching for the next available slot.

• That is, try h(x)+1, h(x)+2, etc., wrapping around to the start of the table (mod m).

Exercise: Insert the following keys into a hash table using linear probing, where table size m = 7 and h(x) = x mod m:
keys = {76, 93, 40, 47, 10, 55}
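
The exercise above can be checked with a short sketch. The class name `LinearProbingSketch`, the `build` helper and the use of -1 as the empty marker are assumptions made for illustration. Keys are assumed non-negative.

```java
import java.util.Arrays;

class LinearProbingSketch {
    static final int EMPTY = -1; // marker for an unused slot

    // Inserts the keys in order into a table of size m using h(x) = x mod m
    // and linear probing. A key is skipped if the table is already full.
    static int[] build(int m, int[] keys) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int home = key % m;
            for (int i = 0; i < m; i++) {
                int slot = (home + i) % m; // h(x), h(x)+1, h(x)+2, ... with wrap-around
                if (table[slot] == EMPTY) {
                    table[slot] = key;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = build(7, new int[] {76, 93, 40, 47, 10, 55});
        System.out.println(Arrays.toString(table)); // prints [47, 55, 93, 10, -1, 40, 76]
    }
}
```

Key 47 collides at slot 5, finds slot 6 taken and wraps around to slot 0. Key 55 collides at slot 6, then at slot 0, and lands in slot 1.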
Types of linear probing
Linear probing with chaining (without replacement)
• When collisions are frequent, it becomes difficult to keep track of the slots that hold keys with the same hash value.
• An extra field (Chain) is added to each slot to link it to the next slot in the chain; -1 means no link.
Example: Let M = 10, H(X) = X MOD M,
KEYS = {0, 1, 4, 71, 64, 89, 11, 33}

Index  Key  Chain
0      0    -1
1      1    2
2      71   3
3      11   -1
4      4    5
5      64   -1
6      33   -1
7           -1
8           -1
9      89   -1
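Types of linear probing
Linear probing with chaining (with replacement)
• The problem of a chain whose starting location is occupied by a key from another chain is handled: a key that is not in its home slot is moved out when a key that belongs there arrives.
• An extra field (Chain) is added to maintain the chain.
Example: Let M = 10, H(X) = X MOD M,
KEYS = {0, 1, 4, 71, 64, 89, 11, 22}
When 22 is added, the chain for hash value 1 becomes 1 -> 11 -> 71.

       Without replacement   With replacement
Index  Key  Chain            Key  Chain
0      0    -1               0    -1
1      1    2                1    3
2      71   3                22   -1
3      11   -1               11   6
4      4    5                4    5
5      64   -1               64   -1
6      22   -1               71   -1
7           -1                    -1
8           -1                    -1
9      89   -1               89   -1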
• The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10 using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant hash table?

• Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199) and the hash function x mod 10, which of the following statements are true?
  i. 9679, 1989, 4199 hash to the same value
  ii. 1471, 6171 hash to the same value
  iii. All elements hash to the same value
  iv. Each element hashes to a different value

  A) i only   B) ii only   C) i and ii   D) none


• A hash table of length 10 uses open addressing with hash function h(k) = k mod 10, and linear probing. After inserting 6 values into an empty hash table, the table is as shown below. Which one of the following choices gives a possible order in which the key values could have been inserted in the table?

  [Figure: contents of the hash table]

  A) 46, 42, 34, 52, 23, 33
  B) 34, 42, 23, 52, 33, 46
  C) 46, 34, 42, 23, 52, 33
  D) 42, 46, 33, 23, 34, 52


Primary clustering problem
• Long runs of occupied slots are created.
• This increases search time.
Quadratic hashing/probing
• It is one of the ways to reduce the primary clustering problem.

• Collisions are resolved by examining cells at increasing distances from the original probe point.

• Collision policy:
  - Start from the original hash location h(x).
  - If a collision occurs, try h(x) + 1², h(x) + 2², h(x) + 3², ...

• Hash function:
  h_i(x) = (h(x) + i²) mod m
  where i = 0, 1, 2, 3, ...
Exercise: Insert the following keys into a hash table using quadratic probing, where table size m = 7 and h(x) = x mod m:
keys = {76, 40, 48, 5, 55}
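
The quadratic probing exercise can be worked through with the sketch below. The class name, the `build` helper and the -1 empty marker are assumptions. The probe count is capped at m, and a key that finds no free slot within that many probes is simply skipped in this sketch.

```java
import java.util.Arrays;

class QuadraticProbingSketch {
    static final int EMPTY = -1; // marker for an unused slot

    // Inserts non-negative keys using h_i(x) = (h(x) + i*i) mod m, i = 0, 1, 2, ...
    static int[] build(int m, int[] keys) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int home = key % m;
            for (int i = 0; i < m; i++) {
                int slot = (home + i * i) % m;
                if (table[slot] == EMPTY) {
                    table[slot] = key;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = build(7, new int[] {76, 40, 48, 5, 55});
        System.out.println(Arrays.toString(table)); // prints [48, -1, 5, 55, -1, 40, 76]
    }
}
```

Key 48 collides at slot 6 and goes to (6 + 1) mod 7 = 0. Key 5 collides at slots 5 and 6 and goes to (5 + 4) mod 7 = 2. Key 55 collides at slots 6 and 0 and goes to (6 + 4) mod 7 = 3. Unlike linear probing, a quadratic probe sequence does not visit every slot, so an insert can fail even when free slots remain.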
Double hashing
• It reduces clustering more effectively than linear or quadratic probing.
• Use a primary hash function h1(k) to determine the first slot.
• Use a second hash function h2(k) to determine the increment for the probe sequence:

  h(k, i) = (h1(k) + i h2(k)) mod m,  i = 0, 1, ...

• Initial probe: h1(k)
• Each subsequent probe is offset from the previous one by h2(k), modulo m.
• Advantage: avoids clustering.
Exercise: Insert the following keys into a hash table using double hashing, where table size m = 7,
h1(x) = x mod m and h2(x) = prime - (x mod prime): keys = {76, 40, 48, 5, 55}
For the second hash function, take a prime number smaller than the table size.
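
A sketch of the double hashing exercise follows. It assumes prime = 5, the largest prime smaller than the table size 7. The class name, the `build` helper and the -1 empty marker are likewise assumptions.

```java
import java.util.Arrays;

class DoubleHashingSketch {
    static final int EMPTY = -1; // marker for an unused slot

    // Inserts non-negative keys using h(k, i) = (h1(k) + i * h2(k)) mod m,
    // with h1(k) = k mod m and h2(k) = prime - (k mod prime).
    static int[] build(int m, int prime, int[] keys) {
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int h1 = key % m;
            int h2 = prime - (key % prime); // always in 1..prime, so never zero
            for (int i = 0; i < m; i++) {
                int slot = (h1 + i * h2) % m;
                if (table[slot] == EMPTY) {
                    table[slot] = key;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = build(7, 5, new int[] {76, 40, 48, 5, 55});
        System.out.println(Arrays.toString(table)); // prints [-1, 48, -1, 5, 55, 40, 76]
    }
}
```

Key 48 collides at slot 6 and has h2 = 5 - 3 = 2, so it goes to (6 + 2) mod 7 = 1. Keys 5 and 55 both have h2 = 5, and go to (5 + 5) mod 7 = 3 and (6 + 5) mod 7 = 4 respectively.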
Open addressing: pros and cons
PROS:
• All data items are stored in the hash table itself, so no separate data structure is needed.
• More efficient storage-wise.

CONS:
• Performance depends on choosing a proper table size.
• The keys of the objects to be hashed must be distinct.
Hash table overflow
• An overflow occurs when the home bucket for a new pair (key, element) is full.
• We may tackle overflows by searching the hash table in some systematic manner for a bucket that is not full:
  - Linear probing (linear open addressing).
  - Quadratic probing.
  - Random probing.
• We may eliminate overflows by allowing each bucket to keep a list of all pairs for which it is the home bucket:
  - Array linear list.
  - Chain.
• Open addressing ensures that all elements are stored directly in the hash table.
Q. Insert the following values into a hash table of size 10 using quadratic probing:
37, 90, 55, 22, 11, 17, 49, 87
Q. Insert the following values into a hash table of size 10 using double hashing:
37, 90, 45, 22, 17, 49, 55
Q. For the given set of values 35, 36, 25, 47, 2501, 129, 65, 29, 16, 14, 99, create a hash table of size 15 and resolve collisions using quadratic probing and double hashing.

Some Applications of Hash Table
• Database systems: Specifically, those that require efficient random access. Generally, database systems try to optimize between two types of access methods: sequential and random. Hash tables are an important part of efficient random access because they provide a way to locate data in constant time on average.

• Data dictionaries: Data structures that support adding, deleting, and searching for data. Although the operations of a hash table and a data dictionary are similar, other data structures may also be used to implement data dictionaries. Using a hash table is particularly efficient.

• Symbol tables: The tables used by compilers to maintain information about symbols from a program. Compilers access information about symbols frequently, so it is important that symbol tables be implemented very efficiently.

• Network processing algorithms: Hash tables are fundamental components of several network processing algorithms and applications, including route lookup, packet classification, and network monitoring.

• File system: Hashing is used to link a file name to the path of the file. To store the correspondence between the file name and path and the physical location of that file on the disk, the system uses a map, and that map is usually implemented as a hash table.

• Password verification: Cryptographic hash functions are very commonly used in password verification.

• Pattern matching: Hashing is also used to search for patterns in strings. The Rabin-Karp algorithm uses hashing to search for a pattern in a string. Pattern matching is also used to detect plagiarism.
Problems for which hash tables are not suitable
• Problems for which data ordering is required.
  - A hash table is an unordered data structure, so certain operations, such as iterating through the keys in sorted order, cannot be done efficiently.

• Problems in which the data does not have unique keys.
  - Open-addressed hash tables cannot be used if the data does not have unique keys. An alternative is to use separate-chained hash tables.
Cuckoo Hashing
• It was described by Rasmus Pagh and Flemming Friche Rodler in 2001. Cuckoo hashing is a type of closed hashing (open addressing).
• It uses two hash functions and two tables to resolve collisions. We pass the key to the first hash function to get a location in the first table. If that location is empty, we store the key and stop.
• Cuckoo hashing combines the ideas of multiple choice and relocation, and guarantees O(1) worst-case lookup time, because a key can only be in one of two locations.
• Multiple choice: each key has two possible locations, h1(key) and h2(key).
• Relocation: it may happen that h1(key) and h2(key) are both occupied. This is resolved by imitating the cuckoo bird, whose chick pushes the other eggs or young out of the nest when it hatches. Analogously, inserting a new key into a cuckoo hash table may push an older key to a different location. This leaves us with the problem of re-placing the older key.
  • If the alternate position of the older key is vacant, there is no problem.
  • Otherwise, the older key displaces another key. This continues until the procedure finds a vacant position or enters a cycle. In the case of a cycle, new hash functions are chosen and the whole data structure is rehashed. Multiple rehashes might be necessary before the insertion succeeds.
Example 1:
Input: {20, 50, 53, 75, 100, 67, 105, 3, 36, 39}
Hash functions: h1(key) = key % 11, h2(key) = (key / 11) % 11 (integer division)

• Start by inserting 20 at its position in the first table, determined by h1(20) = 9.
• Next: 50. h1(50) = 6, which is empty.
• Next: 53. h1(53) = 9, but 20 is already there. We place 53 in table 1 and move 20 to table 2 at h2(20) = 1.
• Next: 75. h1(75) = 9, but 53 is already there. We place 75 in table 1 and move 53 to table 2 at h2(53) = 4.
• Next: 100. h1(100) = 1, which is empty.
• Next: 67. h1(67) = 1, but 100 is already there. We place 67 in table 1 and move 100 to table 2 at h2(100) = 9.
• Next: 105. h1(105) = 6, but 50 is already there. We place 105 in table 1 and move 50 to table 2 at h2(50) = 4. This displaces 53, which goes back to table 1 at h1(53) = 9. That in turn displaces 75, which moves to table 2 at h2(75) = 6.
• Next: 3. h1(3) = 3, which is empty.
• Next: 36. h1(36) = 3, but 3 is already there. We place 36 in table 1 and move 3 to table 2 at h2(3) = 0.
• Next: 39. h1(39) = 6, so 39 displaces 105. The chain of displacements is: 105 to h2(105) = 9, 100 to h1(100) = 1, 67 to h2(67) = 6, 75 to h1(75) = 9, 53 to h2(53) = 4, 50 to h1(50) = 6, and finally 39 to h2(39) = 3.
  Here the new key 39 is itself displaced later in the chain started by 105, the key it displaced.
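
The insertion procedure for this example can be sketched as a loop that alternates between the two tables. This is a simplified illustration, not a complete implementation: the class name `CuckooSketch`, the -1 empty marker and the `MAX_KICKS` limit are assumptions, and where a full implementation would rehash on a cycle, this sketch just reports failure.

```java
import java.util.Arrays;

class CuckooSketch {
    static final int SIZE = 11;      // table size implied by the "% 11" hash functions
    static final int EMPTY = -1;     // marker for an unused slot; keys must be non-negative
    static final int MAX_KICKS = 32; // hypothetical limit used to detect a probable cycle

    final int[] t1 = new int[SIZE];
    final int[] t2 = new int[SIZE];

    CuckooSketch() {
        Arrays.fill(t1, EMPTY);
        Arrays.fill(t2, EMPTY);
    }

    static int h1(int key) { return key % SIZE; }
    static int h2(int key) { return (key / SIZE) % SIZE; }

    // Lookup inspects at most two slots, hence O(1) in the worst case.
    boolean contains(int key) {
        return t1[h1(key)] == key || t2[h2(key)] == key;
    }

    // Returns false if no vacant slot was found within MAX_KICKS rounds.
    // In that case the key left in hand is dropped; a full implementation would rehash.
    boolean insert(int key) {
        if (contains(key)) return true;
        int current = key;
        for (int kick = 0; kick < MAX_KICKS; kick++) {
            int i = h1(current);
            int evicted = t1[i];
            t1[i] = current;            // place the key in table 1
            if (evicted == EMPTY) return true;
            int j = h2(evicted);
            current = t2[j];
            t2[j] = evicted;            // move the evicted key to table 2
            if (current == EMPTY) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CuckooSketch tables = new CuckooSketch();
        for (int key : new int[] {20, 50, 53, 75, 100, 67, 105, 3, 36, 39}) {
            tables.insert(key);
        }
        System.out.println("T1: " + Arrays.toString(tables.t1));
        System.out.println("T2: " + Arrays.toString(tables.t2));
    }
}
```

After all ten keys are inserted, table 1 holds 100, 36, 50 and 75 at indices 1, 3, 6 and 9, and table 2 holds 3, 20, 39, 53, 67 and 105 at indices 0, 1, 3, 4, 6 and 9, matching the displacement chain traced above.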

Example 2:
• Let our key set be {20, 33, 6, 45, 61, 11, 231, 90, 101, 122}. Let h1(x) = x mod 11 and h2(x) = x mod 13. We'll use two hash tables, T1 (using h1) and T2 (using h2). Both tables have 15 cells.

Text Book & Reference Books
• Text Books:

1. Horowitz, Sahni, Dinesh Mehta, "Fundamentals of Data Structures in C++", Galgotia Publisher, ISBN: 8175152788, 9788175152786.
2. M. Folk, B. Zoellick, G. Riccardi, "File Structures", Pearson Education, ISBN: 81-7758-37-5.

• Reference Books:

1. A. Aho, J. Hopcroft, J. Ullman, "Data Structures and Algorithms", Pearson Education, 1998, ISBN: 0-201-43578-0.
2. Sartaj Sahni, "Data Structures, Algorithms and Applications in C++", Second Edition, University Press, ISBN: 81-7371522 X.
3. G. A. V. Pai, "Data Structures and Algorithms", The McGraw-Hill Companies, ISBN: 9780070667266.

• Online Sites:

1. https://www.geeksforgeeks.org/data-structures/?ref=shm
2. https://www.tutorialspoint.com/data_structures_algorithms/index.htm