
11 Hashing

The document discusses hashing as a method for efficient data retrieval, highlighting the importance of hash functions that map keys to specific slots in a hash table. It covers various collision handling techniques such as chaining and open addressing, including linear and quadratic probing, as well as double hashing. Additionally, it addresses the challenges of static and dynamic hashing, emphasizing the need for effective hash functions to minimize collisions and improve search efficiency.

Hashing

BBM371 Data Management


Motivation
• Consider the problem of searching an array for a given value

• If the array is not sorted, the search requires O(n) time


• If the value isn’t there, we need to search all n elements
• If the value is there, we search n/2 elements on average

• If the array is sorted, we can do a binary search


• A binary search requires O(log n) time
• About equally fast whether the element is found or not

• It doesn’t seem like we could do much better


• How about an O(1), that is, constant time search?
• We can do it if the array is organized in a particular way
Hashing
• Suppose we were to come up with a “magic function” that, given a value to
search for, would tell us exactly where in the array to look
• If it’s in that location, it’s in the array
• If it’s not in that location, it’s not in the array

• This function would have no other purpose

• If we look at the function’s inputs and outputs, they probably won’t “make
sense”

• This function is called a hash function because it “makes hash” of its inputs
Hash Function
• Hash function h:
• Mapping from U to the slots of a hash table T[0..m–1].
h : U → {0, 1, …, m–1}

• With arrays, key k maps to slot A[k].

• With hash tables, key k maps or “hashes” to slot T[h(k)].

• h(k) is the hash value of key k.


Hashing (cont’d)
• Bucket: a unit of storage that can store one or more records, stored in a linked list of index entries or records.
• U: the universal set of keys — all possible values that can be used as keys in your domain.
• K: the set of actual keys; no two elements have the same key, i.e., keys are distinct.
• T: the set of all bucket addresses, slots 0 to m–1.
• Hash function: a mapping function which maps the set of search keys to the addresses where the actual records are placed:
      T[i] = k, if the key of data k is i
      T[i] = NULL, otherwise
Finding the Hash Function
• How can we come up with this magic function?

• In general, we cannot: there is no such magic function


• In a few specific cases, where all the possible values are known in advance, it has
been possible to compute a perfect hash function

• What is the next best thing?


• A perfect hash function would tell us exactly where to look
• In general, the best we can do is a function that tells us where to start looking!
• The hash function itself should be efficient, so that it can be evaluated in constant time.
Example Hash Function
• A hash function is a mathematical formula which, when applied to a key, produces an
integer which can be used as an index for the key in the hash table.
• The main aim of a hash function is that elements should be distributed relatively randomly and
uniformly.
• Map a key k into one of the m slots by taking the remainder of k divided by m
(Division Method). That is,
h(k) = k mod m

h(ssn) = ssn mod 100 (i.e., the last two digits)

e.g., if ssn = 10123411 then h(10123411) = 11

A prime not too close to an exact power of 2 is often a good choice for m.
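The division method above can be sketched as follows (a minimal illustration; the keys and table sizes are taken from the slide's example):

```python
def h(k: int, m: int) -> int:
    """Division-method hash: map key k to one of m slots."""
    return k % m

# Last two digits of an SSN, as in the example
print(h(10123411, 100))  # -> 11

# A prime not too close to a power of 2 is often a better choice for m
print(h(10123411, 97))
```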
Keys as Natural Numbers
• Hash functions assume that the keys are natural numbers.

• When they are not, we have to interpret them as natural numbers.


• Keys could be strings or large objects, for example.

• Example: Interpret a character string as an integer expressed in some


radix notation. Suppose the string is CLRS:

• ASCII values: C=67, L=76, R=82, S=83.


• There are 128 basic ASCII values.
• So, CLRS = 67·128³ + 76·128² + 82·128¹ + 83·128⁰ = 141,764,947.
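The radix interpretation above can be sketched as a short function (a minimal illustration; radix 128 corresponds to the basic ASCII values):

```python
def string_to_key(s: str, radix: int = 128) -> int:
    """Interpret a character string as an integer in the given radix."""
    key = 0
    for ch in s:
        key = key * radix + ord(ch)  # shift left one "digit", add next char code
    return key

print(string_to_key("CLRS"))  # -> 141764947
```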
Collision
• When two values hash to the same array location, this is called a collision

• There is no hash function that eliminates collisions completely. A good hash
function can only minimize the number of collisions by spreading the
elements uniformly throughout the array.

• Collisions are normally treated as “first come, first served”—the first value
that hashes to the location gets it

• We have to find something to do with the second and subsequent values that
hash to this same location
Collisions

[Figure: two actual keys k2 and k5 from the universe U hash to the same slot, h(k2) = h(k5), causing a collision. E.g., with h(k) = k mod 10, keys 75 and 25 collide.]
Handling Collisions
• Chaining
  • Store all elements that hash to the same slot in a linked list.
  • Store a pointer to the head of the linked list in the hash table slot.

• Open Addressing (closed hashing)
  • All elements are stored in the hash table itself.
  • When collisions occur, use a systematic (consistent) procedure to store elements in free slots of the table: a probe sequence.

[Figure: colliding keys k1, k4 and k5, k2, k6 stored in per-slot linked lists.]
Chaining

[Figure: keys k1 and k4 collide (h(k1) = h(k4)) and are stored in one linked list; k2, k5, k6 share another list (h(k2) = h(k5) = h(k6)); k3 and k7 share a third (h(k3) = h(k7)); k8 occupies a slot of its own.]
Dictionary Operations with Chaining
• Chained-Hash-Insert (T, x)
• Insert x at the head of list T[h(key[x])].
• Worst-case complexity – O(1).

• Chained-Hash-Delete (T, x)
• Delete x from the list T[h(key[x])].
• Worst-case complexity – proportional to length of list with singly-linked lists. O(1)
with doubly-linked lists.

• Chained-Hash-Search (T, k)
• Search an element with key k in list T[h(k)].
• Worst-case complexity – proportional to length of list.
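The three chained operations above can be sketched as a small class (a minimal illustration; the table size and keys below are assumptions chosen so that all three keys collide):

```python
class ChainedHashTable:
    """Collision handling by chaining: each slot holds a list of keys."""
    def __init__(self, m: int):
        self.m = m
        self.table = [[] for _ in range(m)]

    def insert(self, key: int) -> None:
        self.table[key % self.m].insert(0, key)   # insert at head of list: O(1)

    def search(self, key: int) -> bool:
        return key in self.table[key % self.m]    # scan one chain only

    def delete(self, key: int) -> None:
        self.table[key % self.m].remove(key)      # O(chain length) with singly-linked lists

t = ChainedHashTable(11)
for k in (20, 31, 42):      # all three hash to slot 9 (mod 11)
    t.insert(k)
print(t.search(31))  # -> True
```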
Analysis of Hashing with Chaining:
Worst Case
• How long does it take to search for an element with a given key?

• Worst case:
  • All n keys hash to the same slot, forming one long chain.
  • Worst-case search time is Θ(n), plus the time to compute the hash function.
• The main objective is to provide a hash function that minimizes collisions.
Open Addressing
• If collision occurs, open addressing scheme probes for some other empty
(or open) location in which to place the item.

• The sequence of locations that we examine is called the probe sequence.

• The process of examining memory locations in the hash table is called


probing.
• There are different open-addressing schemes:
• Linear Probing
• Quadratic Probing
• Double Hashing
Linear Probing
• In linear probing, we search the hash table sequentially starting from
the original hash location.

• If a location is occupied, we check the next location

• We wrap around from the last table location to the first table location if
necessary.
Linear Probing: Example
• Table size is 11 (0..10)

• Hash function: h(x) = x mod 11

• Insert keys:
  • 20 mod 11 = 9
  • 30 mod 11 = 8
  • 2 mod 11 = 2
  • 13 mod 11 = 2 → 2+1 = 3
  • 25 mod 11 = 3 → 3+1 = 4
  • 24 mod 11 = 2 → 2+1, 2+2, 2+3 = 5
  • 10 mod 11 = 10
  • 9 mod 11 = 9 → 9+1, (9+2) mod 11 = 0

• Resulting table: 0:9, 2:2, 3:13, 4:25, 5:24, 8:30, 9:20, 10:10
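The linear-probing insertion walked through above can be sketched as (a minimal illustration; it assumes the table never fills completely):

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key by linear probing; return the slot used."""
    m = len(table)
    i = key % m
    while table[i] is not None:   # occupied: check the next location,
        i = (i + 1) % m           # wrapping around if necessary
    table[i] = key
    return i

table = [None] * 11
for k in (20, 30, 2, 13, 25, 24, 10, 9):
    linear_probe_insert(table, k)
print(table)  # -> [9, None, 2, 13, 25, 24, None, None, 30, 20, 10]
```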
Linear Probing: Clustering Problem
• One of the problems with linear probing is that table items tend to cluster together in the
hash table.
• there is a higher risk of more collisions where one collision has already taken place.
• This means that the table contains groups of consecutively occupied locations.

• The more collisions there are, the more probes are required to find a free location,
and the lower the performance.

• This phenomenon is called primary clustering.

• Clusters can get close to one another, and merge into a larger cluster.
• Thus, the one part of the table might be quite dense, even though another part has relatively few items.

• Primary clustering causes long probe searches and therefore decreases the overall efficiency.
Quadratic Probing
• The primary clustering problem can be almost eliminated if we use a quadratic
probing scheme.

• In quadratic probing,
  • We start from the original hash location i.
  • If the location is free, the value is stored in it; otherwise, subsequent probed locations are offset from i by amounts that depend quadratically on the probe number.
  • If a location is occupied, we check the locations i+1², i+2², i+3², i+4², ...
  • We wrap around from the last table location to the first table location if necessary.
Quadratic Probing: Example
• Table size is 11 (0..10)

• Hash function: h(x) = x mod 11

• Insert keys:
  • 20 mod 11 = 9
  • 30 mod 11 = 8
  • 2 mod 11 = 2
  • 13 mod 11 = 2 → 2+1² = 3
  • 25 mod 11 = 3 → 3+1² = 4
  • 24 mod 11 = 2 → 2+1², 2+2² = 6
  • 10 mod 11 = 10
  • 9 mod 11 = 9 → 9+1², (9+2²) mod 11, (9+3²) mod 11 = 7

• Resulting table: 2:2, 3:13, 4:25, 6:24, 7:9, 8:30, 9:20, 10:10
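The quadratic-probing example above can be sketched as (a minimal illustration; note that, unlike linear probing, quadratic probing is not guaranteed to find a free slot in a nearly full table):

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key by quadratic probing; return the slot used."""
    m = len(table)
    h = key % m
    j = 0
    while True:
        i = (h + j * j) % m       # offsets 0, 1, 4, 9, ... from the home slot
        if table[i] is None:
            table[i] = key
            return i
        j += 1

table = [None] * 11
for k in (20, 30, 2, 13, 25, 24, 10, 9):
    quadratic_probe_insert(table, k)
print(table)  # -> [None, None, 2, 13, 25, None, 24, 9, 30, 20, 10]
```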
Double Hashing
• Although quadratic probing is free from primary clustering, it is still liable to what is known as secondary
clustering.
• It means that if there is a collision between two keys, then the same probe sequence will be followed for both.
• Double hashing also reduces clustering. In double hashing, we use two hash functions rather than a single
function.

• In linear probing and quadratic probing, the probe increments are independent of the key.

• We can select the increments used during probing with a second hash function. The second hash
function h2 should satisfy:
      h2(key) ≠ 0
      h2 ≠ h1

• We first probe the location h1(key)


• If the location is occupied, we probe the locations h1(key)+h2(key), h1(key)+2·h2(key), ...
Double Hashing: Example
• Table size is 11 (0..10)

• Hash functions: h1(x) = x mod 11
                  h2(x) = 7 – (x mod 7)

• Insert keys:
  • x = 58: h1(58) = 58 mod 11 = 3
  • x = 14:
    • h1(14) = 14 mod 11 = 3 (occupied)
    • h1(14) + h2(14) = 3 + 7 = 10
  • x = 91:
    • h1(91) = 91 mod 11 = 3 (occupied)
    • h1(91) + h2(91) = 3 + 7 = 10 (occupied)
    • (h1(91) + 2·h2(91)) mod 11 = (3 + 14) mod 11 = 6

• Resulting table: 3:58, 6:91, 10:14
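The double-hashing example above can be sketched as (a minimal illustration; the second hash function is the one from the example, and the loop assumes a free slot is eventually found):

```python
def double_hash_insert(table: list, key: int) -> int:
    """Insert key by double hashing; return the slot used."""
    m = len(table)
    h1 = key % m
    h2 = 7 - (key % 7)            # second hash from the example; never zero
    j = 0
    while True:
        i = (h1 + j * h2) % m     # probe h1, h1+h2, h1+2*h2, ...
        if table[i] is None:
            table[i] = key
            return i
        j += 1

table = [None] * 11
for k in (58, 14, 91):
    double_hash_insert(table, k)
print(table)  # -> [None, None, None, 58, None, None, 91, None, None, None, 14]
```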
Open Addressing: Retrieval & Deletion
• In open addressing, to find an item with a given key:

• We probe the locations (same as insertion) until we find the desired item or we reach to an
empty location.

• Deletions in open addressing cause complications:

• We CANNOT simply mark a deleted slot as empty, because such a slot would stop a later
retrieval prematurely, incorrectly indicating a failure.

• Solution: We have to have three kinds of locations in a hash table: Occupied, Empty, and Deleted.

• A deleted location is treated as an occupied location during retrieval (probing continues past it), but it can be reused during insertion.
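The three-state scheme above can be sketched with linear probing and tombstone markers (a minimal illustration; the sentinel objects and helper names are assumptions for this sketch):

```python
EMPTY, DELETED = object(), object()  # sentinel markers for the slot states

def probe_slots(table, key):
    """Yield the linear-probe sequence for key."""
    m = len(table)
    for j in range(m):
        yield (key % m + j) % m

def search(table, key):
    for i in probe_slots(table, key):
        if table[i] is EMPTY:        # truly empty: the key cannot be further on
            return None
        if table[i] == key:          # DELETED slots are simply probed past
            return i
    return None

def insert(table, key):
    for i in probe_slots(table, key):
        if table[i] is EMPTY or table[i] is DELETED:  # tombstones are reusable
            table[i] = key
            return i

def delete(table, key):
    i = search(table, key)
    if i is not None:
        table[i] = DELETED           # tombstone, NOT empty

table = [EMPTY] * 11
insert(table, 2); insert(table, 13)  # both hash to slot 2; 13 lands in slot 3
delete(table, 2)
print(search(table, 13))  # -> 3 (the probe passes the deleted slot 2)
```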
Hashing Techniques
• Static Hashing
• One of the problems with static hashing is that we need to know how many records are
going to be stored in the index. If over time a large number of records are added,
resulting in far more records than buckets, lookups would have to search through a large
number of records stored in a single bucket, or in one or more overflow buckets, and
would thus become inefficient.
• Dynamic Hashing: the hash index can be rebuilt with an increased number of
buckets. For example, if the number of records becomes twice the number of
buckets, the index can be rebuilt with twice as many buckets as before.
• Linear
• Extendable
Static Hashing
• With hash based indexing, we assume that we have a function h, which tells us
where to place any given record.
• E.g., page_number = h(value) mod N, where N should be prime.

[Figure: data pages 1, 2, 3, …, N–1, N; h(value) selects the page for each record.]
Static Hashing
• A bucket is a unit of storage containing one or more records (a bucket is typically a
disk block).

• In a hash file organization we obtain the bucket of a record directly from its search-
key value using a hash function.

• Hash function h is a function from the set of all search-key values K to the set of all
bucket addresses B.

• Hash function is used to locate records for access, insertion as well as deletion.

• Records with different search-key values may be mapped to the same bucket; thus
entire bucket has to be searched sequentially to locate a record.
Static Hashing: Overflow
• Insertion may cause overflow. The solution is to create chains of overflow pages.
• The worst hash function maps all search-key values to the same bucket.

• Primary bucket pages are allocated sequentially and never de-allocated; overflow pages are allocated (as needed) when the corresponding buckets become full.

• Long overflow chains degrade performance!

[Figure: primary bucket pages 1, 2, 3, …, N–1, N, with a chain of overflow pages hanging off page 3.]
Deficiencies of Static Hashing
• In static hashing, function h maps search-key values to a fixed set of B bucket
addresses.

• Databases grow with time. If the initial number of buckets is too small, performance will degrade
due to too many overflows.

• If the file size at some point in the future is anticipated and the number of buckets is allocated accordingly,
a significant amount of space will be wasted initially.

• If database shrinks, again space will be wasted.

• One option is periodic re-organization of the file with a new hash function, but it is very
expensive.

• These problems can be avoided by using techniques that allow the number of buckets
to be modified dynamically.
Dynamic Hashing
• Dynamic = Changing number of Buckets B dynamically

• Two methods
• Extendible (or Extensible) Hashing: Grow B by doubling it
• Linear Hashing: Grow B by incrementing it by 1

• To save storage space both methods can choose to shrink B dynamically

• Must avoid oscillations when removals and additions are both common.
Extendible Hashing
• Idea: Use directory of pointers to buckets,
• double # of buckets B by doubling the directory, splitting just the
bucket that overflowed!
• Directory much smaller than file, so doubling it is much cheaper.
• Only one page of data entries is split. No overflow blocks.
• Trick lies in how hash function is adjusted!
Extendible Hashing Overview
• The result of applying a hash function h is treated as a binary number, and the last d bits are interpreted as an offset into the directory.

• d is referred to as the global depth of the hash file and is kept as part of the header of the file.

• To search for a data entry, apply the hash function h to the key and take the last d bits of its binary representation to get the bucket number.
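Extracting the last d bits can be sketched with a bit mask (a minimal illustration; the key values match the directory example that follows):

```python
def bucket_number(h_value: int, d: int) -> int:
    """Take the last d bits of the hash value as the directory offset."""
    return h_value & ((1 << d) - 1)   # mask keeps the low-order d bits

# With global depth d = 2, key 13 = binary 1101 -> last 2 bits are 01
print(bucket_number(13, 2))  # -> 1
# After the directory doubles to d = 3, the same key lands in entry 101
print(bucket_number(13, 3))  # -> 5
```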
Extendible Hashing
• h(k) maps keys to a fixed address space

• File pointers point to blocks of records known as buckets:

• an entire bucket is read in one physical data transfer, and buckets may be added to or
removed from the file dynamically.

• The last d bits are used as an index into a directory array, which usually resides in primary memory.

• The value d, the directory size (2^d), and the number of buckets change
automatically as the file expands and contracts.
Example

[Figure: a directory of size 4 (entries 00, 01, 10, 11, so d = 2) pointing to data pages (buckets). Key 13 = binary 1101 maps to entry 01, Bucket B; key 5 = binary 101 also maps to entry 01. Bucket A holds 4*, 12*, 32*, 16*; Bucket B holds 1*, 5*, 21*, 13*; Bucket C holds 10*; Bucket D holds 15*, 7*, 19*.]
Example
• The directory is an array of size 4.

• To find the bucket for a record r, take the last `global depth' # of bits of h(r); we denote r by h(r).
  • If h(r) = 5 = binary 101, it is in the bucket pointed to by directory entry 01.

• Global depth of the directory: max # of bits needed to tell which bucket an entry belongs to.

• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
  • Local depth is always less than or equal to global depth.

[Figure: directory entries 00–11 with global depth 2; Buckets A (4*, 12*, 32*, 16*), B (1*, 5*, 21*, 13*), C (10*), D (15*, 7*, 19*), each with local depth 2.]
Insert an Item
• Locate the bucket
• If there is space in bucket, insert the item.
• If bucket is full, split it (allocate new page, re-distribute).

• If necessary, double the directory.


• If insert causes local depth to become > global depth
• directory is doubled by copying it over and `fixing’ pointer to split image page.
Insert h(r) = 6 (The Easy Case)
6 = binary 00110

[Figure: the last 2 bits of 6 are 10, so 6* goes to Bucket C, which has room: Bucket C changes from {10*} to {10*, 6*}. No split and no directory change is needed.]
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary, and insert the given entry.

• Example: insert 13*
  • 13 = binary 1101, so 13* maps to directory entry 01 (Bucket B), which has room.

[Figure: directory 00–11; Bucket A: 4*, 12*, 32*, 16*; Bucket B: 1*, 5*, 21*, 13*; Bucket C: 10*; Bucket D: 15*, 7*, 19*.]
Insert h(r) = 20 (Causes Doubling)
20 = binary 10100

• Bucket A (4*, 12*, 32*, 16*) is FULL, hence split and redistribute!
• The third bit distinguishes between the two resulting buckets:
  • Bucket A (local depth 3): 32* = 10000, 16* = 01000
  • Bucket A2, the `split image' of Bucket A (local depth 3): 4* = 00100, 12* = 01100, 20* = 10100
Insert h(r) = 20 (Causes Doubling, cont’d)
20 = binary 10100

• Double the directory and increase the global depth from 2 to 3.

[Figure: the directory grows from entries 00–11 to 000–111; entry 000 points to Bucket A (32*, 16*) and entry 100 to its split image A2 (4*, 12*, 20*); Buckets B, C, D keep local depth 2, each pointed to by two directory entries.]
Extendible Hashing: Inserting Entries
• Find the appropriate bucket (as in search), split the bucket if full, double the directory if necessary, and insert the given entry.

• Example: insert 9*
  • 9 = binary 1001, so 9* maps to Bucket B (1*, 5*, 21*, 13*), which is FULL, hence split!

[Figure: directory 000–111; Bucket A: 32*, 16*; Bucket B: 1*, 5*, 21*, 13*; Bucket C: 10*; Bucket D: 15*, 7*, 19*; Bucket A2 (`split image' of A): 4*, 12*, 20*.]
Extendible Hashing: Inserting Entries (cont’d)
• Example: insert 9* (continued). The third bit distinguishes the entries of the split bucket:
  • Bucket B: 1* = 00001, 9* = 01001
  • Bucket B2, the `split image' of B: 5* = 00101, 13* = 01101, 21* = 10101

Almost there…

[Figure: directory 000–111; entry 001 points to Bucket B (1*, 9*) and entry 101 to Bucket B2 (5*, 21*, 13*); Buckets A, C, D, A2 are unchanged.]
Extendible Hashing: Inserting Entries (cont’d)
• Example: insert 9* (continued)

• There was no need to double the directory!
  • The local depth of Buckets B and B2 becomes 3, which does not exceed the global depth of 3.

• When do we NOT double the directory? When the split bucket’s new local depth is still less than or equal to the global depth.

[Figure: directory 000–111 (global depth 3); Bucket A: 32*, 16* (local depth 3); Bucket B: 1*, 9* (local depth 3); Bucket C: 10* (local depth 2); Bucket D: 15*, 7*, 19* (local depth 2); Bucket A2: 4*, 12*, 20* (local depth 3); Bucket B2: 5*, 21*, 13* (local depth 3).]
Comments on Extendible Hashing
• If directory fits in memory, equality search answered with one disk access; else
two.
• 100MB file, 100 bytes/rec, 4K pages contains 1,000,000 records (as data entries) and
25,000 directory elements; chances are high that directory will fit in memory.
• Directory grows in spurts, and, if the distribution of hash values is skewed, directory can
grow large.

• Delete: If removal of data entry makes bucket empty, can be merged with `split
image’. If each directory element points to same bucket as its split image, we
can halve directory (this is rare in practice).
Linear Hashing
• This is another dynamic hashing scheme, an alternative to Extendible Hashing.
• LH handles the problem of long overflow chains without using a directory, and
handles duplicates.
• Idea: Use a family of hash functions h0, h1, h2, ...
  • h_i(key) = h(key) mod (2^i · N); N = initial # of buckets
  • h is some hash function (its range is not just 0 to N–1)
  • If N = 2^d0 for some d0, then h_i consists of applying h and looking at the last d_i bits, where d_i = d0 + i.
  • h_{i+1} doubles the range of h_i (similar to directory doubling)
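The family of hash functions can be sketched as follows (a minimal illustration; using the identity as the base hash h is an assumption made for readability, and the keys match the later example):

```python
N = 4           # initial number of buckets, a power of 2: N = 2^d0
d0 = 2

def h(key: int) -> int:
    """Some base hash function; the identity here, purely for illustration."""
    return key

def h_i(key: int, i: int) -> int:
    """The i-th function of the family: h_i(key) = h(key) mod (2^i * N)."""
    return h(key) % (2 ** i * N)

# h_i looks at the last d0 + i bits; h_{i+1} doubles the range of h_i
print(h_i(43, 0))  # 43 mod 4  -> 3  (last 2 bits of 101011)
print(h_i(43, 1))  # 43 mod 8  -> 3  (last 3 bits: 011)
print(h_i(44, 1))  # 44 mod 8  -> 4  (last 3 bits of 101100: 100)
```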
Linear Hashing: Bucket Split
• When the first overflow occurs (it can occur in any bucket), bucket 0, which is
pointed to by the Next pointer p, is split (rehashed) into two buckets:
  • the original bucket 0 and a new bucket m.

• A new empty page is also added in the overflown bucket to accommodate the
overflow.

• The search values originally mapped into bucket 0 (using function h0) are now
distributed between buckets 0 and m using a new hashing function h1.
Linear Hashing: Insertion
• Locate bucket to insert
• If bucket to insert into is full:
• Add overflow page and insert data entry.
• (Maybe) Split Next bucket and increment Next.
• A split occurs in case of overflow
• Since buckets are split round-robin, long overflow chains don’t develop!
• Round-robin: the Next pointer advances through the buckets of the current level; when a
round completes, the level increases and splitting uses one more bit.
Linear Hashing: Background (Insert 20)

• We have seen what it means to split a bucket:
  • Before: one bucket {4*, 12*, 32*, 16*} with local depth 2.
  • After: two buckets with depth 3, {32*, 16*} and {4*, 12*, 20*}.

• We have seen what it means to add an overflow page to a bucket:
  • Before: {4*, 12*, 32*, 16*}.
  • After: the same bucket with an overflow page holding 20* chained to it.
Triggering Splits

• A split performed whenever a bucket overflow occurs is an uncontrolled split.

• Let l denote the Linear Hashing scheme’s load factor, i.e., l = S ∕ b where S is
the total number of records and b is the number of buckets used.

• The load factor achieved by uncontrolled splits is usually between 50–70%,


depending on the page size and the search value distribution.

• In practice, higher storage utilization is achieved if a split is triggered not by


an overflow, but when the load factor l becomes greater than some upper
threshold, which is called controlled split.
Overview of Splitting as Rounds
• Splits occur in a round-robin fashion, i.e., as rounds.

• The buckets that existed at the beginning of this round form the range of h_Level. The Next pointer marks the bucket to be split; the buckets before Next have already been split in this round.

• `Split image' buckets are created (through splitting of other buckets) in this round.

• If h_Level(search key value) falls in the already-split range, we must use h_Level+1(search key value) to decide whether the entry is in a `split image' bucket.
Linear Hashing: Example
• Insert 43 = binary 101011: before a split, h0 uses the last 2 bits; after a split, h1 uses the last 3 bits.
  • 32 = 100000, 9 = 001001, 44 = 101100, 36 = 100100
• Note: the bucket that is split may NOT be the same as the one that overflowed!
Linear Hashing: Example
Insert 29 (00011101)
Linear Hashing: Example
Insert 22 (00010110)
Example: End of Round
• All buckets before the Next pointer have already been split in this round.
• We increase the level by one and move Next back to the beginning.
• In the next round, splitting uses one more bit.
  • 50 = binary 110010
Summary
• Hash-based indexes: best for equality searches, cannot support range
searches.
• Static Hashing can lead to long overflow chains.
• Extendible Hashing avoids overflow pages by splitting a full bucket when a
new data entry is to be added to it. (Duplicates may require overflow pages.)
• Directory to keep track of buckets, doubles periodically.
• Can get large with skewed data; additional I/O if this does not fit in main
memory.
• Linear Hashing avoids a directory by splitting buckets round-robin, and using
overflow pages.
