Lecture 09 - Searching (Updated)
Searching
Array and Tree
Lecture Recap
• In our last session we discussed the following concepts:
• Infix
• Prefix
• Postfix
Outline
• This topic covers Searching
• Serial search: average case O(n)
• Binary search: average case O(log₂ n)
• Hashing
• Open address hashing
• Linear probing
• Double hashing
• Chained hashing
• The average number of elements examined is a function of the load factor α.
Problem: Search
• We are given a list of records.
• Each record has an associated key.
• Give an efficient algorithm for searching for a record containing a particular key.
• Efficiency is quantified by average-case time analysis: the number of comparisons needed to retrieve an item.
Search
[Figure: an array of records in slots [0] through [700]; each record has a key, e.g. Number 281942902, Number 233667136, Number 580625685, Number 701466868, Number 506643548, Number 155778322.]
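Serial search, listed in the outline, can be sketched in Python. Python is an illustrative choice, because the lecture shows no code. This minimal sketch assumes each record is a hypothetical (key, data) pair. It compares the records one by one from the front, so a successful search examines about half of the records on average, which is O(n).

```python
def serial_search(records, target_key):
    """Examine records one by one, front to back.

    Returns the index of the first record whose key equals target_key,
    or -1 if no record matches. Each record is assumed to be a
    (key, data) pair; this layout is hypothetical.
    """
    for index, (key, _data) in enumerate(records):
        if key == target_key:  # one comparison per record examined
            return index
    return -1


records = [(281942902, "record A"), (233667136, "record B"), (580625685, "record C")]
print(serial_search(records, 580625685))  # 2
print(serial_search(records, 701466868))  # -1
```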
Binary Search
Example: sorted array of integer keys. Target = 7.
[0] [1] [2] [3] [4] [5] [6]
 3   6   7  11  32  33  53
• The sorted array can be viewed as a binary search tree whose root is the middle element:
        11
     6      33
   3   7  32   53
• Search for target = 7. Find the midpoint of the array (start at the root): 11.
• Since 7 < 11, search the left subarray [3 6 7].
• Find the approximate midpoint of the subarray: 6.
• Since 7 > 6, search the right subarray [7]. Its midpoint is 7, so the target is found at index [2].
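The walkthrough above can be sketched as a recursive function. This is an illustrative version, not the lecture's own code, and the names binary_search, first, last and middle are assumptions. Integer division picks the "approximate midpoint", which reproduces the slides' sequence of probes: 11, then 6, then 7.

```python
def binary_search(keys, target, first=0, last=None):
    """Recursive binary search over the sorted slice keys[first..last].

    Returns the index of target, or -1 if it is absent.
    """
    if last is None:
        last = len(keys) - 1
    if first > last:                 # empty subarray: target is not present
        return -1
    middle = (first + last) // 2     # approximate midpoint
    if keys[middle] == target:
        return middle
    if target < keys[middle]:        # target can only be in the left subarray
        return binary_search(keys, target, first, middle - 1)
    return binary_search(keys, target, middle + 1, last)  # right subarray


keys = [3, 6, 7, 11, 32, 33, 53]
print(binary_search(keys, 7))  # 2
```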
Binary Search: Analysis
• Worst case complexity?
• What is the maximum depth of recursive calls in binary search as a function of n?
• At each level of the recursion, we split the array in half (divide by two).
• Therefore the maximum recursion depth is ⌊log₂ n⌋, and the worst case is O(log₂ n).
• The average case is also O(log₂ n).
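The bound can also be checked by counting probes directly. Each recursive call examines one array element, so a recursion depth of ⌊log₂ n⌋ means at most ⌊log₂ n⌋ + 1 elements are examined. This sketch uses a hypothetical helper, worst_case_probes. It runs the same halving as a loop against every key and every gap between keys, then reports the largest count.

```python
import math


def worst_case_probes(n):
    """Largest number of array elements examined by binary search over n sorted
    keys, found by trying every key and every gap around the keys."""
    keys = list(range(0, 2 * n, 2))      # n sorted keys: 0, 2, 4, ..., 2n-2
    worst = 0
    for target in range(-1, 2 * n):      # each key, each odd gap, and both ends
        first, last, probes = 0, n - 1, 0
        while first <= last:             # the same halving, written as a loop
            middle = (first + last) // 2
            probes += 1                  # one element examined per level
            if keys[middle] == target:
                break
            if target < keys[middle]:
                last = middle - 1
            else:
                first = middle + 1
        worst = max(worst, probes)
    return worst


for n in (1, 7, 8, 100, 1000):
    print(n, worst_case_probes(n), math.floor(math.log2(n)) + 1)
```

For n = 7, the size of the slides' array, the worst case is 3 probes. That matches the three levels of the tree.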
Can we do better than O(log₂ n)?
• Average and worst case of serial search = O(n)
• Average and worst case of binary search = O(log₂ n)
What is a Hash Table?
[Figure: a table of records; slot [4] holds the record with key Number 506643548.]
• When a hash table is in use, some spots contain valid records, and other spots are "empty".
Open Address Hashing
Inserting a New Record
• New record to insert: Number 580625685.
• What is (580625685 % 701)? It is 3, the hash value of the key.
• The hash value is used for the location of the new record, so it goes in slot [3].
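The hash computation above is the division hash function, key % CAPACITY. This small sketch uses CAPACITY = 701 from the slides, for slots [0] through [700]. The function name hash_index is hypothetical. The usage line reproduces the answer 3 for Number 580625685.

```python
CAPACITY = 701  # table slots [0] through [700], as in the slides


def hash_index(key):
    """Division hash function: the remainder of key divided by CAPACITY."""
    return key % CAPACITY


print(hash_index(580625685))  # 3, so the new record goes in slot [3]
```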
Collisions
• Next, insert Number 701466868. Its hash value is 701466868 % 701 = 2.
• This is called a collision, because there is already another valid record at [2].
• When a collision occurs, move forward until you find an empty spot: [3] and [4] are also occupied, but [5] is empty.
• The new record goes in the empty spot:
  [0] (empty) | [1] 281942902 | [2] 233667136 | [3] 580625685 | [4] 506643548 | [5] 701466868 | … | [700] 155778322
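Insertion with linear probing can be sketched as follows. This version makes simplifying assumptions. The table is a Python list in which None marks an empty spot. Only keys are stored, not full records. The helper name insert is hypothetical. The usage rebuilds the slides' table and inserts Number 701466868, which lands in [5].

```python
CAPACITY = 701  # slots [0] through [700], as in the slides


def insert(table, key):
    """Insert key using open address hashing with linear probing.

    Starts at the hash value key % CAPACITY and moves forward one slot at a
    time (wrapping from [700] back to [0]) until an empty spot (None) is found.
    Returns the slot used; raises OverflowError if every slot is occupied.
    """
    index = key % CAPACITY
    for _ in range(CAPACITY):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % CAPACITY  # collision: move forward
    raise OverflowError("hash table is full")


table = [None] * CAPACITY  # None marks an empty spot
for key in (281942902, 233667136, 580625685, 506643548, 155778322):
    insert(table, key)
print(insert(table, 701466868))  # 5: [2] collides, [3] and [4] are occupied
```

The wrap-around step matters at the end of the table. A key that hashes to [700] while [700] is occupied continues at [0].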
Searching for a Key
• Target: Number 701466868. Its hash value is 2, so the search starts at [2].
• [2]: Not me. [3]: Not me. [4]: Not me. [5]: Yes!
• The search moves forward until the key is found; reaching an empty spot first means the key is not in the table.
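The search steps above can be sketched as a lookup that follows the same probe path as insertion. As before, the table is assumed to be a Python list with None for empty spots, and the function name search is hypothetical. Examining at most CAPACITY slots keeps the loop finite even when the table is full.

```python
CAPACITY = 701


def search(table, key):
    """Linear-probing search: start at key % CAPACITY and move forward until
    the key is found (return its slot) or an empty spot (None) shows the key
    is absent (return -1). At most CAPACITY slots are examined."""
    index = key % CAPACITY
    for _ in range(CAPACITY):
        if table[index] is None:   # empty spot: the key was never inserted
            return -1
        if table[index] == key:    # "Yes!"
            return index
        index = (index + 1) % CAPACITY  # "Not me": move forward
    return -1


# The table from the collision example (all other slots empty).
table = [None] * CAPACITY
table[1], table[2], table[3], table[4], table[5], table[700] = (
    281942902, 233667136, 580625685, 506643548, 701466868, 155778322)
print(search(table, 701466868))  # 5, after "Not me" at [2], [3] and [4]
```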
Deleting a Record
• Records may also be deleted from a hash table.
• But the location must not be left as an ordinary "empty spot", since that could interfere with searches.
• The location must be marked in some special way so that a search can tell that the spot used to have something in it.
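One way to implement the special marking is a sentinel object, often called a "tombstone". The sentinel is hypothetical here: DELETED is an assumption, and the lecture defers implementation to the text. In this sketch, a search skips over DELETED spots and stops only at a never-used spot (None).

```python
CAPACITY = 701
DELETED = object()  # hypothetical marker: "this spot used to have something in it"


def search(table, key):
    """Linear-probing search that skips over DELETED spots.

    Only a never-used spot (None) ends an unsuccessful search early.
    Returns the slot holding key, or -1 if key is absent.
    """
    index = key % CAPACITY
    for _ in range(CAPACITY):
        slot = table[index]
        if slot is None:
            return -1
        if slot is not DELETED and slot == key:
            return index
        index = (index + 1) % CAPACITY  # DELETED spots are stepped over, not treated as empty
    return -1


def delete(table, key):
    """Replace key's record with the DELETED marker; return True if it was found."""
    index = search(table, key)
    if index == -1:
        return False
    table[index] = DELETED
    return True


table = [None] * CAPACITY
table[2], table[3], table[4], table[5] = 233667136, 580625685, 506643548, 701466868
delete(table, 580625685)
print(search(table, 701466868))  # 5: the search steps past the marked spot [3]
```

If [3] were simply set back to None, the search for Number 701466868 would stop at [3] and wrongly report it missing.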
Hashing
• Hash tables store a collection of records with keys.
• The location of a record depends on the hash value of the record's
key.
• Open address hashing:
• When a collision occurs, the next available location is used.
• Searching for a particular key is generally quick.
• When an item is deleted, the location must be marked in a special way, so that searches know that the spot used to be occupied.
• See text for implementation.
Open Address Hashing
• To reduce collisions:
• Use a table CAPACITY that is a prime number of the form 4k+3.
• Hashing functions:
• Division hash function: key % CAPACITY
• Mid-square function: (key*key) % CAPACITY
• Multiplicative hash function: the key is multiplied by a positive constant less than one, and the hash function returns the first few digits of the fractional part of the result.
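The three hash functions can be sketched as below, with several assumptions. CAPACITY = 701 is reused from the earlier example. The mid-square function follows the slide's formula (key*key) % CAPACITY exactly. For the multiplicative function, the constant A = (√5 − 1)/2 ≈ 0.618 is an assumed, commonly suggested choice. "First few digits of the fractional result" is rendered here as scaling the fractional part by CAPACITY, which is one common variant rather than necessarily the text's.

```python
import math

CAPACITY = 701  # assumption: the table size from the earlier example


def division_hash(key):
    return key % CAPACITY


def mid_square_hash(key):
    # As written on the slide: square the key, then take the remainder.
    # Python integers do not overflow; in C or C++, key*key could.
    return (key * key) % CAPACITY


A = (math.sqrt(5) - 1) / 2  # assumed constant, about 0.618 (positive, less than one)


def multiplicative_hash(key):
    fraction = (key * A) % 1.0        # fractional part of key * A, in [0, 1)
    return int(fraction * CAPACITY)   # leading digits of the fraction, scaled to a slot


for key in (580625685, 701466868):
    print(key, division_hash(key), mid_square_hash(key), multiplicative_hash(key))
```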
Clustering
• In the hash method described, when the insertion
encounters a collision, we move forward in the
table until a vacant spot is found. This is called
linear probing.
• Problem: when several different keys are hashed to
the same location, adjacent spots in the table will
be filled. This leads to the problem of clustering.
• As the table approaches its capacity, these clusters tend to merge. This causes insertion to take a long time, because linear probing must step past the whole cluster to find a vacant spot.
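Clustering shows up even in a tiny table. This sketch assumes an 11-slot table and stores keys directly. The helper insert_linear is hypothetical and returns the number of slots examined. Three keys that all hash to [2] fill [2], [3] and [4]. A fourth key that hashes to [3], a different location, must then probe past the cluster.

```python
def insert_linear(table, key):
    """Linear-probing insert; returns the number of slots examined."""
    capacity = len(table)
    index = key % capacity
    for probes in range(1, capacity + 1):
        if table[index] is None:
            table[index] = key
            return probes
        index = (index + 1) % capacity  # step forward past an occupied spot
    raise OverflowError("hash table is full")


table = [None] * 11
for key in (2, 13, 24):             # 2 % 11 == 13 % 11 == 24 % 11 == 2
    insert_linear(table, key)       # fills the adjacent slots [2], [3], [4]
probes_for_3 = insert_linear(table, 3)  # 3 hashes to [3], which is inside the cluster
print(probes_for_3, table)          # 3 probes: [3] and [4] are full, the key lands in [5]
```

The cluster now spans [2] through [5], so any later key hashing into that range takes even longer.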
Double Hashing
• One common technique to avoid clustering is called double hashing.
• Let’s call the original hash function hash1.
• Define a second hash function hash2.
[Figure: table slots holding a record whose key hashes to 0, a record whose key hashes to 1, and a record whose key hashes to 3.]
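In double hashing, the search for a spot starts at hash1(key). After each collision it moves forward hash2(key) slots instead of one, wrapping around the table. This sketch makes several assumptions. CAPACITY = 11 is a small prime chosen for readability. hash2(key) = 1 + key % (CAPACITY − 2) is one common choice: it is never 0, and because CAPACITY is prime, every step size visits all slots. The helper name insert_double is hypothetical. The same keys that formed a cluster under linear probing now spread out.

```python
CAPACITY = 11  # assumed small prime table size


def hash1(key):
    return key % CAPACITY


def hash2(key):
    # Assumed second hash: always in 1..CAPACITY-2, so every step moves forward.
    return 1 + key % (CAPACITY - 2)


def insert_double(table, key):
    """Open address insert with double hashing; returns the slot used."""
    index, step = hash1(key), hash2(key)
    for _ in range(CAPACITY):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + step) % CAPACITY  # jump by hash2(key), not by 1
    raise OverflowError("hash table is full")


table = [None] * CAPACITY
positions = [insert_double(table, key) for key in (2, 13, 24)]
print(positions)  # [2, 7, 9]; linear probing would have used [2, 3, 4]
```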
Time Analysis of Hashing
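• Worst case: every key is hashed to the same array index, so a search may examine all n records: O(n)!
• Fortunately, the average case is more promising.
• First we define a fraction called the hash table load factor:
  $\alpha = \dfrac{\text{number of occupied table locations}}{\text{size of the table's array}}$
• For open address hashing with linear probing, the average number of table elements examined in a successful search is approximately
  $\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$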
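The linear-probing formula ½(1 + 1/(1 − α)) can be explored numerically. This sketch evaluates the formula and compares it with a small simulation. The helper names, the table size 10007 and the random seed are illustrative assumptions. The formula is an approximation for large tables whose hash values are well spread, so the simulated averages should come out close to it rather than equal.

```python
import random


def expected_probes_linear(alpha):
    """Approximate average number of table elements examined in a successful
    search under linear probing, at load factor alpha (0 <= alpha < 1)."""
    return 0.5 * (1 + 1 / (1 - alpha))


def simulate_linear_probing(capacity, n_keys, seed=0):
    """Insert n_keys random keys with linear probing and return the average
    number of slots examined when each key is searched for afterwards."""
    if n_keys >= capacity:
        raise ValueError("n_keys must be smaller than capacity")
    rng = random.Random(seed)
    table = [None] * capacity
    total_probes = 0
    for key in rng.sample(range(10**9), n_keys):  # distinct random keys
        index, probes = key % capacity, 1
        while table[index] is not None:
            index = (index + 1) % capacity
            probes += 1
        table[index] = key
        # With no deletions, a later search for key retraces exactly this path.
        total_probes += probes
    return total_probes / n_keys


capacity = 10007
for n_keys in (2500, 5000, 8000):
    alpha = n_keys / capacity
    print(f"alpha={alpha:.2f}  formula={expected_probes_linear(alpha):.2f}  "
          f"simulated={simulate_linear_probing(capacity, n_keys):.2f}")
```

The formula grows slowly at first and then sharply as α approaches 1. This is the numeric form of the clustering problem described earlier.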