
Dynamic Hashing and Indexing

A directory, i.e., an array of 2^d bucket addresses, is maintained, where d is called the global depth of the directory. A local depth d', stored with each bucket, specifies the number of bits on which the bucket contents are based. The number of buckets changes dynamically due to coalescing and splitting of buckets.
Copyright
© Attribution Non-Commercial (BY-NC)

Dynamic Hashing

  Good for databases that grow and shrink in size
  Allows the hash function to be modified dynamically
  Extendible hashing – one form of dynamic hashing
  This hashing scheme takes advantage of the fact that the result of
 applying a hashing function is a non-negative integer, which can be
 represented as a binary number, i.e., a string of bits.
  A directory, i.e., an array of 2^d bucket addresses, is
 maintained, where d is called the global depth of the directory.
  A local depth d', stored with each bucket, specifies the number
 of bits on which the bucket contents are based.
  The value of d grows and shrinks as the size of the database grows and
 shrinks.
  Several directory entries may point to the same bucket, so the
 actual number of buckets is ≤ 2^d.
  The number of buckets changes dynamically due to coalescing
 and splitting of buckets.
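
A minimal sketch of how a directory of 2^d entries can be addressed, assuming the directory is indexed by the leading d bits of the hash value. The 5-bit hash width and the name `directory_slot` are assumptions made for illustration, not part of the slides.

```python
HASH_BITS = 5  # assumed width of a hash value, chosen for illustration

def directory_slot(hash_value: int, global_depth: int) -> int:
    """Return the directory position: the leading `global_depth` bits of the hash."""
    return hash_value >> (HASH_BITS - global_depth)

h = 0b10100  # a sample 5-bit hash value
for d in range(4):
    # The directory has 2**d entries; the slot is given by the first d bits of h.
    print(d, 2 ** d, directory_slot(h, d))
```

As d grows by one, the directory doubles and each hash value is distinguished by one more leading bit.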

Database System Concepts 12.1 ©Silberschatz, Korth and Sudarshan


Splitting and Coalescing of Buckets
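  A bucket is split when it overflows; its local depth d' is
 incremented by one. If d' was already equal to d, the directory is
 doubled first and the value of d is incremented by one.
 For a bucket whose hash values start with 01, after splitting, the
 first bucket contains records whose hash values start with 010 and the
 other contains those that start with 011.
  Coalescing occurs when records are deleted. If d > d' for every
 bucket after a deletion, the directory is halved and the
 value of d is decremented by one.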
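
The splitting rule can be sketched as follows. This is an illustrative model only: the 5-bit hash width, the two-record bucket capacity, and the class and method names are assumptions, and coalescing is not implemented.

```python
HASH_BITS = 5     # assumed width of a hash value
BUCKET_SIZE = 2   # assumed bucket capacity

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = []  # hash values stored in this bucket

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]  # always 2**global_depth entries

    def _slot(self, h):
        # The directory is indexed by the leading global_depth bits of h.
        return h >> (HASH_BITS - self.global_depth)

    def insert(self, h):
        while True:
            bucket = self.directory[self._slot(h)]
            if len(bucket.items) < BUCKET_SIZE:
                bucket.items.append(h)
                return
            if bucket.local_depth == HASH_BITS:
                raise OverflowError("bucket full and no hash bits left to split on")
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory: every old entry becomes two adjacent entries.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        # Redistribute on the newly significant bit of each hash value.
        shift = HASH_BITS - bucket.local_depth
        old_items, bucket.items = bucket.items, []
        for item in old_items:
            target = sibling if (item >> shift) & 1 else bucket
            target.items.append(item)
        # Repoint the directory entries whose newly significant bit is 1.
        bit = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> bit) & 1:
                self.directory[i] = sibling

table = ExtendibleHash()
for h in (1, 16, 20, 7, 27, 29, 18, 11):  # sample 5-bit hash values
    table.insert(h)
print(table.global_depth, len(table.directory))
```

The directory doubles only when the overflowing bucket's local depth already equals the global depth; otherwise the split just repoints some existing directory entries.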



Extendible Hashing - Example

Record   K      h(K)   h(K) in binary
rec1     2369   1      00001
rec2     3760   16     10000
rec3     4692   20     10100
rec4     4871   7      00111
rec5     5659   27     11011
rec6     1821   29     11101
rec7     1074   18     10010
rec8     7115   11     01011
rec9     1620   20     10100
rec10    2428   28     11100
rec11    3943   7      00111
rec12    4750   14     01110
rec13    6975   31     11111
rec14    4981   21     10101
rec15    9208   24     11000
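
The hash values listed for the keys below are consistent with h(K) = K mod 32 written as a 5-bit string. That formula is inferred from the table rather than stated on the slide, so the sketch treats it as an assumption and checks it against three of the rows.

```python
def h(key: int) -> int:
    # Assumption inferred from the table: h(K) = K mod 32.
    return key % 32

def to_bits(value: int, width: int = 5) -> str:
    """Format a hash value as a fixed-width binary string."""
    return format(value, f"0{width}b")

for key in (3760, 4692, 4871):  # keys of rec2, rec3, rec4
    print(key, h(key), to_bits(h(key)))
```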
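d1 = local depth (written d' earlier)
d = global depth

Each bucket in the figures holds at most two records, and the directory
is indexed by the leading d bits of h(K).

d = 0
(single entry) -> bucket (d1 = 0): rec 1, rec 2
Record 3 = overflow!! The bucket is split.

d = 1
0 -> bucket (d1 = 1): rec 1, rec 4
1 -> bucket (d1 = 1): rec 2, rec 3
Record 5 = overflow!! Bucket 1 is split.

d = 2
00, 01 -> bucket (d1 = 1): rec 1, rec 4
10 -> bucket (d1 = 2): rec 2, rec 3
11 -> bucket (d1 = 2): rec 5, rec 6
Record 7 = overflow!! Bucket 10 is split.

d = 3
000, 001, 010, 011 -> bucket (d1 = 1): rec 1, rec 4
100 -> bucket (d1 = 3): rec 2, rec 7
101 -> bucket (d1 = 3): rec 3
110, 111 -> bucket (d1 = 2): rec 5, rec 6
Record 8 = overflow!! The bucket holding rec 1 and rec 4 is split.

d = 3
000 -> bucket (d1 = 3): rec 1
001 -> bucket (d1 = 3): rec 4
010, 011 -> bucket (d1 = 2): rec 8
100 -> bucket (d1 = 3): rec 2, rec 7
101 -> bucket (d1 = 3): rec 3, rec 9
110, 111 -> bucket (d1 = 2): rec 5, rec 6
Record 10 = overflow!! Bucket 11 is split.

d = 3
000 -> bucket (d1 = 3): rec 1
001 -> bucket (d1 = 3): rec 4, rec 11
010, 011 -> bucket (d1 = 2): rec 8, rec 12
100 -> bucket (d1 = 3): rec 2, rec 7
101 -> bucket (d1 = 3): rec 3, rec 9
110 -> bucket (d1 = 3): rec 5
111 -> bucket (d1 = 3): rec 6, rec 10
Record 13 = overflow!! Bucket 111 is split and the directory doubles.

d = 4
0000, 0001 -> bucket (d1 = 3): rec 1
0010, 0011 -> bucket (d1 = 3): rec 4, rec 11
0100, 0101, 0110, 0111 -> bucket (d1 = 2): rec 8, rec 12
1000, 1001 -> bucket (d1 = 3): rec 2, rec 7
1010, 1011 -> bucket (d1 = 3): rec 3, rec 14
1100, 1101 -> bucket (d1 = 3): rec 5, rec 15
1110 -> bucket (d1 = 4): rec 6, rec 10
1111 -> bucket (d1 = 4): rec 13

Note: the last figure does not show rec 9. Its hash value (10100) shares
the prefix 1010 with rec 3 (10100) and rec 14 (10101), so with two
records per bucket, storing all three requires splitting this bucket
further, down to the fifth bit.
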
Advantages and Disadvantages
  Benefits of extendible hashing:
   Hash performance does not degrade with growth of the file
   Minimal space overhead
  Disadvantages of extendible hashing:
   Extra level of indirection to find the desired record
   Bucket address table may itself become very big (larger than
 memory)
    A tree structure is then needed to locate the desired entry in
 the bucket address table
   Changing the size of the bucket address table is an expensive operation
  Linear hashing is an alternative mechanism that avoids these
 disadvantages at the possible cost of more bucket overflows;
 it does not need a directory.



Indexing



Indexing : Basic Concepts
  Indexing mechanisms are used to speed up access to desired
 data.
   E.g., the catalog of a library.
  Search key: an attribute or set of attributes used to look up
 records in a file.
  An index file consists of records (called index entries) of the
 form
 (search-key, pointer)

  Index files are typically much smaller than the original file.
  Two basic kinds of indices:
   Ordered indices: search keys are stored in sorted order
   Hash indices: search keys are distributed uniformly across
 “buckets” using a “hash function”.



Index Evaluation Factors
 Access types supported efficiently. E.g.,
 records with a specified value in the attribute
 or records with an attribute value falling in a specified range of
values.
 Access time
 Insertion time
 Deletion time
  Space overhead: additional space occupied by an index
 structure.



Ordered Indices
  In an ordered index, index entries are stored sorted on the
 search-key value, e.g., the author catalog in a library.
 Primary index: in a sequentially ordered file, the index whose
search key specifies the sequential order of the file.
 Also called clustering index
 The search key of a primary index is usually but not necessarily the
primary key.
 Secondary index: an index whose search key specifies an
order different from the sequential order of the file. Also called
non-clustering index.
 Index-sequential file: ordered sequential file with a primary
index.



Dense Index Files
  Dense index: an index record appears for every search-key value
 in the file.



Sparse Index Files
  Sparse index: contains index records for only some search-key
 values.
   Applicable when records are sequentially ordered on the search key
  To locate a record with search-key value K we:
   Find the index record with the largest search-key value ≤ K
   Search the file sequentially starting at the record to which the index
 record points
  Less space and less maintenance overhead for insertions and
 deletions.
  Generally slower than a dense index for locating records.
  Good tradeoff: a sparse index with an index entry for every block
 in the file, corresponding to the least search-key value in the block.
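
The lookup procedure above can be sketched with one index entry per block. The toy data, the in-memory list-of-blocks layout, and the function name `lookup` are assumptions for illustration; search keys are taken to be unique.

```python
from bisect import bisect_right

# Data file: blocks of (search_key, value) records, sorted on the search key.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: one entry per block, holding the least search key in the block.
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    # Find the index entry with the largest search-key value <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    # Scan sequentially from the record the index entry points to.
    for k, value in blocks[pos]:
        if k == key:
            return value
        if k > key:
            break
    return None

print(lookup(50), lookup(55))
```

Only one index probe and one block scan are needed, which is why an entry per block is a reasonable compromise between index size and lookup cost.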



Example of Sparse Index Files



Multilevel Index
 If primary index does not fit in memory, access becomes
expensive.
 To reduce number of disk accesses to index records, treat
primary index kept on disk as a sequential file and construct a
sparse index on it.
 outer index – a sparse index of primary index
 inner index – the primary index file
 If even outer index is too large to fit in main memory, yet another
level of index can be created, and so on.
 Indices at all levels must be updated on insertion or deletion
from the file.
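
A two-level lookup might look like the sketch below: the outer index is a sparse index over the blocks of the inner index. The toy entries, the string "pointers", and the function name are assumptions for illustration.

```python
from bisect import bisect_right

# Inner index: sorted (search_key, pointer) entries, split into blocks.
inner_blocks = [
    [(10, "r10"), (20, "r20")],
    [(30, "r30"), (40, "r40")],
    [(50, "r50"), (60, "r60")],
]

# Outer index: sparse, one entry per inner block (its least search key).
outer_keys = [block[0][0] for block in inner_blocks]

def lookup(key):
    # Level 1: pick the inner block via the outer index.
    pos = bisect_right(outer_keys, key) - 1
    if pos < 0:
        return None
    # Level 2: search the chosen inner-index block for the key.
    for k, pointer in inner_blocks[pos]:
        if k == key:
            return pointer
    return None

print(lookup(40), lookup(45))
```

If the outer index were itself too large for memory, the same construction could be repeated on `outer_keys`.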



Multilevel Index (Cont.)



Index Update: Insertion
 Single-level index insertion:
 Perform a lookup using the search-key value appearing in the record
to be inserted.
 Dense indices – if the search-key value does not appear in the
index, insert it.
 Sparse indices – if index stores an entry for each block of the file, no
change needs to be made to the index unless a new block is
created. In this case, the first search-key value appearing in the
new block is inserted into the index.
 Multilevel insertion (as well as deletion) algorithms are simple
extensions of the single-level algorithms
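
The two single-level insertion rules can be sketched as follows. Indexes are modelled as sorted lists of search-key values with the pointers left out; that simplification and both function names are assumptions for illustration.

```python
from bisect import bisect_left, insort

def dense_insert(dense_keys, key):
    """Dense index: add an entry only if the search-key value is not yet present."""
    pos = bisect_left(dense_keys, key)
    if pos == len(dense_keys) or dense_keys[pos] != key:
        dense_keys.insert(pos, key)

def sparse_on_new_block(sparse_keys, new_block_keys):
    """Sparse index with one entry per block: called only when a new block is created."""
    insort(sparse_keys, new_block_keys[0])  # first search-key value in the new block

dense = [10, 30]
dense_insert(dense, 20)   # new value: an entry is added
dense_insert(dense, 20)   # value already indexed: no change

sparse = [10, 40]
sparse_on_new_block(sparse, [25, 27])
print(dense, sparse)
```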



Index Update: Deletion
 If deleted record was the only record in the file with its particular
search-key value, the search-key is deleted from the index also.
 Single-level index deletion:
 Dense indices – deletion of search-key is similar to file record
deletion.
 Sparse indices – if an entry for the search key exists in the index, it
is deleted by replacing the entry in the index with the next search-
key value in the file (in search-key order). If the next search-key
value already has an index entry, the entry is deleted instead of
being replaced.
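
A sketch of the sparse-index deletion rule, again modelling the index and the file as sorted lists of search-key values. The function name, the list representation, and the choice to drop the entry when no next key exists are assumptions for illustration.

```python
from bisect import bisect_right

def sparse_delete(sparse_keys, file_keys, key):
    """Update a sparse index after a record with `key` was deleted.

    file_keys: sorted search-key values remaining in the file after the deletion.
    """
    if key in file_keys:
        return  # other records with this search-key value remain
    if key not in sparse_keys:
        return  # the sparse index had no entry for this value
    pos = sparse_keys.index(key)
    nxt = bisect_right(file_keys, key)  # position of the next search-key value
    if nxt == len(file_keys) or file_keys[nxt] in sparse_keys:
        del sparse_keys[pos]                # next value already indexed (or none left)
    else:
        sparse_keys[pos] = file_keys[nxt]   # replace with the next search-key value

index = [10, 40]
sparse_delete(index, [10, 20, 30, 50, 60], 40)
print(index)
```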



Secondary Indices

  Frequently, one wants to find all the records whose
 values in a certain field (which is not the search key of
 the primary index) satisfy some condition.
   Example 1: In the account database stored sequentially
 by account number, we may want to find all accounts in a
 particular branch.
   Example 2: As above, but where we want to find all
 accounts with a specified balance or range of balances.
  We can have a secondary index with an index record
 for each search-key value; the index record points to a
 bucket that contains pointers to all the actual records
 with that particular search-key value.
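
A minimal sketch of a secondary index on the balance field, where each index record leads to a bucket of pointers. The account rows are invented toy data, list positions stand in for record pointers, and the function names are assumptions.

```python
from collections import defaultdict

# File stored sequentially by account number (toy data).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750),
    ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
]

def build_secondary_index(records, field):
    """Map each search-key value to a bucket of pointers (positions in the file)."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return dict(sorted(index.items()))  # keep index records in search-key order

balance_index = build_secondary_index(accounts, 2)

def accounts_with_balance(low, high):
    """Follow the buckets for every balance in [low, high]."""
    return [accounts[p][0]
            for balance, bucket in balance_index.items()
            if low <= balance <= high
            for p in bucket]

print(balance_index[700], accounts_with_balance(600, 700))
```

Because the file is not ordered on balance, the pointers in one bucket can refer to records scattered across the file.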



Secondary Index on balance field of account



That’s all about Indices……

THANK YOU.
