Unit 6.2 Indexing and Hashing

Indexing and hashing are techniques used in database systems to retrieve records efficiently. An index works much like the index of a book. There are two basic kinds: ordered indices, which store search keys in sorted order, and hash indices, which distribute search keys across buckets. Hashing maps records to buckets using a hash function, which avoids the need to access an index structure. Hashing must handle collisions and bucket overflows when multiple records hash to the same bucket. Dynamic hashing allows the number of buckets to change over time as the database grows or shrinks.

Indexing and Hashing

-Ashu Mehta
Database systems
Indexing
• Indexing is a data structure technique to efficiently
retrieve records from database files based on some
attributes on which the indexing has been done.
• Indexing in database systems is similar to the one
we see in books.
• Two basic kinds of indices:
– Ordered indices: search keys are stored in sorted order
– Hash indices: search keys are distributed uniformly
across a range of buckets using a “hash
function”.
Purpose of Indexing
• An index is a data structure that is added to a file to
provide faster access to the data.
• It reduces the number of blocks that the
DBMS has to check.
Index Evaluation Metrics
• Access types: The types of access that are supported efficiently. E.g.,
– records with a specified value in the attribute
– or records with an attribute value falling in a specified range of values.
• Access time: Time it takes to find a particular data item, or set
of items, using the technique
• Insertion time: Time it takes to insert a new data item. This
value includes the time it takes to find the correct place to
insert the new data item, as well as the time it takes to update
the index structure
• Deletion time: Time it takes to delete a data item. This value
includes the time it takes to find the item to be deleted, as well
as the time it takes to update the index structure
• Space overhead: The additional space occupied by an index
structure.
Basic Concepts
• Search key: An attribute or set of attributes
used to look up records in a file.
• An index file consists of records (called index
entries) of the form (search-key, pointer).
Ordered Indices
• Each index structure is associated with a
particular search key.
• An ordered index stores the values of the
search keys in sorted order and associates
with each search key the records that contain
it. E.g., index of a book, library catalog.
• A file may have several indices, on different
search keys.
Ordered Indices
• Primary index: in a sequentially ordered file, the index
whose search key specifies the sequential order of the file.
– Also called clustering index
– The search key of a primary index is usually but not necessarily
the primary key.
• Secondary index: an index whose search key specifies an
order different from the sequential order of the file.
– Also called non-clustering index.
• Index-sequential file: ordered sequential file with a primary
index.
Dense Index Files
Sparse Index Files
Multilevel Index
Secondary Indices
• Secondary indices must be dense, with an
index entry for every search-key value, and a
pointer to every record in the file.
• A primary index may be sparse, storing only
some of the search-key values, since it is
always possible to find records with
intermediate search-key values by a
sequential access to a part of the file.
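The sparse-index lookup described above can be sketched as follows. This is a minimal illustration, assuming the file is an in-memory list of (key, record) pairs sorted by unique keys and split into fixed-size blocks. The names build_sparse_index and lookup, the block size, and the sample records are hypothetical and do not come from the slides.

```python
from bisect import bisect_right

def build_sparse_index(sorted_records, block_size):
    """Keep one index entry (first key in block, block start position) per block."""
    return [(sorted_records[i][0], i)
            for i in range(0, len(sorted_records), block_size)]

def lookup(sorted_records, index, block_size, key):
    """Find the last index entry whose key is <= the target, then scan that block."""
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        return None  # target is smaller than every key in the file
    start = index[pos][1]
    for k, record in sorted_records[start:start + block_size]:
        if k == key:
            return record
    return None

# Hypothetical file: six records, two per block.
records = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E"), (60, "F")]
index = build_sparse_index(records, block_size=2)

print(index)                          # [(10, 0), (30, 2), (50, 4)]
print(lookup(records, index, 2, 40))  # D
```

Key 40 has no index entry of its own. The lookup lands on the entry for 30 and finds 40 by scanning that block sequentially, which is only possible because the file is ordered on the same search key.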
Secondary Indices
• If the search key of a secondary index is not a candidate key,
it is not enough to point to just the first record with each
search-key value. The remaining records with the same
search-key value could be anywhere in the file, since the
records are ordered by the search key of the primary index,
rather than by the search key of the secondary index.
• Therefore, a secondary index must contain pointers to all the
records.
• An extra level of indirection is used to implement secondary
indices on search keys that are not candidate keys.
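A small sketch of the extra level of indirection: each search-key value maps to a bucket of pointers, one for every matching record. The account records, the field layout, and the name build_secondary_index are illustrative assumptions. List positions stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical account file, physically ordered by account number
# (the search key of the primary index).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750),
    ("A-218", "Perryridge", 700),
]

def build_secondary_index(records, field):
    """Map each search-key value to a bucket of pointers to ALL matching records."""
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        buckets[record[field]].append(position)
    # Dense index: one entry per distinct key value, kept in key order.
    return dict(sorted(buckets.items()))

branch_index = build_secondary_index(accounts, field=1)

print(branch_index["Perryridge"])                          # [1, 5]
print([accounts[p][0] for p in branch_index["Downtown"]])  # ['A-101', 'A-110']
```

The two Perryridge records are not adjacent in the file, so a pointer to only the first one would not be enough to find the second.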
EXAMPLE
Hashing
• One disadvantage of sequential file organization
is that we must access an index structure to
locate data, or must use binary search, and that
results in more I/O operations.
• File organizations based on the technique of
hashing allow us to avoid accessing an index
structure.
• Hashing also provides a way of constructing
indices.
Example
• Hash file organization of the account file, using
branch_name as the key
• There are 10 buckets.
• The binary representation of the ith letter of the alphabet is
assumed to be the integer i.
• The hash function returns the sum of the binary
representations of the characters modulo 10
– E.g. h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
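The hash function in this example can be reproduced with a few lines of Python. This sketch assumes that upper and lower case letters are treated alike and that non-letters such as the space in "Round Hill" contribute nothing. Both assumptions are needed to match the values quoted above.

```python
def h(branch_name, n_buckets=10):
    """Sum of letter positions (a=1, ..., z=26) modulo the number of buckets."""
    total = sum(ord(c) - ord("a") + 1
                for c in branch_name.lower()
                if "a" <= c <= "z")  # skip spaces and other non-letters (assumption)
    return total % n_buckets

for name in ("Perryridge", "Round Hill", "Brighton"):
    print(name, h(name))
# Perryridge 5
# Round Hill 3
# Brighton 3
```

Round Hill and Brighton land in the same bucket even though their key values differ, which is why a bucket has to be searched to locate a record.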
Hashing
• A bucket is a unit of storage containing one or more records
(a bucket is typically a disk block).
• In a hash file organization we obtain the bucket of a record
directly from its search-key value using a hash function.
• A hash function h is a function from the set of all search-key
values K to the set of all bucket addresses B.
• The hash function is used to locate records for access, insertion as
well as deletion.
• Records with different search-key values may be mapped to
the same bucket; thus the entire bucket has to be searched
sequentially to locate a record.
Static Hashing
• In static hashing, the hash function always
computes the same address for a given
search-key value.
• For example, if a mod-4 hash function is used,
it can generate only 4 values (0 to 3). The
output address is always the same for a given
key, and the number of buckets provided
remains the same at all times.
Hash Function
• The worst hash function maps all search-key values to the same
bucket; this makes access time proportional to the number
of search-key values in the file.
• An ideal hash function has the following properties:
– The distribution is uniform. That is, the hash function assigns
each bucket the same number of search-key values from the
set of all possible search-key values.
– The distribution is random. That is, in the average case, each
bucket will have nearly the same number of values assigned
to it, regardless of the actual distribution of search-key
values.
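To make these properties concrete, the sketch below counts how many keys fall into each of 4 buckets under different hash functions. The key sets and the helper bucket_counts are hypothetical, chosen only for illustration.

```python
from collections import Counter

def bucket_counts(keys, hash_fn, n_buckets):
    """Return the number of keys that land in each bucket."""
    counts = Counter(hash_fn(k) % n_buckets for k in keys)
    return [counts.get(b, 0) for b in range(n_buckets)]

consecutive = range(100, 140)     # 40 consecutive integer keys
multiples_of_4 = range(0, 160, 4) # 40 keys that share a pattern

worst = bucket_counts(consecutive, lambda k: 0, 4)      # every key -> bucket 0
spread = bucket_counts(consecutive, lambda k: k, 4)     # k mod 4
skewed = bucket_counts(multiples_of_4, lambda k: k, 4)  # k mod 4 on patterned keys

print(worst)   # [40, 0, 0, 0]
print(spread)  # [10, 10, 10, 10]
print(skewed)  # [40, 0, 0, 0]
```

The last line shows why uniformity alone is not enough: k mod 4 spreads consecutive keys evenly, but it is not random with respect to the actual key distribution, so keys that are all multiples of 4 pile into one bucket.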
Handling of Bucket Overflows
• If the bucket does not have enough space, a bucket
overflow is said to occur.
• Bucket overflow can occur for several reasons:
– Insufficient buckets: The number of buckets, denoted by nB,
must be chosen such that nB > nr/fr, where nr denotes the total
number of records that will be stored and fr denotes the
number of records that will fit in a bucket.
– Skew: Some buckets are assigned more records than
others, so a bucket may overflow even when other buckets
still have space. This situation is called bucket skew. It can
occur for two reasons:
• multiple records have the same search-key value
• the chosen hash function produces a non-uniform distribution of key
values
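As a quick arithmetic check of the condition nB > nr/fr, the following sketch computes the smallest bucket count that satisfies it. The record counts and bucket capacity are made-up figures, and min_buckets is a hypothetical helper name.

```python
def min_buckets(n_records, records_per_bucket):
    """Smallest integer nB with nB > nr / fr (strict inequality)."""
    return n_records // records_per_bucket + 1

print(min_buckets(10_000, 20))  # 501, since 10000 / 20 = 500
print(min_buckets(10_050, 20))  # 503, since 10050 / 20 = 502.5
```

Even with enough buckets overall, skew can still overflow an individual bucket, so this bound is necessary but not sufficient.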
Handling of Bucket Overflows
• Although the probability of bucket overflow can be reduced, it
cannot be eliminated; it is handled by using overflow buckets.
• Overflow chaining – the overflow buckets of a given bucket
are chained together in a linked list.
• The above scheme is called closed hashing.
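Overflow chaining can be sketched with a small bucket class in which each full bucket links to an overflow bucket. The capacity of 2, the mod-4 hash function, and the Bucket class itself are assumptions made for this illustration. A real system would chain disk blocks rather than Python objects.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an optional overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain, if any

    def insert(self, record):
        if len(self.records) < self.capacity:
            self.records.append(record)
        else:
            # Bucket is full: push the record down the overflow chain.
            if self.overflow is None:
                self.overflow = Bucket(self.capacity)
            self.overflow.insert(record)

    def search(self, key):
        """Scan this bucket, then follow the overflow chain."""
        bucket = self
        while bucket is not None:
            for k, value in bucket.records:
                if k == key:
                    return value
            bucket = bucket.overflow
        return None

    def chain_length(self):
        return 1 + (self.overflow.chain_length() if self.overflow else 0)

N_BUCKETS, CAPACITY = 4, 2
table = [Bucket(CAPACITY) for _ in range(N_BUCKETS)]
for key in (4, 8, 12, 16, 20, 5):
    table[key % N_BUCKETS].insert((key, f"record-{key}"))

print(table[0].chain_length())  # 3
print(table[0].search(20))      # record-20
```

Five of the six keys hash to bucket 0, so that bucket ends up with two overflow buckets chained behind it while the others stay nearly empty.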
Handling of Bucket Overflows
• Linear probing: When the hash function generates an address at
which data is already stored, the record is placed in the next free
bucket. This mechanism is called open hashing.
• Open hashing, which does not use overflow buckets, is not suitable
for database applications.
Hash Indices
• Hashing can be used not only for file organization, but also
for index-structure creation.
• A hash index organizes the search keys, with their
associated record pointers, into a hash file structure.
• A hash index is constructed as follows:
– Apply the hash function to a search key to identify a bucket, and
store the key and its associated pointers in the bucket
• Strictly speaking, hash indices are always secondary indices
– if the file itself is organized using hashing, a separate primary
hash index on it using the same search-key is unnecessary.
– However, we use the term hash index to refer to both secondary
index structures and hash organized files.
Example of Hash Index
Deficiencies of Static Hashing
• In static hashing, function h maps search-key values to a fixed
set B of bucket addresses. Databases grow or shrink with
time.
– If the initial number of buckets is too small, the file grows, and the hash
function was chosen based on the current file size, performance will
degrade due to too many overflows.
– If space is allocated for anticipated growth, a significant amount of
space will be wasted initially (and buckets will be underfull).
– If the database shrinks, again space will be wasted.
• One solution: periodic re-organization of the file with a new
hash function
– Expensive, disrupts normal operations
• Better solution: allow the number of buckets to be modified
dynamically.
Dynamic Hashing
• Dynamic hashing provides a mechanism in
which data buckets are added and removed
dynamically and on demand.
• A common form of dynamic hashing is
extendable hashing (also called extended hashing).
• The hash function, in dynamic hashing, is made to
produce a large number of values, of which only a
few are used initially.
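One way to picture this is an extendable-hashing sketch in which the directory uses only the low-order bits of the hash value and doubles when a full bucket needs more bits to be split. This is a simplified illustration, not a definitive implementation. The class name ExtendableHash, the bucket capacity of 2, and the use of Python's built-in hash are assumptions. It handles growth only (no bucket merging on deletion) and assumes that no more keys than the bucket capacity share an identical hash value.

```python
class ExtendableHash:
    """Directory of 2**global_depth slots pointing at buckets that split on overflow."""

    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 0
        self.directory = [{"depth": 0, "items": {}}]

    def _slot(self, key):
        # Only the low-order global_depth bits of the hash value are used.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)]["items"].get(key)

    def insert(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket["items"] or len(bucket["items"]) < self.capacity:
                bucket["items"][key] = value
                return
            self._split(bucket)  # full bucket: split it and retry

    def _split(self, bucket):
        if bucket["depth"] == self.global_depth:
            # Double the directory; both halves point at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket["depth"] += 1
        new_bucket = {"depth": bucket["depth"], "items": {}}
        high_bit = 1 << (bucket["depth"] - 1)
        # Move entries whose newly significant bit is 1 into the new bucket.
        for k in list(bucket["items"]):
            if hash(k) & high_bit:
                new_bucket["items"][k] = bucket["items"].pop(k)
        # Redirect the directory slots that should now reach the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = new_bucket

table = ExtendableHash(bucket_capacity=2)
for k in range(8):
    table.insert(k, f"record-{k}")

print(table.global_depth, len(table.directory))  # 2 4
print(table.get(5))                              # record-5
```

The table starts with a single bucket and no hash bits in use. After eight insertions it uses two bits and four buckets, without ever rehashing the whole file.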
Hashing Practice Problems
Problem 1

• Consider a hash table of size seven, with starting index
zero, and a hash function (3x + 4) mod 7. Assuming the
hash table is initially empty, which of the following is
the contents of the table when the sequence 1, 3, 8,
10 is inserted into the table using open hashing (linear
probing)? Note that ‘_’ denotes an empty location in the table.
(A) 8, _, _, _, _, _, 10
(B) 1, 8, 10, _, _, _, 3
(C) 1, _, _, _, _, _, 3
(D) 1, 10, 8, _, _, _, 3
• 1 => (3x+4) mod 7 = 7 mod 7 = 0
• 3 => (3x+4) mod 7 = 13 mod 7 = 6
• 8 => (3x+4) mod 7 = 28 mod 7 = 0
Because address ‘0’ is not empty, store 8 at the next
empty data bucket ‘1’
• 10 => (3x+4) mod 7 = 34 mod 7 = 6
Because address ‘6’ is not empty, probing wraps around to
addresses ‘0’ and ‘1’ (both occupied) and stores 10 at the next
empty data bucket ‘2’
Correct option is B
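The steps of Problem 1 can be checked with a short linear-probing sketch. The function name insert_linear_probing is hypothetical, and None stands in for the ‘_’ marker.

```python
def insert_linear_probing(keys, size, hash_fn):
    """Insert keys into a fixed-size table; on a collision, step to the
    next slot, wrapping around to slot 0 after the last slot."""
    table = [None] * size
    for key in keys:
        home = hash_fn(key) % size
        for step in range(size):
            slot = (home + step) % size
            if table[slot] is None:
                table[slot] = key
                break
        else:
            raise OverflowError("hash table is full")
    return table

result = insert_linear_probing([1, 3, 8, 10], 7, lambda x: 3 * x + 4)
print(result)  # [1, 8, 10, None, None, None, 3]
```

The printed table matches option (B): key 10 hashes to slot 6, finds it taken, and wraps around past slots 0 and 1 before settling in slot 2.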
Problem 2
• The keys 12, 18, 13, 2, 3, 23, 5 and 15 are
inserted into an initially empty hash table of
length 10 using open addressing with hash
function h(k) = k mod 10 and linear probing.
What is the resultant hash table?
• h(k) = k mod 10
• 12 => 12 mod 10 = 2
• 18 => 18 mod 10 = 8
• 13 => 13 mod 10 = 3
• 2 => 2 mod 10 = 2, not empty, next available = 4
• 3 => 3 mod 10 = 3, not empty, next available = 5
• 23 => 23 mod 10 = 3, not empty, next available = 6
• 5 => 5 mod 10 = 5, not empty, next available = 7
• 15 => 15 mod 10 = 5, not empty, next available = 9
• Resulting table (addresses 0 to 9): _, _, 12, 13, 2, 3, 23, 5, 18, 15
Correct option is C
Problem 3
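• For question number 2, what would be the correct option if the method used
is closed hashing (overflow chaining)?
• h(k) = k mod 10
• 12 => 12 mod 10 = 2
• 18 => 18 mod 10 = 8
• 13 => 13 mod 10 = 3
• 2 => 2 mod 10 = 2
• 3 => 3 mod 10 = 3
• 23 => 23 mod 10 = 3
• 5 => 5 mod 10 = 5
• 15 => 15 mod 10 = 5
• Colliding keys are chained at their home address. In insertion order,
the chains are: address 2: 12, 2; address 3: 13, 3, 23; address 5: 5, 15;
address 8: 18. All other addresses are empty.
Correct option is D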
Problem 4
• A hash table of length 10 uses open
addressing with hash function h(k)=k mod 10,
and linear probing. After inserting 6 values
into an empty hash table, the table is as
shown below.
Which one of the following choices gives a
possible order in which the key values could
have been inserted in the table?
(A) 46, 42, 34, 52, 23, 33
(B) 34, 42, 23, 52, 33, 46
(C) 46, 34, 42, 23, 52, 33
(D) 42, 46, 33, 23, 34, 52
• Solution: Check whether the sequence given in option (A) can
lead to the hash table given in the question. Option (A) inserts 46, 42, 34, 52,
23, 33 as follows:
• For key 46, h(46) is 46 % 10 = 6. Therefore, 46 is placed at index 6.
• For key 42, h(42) is 42 % 10 = 2. Therefore, 42 is placed at index 2.
• For key 34, h(34) is 34 % 10 = 4. Therefore, 34 is placed at index 4.
• For key 52, h(52) is 52 % 10 = 2. However, index 2 is occupied by
42. Therefore, 52 is placed at index 3 in the hash table. But in the
given hash table, 52 is placed at index 5. Therefore, the sequence in
option (A) cannot generate the hash table given in the question.
• The other options can be checked in the same way, which
leads to answer (C).
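The remaining options can be checked mechanically by simulating each insertion order. The table figure is not reproduced in this text, so the target below is an assumption: it is the table from the commonly circulated version of this exercise, and it agrees with the one fact the solution does state (52 sits at index 5). The function name linear_probe_table is hypothetical.

```python
def linear_probe_table(keys, size=10):
    """Build a table with h(k) = k mod size and linear probing.
    Assumes fewer keys than slots, so an empty slot always exists."""
    table = [None] * size
    for key in keys:
        slot = key % size
        while table[slot] is not None:
            slot = (slot + 1) % size
        table[slot] = key
    return table

# ASSUMPTION: target table, not shown in the source text.
target = [None, None, 42, 23, 34, 52, 46, 33, None, None]

options = {
    "A": [46, 42, 34, 52, 23, 33],
    "B": [34, 42, 23, 52, 33, 46],
    "C": [46, 34, 42, 23, 52, 33],
    "D": [42, 46, 33, 23, 34, 52],
}

matches = [label for label, order in options.items()
           if linear_probe_table(order) == target]
print(matches)  # ['C']
```

Under that assumed table, option (B) also places 52 at index 5 but then puts 33 at index 6 and 46 at index 7, so only option (C) reproduces every slot.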
