SelfStudy - Chapter 10, 11 - File Structure, Indexing and Hashing

Chapter 10 discusses storage and file structure in databases, detailing file organization, fixed-length and variable-length records, and various file organization methods such as heap, sequential, and multitable clustering. It also covers indexing and hashing, explaining ordered and hash indices, their evaluation metrics, and the advantages of B+-tree index files over traditional indexed-sequential files. The chapter emphasizes the importance of efficient data access and management in database systems.

Uploaded by

kcbhoraniya008

Chapter 10
Storage and File Structure
BOOK REFERRED: KORTH, 6TH EDITION
File Organization
The database is stored as a collection of files. Each file is a collection of blocks. A block is a
collection of records. A record is a sequence of fields.
Database files are stored in secondary memory (disk).
How to access the data from the disk?
◦ Data access from disk to main memory is in the form of blocks.
Fixed-Length Records
Simple approach:
◦ Store record i starting from byte n × (i – 1), where n is the size of each record.
◦ Record access is simple, but a record may cross a block boundary.
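
The offset rule above can be sketched in a few lines. This is only an illustration: the function names and the sizes (53-byte records, 4096-byte blocks) are assumptions, not values from the slides.

```python
def record_offset(i: int, n: int) -> int:
    """Byte offset of record i (1-based) when every record is n bytes long."""
    return n * (i - 1)


def crosses_block(i: int, n: int, block_size: int) -> bool:
    """True if record i starts in one block and ends in the next."""
    start = record_offset(i, n)
    end = start + n - 1  # last byte occupied by the record
    return start // block_size != end // block_size


# Assumed sizes for illustration only.
RECORD_SIZE = 53
BLOCK_SIZE = 4096

third_offset = record_offset(3, RECORD_SIZE)
straddlers = [i for i in range(1, 100) if crosses_block(i, RECORD_SIZE, BLOCK_SIZE)]
```

With these sizes, record 78 starts at byte 4081 and ends at byte 4133, so it spans two blocks. A common remedy is to leave the tail of each block unused so that no record crosses a boundary.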

Deletion of record i: alternatives:
◦ move records i + 1, . . ., n to i, . . . , n – 1
◦ move record n to i
◦ do not move records, but link all free records on a free list
Fixed Length Record

• The first approach is to shift records up: when a record is deleted, every record after it moves up one position so that the freed slot is filled.

• This leaves the free space at the end of the file, but moving that many records is costly.
Original file     Record C deleted     Records shifted up
A 1 12a           A 1 12a              A 1 12a
B 2 12b           B 2 12b              B 2 12b
C 3 13a           (empty)              D 3 14d
D 3 14d           D 3 14d              E 4 14e
E 4 14e           E 4 14e
Example: record 3 deleted (Figures A and B)
Delete record 3, then move the last record (record 11) into the location of record 3 to reuse the free space.
In Figure B, the ordering of the search-key attribute is not preserved.
Fixed Length Record
• Store the address of the first deleted record in the file header.
• Use this first record to store the address of the second deleted record, and so on.
• Can think of these stored addresses as pointers since they “point” to the location of a record.
• More space-efficient representation: reuse space for normal attributes of free records to store pointers. (No pointers stored in in-use records.)
• The deleted records thus form a linked list, which is referred to as a free list.
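
A minimal in-memory sketch of the free list described above. The class and attribute names (`FixedLengthFile`, `FreeSlot`, `free_head`) are hypothetical, and a Python list stands in for the file. The file header holds the first free slot, and each free slot holds the address of the next one.

```python
class FreeSlot:
    """Marker stored in a deleted slot; next_free 'points' to the next free slot."""

    def __init__(self, next_free):
        self.next_free = next_free


class FixedLengthFile:
    def __init__(self):
        self.slots = []        # slot i holds a record or a FreeSlot marker
        self.free_head = None  # file header: address of the first free slot

    def insert(self, record):
        if self.free_head is None:
            # No deleted slot to reuse: append at the end of the file.
            self.slots.append(record)
            return len(self.slots) - 1
        # Reuse the first free slot and advance the header pointer.
        slot = self.free_head
        self.free_head = self.slots[slot].next_free
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        # The freed slot becomes the new head of the free list.
        self.slots[slot] = FreeSlot(self.free_head)
        self.free_head = slot


f = FixedLengthFile()
for name in ["A", "B", "C", "D", "E"]:
    f.insert(name)
f.delete(2)
f.delete(0)
reused = [f.insert("F"), f.insert("G"), f.insert("H")]
```

Here the two freed slots are reused in most-recently-freed order before the file grows. The sketch does not guard against deleting the same slot twice.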
Variable-Length Records
Variable-length records arise in database systems in several ways:
◦ Storage of multiple record types in a file.
◦ Record types that allow variable lengths for one or more fields such as strings (varchar)
◦ Record types that allow repeating fields (used in some older data models).

Attributes are stored in order

Variable length attributes represented by fixed size (offset, length), with actual data stored after
all fixed length attributes
Null values represented by null-value bitmap
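
One possible byte layout for such a record, sketched with Python's `struct` module. The field set (ID, name, dept_name as variable-length strings, salary as a fixed-length double), the 2-byte offset and length fields, and the position of the one-byte null bitmap after the fixed-length attribute are all assumptions made for illustration.

```python
import struct

NUM_VAR = 3                         # ID, name, dept_name (variable-length)
HEADER_SIZE = NUM_VAR * 4 + 8 + 1   # (offset, length) pairs + salary + null bitmap


def encode_instructor(id_, name, dept_name, salary):
    """Pack one record; None marks a null attribute."""
    null_bitmap = 0
    pairs, data = [], b""
    for bit, value in enumerate([id_, name, dept_name]):
        if value is None:
            null_bitmap |= 1 << bit
            pairs.append((0, 0))
            continue
        raw = value.encode("utf-8")
        # Offset is measured from the start of the record.
        pairs.append((HEADER_SIZE + len(data), len(raw)))
        data += raw
    if salary is None:
        null_bitmap |= 1 << NUM_VAR
        salary = 0.0
    header = b"".join(struct.pack("<HH", off, ln) for off, ln in pairs)
    return header + struct.pack("<dB", salary, null_bitmap) + data


def decode_instructor(buf):
    """Inverse of encode_instructor."""
    (salary,) = struct.unpack_from("<d", buf, NUM_VAR * 4)
    null_bitmap = buf[HEADER_SIZE - 1]
    values = []
    for bit in range(NUM_VAR):
        off, ln = struct.unpack_from("<HH", buf, 4 * bit)
        is_null = (null_bitmap >> bit) & 1
        values.append(None if is_null else buf[off:off + ln].decode("utf-8"))
    if (null_bitmap >> NUM_VAR) & 1:
        salary = None
    return (*values, salary)


rec = encode_instructor("10101", "Srinivasan", "Comp. Sci.", 65000.0)
roundtrip = decode_instructor(rec)
```

Because every (offset, length) pair has a fixed size, any attribute can be located without scanning the attributes before it.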
Variable-Length Records: Slotted Page Structure

Slotted page header contains:

◦ number of record entries
◦ end of free space in the block
◦ location and size of each record

Records can be moved around within a page to keep them contiguous with no empty space between
them; entry in the header must be updated.
Pointers should not point directly to record — instead they should point to the entry for the record in
header.
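
A small sketch of a slotted page, assuming a hypothetical `SlottedPage` class in which the header is kept as Python attributes rather than packed into the page bytes, and free-space checks are omitted. It illustrates why outside pointers should refer to the header entry (the slot number): records move during compaction, but slot numbers do not change.

```python
class SlottedPage:
    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.free_end = size  # end of free space; records are packed at the page end
        self.slots = []       # header entries: [offset, length], or None if deleted

    def insert(self, record: bytes) -> int:
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append([self.free_end, len(record)])
        return len(self.slots) - 1  # slot number is the stable record identifier

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.slots[slot] = None
        # Copy out the surviving records, then repack them contiguously
        # against the end of the page and update their header entries.
        live = [(s, self.read(s)) for s, e in enumerate(self.slots) if e is not None]
        self.free_end = len(self.data)
        for s, rec in live:
            self.free_end -= len(rec)
            self.data[self.free_end:self.free_end + len(rec)] = rec
            self.slots[s] = [self.free_end, len(rec)]


page = SlottedPage(64)
a = page.insert(b"alpha")
b = page.insert(b"bravo!")
c = page.insert(b"charlie")
offset_before = page.slots[c][0]
page.delete(b)
offset_after = page.slots[c][0]
```

After the deletion, record `c` sits at a different byte offset, yet `page.read(c)` still returns the same bytes because callers hold only the slot number.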
Organization of Records in Files
Heap – a record can be placed anywhere in the file where there is space
Sequential – store records in sequential order, based on the value of the search key of each
record
Hashing – a hash function computed on some attribute of each record; the result specifies in
which block of the file the record should be placed
Multitable clustering file organization - records of several different relations can be stored in
the same file
◦ Motivation: store related records on the same block to minimize I/O
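
For the hashing organization in the list above, a minimal sketch of how a hash function can map each record to a block. The helper names and the choice of CRC-32 as the hash function are assumptions; any deterministic function of the chosen attribute would serve.

```python
import zlib


def block_for(key: str, num_blocks: int) -> int:
    """Block number for a record, computed from its hashing attribute."""
    return zlib.crc32(key.encode("utf-8")) % num_blocks


def build_hash_file(records, key_field, num_blocks):
    """Place each record in the block chosen by the hash function."""
    blocks = [[] for _ in range(num_blocks)]
    for rec in records:
        blocks[block_for(rec[key_field], num_blocks)].append(rec)
    return blocks


instructors = [
    {"ID": "10101", "name": "Srinivasan"},
    {"ID": "12121", "name": "Wu"},
    {"ID": "15151", "name": "Mozart"},
]
hash_file = build_hash_file(instructors, "ID", num_blocks=4)

# A lookup recomputes the hash and reads only that block.
target = hash_file[block_for("12121", 4)]
found = [r for r in target if r["ID"] == "12121"]
```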
Heap File Organization

• Records can be placed anywhere in the file where there is free space
• Records usually do not move once allocated
• To find a data item, the file must be scanned from the beginning until the item is found.
Sequential File Organization

• In sequential file organization, records are stored in ascending or descending order of the search key (key field).
• Search key: any attribute or set of attributes used to order the records; it need not be the primary key or a superkey.
• For fast retrieval, records are chained together with pointers; the pointer in each record points to the next record in search-key order.
• To minimize the number of block accesses, records are stored physically in search-key order, or as close to it as possible.
• Inserting and deleting records while keeping this physical order is difficult.
• Pointer chains are therefore used to manage insertions and deletions.
Sequential File Organization

• Deletion – use pointer chains

• Insertion – locate the position where the record belongs and check the free list for space near that position; if space is available, store the record there.
• If there is no free space, store the record in an overflow block. In either case, update the pointer of the preceding record so that the chain still follows search-key order.

Search key is the first attribute.

Assume we want to add the record (C, 8, 12e); it belongs between B and D.

Main file:   A 1 12a → B 2 12b → D 3 14d → E 4 14e
Overflow:    C 8 12e  (chained between B and D)
Sequential File Organization (Cont.)
Deletion – use pointer chains
Insertion – locate the position where the record is to be inserted
◦ if there is free space, insert there
◦ if no free space, insert the record in an overflow block
◦ In either case, the pointer chain must be updated

Need to reorganize the file from time to time to restore sequential order
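
The insertion steps can be sketched as follows. The `SequentialFile` class is hypothetical; for simplicity it always treats the target position as full, so every inserted record goes to the overflow area (the end of the slot list), and only the pointer chain keeps search-key order.

```python
class SequentialFile:
    def __init__(self, sorted_records):
        self._load(sorted_records)

    def _load(self, sorted_records):
        # Each slot is [key, payload, next_slot]; physical order == key order.
        self.slots = [[k, v, i + 1] for i, (k, v) in enumerate(sorted_records)]
        if self.slots:
            self.slots[-1][2] = None
        self.head = 0 if self.slots else None

    def insert(self, key, payload):
        # Walk the chain to find the neighbours of the new record.
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] <= key:
            prev, cur = cur, self.slots[cur][2]
        # Assumption: no free slot nearby, so the record goes to overflow.
        self.slots.append([key, payload, cur])
        new = len(self.slots) - 1
        if prev is None:
            self.head = new
        else:
            self.slots[prev][2] = new

    def scan(self):
        """Yield records in search-key order by following the chain."""
        cur = self.head
        while cur is not None:
            key, payload, cur = self.slots[cur]
            yield key, payload

    def reorganize(self):
        """Rewrite the file so physical order matches key order again."""
        self._load(list(self.scan()))


sf = SequentialFile([("A", "1 12a"), ("B", "2 12b"), ("D", "3 14d"), ("E", "4 14e")])
sf.insert("C", "8 12e")
logical = [k for k, _ in sf.scan()]
physical_before = [s[0] for s in sf.slots]
sf.reorganize()
physical_after = [s[0] for s in sf.slots]
```

After the insertion the chain reads A, B, C, D, E while C is physically last. The gap between logical and physical order is what makes periodic reorganization necessary.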
Multitable Clustering File Organization
Store several relations in one file using a multitable clustering file organization.
[Figure: the department and instructor relations, and the multitable clustering of department and instructor]
Multitable Clustering File Organization (cont.)
good for queries involving the join of department and instructor (department ⋈ instructor), and for queries
involving one single department and its instructors
bad for queries involving only department
results in variable size records
Can add pointer chains to link records of a particular relation
Chapter 11
Indexing and Hashing
BOOK REFERRED: KORTH, 6TH EDITION
Basic Concepts
Indexing mechanisms are used to speed up access to desired data.
◦ E.g., author catalog in library

Search key: an attribute or set of attributes used to look up records in a file.

An index file consists of records (called index entries) of the form (search-key, pointer).

Index files are typically much smaller than the original file
Two basic kinds of indices:
◦ Ordered indices: search keys are stored in sorted order
◦ Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
Index Evaluation Metrics
Access types supported efficiently. E.g.,
◦ records with a specified value in the attribute
◦ or records with an attribute value falling in a specified range of values.

Access time
Insertion time
Deletion time
Space overhead
Ordered Indices
In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog
in a library.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential
order of the file.
◦ Also called clustering index
◦ The search key of a primary index is usually but not necessarily the primary key.

Secondary index: an index whose search key specifies an order different from the sequential
order of the file. Also called the non-clustering index.
Files, with a clustering index on the search key, are called index-sequential files.
Dense Index Files
Dense index — Index record appears for every search-key value in the file.
E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
Dense index on dept_name, with instructor file sorted on dept_name
Sparse Index Files
Sparse index: contains index records for only some search-key values.
◦ Applicable when records are sequentially ordered on the search key

To locate a record with search-key value K we:
◦ Find the index record with the largest search-key value ≤ K
◦ Search the file sequentially starting at the record to which the index record points
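
A sketch of the sparse-index lookup, using Python's `bisect` module. The function name, the sample instructor records, and the choice of one index entry per three records are assumptions for illustration. The comparison is "less than or equal to", so a key that itself appears in the index is still found.

```python
from bisect import bisect_right


def sparse_lookup(index, file_records, key):
    """index: sorted (search_key, position) pairs; file_records: sorted by key."""
    keys = [k for k, _ in index]
    i = bisect_right(keys, key) - 1  # largest index entry with search key <= key
    if i < 0:
        return None                  # key is smaller than every indexed key
    pos = index[i][1]
    # Scan the file sequentially from the record the index entry points to.
    while pos < len(file_records) and file_records[pos][0] <= key:
        if file_records[pos][0] == key:
            return file_records[pos]
        pos += 1
    return None


file_records = [
    (10101, "Srinivasan"), (12121, "Wu"), (15151, "Mozart"),
    (22222, "Einstein"), (32343, "El Said"), (33456, "Gold"),
]
sparse_index = [(10101, 0), (22222, 3)]  # one entry per three records

hit = sparse_lookup(sparse_index, file_records, 15151)
indexed_hit = sparse_lookup(sparse_index, file_records, 22222)
miss = sparse_lookup(sparse_index, file_records, 20000)
```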
Sparse Index Files (Cont.)
Compared to dense indices:
◦ Less space and less maintenance overhead for insertions and deletions.
◦ Generally slower than dense index for locating records.

Good tradeoff: sparse index with an index entry for every block in file, corresponding to least
search-key value in the block.
Summary of Index Types
• Category 1: Based on index records
• Dense Index : Index record for every search key
• Sparse Index : Index record for some search keys only

• Category 2: Based on main file and search key

• Primary : Main File is sorted and search key is key of DB
• Clustered : Main File is sorted and search key is not key of DB
• Secondary : Main File is not sorted and search key may or may not be key of DB
We can summarize the types of indexing as:

             Key                Non-key
Ordered      Primary index      Cluster index
Unordered    Secondary index    Secondary index
Secondary Indices Example

Secondary index on salary field of instructor

Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
Secondary indices have to be dense
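
A sketch of a secondary index with buckets, assuming a hypothetical `build_secondary_index` helper and made-up salary values. The file is ordered on ID, so records with the same salary are scattered, and each index entry maps a salary to a bucket of record positions.

```python
from collections import defaultdict


def build_secondary_index(records, field):
    """Map each value of `field` to a bucket of record positions."""
    buckets = defaultdict(list)
    for pos, rec in enumerate(records):
        buckets[rec[field]].append(pos)
    # Keep index entries in sorted search-key order (dense: one per value).
    return dict(sorted(buckets.items()))


instructor_file = [  # stored in ID order; salaries are illustrative
    {"ID": "10101", "salary": 65000},
    {"ID": "12121", "salary": 90000},
    {"ID": "15151", "salary": 40000},
    {"ID": "22222", "salary": 90000},
]
salary_index = build_secondary_index(instructor_file, "salary")
ids_at_90000 = [instructor_file[p]["ID"] for p in salary_index[90000]]
```

Because the file is not sorted on salary, a sparse index could not locate the records it skips, which is why every search-key value needs its own entry.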
Primary and Secondary Indices
Indices offer substantial benefits when searching for records.
BUT: Updating indices imposes overhead on database modification: when a file is modified,
every index on the file must be updated.
Sequential scan using primary index is efficient, but a sequential scan using a secondary index is
expensive
◦ Each record access may fetch a new block from disk
◦ Block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for memory
access
B+-Tree Node Structure
Typical node

◦ Ki are the search-key values

◦ Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records
(for leaf nodes).
The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
(Initially assume no duplicate keys, address duplicates later)
Example of B+-Tree
Search in B+-Tree: Lookup
Search in B+-Tree: Range Query
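
A sketch of lookup and range query over a small hand-built B+-tree, assuming no duplicate keys. The `Node` class, the sample keys, and the `next` attribute linking each leaf to its right sibling are assumptions. The code covers search only, not insertion or deletion. In a non-leaf node, a key equal to Ki is routed to the child on its right.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys
        self.children = children  # non-leaf: child pointers (len(keys) + 1)
        self.values = values      # leaf: record pointers, one per key
        self.next = None          # leaf: pointer to the next leaf


def find_leaf(root, key):
    node = root
    while node.children is not None:
        # Keys equal to K_i are found in the subtree to the right of K_i.
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_query(root, lo, hi):
    """Return record pointers for all keys in [lo, hi], in key order."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next  # follow the leaf chain to the right
    return out


l1 = Node([5, 10], values=["r5", "r10"])
l2 = Node([20, 30], values=["r20", "r30"])
l3 = Node([40, 50], values=["r40", "r50"])
l1.next, l2.next = l2, l3
root = Node([20, 40], children=[l1, l2, l3])

found = lookup(root, 30)
in_range = range_query(root, 10, 40)
```

The lookup touches one node per level, and the range query then needs only the leaf chain, with no return to the upper levels.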
B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
Disadvantage of indexed-sequential files
◦ performance degrades as file grows, since many overflow blocks get created.
◦ Periodic reorganization of entire file is required.

Advantage of B+-tree index files:

◦ automatically reorganizes itself with small, local changes in the face of insertions and deletions.
◦ Reorganization of entire file is not required to maintain performance.

(Minor) disadvantage of B+-trees:

◦ extra insertion and deletion overhead, space overhead.

Advantages of B+-trees outweigh disadvantages

◦ B+-trees are used extensively
Thank You!!
