
SelfStudy - Chapter 10, 11 - File Structure, Indexing and Hashing

Chapter 10 discusses storage and file structure in databases, detailing file organization, fixed-length and variable-length records, and various file organization methods such as heap, sequential, and multitable clustering. It also covers indexing and hashing, explaining ordered and hash indices, their evaluation metrics, and the advantages of B+-tree index files over traditional indexed-sequential files. The chapter emphasizes the importance of efficient data access and management in database systems.

Uploaded by

kcbhoraniya008
Copyright
© © All Rights Reserved

Chapter 10

Storage and File Structure
BOOK REFERRED: KORTH 6TH EDITION
File Organization
The database is stored as a collection of files. Each file is a collection of blocks. A block is a
collection of records. A record is a sequence of fields.
Database files are stored in secondary memory (disk).
How to access the data from the disk?
◦ Data access from disk to main memory is in the form of blocks.
Fixed-Length Records
Simple approach:
◦ Store record i starting from byte n × (i – 1), where n is the size of each record.
◦ Record access is simple, but records may cross block boundaries; to avoid this, do not allow a record to span two blocks.

Deletion of record i:
alternatives:
◦ move records i + 1, . . ., n to i, . . . , n – 1
◦ move record n to i
◦ do not move records, but link all free records on a free list
Fixed Length Record

• The first approach is to shift records up: when a record is deleted, every record after it moves up one position so that the freed slot is filled.

• This leaves the free space at the end of the file, where the space of the deleted record is released. Moving that many records is costly.
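Figure: deleting record C and shifting the later records up (left: original file; middle: after the deletion; right: after shifting)

A 1 12a      A 1 12a      A 1 12a
B 2 12b      B 2 12b      B 2 12b
C 3 13a      (free)       D 3 14d
D 3 14d      D 3 14d      E 4 14e
E 4 14e      E 4 14e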
Example: Record 3 deleted
(Figures A and B)
The second approach deletes record 3 and then moves the last record, record 11, into the location of record 3 in order to reuse the free space.
In Figure B, the ordering of the search-key attribute is not preserved.
Fixed Length Record
• Store the address of the first deleted record in the file header.
• Use this first record to store the address of the second deleted record, and so on.
• These stored addresses can be thought of as pointers, since they “point” to the location of a record.
• More space-efficient representation: reuse the space for normal attributes of free records to store the pointers. (No pointers are stored in in-use records.)
• The deleted records thus form a linked list, which is referred to as a free list.
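The free-list idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name FixedLengthFile and its fields are hypothetical, slots are list entries rather than byte ranges on disk, and a newly freed slot is pushed at the head of the list (other orderings are possible).

```python
class FixedLengthFile:
    """Toy fixed-length record file that reuses deleted slots via a free list."""

    def __init__(self):
        self.slots = []        # a slot holds a record, or the next free slot index
        self.in_use = []       # True while the slot holds a live record
        self.free_head = None  # "file header": index of the first free slot

    def insert(self, record):
        if self.free_head is not None:
            slot = self.free_head
            self.free_head = self.slots[slot]  # follow the pointer stored in the free slot
            self.slots[slot] = record
            self.in_use[slot] = True
        else:
            slot = len(self.slots)             # no free slot: append at the end
            self.slots.append(record)
            self.in_use.append(True)
        return slot

    def delete(self, slot):
        if not self.in_use[slot]:
            raise ValueError("slot is already free")
        self.slots[slot] = self.free_head      # reuse the record space for the pointer
        self.in_use[slot] = False
        self.free_head = slot


f = FixedLengthFile()
for rec in ["A", "B", "C", "D"]:
    f.insert(rec)
f.delete(1)
f.delete(2)
reused = f.insert("E")   # takes the most recently freed slot
```

No record is moved on deletion; the only cost is updating the header and one pointer.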
Variable-Length Records
Variable-length records arise in database systems in several ways:
◦ Storage of multiple record types in a file.
◦ Record types that allow variable lengths for one or more fields such as strings (varchar)
◦ Record types that allow repeating fields (used in some older data models).

Attributes are stored in order.
Variable-length attributes are represented by a fixed-size (offset, length) pair, with the actual data stored after all fixed-length attributes.
Null values are represented by a null-value bitmap.
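A possible byte layout for such a record is sketched below with Python's struct module. The field choice (name, dept, salary), the 2-byte offsets and the position of the bitmap are assumptions made for the example, not the layout of any particular system.

```python
import struct

# Fixed part: (offset, length) for name, (offset, length) for dept,
# a 4-byte salary, and a 1-byte null bitmap. "<" means no padding.
HEADER = struct.Struct("<HHHHiB")


def encode(name, dept, salary):
    """Pack one record; None marks a null attribute."""
    bitmap = 0
    data = b""
    pairs = []
    for bit, value in enumerate((name, dept)):
        if value is None:
            bitmap |= 1 << bit
            pairs += [0, 0]
        else:
            raw = value.encode("utf-8")
            pairs += [HEADER.size + len(data), len(raw)]  # data follows the fixed part
            data += raw
    if salary is None:
        bitmap |= 1 << 2
        salary = 0
    return HEADER.pack(*pairs, salary, bitmap) + data


def decode(record):
    """Unpack a record produced by encode()."""
    n_off, n_len, d_off, d_len, salary, bitmap = HEADER.unpack_from(record)

    def text(bit, off, length):
        if bitmap & (1 << bit):
            return None
        return record[off:off + length].decode("utf-8")

    return (text(0, n_off, n_len), text(1, d_off, d_len),
            None if bitmap & 4 else salary)


rec = encode("Srinivasan", None, 65000)
```

The fixed part always has the same size, so any attribute can be located without scanning the variable-length data.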
Variable-Length Records: Slotted Page Structure

Slotted page header contains:
◦ number of record entries
◦ end of free space in the block
◦ location and size of each record

Records can be moved around within a page to keep them contiguous, with no empty space between them; the entry in the header must then be updated.
Pointers should not point directly to a record; instead they should point to the entry for the record in the header.
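A minimal sketch of a slotted page follows. It is a simplification: the class SlottedPage is hypothetical, the header is kept in a Python list instead of inside the block, and records grow from the end of the block toward the header.

```python
class SlottedPage:
    """Toy slotted page: header entries point to records packed at the block end."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []        # header entries: [offset, length], or None once deleted
        self.free_end = size   # end of free space in the block

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:
            raise ValueError("not enough free space")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append([self.free_end, len(record)])
        return len(self.slots) - 1   # callers keep the slot number, not the offset

    def read(self, slot: int) -> bytes:
        off, length = self.slots[slot]
        return bytes(self.data[off:off + length])

    def delete(self, slot: int) -> None:
        off, length = self.slots[slot]
        # Slide the records stored below the deleted one to close the gap.
        self.data[self.free_end + length:off + length] = self.data[self.free_end:off]
        for entry in self.slots:
            if entry is not None and entry[0] < off:
                entry[0] += length   # only header entries change
        self.free_end += length
        self.slots[slot] = None


p = SlottedPage(32)
a = p.insert(b"aaaa")
b = p.insert(b"bb")
c = p.insert(b"cccccc")
p.delete(b)
```

Because outside pointers hold slot numbers, record c can still be read through its header entry after it has been moved.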
Organization of Records in Files
Heap – a record can be placed anywhere in the file where there is space
Sequential – store records in sequential order, based on the value of the search key of each
record
Hashing – a hash function computed on some attribute of each record; the result specifies in
which block of the file the record should be placed
Multitable clustering file organization - records of several different relations can be stored in
the same file
◦ Motivation: store related records on the same block to minimize I/O
Heap File Organization

• Records can be placed anywhere in the file where there is free space.
• Records usually do not move once allocated.
• To access a record, we must search the file from the beginning until the record is found.
Sequential File Organization

• In sequential file organization, records are stored in ascending or descending order of the search key (key field).
• Search key: any attribute or set of attributes on which the records are sorted; it need not be the primary key or a superkey, so its values need not be distinct.
• For fast retrieval, records are chained together with pointers: the pointer in each record points to the next record in search-key order.
• To minimize the number of block accesses, records are stored physically in search-key order.
• In a sequential file, it is difficult to keep this physical order as records are inserted and deleted.
• Pointer chains are therefore used to manage insertions and deletions.
Sequential File Organization

• Deletion – use pointer chains

• Insert: locate the position in search-key order where the record belongs and check whether there is free space there (for example, a slot on the free list in the same block); if so, store the record there.
• If there is no free space, the record is stored in an overflow block, and the pointer chain is updated so that the preceding record points to the new record.
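Search key is the first attribute.

Assume we want to add the record (C, 8, 12e); in search-key order it belongs between B and D.

Before insertion      After insertion
A 1 12a               A 1 12a
B 2 12b               B 2 12b
D 3 14d               D 3 14d
E 4 14e               E 4 14e
                      C 8 12e   (stored after E; the pointer chain links B to C and C to D)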
Sequential File Organization (Cont.)
Deletion – use pointer chains
Insertion – locate the position where the record is to be inserted
◦ if there is free space, insert there
◦ if there is no free space, insert the record in an overflow block
◦ in either case, the pointer chain must be updated

Need to reorganize the file from time to time to restore sequential order
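The pointer-chain handling can be sketched as follows. The sketch assumes the main area is already full, so every insertion goes to the overflow area at the end; the class SequentialFile and its record layout [key, payload, next] are hypothetical.

```python
class SequentialFile:
    """Toy sequential file: a sorted main area plus overflow records, chained by key."""

    def __init__(self, sorted_records):
        # Each stored record is [key, payload, index of next record in key order].
        self.records = [[k, v, i + 1] for i, (k, v) in enumerate(sorted_records)]
        if self.records:
            self.records[-1][2] = None
        self.head = 0 if self.records else None

    def insert(self, key, payload):
        prev, cur = None, self.head
        while cur is not None and self.records[cur][0] <= key:
            prev, cur = cur, self.records[cur][2]
        self.records.append([key, payload, cur])  # physically placed in overflow
        new = len(self.records) - 1
        if prev is None:
            self.head = new
        else:
            self.records[prev][2] = new           # predecessor now points to the new record

    def delete(self, key):
        prev, cur = None, self.head
        while cur is not None and self.records[cur][0] != key:
            prev, cur = cur, self.records[cur][2]
        if cur is None:
            return False
        nxt = self.records[cur][2]
        if prev is None:
            self.head = nxt
        else:
            self.records[prev][2] = nxt           # unlink; the slot itself is left in place
        return True

    def scan(self):
        """Yield records in search-key order by following the chain."""
        cur = self.head
        while cur is not None:
            key, payload, cur = self.records[cur]
            yield key, payload


f = SequentialFile([("A", (1, "12a")), ("B", (2, "12b")),
                    ("D", (3, "14d")), ("E", (4, "14e"))])
f.insert("C", (8, "12e"))
keys_in_order = [k for k, _ in f.scan()]
```

The chain returns C between B and D even though C is stored last, which is why the file needs periodic reorganization to bring the physical order back in line with the logical order.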
Multitable Clustering File Organization
Store several relations in one file using a multitable clustering file organization.
(Figure: the department and instructor relations, and the multitable clustering of department and instructor)
Multitable Clustering File Organization (cont.)
good for queries involving department ⋈ instructor, and for queries involving one single department and its instructors
bad for queries involving only department
results in variable-size records
Can add pointer chains to link records of a particular relation
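The trade-off can be illustrated with a small in-memory sketch. The sample rows and the function names are made up for the example; a real system clusters records inside disk blocks rather than in a Python list.

```python
def cluster(departments, instructors):
    """Store each department record followed by the instructors of that department."""
    file = []
    for dept in departments:
        file.append(("department", dept))
        for inst in instructors:
            if inst[2] == dept[0]:
                file.append(("instructor", inst))
    return file


def department_with_instructors(file, dept_name):
    """Good case: the department and its instructors sit next to each other."""
    out, inside = [], False
    for kind, rec in file:
        if kind == "department":
            if inside:
                break                  # reached the next department: stop reading
            inside = rec[0] == dept_name
        if inside:
            out.append(rec)
    return out


def departments_only(file):
    """Bad case: instructor records must be skipped to reach each department."""
    return [rec for kind, rec in file if kind == "department"]


# Illustrative rows: (dept_name, building, budget) and (ID, name, dept_name, salary)
departments = [("Comp. Sci.", "Taylor", 100000), ("Physics", "Watson", 70000)]
instructors = [(10101, "Srinivasan", "Comp. Sci.", 65000),
               (33456, "Gold", "Physics", 87000),
               (45565, "Katz", "Comp. Sci.", 75000)]
clustered = cluster(departments, instructors)
```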
Chapter 11
Indexing and Hashing
BOOK REFERRED: KORTH 6TH EDITION
Basic Concepts
Indexing mechanisms are used to speed up access to desired data.
◦ E.g., author catalog in library

Search key - attribute or set of attributes used to look up records in a file.

An index file consists of records (called index entries) of the form
(search-key, pointer)

Index files are typically much smaller than the original file
Two basic kinds of indices:
◦ Ordered indices: search keys are stored in sorted order
◦ Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
Index Evaluation Metrics
Access types supported efficiently. E.g.,
◦ records with a specified value in the attribute
◦ or records with an attribute value falling in a specified range of values.

Access time
Insertion time
Deletion time
Space overhead
Ordered Indices
In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog
in a library.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential
order of the file.
◦ Also called clustering index
◦ The search key of a primary index is usually but not necessarily the primary key.

Secondary index: an index whose search key specifies an order different from the sequential
order of the file. Also called the non-clustering index.
Files, with a clustering index on the search key, are called index-sequential files.
Dense Index Files
Dense index — Index record appears for every search-key value in the file.
E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
Dense index on dept_name, with instructor file sorted on dept_name
Sparse Index Files
Sparse index: contains index records for only some search-key values.
◦ Applicable when records are sequentially ordered on the search key

To locate a record with search-key value K we:
◦ Find the index record with the largest search-key value ≤ K
◦ Search the file sequentially starting at the record to which the index record points
Sparse Index Files (Cont.)
Compared to dense indices:
◦ Less space and less maintenance overhead for insertions and deletions.
◦ Generally slower than dense index for locating records.

Good tradeoff: sparse index with an index entry for every block in file, corresponding to least
search-key value in the block.
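A sparse index with one entry per block can be sketched as below. The block size of 3 records and the sample ID values are assumptions for the example; the lookup picks the index entry with the largest key that does not exceed K and then scans forward in the file.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block (hypothetical)


def build_sparse_index(sorted_keys):
    """One entry per block: (least search-key value in the block, position of the block)."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), BLOCK_SIZE)]


def lookup(sorted_keys, index, k):
    """Return the file position of key k, or None if it is not in the file."""
    index_keys = [key for key, _ in index]
    i = bisect_right(index_keys, k) - 1   # entry with the largest key not exceeding k
    if i < 0:
        return None                        # k is smaller than every key in the file
    pos = index[i][1]
    while pos < len(sorted_keys) and sorted_keys[pos] <= k:
        if sorted_keys[pos] == k:
            return pos
        pos += 1                           # sequential scan from the indexed record
    return None


ids = [10101, 12121, 15151, 22222, 32343, 33456, 45565, 58583]
sparse = build_sparse_index(ids)
```

The index has 3 entries for 8 records, at the price of a short sequential scan after the index probe.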
Summary of Index Types
• Category 1: Based on index records
◦ Dense Index : Index record for every search key
◦ Sparse Index : Index record for some search keys only

• Category 2: Based on main file and search key
◦ Primary : Main file is sorted and the search key is a key of the relation
◦ Clustered : Main file is sorted and the search key is not a key of the relation
◦ Secondary : Main file is not sorted and the search key may or may not be a key of the relation
We can summarize the types of indexing as

             Key               Non Key
Ordered      Primary Index     Cluster Index
Unordered    Secondary Index   Secondary Index
Secondary Indices Example

Secondary index on salary field of instructor

Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
Secondary indices have to be dense
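A dense secondary index with buckets can be sketched as a mapping from each search-key value to a list of record positions. The function name and the sample salary values are made up for the example.

```python
from collections import defaultdict


def build_secondary_index(records, field):
    """Map every value of `field` to a bucket of positions of matching records."""
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        buckets[record[field]].append(position)
    return dict(sorted(buckets.items()))   # index entries kept in search-key order


# File ordered on ID; the index is on salary, which is not the file order.
instructor = [
    {"ID": 10101, "salary": 65000},
    {"ID": 12121, "salary": 90000},
    {"ID": 15151, "salary": 40000},
    {"ID": 22222, "salary": 90000},
]
salary_index = build_secondary_index(instructor, "salary")
```

Every salary value needs its own entry because records with the same salary are scattered through the file, so a sparse index could not locate them.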
Primary and Secondary Indices
Indices offer substantial benefits when searching for records.
BUT: Updating indices imposes overhead on database modification: when a file is modified, every index on the file must be updated.
Sequential scan using a primary index is efficient, but a sequential scan using a secondary index is expensive
◦ Each record access may fetch a new block from disk
◦ Block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for memory access
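A rough cost estimate shows the size of the gap. The sketch takes the lower figure of 5 milliseconds per block fetch, assumes 100 records per block (a hypothetical value), and uses the worst case in which every record reached through the secondary index lies in a different block.

```python
BLOCK_FETCH_S = 5e-3       # about 5 ms per block fetch (lower figure from the slide)
MEMORY_ACCESS_S = 100e-9   # about 100 ns per memory access

# A block fetch costs roughly this many memory accesses.
ratio = BLOCK_FETCH_S / MEMORY_ACCESS_S


def scan_cost_seconds(n_records, records_per_block, clustered):
    """Estimated I/O time for scanning n_records in index order."""
    if clustered:
        fetches = -(-n_records // records_per_block)  # ceiling division: one fetch per block
    else:
        fetches = n_records                            # worst case: one fetch per record
    return fetches * BLOCK_FETCH_S


primary_scan = scan_cost_seconds(10_000, 100, clustered=True)      # about 0.5 s
secondary_scan = scan_cost_seconds(10_000, 100, clustered=False)   # about 50 s
```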
B+-Tree Node Structure
Typical node
P1 | K1 | P2 | . . . | Pn–1 | Kn–1 | Pn
◦ Ki are the search-key values
◦ Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
The search keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
(Initially assume no duplicate keys; duplicates are addressed later)
Example of B+-Tree
Search in B+ Tree : Lookup
Search in B+ Tree : Range Query
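Lookup and range query on a B+-tree can be sketched as follows. The Node class, the hand-built tree and the "r…" record pointers are hypothetical, insertion and deletion are left out, and keys are assumed to be unique.

```python
from bisect import bisect_left, bisect_right


class Node:
    """B+-tree node: non-leaf nodes hold children, leaf nodes hold record pointers."""

    def __init__(self, keys, children=None, values=None, next_leaf=None):
        self.keys = keys
        self.children = children    # child nodes (non-leaf only)
        self.values = values        # record pointers (leaf only)
        self.next_leaf = next_leaf  # pointer to the next leaf in key order

    @property
    def is_leaf(self):
        return self.children is None


def find_leaf(root, k):
    """Walk from the root to the leaf that would contain key k."""
    node = root
    while not node.is_leaf:
        # Child i covers keys >= keys[i-1] and < keys[i].
        node = node.children[bisect_right(node.keys, k)]
    return node


def lookup(root, k):
    leaf = find_leaf(root, k)
    i = bisect_left(leaf.keys, k)
    if i < len(leaf.keys) and leaf.keys[i] == k:
        return leaf.values[i]
    return None


def range_query(root, lo, hi):
    """Return pointers for all keys in [lo, hi] by following the leaf chain."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > hi:
                return out
            if key >= lo:
                out.append(value)
        leaf = leaf.next_leaf
    return out


leaf3 = Node([40, 50], values=["r40", "r50"])
leaf2 = Node([20, 30], values=["r20", "r30"], next_leaf=leaf3)
leaf1 = Node([5, 10], values=["r5", "r10"], next_leaf=leaf2)
root = Node([20, 40], children=[leaf1, leaf2, leaf3])
```

A lookup touches one node per level, while a range query descends once and then moves sideways through the linked leaves.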
B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
Disadvantage of indexed-sequential files
◦ performance degrades as file grows, since many overflow blocks get created.
◦ Periodic reorganization of entire file is required.

Advantage of B+-tree index files:
◦ automatically reorganizes itself with small, local changes in the face of insertions and deletions.
◦ Reorganization of entire file is not required to maintain performance.

(Minor) disadvantage of B+-trees:
◦ extra insertion and deletion overhead, space overhead.

Advantages of B+-trees outweigh disadvantages
◦ B+-trees are used extensively
Thank You!!
