SelfStudy - Chapter 10, 11 - File Structure, Indexing and Hashing
Fixed Length Record
Deletion of record i. Alternatives:
◦ move records i + 1, ..., n to i, ..., n - 1
◦ move record n to i
◦ do not move records, but link all free records on a free list
• The first approach is to shift records up: when record i is deleted, every record after it is moved up one slot, so the vacated slot is filled.
• This leaves a free slot at the end of the file, released by the deleted record.
• Moving every subsequent record requires many block reads and writes, so this approach is costly.
[Figures A and B: Example, record 3 deleted]
The second approach (Figure B): delete record 3 and move the last record, record 11, into the location of record 3, so that the freed space is reused.
In Figure B, the records are no longer ordered on the search-key attribute.
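To contrast the first two alternatives, here is a minimal sketch that models a file as a Python list of fixed-length slots (0-based positions). The record layout `(ID, name)` and the function names are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical fixed-length record for illustration: (ID, name).
Record = Tuple[int, str]


def delete_by_shifting(slots: List[Record], i: int) -> List[Record]:
    """Alternative 1: move records i+1, ..., n up into slots i, ..., n-1.
    Search-key order is preserved, but every record after slot i is rewritten."""
    return slots[:i] + slots[i + 1:]


def delete_by_moving_last(slots: List[Record], i: int) -> List[Record]:
    """Alternative 2: move record n into slot i.
    Only one record is rewritten, but search-key order is not preserved."""
    result = list(slots)
    last = result.pop()      # take record n out of the last slot
    if i < len(result):      # unless slot i was the last slot itself,
        result[i] = last     # overwrite the deleted record with record n
    return result


file = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")]
print(delete_by_shifting(file, 1))     # [(1, 'A'), (3, 'C'), (4, 'D'), (5, 'E')]
print(delete_by_moving_last(file, 1))  # [(1, 'A'), (5, 'E'), (3, 'C'), (4, 'D')]
```

The shifting version touches every later slot, while the move-last version touches one slot but leaves `(5, 'E')` out of ID order, mirroring the trade-off described above.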
Fixed Length Record
• Store the address of the first deleted record in the file header.
• Use this first record to store the address of the second deleted record, and so on.
• We can think of these stored addresses as pointers, since they “point” to the location of a record.
• More space-efficient representation: reuse the space for normal attributes of free records to store pointers (no pointers are stored in in-use records).
• The deleted records thus form a linked list, which is referred to as a free list.
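The free-list scheme can be sketched as follows, assuming a hypothetical in-memory model where `free_head` plays the role of the file header and each deleted slot is overwritten by a `FreeSlot` holding only the next-free pointer. The class and method names are illustrative, not from the source.

```python
from typing import List, Optional, Union


class FreeSlot:
    """A deleted slot; its space is reused to hold the pointer to the next free slot."""

    def __init__(self, next_free: Optional[int]) -> None:
        self.next_free = next_free


class FixedLengthFile:
    """Hypothetical in-memory model of a file of fixed-length record slots."""

    def __init__(self) -> None:
        self.slots: List[Union[tuple, FreeSlot]] = []
        self.free_head: Optional[int] = None  # file header: address of first deleted record

    def delete(self, slot: int) -> None:
        # Push the slot onto the free list: it points to the previous head.
        self.slots[slot] = FreeSlot(self.free_head)
        self.free_head = slot

    def insert(self, record: tuple) -> int:
        if self.free_head is not None:
            # Reuse the first free slot and advance the header pointer.
            slot = self.free_head
            self.free_head = self.slots[slot].next_free
            self.slots[slot] = record
            return slot
        # No free slot: append at the end of the file.
        self.slots.append(record)
        return len(self.slots) - 1

    def free_list(self) -> List[int]:
        """Follow the chain from the header, for inspection."""
        chain, cur = [], self.free_head
        while cur is not None:
            chain.append(cur)
            cur = self.slots[cur].next_free
        return chain
```

After inserting four records and deleting slots 1 and then 3, `free_list()` returns `[3, 1]`, and the next two inserts reuse slots 3 and 1 before the file grows. In-use slots hold plain tuples, so no pointer space is spent on them.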
Variable-Length Records
Variable-length records arise in database systems in several ways:
◦ Storage of multiple record types in a file.
◦ Record types that allow variable lengths for one or more fields, such as strings (varchar).
◦ Record types that allow repeating fields (used in some older data models).
In the slotted-page structure, the header of each page holds an entry (location and size) for every record in the page.
Records can be moved around within a page to keep them contiguous, with no empty space between them; the record's entry in the header must then be updated.
Pointers should not point directly to a record; instead they should point to the entry for the record in the header.
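The page layout described above (a header entry per record, with records kept contiguous) is usually called a slotted page. Below is a minimal sketch under simplifying assumptions: the header is a Python list of hypothetical `(offset, length)` entries rather than bytes inside the page, and records are packed at the end of a `bytearray`. Outside pointers would store the slot number, which stays valid even when a record moves.

```python
from typing import List, Optional, Tuple


class SlottedPage:
    """Hypothetical slotted page: the header keeps one (offset, length) entry per slot,
    and record bytes are packed contiguously at the end of the page."""

    def __init__(self, size: int = 4096) -> None:
        self.data = bytearray(size)
        self.entries: List[Optional[Tuple[int, int]]] = []  # slot -> (offset, length)
        self.free_end = size  # records occupy data[free_end:]; free space is before it

    def insert(self, record: bytes) -> int:
        # Simplification: header entries live in a Python list, so their space is not counted.
        if len(record) > self.free_end:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.entries.append((self.free_end, len(record)))
        return len(self.entries) - 1  # outside pointers store this slot number

    def get(self, slot: int) -> bytes:
        entry = self.entries[slot]
        if entry is None:
            raise KeyError(slot)
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        entry = self.entries[slot]
        if entry is None:
            raise KeyError(slot)
        offset, length = entry
        # Slide the records stored before the deleted one toward the end of the page,
        # closing the gap so the records stay contiguous.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        self.free_end += length
        # Moved records keep their slot numbers; only their header entries change.
        for s, e in enumerate(self.entries):
            if e is not None and e[0] < offset:
                self.entries[s] = (e[0] + length, e[1])
        self.entries[slot] = None
```

For example, in a 64-byte page holding `b"alice"`, `b"bob"` and `b"carol"`, deleting `b"bob"` moves `b"carol"` 3 bytes toward the end; its header entry changes from `(51, 5)` to `(54, 5)`, but its slot number (2) does not.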
Organization of Records in Files
Heap – a record can be placed anywhere in the file where there is space.
Sequential – store records in sequential order, based on the value of the search key of each record.
Hashing – a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed.
Multitable clustering file organization – records of several different relations can be stored in the same file.
◦ Motivation: store related records on the same block to minimize I/O.
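The hashing organization can be illustrated with a toy sketch: a hypothetical hash function (sum of character codes modulo a hypothetical `NUM_BLOCKS`) chooses the block for each record, and a lookup reads only that one block. The record layout `(name, dept_name)` is assumed for illustration.

```python
from typing import Dict, List, Tuple

NUM_BLOCKS = 8  # hypothetical number of blocks in the file


def h(key: str) -> int:
    """Toy hash function: sum of character codes modulo NUM_BLOCKS.
    Python's built-in hash() on str is salted per process, so it is avoided here."""
    return sum(ord(c) for c in key) % NUM_BLOCKS


def build_hash_file(records: List[Tuple[str, str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Place each (name, dept_name) record in block h(dept_name)."""
    blocks: Dict[int, List[Tuple[str, str]]] = {b: [] for b in range(NUM_BLOCKS)}
    for rec in records:
        blocks[h(rec[1])].append(rec)
    return blocks


def lookup(blocks: Dict[int, List[Tuple[str, str]]], dept: str) -> List[Tuple[str, str]]:
    # Only block h(dept) is read; other records sharing that hash value are filtered out.
    return [rec for rec in blocks[h(dept)] if rec[1] == dept]
```

Different keys can hash to the same block, which is why `lookup` still compares the key after reading the block.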
Heap File Organization
• Records can be placed anywhere in the file where there is free space
• Records usually do not move once allocated
• To find a particular record, the file must be scanned from the beginning until the record is found (a linear search).
Sequential File Organization
• In sequential file organization, records are stored in ascending or descending order of the search key (key field).
• Search key: any attribute or set of attributes used to order the records; it need not be the primary key, or even a superkey, so its values need not be distinct.
• For fast retrieval in search-key order, the records are chained together by pointers: the pointer in each record points to the next record in search-key order.
• To minimize the number of block accesses, the records are also stored physically in search-key order, or as close to it as possible.
• Keeping records physically in order makes insertion and deletion difficult in a sequential file.
• The pointer chain addresses this: search-key order is maintained by updating pointers, so records need not be moved on every insertion or deletion.
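A minimal sketch of the pointer chain, assuming a hypothetical in-memory model in which every new record is simply appended to an overflow slot at the end of the file and then linked into the chain. No existing record moves; only one or two pointers change.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slot:
    key: str
    data: str
    next: Optional[int]  # physical position of the next record in search-key order


class SequentialFile:
    """Hypothetical model of a sequential file with a pointer chain."""

    def __init__(self, sorted_records: List[Tuple[str, str]]) -> None:
        # Initial load: physical order equals search-key order.
        n = len(sorted_records)
        self.slots = [Slot(k, d, i + 1 if i + 1 < n else None)
                      for i, (k, d) in enumerate(sorted_records)]
        self.head: Optional[int] = 0 if n else None

    def insert(self, key: str, data: str) -> None:
        # Simplification: the new record always goes to an overflow slot at the end,
        # and only pointers change; no existing record is moved.
        prev, cur = None, self.head
        while cur is not None and self.slots[cur].key <= key:
            prev, cur = cur, self.slots[cur].next
        new_pos = len(self.slots)
        self.slots.append(Slot(key, data, cur))
        if prev is None:
            self.head = new_pos
        else:
            self.slots[prev].next = new_pos

    def scan(self) -> List[str]:
        """Follow the pointer chain from the head: keys in search-key order."""
        keys, cur = [], self.head
        while cur is not None:
            keys.append(self.slots[cur].key)
            cur = self.slots[cur].next
        return keys
```

Inserting `"C"` into a file loaded with `A, B, D, E` leaves the physical order as `A, B, D, E, C`, while `scan()` still returns `A, B, C, D, E`. As overflow records accumulate, the physical order drifts from search-key order, which is why such files are periodically reorganized.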
Sequential File Organization
[Figure: sequential file organization example]
[Figure: multitable clustering of department and instructor]
Multitable Clustering File Organization (cont.)
good for queries involving the join department ⋈ instructor, and for queries involving one single department and its instructors
bad for queries involving only department
results in variable size records
Can add pointer chains to link records of a particular relation
Chapter 11
Indexing and Hashing
BOOK REFERRED: KORTH 6TH EDITION
Basic Concepts
Indexing mechanisms are used to speed up access to desired data.
◦ E.g., author catalog in a library
Index files are typically much smaller than the original file.
Two basic kinds of indices:
◦ Ordered indices: search keys are stored in sorted order
◦ Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
Index Evaluation Metrics
Access types supported efficiently. E.g.,
◦ records with a specified value in the attribute
◦ or records with an attribute value falling in a specified range of values.
Access time
Insertion time
Deletion time
Space overhead
Ordered Indices
In an ordered index, index entries are stored sorted on the search-key value. E.g., author catalog in a library.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
◦ Also called clustering index.
◦ The search key of a primary index is usually, but not necessarily, the primary key.
Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called nonclustering index.
Files with a clustering index on the search key are called index-sequential files.
Dense Index Files
Dense index: an index record appears for every search-key value in the file.
E.g., index on the ID attribute of the instructor relation.
Dense Index Files (Cont.)
Dense index on dept_name, with instructor file sorted on dept_name
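A dense index on `dept_name` over a file sorted on `dept_name` can be sketched as a sorted list of `(search-key value, position of first matching record)` entries; a lookup binary-searches the index and then reads forward through the consecutive matching records. The record layout and positions are hypothetical stand-ins for real block/record pointers.

```python
from bisect import bisect_left
from typing import List, Tuple

Record = Tuple[str, str]  # hypothetical (dept_name, instructor name)


def build_dense_index(file: List[Record]) -> List[Tuple[str, int]]:
    """`file` must be sorted on dept_name (field 0).
    One entry per distinct search-key value -> position of the first record with it."""
    index: List[Tuple[str, int]] = []
    for pos, rec in enumerate(file):
        if not index or index[-1][0] != rec[0]:
            index.append((rec[0], pos))
    return index


def dense_lookup(file: List[Record], index: List[Tuple[str, int]], key: str) -> List[Record]:
    keys = [k for k, _ in index]  # rebuilt per call for brevity
    i = bisect_left(keys, key)    # binary search on the sorted index entries
    if i == len(keys) or keys[i] != key:
        return []  # dense index: a value absent from the index is absent from the file
    pos, result = index[i][1], []
    # The file is sorted on the key, so the remaining matches follow consecutively.
    while pos < len(file) and file[pos][0] == key:
        result.append(file[pos])
        pos += 1
    return result
```

Because the index is dense, a miss in the index alone is enough to conclude the value is not in the file, with no file access at all.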
Sparse Index Files
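Sparse index: contains index records for only some search-key values.
◦ Applicable when records are sequentially ordered on the search key.
◦ To locate a record with search-key value K, find the index record with the largest search-key value ≤ K, then search the file sequentially starting at the record it points to.
Good tradeoff: a sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block.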
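The block-level sparse index can be sketched as follows, assuming for simplicity that search-key values are unique (e.g., an ID), so a match can only be in the one block selected by the index. Blocks are modeled as hypothetical Python lists of `(key, data)` records.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Block = List[Tuple[int, str]]  # hypothetical block of records sorted on a unique integer key


def build_sparse_index(blocks: List[Block]) -> List[Tuple[int, int]]:
    # One entry per block: (least search-key value in the block, block number).
    return [(blk[0][0], b) for b, blk in enumerate(blocks) if blk]


def sparse_lookup(blocks: List[Block], index: List[Tuple[int, int]], key: int) -> Optional[Tuple[int, str]]:
    keys = [k for k, _ in index]
    i = bisect_right(keys, key) - 1  # index entry with the largest key <= search key
    if i < 0:
        return None                  # key is smaller than every key in the file
    for rec in blocks[index[i][1]]:  # scan that block sequentially
        if rec[0] == key:
            return rec
        if rec[0] > key:
            break                    # records are sorted, so the key is absent
    return None
```

With blocks `[[10, 20], [30, 40], [50]]` the index has three entries, and a search for 40 lands on the entry for 30 and finds the record in that block. Unlike the dense case, a miss requires reading one block to confirm.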
Summary of Index Types
• Category 1: Based on index records
• Dense index: index record for every search-key value
• Sparse index: index records for only some search-key values
In a secondary index on a search key that is not a candidate key, each index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
Secondary indices have to be dense.
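A secondary index with buckets can be sketched as below, assuming a hypothetical file sorted on `ID` and an index on `salary`, which is not a candidate key. Every distinct salary gets an entry (the index must be dense, since the file is not ordered on salary), and each entry points to a bucket of record positions.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, str, int]  # hypothetical (ID, name, salary); file sorted on ID


def build_secondary_index(file: List[Record]) -> List[Tuple[int, List[int]]]:
    """Dense secondary index on salary (field 2), which is not a candidate key.
    Each index entry points to a bucket holding the positions of all matching records."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for pos, rec in enumerate(file):
        buckets[rec[2]].append(pos)
    return sorted(buckets.items())  # ordered index: entries sorted on salary


def secondary_lookup(file: List[Record], index: List[Tuple[int, List[int]]], salary: int) -> List[Record]:
    keys = [k for k, _ in index]
    i = bisect_left(keys, salary)
    if i < len(keys) and keys[i] == salary:
        # Bucket positions may be scattered across the file, so each may cost a block fetch.
        return [file[pos] for pos in index[i][1]]
    return []
```

The bucket for 80000 holds positions 0 and 2 in the example used by the test; in a real file those could sit in different blocks, which is the cost discussed for sequential scans through a secondary index.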
Primary and Secondary Indices
Indices offer substantial benefits when searching for records.
BUT: updating indices imposes overhead on database modification: when a file is modified, every index on the file must be updated.
A sequential scan using the primary index is efficient, but a sequential scan using a secondary index is expensive.
◦ Each record access may fetch a new block from disk
◦ Block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for memory access
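Working the two figures above through shows the size of the gap (the times are the approximate values quoted on the slide):

```latex
\frac{5\ \text{ms}}{100\ \text{ns}} = \frac{5 \times 10^{-3}\ \text{s}}{10^{-7}\ \text{s}} = 5 \times 10^{4},
\qquad
\frac{10\ \text{ms}}{100\ \text{ns}} = \frac{10^{-2}\ \text{s}}{10^{-7}\ \text{s}} = 10^{5}
```

So one block fetch costs roughly as much time as 50,000 to 100,000 memory accesses, which is why a scan that fetches a new block per record is so much slower than one that reads each block once.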
B+-Tree Node Structure
Typical node