Lecture9 PDF
2433-001
Database Systems
[Figure: DB, blocks, sectors]
Example:
Suppose we have a relation with fields: name, age, and salary.
How do we sort it?
Index Records
What is an index?
Index
• It is:
– a data structure
– a pointer (called data entry in the textbook) to a
data record
– organized based on search key
• Three alternatives for how index data entries
relate to data records
– Store the data record itself in the index
– Store a record ID in the index to point to the data
record
– Store a list of record IDs of the data records with the
same search key value
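The three alternatives above can be sketched on a toy relation. This is only an illustration: the records, the rid format (page id, slot number), and the names alt1, alt2, and alt3 are assumptions made for the example, and plain sorted lists stand in for a real index structure.

```python
from collections import defaultdict

# Hypothetical toy relation: rid -> (name, age, salary).
# A rid is assumed to be (page id, slot number).
records = {
    (0, 0): ("Ann", 30, 50),
    (0, 1): ("Bob", 25, 40),
    (1, 0): ("Cid", 30, 60),
}

# Alternative 1: the data entry is the data record itself (search key: age).
alt1 = sorted(records.values(), key=lambda rec: rec[1])

# Alternative 2: one <key, rid> data entry per data record.
alt2 = sorted((rec[1], rid) for rid, rec in records.items())

# Alternative 3: one <key, list of rids> data entry per distinct key value.
groups = defaultdict(list)
for rid, rec in sorted(records.items()):
    groups[rec[1]].append(rid)
alt3 = sorted(groups.items())

print(alt2)  # [(25, (0, 1)), (30, (0, 0)), (30, (1, 0))]
print(alt3)  # [(25, [(0, 1)]), (30, [(0, 0), (1, 0)])]
```

With duplicate key values (two employees aged 30), the third form stores the key once, while the second form repeats it in every entry.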
Special Case: Clustered Indexes
Definition:
The ordering of data records is the same as,
or close to, the ordering of some index.
Why is it important?
Reduces the cost of using an index to answer
range search queries.
But:
Expensive to maintain when the data
is updated.
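A rough way to see why clustering helps range queries is to count how many distinct data pages hold the qualifying records. The sketch below is a toy model, not a cost formula from the lecture: the page capacity, the key range, and the function name pages_touched are all assumptions.

```python
import random

RECORDS_PER_PAGE = 10  # assumed page capacity for this toy model

def pages_touched(file_order, lo, hi):
    """Count the distinct data pages that hold keys in [lo, hi]."""
    pages = {pos // RECORDS_PER_PAGE
             for pos, key in enumerate(file_order) if lo <= key <= hi}
    return len(pages)

keys = list(range(1000))
clustered = keys                       # file order matches index order
unclustered = keys[:]                  # same records, arbitrary file order
random.Random(0).shuffle(unclustered)

clustered_cost = pages_touched(clustered, 100, 199)
unclustered_cost = pages_touched(unclustered, 100, 199)
print(clustered_cost)                      # 10
print(unclustered_cost > clustered_cost)   # True
```

In the clustered file the 100 qualifying records sit on 10 adjacent pages; in the shuffled file they are spread over many more pages, each of which could cost one I/O.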
The table above shows the average I/O cost only.
Before Choosing Your Index:
Know Your Workload
• For each query in the workload:
– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join
conditions? How selective are these conditions
likely to be?
• For each update in the workload:
– Which attributes are involved in selection/join
conditions? How selective are these conditions
likely to be?
– The type of update (INSERT/DELETE/UPDATE), and
the attributes that are affected.
Index Choice
• What indexes should we create?
– Which relations should have indexes? What
field(s) should be the search key? Should we
build several indexes?
• For each index, what kind of an index
should it be?
– Clustered? Hash/tree?
– Hash-based indexes are optimized for equality searches
– Tree-based indexes support equality and range searches
– A sorted file is expensive to maintain
Index Choice
• One approach:
– Consider the most important queries in turn.
– Consider the best plan using the current indexes,
and see if a better plan is possible with an additional
index.
– If so, create it.
• Before creating an index, must also consider
the impact on updates in the workload!
– Trade-off: Indexes can make queries faster but
updates slower. They require disk space, too.
Guidelines
• Attributes in WHERE clause are candidates for index
keys.
– Exact match condition suggests hash index.
– Range query suggests tree index.
• Clustering is especially useful for range queries; can also help on
equality queries if there are many duplicates.
• Multi-attribute search keys should be considered
when a WHERE clause contains several conditions.
– Order of attributes is important for range queries.
• Try to choose indexes that benefit as many queries
as possible. Since only one index can be clustered per
relation, choose it based on important queries that
would benefit the most from clustering.
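To illustrate why attribute order matters for range queries on a multi-attribute search key, here is a minimal sketch in which a sorted list stands in for a tree index on <age, sal>. The entries, the integer-valued age assumption, and the function names are hypothetical.

```python
import bisect

# Hypothetical index entries on the composite search key <age, sal>, kept sorted.
entries = sorted([(25, 40), (30, 50), (30, 60), (41, 30), (45, 55)])

def range_on_prefix(lo_age, hi_age):
    """age comes first in the key, so matching entries are contiguous."""
    start = bisect.bisect_left(entries, (lo_age,))
    end = bisect.bisect_left(entries, (hi_age + 1,))  # assumes integer ages
    return entries[start:end]

def range_on_second(lo_sal, hi_sal):
    """sal comes second, so matches are scattered and every entry is examined."""
    return [e for e in entries if lo_sal <= e[1] <= hi_sal]

print(range_on_prefix(30, 41))  # [(30, 50), (30, 60), (41, 30)]
print(range_on_second(50, 60))  # [(30, 50), (30, 60), (45, 55)]
```

A range on age needs two binary searches and one contiguous slice, whereas a range on sal alone cannot use the sort order and falls back to examining all entries.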
Examples
SELECT E.dno
FROM Emp E
WHERE E.age>40
• Clustered
• B+ tree index on E.age can be used to
get qualifying tuples.
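As a quick experiment, the query above can be run against an in-memory SQLite database to see whether an index on age is chosen. This is a sketch under assumptions: SQLite's ordinary B-tree index stands in for the B+ tree, the index is unclustered, and the table contents, the eid column, and the index name emp_age are made up for the example. The exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER PRIMARY KEY, age INTEGER, dno INTEGER)")
# Hypothetical data: 200 employees, ages 20..59, departments 0..4.
conn.executemany("INSERT INTO Emp (age, dno) VALUES (?, ?)",
                 [(20 + i % 40, i % 5) for i in range(200)])

query = "SELECT E.dno FROM Emp E WHERE E.age > 40"

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

before = plan(query)                       # no index yet: a full scan
conn.execute("CREATE INDEX emp_age ON Emp(age)")
after = plan(query)                        # the planner can now use emp_age

print(before)
print(after)
```

Whether the index actually pays off depends on how selective the condition is and on whether the index is clustered, which this small experiment does not measure.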
Examples
SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno
Pages and Heap Files
• Every record has a unique rid
• Every page in the file has the same size.
• Supported operations include:
– create and destroy files
– insert/delete a record with given rid
– get a record with given rid
– scan all records in the file
• Given the id of the record, we must be
able to find the id of the page containing
the record
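The operations listed above can be sketched with a toy in-memory heap file. This is an illustration only: the class name HeapFile, the fixed number of slots per page, and the rid format (page id, slot number) are assumptions, chosen so that the page id can be read directly from the rid.

```python
class HeapFile:
    """Toy in-memory heap file; a rid is assumed to be (page id, slot number)."""

    def __init__(self, slots_per_page=2):
        self.slots_per_page = slots_per_page
        self.pages = []  # each page is a list of slots; None marks an empty slot

    def insert(self, record):
        # Naive search: look at every page for an empty slot.
        for pid, page in enumerate(self.pages):
            for slot, value in enumerate(page):
                if value is None:
                    page[slot] = record
                    return (pid, slot)
        self.pages.append([None] * self.slots_per_page)
        pid = len(self.pages) - 1
        self.pages[pid][0] = record
        return (pid, 0)

    def get(self, rid):
        pid, slot = rid
        return self.pages[pid][slot]

    def delete(self, rid):
        pid, slot = rid
        self.pages[pid][slot] = None

    def scan(self):
        for pid, page in enumerate(self.pages):
            for slot, value in enumerate(page):
                if value is not None:
                    yield (pid, slot), value


hf = HeapFile()
r1, r2, r3 = hf.insert("a"), hf.insert("b"), hf.insert("c")
hf.delete(r2)
r4 = hf.insert("d")   # reuses the slot freed by deleting "b"
print(list(hf.scan()))  # [((0, 0), 'a'), ((0, 1), 'd'), ((1, 0), 'c')]
```

The insert here scans every page for an empty slot, which is exactly the cost that tracking pages with free space is meant to avoid.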
Pages and Heap Files
• We must keep track of the pages in
each heap file to support scans.
• We must keep track of pages with
empty spaces to support efficient
insertions.
How to maintain this info?
Linked List of Pages
[Figure: a header page heads two linked lists of data pages: the full pages and the pages with free space.]
[Figure: a DIRECTORY whose entries point to the data pages, up to Page N.]
[Figure: two page formats for fixed-length records. PACKED: slots 1 to N are followed by free space, and the page stores N, the number of records. UNPACKED, BITMAP: slots 1 to M, with a bitmap holding one bit per slot and M, the number of slots.]
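The difference between the packed and the bitmap formats in the figure can be sketched as follows. The class names and the choice to fill a hole in the packed page with the last record are assumptions for illustration, not necessarily the scheme the lecture has in mind.

```python
class PackedPage:
    """Records occupy slots 0..n-1 with no gaps."""

    def __init__(self):
        self.slots = []

    def insert(self, rec):
        self.slots.append(rec)
        return len(self.slots) - 1

    def delete(self, slot):
        # Fill the hole with the last record, which changes that record's slot.
        self.slots[slot] = self.slots[-1]
        self.slots.pop()


class BitmapPage:
    """A bitmap marks which of the M slots are occupied; records never move."""

    def __init__(self, m):
        self.slots = [None] * m
        self.used = [False] * m

    def insert(self, rec):
        slot = self.used.index(False)  # raises ValueError if the page is full
        self.slots[slot], self.used[slot] = rec, True
        return slot

    def delete(self, slot):
        self.used[slot] = False


packed = PackedPage()
for rec in ("a", "b", "c"):
    packed.insert(rec)
packed.delete(0)
print(packed.slots)   # ['c', 'b']

bitmap = BitmapPage(3)
for rec in ("a", "b", "c"):
    bitmap.insert(rec)
bitmap.delete(0)
print(bitmap.used)    # [False, True, True]
```

In the packed page, record "c" moved from slot 2 to slot 0, so any rid that embeds the slot number would change; in the bitmap page, the surviving records keep their slots.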
Page Format:
Variable Length Records
• Cannot divide the page into fixed-length
slots
• Challenge: When a new record is to be
inserted, we have to find an empty slot
of just the right length.
• Challenge: We must ensure that the
free space on the page is contiguous.
• So the ability to move records on a
page becomes very important
Page Format:
Variable Length Records
• Directory of slots
• <record offset, record length> per slot
• record offset: offset in bytes from the
start of the data area to the start of
the record
• Deletion: set the record offset to -1
• rid <page id, slot id> does not change
when a record moves; only the offset
stored in its slot is updated.
Page Format:
Variable Length Records
• Maintain a pointer to the start of the free
space area
• When a new record does not fit into the
remaining free space, move records in
the page to reclaim the space freed by
earlier deletions.
• Cannot always remove the slot of a deleted
record (otherwise the rids of the records in
later slots would change).
• When a new record is inserted, the
directory is scanned for an element not
pointing to a record.
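The slot directory, the free-space pointer, deletion by offset -1, and compaction can be put together in a small sketch. It makes several simplifying assumptions: the class name SlottedPage and the page size are made up, and the slot directory is kept in a Python list rather than inside the page bytes.

```python
class SlottedPage:
    """Toy slotted page for variable-length records."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_start = 0   # pointer to the start of the free space area
        self.slots = []       # slot directory: [record offset, record length]

    def _compact(self):
        """Move live records together so that the free space is contiguous."""
        new_data, pos = bytearray(len(self.data)), 0
        for slot in self.slots:
            off, length = slot
            if off != -1:
                new_data[pos:pos + length] = self.data[off:off + length]
                slot[0] = pos  # only the offset changes; the slot id stays
                pos += length
        self.data, self.free_start = new_data, pos

    def insert(self, record: bytes):
        if self.free_start + len(record) > len(self.data):
            self._compact()    # reclaim space freed by earlier deletions
        if self.free_start + len(record) > len(self.data):
            raise ValueError("record does not fit on this page")
        off = self.free_start
        self.data[off:off + len(record)] = record
        self.free_start += len(record)
        entry = [off, len(record)]
        for slot_no, slot in enumerate(self.slots):
            if slot[0] == -1:  # reuse an entry not pointing to a record
                self.slots[slot_no] = entry
                return slot_no
        self.slots.append(entry)
        return len(self.slots) - 1

    def delete(self, slot_no):
        self.slots[slot_no][0] = -1  # the slot itself stays in the directory

    def get(self, slot_no):
        off, length = self.slots[slot_no]
        if off == -1:
            raise KeyError(slot_no)
        return bytes(self.data[off:off + length])


page = SlottedPage(size=16)
a = page.insert(b"aaaaaa")
b = page.insert(b"bbbbbb")
page.delete(a)
c = page.insert(b"cccccccc")  # does not fit until the page is compacted
print(c, page.get(b), page.get(c))  # 0 b'bbbbbb' b'cccccccc'
```

After compaction the record in slot 1 sits at offset 0 instead of 6, yet its slot id, and therefore its rid, is unchanged. The new record reuses slot 0, which the deletion left empty.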
Page Format:
Variable Length Records
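[Figure: Page i holds the records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N). The SLOT DIRECTORY holds one entry per slot N ... 2 1 (example entries 20, 16, 24), the number of slots N, and a pointer to the start of free space.]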
Page Format
• Besides slot information, a page usually
contains file-level information (e.g., the id of
the next page).
• The slotted page organization used for
variable length records can also be used
for fixed-length records
What about Records?
• How to organize fields within a record?
• Issues we have to take into account:
– Fields of the record are of fixed or
variable length
– cost of various operations on the records
• Information common to all records (e.g.
number of fields, types, …) is stored in
the system catalog
Fixed-Length Records
[Figure: a record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4.]
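With fixed-length fields, the position of a field follows from the lengths of the fields before it. A minimal sketch, assuming byte lengths for L1 to L4 that are made up for the example and a helper named field that is not from the lecture:

```python
from itertools import accumulate

# Hypothetical field lengths L1..L4 in bytes, as a system catalog might record them.
field_lengths = [10, 4, 8, 2]

# The offset of a field is the sum of the lengths of the fields before it.
offsets = [0] + list(accumulate(field_lengths))[:-1]

def field(record: bytes, i: int) -> bytes:
    """Return field i (0-based) of a fixed-length record."""
    return record[offsets[i]:offsets[i] + field_lengths[i]]

record = bytes(range(24))      # a 24-byte record: 10 + 4 + 8 + 2
print(offsets)                 # [0, 10, 14, 22]
print(list(field(record, 1)))  # [10, 11, 12, 13]
```

Because the lengths are the same for every record, the offsets are computed once and any field can be reached without scanning the record.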
4 $ $ $ $