Execution
Produce an item
next
Sorting Hashing
Indices/indexes
Goal:
To reduce the number of accesses to secondary
storage
How?
By employing search techniques in the form of
indices (sometimes, also materialized views, but not
in this paper)
Indices map key or attribute values to locator
information with which database objects can be
retrieved.
Some Index Structures:
Clustered & Un-clustered
Clustered: order or organization of index entries
determines order of items on disk.
Sparse & Dense
Sparse: Indices do not contain an entry for each data
item in the primary file, but only one entry for each
page of the primary file;
Dense: there are the same number of entries in the index as
there are items in the primary file.
Non-clustering indices must always be dense
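A minimal sketch of the sparse/dense distinction, assuming a tiny in-memory "primary file" of sorted pages. The names PAGES, build_dense_index, build_sparse_index and the lookup helpers are hypothetical and only illustrate the idea; they are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical primary file: pages of (key, value) records, sorted by key.
# The sorted (clustered) layout is what makes a sparse index usable.
PAGES = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(7, "d"), (9, "e"), (11, "f")],
    [(13, "g"), (15, "h"), (17, "i")],
]

def build_dense_index(pages):
    """Dense: one entry per record, key -> (page number, slot number)."""
    return {key: (p, s)
            for p, page in enumerate(pages)
            for s, (key, _) in enumerate(page)}

def build_sparse_index(pages):
    """Sparse: one entry per page, holding the lowest key on that page."""
    return [page[0][0] for page in pages]

def dense_lookup(index, pages, key):
    loc = index.get(key)
    if loc is None:
        return None
    p, s = loc
    return pages[p][s][1]

def sparse_lookup(index, pages, key):
    # Find the last page whose lowest key is <= key, then search that page.
    p = bisect_right(index, key) - 1
    if p < 0:
        return None
    for k, value in pages[p]:
        if k == key:
            return value
    return None

dense = build_dense_index(PAGES)
sparse = build_sparse_index(PAGES)
print(len(dense), len(sparse))   # 9 entries vs. 3 entries
print(dense_lookup(dense, PAGES, 9), sparse_lookup(sparse, PAGES, 9))
```

The sparse lookup only works because the records are stored in key order. Without that order, the page cannot be inferred from the key, which is why a non-clustering index has to be dense.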
Buffer Management
Goal: reduce I/O cost by caching data in an I/O
buffer.
Design Issues
Recovery
Replacement policy
Performance effect of buffer allocation
Interactions of index retrieval and buffer management
Implementation Issues
Interface provided: fixing and unfixing
Intermediate results kept in a separate buffer
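A rough sketch of a fix/unfix interface, assuming a toy buffer pool with LRU replacement and pin counts. BufferPool, read_page and io_count are hypothetical names; recovery and dirty pages are left out, so this illustrates the interface only, not a real buffer manager.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: fix pins a page in memory, unfix releases the pin."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page     # hypothetical callback: page_id -> data
        self.frames = OrderedDict()    # page_id -> [data, pin_count], LRU first
        self.io_count = 0              # number of simulated reads from disk

    def fix(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # now the most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.read_page(page_id), 0]
            self.io_count += 1
        self.frames[page_id][1] += 1           # one more user holds the page
        return self.frames[page_id][0]

    def unfix(self, page_id):
        frame = self.frames[page_id]
        if frame[1] == 0:
            raise ValueError(f"page {page_id} is not fixed")
        frame[1] -= 1

    def _evict(self):
        # Drop the least recently used page that nobody has fixed.
        for page_id, (_, pins) in self.frames.items():
            if pins == 0:
                del self.frames[page_id]
                return
        raise RuntimeError("all buffer frames are fixed")

pool = BufferPool(capacity=2, read_page=lambda pid: f"data-{pid}")
pool.fix(1)
pool.fix(2)
pool.unfix(1)
pool.fix(3)   # page 1 is unfixed, so it is the one replaced; page 2 stays
print(list(pool.frames), pool.io_count)
```

A fixed page cannot be replaced, so an operator that forgets to unfix eventually starves the pool. That is one way buffer allocation shows up as a performance effect.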
Discussion (pairs)
There are many issues that could be
covered by either the OS or the
database. Break into groups and discuss
some of these issues. For each
issue, what are the pros and the cons of
handling it in the database?
- GPUs
- TPUs - Tensor matrix processing, hardware specific, ML
- DOJ: combining databases and OS, create a monopoly.
- OS as manager for resource allocation: works in general,
but not great for databases
- In general, LRU (least recently used) works well, but not
for DBs (e.g. repeated sequential scans)
- Database people always want more control. Turf war.
- Different for different scenarios
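One way to make the LRU point concrete is to repeat a sequential scan over a file slightly larger than the buffer. The sketch below is only an illustration: the workload is made up, count_hits is a hypothetical helper, and MRU (evict the most recently used page) is brought in purely as a contrast, not as something the discussion proposed.

```python
from collections import OrderedDict

def count_hits(references, capacity, policy):
    """Count buffer hits for a page reference string under 'LRU' or 'MRU'."""
    buffer = OrderedDict()   # page ids, least recently used first
    hits = 0
    for page in references:
        if page in buffer:
            hits += 1
            buffer.move_to_end(page)
        else:
            if len(buffer) >= capacity:
                # LRU evicts the oldest entry, MRU evicts the newest one.
                buffer.popitem(last=(policy == "MRU"))
            buffer[page] = True
    return hits

# Hypothetical workload: scan a 4-page file three times with a 3-page buffer.
scan = [0, 1, 2, 3] * 3
lru_hits = count_hits(scan, 3, "LRU")
mru_hits = count_hits(scan, 3, "MRU")
print(lru_hits, mru_hits)
```

Under LRU every reference in this workload is a miss, because each page is evicted just before the scan comes back to it. A database knows the access pattern in advance; a general-purpose OS policy does not.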
BINARY MATCHING OPERATIONS
Relational join is the most prominent binary matching
operation (others: intersection, union, etc.)
Set operations such as intersection and
difference needed for any data model
Most commercial DB systems as of 1993 used
only nested-loops join and merge-join. According to
research done for System R, these two were
believed to be the most efficient.
System R researchers did not consider hash join
algorithms, which are today considered even
better in performance.
NESTED-LOOPS JOIN ALGORITHMS:
simple elegance
For each item in one input, scan entire other
input to find matches.
Performance is very poor because the inner input
is scanned once per outer item. (paper points this out)
Tricks to improve performance include:
Larger input should be the outer one.
If possible, use an index on the attribute to be
matched in the inner input.
Inner input can be scanned once for each ‘page’ of
outer input.
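The variants above can be sketched as follows, assuming both inputs are in-memory lists of tuples joined on equality. The function names are hypothetical, "page" is simulated by slicing the outer list, and a dict stands in for an existing index on the inner input.

```python
from collections import defaultdict

def nested_loops_join(outer, inner, key_outer, key_inner):
    """Naive version: scan the whole inner input once per outer item."""
    result, inner_scans = [], 0
    for o in outer:
        inner_scans += 1
        for i in inner:
            if o[key_outer] == i[key_inner]:
                result.append((o, i))
    return result, inner_scans

def block_nested_loops_join(outer, inner, key_outer, key_inner, page_size):
    """Block version: scan the inner input once per page of the outer input."""
    result, inner_scans = [], 0
    for start in range(0, len(outer), page_size):
        page = outer[start:start + page_size]
        inner_scans += 1
        for i in inner:
            for o in page:
                if o[key_outer] == i[key_inner]:
                    result.append((o, i))
    return result, inner_scans

def index_nested_loops_join(outer, inner, key_outer, key_inner):
    """Index version: probe an index on the inner input instead of scanning it."""
    # Assumption: this dict stands in for an index that already exists.
    index = defaultdict(list)
    for i in inner:
        index[i[key_inner]].append(i)
    return [(o, i) for o in outer for i in index.get(o[key_outer], [])]

outer = [(1, "x"), (2, "y"), (3, "z"), (2, "w")]
inner = [(2, "p"), (3, "q"), (4, "r")]
naive, naive_scans = nested_loops_join(outer, inner, 0, 0)
block, block_scans = block_nested_loops_join(outer, inner, 0, 0, page_size=2)
indexed = index_nested_loops_join(outer, inner, 0, 0)
print(naive_scans, block_scans)   # inner scans: per item vs. per page
```

All three produce the same matches (possibly in a different order). What changes is how often the inner input is read: once per outer item, once per outer page, or not scanned at all when an index is available.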
MERGE-JOIN ALGORITHMS