Module Iippt
Introduction
File Organization
Index Types (Primary, Secondary)
Use Cases
Introduction
🔍 Role of Indexing
• Indexes accelerate search operations.
• Reduce the need to scan the entire dataset.
• Crucial for query optimization.
File Organization Overview
⚙️Why It Matters:
• Performance Impact on:
• 🔍 Search speed
• ➕ Insertion efficiency
• ❌ Deletion complexity
• 🔄 Update cost
• Choice of file organization affects the efficiency of queries and
maintenance tasks.
• 🧱 Common Types:
• Heap (Unordered)
• Sequential (Sorted)
• Hash-based
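To make the trade-off between the first two organizations concrete, here is a minimal in-memory sketch. The class names `HeapFile` and `SequentialFile` are illustrative assumptions, and Python lists stand in for disk blocks, so this shows the access pattern only, not real I/O cost.

```python
import bisect


class HeapFile:
    """Unordered file: records are stored in arrival order."""

    def __init__(self):
        self.records = []

    def insert(self, key, value):
        # Cheap insert: just append at the end of the file.
        self.records.append((key, value))

    def search(self, key):
        # No ordering to exploit, so every record may have to be checked.
        for k, v in self.records:
            if k == key:
                return v
        return None


class SequentialFile:
    """Sorted file: records are kept in key order."""

    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        # Costlier insert: find the position, then shift later records.
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.values.insert(pos, value)

    def search(self, key):
        # Binary search is possible because the keys are sorted.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.values[pos]
        return None

    def range(self, low, high):
        # Range queries read one contiguous run of records.
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


heap, seq = HeapFile(), SequentialFile()
for k in (30, 10, 20):
    heap.insert(k, f"r{k}")
    seq.insert(k, f"r{k}")
```

The heap file keeps insertion trivial and pays for it on every search; the sequential file does the opposite.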
Hash File Organization
Example: hash(EmployeeID) → bucket number
•⚡ Fast access for equality searches (e.g., WHERE ID = 123)
•🚨 Hash collisions may occur — handled using overflow chains or open addressing.
•🔄 Dynamic hashing (e.g., extendible hashing) can help grow with data.
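The bullets above can be sketched as a toy hash file with overflow chains. The class `HashFile`, the modulo hash function, and the bucket sizes are assumptions chosen for illustration; a real system hashes to disk pages and may use a different collision strategy.

```python
class HashFile:
    """Toy hash file: fixed number of buckets, one overflow chain per bucket."""

    def __init__(self, num_buckets=4, bucket_capacity=2):
        self.num_buckets = num_buckets
        self.bucket_capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]   # primary bucket pages
        self.overflow = [[] for _ in range(num_buckets)]  # overflow chains

    def _bucket(self, key):
        # Assumed hash function for integer keys: key mod number of buckets.
        return key % self.num_buckets

    def insert(self, key, record):
        b = self._bucket(key)
        if len(self.buckets[b]) < self.bucket_capacity:
            self.buckets[b].append((key, record))
        else:
            # Primary bucket is full: the record goes to the overflow chain.
            self.overflow[b].append((key, record))

    def lookup(self, key):
        # Equality search touches one bucket plus its overflow chain only.
        b = self._bucket(key)
        for k, record in self.buckets[b] + self.overflow[b]:
            if k == key:
                return record
        return None


employees = HashFile(num_buckets=4, bucket_capacity=1)
employees.insert(123, "Ada")   # 123 % 4 == 3
employees.insert(127, "Bob")   # 127 % 4 == 3, collides with 123
```

Note that nothing in this layout helps with a query such as `ID BETWEEN 100 AND 130`: neighbouring keys land in unrelated buckets, so every bucket would have to be read.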
“In hash file organization, a hash function is applied
to a key field, such as an employee ID, to determine
where the record should go.
This method shines for equality lookups, which are
extremely fast. The downside is that records are not
stored in key order, so hashing offers almost no help
for range queries.
Another challenge is collisions: multiple keys might
hash to the same location.
•📚 What is an Index?
A data structure that speeds up data retrieval by providing
quick lookup paths to records.
•🎯 Why use Indexes?
•Avoid scanning entire files (full table scan).
•Improve performance for searches, joins, and sorting.
🔑 Types of Indexes:
•Primary Index: Based on the primary key, often
sorted and unique.
•Secondary Index: Built on non-primary fields, can
be non-unique.
•📊 Index Structures commonly used:
•B-Trees / B+ Trees
•Hash Indexes
•Bitmap Indexes (for low-cardinality columns)
An index in a database is like the index in a book: it
helps you find the exact page where information is
located without flipping through every page. Indexes
drastically improve search speed by providing
shortcuts.
Primary indexes are created on the key fields that
uniquely identify records and usually correspond to
how data is sorted on disk. Secondary indexes let you
quickly search based on other attributes, even if the
data isn’t stored in that order.
There are different data structures used for indexes,
with B-Trees being the most popular because they
keep data sorted and balanced for efficient search,
insert, and delete operations.”
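Of the structures listed on this slide, the bitmap index is the easiest to show in a few lines. The sketch below is an illustration under assumptions: the helper names `build_bitmap_index` and `row_ids` and the sample rows are made up, and a Python integer stands in for each bit vector.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bit vector.

    Bit i of the bitmap is 1 when row i holds that value.
    """
    index = {}
    for row_id, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def row_ids(bitmap):
    """Return the positions of the set bits, i.e. the matching row numbers."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


# Hypothetical sample data with two low-cardinality columns.
rows = [
    {"name": "Ada", "dept": "HR", "remote": True},
    {"name": "Bob", "dept": "IT", "remote": False},
    {"name": "Cy", "dept": "HR", "remote": False},
    {"name": "Di", "dept": "HR", "remote": True},
]

dept_index = build_bitmap_index(rows, "dept")
remote_index = build_bitmap_index(rows, "remote")

# WHERE dept = 'HR' AND remote = TRUE becomes a single bitwise AND.
hr_and_remote = row_ids(dept_index["HR"] & remote_index[True])
```

This works well only when a column has few distinct values, since each distinct value needs its own bitmap.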
Primary Index
• Definition:
An index built on the primary key of the table, which uniquely
identifies each record.
• 📄 File Organization:
The data file is usually sorted on this key.
• Types:
• Sparse Index: One index entry per block, not per record
(possible only because the data file is sorted on the key).
• Dense Index: One index entry for every record (required
when the data file is not sorted on the indexed field).
• 🔍 Advantages:
• Fast access to records by primary key.
• Enables efficient range queries due to sorted data.
• 🚧 Constraints:
• Only one primary index per file (due to sorting
requirement).
“The primary index is built on the primary key,
which means the file itself is sorted on this key.
This sorting allows for fast direct access and
efficient range queries.
There are two main types: sparse and dense.
Sparse indexes have entries for only some records
(typically the first record in each block), so they use
less space but require scanning within a block.
Dense indexes have entries for every record,
giving very fast lookup but using more space.
Because the data must be sorted on the primary
key, there can only be one primary index per file.”
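The difference between the two lookups can be sketched as follows. The sample `blocks`, the names `sparse_lookup` and `dense_lookup`, and the use of Python lists for disk blocks are all assumptions made for illustration.

```python
import bisect

# Data file: blocks of records, sorted on the primary key.
blocks = [
    [(10, "Ada"), (20, "Bob"), (30, "Cy")],
    [(40, "Di"), (50, "Eve"), (60, "Fay")],
    [(70, "Gus"), (80, "Hal")],
]

# Sparse index: one entry per block, holding the first key in that block.
sparse_keys = [block[0][0] for block in blocks]

# Dense index: one entry per record, key -> (block number, slot in block).
dense = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}


def sparse_lookup(key):
    # Find the last block whose first key is <= the search key ...
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    # ... then scan inside that one block.
    for k, record in blocks[b]:
        if k == key:
            return record
    return None


def dense_lookup(key):
    # The index points straight at the record; no scan inside the block.
    location = dense.get(key)
    if location is None:
        return None
    b, s = location
    return blocks[b][s][1]
```

Here the sparse index holds 3 entries and the dense index 8, which is the space-versus-lookup trade-off described in the notes above.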
Secondary Index
•🔎 Definition:
An index built on a non-primary key attribute (non-sorting key).
• File Organization:
The data file is not sorted on the secondary index key.
•🧩 Characteristics:
•Always dense: contains an index entry for every record.
•Supports multiple secondary indexes per table.
•🔄 Use Cases:
•Querying based on fields other than the primary key (e.g., searching by City or
Department).
•⚠️Performance Considerations:
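•Can cause additional I/O cost (matching records may be scattered across many blocks,
since the data is not ordered on this field).
•Requires extra storage, and every index must be updated on insert, update, and delete.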
“Secondary indexes are created on fields other than
the primary key. Unlike primary indexes, the data
file isn’t sorted on these fields, so the index must
contain entries for every record—this is why they
are always dense.
You can have many secondary indexes on a table,
allowing flexible query capabilities on different
attributes. The trade-off is that these indexes can
increase storage requirements and slow down
insertions and deletions because the index must be
updated.
Secondary indexes are essential when you want to
search or filter on non-primary key fields efficiently.”
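A minimal sketch of a dense secondary index on a non-key field, and of the maintenance work it adds to every insert and delete, might look like this. The `Table` class, its method names, and the sample employees are hypothetical; a dictionary stands in for the data file.

```python
from collections import defaultdict


class Table:
    """Rows stored by primary key, plus a secondary index on `city`."""

    def __init__(self):
        self.rows = {}                   # primary key -> row
        self.by_city = defaultdict(set)  # secondary index: city -> primary keys

    def insert(self, emp_id, name, city):
        if emp_id in self.rows:
            self.delete(emp_id)          # avoid a stale index entry
        self.rows[emp_id] = {"name": name, "city": city}
        self.by_city[city].add(emp_id)   # extra write caused by the index

    def delete(self, emp_id):
        row = self.rows.pop(emp_id)
        keys = self.by_city[row["city"]]
        keys.discard(emp_id)             # index must be updated as well
        if not keys:
            del self.by_city[row["city"]]

    def find_by_city(self, city):
        # City is non-unique, so one index entry can lead to several rows.
        return sorted(self.by_city.get(city, ()))


staff = Table()
staff.insert(1, "Ada", "Pune")
staff.insert(2, "Bob", "Delhi")
staff.insert(3, "Cy", "Pune")
pune_before = staff.find_by_city("Pune")
staff.delete(1)
pune_after = staff.find_by_city("Pune")
```

Each additional secondary index would add another structure like `by_city` that `insert` and `delete` have to keep in sync.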
Dense vs. Sparse Index