Module Iippt

This document covers data storage and indexing, focusing on file organization methods such as heap, sequential, and hash file organizations, and their respective advantages and disadvantages. It also discusses primary and secondary index structures, their use cases, and the impact of indexing on data retrieval performance. Key index types include dense and sparse indexes, along with various structures like B-Trees and dynamic hashing techniques.

MODULE II

Data Storage and Indexes
File Organizations, Primary and Secondary Index Structures
CONTENTS

• Introduction
• File Organization
• Index Types (Primary, Secondary)
• Use Cases
Introduction

🧠 Why Data Storage & Indexing Matter

• Efficient data storage ensures optimal use of memory and disk.
• Fast data retrieval is critical for performance in large-scale
databases.

🔍 Role of Indexing
• Indexes accelerate search operations.
• Reduce the need to scan the entire dataset.
• Crucial for query optimization.
File Organization Overview

📂 What is File Organization?

• The method used to store records on disk.
• Determines how data is accessed, inserted, updated, and deleted.

⚙️ Why It Matters:
• Performance Impact on:
  • 🔍 Search speed
  • ➕ Insertion efficiency
  • ❌ Deletion complexity
  • 🔄 Update cost
• Choice of file organization affects the efficiency of queries and maintenance tasks.

🧱 Common Types:
• Heap (Unordered)
• Sequential (Sorted)
• Hash-based
Heap File Organization

• 🔄 Unordered records: new records are placed wherever space is available.
• ➕ Fast insertions — no need to maintain order.
• 🔍 Slow searches — full scan required unless indexed.
• 🧹 Deletion leaves empty spaces that may need periodic cleanup
(compaction).
• “In heap file organization, records are stored in no specific order.
• It’s efficient for inserting new data since there’s no need to sort or shift existing records.
• However, searching is inefficient unless an index is in place, because the database may have to scan every single record to find what it needs.
• This method is best for workloads with heavy insert operations and minimal searching.”
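
The behaviour described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular DBMS lays out pages: the `HeapFile` class, the `block_capacity` parameter, and the `{"id": ...}` record shape are all assumptions made for the example, and inserts simply append to the last block rather than reusing holes.

```python
class HeapFile:
    """Toy heap file: records sit in fixed-capacity blocks, in no particular order."""

    def __init__(self, block_capacity=4):
        self.block_capacity = block_capacity  # hypothetical records-per-block limit
        self.blocks = [[]]                    # None inside a block marks a deleted slot

    def insert(self, record):
        # Append to the last block; open a new block when it is full. No sorting.
        if len(self.blocks[-1]) >= self.block_capacity:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, key):
        # Without an index, blocks are scanned one by one until the key is found.
        blocks_read = 0
        for block in self.blocks:
            blocks_read += 1
            for record in block:
                if record is not None and record["id"] == key:
                    return record, blocks_read
        return None, blocks_read

    def delete(self, key):
        # Deletion leaves a hole instead of shifting the remaining records.
        for block in self.blocks:
            for i, record in enumerate(block):
                if record is not None and record["id"] == key:
                    block[i] = None
                    return True
        return False

    def compact(self):
        # Periodic cleanup: repack the live records into fresh blocks.
        live = [r for block in self.blocks for r in block if r is not None]
        self.blocks = [[]]
        for record in live:
            self.insert(record)


heap = HeapFile(block_capacity=2)
for emp_id in (42, 7, 19, 3, 88):
    heap.insert({"id": emp_id})

found, blocks_read = heap.search(88)  # the last record forces a scan of every block
heap.delete(7)
heap.delete(19)
blocks_before = len(heap.blocks)
heap.compact()
blocks_after = len(heap.blocks)
```

Here the search for the most recently inserted record reads every block, and compaction reclaims the space left by the two deletions.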
Sequential File Organization

• 📈 Records stored in sorted order (typically by primary key).

• 🔍 Efficient for range queries and ordered data retrieval.
• ➕ Fast sequential access (e.g., retrieving top 10 records).
• ❌ Slow insertions/deletions — maintaining order requires shifting
data or rewriting.
• “Sequential file organization stores records in a sorted manner, usually based on a key like EmployeeID or Name.
• This is ideal when you need to process data in order, or handle range-based queries.
• The trade-off? Inserting or deleting data can be expensive.
• The system might need to shift multiple records or reorganize the file, which adds overhead.
• It’s best used for systems with frequent read and range operations but infrequent updates.”
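
A minimal sketch of the same trade-off, assuming an in-memory list stands in for the sorted file. The `SequentialFile` class and the employee keys are hypothetical; `bisect` is Python's standard binary-search module.

```python
import bisect


class SequentialFile:
    """Toy sorted file: keys and records are kept in key order."""

    def __init__(self):
        self.keys = []     # sorted search keys
        self.records = []  # records, in the same order as the keys

    def insert(self, key, record):
        # Find the slot that keeps the file sorted, then shift everything after it.
        pos = bisect.bisect_left(self.keys, key)
        shifted = len(self.keys) - pos  # how many records had to move
        self.keys.insert(pos, key)
        self.records.insert(pos, record)
        return shifted

    def range_query(self, low, high):
        # Two binary searches locate the range; the result is one contiguous slice.
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return self.records[start:end]


seq = SequentialFile()
for emp_id in (10, 20, 30, 40, 50):
    seq.insert(emp_id, f"emp{emp_id}")

shifted = seq.insert(15, "emp15")   # lands near the front, so most records move
in_range = seq.range_query(15, 30)  # contiguous read, already in key order
```

The range query touches only the records it returns, while the single insert near the front of the file moves four of the five existing records.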
Hash File Organization

•🔑 Uses a hash function to compute the storage location from a key.
Example: hash(EmployeeID) → bucket number
•⚡ Fast access for equality searches (e.g., WHERE ID = 123)

•❌ Inefficient for range queries (no order preserved).

•🪣 Data stored in buckets, with one or more records per bucket.

•🚨 Hash collisions may occur — handled using overflow chains or open addressing.

•🔄 Dynamic hashing (e.g., extendible hashing) can help grow with data.
• “In hash file organization, a hash function is applied to a key field, such as an employee ID, to determine where the record should go.
• This method shines for equality lookups: one hash computation leads directly to the right bucket. The downside is that data isn't stored in key order, so a range query cannot use the hash and falls back to scanning the file.
• Another challenge is collisions: multiple keys might hash to the same location.
• To handle that, we use techniques like overflow buckets or chaining.
• Systems can also use dynamic hashing methods like extendible hashing to expand automatically as more data is added, which limits overflow.”
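
The bucket and overflow behaviour can be sketched as follows. The `HashFile` class, the modulo hash function, and the bucket sizes are assumptions chosen to force a collision; real systems use different hash functions and page layouts.

```python
class HashFile:
    """Toy static hash file with fixed-size buckets and per-bucket overflow chains."""

    def __init__(self, num_buckets=4, bucket_capacity=2):
        self.num_buckets = num_buckets
        self.bucket_capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]

    def _bucket_of(self, key):
        return key % self.num_buckets  # assumed hash function for integer keys

    def insert(self, key, record):
        b = self._bucket_of(key)
        if len(self.buckets[b]) < self.bucket_capacity:
            self.buckets[b].append((key, record))
        else:
            self.overflow[b].append((key, record))  # collision: chain into overflow

    def lookup(self, key):
        # Equality search: only one bucket (plus its overflow chain) is examined.
        b = self._bucket_of(key)
        for k, record in self.buckets[b] + self.overflow[b]:
            if k == key:
                return record
        return None

    def range_query(self, low, high):
        # No key order is preserved, so every bucket has to be examined.
        hits = []
        for b in range(self.num_buckets):
            for k, record in self.buckets[b] + self.overflow[b]:
                if low <= k <= high:
                    hits.append((k, record))
        return sorted(hits)


hf = HashFile(num_buckets=4, bucket_capacity=2)
for emp_id in (3, 7, 11, 8):  # 3, 7 and 11 all hash to bucket 3
    hf.insert(emp_id, f"emp{emp_id}")

record = hf.lookup(11)
chained = hf.overflow[3]
ranged = hf.range_query(7, 11)
```

The third key that hashes to bucket 3 ends up in the overflow chain, and the range query has no choice but to visit all four buckets.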
Indexing Overview

•📚 What is an Index?
A data structure that speeds up data retrieval by providing quick lookup paths to records.
•🎯 Why use Indexes?
  •Avoid scanning entire files (full table scan).
  •Improve performance for searches, joins, and sorting.
•🔑 Types of Indexes:
  •Primary Index: Based on the primary key, often sorted and unique.
  •Secondary Index: Built on non-primary fields, can be non-unique.
•📊 Index Structures commonly used:
  •B-Trees / B+ Trees
  •Hash Indexes
  •Bitmap Indexes (for low-cardinality columns)
• “Indexes in databases are like the index in a book: they help you find the exact page where information is located without flipping through every page. They drastically improve search speed by providing shortcuts.
• Primary indexes are created on the key fields that uniquely identify records and usually correspond to how data is sorted on disk. Secondary indexes let you quickly search based on other attributes, even if the data isn’t stored in that order.
• There are different data structures used for indexes, with B-Trees being the most popular because they keep data sorted and balanced for efficient search, insert, and delete operations.”
Primary Index
• Definition:
An index built on the primary key of the table, which uniquely identifies each record.
• 📄 File Organization:
The data file is usually sorted on this key.
• Types:
  • Sparse Index: One index entry per block rather than per record (possible only because the data file is sorted on the key).
  • Dense Index: One index entry for every record (works on a sorted file too, and is the only option when the data is unsorted).
• 🔍 Advantages:
  • Fast access to records by primary key.
  • Enables efficient range queries due to sorted data.
• 🚧 Constraints:
  • Only one primary index per file, because a file can be physically sorted on only one key.

• “The primary index is built on the primary key, which means the file itself is sorted on this key. This sorting allows for fast direct access and efficient range queries.
• There are two main types: sparse and dense. Sparse indexes have entries for only some records (typically the first record in each block), so they use less space but require scanning within a block. Dense indexes have entries for every record, giving very fast lookup but using more space.
• Because the data must be sorted on the primary key, there can only be one primary index per file.”
Secondary Index

•🔎 Definition:
An index built on a non-primary key attribute (non-sorting key).
• File Organization:
The data file is not sorted on the secondary index key.
•🧩 Characteristics:
•Always dense: contains an index entry for every record.
•Supports multiple secondary indexes per table.
•🔄 Use Cases:
•Querying based on fields other than the primary key (e.g., searching by City or
Department).
•⚠️ Performance Considerations:
•Can cause additional I/O cost: the data is not ordered on this field, so matching records may be scattered across many blocks.
•Requires extra storage, and every insert or delete must also update the index.
• “Secondary indexes are created on fields other than the primary key. Unlike primary indexes, the data file isn’t sorted on these fields, so the index must contain entries for every record. This is why they are always dense.
• You can have many secondary indexes on a table, allowing flexible query capabilities on different attributes. The trade-off is that these indexes can increase storage requirements and slow down insertions and deletions because the index must be updated.
• Secondary indexes are essential when you want to search or filter on non-primary key fields efficiently.”
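
As a rough illustration, a secondary index can be modelled as a map from attribute value to the positions of all matching records. The sample employee records, the `city` field, and the `find_by_city` helper below are hypothetical.

```python
from collections import defaultdict

# Data file sorted on the primary key (id), not on city. Sample data is made up.
records = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ben", "city": "Delhi"},
    {"id": 3, "name": "Chen", "city": "Pune"},
    {"id": 4, "name": "Dara", "city": "Kochi"},
]

# Dense secondary index on city: every record contributes one entry,
# and a non-unique value maps to several record positions.
city_index = defaultdict(list)
for position, record in enumerate(records):
    city_index[record["city"]].append(position)


def find_by_city(city):
    # Follow the index entries instead of scanning the whole file.
    return [records[pos] for pos in city_index.get(city, [])]


pune_ids = [r["id"] for r in find_by_city("Pune")]
entry_count = sum(len(positions) for positions in city_index.values())
```

The index holds as many entries as the file has records, which is the storage and maintenance cost the notes above refer to.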
Dense vs. Sparse Index

Feature          | Dense Index                     | Sparse Index
Index Entry for  | Every record                    | Some records (usually one per block)
Space Overhead   | High                            | Low
Lookup Speed     | Faster (direct access)          | Slightly slower (needs block scan)
Suitable For     | Sorted or unsorted data         | Sorted data only
Maintenance Cost | Higher (more entries to update) | Lower
“Dense and sparse indexes are two strategies for indexing records.
• Dense indexes have an entry for every record. This means lookups are very fast since you can find exactly where the record is. But they take more space and are more expensive to maintain because every insert or delete affects the index.
• Sparse indexes only have entries for some records, usually the first record in each block. This reduces space and maintenance overhead but means you must scan within the block after locating it, making lookups slightly slower.
• Sparse indexes only work well if the data file is sorted on the key.”
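
The two strategies can be compared side by side on a small sorted file. This is a sketch under stated assumptions: the three-block layout, the keys, and the helper names `sparse_lookup` and `dense_lookup` are invented for the example.

```python
import bisect

# A sorted data file split into blocks of three (key, value) records each.
blocks = [
    [(5, "r5"), (12, "r12"), (18, "r18")],
    [(21, "r21"), (27, "r27"), (33, "r33")],
    [(40, "r40"), (48, "r48"), (55, "r55")],
]

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [block[0][0] for block in blocks]

# Dense index: one entry per record, pointing at (block number, slot number).
dense_index = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}


def sparse_lookup(key):
    # Find the last block whose first key is <= key, then scan inside that block.
    b = bisect.bisect_right(sparse_index, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None


def dense_lookup(key):
    # The index entry points straight at the record; no block scan is needed.
    location = dense_index.get(key)
    if location is None:
        return None
    b, s = location
    return blocks[b][s][1]


via_sparse = sparse_lookup(27)
via_dense = dense_lookup(27)
sizes = (len(sparse_index), len(dense_index))
```

Both lookups return the same record, but the sparse index stores three entries where the dense index stores nine. The sparse lookup depends on the blocks being in key order, which is why it would not work on an unsorted file.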
Various Index Structures
Index Structures: Hashing, Dynamic Hashing, Multilevel, B & B+ Trees
1. Hash-Based Indexes

•Use hash functions to map keys directly to buckets.
•Fast for equality searches (e.g., WHERE key = value ).

•Poor support for range queries.

•Fixed-size buckets can cause overflow.
2. Dynamic Hashing Techniques

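• Extendible Hashing: An overflowing bucket is split; the directory doubles only when that bucket already uses all of the directory's bits (local depth equals global depth).
• Linear Hashing: Buckets are split one at a time in a fixed order, so the file grows gradually without a directory.
• Both avoid costly full rehashing.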
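
A compact sketch of extendible hashing, to make the split and directory-doubling steps concrete. It assumes non-negative integer keys and uses the low-order bits of the key as the hash; the class names, `bucket_capacity`, and the sample keys are all illustrative choices, not taken from any specific system.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHash:
    """Toy extendible hash table for non-negative integer keys."""

    def __init__(self, bucket_capacity=2):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # Assumed hash: the low-order global_depth bits of the key itself.
        return key & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.bucket_capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # The directory doubles only when the bucket already uses every bit.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Directory slots with the new bit set now point at the sibling bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Only the records of the split bucket are rehashed.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v


table = ExtendibleHash(bucket_capacity=2)
for key in (0, 4, 8, 1, 5):
    table.put(key, f"rec{key}")
depth_after_five = table.global_depth

table.put(9, "rec9")  # splits a bucket whose local depth is below the global depth
depth_after_six = table.global_depth
```

Inserting 0, 4 and 8 (which share their low bits) forces the directory to double twice, while the later insert of 9 splits a bucket without touching the directory size. At no point is the whole table rehashed.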
3. Multilevel Indexes

• Indexes built on top of indexes to reduce search time.
• Example: Two-level index where first level points to blocks of second-level index.
• Improves lookup speed by reducing disk I/O.
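
The I/O saving can be estimated with a little arithmetic. The figures below (one million index entries, 100 entries per index block) are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def multilevel_block_reads(num_entries, entries_per_block):
    """Index levels needed so that the top level fits in one block.

    A lookup reads one index block per level.
    """
    levels = 1
    blocks = math.ceil(num_entries / entries_per_block)
    while blocks > 1:
        # Build another level that holds one entry per block of the level below.
        blocks = math.ceil(blocks / entries_per_block)
        levels += 1
    return levels


def single_level_block_reads(num_entries, entries_per_block):
    """Block reads for a binary search over a single-level index."""
    blocks = math.ceil(num_entries / entries_per_block)
    return max(1, math.ceil(math.log2(blocks)))


multi = multilevel_block_reads(1_000_000, 100)
single = single_level_block_reads(1_000_000, 100)
```

Under these assumptions the single-level index spans 10,000 blocks and a binary search reads about 14 of them, while a three-level index answers the same lookup with 3 index block reads.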
4. B-Trees and B+ Trees

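• Balanced tree structures ideal for databases and file systems.
• All leaves at the same depth; keys are kept in sorted order.
• B-Tree stores keys together with their records (or record pointers) in both internal and leaf nodes.
• B+ Tree keeps only keys in internal nodes to guide the search; the actual records (or record pointers) live only in leaf nodes, which are typically linked together for range scans.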
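
A small B+ tree search can be sketched on a hand-built tree. Insertion and node splitting are left out to keep the example short; the `Leaf` and `Internal` classes, the separator convention (keys equal to a separator go to the right child), and the sample keys are assumptions for the illustration.

```python
import bisect


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys stored in this leaf
        self.values = values  # records (or record pointers), same order as keys
        self.next = None      # link to the next leaf, used by range scans


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no records
        self.children = children  # always len(keys) + 1 children


def find_leaf(node, key):
    # Descend from the root; separators only steer the search.
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect.bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_scan(root, low, high):
    # Locate the first leaf, then follow leaf links instead of re-descending.
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append((k, v))
        leaf = leaf.next
    return out


# Hand-built two-level tree: one internal root over three linked leaves.
leaf1 = Leaf([5, 10], ["r5", "r10"])
leaf2 = Leaf([20, 30], ["r20", "r30"])
leaf3 = Leaf([40, 50], ["r40", "r50"])
leaf1.next, leaf2.next = leaf2, leaf3
root = Internal([20, 40], [leaf1, leaf2, leaf3])

hit = search(root, 30)
miss = search(root, 25)
scan = range_scan(root, 10, 40)
```

Every lookup ends at a leaf, and the range scan walks the leaf links after a single descent, which is the main practical advantage of keeping records only in the leaves.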
THANK YOU
