File Structure and Indexing
Storage of Databases:
Databases typically store large amounts of data on hard disks, which remain the primary choice for large databases. However, in the future data may reside at different levels of the memory hierarchy.
• Hence, it is important to study and understand the properties and characteristics of magnetic disks, and the organization of data files and records on disk, for effective physical design of a database with acceptable performance.
• Usually, the DBMS has several options available for organizing the data. The process of physical database design involves choosing, from those options, the one that best suits the given application requirements.
• There are several primary file organizations, such as heap files (unordered files), sorted files (sequential files), hashed files, and B-trees. We will discuss some of these file organizations and the access techniques for each.
Storage Hierarchy:
The following diagram shows the various storage devices available and their hierarchy:
• Cache – fastest and most costly form of storage; volatile; managed by the system hardware.
• Main memory - fast access (tens to hundreds of nanoseconds; 1 nanosecond = 10^-9 seconds) and volatile. Generally too small (or too expensive) to store the entire database.
• Flash memory
• Data survives power failure.
• Data can be written at a location only once, but the location can be erased and written to again.
• Can support only a limited number (10K – 1M) of write/erase cycles.
• Erasing has to be done to an entire bank of memory.
• Reads are roughly as fast as main memory.
• But writes are slow (a few microseconds), and erase is slower.
• Widely used in embedded devices such as digital cameras, phones, and USB keys.
• Magnetic disk - Data is stored on a spinning disk, read/written magnetically, and survives power failures and system crashes. It is the primary medium for the long-term storage of data and typically stores the entire database. Data must be moved from disk to main memory for access, and written back for storage.
• Optical storage
• non-volatile, data is read optically from a spinning disk using a laser
• CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
• Blu-ray disks: 27 GB to 54 GB
• Write-once, read-many (WORM) optical disks used for archival storage (CD-R, DVD-R, DVD+R)
• Multiple write versions also available (CD-RW, DVD-RW, DVD+RW, and DVD-RAM)
• Reads and writes are slower than with magnetic disk
• Jukebox systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks, are available for storing large volumes of data.
File Organization
File organization determines how records are represented in a file. Various file organizations are used to store a relation on disk.
• A file consists of records. These records are mapped onto disk blocks. Block size is fixed but record
size can vary.
• Each file has a file header containing a variety of information about the file.
• In a relational database, tuples of distinct relations are generally of different sizes. The following are approaches for mapping the database to files:
• Fixed-length record files, storing one relation per file.
• Variable-length record files, storing one relation per file.
In an RDBMS, a particular relation (a single record type) may need variable-length records for several reasons:
• One or more of the fields are of varying size (variable-length fields). For example, the NAME field of EMPLOYEE can be a varchar(50) field.
• One or more of the fields may have multiple values for an individual record.
• One or more of the fields are optional.
• The file contains records of different record types and hence of varying size (a mixed file).
Files of fixed-length records are easier to implement than files of variable-length records, and many of the techniques used for the former can be applied to the variable-length case. Thus, we begin by considering a file of fixed-length records.
Fixed-Length Records
As an example, consider a file of account records for a bank database, and assume a record length of 40 bytes. A simple approach is to store the records one after another, each occupying 40 bytes, as shown in figure 1.
Fig. 1
However, there are two problems with this simple approach:
1. It is difficult to delete a record. Either the space of the deleted record must be reused when a new record is inserted, or we must have a way of marking deleted records so that they can be ignored.
2. Unless the block size happens to be a multiple of 40 (which is unlikely), some records will cross
block boundaries. That is, part of the record will be stored in one block and part in another. It would
thus require two block accesses to read or write such a record.
One option is, after deleting record 2, to move all the records that follow it up one by one to occupy the freed space, as shown in figure 2. Such an approach requires moving a large number of records.
Fig. 2
Another approach is to move the final record (record 8) of the file into the space occupied by the deleted record, as shown in figure 3. But doing so still requires additional block accesses, so it is undesirable. Since insertions tend to be more frequent than deletions, it is acceptable to leave open the space occupied by the deleted record, and to wait for a subsequent insertion before reusing the space.
Fig. 3
With this approach a simple marker on a deleted record is not sufficient, since it is hard to find the available space when an insertion is being done. So we use the file header, which stores the address of the first record whose contents have been deleted. That first deleted record stores the address of the second available record, and so on. The deleted records thus form a linked list, which is often referred to as a free list. Figure 4 shows the free list after records 1, 4, and 6 have been deleted.
Fig. 4
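The free-list scheme above can be sketched in a few lines. This is an illustrative toy, not actual DBMS code: an in-memory slot list stands in for the fixed-length disk slots, and the `free_head` field stands in for the address stored in the file header.

```python
# Toy fixed-length record file that reuses deleted slots via a free list.
RECORD_SIZE = 40  # bytes per record, as in the example above

class FixedLengthFile:
    def __init__(self):
        self.slots = []          # each slot holds a record or a free-list link
        self.free_head = None    # header field: first free slot, or None

    def insert(self, record):
        if self.free_head is not None:          # reuse a deleted slot first
            slot = self.free_head
            self.free_head = self.slots[slot]["next_free"]
            self.slots[slot] = {"record": record}
        else:                                    # otherwise append at the end
            slot = len(self.slots)
            self.slots.append({"record": record})
        return slot

    def delete(self, slot):
        # Link the freed slot at the head of the free list.
        self.slots[slot] = {"next_free": self.free_head}
        self.free_head = slot

f = FixedLengthFile()
for name in ["A-101", "A-102", "A-103", "A-104"]:
    f.insert(name)
f.delete(1)
f.delete(3)                 # free list is now 3 -> 1
slot = f.insert("A-201")    # reuses slot 3, the head of the free list
```

Note how a subsequent insertion consumes the most recently freed slot first, exactly as the linked free list implies.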
Variable-Length Records
For purposes of illustration we consider a different representation of the account information stored
in the file, in which we use one variable-length record for each branch name and for all the account
information for that branch.
Various ways to implement variable-length records are discussed below.
• Byte-string representation: each record is simply stored as a contiguous string of bytes. Some disadvantages:
• It is not easy to reuse space occupied formerly by a deleted record. Although techniques exist to
manage insertion and deletion, they lead to a large number of small fragments of disk storage that
are wasted.
• There is no space, in general, for records to grow longer. If a variable-length record becomes longer,
it must be moved—movement is costly if pointers to the record are stored elsewhere in the database
(e.g., in indices, or in other records), since the pointers must be located and updated.
Thus, the basic byte-string representation is not usually used for implementing variable-length
records. However, a modified form of the byte-string representation, called the slotted-page
structure (as shown in figure below), is commonly used for organizing records within a single block.
• There is a header at the beginning of each block, containing the following information:
➢ The number of record entries in the header
➢ The end of free space in the block
➢ An array whose entries contain the location and size of each record in the block.
The actual records are allocated contiguously in the block, starting from the end of the block. The
free space in the block is contiguous, between the final entry in the header array, and the first record.
The slotted-page structure requires that there be no pointers that point directly to records. Instead,
pointers must point to the entry in the header that contains the actual location of the record. This
level of indirection allows records to be moved to prevent fragmentation of space inside a block,
while supporting indirect pointers to the record.
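The slotted-page idea can be sketched as follows. This is a simplified illustration, not a real DBMS page layout: the 64-byte block size, the field names, and the sample records are assumptions made for the example.

```python
# Toy slotted page: header array grows from the front, records are allocated
# from the end of the block, and outside pointers refer to slot numbers.
BLOCK_SIZE = 64

class SlottedPage:
    def __init__(self):
        self.data = bytearray(BLOCK_SIZE)
        self.slots = []              # header array: (offset, size) per record
        self.free_end = BLOCK_SIZE   # end of free space (records grow downward)

    def insert(self, record: bytes):
        offset = self.free_end - len(record)
        self.data[offset:self.free_end] = record
        self.free_end = offset
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1   # the slot number is the stable "pointer"

    def get(self, slot):
        offset, size = self.slots[slot]
        return bytes(self.data[offset:offset + size])

page = SlottedPage()
s0 = page.insert(b"Perryridge|A-102|400")
s1 = page.insert(b"Round Hill|A-305|350")
```

Because callers hold only slot numbers, the page is free to slide records around internally (for example, to compact free space) by updating offsets in the header array.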
• Reserved-space method: if there is a maximum record length that is never exceeded, we can use fixed-length records of that length, leaving unused fields empty. Those branches with fewer than three accounts (for example, Round Hill) have records with null fields. The reserved-space method is useful when most records have a length close to the maximum; otherwise, a significant amount of space may be wasted.
• List representation: We can represent variable-length records by lists of fixed length records,
chained together by pointers.
As shown in the above figure, pointers are used to chain together all records pertaining to the same branch, whereas for fixed-length records we used pointers to chain together only the deleted records. A disadvantage of this structure is that wasted space appears in every record except the first in a chain: only the first record needs to hold the branch-name value, but subsequent records carry the field anyway. This wasted space is significant, since we expect, in practice, that each branch has a large number of accounts.
To deal with this problem, we allow two kinds of blocks in our file:
• Anchor block, which contains the first record of a chain
• Overflow block, which contains records other than those that are the first record of a chain
Thus, all records within a block have the same length, even though not all records in the file have the
same length. Following figure shows this file structure.
Organization of Records in Files
An instance of a relation is a set of records. Given a set of records, the next question is how to
organize them in a file. Several of the possible ways of organizing records in files are:
1. Heap file organization: Any record can be placed anywhere in the file where there is space for the
record. There is no ordering of records. Typically, there is a single file for each relation.
2. Sequential file organization: Records are stored in sequential order, according to the value of a
“search key” of each record.
3. Hashing file organization: A hash function is computed on some attribute of each record. The
result of the hash function specifies in which block of the file the record should be placed.
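The hashing idea can be sketched with a toy hash function. The `branch_name` attribute and the character-sum hash are assumptions for illustration; real systems use much stronger hash functions, but the placement principle is the same.

```python
# Hashing file organization sketch: a hash function computed on one attribute
# of the record determines which block of the file the record is placed in.
NUM_BLOCKS = 8

def block_for(branch_name: str) -> int:
    # Illustrative hash: sum of character codes modulo the number of blocks.
    return sum(ord(c) for c in branch_name) % NUM_BLOCKS

blocks = [[] for _ in range(NUM_BLOCKS)]
for branch in ["Perryridge", "Downtown", "Mianus", "Round Hill"]:
    blocks[block_for(branch)].append(branch)
```

To retrieve all records for a branch, we recompute the hash and read only that one block, rather than scanning the file.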
An index for a file in a DBMS works in much the same way as the index of a textbook.
• The index at the back of a book lists topics (specified by a word or a phrase) along with the page numbers where each appears.
• We can search for the topic in the index page, find the pages where it occurs, and then read the
pages to find the information we are looking for.
• The words in the index are in sorted order, making it easy to find the word we are looking for.
Moreover, the index is much smaller than the book, further reducing the effort.
Database system indices play the same role as book indices. For example, to retrieve an account
record given the account number, the database system would look up an index to find on which disk
block the corresponding record resides, and then fetch the disk block, to get the account record. We
will discuss several indexing techniques and their advantages and disadvantages.
Indexing in Databases
• Indexing is a way to optimize performance of a database by minimizing the number of disk accesses
required when a query is processed.
• An index or database index is a data structure which is used to quickly locate and access the data
in a database table.
An attribute or set of attributes used to look up records in a file is called a search key. Indexes are created on some database columns, and each index entry has two pieces of information:
• The first column is the search key, which contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly. (Note that the data itself may or may not be stored in sorted order.)
• The second column is the data reference, which contains a set of pointers holding the address of the disk block where that particular key value can be found.
Suppose a table has 100 rows of data, each of size 20 bytes, and there is no index file. To read record number 100, the DBMS reads each row in turn, and only after reading 99 × 20 = 1980 bytes does it reach record number 100.
If we have an index, the search for record number 100 starts by reading the index file, not the table. The index, containing only two columns, may be just 4 bytes wide in each of its rows. After reading only 99 × 4 = 396 bytes of data from the index, the management system finds the entry for record number 100, reads the address of the disk block where record number 100 is stored, and points directly at the record in the physical storage device. The result is much quicker access to the record.
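The arithmetic above can be checked directly:

```python
# Bytes scanned to reach record number 100: full-table scan vs. index scan,
# using the row and index-entry sizes from the example above.
ROWS, ROW_SIZE, INDEX_ENTRY_SIZE = 100, 20, 4

bytes_scanned_without_index = (ROWS - 1) * ROW_SIZE       # 99 rows of 20 bytes
bytes_scanned_with_index = (ROWS - 1) * INDEX_ENTRY_SIZE  # 99 entries of 4 bytes

print(bytes_scanned_without_index)  # 1980
print(bytes_scanned_with_index)     # 396
```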
• The only minor disadvantage of using an index is that it takes up a little more space than the main table alone. Additionally, the index needs to be updated on insertion or deletion of records in the main table.
The commonly used types of single-level ordered indices are:
• Primary Index
• Clustering Index
• Secondary Index
Primary Index
• A primary index requires the rows in the data blocks to be ordered on the index key, or search key.
• So, apart from the index entries themselves being sorted in the index blocks, a primary index also enforces an ordering of rows in the data blocks.
The figure shows a sequential file of account records of a bank. Here data records are stored in search-key order, with branch-name used as the search key, so both the index file and the data file are ordered on the same field, branch-name.
If the ordering field in data file is not a key field, i.e. numerous records in the file can have the same
value for the ordering field, we can use another type of index, called a clustering index. Note that a
file can have at most one physical ordering field, so it can have at most one primary index or one
clustering index, but not both.
• Dense index: an index record appears for every search-key value in the file.
• Sparse index: an index record appears for only some of the search-key values. Each index entry points to the first data record with that search-key value. To locate a record with a given search key, we find the index entry with the largest search-key value less than or equal to the one we seek, start at the record pointed to by that index entry, and follow the pointers in the file until we find the desired record.
Suppose that we are looking up records for the Downtown branch, and there is no index record for that search key. Since the last entry (in alphabetic order) before “Downtown” is “Brighton,” we follow that pointer. We process this record, then follow the pointer in that record to locate the next record in search-key (branch-name) order, and continue processing records until we encounter a record for a branch other than Downtown in the data file.
As we have seen, it is generally faster to locate a record if we have a dense index rather than a sparse
index. However, sparse indices have advantages over dense indices in that they require less space
and they impose less maintenance overhead for insertions and deletions.
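The sparse-index lookup described above can be sketched as follows. The keys and block contents are illustrative sample data; a real sparse index would hold one entry per disk block, not an in-memory list.

```python
# Sparse index: one (search_key, block_number) entry for the first record of
# each block. Lookup: binary-search for the largest entry <= key, then scan.
import bisect

sparse_index = [("Brighton", 0), ("Mianus", 1), ("Redwood", 2)]

blocks = [
    ["Brighton", "Downtown", "Downtown"],    # block 0
    ["Mianus", "Perryridge", "Perryridge"],  # block 1
    ["Redwood", "Round Hill"],               # block 2
]

def lookup(key):
    keys = [k for k, _ in sparse_index]
    i = max(bisect.bisect_right(keys, key) - 1, 0)  # largest entry <= key
    results = []
    # Scan sequentially from that block until we pass the key.
    for b in range(sparse_index[i][1], len(blocks)):
        for record in blocks[b]:
            if record == key:
                results.append(record)
            elif record > key:
                return results
    return results
```

For “Downtown”, the search lands on the “Brighton” entry (block 0), then scans forward until the first record beyond “Downtown”, exactly as described above.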
Insertion and deletion of records in an ordered list poses some problems. With a primary index, the problem is compounded because, if we attempt to insert a record in its correct position in the data file, we not only have to move records to make space for the new record but also have to change some index entries, since moving records will change the anchor records of some blocks.
Using an unordered overflow file can reduce this problem. Another possibility is to use a linked list
of overflow records for each block in the data file.
• Secondary Indices
A secondary index provides a secondary means of accessing a file for which some primary access
already exists. The secondary index may be on a field which is a candidate key and has a unique value
in every record, or a non-key with duplicate values.
Secondary indices must be dense, with an index entry for every search-key value, and a pointer to
every record in the file. If a secondary index stores only some of the search-key values, records with
intermediate search-key values may be anywhere in the file and, in general, we cannot find them
without searching the entire file.
The index is an ordered file with two fields. The first field is an indexing field (some non-ordering field of the data file). The second field is either a block pointer or a record pointer. There can be many
secondary indexes (and hence, indexing fields) for the same file. A secondary index usually needs
more storage space and longer search time than does a primary index, because of its larger number
of entries.
The above diagram shows how the index entries in the index blocks (left side) contain pointers (row locators, in database terminology) to the corresponding rows in the data blocks (right side).
The data blocks do not have rows sorted on the index key. Each secondary index contains the pointer
to the block where the record exists. Once the appropriate block is transferred to main memory, a
search for the desired record within the block can be carried out.
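A dense secondary index on a non-ordering field can be sketched like this. The account numbers, branch names, and balances are illustrative sample data; the record positions stand in for the row locators in the diagram.

```python
# Data file ordered on account number; the secondary index is built on the
# non-ordering branch-name field, with one entry per record (dense index).
records = [
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-305", "Downtown", 350),
]

# Secondary index: branch name -> list of record positions (row locators).
secondary = {}
for pos, (_, branch, _) in enumerate(records):
    secondary.setdefault(branch, []).append(pos)

# Fetch every Downtown record by following the pointers, no full scan needed.
downtown = [records[p] for p in secondary["Downtown"]]
```

Note that the matching rows (positions 0 and 3) are scattered through the data file, which is exactly why the index must be dense.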
• Multilevel Indices
Even if we use a sparse index, the index file itself may become very large for a big data file, so access time may be long even though we are using binary search. Note that if overflow blocks have been used, binary search will not be possible; in that case a sequential search is typically used, which takes even longer. Thus, the process of searching a large index may be costly. To deal with this problem, we treat the index just as we would treat any other sequential file, and construct a sparse index on the primary index, as shown below.
we treat the index just as we would treat any other sequential file, and construct a sparse index on
the primary index, as shown below.
To locate a record, we first use binary search on the outer index to find the record for the largest
search-key value less than or equal to the one that we desire. The pointer points to
a block of the inner index. We scan this block until we find the record that has the largest search-key
value less than or equal to the one that we desire. The pointer in this record points to the block of
the file that contains the record for which we are looking. Using the two levels of
indexing, we have read only one index block, rather than the seven we read with binary search, if we
assume that the outer index is already in main memory.
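The two-level lookup can be sketched as follows. The keys are illustrative, and the small in-memory lists stand in for the outer index (assumed to fit in main memory) and the blocks of the inner index.

```python
# Two-level index: binary search the small outer index, then scan one block
# of the inner index to find the data-file block to fetch.
import bisect

outer = ["Brighton", "Perryridge"]          # first key of each inner block
inner_blocks = [
    [("Brighton", 0), ("Downtown", 1), ("Mianus", 2)],       # inner block 0
    [("Perryridge", 3), ("Redwood", 4), ("Round Hill", 5)],  # inner block 1
]

def lookup(key):
    # Largest outer entry <= key selects a single inner-index block.
    i = max(bisect.bisect_right(outer, key) - 1, 0)
    # Scan that block for the largest entry <= key.
    ptr = None
    for k, block in inner_blocks[i]:
        if k <= key:
            ptr = block
        else:
            break
    return ptr   # number of the data-file block to fetch
```

Only one inner-index block is read per lookup, regardless of how many inner blocks exist.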
If our file is extremely large, even the outer index may grow too large to fit in main memory. In such
a case, we can create yet another level of index. Indeed, we can repeat this process as many times
as necessary. Indices with two or more levels are called multilevel indices. Searching for records with
a multilevel index requires significantly fewer I/O operations than does searching for records by
binary search. Each level of index could correspond to a unit of physical storage. Thus, we may have
indices at the track, cylinder, and disk levels. Multilevel indices are closely related to tree structures,
such as the binary trees used for in-memory indexing.
B-tree
A multilevel index is a form of search tree; however, insertion and deletion of index entries is a severe problem because every level of the index is an ordered file. It can be very expensive to keep the index in sorted order, so we need a way to make insertions and deletions in indexes that does not require massive reorganization. B-trees address the problem of speeding up indexing schemes that are too large to fit into memory.
• A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches,
sequential access, insertions, and deletions in logarithmic time.
• Each internal node of a B-tree contains a number of keys. The keys act as separation values
which divide its subtrees. For example, if an internal node has 3 child nodes (or subtrees) then
it must have 2 keys: a1 and a2.
• All values in the leftmost subtree will be less than a1, all values in the middle subtree will be
between a1 & a2, and all values in the rightmost subtree will be greater than a2.
• Each B-tree node contains key values and respective data (block) pointers to the actual data
records. Additionally, there are node pointers for the left and right intervals around a key
value.
Insertion
In a B-Tree, a new element is always added at a leaf node; that is, the new key value is always attached to a leaf node.
Step 1 - Check whether tree is Empty.
Step 2 - If tree is Empty, then create a new node with new key value and insert it into the tree as a
root node.
Step 3 - If tree is Not Empty, then find the suitable leaf node to which the new key value is added
using Binary Search Tree logic.
Step 4 - If that leaf node has empty position, add the new key value to that leaf node in ascending
order of key value within the node.
Step 5 - If that leaf node is already full, split that leaf node by sending its middle value to its parent node. Repeat the same process until the promoted value fits into a node.
Step 6 - If the splitting is performed at root node then the middle value becomes new root node for
the tree and the height of the tree is increased by one.
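The steps above can be sketched for a B-Tree of order 3 (at most 2 keys per node). This is a simplified in-memory illustration, not a disk-based implementation; it omits data pointers and deletion.

```python
# B-Tree insertion sketch: an overfull node splits and sends its middle key
# up to the parent (Steps 5-6 above); a root split grows the tree by one level.
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for leaf nodes

def insert(root, key):
    # Returns the (possibly new) root after inserting key.
    split = _insert(root, key)
    if split is None:
        return root
    mid, left, right = split             # root itself split: tree grows taller
    return Node([mid], [left, right])

def _insert(node, key):
    if not node.children:                # Steps 3-4: reached a leaf, add key
        node.keys.append(key)
        node.keys.sort()
    else:
        i = sum(k < key for k in node.keys)   # choose subtree to descend into
        split = _insert(node.children[i], key)
        if split is not None:            # Step 5: child split, absorb its middle
            mid, left, right = split
            node.keys.insert(i, mid)
            node.children[i:i + 1] = [left, right]
    if len(node.keys) <= 2:              # node fits: nothing to propagate
        return None
    # Overfull (3 keys): split around the middle key, pass it to the parent.
    return (node.keys[1],
            Node(node.keys[:1], node.children[:2]),
            Node(node.keys[2:], node.children[2:]))

root = Node()
for k in range(1, 11):                   # insert 1 to 10, as in the example
    root = insert(root, k)
```

After inserting 1 to 10, the root holds [4], with subtrees keyed on [2] and [6, 8], matching the construction in the example.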
Example: Let us understand the insertion operation by constructing a B-Tree of order 3, inserting the numbers from 1 to 10.
While adding 4 we find an empty position in the leaf node, so we add it to that node.
While adding 8 we find an empty position in a leaf node, so we add it there.
B+Tree
• In order to implement dynamic multilevel indexing, B-trees and B+ trees are generally employed.
• The drawback of the B-tree is that it stores the data pointer (a pointer to the disk file block containing the key value) together with that key value in each node of the tree. This reduces the number of entries that can be packed into a node, which increases the number of levels in the B-tree and hence the search time for a record.
• B+ tree eliminates the above drawback by storing data pointers only at the leaf nodes of the
tree. Thus, the structure of leaf nodes of a B+ tree is quite different from the structure of
internal nodes of the B+ tree.
• Internal nodes contain only search keys (no data pointers).
• Leaf nodes contain the pointers to data records.
• Data records are in sorted order by the search key.
• All leaves are at the same depth.
Example of B+ tree of order 3
B+Tree Index:
Modern databases mainly use the B+Tree index, which performs range queries efficiently.
• Only the leaf nodes store information (the location of the rows in the associated table).
• The other nodes are just there to route the search to the right node.
With this B+Tree, if we are looking for values between 40 and 100:
• We just have to look for 40 (or the closest value after 40 if 40 doesn’t exist) like we did with the
previous tree.
• Then gather the successors of 40 using the direct links to the successors until we reach 100.
Let’s say you found M successors and the tree has N nodes. The search for a specific node costs log(N), as with the previous tree. But once you have this node, you get the M successors in M operations using the links between successors. This search costs only (M + log(N)) operations versus N operations with the previous B-tree. Moreover, you don’t need to read the full tree (just M + log(N) nodes), which means less disk usage. If M is low (say 200 rows) and N is large (1,000,000 rows), it makes a big difference.
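The leaf-chain range scan can be sketched as follows. The leaf contents and integer keys are illustrative; the linear hunt for the starting leaf stands in for the log(N) tree descent a real B+Tree would perform.

```python
# B+Tree range query sketch: find the leaf containing the lower bound, then
# follow the next-leaf links, collecting keys until the upper bound is passed.
leaves = [
    {"keys": [10, 20, 30], "next": 1},
    {"keys": [40, 50, 60], "next": 2},
    {"keys": [70, 80, 90], "next": 3},
    {"keys": [100, 110], "next": None},
]

def range_query(lo, hi):
    # Locate the first leaf that could contain lo (stand-in for tree descent).
    i = 0
    while leaves[i]["next"] is not None and leaves[i]["keys"][-1] < lo:
        i = leaves[i]["next"]
    # Follow the leaf chain, collecting keys until we pass hi.
    result = []
    while i is not None:
        for k in leaves[i]["keys"]:
            if lo <= k <= hi:
                result.append(k)
            elif k > hi:
                return result
        i = leaves[i]["next"]
    return result
```

The scan touches only the leaves holding the answer (plus the descent), never the whole tree.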
Disadvantage of B+ tree:
If we add or remove a row in a database (and therefore in the associated B+Tree index):
• We have to keep the order between nodes inside the B+Tree, otherwise we won’t be able to find
nodes.
• We have to keep the lowest possible number of levels in the B+Tree, otherwise the O(log(N)) time complexity will degrade towards O(N).
Add D: no empty place in the leaf node, so C is promoted to the parent node and D is added in the leaf.
Add E: no empty place in the leaf node, so D is promoted, but there is no empty place in the parent node either. Split the parent node and promote C to the next higher level.
Add G: no empty place in the leaf, so split it and promote F (as it is the middle value). But the higher-level parent also has no empty place, so split it too and promote E to the next higher level.
Add H: no empty place in the leaf, so split it and promote G (as it is the middle value) to the higher-level parent.
Add I: no empty place in the leaf, so split it and promote H (as it is the middle value) to the higher-level parent. But the higher-level parent also has no empty place, so split it and promote G to the next higher level. That parent also has no empty place (C, E, G), so split it and promote E to the next higher level.