Unit-1-Lecture-9
Introduction to file structure
File organizations and access methods
Section - 2
File organization
🞂 File organization determines how records are physically arranged in a file so
that they are available for processing.
🞂 The goal is to choose an efficient file organization for each base relation.
🞂 For example, if we want to retrieve employee records in alphabetical order
of name, sorting the file by employee name is a good file organization.
However, if we want to retrieve all employees whose salary is in a
certain range, a file ordered by employee name would not be a good file
organization.
Types of File Organization
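🞂 There are three ways of organizing a file:
• Sequential access file organization
• Random or direct access file organization
• Indexed sequential access file organization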
Sequential access file organization
• Storing records one after another in contiguous blocks within a file on tape
or disk is called sequential access file organization.
• In sequential access file organization, all records are stored in a sequential
order. The records are arranged in the ascending or descending order of a
key field.
• A search of a sequential file starts from the beginning of the file, and new
records can be added only at the end of the file.
• In a sequential file, it is not possible to add a record in the middle of the file
without rewriting the file.
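The behaviour described above can be sketched in Python. This is a minimal in-memory model, not a real tape or disk file; the class name `SequentialFile` and its methods are hypothetical and chosen only for this illustration.

```python
class SequentialFile:
    """Toy model of a sequential file: records are held in file order."""

    def __init__(self):
        self.records = []  # list of (key, data) tuples, in file order

    def append(self, key, data):
        # New records can only be added at the end of the file.
        self.records.append((key, data))

    def search(self, key):
        # Every search starts at the first record and scans forward.
        for position, (k, data) in enumerate(self.records):
            if k == key:
                return position, data
        return None

    def insert_in_middle(self, index, key, data):
        # Adding a record mid-file means rewriting the file:
        # a complete new copy of the record list is built.
        self.records = (
            self.records[:index] + [(key, data)] + self.records[index:]
        )


seq = SequentialFile()
seq.append(10, "Asha")
seq.append(30, "Meena")
seq.insert_in_middle(1, 20, "Ravi")  # rewrites the whole list
```

Note that `search` may have to read every record before it finds a match, which is why access time grows with file size.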
Sequential File Organization
▪ Suitable for applications that require sequential processing of the entire file
▪ The records in the file are ordered by a search-key
Sequential File Organization (Cont.)
▪ Deletion – use pointer chains
▪ Insertion – locate the position where the record is to be inserted
• if there is free space, insert there
• if there is no free space, insert the record in an overflow block
• in either case, the pointer chain must be updated
▪ Need to reorganize the file from time to time to restore sequential order
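A small Python sketch of the deletion and insertion rules above. It assumes a fixed-size main area plus an overflow area that grows as needed, with one slot per record; the class `ChainedSequentialFile` and all of its names are hypothetical, and freed overflow slots are not reused in this simplified version.

```python
class ChainedSequentialFile:
    """Records live in slots; a pointer chain links them in key order."""

    def __init__(self, main_capacity):
        self.main_capacity = main_capacity
        self.slots = [None] * main_capacity  # None marks a free slot
        self.next = [None] * main_capacity   # slot -> next slot in key order
        self.head = None                     # slot holding the smallest key

    def _free_main_slot(self):
        for i in range(self.main_capacity):
            if self.slots[i] is None:
                return i
        return None

    def insert(self, key):
        slot = self._free_main_slot()
        if slot is None:
            # No free space in the main area: use the overflow area.
            self.slots.append(None)
            self.next.append(None)
            slot = len(self.slots) - 1
        self.slots[slot] = key
        # Walk the chain to find where the key belongs, then relink.
        prev, cur = None, self.head
        while cur is not None and self.slots[cur] < key:
            prev, cur = cur, self.next[cur]
        self.next[slot] = cur
        if prev is None:
            self.head = slot
        else:
            self.next[prev] = slot
        return slot

    def delete(self, key):
        prev, cur = None, self.head
        while cur is not None and self.slots[cur] != key:
            prev, cur = cur, self.next[cur]
        if cur is None:
            return False
        # Unlink the record from the chain and free its slot.
        if prev is None:
            self.head = self.next[cur]
        else:
            self.next[prev] = self.next[cur]
        self.slots[cur] = None
        self.next[cur] = None
        return True

    def scan(self):
        # Sequential processing follows the pointer chain, not slot order.
        keys, cur = [], self.head
        while cur is not None:
            keys.append(self.slots[cur])
            cur = self.next[cur]
        return keys
```

After many insertions the physical slot order drifts away from key order, which is why the file needs periodic reorganization.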
Merits & Demerits of Sequential access file organization
Advantages :
🞂 It is simple to program and easy to design.
🞂 A sequential file makes the best use of storage space.
Disadvantages :
🞂 Accessing a record in a sequential file is time consuming, because the search
starts from the beginning of the file.
🞂 It has high data redundancy.
🞂 Random searching is not possible.
Random or Direct access file organization
• Direct access file organization is also known as random access or relative file
organization.
• In a direct access file, all records are stored on a direct access storage device
(DASD), such as a hard disk. The records are placed throughout the file in no
particular order.
• The records do not need to be in sequence because they are updated
directly and rewritten back in the same location.
• This file organization is useful for immediate access to large amounts of
information. It is used for accessing large databases.
• It is also called hashing, because the location of a record is typically
computed from its key by a hash function.
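As an illustration of direct access by hashing, the sketch below keeps fixed-length records in an in-memory byte stream and computes each record's position from its key. The record layout, the `key % num_slots` hash, and the linear probing used for collisions are all assumptions made for this example, as is the class name `DirectFile`. Keys are assumed to be non-negative integers.

```python
import io
import struct

# Assumed layout: 4-byte integer key followed by a 16-byte name field.
RECORD = struct.Struct("<i16s")
EMPTY_KEY = -1  # marks an unused slot


class DirectFile:
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.f = io.BytesIO()
        for _ in range(num_slots):
            self.f.write(RECORD.pack(EMPTY_KEY, b""))

    def _read_slot(self, slot):
        # Jump straight to the slot; no scan from the start of the file.
        self.f.seek(slot * RECORD.size)
        key, name = RECORD.unpack(self.f.read(RECORD.size))
        return key, name.rstrip(b"\x00").decode()

    def put(self, key, name):
        for probe in range(self.num_slots):
            slot = (key + probe) % self.num_slots
            existing, _ = self._read_slot(slot)
            if existing in (EMPTY_KEY, key):
                # Write (or rewrite) the record in place.
                self.f.seek(slot * RECORD.size)
                self.f.write(RECORD.pack(key, name.encode()))
                return slot
        raise RuntimeError("file is full")

    def get(self, key):
        for probe in range(self.num_slots):
            slot = (key + probe) % self.num_slots
            existing, name = self._read_slot(slot)
            if existing == key:
                return name
            if existing == EMPTY_KEY:
                return None
        return None
```

Because an update rewrites the record at the same computed location, the records never need to be kept in sorted order.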
Merits & Demerits of Direct access file organization
Advantages :
🞂 A direct access file supports online transaction processing (OLTP) systems, such
as an online railway reservation system.
🞂 In a direct access file, sorting of the records is not required.
🞂 It accesses the desired records immediately.
🞂 It updates several files quickly.
🞂 It has better control over record allocation.
🞂 It requires less storage space than a sequential file.
Disadvantages :
🞂 A direct access file does not provide a backup facility.
🞂 It is expensive.
Indexed sequential access file organization
• An indexed sequential access file combines both sequential file and direct
access file organization.
• In an indexed sequential access file, records are stored on a direct
access device such as a magnetic disk, ordered by a primary key.
• The file can have multiple keys, and these keys can be alphanumeric. The key
by which the records are ordered is called the primary key.
• The data can be accessed either sequentially or randomly using the index.
The index is stored in a file and read into memory when the file is opened.
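A minimal Python sketch of this idea, assuming the records are already sorted by primary key and grouped into fixed-size blocks, with an in-memory index holding the first key of each block. The class `IndexedSequentialFile` and its method names are hypothetical.

```python
from bisect import bisect_right


class IndexedSequentialFile:
    def __init__(self, sorted_records, block_size):
        # Split the key-ordered records into fixed-size blocks.
        self.blocks = [
            sorted_records[i:i + block_size]
            for i in range(0, len(sorted_records), block_size)
        ]
        # The index holds the first key of each block and stays in memory.
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Random access: use the index to pick one block, then scan it.
        pos = bisect_right(self.index, key) - 1
        if pos < 0:
            return None
        for k, data in self.blocks[pos]:
            if k == key:
                return data
        return None

    def scan(self):
        # Sequential access: read the blocks in order.
        for block in self.blocks:
            yield from block


staff = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e")]
isam = IndexedSequentialFile(staff, block_size=2)
```

Here `isam.get(7)` touches only the block that starts with key 5, while `isam.scan()` returns every record in key order.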
Example of Indexed sequential access file organization
Merits & Demerits of indexed sequential access file organization
Advantages :
🞂 In an indexed sequential access file, both sequential and random file access
are possible.
🞂 It accesses the records very fast if the index table is properly organized.
🞂 The records can be inserted in the middle of the file.
🞂 It provides quick access for sequential and direct processing.
🞂 It reduces the degree of the sequential search.
Disadvantages :
🞂 An indexed sequential access file requires unique keys and periodic
reorganization.
🞂 An indexed sequential access file takes extra time to search the index on
each data access or retrieval.
🞂 It requires more storage space.
🞂 It is expensive because it requires special software.
File Operations
🞂 Operations on database files can be broadly classified into two categories −
• Update Operations
• Retrieval Operations
🞂 Update operations change the data values by insertion, deletion, or update. Retrieval
operations, on the other hand, do not alter the data but retrieve them after optional
conditional filtering. In both types of operations, selection plays a significant role.
Besides creating and deleting a file, several other operations can be
performed on files.
File operations
• Open − A file can be opened in one of two modes, read mode or write mode. In
read mode, the operating system does not allow anyone to alter data. In other words,
the data is read-only. Files opened in read mode can be shared among several entities.
Write mode allows data modification. Files opened in write mode can be read but
cannot be shared.
• Locate − Every file has a file pointer, which gives the current position where data
is to be read or written. This pointer can be adjusted as needed. Using the find (seek)
operation, it can be moved forward or backward.
• Read − By default, when a file is opened in read mode, the file pointer points to the
beginning of the file. There are options by which the user can tell the operating system
where to place the file pointer at the time of opening a file. The data immediately
following the file pointer is read.
• Write − The user can choose to open a file in write mode, which enables them to edit its
contents. The edit can be a deletion, insertion, or modification. The file pointer can be
positioned at the time of opening or can be changed dynamically if the operating system
allows it.
• Close − This is the most important operation from the operating system’s point of
view. When a request to close a file is generated, the operating system
• removes all the locks (if in shared mode),
• saves the data (if altered) to the secondary storage media, and
• releases all the buffers and file handles associated with the file.
🞂 The organization of data inside a file plays a major role here. The process of moving
the file pointer to a desired record inside a file varies based on whether the records
are arranged sequentially or clustered.
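The open, locate, read, write, and close operations map directly onto Python's built-in file API. The sketch below uses a temporary file holding three 4-byte records; the record contents and variable names are made up for illustration.

```python
import os
import tempfile

# Create a scratch file to work on.
fd, path = tempfile.mkstemp()
os.close(fd)

# Open in write mode and store three fixed-length records.
with open(path, "wb") as f:
    f.write(b"AAAABBBBCCCC")

# Open in read mode: the file pointer starts at the beginning.
f = open(path, "rb")
start = f.tell()      # current position of the file pointer
first = f.read(4)     # reads the data right after the pointer
f.seek(8)             # locate: move the pointer to the third record
third = f.read(4)
f.close()             # close: release buffers and the file handle

# Open for update, move the pointer, and overwrite the second record.
with open(path, "r+b") as f:
    f.seek(4)
    f.write(b"XXXX")

with open(path, "rb") as f:
    final = f.read()

os.remove(path)
```

With fixed-length records, the pointer position for record `n` is simply `n * record_length`, which is what makes the `seek` calls above possible.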
Indexing structure for index files
Section - 3
What is indexing ?
🞂 We know that data is stored in the form of records. Every record has a key field,
which helps it to be recognized uniquely.
🞂 Indexing is a data structure technique to efficiently retrieve records from the
database files based on some attributes on which the indexing has been done.
Indexing in database systems is similar to what we see in books.
🞂 Indexing is defined based on its indexing attributes. Indexing can be of the following
types −
• Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
• Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with duplicate
values.
• Clustering Index − Clustering index is defined on an ordered data file. The data file
is ordered on a non-key field.
Types of Indexing
Dense Index
In a dense index, there is an index record for every search key value in
the database. This makes searching faster but requires more space
to store the index records themselves. Each index record contains a search
key value and a pointer to the actual record on the disk.
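A dense index can be sketched in Python as one `(key, position)` entry per record, where the position stands in for a disk pointer. The sample data and the function name `dense_lookup` are hypothetical.

```python
from bisect import bisect_left

# Data file: records ordered by search key.
data_file = [(10, "Asha"), (20, "Ravi"), (30, "Meena"), (40, "John")]

# Dense index: one (key, position) entry for every record.
dense_index = [(key, pos) for pos, (key, _) in enumerate(data_file)]
dense_keys = [key for key, _ in dense_index]


def dense_lookup(key):
    # Binary search in the index, then follow the pointer directly.
    i = bisect_left(dense_keys, key)
    if i < len(dense_keys) and dense_keys[i] == key:
        return data_file[dense_index[i][1]]
    return None
```

The index has as many entries as the data file has records, which is the space cost mentioned above.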
Sparse Index
In a sparse index, index records are not created for every search key.
An index record here contains a search key and a pointer to
the data on the disk. To search for a record, we first follow the index
record to reach a location in the data. If the data we are
looking for is not at the location we reach by following the index,
then the system searches sequentially from there until the desired data is
found.
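The lookup procedure for a sparse index can be sketched as follows, assuming the data file is ordered by search key and an index entry is kept for every third record. The stride, the sample data, and the name `sparse_lookup` are assumptions for this example.

```python
from bisect import bisect_right

# Data file: records ordered by search key.
records = [(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e"), (60, "f")]

STRIDE = 3  # assumed: one index entry per three records
sparse_index = [
    (records[pos][0], pos) for pos in range(0, len(records), STRIDE)
]
sparse_keys = [key for key, _ in sparse_index]


def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key.
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    pos = sparse_index[i][1]
    # Search sequentially from that location until found or passed.
    while pos < len(records) and records[pos][0] <= key:
        if records[pos][0] == key:
            return records[pos]
        pos += 1
    return None
```

Compared with a dense index, this trades a shorter index for a short sequential scan on each lookup.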
Multi-level Index
Index records comprise search-key values and data pointers. A multilevel
index is stored on the disk along with the actual database files. As the size
of the database grows, so does the size of the indices. There is an immense
need to keep the index records in the main memory so as to speed up the
search operations. If a single-level index is used, then a large index cannot
be kept in memory, which leads to multiple disk accesses.
A multi-level index helps in breaking down the index into several smaller
indices in order to make the outermost level so small that it can be kept in
main memory.
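One way to picture a multi-level index is as a stack of sparse indices, each built over the level below it until the top level is small. The Python sketch below follows that idea; the functions `build_levels` and `multilevel_lookup` and the `fanout` parameter (entries covered by one index entry, assumed to be at least 2) are hypothetical.

```python
def build_levels(sorted_keys, fanout):
    """Return index levels, innermost first.

    Each entry is (first_key, position_in_level_below); for the
    innermost level the position refers to the data file itself.
    """
    levels = []
    below = sorted_keys
    while True:
        level = [(below[i], i) for i in range(0, len(below), fanout)]
        levels.append(level)
        if len(level) <= fanout:  # small enough to be the outermost level
            return levels
        below = [key for key, _ in level]


def multilevel_lookup(sorted_keys, levels, fanout, key):
    start = 0
    # Walk from the outermost level down to the innermost one.
    for level in reversed(levels):
        chosen = None
        for k, pos in level[start:start + fanout]:
            if k <= key:
                chosen = pos
            else:
                break
        if chosen is None:
            return None
        start = chosen
    # Final short scan in the data file.
    for pos in range(start, min(start + fanout, len(sorted_keys))):
        if sorted_keys[pos] == key:
            return pos
    return None


keys = list(range(0, 100, 5))   # 20 keys: 0, 5, ..., 95
levels = build_levels(keys, fanout=3)
```

Each step of the lookup examines at most `fanout` entries, so only the small outermost level needs to stay resident in memory.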
Thank You