File Organisation DP ss2 WK 1
What is a File?
Files are used to organize and store data for easy access, retrieval, and
management.
Example: A file of student records where each record contains details about
a single student.
Types of Files
1. Text Files: Store plain text data, such as .txt or .csv files.
Example record fields: Student Name, Age, Class
File organization is a way of organizing the data or records in a file. It does not refer to
how files are organized in folders, but how the contents of a file are added and
accessed.
File organization refers to the way data is stored in a file so it can be
retrieved, updated, and managed efficiently during data processing. In data
processing, a file contains a collection of related records, and file
organization determines the structure and method used to store and access
these records.
There are many ways records can be organized on disk or tape. The main
methods of file organization are described below.
Sequential File Organization is a way of storing data in which records are arranged in a specific
order based on a key field, such as a student’s name, roll number, or date of birth. Each record is
stored one after the other in a fixed sequence, and accessing the records requires following that
order from the start.
It is like lining up students according to their roll numbers and calling their names in that same
order.
1. Storage: Records are stored in a specific, sorted order based on a key field.
   - For example, student records could be stored in order of roll numbers: 001, 002,
     003, and so on.
2. Access: To find a record, the system starts from the beginning and checks each record
   until the desired one is found.
3. Updating Records:
   - Inserting New Records: When a new record is added, it must be inserted at the
     correct position to maintain the sequence, which might require shifting other
     records.
   - Deleting Records: When a record is deleted, the remaining records stay in
     sequence, but empty spaces may need to be handled.
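The storage, access, and update steps above can be sketched in Python. This is only a minimal illustration, assuming the whole file fits in a list sorted by roll number; the class name SequentialFile and its methods are hypothetical and are not part of the lesson.

```python
import bisect


class SequentialFile:
    """Toy sequential file: records are kept sorted by roll number."""

    def __init__(self):
        self.records = []  # list of (roll_number, name), always in key order

    def insert(self, roll_number, name):
        # Find the correct position, then shift later records to make room.
        keys = [record[0] for record in self.records]
        position = bisect.bisect_left(keys, roll_number)
        self.records.insert(position, (roll_number, name))

    def find(self, roll_number):
        # Sequential access: start at the beginning and check each record.
        checked = 0
        for record in self.records:
            checked += 1
            if record[0] == roll_number:
                return record, checked
        return None, checked

    def delete(self, roll_number):
        # Rebuild the list without the record so no empty space is left.
        self.records = [r for r in self.records if r[0] != roll_number]


students = SequentialFile()
for roll, name in [(3, "Chidi"), (1, "Amina"), (2, "Bola")]:
    students.insert(roll, name)

print(students.records)  # records appear in roll-number order
print(students.find(3))  # roll number 3 is reached after checking 3 records
```

Note that finding the last record means checking every record before it, which is why sequential files suit tasks that process all records in order.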
Indexed Sequential Access Method (ISAM) is a type of file organization that combines the
features of both sequential access and indexing. It stores records in a sorted order based on a
key field (e.g., student ID or name) and uses an index to locate specific records faster.
The index acts like the table of contents or index of a book: the system uses it to jump
straight to the required section instead of scanning through the entire file.
Practical Example
Imagine a student database where all students are listed alphabetically by their names.
The data file contains their records, while the index file points to the location of each
student’s record. To find the record of “John Doe,” the system uses the index to jump to
the exact location in the data file.
Advantages:
Disadvantages:
Practical Example:
Library catalog systems where an index helps locate books based on their titles or
authors.
Application: Database management systems, search engines.
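The idea of a sorted data file plus an index can be sketched as follows. This is a simplified illustration, not a full ISAM implementation: it assumes the data file is split into blocks of three records and that the index holds only the first key of each block. The block size, the sample records other than "John Doe", and the function name lookup are made up for the example.

```python
import bisect

BLOCK_SIZE = 3  # assumed number of records per block

# Data file: records sorted by the key field (student name), split into blocks.
records = sorted([
    ("Ada Obi", "SS2"), ("Bola Ade", "SS1"), ("Chidi Eze", "SS3"),
    ("Emeka Uba", "SS2"), ("Funke Ojo", "SS1"), ("John Doe", "SS2"),
    ("Kemi Bello", "SS3"), ("Musa Sani", "SS1"),
])
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Index file: the first key stored in each block.
index = [block[0][0] for block in blocks]


def lookup(name):
    # Use the index to choose the only block that could hold the key.
    block_no = bisect.bisect_right(index, name) - 1
    if block_no < 0:
        return None
    # Scan just that one block sequentially.
    for record in blocks[block_no]:
        if record[0] == name:
            return block_no, record
    return None


print(lookup("John Doe"))  # found in block 1 without reading blocks 0 or 2
```

Because the records are still in sorted order, the same file can also be read from start to finish like a sequential file.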
Index file organization is a method of arranging and accessing data
stored in a file using an index, much like an index in a book. The index
helps to quickly locate the position of a specific record in the file.
Practical Analogy:
Imagine you have a large book with 500 pages about African history. When you want to find
information about "Queen Amina of Zazzau," it would take a lot of time to flip through all the
pages. But, if the book has an index at the back, you can look up "Queen Amina" in the index,
see the page number, and go straight to that page.
The main data file is like the book with all the details.
The index file is like the index at the back of the book that tells you where to find
specific records.
1. Faster Access:
Searching through the index is much quicker than scanning the entire data file.
2. Efficient Sorting:
Data doesn't need to be physically arranged in order in the data file. The index can
logically order it.
3. Supports Large Files:
Managing large files becomes easier because the index reduces the need to access the full
file.
4. Flexibility:
Indexes can be created for multiple fields (e.g., name, ID, or subject), offering versatile
search options.
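The points above can be tried out with a small sketch. It assumes the data file is a Python list held in arrival order and that each index is a dictionary mapping a field value to record positions; the sample records and the names build_index and find are hypothetical.

```python
# Data file: records stored in arrival order, not sorted.
data_file = [
    {"id": 104, "name": "Musa", "subject": "Physics"},
    {"id": 101, "name": "Ada", "subject": "Biology"},
    {"id": 103, "name": "Kemi", "subject": "Physics"},
    {"id": 102, "name": "Bola", "subject": "Chemistry"},
]


def build_index(field):
    # Map each value of the field to the positions of the matching records.
    index = {}
    for position, record in enumerate(data_file):
        index.setdefault(record[field], []).append(position)
    return index


# Indexes can be built on more than one field.
id_index = build_index("id")
subject_index = build_index("subject")


def find(index, value):
    # Look up the positions in the index, then read only those records.
    return [data_file[position] for position in index.get(value, [])]


print(find(id_index, 103))
print(find(subject_index, "Physics"))

# Logical ordering: walk the index in key order; the data file is not moved.
print([find(id_index, key)[0]["name"] for key in sorted(id_index)])
```

The price of this flexibility is that every index must be updated whenever a record is added to or removed from the data file.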
Practical Analogy:
Think of a large library with thousands of books. If you want to find a book, instead of searching
shelf by shelf, you use a catalog that tells you the exact shelf and position of the book based on
its title or ID.
In hashed (direct) file organization, each record is stored at a specific location in the
file based on a unique identifier (e.g., a student ID).
The system calculates the storage location from that identifier using a hash function,
allowing immediate access without searching.
Advantages:
Disadvantages:
1. Collisions (two records being assigned the same location) require extra handling.
2. Inefficient for processing large amounts of data sequentially.
3. Hash functions must be carefully designed for efficient performance.
Practical Example:
ATM systems where a customer's account is accessed using their account number.
Application: Banking systems, airline reservation systems.
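A small sketch of the ATM example shows how a hash function picks the storage location and what a collision looks like. The hash function used here (remainder after division by the number of buckets), the bucket count, and the account numbers are assumptions chosen for illustration; collisions are handled by keeping several records in the same bucket, which is only one of several possible methods.

```python
NUM_BUCKETS = 5  # assumed number of storage locations

buckets = [[] for _ in range(NUM_BUCKETS)]


def hash_function(account_number):
    # Illustrative hash: the remainder after dividing by the bucket count.
    return account_number % NUM_BUCKETS


def store(account_number, balance):
    bucket = buckets[hash_function(account_number)]
    for i, (key, _) in enumerate(bucket):
        if key == account_number:
            bucket[i] = (account_number, balance)  # update an existing record
            return
    # New record. If the bucket already holds other keys, this is a
    # collision, handled here by chaining records in the same bucket.
    bucket.append((account_number, balance))


def fetch(account_number):
    # Go straight to the computed bucket; only that bucket is examined.
    for key, balance in buckets[hash_function(account_number)]:
        if key == account_number:
            return balance
    return None


store(1002, 5000)
store(1007, 750)  # 1007 % 5 == 2, the same bucket as 1002: a collision
store(1003, 120)

print(hash_function(1002), hash_function(1007))  # 2 2
print(fetch(1007))                               # 750
```

Fetching every account in account-number order would still require visiting all buckets and sorting the results, which illustrates why this method is weak for sequential processing.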
4. Clustered File Organization
This method groups similar records together in the same block or physical location to enhance
access speed. Clustered file organization is not considered good for large databases. In
this mechanism, related records from one or more relations are kept in the same disk
block; that is, the ordering of records is not based on the primary key or search key.
Clustered File Organization is a way of storing data in groups or "clusters" based on a
common attribute. For example, in a school database, records of students from the
same class (like SS1, SS2, or SS3) can be grouped together. Each group is stored in
a block, so all data related to the same attribute is found in one place.
This method is designed to improve data access speed when related records are
needed together. Instead of searching the entire file, the system only looks in the
relevant cluster.
How It Works: Records with related data are stored together based on a clustering field,
making it easier to retrieve grouped data.
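As a rough sketch of this idea, the code below groups student records by class, using the class as the clustering field. It assumes each cluster is simply a list held in a dictionary; the student names and the function names insert and fetch_cluster are hypothetical.

```python
from collections import defaultdict

# Each key is a value of the clustering field; each list stands for one block.
clusters = defaultdict(list)


def insert(record):
    # Place the record in the cluster for its class.
    clusters[record["class"]].append(record)


def fetch_cluster(class_name):
    # Only the relevant cluster is read, not the whole file.
    return clusters.get(class_name, [])


for name, class_name in [("Ada", "SS2"), ("Bola", "SS1"),
                         ("Chidi", "SS2"), ("Kemi", "SS3")]:
    insert({"name": name, "class": class_name})

print([record["name"] for record in fetch_cluster("SS2")])  # ['Ada', 'Chidi']
```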
Advantages:
Disadvantages:
Practical Example:
Heap File Organization is a method of storing records in a database where records are
placed randomly, without any specific order. New data is added wherever there is space
available, usually at the end of the file. This means the data is not sorted by any field,
such as name, date, or student ID.
This type of organization is commonly used when the priority is to store data quickly, and
frequent searches or updates are not required.
How It Works: Data is stored wherever there is free space. It does not follow any
specific sequence.
Advantages:
1. Easy to implement.
2. Fast for inserting new records.
3. Requires no sorting or indexing.
Disadvantages:
1. Searching for specific records is slow because it requires scanning the entire file.
2. Unsuitable for scenarios where frequent updates and deletions occur.
3. Difficult to handle large files efficiently.
Practical Example:
Storing log files where records are simply appended as they are generated.
Application: Temporary or small datasets that require frequent inserts.
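The trade-off between fast inserts and slow searches can be seen in a short sketch. It assumes the heap file is a plain list to which records are appended; the sample records and the function names insert and search are made up for the example.

```python
heap_file = []  # records in arrival order, with no sorting and no index


def insert(record):
    # Fast insert: the record simply goes at the end of the file.
    heap_file.append(record)


def search(student_id):
    # Slow search: records are scanned one by one until a match is found.
    scanned = 0
    for record in heap_file:
        scanned += 1
        if record["id"] == student_id:
            return record, scanned
    return None, scanned


for student_id, name in [(307, "Kemi"), (112, "Ada"), (256, "Musa")]:
    insert({"id": student_id, "name": name})

print(search(256))  # found only after scanning all 3 records
print(search(999))  # not found, and the entire file was still scanned
```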
Scan: Fetch all records in the file. The pages in the file must be fetched from the
disk into the buffer pool. There is also a CPU overhead per record for locating the
record on the page.
Search with equality selection: Fetch all records that satisfy an equality
selection, for example, find the student record for the student with sid 23. Pages
that contain qualifying records must be fetched from the disk, and qualifying
records must be located within retrieved pages.
Search with range selection: Fetch all records that satisfy a range selection. For
example, find all student records with a name alphabetically after Smith.
Insert: Insert a given record into the file. We must identify the page in the file into
which the new record must be inserted, fetch that page from the disk, modify it to
include the new record and then write back the modified page.
Delete: Delete a record that is specified using its record id. We must identify the
page that contains the record, fetch that page from the disk, modify it and then
write it back.
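The operations described above can be sketched on a toy file made of pages. This is an assumption-heavy illustration: pages are lists in memory rather than disk blocks, each page holds two records, and a record id is taken to be a (page number, slot number) pair. Apart from sid 23 and the name Smith, the sample data and all function names are hypothetical.

```python
PAGE_CAPACITY = 2  # assumed number of record slots per page

pages = [[]]  # the file: a list of pages, each page a list of record slots


def insert(record):
    # Identify a page with room (here the last page), adding a page if needed.
    if len(pages[-1]) >= PAGE_CAPACITY:
        pages.append([])
    pages[-1].append(record)
    return len(pages) - 1, len(pages[-1]) - 1  # record id = (page, slot)


def delete(record_id):
    # Identify the page that holds the record and empty its slot.
    page_no, slot = record_id
    pages[page_no][slot] = None


def scan():
    # Fetch every page and yield each record still stored on it.
    for page_no, page in enumerate(pages):
        for slot, record in enumerate(page):
            if record is not None:
                yield (page_no, slot), record


def search_equal(sid):
    return [record for _, record in scan() if record["sid"] == sid]


def search_range(after_name):
    return [record for _, record in scan() if record["name"] > after_name]


record_ids = [insert({"sid": sid, "name": name})
              for sid, name in [(23, "Taylor"), (7, "Adams"),
                                (15, "Smith"), (42, "Young")]]

print(search_equal(23))       # the record with sid 23
print(search_range("Smith"))  # names alphabetically after Smith
delete(record_ids[0])
print(search_equal(23))       # [] once the record has been deleted
```

Marking the slot as empty instead of shifting records keeps the record ids of the other records valid.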
Locate: Every file has a file pointer, which marks the current position where
data is to be read or written.
Write: A user can open a file in write mode, which allows its contents to be
edited. Edits can be deletions, insertions or modifications.
Read: By default, when a file is opened in read mode, the file pointer points to
the beginning of the file.
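The file pointer can be observed directly in Python using the built-in open function with the tell and seek methods. The sketch below writes a small temporary file and then reads it; the file name students.txt and its contents are made up, and binary mode is used so that the pointer positions are exact byte counts.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "students.txt")  # hypothetical file

# Write mode: creates the file (or replaces its contents) and writes to it.
with open(path, "wb") as f:
    f.write(b"Ada,SS2\nBola,SS1\n")

# Read mode: the file pointer starts at the beginning of the file.
with open(path, "rb") as f:
    start = f.tell()           # pointer position before reading anything
    first_line = f.readline()  # reading moves the pointer forward
    after_first = f.tell()
    f.seek(0)                  # locate: move the pointer back to the start
    first_again = f.readline()

print(start)        # 0
print(after_first)  # 8, the length of the first line in bytes
print(first_again)  # b'Ada,SS2\n'
```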
Comparison of Three File Organizations
- A hashed file does not utilize space quite as well as a sorted file, but
insertions and deletions are fast, and equality selections are very fast.
- A heap file has good storage efficiency and supports fast scan, insertion
and deletion of records. However, it is slow for searching.
- A sorted file also offers good storage efficiency, but insertion and deletion
of records are slow. It is quite fast for searching, and it is the best structure for
range selections.
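The claim about range selections can be illustrated by running the same query on a heap file and on a sorted file. This is a simplified sketch that assumes both files are in-memory lists of names; the names and the functions range_heap and range_sorted are hypothetical, and the second value returned is the number of records read to collect the answer (the few comparisons made by the binary search are not counted).

```python
import bisect

names = ["Young", "Adams", "Smith", "Taylor", "Brown", "Jones"]

heap_file = list(names)      # arrival order
sorted_file = sorted(names)  # kept in key order


def range_heap(after):
    # Heap file: every record must be examined to answer a range query.
    matches = sorted(name for name in heap_file if name > after)
    return matches, len(heap_file)


def range_sorted(after):
    # Sorted file: binary search finds the start, then records are read in order.
    start = bisect.bisect_right(sorted_file, after)
    matches = sorted_file[start:]
    return matches, len(matches)


print(range_heap("Smith"))    # (['Taylor', 'Young'], 6)
print(range_sorted("Smith"))  # (['Taylor', 'Young'], 2)
```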