Database File Organisation Lecture
Databases
DBMS File Organisation
Dr David Hamill
Physical & Logical
• Hash
• Records are placed on disk according to
a hash function
A hash function is any function that can be used to map data of arbitrary
size to fixed-size values.
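A minimal sketch of the definition above, assuming SHA-256 from Python's standard library as the example hash function (the lecture does not name one). It shows that inputs of very different sizes map to values of the same fixed size.

```python
import hashlib


def fixed_size_hash(data: bytes) -> str:
    """Map input of any length to a 64-character hexadecimal SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


# A 4-byte input and a 10,000-byte input (both chosen only for illustration).
short_digest = fixed_size_hash(b"SG37")
long_digest = fixed_size_hash(b"x" * 10_000)

print(len(short_digest), len(long_digest))  # both lengths are 64
```

A database would normally reduce such a value further, for example modulo the number of pages, to obtain a page address.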
Decisions you must take…
• One very important design aspect when creating a new
table is the decision to create or not create a clustered
index.
https://beginnersbook.com/2022/06/heap-file-organization-in-dbms/
2. Ordered/Sequential Files
• The records in a file can be physically ordered based on the
values of one or more of the fields.
• If the tuples are already ordered according to the ordering field Sno it
should be possible to reduce the execution time for the query as no
sorting is necessary.
SELECT *
FROM Staff
WHERE Sno = 'SG37';
• In this case we can use a binary search to execute the query involving
a search condition based on the ordering field Sno
2. Ordered Files - Search
Binary Search Algorithm Example
SELECT *
FROM Staff
WHERE Sno = 'SL20';

Sno    Page   Read at step
SG14   1
SG21   2
SG24   3
SG36   4
SG37   5      1
SL20   6      4
SL21   7      2
SL37   8
SL66   9

1. Initial mid-page is page 5. ‘SG37’ is not the record we are searching for.
The value being searched for is greater than ‘SG37’ so we discard the top half
of the file.
2. Retrieve the mid-page of the bottom half of the file, that is page 7. The
value of the key field ‘SL21’ is greater than ‘SL20’.
3. Discard the bottom half of the search space.
4. Retrieve the mid-page of the remaining search space, that is page 6, which
contains the record we are searching for.
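The steps of the example above can be sketched in Python. This is an illustration only: it assumes one record per page, as in the example table, and the names PAGES and binary_search_pages are made up for the sketch.

```python
# Page number -> Sno of the record on that page, copied from the example table.
PAGES = {1: "SG14", 2: "SG21", 3: "SG24", 4: "SG36", 5: "SG37",
         6: "SL20", 7: "SL21", 8: "SL37", 9: "SL66"}


def binary_search_pages(pages, target):
    """Return (page number or None, list of pages read) for an ordered file."""
    low, high = min(pages), max(pages)
    visited = []
    while low <= high:
        mid = (low + high) // 2          # mid-page of the current search space
        visited.append(mid)
        if pages[mid] == target:
            return mid, visited
        if pages[mid] < target:
            low = mid + 1                # discard the top half (smaller keys)
        else:
            high = mid - 1               # discard the bottom half (larger keys)
    return None, visited


found_page, pages_read = binary_search_pages(PAGES, "SL20")
print(found_page, pages_read)  # 6 [5, 7, 6]
```

The pages read (5, then 7, then 6) match steps 1, 2 and 4 of the example.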
2. Ordered Files - Search
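• In general, the binary search is more efficient than a linear
search: on a file of N pages it reads about log2(N) pages in the worst
case, whereas a linear search reads about N/2 pages on average.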
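To make the comparison concrete, the sketch below counts page reads for both strategies. It assumes the record exists, is equally likely to be on any page, and that each comparison costs one page read; the function names are hypothetical.

```python
def linear_avg_reads(n_pages: int) -> float:
    """Average pages read by a linear scan that stops at the matching page."""
    return (n_pages + 1) / 2


def binary_max_reads(n_pages: int) -> int:
    """Worst-case pages read by a binary search: floor(log2(n)) + 1."""
    # For a positive integer n, n.bit_length() equals floor(log2(n)) + 1.
    return n_pages.bit_length()


for n in (9, 1_000, 1_000_000):
    print(n, linear_avg_reads(n), binary_max_reads(n))
```

For the 9-page file used in the earlier example this gives 5 reads on average for the linear scan against at most 4 for the binary search; the gap widens quickly as the file grows.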
2. Ordered Files – Insertions & Deletions
• To insert a record we must first find its correct position in the ordering.
If there is not sufficient space on that page then it would be necessary to
move one or more records onto the next page. This may cause a cascading effect.
• When deleting a record we must reorganise the records to remove the free
slot.
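The cascading effect on insertion can be sketched as follows. This is a simplified model, not how any particular DBMS does it: the file is a non-empty list of sorted pages, PAGE_CAPACITY is a hypothetical limit of two records per page, and insert_ordered is a made-up name.

```python
import bisect

PAGE_CAPACITY = 2  # hypothetical number of records that fit on one page


def insert_ordered(pages, key):
    """Insert key into an ordered file (a non-empty list of sorted pages)."""
    # Choose the first page whose largest key is >= key, else the last page.
    idx = len(pages) - 1
    for i, page in enumerate(pages):
        if page and key <= page[-1]:
            idx = i
            break
    bisect.insort(pages[idx], key)
    # Cascade: move the largest key of each over-full page onto the next page.
    while len(pages[idx]) > PAGE_CAPACITY:
        overflow = pages[idx].pop()
        if idx + 1 == len(pages):
            pages.append([])             # the file grows by one page
        pages[idx + 1].insert(0, overflow)
        idx += 1
    return pages


staff_file = [["SG14", "SG21"], ["SG36", "SG37"], ["SL20", "SL21"]]
print(insert_ordered(staff_file, "SG24"))
# [['SG14', 'SG21'], ['SG24', 'SG36'], ['SG37', 'SL20'], ['SL21']]
```

Inserting a single record into a full page here touches every following page, which is the cascading cost described above.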
Advantages of Sequential File Organization
1. It is a simple method to adopt. The implementation is simple compared to
other file organization methods.
2. It is fast and efficient when large amounts of data are processed in sequence.
3. This method of file organization is mostly used for generating various
reports and performing statistical operations on data.
4. Data can be stored on cheap storage devices.
Hashing is used to calculate the direct location of a data record on the disk without
using an index structure. In this technique, data is stored in the data blocks whose addresses
are generated by the hash function.
The locations where these records are stored are known as data buckets or data blocks.
So why would you choose to use hashing?
• For a huge database, it is costly to search the index values through all of their levels
before reaching the destination data block to get the desired data.
• Hashing is used to index and retrieve items in a database because it is faster to find a
specific item using the shorter hashed key than using its original value.
• Hashing is an ideal method for calculating the direct location of a data record on the disk
without using an index structure.
• There are two types: static hashing and dynamic hashing.
• Data buckets are the memory locations where the records are stored. A bucket is also known
as a unit of storage.
Static Hashing
• Records do not have to be written sequentially to the file.
• A hash function is used to calculate the address of the page
in which the record is to be stored, based on one or more
fields in the record. In the absence of collisions this gives
O(1) lookup complexity. A hash function is a mapping
function that maps the set of search keys to the
addresses where the actual records are placed.
• The base field is called the hash field.
• If the hash field is also a key field of the file then it is
called the hash key.
• Records in a hash file will appear randomly distributed
across the available file space. For this reason, hash files
are sometimes called random or direct files.
Static Hashing - Functions
• Inserting a record: When a new record needs to be inserted into the table, an address is
generated for it from its hash key, and the record is stored at that location.
• Searching: To retrieve a record, the same hash function is used to compute the address of
the bucket where the record is stored.
• Deleting a record: Using the hash function, first fetch the record you want to delete,
then remove the record from that address.
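The three operations can be sketched with a small in-memory model of a static hash file. Everything here is an assumption made for illustration: the toy hash function (sum of character codes modulo the bucket count), the fixed N_BUCKETS of 5, and the record layout with a hypothetical "position" field.

```python
N_BUCKETS = 5  # hypothetical; fixed when the file is created


def bucket_address(key: str) -> int:
    """Toy hash function: sum of character codes modulo the number of buckets."""
    return sum(ord(ch) for ch in key) % N_BUCKETS


buckets = [[] for _ in range(N_BUCKETS)]


def insert(record: dict) -> None:
    """Store the record in the bucket addressed by its hash key (Sno)."""
    buckets[bucket_address(record["Sno"])].append(record)


def search(sno: str):
    """Hash the key again and look only inside the addressed bucket."""
    for record in buckets[bucket_address(sno)]:
        if record["Sno"] == sno:
            return record
    return None


def delete(sno: str) -> bool:
    """Locate the record through the hash function, then remove it."""
    bucket = buckets[bucket_address(sno)]
    for i, record in enumerate(bucket):
        if record["Sno"] == sno:
            del bucket[i]
            return True
    return False


insert({"Sno": "SG37", "position": "Assistant"})
print(search("SG37"))
print(delete("SG37"), search("SG37"))
```

Each operation touches a single bucket, which is why no index structure is needed to find the record.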
Dynamic Hashing
• Each address generated by a hash function corresponds to a page (or a
bucket) with slots for multiple records. Data buckets are the memory locations
where the records are stored. A bucket is also known as a unit of storage.
• When the same address is generated for two or more records a collision is
said to have occurred and the records are called synonyms in this case.
• We must insert the new record in another position when a collision occurs.
• Collision management complicates hash file management and degrades overall
performance
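One common way to place a colliding record "in another position" is linear probing, a form of open addressing. The lecture does not say which collision strategy it has in mind, so the sketch below is only one possibility. It assumes single-record slots, a toy hash function (sum of character codes modulo N_SLOTS), and no deletions.

```python
N_SLOTS = 5  # hypothetical fixed size of the hash file
slots = [None] * N_SLOTS


def home_address(key: str) -> int:
    """Toy hash function: sum of character codes modulo the number of slots."""
    return sum(ord(ch) for ch in key) % N_SLOTS


def probe_insert(key: str) -> int:
    """Store key at its home address, or at the next free slot after it."""
    start = home_address(key)
    for step in range(N_SLOTS):
        pos = (start + step) % N_SLOTS
        if slots[pos] is None:
            slots[pos] = key
            return pos
    raise RuntimeError("hash file is saturated")


def probe_search(key: str):
    """Follow the same probe sequence until the key or an empty slot is found."""
    start = home_address(key)
    for step in range(N_SLOTS):
        pos = (start + step) % N_SLOTS
        if slots[pos] is None:
            return None
        if slots[pos] == key:
            return pos
    return None


# 'SG37' and 'SG14' are synonyms under this toy function: both hash to 0.
first = probe_insert("SG37")
second = probe_insert("SG14")
print(first, second)  # 0 1
```

The second record needs an extra probe on both insertion and lookup, which illustrates how collisions degrade performance.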
Hashing – Static/Dynamic
• The hashing techniques we have considered so far are static in that the
hash address space is fixed when the file is created. When the space
becomes full it is said to be saturated.
• In this case it is necessary to reorganise the hash structure
• This may involve creating a new file with more space, then choosing a
new hash function and mapping the old file to the new file.
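The reorganisation described above can be sketched as follows, using the same kind of toy hash function as an assumption (sum of character codes modulo the bucket count). The names make_hash and reorganise are hypothetical, and the "files" are plain in-memory lists of buckets.

```python
def make_hash(n_buckets: int):
    """Return a toy hash function for a file with n_buckets buckets."""
    def hash_fn(key: str) -> int:
        return sum(ord(ch) for ch in key) % n_buckets
    return hash_fn


def reorganise(old_buckets, new_n_buckets: int):
    """Create a larger file and map every old record into it with a new hash."""
    new_hash = make_hash(new_n_buckets)
    new_buckets = [[] for _ in range(new_n_buckets)]
    for bucket in old_buckets:
        for key in bucket:
            new_buckets[new_hash(key)].append(key)
    return new_buckets, new_hash


KEYS = ["SG14", "SG21", "SG24", "SG36", "SG37", "SL20"]

# Old file: only 2 buckets, so each bucket is crowded.
old_hash = make_hash(2)
old_file = [[], []]
for k in KEYS:
    old_file[old_hash(k)].append(k)

new_file, new_hash = reorganise(old_file, 5)
print(old_file)
print(new_file)
```

Every record has to be read and rewritten, so the cost of this step grows with the size of the file.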