File Organization

The document provides an overview of various types of storage devices, including primary, secondary, and tertiary storage, along with their characteristics and uses. It discusses different file organization methods such as fixed-length and variable-length records, as well as RAID configurations and their advantages and disadvantages. Additionally, it covers data structures like B+ trees and hashing for efficient data retrieval and storage management.

Uploaded by

shreyasood6162
Copyright © All Rights Reserved
Types of Storage devices


Primary Storage
• The memory storage that is directly accessible to the CPU
comes under this category.
• CPU's internal memory (registers), fast memory (cache),
and main memory (RAM) are directly accessible to the
CPU.
• This storage is typically very small, ultra-fast, and volatile.
• Primary storage requires continuous power supply in
order to maintain its state. In case of a power failure, all its
data is lost.

Secondary Storage
• Secondary storage devices are used to store data for
future use or as backup.
• Secondary storage includes memory devices that are
not part of the CPU chipset or motherboard, for
example, magnetic disks, optical disks (DVD, CD, etc.),
hard disks, flash drives, and magnetic tapes.
Tertiary Storage
• Tertiary storage is used to store huge volumes of
data.
• Since such storage devices are external to the
computer system, they are the slowest in speed.
• These storage devices are mostly used to back up
an entire system.
• Optical disks and magnetic tapes are widely used
as tertiary storage.
Media
• Cache
• Main Memory
• Flash Memory (a type of Electrically Erasable
Programmable Read-Only Memory, EEPROM): reads are
nearly as fast as main memory, but writes are slower
and more complex (data must be erased a block at a
time before it can be rewritten)
• Magnetic-disk storage
• Optical Storage
• Tape Storage
Storage Device Hierarchy
Storage Device Hierarchy (Contd.)
CPU:
• Fastest level of the hierarchy
• Stores data in registers
• Processes data values, evaluates arithmetic
expressions, computes the addresses of data values, etc.
Cache:
• RAM is much slower than the CPU (a single access can
cost hundreds of CPU cycles)
• Hence an intermediate storage level, the cache, is
needed between RAM and the CPU
• Fetches data from RAM in advance, giving the illusion
of a faster main memory
• It is faster than RAM but slower than the CPU registers
Storage Device Hierarchy (Contd.)
RAM:
• Stores frequently used program instructions and data
to increase the overall speed of the system
• Uses an optimistic approach of prefetching data
from disk drives
• It is faster than the hard disk, but slower than
the cache
• The smallest addressable unit is 1 byte
Storage Device Hierarchy (Contd.)
SSD:
• Used as persistent storage for systems that need faster
storage access
• Offers roughly 10 times the speed of magnetic disks
• Costlier per byte than magnetic disks, but cheaper than RAM
• Used in in-memory databases and cloud-based file systems
Magnetic Disk Drives:
• Have rotating disks coated with magnetic material
• Cheaper, and used for persistent data
• Data is read and written in units of blocks
Disks: Physical Characteristics
Physical and operational characteristics that define
its performance:
• Number of Platters (capacity ∝ number of platters)
– The capacity of the disk is directly proportional to the
number of platters, with each platter packing as much as 200
gigabytes of data.
• Track Density (Tracks Per Inch, TPI)
– Higher track density packs more tracks onto each platter, so
fewer platters are needed for a given capacity. This matters
because more platters mean more heads, which are very expensive.
A typical hard drive can have as many as 100,000 tracks per inch.
• Linear Density (Bits Per Inch, BPI)
– The linear density of a disk is the number of bits that can
be packed consecutively in one inch of a track.
– Currently used magnetic materials and read/write heads can support
up to 900,000 bits per inch.
Disks: Physical Characteristics
• Seek Time (ms): the time it takes the drive to position
the heads over the requested cylinder. It ranges from 0 ms,
if the heads are already over the right cylinder, up to
15 to 20 ms.
• Rotational Speed (RPM): the rotational speed of the
spindle determines the rotational latency, i.e., the time
for the requested sector to rotate into position under
the head. On average this is half a revolution.
• Internal Transfer Rate (Mb/s): the speed at which data is
transferred between the drive and main memory.
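The three characteristics above combine into the usual estimate of the time to read one block: seek time + average rotational latency + transfer time. The C sketch below assumes the average latency is half a revolution and takes the transfer rate in megabytes per second (10^6 bytes); the function names and the sample figures are illustrative, not taken from any particular drive.

```c
/* Average rotational latency in ms: one revolution takes 60000 / rpm ms,
   and on average the sector is half a revolution away. */
double avg_rotational_latency_ms(double rpm) {
    return 0.5 * 60000.0 / rpm;
}

/* Time in ms to transfer `bytes` at `rate_mb_per_s` megabytes per second
   (1 MB taken as 10^6 bytes in this sketch). */
double transfer_time_ms(double bytes, double rate_mb_per_s) {
    return bytes / (rate_mb_per_s * 1e6) * 1000.0;
}

/* Estimated time to read one block: seek + rotational latency + transfer. */
double block_access_time_ms(double seek_ms, double rpm,
                            double block_bytes, double rate_mb_per_s) {
    return seek_ms + avg_rotational_latency_ms(rpm)
         + transfer_time_ms(block_bytes, rate_mb_per_s);
}
```

For a hypothetical 7200 RPM drive with a 9 ms average seek and a 100 MB/s transfer rate, reading a 4 KB block comes to about 9 + 4.17 + 0.04 ≈ 13.2 ms, so seek and rotation dominate the cost of a single small read.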
Fixed Length Records
• Fixed length record
– struct Person {
char name[50];
int citNo;
float salary;
};
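• Assuming a 4-byte int and a 4-byte float, each record needs
50 + 4 + 4 = 58 bytes (a C compiler may add padding, so
sizeof(struct Person) is typically 60)
• The first record occupies the first record-size bytes of the
file, the next record the following record-size bytes, and so
on, so record i starts at byte i × record size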
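Because every record has the same size, record i can be located by arithmetic alone. A minimal sketch in C, assuming records are written to a binary file with fwrite (so the record size is sizeof(struct Person), including any padding); write_record and read_record are hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

struct Person {
    char name[50];
    int citNo;
    float salary;
};

/* Write record number i (0-based) at byte offset i * sizeof(struct Person). */
int write_record(FILE *f, long i, const struct Person *p) {
    if (fseek(f, i * (long)sizeof(struct Person), SEEK_SET) != 0) return -1;
    return fwrite(p, sizeof *p, 1, f) == 1 ? 0 : -1;
}

/* Read record number i directly, without reading any earlier record. */
int read_record(FILE *f, long i, struct Person *out) {
    if (fseek(f, i * (long)sizeof(struct Person), SEEK_SET) != 0) return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}
```

Reading record i costs a single seek to byte i × sizeof(struct Person); no earlier record has to be read.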
Fixed Record
• Problems
– Difficult to manage deleted records
• The space of a deleted record must be filled by another record
• Or the record must be marked as deleted
– Block size should be a multiple of record size;
otherwise some records cross block boundaries and need
two block accesses to read or write
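One common way to handle deleted records without moving other records is to mark them deleted and chain them into a free list whose head is kept in the file header; a later insertion reuses the first free slot. The sketch below models the file as an in-memory array of slots, an assumption made to keep it short; the types and function names are hypothetical.

```c
#include <stdbool.h>

#define MAX_SLOTS 8
#define NO_SLOT (-1)

/* One fixed-length slot; key stands in for the record's contents. */
struct Slot {
    bool in_use;
    int next_free;   /* valid only when !in_use: next deleted slot */
    int key;
};

struct RecordFile {
    struct Slot slots[MAX_SLOTS];
    int used;        /* slots ever allocated (high-water mark) */
    int free_head;   /* file header: first deleted slot, or NO_SLOT */
};

void rf_init(struct RecordFile *rf) {
    rf->used = 0;
    rf->free_head = NO_SLOT;
}

/* Delete slot i: mark it deleted and push it onto the free list. */
void rf_delete(struct RecordFile *rf, int i) {
    rf->slots[i].in_use = false;
    rf->slots[i].next_free = rf->free_head;
    rf->free_head = i;
}

/* Insert: reuse a deleted slot if one exists, else append.
   Returns the slot number, or NO_SLOT if the file is full. */
int rf_insert(struct RecordFile *rf, int key) {
    int i;
    if (rf->free_head != NO_SLOT) {
        i = rf->free_head;
        rf->free_head = rf->slots[i].next_free;
    } else if (rf->used < MAX_SLOTS) {
        i = rf->used++;
    } else {
        return NO_SLOT;
    }
    rf->slots[i].in_use = true;
    rf->slots[i].key = key;
    return i;
}
```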
Variable length records
– struct Person {
char name[50];
int citNo;
float salary;
int Account_info[];
};
Variable-length records arise:
– Due to multiple record types in one file
– Due to fields whose length varies
– Due to repeating fields (such as Account_info)
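In C, the repeating field Account_info[] in the struct above is a flexible array member, so records with different numbers of accounts occupy different numbers of bytes. A small sketch, assuming a hypothetical n_accounts field (not in the original struct) records how many entries follow:

```c
#include <stdlib.h>
#include <string.h>

struct Person {
    char name[50];
    int citNo;
    float salary;
    int n_accounts;        /* hypothetical count field, not in the original */
    int Account_info[];    /* C99 flexible array member: must be last */
};

/* Allocate a record with room for n account numbers. */
struct Person *person_new(const char *name, int citNo, float salary,
                          const int *accounts, int n) {
    struct Person *p = malloc(sizeof *p + (size_t)n * sizeof p->Account_info[0]);
    if (p == NULL) return NULL;
    strncpy(p->name, name, sizeof p->name - 1);
    p->name[sizeof p->name - 1] = '\0';
    p->citNo = citNo;
    p->salary = salary;
    p->n_accounts = n;
    memcpy(p->Account_info, accounts, (size_t)n * sizeof accounts[0]);
    return p;
}

/* Bytes the record occupies: fixed part plus the repeating field. */
size_t person_size(const struct Person *p) {
    return sizeof *p + (size_t)p->n_accounts * sizeof p->Account_info[0];
}
```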
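Implementation: Variable length records

• Byte String representation
– Attach an end-of-record marker to represent the end of each record
• Slotted-page structure
– Organizes records within a block
– Records can be moved within the block (e.g., after a
deletion), because outside pointers refer to the
record's header entry rather than to the record itself
– The header holds the number of record entries,
– the end of free space in the block,
– and the location and size of each record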
Slotted Page structure
Slotted structure (Inserting)
• The variable-length records are stored contiguously
within the block.
• When a new record is inserted, it is placed at the
end of the free space (so the free space stays
contiguous).
• The header gets a new entry holding the size and
location of the newly inserted record.
Slotted structure (deletion)
• When an existing record is deleted, its space is
freed and its header entry is set to deleted.
• The records stored between the free space and the
deleted record are shifted to occupy the freed space.
• The end-of-free-space pointer is updated.
• All the free space is again contiguous, between
the final header entry and the first record.
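The insertion and deletion steps above can be sketched in C as follows. For readability the header (entry count, end of free space, and one offset/size entry per record) is kept in separate struct fields rather than inside the block bytes, and the page size, slot limit, and function names are assumptions; records are stored from the end of the page toward the front.

```c
#include <string.h>

#define PAGE_SIZE 256
#define MAX_SLOTS 16
#define SLOT_DELETED (-1)

struct SlottedPage {
    int n_entries;           /* number of slot entries in the header */
    int free_end;            /* records occupy data[free_end .. PAGE_SIZE) */
    int offset[MAX_SLOTS];   /* location of each record */
    int size[MAX_SLOTS];     /* size of each record, or SLOT_DELETED */
    char data[PAGE_SIZE];
};

void sp_init(struct SlottedPage *p) {
    p->n_entries = 0;
    p->free_end = PAGE_SIZE;
}

/* Place the record at the end of the free space; return its slot or -1. */
int sp_insert(struct SlottedPage *p, const void *rec, int len) {
    if (p->n_entries == MAX_SLOTS || len > p->free_end) return -1;
    p->free_end -= len;
    memcpy(p->data + p->free_end, rec, (size_t)len);
    p->offset[p->n_entries] = p->free_end;
    p->size[p->n_entries] = len;
    return p->n_entries++;
}

/* Mark slot s deleted and shift the records stored between the free space
   and the deleted record up by its size, keeping free space contiguous.
   Other records keep their slot numbers; only their offsets change. */
void sp_delete(struct SlottedPage *p, int s) {
    int off = p->offset[s], len = p->size[s];
    if (len == SLOT_DELETED) return;
    memmove(p->data + p->free_end + len, p->data + p->free_end,
            (size_t)(off - p->free_end));
    for (int i = 0; i < p->n_entries; i++)
        if (p->size[i] != SLOT_DELETED && p->offset[i] < off)
            p->offset[i] += len;
    p->size[s] = SLOT_DELETED;
    p->free_end += len;
}

/* Return the record in slot s, or NULL if it was deleted. */
const char *sp_get(const struct SlottedPage *p, int s) {
    return p->size[s] == SLOT_DELETED ? NULL : p->data + p->offset[s];
}
```

Because other code refers to a record by its slot number, moving records during deletion only requires updating the offsets in the header.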
Implementation: Variable length records

• Fixed length Representation
– Reserved space
• Uses a maximum record length that is
never exceeded.
• Suitable when most records are of similar
length; otherwise space is wasted
– List representation is used to avoid space
wastage for long records.
Serial files
• Records are stored in the order in which they are
received
• Easy to maintain
• Searching is time consuming
• Insert, delete, and modify operations are also
time consuming
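A serial file can be modeled as an array in arrival order. The sketch below (hypothetical names, an in-memory array in place of disk blocks) shows why searching is slow: in the worst case every record is examined, and delete or modify must run the same scan to locate the record first.

```c
#include <stddef.h>

/* Hypothetical serial file held in memory: records in arrival order. */
struct SerialFile {
    int keys[100];
    size_t n;
};

/* New records are appended in the order they are received. */
int serial_append(struct SerialFile *f, int key) {
    if (f->n == sizeof f->keys / sizeof f->keys[0]) return -1;
    f->keys[f->n++] = key;
    return 0;
}

/* Linear search: returns the record's position, or -1 if absent. */
long serial_find(const struct SerialFile *f, int key) {
    for (size_t i = 0; i < f->n; i++)
        if (f->keys[i] == key) return (long)i;
    return -1;
}
```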
Sequential files
Records are stored one after the other, in order of a
key value.
– Insert: adding a record requires shifting all records
from the insertion point to the end of the file to make
space for the new record.
– Updating:
• requires creating a new file. Records are copied up to
the point where the amendment is required.
• Then the changes are made and written to the new file.
• After that, the remaining records are copied to the
new file.
• Because the old file is left intact, sequential files
get an automatic backup copy.
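The cost of insertion comes from the shifting. A minimal sketch, modeling the sorted file as an in-memory array of keys (an assumption; on disk, shifting means rewriting every block after the insertion point); seq_insert is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Insert key into a sorted array of n keys with capacity cap, shifting
   every record after the insertion point one place toward the end.
   Returns the new count, or -1 if the array is full. */
long seq_insert(int *keys, size_t n, size_t cap, int key) {
    if (n == cap) return -1;
    size_t pos = 0;
    while (pos < n && keys[pos] < key) pos++;   /* find insertion point */
    memmove(&keys[pos + 1], &keys[pos], (n - pos) * sizeof keys[0]);
    keys[pos] = key;
    return (long)(n + 1);
}
```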
Sequential files
Updating
• Creating a new file on every update is a costly
process, hence the original file is created with holes
(blank space for new records)
• If a block can hold ‘K’ records, the initial file is
loaded with only ‘L * K’ records per block, where ‘L’
is the loading factor and 0 < L ≤ 1
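A small worked example of the loading factor: with K = 10 records per block and L = 0.8, each block is initially filled with 8 records, leaving 2 holes for later insertions, and 1,000 records need 125 blocks. The sketch below expresses L as an integer percentage to avoid floating-point rounding, which is a choice of this sketch rather than part of the method; the function names are hypothetical.

```c
/* Records placed in each block at initial load, for block capacity k and
   loading factor given as a percentage (0 < load_pct <= 100).
   Integer division rounds L * K down. */
int initial_records_per_block(int k, int load_pct) {
    return k * load_pct / 100;
}

/* Blocks needed to load n records at that density (ceiling division).
   Returns -1 if the loading factor leaves no room for any record. */
long blocks_needed(long n, int k, int load_pct) {
    long per_block = initial_records_per_block(k, load_pct);
    return per_block > 0 ? (n + per_block - 1) / per_block : -1;
}
```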
Sequential Files
– Deletion: the inverse of addition, as it requires
compressing the freed space
– Advantages
• Good for report generation and statistical
calculations.
• Very fast and efficient for processing large volumes
of data in key order
– Disadvantages
• Each insert, update, or delete requires the file
to be kept sorted.
Index Sequential Method
Records are stored in the file in order of primary key.
In addition, an index maps primary-key values to the
location of the corresponding records, so a record can
be found without scanning the whole file.
• Dense Index
– An index record appears for every search-key value; it
contains the search-key value and a pointer to the first
data record with that value
• Sparse Index
– Index records appear for only some of the search-key values.
• Multilevel Indices
– When the index itself grows too large, an index is built on the index.
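With a sparse index, a lookup finds the last index entry whose key is less than or equal to the search key and then scans the block that entry points to. A sketch in C, assuming one index entry per block holding that block's first key, a fixed number of records per block, and plain integer keys; the names are hypothetical:

```c
#include <stddef.h>

#define BLOCK_RECS 4   /* records per data block (assumed) */

/* Sparse index entry: the first (smallest) key stored in a data block. */
struct IndexEntry {
    int key;
    size_t block;
};

/* Binary search for the last entry with entry.key <= key.
   Returns that entry's block, or -1 if key is below every entry. */
long sparse_lookup(const struct IndexEntry *idx, size_t n, int key) {
    size_t lo = 0, hi = n;                 /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].key <= key) lo = mid + 1;
        else hi = mid;
    }
    return lo == 0 ? -1 : (long)idx[lo - 1].block;
}

/* Scan one block of the sorted data file (stored as a flat array). */
int block_contains(const int *data, size_t block, int key) {
    for (size_t i = 0; i < BLOCK_RECS; i++)
        if (data[block * BLOCK_RECS + i] == key) return 1;
    return 0;
}
```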
Hash, Direct or Random Files
A hash function is used to calculate the address of the
block in which a record is stored.
• If the hash function is computed on a key column,
that column is called the hash key
• If the hash function is computed on a non-key column,
that column is called the hash column.
• When a record has to be retrieved, the address is
generated from the hash key column and the whole
record is retrieved directly from that address.
Hash, Direct or Random Files
• When a new record has to be inserted, the
address is generated from the hash key and the record
is inserted directly at that address. The same is true
of update and delete.
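The direct insert, lookup, and delete described above can be sketched with a static hash file in C. The hash function (key modulo the number of buckets), the bucket size, and the names are assumptions; each bucket stands for one disk block, and a full bucket is simply reported rather than chained to an overflow bucket.

```c
#include <stdbool.h>

#define N_BUCKETS 8
#define BUCKET_CAP 4

/* Each bucket stands for one disk block. */
struct Bucket {
    int n;
    int keys[BUCKET_CAP];
};

struct HashFile {
    struct Bucket buckets[N_BUCKETS];
};

/* Assumed hash function on the hash key: simple modulo. */
static unsigned bucket_of(int key) {
    return (unsigned)key % N_BUCKETS;
}

/* Insert into the computed bucket; fails when the bucket is full. */
bool hf_insert(struct HashFile *hf, int key) {
    struct Bucket *b = &hf->buckets[bucket_of(key)];
    if (b->n == BUCKET_CAP) return false;
    b->keys[b->n++] = key;
    return true;
}

/* Lookup reads only the one bucket the hash function points to. */
bool hf_find(const struct HashFile *hf, int key) {
    const struct Bucket *b = &hf->buckets[bucket_of(key)];
    for (int i = 0; i < b->n; i++)
        if (b->keys[i] == key) return true;
    return false;
}

/* Delete: locate via the hash, then move the bucket's last key into the gap. */
bool hf_delete(struct HashFile *hf, int key) {
    struct Bucket *b = &hf->buckets[bucket_of(key)];
    for (int i = 0; i < b->n; i++)
        if (b->keys[i] == key) {
            b->keys[i] = b->keys[--b->n];
            return true;
        }
    return false;
}
```

Keys 3, 11, 19, 27, 35, and 43 all map to bucket 3 here; once that bucket is full, the fifth insertion fails unless an overflow mechanism is added.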
Hash, Direct or Random Files
• Disadvantages
– If collisions are not handled, an older record can be
overwritten by a newer one that hashes to the same
address, so accidental loss of data is a problem.
– Storage space is not used efficiently
– This method is not suitable for searching a range
of data
– If hash columns are frequently updated, the data
block address changes accordingly, and the record
must be moved
Hash File
B+ Tree
• A B+ tree is similar to a binary search tree, but each
node can have more than two children.
• It stores all the records only at the leaf nodes.
• Intermediate nodes have pointers that lead to the
leaf nodes.
• Intermediate nodes do not contain any data/records.
B+ tree
• A B+-tree is a data structure for storing vast amounts of
information.
• B+-trees are used to store amounts of data that will not fit
in main system memory.
• Secondary storage (usually disk) is used to store the leaf
nodes of the tree.
• Internal nodes of the tree are typically kept in main
memory.
• Leaf nodes are the only ones that actually store data items.
• All other nodes are called index nodes or i-nodes and
simply store "guide" values that allow us to traverse the
tree structure.
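Searching only descends through the index nodes until it reaches a leaf, and the leaves are linked so that a range of keys can be read in order. The C sketch below shows search and a range scan over a tree built by hand; it omits insertion and node splitting, uses integer keys in place of records, and the node layout and names are assumptions (in an internal node, child[i] holds keys smaller than keys[i], and child[n_keys] holds the rest).

```c
#include <stddef.h>

#define ORDER 4   /* max children per internal node (assumed) */

struct Node {
    int is_leaf;
    int n_keys;
    int keys[ORDER - 1];
    struct Node *child[ORDER];   /* used by internal (index) nodes */
    struct Node *next;           /* leaf chain, used for range scans */
};

/* Descend from the root to the leaf whose key range covers key. */
static const struct Node *find_leaf(const struct Node *n, int key) {
    while (!n->is_leaf) {
        int i = 0;
        while (i < n->n_keys && key >= n->keys[i]) i++;
        n = n->child[i];
    }
    return n;
}

/* Exact-match search: only leaves hold the data items. */
int bpt_contains(const struct Node *root, int key) {
    const struct Node *leaf = find_leaf(root, key);
    for (int i = 0; i < leaf->n_keys; i++)
        if (leaf->keys[i] == key) return 1;
    return 0;
}

/* Range query [lo, hi]: find the first leaf, then follow the leaf chain.
   Copies at most max keys into out and returns how many were copied. */
size_t bpt_range(const struct Node *root, int lo, int hi, int *out, size_t max) {
    size_t count = 0;
    for (const struct Node *leaf = find_leaf(root, lo); leaf; leaf = leaf->next)
        for (int i = 0; i < leaf->n_keys; i++) {
            if (leaf->keys[i] > hi) return count;
            if (leaf->keys[i] >= lo && count < max) out[count++] = leaf->keys[i];
        }
    return count;
}
```

For example, with leaves {1, 3, 5}, {7, 9}, {12, 15, 20} under a root with keys {7, 12}, a range query for 4 to 13 starts at the first leaf, follows the leaf links, and returns 5, 7, 9, 12.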
B+ Tree
Pros and Cons of B+ Tree
• A B+-tree is a more versatile storage structure than
hashing.
• It supports searching and retrieval based on exact key
match, pattern matching, range of values, and partial
key specification.
• The B+-tree index is dynamic, growing as the relation
grows. Thus, unlike ISAM, the performance of a B+-tree
file does not deteriorate as the relation is updated.
• Retrieval of tuples/records is more efficient than with ISAM.
• If the relation is not frequently updated, the ISAM
structure may be more efficient.
RAID 0
• In this level, a striped array of disks is
implemented. The data is broken down into
blocks and the blocks are distributed among
disks. Each disk receives a block of data to
write/read in parallel. It enhances the speed
and performance of the storage device. There
is no parity or backup in Level 0.
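Striping can be expressed as a simple address mapping. The sketch assumes a round-robin layout with a stripe unit of one block, a simplification (real arrays often use larger stripe units); the names are hypothetical:

```c
/* Location of a logical block inside a RAID 0 array. */
struct Location {
    int disk;
    long block_on_disk;
};

/* Round-robin striping: block b goes to disk b mod n, at row b / n. */
struct Location raid0_map(long b, int n_disks) {
    struct Location loc;
    loc.disk = (int)(b % n_disks);
    loc.block_on_disk = b / n_disks;
    return loc;
}
```

Logical blocks 0, 1, 2, 3 of a 4-disk array land on four different disks, so they can be read in parallel.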
RAID 1
• RAID 1 uses mirroring techniques. When data
is sent to a RAID controller, it sends a copy of
data to all the disks in the array. RAID level 1 is
also called mirroring and provides 100%
redundancy in case of a failure.
RAID 2
• RAID 2 records an Error Correction Code (Hamming
code) for its data, striped across different disks.
As in level 0, data is striped, but at the bit level:
each data bit in a word is recorded on a separate disk,
and the ECC codes of the data words are stored on a
different set of disks. Due to its complex structure
and high cost, RAID 2 is not commercially available.
RAID 3
• RAID 3 stripes the data onto multiple disks at the
byte level. The parity generated for each data word
is stored on a separate parity disk. This allows it
to recover from a single disk failure.
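The parity used by RAID levels 3 to 5 is commonly the bitwise XOR of the corresponding data on each disk, which is what makes recovery from a single failed disk possible. A sketch, assuming three data disks and a 4-byte stripe unit chosen only for illustration; the function names are hypothetical:

```c
#include <string.h>

#define N_DATA 3   /* data disks (assumed) */
#define UNIT   4   /* bytes per stripe unit (assumed) */

/* Parity unit = XOR of the corresponding bytes on all data disks. */
void compute_parity(unsigned char data[N_DATA][UNIT], unsigned char parity[UNIT]) {
    memset(parity, 0, UNIT);
    for (int d = 0; d < N_DATA; d++)
        for (int i = 0; i < UNIT; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild the failed disk's unit: XOR the parity with every surviving
   data unit; since x ^ x == 0, only the missing data remains. */
void rebuild(unsigned char data[N_DATA][UNIT], const unsigned char parity[UNIT],
             int failed, unsigned char out[UNIT]) {
    memcpy(out, parity, UNIT);
    for (int d = 0; d < N_DATA; d++)
        if (d != failed)
            for (int i = 0; i < UNIT; i++)
                out[i] ^= data[d][i];
}
```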
RAID 4
• In this level, entire blocks of data are written onto
the data disks, and the parity is generated and stored
on a separate, dedicated parity disk. Note that level 3
uses byte-level striping, whereas level 4 uses block-level
striping. Both level 3 and level 4 require at least three
disks to implement RAID.
RAID 5
• RAID 5 writes whole data blocks onto different
disks, but the parity generated for each block
stripe is distributed among all the disks rather
than stored on a separate, dedicated parity
disk.
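Distributing parity can be done in several layouts. The sketch below assumes one simple rotation in which stripe s keeps its parity on disk n - 1 - (s mod n) and its data blocks fill the other disks in increasing order; this particular layout and the names are assumptions, not the only way RAID 5 places parity.

```c
/* Disk holding the parity of a given stripe (assumed rotation). */
int parity_disk(long stripe, int n) {
    return n - 1 - (int)(stripe % n);
}

struct Location {
    long stripe;
    int disk;
};

/* Map logical data block b to its stripe and disk: each stripe holds
   n - 1 data blocks, placed on the disks that do not hold its parity. */
struct Location raid5_map(long b, int n) {
    struct Location loc;
    loc.stripe = b / (n - 1);
    int k = (int)(b % (n - 1));          /* k-th data block in the stripe */
    int p = parity_disk(loc.stripe, n);
    loc.disk = k < p ? k : k + 1;        /* skip over the parity disk */
    return loc;
}
```

With 4 disks, stripes 0, 1, 2, 3 put their parity on disks 3, 2, 1, 0, so no single disk absorbs every parity write as the dedicated parity disk does in RAID 4.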
RAID 6
• RAID 6 is an extension of level 5. In this level, two
independent parities are generated and stored in distributed
fashion among multiple disks. Two parities provide additional
fault tolerance. This level requires at least four disk drives to
implement RAID.
Comparing RAID
• RAID level 0 is the right choice when data safety
and security are not a major concern.
• Level 0 is used in high-performance
applications.
Comparing RAID
• RAID level 1 is the easiest level from which to
rebuild data: the data of a failed disk can simply
be copied from its mirror disk.
• With the other levels, rebuilding the data of a
failed disk requires accessing all the other disks
in the array.
Comparing RAID
• Rebuild performance is an important factor in
high-performance database systems.
• In fact, the time taken to rebuild the data may
become a significant part of the repair time,
so rebuild performance also influences the
mean time to data loss.
Comparing RAID
• Block striping (as in RAID 5) provides good data
transfer rates for large transfers, while small
transfers use only a few disks.
• For small transfers, the access time dominates,
which diminishes the benefit of parallel reads.
• RAID level 3 is also a poor choice for small data
transfers, since byte-level striping makes every
transfer involve all the disks.
Comparing RAID
• RAID level 6 offers better reliability than RAID
level 5.
• Designers can use RAID level 6 in applications
where data safety and security are a major
concern.
• Currently, many RAID implementations do not
support RAID level 6.
Comparing RAID
• RAID level 1 is good for applications such as
storage of log files in a database system, as
it offers the best write performance.
• On the other hand, RAID level 5 has lower storage
overhead than RAID level 1, but a higher time
overhead for writes.
• It is better to choose RAID level 5 for
applications where data is read frequently but
written rarely.
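The write overhead of RAID 5 comes from parity maintenance. For a small write, the controller can read the old data and old parity, compute the new parity as old parity XOR old data XOR new data, and write both back: two reads and two writes, compared with two writes for a two-disk mirror in RAID 1. A sketch of the parity update, with a hypothetical function name:

```c
#include <stddef.h>

/* Small-write parity update for parity-based RAID (levels 4 and 5):
   new_parity = old_parity XOR old_data XOR new_data, computed in place,
   so the other data disks in the stripe need not be read. */
void update_parity(const unsigned char *old_data, const unsigned char *new_data,
                   unsigned char *parity, size_t len) {
    for (size_t i = 0; i < len; i++)
        parity[i] ^= (unsigned char)(old_data[i] ^ new_data[i]);
}
```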
Comparing RAID
• RAID level 1 and RAID level 5 have become
the most common choices among all the
RAID levels.
• RAID level 1 suits applications with high
input-output requirements and moderate storage
requirements, while RAID level 5 suits applications
where lower storage overhead matters and data is
written rarely.
