Storage Dev 20jun
The invention of magnetic tape for sound recording pre-dates the invention of the computer by
many years so this technology was the first to be utilised as a storage device. Magnetic tape is
still widely used for long-term storage of archive material and for backup copies of large files.
Interaction with a magnetic medium is controlled by a read head and a write head. A read head uses
the basic law of physics that a state of magnetization will affect an electrical property; a write
head uses the reverse law. Although they are separate devices, the two heads are combined in a
read-write head. The two alternative states of magnetization are interpreted as a 1 or 0. On a
magnetic hard disk, the logical construction is that data is stored in concentric tracks. Each track
consists of a sequence of bits, but these are formatted into sectors, where each sector contains a
defined number of bytes. The sector becomes the smallest unit of storage. To store a file, a
sufficient number of sectors has to be allocated, but these may or may not be adjacent to each other.
A disadvantage of tape is that
access to the data stored is sequential. If an application needs data from the far end of the tape,
then it will take a long time to read because it will have to read through the rest of the tape from
the start. This sequential access makes magnetic tape unsuitable for most data-handling
applications. However, magnetic tape can hold large volumes of data and it is easily transported.
This makes it suitable for offline storage of data. It is these qualities which make it valuable for
producing and storing backup and archive copies of files stored on a computer system.
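Because the sector is the smallest unit of storage, a file always occupies a whole number of
sectors. The short sketch below illustrates that arithmetic. The 512-byte sector size and the
function name are assumptions chosen for the example, not values taken from these notes.

```python
def sectors_needed(file_size_bytes, sector_size_bytes=512):
    """Return how many whole sectors are needed to hold a file.

    The 512-byte default is an assumed sector size for illustration only.
    """
    if file_size_bytes < 0 or sector_size_bytes <= 0:
        raise ValueError("sizes must be non-negative / positive")
    # Ceiling division: a partly filled sector still has to be allocated.
    return -(-file_size_bytes // sector_size_bytes)


# A 1000-byte file does not fit in one 512-byte sector, so two are allocated.
print(sectors_needed(1000))
```

The unused space in the last sector is simply wasted, which is one reason why the sector size
matters.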
A hard drive is considered to be a direct-access read-write device because any sector can be
chosen for reading or writing. However, the data in a sector has to be read sequentially. One
point not covered above is how manufacturers can effectively deal with the fact that the physical
length of a track increases from the innermost track to the outermost track. If this fact is
ignored, the data storage capacity must be less than it potentially could be. Another point is that
the storage capacity of disk drives has continued to improve while physical sizes have continued to
shrink, so more data can be saved on a single drive. Nowadays computers are commonly equipped with
a 500GB hard drive, which can be extended to 2TB. Most hard drives in desktop computers operate at
the standard speed of 7200RPM.
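The rotation speed gives a feel for how long the drive waits for a sector to arrive under the
read-write head. The sketch below assumes that, on average, the required sector is half a
revolution away; the function name is hypothetical and seek time is ignored.

```python
def average_rotational_latency_ms(rpm):
    """Estimate the average wait, in milliseconds, for a sector to rotate under the head.

    Assumes the wanted sector is, on average, half a revolution away.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    seconds_per_revolution = 60.0 / rpm
    return (seconds_per_revolution / 2) * 1000


# At 7200RPM one revolution takes about 8.33 ms, so the average wait is about 4.17 ms.
print(round(average_rotational_latency_ms(7200), 2))
```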
Solid-state media
Another type of flash solid-state storage device is a memory card. These are also cheap,
portable and compact. They are mostly used in cameras, phones and MP3 players. They come in
different capacities and sizes. The most common type of memory card is an SD card, which may
contain pictures taken with a camera or music for an MP3 player. A subscriber identity module (SIM)
card is a type of memory card because it stores phone numbers and text messages.
Optical media
Optical storage was developed from existing technology not associated with computing systems.
Large quantities of data can be stored on the surface and the medium is portable from one
machine to another.
The data on a CD-ROM or DVD-ROM cannot be altered. For this reason, manufacturers have
used CD-ROMs to distribute software and large data files, such as encyclopedias. Data-rich
systems are now produced on DVD rather than CD because DVDs provide greater storage capacity.
A Compact Disc Read-Only Memory (CD-ROM) disc can hold about 700MB of data which cannot be
altered, so it cannot be deleted accidentally. CD-ROMs use random access file organization. A
Digital Versatile Disc Read-Only Memory (DVD-ROM) disc can hold about 4.7GB of data. DVD-ROMs are
also available in a dual-layer form which holds nearly twice the data. They are used to store
high-quality video because of their high capacity. They also use random access file organization.
The read-write version (CD-RW), which came later, provided the needed write functionality.
However, the CD gave way to the digital versatile disc (DVD). A more recent, higher-capacity
technology is the Blu-ray Disc (BD).
The discs spin and a laser beam is reflected from a surface which is sandwiched between a
substrate and a protective outer coating. For a CD-ROM, the reflective surface is manufactured
with indentations, called 'pits', separated by what are referred to as 'lands'. When the disc is being
read, the travel of the laser beam to a pit causes a difference in phase compared to reflection
from a land. This phase difference is recognized by the photodiode detector and attached
circuitry and interpreted as a 1 or 0.
For CD-RW and DVD-RW technologies, when data is being written to the disc (the 'burn' process) the
heat generated by the absorption of the laser light changes the material to liquid form. Depending
on the intensity of the laser light, the material reverts to either a crystalline or an amorphous
solid form when it cools. When the disc is read, the laser light is reflected from the crystalline
solid but not from the amorphous solid, allowing the coding of a 1 or 0. Data on an optical disc is
stored along a single spiral track. Despite there being only this one path, the formatting of the
data into sectors allows the disc to be used as a direct-access device, just as is the case for a
magnetic hard disk. Most personal computer systems are fitted with a CD or DVD drive, which makes
the data stored truly portable between machines.
Figure: a magnified view of the pits and lands on the surface of a CD. The different patterns
relate to the data on the disc.
Serial files
A serial file contains records which have no defined order. A typical use of a serial file would be
for a bank to record transactions involving customer accounts. A program would be
running. Each time there was a withdrawal or a deposit the program would receive the
details as data input and would record these in a transaction file. The records would enter
the file in chronological order but otherwise the file would have no ordering of the records.
A text file can be considered to be a type of serial file but it is different because the file has
repeating lines which are defined by an end-of-line character or characters. There is no
end-of-record character. A record in a serial file must have a defined format to allow data to be
input and output correctly.
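The idea of a serial file with a defined record format can be sketched as follows. The record
layout (a 6-byte account identifier, a 1-byte transaction type and an 8-byte amount), the account
identifiers and the function names are all assumptions made for this illustration, and an
in-memory byte stream stands in for a file on disk.

```python
import io
import struct

# Assumed fixed record format: 6-byte account id, 1-byte type (D or W), 8-byte float amount.
RECORD = struct.Struct("<6s1sd")


def append_record(serial_file, account_id, kind, amount):
    """Write one record at the end of the file, so records stay in arrival order."""
    serial_file.seek(0, io.SEEK_END)
    serial_file.write(RECORD.pack(account_id.encode("ascii"), kind.encode("ascii"), amount))


def read_all_records(serial_file):
    """Read the file record by record from the start, using the fixed record size."""
    serial_file.seek(0)
    records = []
    while True:
        chunk = serial_file.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of file reached
        account_id, kind, amount = RECORD.unpack(chunk)
        # Short account ids were padded with zero bytes when packed; strip the padding.
        records.append((account_id.rstrip(b"\x00").decode("ascii"), kind.decode("ascii"), amount))
    return records


transaction_file = io.BytesIO()  # stands in for a file on disk
append_record(transaction_file, "A102", "D", 50.0)
append_record(transaction_file, "A007", "W", 20.0)
append_record(transaction_file, "A102", "W", 5.0)
print(read_all_records(transaction_file))
```

Because every record has the same length, no end-of-record character is needed; the reader simply
takes the next fixed-size block of bytes.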
Serial access
Data are stored in the file in the order in which they arrive. This is the simplest form of
storage, but the data are effectively unstructured, so finding an item again may be very difficult.
This sort of data storage is only used when it is unlikely that the data will be needed again or
when the order of the data should be determined by when it is input. A good example of a serial
file is the book that you are reading now. The words were all typed in, in order, and that is how
they should be read. Reading this book would be impossible if all the words were in alphabetic
order. Another example of the use of a serial file is discussed in the section “Backup and
archiving data”.
Sequential files
A sequential file has records that are ordered. It is the type of file suited to long-term
storage of data. As such it should be the type of file that is considered as an alternative to a
database. The discussion in Chapter 10 (Section 10.01) compared a text file with a database
but the arguments for using a database remain the same if a sequential file is used for the
comparison. In the banking scenario, a sequential file could be used as a master file for an
individual customer account. Periodically, the transaction file would be read and all affected
customer account master files would be updated.
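The periodic update in the banking scenario might be sketched as below. This is a simplification
under stated assumptions: a dictionary of balances stands in for the customer account master
files, a list of tuples stands in for the transaction file, and all names and values are
hypothetical.

```python
def apply_transactions(master_balances, transactions):
    """Return new balances after applying every transaction in arrival order.

    master_balances: dict mapping account id -> balance (stands in for the master files).
    transactions: list of (account_id, kind, amount) read from the transaction file.
    """
    updated = dict(master_balances)  # leave the old version untouched
    for account_id, kind, amount in transactions:
        balance = updated.get(account_id, 0.0)
        if kind == "deposit":
            updated[account_id] = balance + amount
        elif kind == "withdrawal":
            updated[account_id] = balance - amount
        else:
            raise ValueError(f"unknown transaction type: {kind}")
    return updated


master = {"A102": 100.0, "A007": 40.0}
day_transactions = [
    ("A102", "deposit", 50.0),
    ("A007", "withdrawal", 20.0),
    ("A102", "withdrawal", 5.0),
]
print(apply_transactions(master, day_transactions))
```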
In order to allow the sequential file to be ordered there has to be a key field for which the
values are unique and sequential but not necessarily consecutive. It is worth emphasising
the difference between key fields and primary keys in a database table, where the values
are required to be unique but not to be sequential. In a sequential file, a particular record is
found by sequentially reading the value of the key field until the required value is found.
A file of data can usually only be read from beginning to end. When data records are stored in
sequence, using a key field, this is known as a sequential file.
Consider again the example of a set of students whose data are stored in a computer. The data
could be stored in alphabetic order of their names. They could be stored in order of performance in
a Computing exam, or by date of birth with the oldest first. However it is done, the data have been
arranged so that it is easier to find a particular record. If the data are in alphabetic order of
name and the computer is asked for Zaid’s record, it must still start looking from the
beginning of the file.
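A search of a sequential file can be sketched as follows. The student names and the function name
are illustrative assumptions. The early stop when a larger key is met is not stated in the notes;
it is included here as one benefit that follows from the records being ordered.

```python
def find_in_sequential_file(records, target_key):
    """Search records held in ascending key order, starting from the beginning.

    Returns (data, reads), where reads is the number of records examined.
    """
    reads = 0
    for key, data in records:
        reads += 1
        if key == target_key:
            return data, reads
        if key > target_key:
            break  # ordered file: the key cannot appear further on
    return None, reads


students = [
    ("Ahmed", "record for Ahmed"),
    ("Jawad", "record for Jawad"),
    ("Mahmood", "record for Mahmood"),
    ("Zaid", "record for Zaid"),
]

# Zaid is last, so every record has to be read first.
print(find_in_sequential_file(students, "Zaid"))
# Bilal is absent; the search can stop as soon as a later name is reached.
print(find_in_sequential_file(students, "Bilal"))
```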
We need to be aware of the file processing features
available in the programming language we shall use. It
is highly unlikely that you would be able to read records
starting at the end of the file.
Direct-access files
Direct-access files are sometimes referred to as 'random-access' files but, as with random-access
memory, the randomness is only that the access is not defined by a sequential
reading of the file. For large files, direct access is attractive because of the time that would
be taken to search through a sequential file. In an ideal scenario, data in a direct-access file
would be stored in an identifiable record which could be located immediately when required.
Unfortunately, this is not possible. Instead, data is stored in an identifiable record but finding
it may involve an initial direct access to a nearby record followed by a limited serial search.
The position chosen for a record must be calculated using data in the record
so that the same calculation can be carried out when subsequently there is a search for
the data. The normal method is to use a hashing algorithm. This takes as input the value
for the key field and outputs a value for the position of the record relative to the start of the
file. The hashing algorithm must take into account the potential maximum length of the file,
that is, the number of records the file will store. A simple example of a hashing algorithm, if
the key field has a numeric value, is to divide the value by a suitably large number and use
the remainder from the division to define the position. This method will not create unique
positions. If a hash position is calculated that duplicates one already calculated by a different
key, the next position in the file is used. This is why a search will involve a direct access
possibly followed by a limited serial search.
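The remainder method and the "next position" rule described above can be sketched like this. The
file size of 11 records, the key values and the function names are assumptions for illustration,
and a list of slots stands in for the positions in the file. Wrapping round to the start of the
file when the end is reached is also an assumption; the notes only say that the next position is
used.

```python
FILE_SIZE = 11  # assumed maximum number of records the file will store


def hash_position(key, file_size=FILE_SIZE):
    """Hashing algorithm: the remainder after dividing the numeric key by the file size."""
    return key % file_size


def insert_record(slots, key, data):
    """Store a record at its hash position, or at the next free position after it."""
    start = hash_position(key, len(slots))
    for offset in range(len(slots)):
        position = (start + offset) % len(slots)  # wrap-around is an assumption
        if slots[position] is None or slots[position][0] == key:
            slots[position] = (key, data)
            return position
    raise RuntimeError("file is full")


def find_record(slots, key):
    """Direct access to the hash position, then a limited serial search if needed."""
    start = hash_position(key, len(slots))
    for offset in range(len(slots)):
        position = (start + offset) % len(slots)
        if slots[position] is None:
            return None  # an empty position means the key was never stored
        if slots[position][0] == key:
            return slots[position][1]
    return None


slots = [None] * FILE_SIZE
print(insert_record(slots, 23, "Jawad"))    # 23 % 11 = 1, stored at position 1
print(insert_record(slots, 34, "Mahmood"))  # 34 % 11 = 1, collision, stored at position 2
print(insert_record(slots, 45, "Zaid"))     # 45 % 11 = 1, collision, stored at position 3
print(find_record(slots, 34))
```

Note that this simple search relies on records never being physically removed; an empty position
is taken to mean that the key is not in the file.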
File access
Once a file organisation has been chosen and the data has been entered into a file, the
question now to be considered is how this data is to be used. If an individual data item is to
be read then the access method for a serial file is to successively read record by record until
the required data is found. If the data is stored in a sequential file the process is similar but
only the value in the key field has to be read. For a direct-access file, the value in the key field
is submitted to the hashing algorithm which then provides the same value for the position in
the file that was provided when the algorithm was used at the time of data input.
File access might also be needed to delete or edit data. The normal approach with a
sequential file is to create a new version of the file. Data is copied from the old file to the new
file until the record is reached which needs deleting or editing. If deletion is needed, reading
and copying of the old file continues from the next record. If a record has changed, an edited
version of the record is written to the new file and then the remaining records are copied to
the new file. For a direct-access file there is no need to create a new file (unless the file has
become full). A deleted record can have a flag set so that in a subsequent reading process the
record is skipped over.
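Both approaches to deleting or editing can be sketched as below. Lists of tuples stand in for the
files, and the function names, keys and the layout of the deletion flag are assumptions made for
the example.

```python
def rewrite_sequential_file(old_records, target_key, new_data=None):
    """Copy the old file to a new one, deleting or editing the target record on the way.

    If new_data is None the target record is deleted; otherwise it is replaced.
    """
    new_records = []
    for key, data in old_records:
        if key != target_key:
            new_records.append((key, data))      # unaffected record: copy as it is
        elif new_data is not None:
            new_records.append((key, new_data))  # edited version is written instead
        # otherwise the record is being deleted, so nothing is written
    return new_records


def flag_deleted(slots, position):
    """Direct-access file: mark a record as deleted without creating a new file."""
    key, data, _deleted = slots[position]
    slots[position] = (key, data, True)


def read_live_records(slots):
    """Read the file, skipping any record whose deletion flag is set."""
    return [(key, data) for key, data, deleted in slots if not deleted]


old_file = [(1, "a"), (2, "b"), (3, "c")]
print(rewrite_sequential_file(old_file, 2))       # record 2 deleted
print(rewrite_sequential_file(old_file, 2, "B"))  # record 2 edited

direct_file = [(23, "Jawad", False), (34, "Mahmood", False)]
flag_deleted(direct_file, 0)
print(read_live_records(direct_file))
```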
Serial file organisation is well suited to batch processing or for backing up data on magnetic
tape. However, if a program needs a file in which individual data items might be read,
updated or deleted then direct-access file organisation is the most suitable and serial file
organisation the least suitable.
Random access
A file that stores data in no particular order (a random access file) is very useful because it
makes adding new data very simple. In any form of sequential file, an individual item of data is
very dependent on other items of data. Jawad cannot be placed after Mahmood because that is the
wrong “order”.
However, it is necessary to have some form of order because otherwise the file cannot be read
easily. It would be wonderful if the computer could work out where data are stored by looking at
the data that is to be retrieved. In other words, the user asks for Jawad’s record and the computer
can go straight to it because the word Jawad tells it where it is being stored. How this
can be done is explained in the next section.