FILE HANDLING

Types of files
1. Master File: a permanent file kept up to date by applying the transactions that occur
during the operation of the business. It holds all the information needed for the job,
both permanent and semi-permanent data. Static (permanent) data stored in database
files can include surname, first names, date of birth, etc.
2. Transaction Files: temporary files containing data that changes regularly. They are
usually created on a daily basis and are used to update the master file. A transaction
file contains details of all the transactions that have occurred in the last period only,
such as sales per day or students' marks in a weekly test. Transaction files can be
discarded once updating has occurred.
3. Reference Files: files that contain permanent data which is required for reference
purposes only, such as data on tax bands, formulae, etc. No changes are made to these
files.
4. Data File: a set of related records (either written or electronic) kept together.
5. Text File: a file that accommodates only string data; no graphs, pictures or tables.
Characters can be organised on a line-by-line basis.

FIXED LENGTH RECORDS


These are records that allocate a specific amount of space for data, usually a fixed
number of characters. Every field contains exactly the same number of bytes, and
every record contains exactly the same number of fields. For instance, a school keeps
student records in a fixed-length file. The student number is allocated 6 characters,
the surname 10 characters, the first name 10 characters, the date of birth 6 characters,
sex 1 character and class 2 characters, in that order, in the computer database file. In
total, the length of each record is 35 characters.
The following student details are to be entered into the computer:
Student Number: 012999, Surname: Kapondeni, First Name: Tungamirirai
Date of Birth: 7th of February 1978, Sex: Male, Class: Form 4A
When entered into the database, the record will appear as follows (unused spaces
shown as _):

Student Number | Surname    | First Name | Date of Birth | Sex | Class
012999         | Kapondeni_ | Tungamirir | 070278        | M   | 4A

From the table above, it can be noticed that:


- The Sex field is coded to accommodate only the letters M or F. This is shorter, and
therefore faster both for entering data into the computer and for searching records,
than entering the words Male or Female.
- The surname Kapondeni is shorter than the allocated 10 spaces; the remaining spaces
lie idle.
- The first name Tungamirirai is longer than the allocated space, so the extra
characters are cut off.

Fixed length records have the following advantages:


 Entering data is faster as records are shorter and less typing is required.
 Less memory is required.
 Fewer data entry errors are encountered.
 It is faster to carry out searches.
 Faster to do validation checks and procedures.
 They are easier for programmers to work with than variable length records.

 They allow an accurate estimate of disk storage requirements. Thus disk storage
space can be easily managed as records occupy a specific number of characters.
 They are very easy to update.
 Faster to access records.

However, fixed length records have the following disadvantages:


 Can lead to wastage of disk storage space when used to store variable-length data;
for example, not all surnames are of the same length.
 Some spaces may lie idle as data entered will be shorter than the space allocated.
 Some data to be entered may be too long for the space allocated and therefore
will be cut.

VARIABLE LENGTH RECORDS


These are records that allow data to occupy only the amount of space that it needs. They
allow data with a varying (different) number of characters or sizes. Records may also
have a varying number of fields: the number of bytes in a particular field may vary from
record to record, and so may the number of fields. Markers usually show where each
field or record starts and ends, for example:

012999*Kapondeni*Tungamirirai*070278*M*4A≈

NB:- * indicates the end-of-field marker and ≈ indicates the end-of-record marker;
these allow the data to be separated out during processing.
Variable length records have the following advantages:
 They are more economical in terms of usage of disk storage space as they do not
allow spaces to lie idle.
 Less space is wasted on the storage medium.
 It allows as many fields as necessary to be held on a particular record, e.g. the
subjects taken in an exam by a particular student.
 More records can be packed into one physical block, thereby reducing the time spent
in reading the file.
 Data entered is not cut but appears exactly as entered, no matter how long it is; no
truncation of data occurs.
However, variable length records have the following disadvantages:
 End of field and end of record markers occupy disk storage space that might be
used to store data.
 These records are difficult to update as the transaction and master files might
have different lengths.
 The processing required to separate out the fields is complex.
 It is difficult to estimate file sizes accurately when a new system is being
designed.
 Records cannot be updated in situ.

FILE ORGANISATION
Refers to the way in which records in a file are stored, retrieved and updated. This
affects the number of records stored, access speed and updating speed. The most
common methods of file organisation are: Serial File Organisation, Sequential File
organisation, indexed – sequential file organisation and random (direct) file
organisation.

1. Serial File Organisation: This is whereby records are stored one after another as
they occur, without any definite order as on magnetic tapes. Data is not stored in any
particular sequence. Data is read from the first record until the needed data is found.
New records are added to the end of the file. Serial file organisation is not appropriate
for master files, since the records are not sorted and are therefore difficult to access
and to update. It is suitable for temporary/transaction files.
To delete a record (more complex):
 read through the file from the first record until the record to be deleted is found
 re-write the whole file to a new disk, omitting the unwanted record.

To add a new record (algorithm):

 open the file
 append the new record to the end of the file
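
A minimal sketch of these serial-file operations in Python, assuming one record per
line in a plain text file and that each record begins with its key (the file name and
record layout are illustrative, not prescribed by these notes):

# Serial file: records are stored one after another, in no particular order.

def add_record(filename, record):
    # Adding a record only requires appending it to the end of the file.
    with open(filename, "a") as f:
        f.write(record + "\n")

def find_record(filename, key):
    # Read from the first record until the needed record is found.
    with open(filename) as f:
        for line in f:
            if line.startswith(key):
                return line.rstrip("\n")
    return None  # record not found

def delete_record(filename, key):
    # Re-write the whole file, omitting the unwanted record.
    with open(filename) as old:
        kept = [line for line in old if not line.startswith(key)]
    with open(filename, "w") as f:
        f.writelines(kept)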

2. Sequential File Organisation: This is whereby records are stored one after another
and are sorted into key sequence, that is, in ascending or descending order of a given
key field, as on magnetic tapes. Sequential file organisation is appropriate for files with
a high hit rate, like payroll processing.
It is suitable for master files since the records are ordered. However, it can take a long
time to access required data, since records are accessed by reading from the first
record until the required data is found. Adding a new record is difficult, as the data
must be re-written so that the new record is inserted at its correct position, which
makes updating time consuming. Sequential organisation is used where all records
need processing. It is faster and more efficient than serial organisation.
To access/view a record, each record on the file must be read, starting from the
beginning of the file, until the required record is found.

To add a new record, copy the existing records up to the point where the new record is
to be inserted, insert the new record, then copy the rest of the file. The algorithm can be
as follows:
 open old master file for reading
 open new master file for writing
 set inserted to FALSE
 Repeat
 Read next record from the old file (call it current record)
 If (NOT inserted) AND (current record key > new record key) THEN
 Write new record to the new file
 Set inserted to TRUE
 End If
 Write current record to the new file
 Until EOF (old)
 If NOT inserted THEN
 Write new record to the new file
 End If
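
A sketch of this insertion-by-copying in Python, assuming one record per line and that
whole lines compare correctly as keys (the file names are illustrative):

def insert_record(old_name, new_name, new_record):
    # Copy existing records across, inserting the new record in key order.
    inserted = False
    with open(old_name) as old, open(new_name, "w") as new:
        for line in old:
            current = line.rstrip("\n")
            # The first record with a larger key marks the insertion point.
            if not inserted and current > new_record:
                new.write(new_record + "\n")
                inserted = True
            new.write(current + "\n")
        if not inserted:
            # The new record has the largest key: append it at the end.
            new.write(new_record + "\n")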

To delete a record, the whole file is copied over to a new sequential file, omitting the
record to be deleted.
Processing of records is faster than that of serial files.

Hit rate – the proportion or percentage of records accessed on any one run. In payroll
systems the hit rate is typically 100%, since every employee will be paid. Hit rate is
calculated by dividing the number of records accessed by the total number of records in
the file and then multiplying by 100. For example, if 270 records are accessed out of 300
records, the hit rate is 270/300 x 100 = 90%.
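
The same calculation written as a small Python function (the figures are those from the
example above):

def hit_rate(records_accessed, total_records):
    # Percentage of records accessed on one run.
    return records_accessed / total_records * 100

print(hit_rate(270, 300))  # 90.0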

3. Indexed-Sequential Files: This is whereby records are stored in key sequence, with
an index giving the disk address of each record or block, as supported by hard disks. It
supports batch processing. It is also used for master files, since the records are ordered.
It is also suitable for real-time processing applications like stock control, as it is fast in
accessing records and in updating them. It provides direct access to data, as on hard
disks, diskettes and compact disks. It ensures that data is accessed in order and that no
data is missed during accessing. It can provide direct access if requests are sent online.
Indexed sequential files consist of 3 basic parts:
 the index
 the home area
 the overflow area
The index:
Contains record keys and disk addresses. The record key can be one or more fields that
uniquely identify a record. Each record key is associated with a disk address (which can
be surface, track and sector number) to identify the specific sector of the home area.
Thus the index points to the home area.

The Home Area

This contains the data records stored in record key sequence. The home area is in
sequence and can be accessed sequentially. In some situations, it can also be accessed
randomly using the index.
Data is stored in blocks that contain several records; a block may be one or more
sectors of the disk.
Each block may be only partially filled in order to allow new records to be added later.
For example, if a block can accommodate 12 records, only 8 records may be saved in
each block, leaving room for new records to be added during execution. The proportion
of each block that is filled is called the packing density, which is usually around 70%.
Thus if the computer is using a 70% packing density, data is stored in 70% of each
block in the home area. The packing density is always less than 100% to allow the
insertion of additional records later.
The home area also points to the overflow area.

Overflow area
The home area may become full and unable to accommodate all the records. In this
case, additional records are placed in the overflow area, and the home area stores
pointers indicating the position of those records in the overflow area.

NB: However, it may take longer to process records that have been placed in the
overflow area. After reading the index, it takes a single disc access to read a record in
the home area, but each time the overflow area is accessed it takes at least two disc
accesses: one to read the home area and one to read the overflow. This problem can be
solved by re-organising the file using a housekeeping program, which copies the file to
a new file, placing all the overflow records into the home area and re-writing the
indices.
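
A toy sketch of an indexed-sequential lookup in Python. It assumes a block-level index
holding the highest key in each home-area block and a single shared overflow list; the
data and structure are illustrative only:

# Each home-area block is a list of (key, data) records in key sequence.
home_area = [
    [(10, "A"), (20, "B")],   # block 0
    [(30, "C"), (40, "D")],   # block 1
]
index = [20, 40]              # highest key held in each block
overflow = [(35, "E")]        # records that did not fit in their home block

def lookup(key):
    # 1. Read the index to find the block that should hold the key.
    for block_no, highest in enumerate(index):
        if key <= highest:
            # 2. Search that home-area block sequentially.
            for k, data in home_area[block_no]:
                if k == key:
                    return data           # found in the home area
            break
    # 3. Not in the home area: search the overflow area (extra disc access).
    for k, data in overflow:
        if k == key:
            return data
    return None

print(lookup(30))  # C, found in the home area
print(lookup(35))  # E, found in the overflow area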

4. Random (Direct/Hash/Relative) File Organisation: This is whereby records are not
stored in any order but are stored and accessed according to their disk address or
relative position within the file, calculated from the primary key of the record, as
supported by hard disks and compact disks. The hashing algorithm/formula translates
the primary key into an address, for example using the modulo method.
To add a new record, use the hashing algorithm to work out the appropriate location. If
the location is empty, the record is inserted/written; otherwise the next block is
examined until an empty space is found.
To search for/access a record, its address is calculated from the record key using the
hashing algorithm and the record at that address is read. If it is not the required record,
the next record is read and examined until either the record is found or an empty space
is encountered. This organisation is suitable for online systems where a fast response is
required.
To delete a record, set a flag marking it as deleted but leave the values in place; the
space can be reused but is not actually empty. The record is not physically deleted, just
marked as deleted.
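
A minimal sketch of these three operations in Python, assuming one record per block, a
simple modulo hashing algorithm and the "examine the next block" rule for collisions
(the table size and record format are illustrative):

BLOCKS = 500
DELETED = "DELETED"              # flag marking a deleted record
file_blocks = [None] * BLOCKS    # None means the block is empty

def hash_key(key):
    # Hashing algorithm: translate the record key into a block address.
    return key % BLOCKS

def add_record(key, data):
    addr = hash_key(key)
    # If the block is occupied, examine the next one until a space is found.
    while file_blocks[addr] not in (None, DELETED):
        addr = (addr + 1) % BLOCKS   # loop back to address 000 at the end
    file_blocks[addr] = (key, data)

def find_record(key):
    addr = hash_key(key)
    # Read records until the key is found or an empty space is encountered.
    while file_blocks[addr] is not None:
        if file_blocks[addr] != DELETED and file_blocks[addr][0] == key:
            return file_blocks[addr]
        addr = (addr + 1) % BLOCKS
    return None

def delete_record(key):
    addr = hash_key(key)
    while file_blocks[addr] is not None:
        if file_blocks[addr] != DELETED and file_blocks[addr][0] == key:
            file_blocks[addr] = DELETED   # mark as deleted, do not remove
            return
        addr = (addr + 1) % BLOCKS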

Structure of random files


Records are stored in blocks which are not necessarily in sequence. The position of the
record is determined by a hashing algorithm or randomising function. When a
record is to be stored in a file, a hashing algorithm is applied to the record key to
determine the block that is to be used. For example, the blocks may range from 0 to 499,
and the hashing algorithm generates a number within this range. The records are stored
in this format:

Record Key    Block
22387         300
13495         201
58905         104
48676         349
68798         34

It is appropriate where extremely fast access to data is required, as in airline
reservation. Updating of records is in situ, very simple and very fast. Hard disks,
compact disks and diskettes support random file organisation.
When records are deleted, they are just marked as deleted but are not removed from
the file. These deleted records take up space and may slow down processing. This can
be solved by copying the records to a new file, omitting the deleted ones.

Overflow
If there is no space on the block, collision is said to have occurred and the record must
be stored elsewhere.
A re-hashing algorithm is applied to the address of the full block in order to give
another block that is not full. If that block is also full, the re-hashing is applied again
until a block with space is found. Alternatively, an overflow area can be used, just as in
indexed-sequential files.
NB:- If no further information is given, assume that overflow records are stored in the
next block

Hashing algorithm - used to translate record key into an address. However, synonyms
may occur, i.e. two record keys generate the same address (use overflow area and flag)

To solve the problem of clashes (collisions) of blocks after applying the hashing
algorithm:
(a) Subsequent locations are read until an empty location is found, and the record is
inserted there. If the maximum address is reached, the search loops back to the
first address, i.e. position 000.
(b) A bucket (area of memory) can be set aside for overflow. Any clashing record is
inserted in the bucket, or in the next location, in serial form.
(c) Another method is to use the existing record as the head of a list. Pointers are
then used to point to records with the same hash value, and new records are
inserted in a free location, as in the sketch below.
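
A brief sketch of method (c) in Python, keeping an in-memory chain of synonyms per
block address in place of on-disk pointers (the table size and record layout are
illustrative):

blocks = {}   # block address -> chain of (key, data) records, head first

def add_record(key, data, table_size=500):
    addr = key % table_size               # hashing algorithm
    # Records with the same hash value join the chain for that address.
    blocks.setdefault(addr, []).append((key, data))

def find_record(key, table_size=500):
    addr = key % table_size
    # Follow the chain of records sharing this hash value.
    for k, data in blocks.get(addr, []):
        if k == key:
            return data
    return None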

FILE PROCESSING
Refers to any form of activity that can be done using files. This includes: file referencing,
sorting, maintenance and updating.
1. File Referencing/Interrogation: This involves searching for a record and displaying it
on the screen in order to obtain certain information, leaving the record unchanged. The
record can also be printed.
2. Sorting: Refers to a process of arranging (organising) records in a specific ordered
sequence, like in ascending or descending order of the key field.
3. Merging Files: This is the process of combining two or more files/records of the
same structure into one. Below is an example of how records can be merged:
Record A (sorted): 12 34 71 78 101 103
Record B (unsorted): 67 3 90 12
Record C (Records A and B merged and sorted; the duplicate 12 appears once):
3 12 34 67 71 78 90 101 103
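
The same merge as a short Python sketch, combining the two records and keeping each
key once (the key lists are those from the example above):

record_a = [12, 34, 71, 78, 101, 103]   # sorted
record_b = [67, 3, 90, 12]              # unsorted

# Combine both records, drop duplicate keys, and sort the result.
record_c = sorted(set(record_a + record_b))

print(record_c)  # [3, 12, 34, 67, 71, 78, 90, 101, 103]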

4. File Maintenance: This is the process of reorganising the structure of records and
changing (adding, removing or editing) fields. It may also involve updating the more
permanent fields on each record, or adding and deleting records.
5. File Updating: Updating is the process of making the necessary changes to files and
records by entering recent information. Only master files are updated, and they must be
kept up-to-date. For updating to occur, one of the following must have happened: a new
record has been entered; an unwanted record has been deleted; or an amendment
(change) to the existing data has been made, e.g. a change in date of birth only.

The most common methods of file updating are updating in situ and updating by
copying.

a. Updating by copying
This happens in sequential file updating. The transaction file must be sorted in the same
order as the master file records. Updating is done through the following steps:
- A record is read from the master file into memory.
- A record is then read from the transaction file into memory.
- The record keys from each file are compared.
- If the record keys are the same, the master record is updated by moving fields from
the transaction file record, and the updated record is written to the new master file.

In sequential file updating, it is recommended to keep at least three versions of the
master file, which can be used for data recovery in case of a system failure or accidental
loss of data. The first master file is called the grandfather file, the second the father file
and the third the son file. This relationship is called the grandfather-father-son system
of files. The process of keeping three versions of the master file (grandfather-father-son)
as a result of sequential file updating is called file generations: the grandfather file is the
first generation file, the father file the second generation file and the son file the third
generation file. The following diagram illustrates the sequential file updating process:

*NB:- Always create data backups on compact disks or hard disks, and re-run the old
master file with the transaction file if the computer system fails or if data is lost. This is
a data recovery method that works well.

*NB:- A backup is a copy of file(s) kept on an alternative medium, like CD-ROM, in case
the original file is damaged or lost; it is used for recovery purposes. The original files
could be deleted accidentally, deleted by hackers, or corrupted by system failure or by
hackers.

Algorithm for sequential file updating

Open old master file for reading
Open transaction file for reading
Open new master file for writing
Read first master file record
Repeat
Read next transaction file record
While master file record key < transaction file record key
Write master file record to new master file
Read next master file record
End While
Update master file record using the transaction file record
Until EOF (transaction file)

While not EOF (master file)
Write master file record to new master file
Read next master file record
EndWhile
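
A sketch of this algorithm in Python, assuming one "key,value" record per line, keys
that compare correctly as strings, and a transaction file containing amendments only
(all names and formats are illustrative):

def update_by_copying(master_name, transaction_name, new_master_name):
    with open(master_name) as master_file, \
         open(transaction_name) as transaction_file, \
         open(new_master_name, "w") as new_master:
        records = (line.rstrip("\n") for line in master_file)
        current = next(records, None)        # read first master record
        for trans in transaction_file:       # read next transaction record
            t_key, t_value = trans.rstrip("\n").split(",", 1)
            # Copy master records with smaller keys straight across.
            while current is not None and current.split(",", 1)[0] < t_key:
                new_master.write(current + "\n")
                current = next(records, None)
            # Matching keys: update the master record in memory.
            if current is not None and current.split(",", 1)[0] == t_key:
                current = t_key + "," + t_value
        # Copy the remaining records, including the last updated one.
        while current is not None:
            new_master.write(current + "\n")
            current = next(records, None)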

b. Updating by overlay (in situ)
In this case, a record is accessed directly, read into memory, updated and written back
to its original position (in situ). This occurs in random and indexed-sequential files,
thus on devices like hard discs and memory sticks.
It applies to random files since a record is accessed by means of an address and can
therefore be written back to the same address after the updating process.
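
A minimal sketch of an in-situ update in Python, assuming fixed-length 35-byte records
in a binary file, so that a record's position is simply its record number multiplied by 35
(the file layout is illustrative):

RECORD_LENGTH = 35   # fixed-length records, as in the earlier example

def update_in_situ(filename, record_number, new_record):
    # Pad or cut the new data to exactly one record length.
    data = new_record.ljust(RECORD_LENGTH)[:RECORD_LENGTH].encode("ascii")
    with open(filename, "r+b") as f:            # open for in-place read/write
        f.seek(record_number * RECORD_LENGTH)   # go straight to the record
        f.write(data)                           # overwrite it in situ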
