Types of files
1. Master File: a permanent file kept up-to-date by applying the transactions that occur
during the operation of the business. It contains all the information needed for the job,
made up of permanent and semi-permanent data. Static (permanent) data stored in such
files can include Surname, First names, Date of birth, etc.
2. Transaction Files: These are temporary files containing data that changes regularly.
They are usually created on a daily basis and are used to update the master file. A
transaction file holds details of all transactions that occurred in the last period only,
for example sales per day or student marks in a weekly test. Transaction files can be
discarded once updating has taken place.
3. Reference files: These are files that contain permanent data which is required for
reference purposes only, such as data on tax bands, formulae, etc. No changes are made
to these files during processing.
4. Data file: A set of related records (either written or electronic) kept together.
5. Text file: a file that accommodates only string (character) data; it holds no graphs,
pictures or tables. Characters can be organised on a line-by-line basis.
NB: In a variable length record, the * indicates the end-of-field marker and the ≈
indicates the end-of-record marker; these markers allow the data to be separated out and
processed (a short sketch showing how the markers can be used appears after the two
lists below).
Variable length records have the following advantages:
- They are more economical in their use of disk storage space, as they do not leave
spaces lying idle; less space is wasted on the storage medium.
- They allow as many fields as necessary to be held on a particular record, e.g. the
subjects taken in an exam by a particular student.
- More records can be packed into one physical block, thereby reducing the time spent
in reading the file.
- Data entered is not cut but appears exactly as entered, however long it is; no
truncation of data occurs.
However, variable length records have the following disadvantages:
- The end-of-field and end-of-record markers occupy disk storage space that could
otherwise be used to store data.
- The records are difficult to update, as the corresponding transaction and master file
records might have different lengths.
- The processing required to separate out the fields is complex.
- It is difficult to estimate file sizes accurately when a new system is being designed.
- Records cannot be updated in situ.
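The sketch below, in Python, shows how the * and ≈ markers mentioned above could be used
to split a stream of variable length records into records and fields. It is only an
illustration: the sample names and the treatment of * as a separator between fields are
assumptions, not part of the original notes.
END_OF_FIELD = "*"
END_OF_RECORD = "≈"

def parse_records(raw):
    # Split a stream of variable length records into lists of fields.
    records = []
    for chunk in raw.split(END_OF_RECORD):
        if chunk:                                   # skip the empty piece after the last marker
            records.append(chunk.split(END_OF_FIELD))
    return records

sample = "Moyo*Tendai*12 Feb 1998≈Ncube*Sipho Brian*3 May 1997≈"
print(parse_records(sample))
# [['Moyo', 'Tendai', '12 Feb 1998'], ['Ncube', 'Sipho Brian', '3 May 1997']]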
FILE ORGANISATION
Refers to the way in which records in a file are stored, retrieved and updated. This
affects the number of records stored, the access speed and the updating speed. The most
common methods of file organisation are: serial file organisation, sequential file
organisation, indexed-sequential file organisation and random (direct) file
organisation.
1. Serial File Organisation: This is whereby records are stored one after another in the
order in which they arrive, without being sorted on any key field; new records are simply
added at the end of the file.
2. Sequential File Organisation: This is whereby records are stored one after another
and are sorted into a key sequence, that is, in ascending or descending order of a given
key field, as on magnetic tapes. Sequential file organisation is appropriate for files
with a high hit rate, such as payroll processing.
Sequential files are suitable for master files since the records are sorted. However, it
takes long to access a required record, since records are read from the first record
onwards until the required one is found. Adding a new record is difficult, as the file
has to be rewritten with the new record inserted at its correct position, which makes
updating time consuming. Sequential organisation is used where all (or most) records
need processing, and for such processing it is faster and more efficient than serial
organisation.
To access/view a record, each record in the file must be read, starting from the
beginning of the file, until the required record is found.
To add a new record, copy the existing records to a new file up to the point where the
new record is to be inserted, insert the new record, then copy the rest of the file. The
algorithm can be as follows:
open old master file for reading
open new master file for writing
start from beginning of old master file
Repeat
    Read next record from old master file (call it current record)
    If new record not yet inserted AND current record key > new record key THEN
        Write new record to the new file
    End If
    Write current record to the new file
Until EOF (old master file)
If new record not yet inserted THEN
    Write new record to the new file
End If
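The same algorithm can be shown as a short Python sketch. This is only an illustration,
not part of the original notes: the file names, the comma-separated record layout and
the numeric key in the first field are all assumptions.
def insert_record(old_path, new_path, new_record):
    # Copy the old master file to a new file, slotting the new record into key order.
    new_key = int(new_record.split(",")[0])
    inserted = False
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:                       # read the old master record by record
            key = int(line.split(",")[0])
            if not inserted and key > new_key:
                new.write(new_record + "\n")   # new record goes in just before the bigger key
                inserted = True
            new.write(line)                    # copy the existing record across
        if not inserted:                       # new key was larger than every existing key
            new.write(new_record + "\n")

insert_record("oldmaster.txt", "newmaster.txt", "105,Moyo,Tendai")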
To delete a record, the whole file is copied over to a new sequential file, omitting the
record to be deleted.
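A matching Python sketch of deleting by copying, again for illustration only, with the
same assumed comma-separated layout and numeric key in the first field:
def delete_record(old_path, new_path, key_to_delete):
    # Copy every record except the unwanted one to a new sequential file.
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:
            if int(line.split(",")[0]) != key_to_delete:   # omit only the record being deleted
                new.write(line)

delete_record("oldmaster.txt", "newmaster.txt", 78)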
Processing of records is faster than that of serial files.
Overflow area
The home area may become full and fail to accommodate all the records of the file. In
this case, any additional records are placed in a separate overflow area, and the home
area stores pointers indicating the position of each record held in the overflow area.
NB: However, it may take longer to process the records once some of them have been
placed in the overflow area. After reading the index, it takes a single disc access to
read a record in the home area, but each time a record in the overflow area is accessed
it takes at least two disc accesses: one to read the home area and one to read the
overflow area. This problem can be solved by re-organising the file using a housekeeping
program, which copies the file to a new file, placing all the overflow records back into
the home area and re-writing the indices.
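The arrangement can be pictured with a small Python sketch. It is purely illustrative:
the block size, the key ranges standing in for the index and the in-memory lists are all
assumptions rather than a real disk layout.
BLOCK_SIZE = 3                                   # records per home block (assumed)
index = {0: range(0, 100), 1: range(100, 200)}   # "index": block number -> key range
home = {0: [], 1: []}                            # home area blocks
overflow = []                                    # overflow area

def block_for(key):
    for block, keys in index.items():            # read the index to find the home block
        if key in keys:
            return block

def store(key):
    block = block_for(key)
    if len(home[block]) < BLOCK_SIZE:
        home[block].append(key)                  # fits in the home area: one disc access to read later
    else:
        overflow.append(key)                     # home block full: record goes to the overflow area

def housekeeping():
    # Re-organise the file: overflow records go back into the home area.
    # (A real housekeeping run rewrites the whole file and its indices.)
    for key in sorted(overflow):
        home[block_for(key)].append(key)
    overflow.clear()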
Overflow
If there is no space in the block to which a record hashes, a collision is said to have
occurred and the record must be stored elsewhere.
A re-hashing algorithm is applied to the address of the full block in order to give
another block that is not full. If that block is also full, the re-hashing algorithm is
applied again until a block with free space is found. Alternatively, an overflow area
can be used, just as in indexed-sequential files.
NB: If no further information is given, assume that overflow records are stored in the
next block.
Hashing algorithm - used to translate a record key into an address. However, synonyms
may occur, i.e. two record keys generate the same address (handled using an overflow
area and a flag).
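As an illustration only, the Python sketch below uses the division (remainder) method as
its hashing algorithm and follows the NB above by storing a colliding record in the next
block; the number of blocks and the one-record-per-block simplification are assumptions.
NUM_BLOCKS = 7
blocks = [None] * NUM_BLOCKS                  # one record per block, kept simple

def hash_address(record_key):
    return record_key % NUM_BLOCKS            # hashing: translate the key into a block address

def store(record_key):
    address = hash_address(record_key)
    while blocks[address] is not None:        # block full: a collision (synonym) has occurred
        address = (address + 1) % NUM_BLOCKS  # per the NB: try the next block
    blocks[address] = record_key              # assumes at least one block is still free

store(23)   # 23 % 7 = 2, stored in block 2
store(16)   # 16 % 7 = 2 as well (a synonym), so it ends up in block 3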
FILE PROCESSING
Refers to any form of activity that can be done using files. This includes: file referencing,
sorting, maintenance and updating.
1. File Referencing/Interrogation: This involves searching for a record and displaying
it on the screen in order to obtain certain information, leaving the record unchanged.
The record can also be printed.
2. Sorting: Refers to the process of arranging (organising) records in a specific ordered
sequence, for example in ascending or descending order of the key field.
3. Merging Files: This is the process of combining two or more files/records of the
same structure into one. Below is an example of how records can be merged:
Record A (sorted): 12 34 71 78 101 103
Record B (unsorted): 67 3 90 12
Record C (Records A and B merged and sorted): 3 12 34 67 71 78 90 101 103
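A short Python sketch reproduces the worked example; note that the 12 appearing in both
files is kept only once, as in Record C above. The variable names are just for
illustration.
record_a = [12, 34, 71, 78, 101, 103]   # already sorted
record_b = [67, 3, 90, 12]              # unsorted

record_c = sorted(set(record_a) | set(record_b))   # combine, drop the duplicate 12, sort
print(record_c)   # [3, 12, 34, 67, 71, 78, 90, 101, 103]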
4. File maintenance: This is the process of reorganising the structure of records and
changing (adding, removing or editing) fields. It may also involve updating the more
permanent fields on each record, or adding and deleting records.
5. File Updating: Updating is the process of making necessary changes to files and
records, entering recent information. Only master files are updated and they must be
up-to-date. For updating to occur, any one of the following must have occurred:
- A new record has been entered.
- An unwanted record has been deleted.
- An amendment (change) has been made to the existing data, e.g. a change in date of
birth only.
a. Updating by copying
This happens in sequential file updating. The transaction file must be sorted in the
same order as the master file records. This is done through the following steps:
- A record is read from the master file into memory.
- A record is then read from the transaction file into memory.
- The record keys from each file are compared.
- If the record keys are the same, the master record is updated by moving fields from
the transaction file to the new master file; otherwise the old master record is copied
to the new master file unchanged (see the sketch after these steps).
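A short Python sketch of these steps, for illustration only: it assumes both files are
sorted on the key, hold one comma-separated record per line, and that each transaction
line contains the complete updated record.
def update_master(old_master, transaction_file, new_master):
    # Update a sorted master file by copying it, applying transactions as keys match.
    with open(old_master) as m, open(transaction_file) as t, open(new_master, "w") as out:
        trans = t.readline()                          # read the first transaction record
        for record in m:                              # read the master file record by record
            key = record.split(",")[0]
            if trans and key == trans.split(",")[0]:  # compare the record keys
                out.write(trans)                      # keys match: transaction becomes the new record
                trans = t.readline()                  # move on to the next transaction
            else:
                out.write(record)                     # no matching transaction: copy unchanged

update_master("father.txt", "transactions.txt", "son.txt")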
In sequential file updating, it is recommended to keep at least three versions of the
master file that can be used for data recovery in case of a system failure or accidental
loss of data. The first master file is called the grandfather file, the second master
file is called the father file and the third master file is the son file. This
relationship is known as the grandfather-father-son system of files. The process of
keeping three versions of master
files (grandfather-father-son) as a result of sequential file updating is called File
Generations. Thus the first master file (grandfather file) is called the first generation
file, the second master file (father file) is called the second generation file and the third
master file (son file) is the third generation file.
b. Updating in place (in situ)
This applies to random (direct) files, since a record is accessed by means of an address
and can therefore be written back to the same address after the updating process.
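A minimal Python sketch of in-place updating, assuming fixed-length records of 20 bytes
and a hypothetical data file name:
RECORD_SIZE = 20                                    # fixed-length records (assumed)

def update_in_place(path, record_number, new_record):
    # Overwrite one record where it lies, using its address within the file.
    data = new_record.encode().ljust(RECORD_SIZE)   # pad the record to its fixed length
    with open(path, "r+b") as f:
        f.seek(record_number * RECORD_SIZE)         # jump straight to the record's address
        f.write(data)                               # write back to the same place

update_in_place("students.dat", 5, "105,Moyo,Tendai")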