
File Organization

A file is a collection of logically related records. A record is a structure of logically related fields, or elements, of information. The technique used to represent and store the records of a file is called the file organization. The fundamental file organization techniques are:

1. Sequential
2. Direct or random
3. Indexed sequential

The selection of a particular file organization depends upon:

1. The type of application
2. The method of processing used to update the file
3. The size of the file
4. File inquiry capabilities
5. File volatility
6. The required response time

Sequential Access File Organization

In a sequential file, records are arranged in ascending, descending or chronological order of a key field. Since the records are ordered by the key field, there is no separate storage location identification. Sequential file organization is suitable for applications in which the file is processed in its entirety. To locate a particular record, each record is examined in sequence from the beginning of the file until the desired record is found. Transactions affecting a sequential file are accumulated in batches, which are then used to update the file at periodic intervals.

Sequential files are normally created and maintained on magnetic tape. All update data is sorted prior to its use. During batch processing, the update data and the sequential file data are alternately read and processed. It is impractical to write a new record back to the same position the old one occupied, so the master file is updated in the CPU and then written onto a new tape. Sequential files can also be constructed on magnetic disk, but in such cases the direct access capabilities of the disk are not taken advantage of.

Advantages:
- Simple to understand.
- Easy to organize, maintain and understand.
- Loading a record requires only the record key.
- Efficient and economical if the activity rate (the proportion of file records actually processed) is high.
- Relatively inexpensive I/O media and devices may be used.
- Files are relatively easy to reconstruct, since a good measure of built-in backup is usually available.

Drawbacks:
- The entire file must be processed even when the activity rate is very low.
- Transactions must be sorted and placed in sequence prior to processing.
- Data redundancy is typically high, since the same data may be stored in several files sequenced on different keys.
- Random enquiries are virtually impossible to handle.

Applications: payroll accounting, financial accounting.
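To make the "examine each record in sequence" idea concrete, here is a minimal C sketch of a sequential lookup over a file of fixed-length records; the record layout, field names and file path are assumptions made for illustration, not part of the notes.

    #include <stdio.h>

    /* Assumed fixed-length record layout, for illustration only. */
    struct employee {
        long key;          /* record key, e.g. employee number */
        char name[28];
        float salary;
    };

    /* Scan the file from the start until the key matches (sequential access). */
    int find_sequential(const char *path, long wanted, struct employee *out)
    {
        FILE *fp = fopen(path, "rb");
        if (fp == NULL)
            return 0;
        while (fread(out, sizeof *out, 1, fp) == 1) {
            if (out->key == wanted) {   /* examine every record in turn */
                fclose(fp);
                return 1;               /* found */
            }
        }
        fclose(fp);
        return 0;                       /* not found: the whole file was read */
    }

Note that an unsuccessful search, or a search for a record near the end, reads the entire file, which is why sequential organization pays off only when the activity rate is high.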


Magnetic Tape

Magnetic tape is a widely used storage device that stores data serially, or sequentially, and is therefore known as a SASD (serial access storage device). Magnetic tape consists of a strip of plastic tape coated with a magnetic material deposited in grains, each of which may be magnetized in one of two possible directions to store a 0 or a 1. Most manufacturers now use tape on which data may be recorded on either seven or nine parallel tracks, one track for each bit position, so that the set of grains across the width of the tape represents one character. The seven or nine bits recorded across a very small length of tape are called a frame. Seven-track tapes have now been widely replaced by nine-track tapes. A frame on a nine-track tape holds a byte, and recording density varies from 800 bpi to 1,600 bpi. The most commonly used tape length is 2,400 feet, the first and last 25 feet not being used for holding data. The packing density is usually 1,600 characters per inch, reaching about 6,000 characters per inch on later drives. A data transfer rate of 300,000 characters per second is common, rising to about 1,200,000 characters per second on the latest machines. The following figure shows data recording on a 7-track magnetic tape.

A reel of magnetic tape is physically mounted on a device known as a magnetic tape drive unit, which has the following four major components:

1. Two reel holders, one for the feed reel and the other for the take-up spool.
2. A tape drive mechanism.
3. Read, write and erase heads.
4. Vacuum chambers serving as tape reservoirs, to ensure even tape movement and to prevent the tape breaking during the stop/start process.


Reading and Recording on Magnetic Tape

Records can be read from tape into main storage, and written back, one at a time. After one record has been written, the tape stops until the write instruction for the next record is given. This start/stop process involves slowing the tape down until it comes to rest and then accelerating it again to full speed. Data can only be read or recorded when the tape is moving at full speed, and consequently there is a blank stretch of tape between each record, the inter-record gap (IRG). To minimize the amount of time and space wasted by inter-record gaps, a blocking technique is often employed. Each block contains a number of records, and the whole block is read into main storage, or written onto the tape, one block at a time. As the tape stops and starts only between blocks, nothing can be written in the inter-block gap (IBG). The figure shows the layout:
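As a rough illustration of why blocking saves tape, the following C sketch compares the length of tape needed for unblocked and blocked records. The density, record length, gap size and blocking factor are made-up illustrative figures, not values from the notes.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative figures only. */
        double density = 1600.0;   /* characters per inch                  */
        double rec_len = 120.0;    /* characters per logical record        */
        double gap     = 0.6;      /* inter-record/inter-block gap, inches */
        int    records = 10000;
        int    bf      = 10;       /* blocking factor: records per block   */

        /* Unblocked: every record is followed by a gap. */
        double unblocked = records * (rec_len / density + gap);

        /* Blocked: one gap per block of bf records (records assumed divisible by bf). */
        double blocked = (records / bf) * (bf * rec_len / density + gap);

        printf("unblocked: %.1f inches of tape\n", unblocked);
        printf("blocked  : %.1f inches of tape\n", blocked);
        return 0;
    }

With these figures most of the unblocked tape is gap rather than data, which is exactly the waste blocking is meant to reduce.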

Here the load point marker and end point marker indicate the start and end of the tape, while the header and trailer labels mark the start and end of the logical file.

Direct Access File Organization

Also referred to as random or relative organization. Files in this organization are stored on a direct-access storage device (DASD), such as a magnetic disk, using an identification key. This identification key relates a record to its actual storage position in the file. The computer can use the key to locate the desired record directly, without having to search through any other records first. For example, employee records could be accessed using the employee numbers assigned to them.

Usually records are assigned numbers in chronological order as new records are added to the file. When inactive records are removed from the file, their numbers become available for assignment to new records. Records stored in such a system therefore appear to be in random order, and the processing is random processing. It is used in online systems where rapid response and fast updating are important. Direct access files are most useful when the majority of accesses are to individual records at unpredictable times.

Records are stored in the file by their key fields. An arithmetic procedure called a transform is used to convert the record key into a DASD storage location number, as illustrated below:

    Physical position             Key value
    1    (beginning of the file)  COW
    2                             ZEBRA
    ...                           ...
    I-1                           APE
    I                             DOG
    ...                           ...
    N-1                           CAT
    N    (end of the file)        BAT
When a relative file is established, the relationship that will be used to translate between key values and physical addresses is designed: R(key value) → address.

Advantages:
- Immediate access to records for updating purposes is possible.
- Immediate updating of several files as a result of a single transaction is possible.
- Transactions need not be sorted.
- Separate disks or disk units are not required for updating: existing records can be amended in place by overwriting.
- Random enquiries, which are quite frequent in business situations, can be handled easily.
- It is also possible to process direct-file records sequentially, in record key sequence.
- Direct file organization is most suitable for interactive online applications such as airline or railway reservation systems, teller facilities in banking applications, etc.

Drawbacks:
- Data may be accidentally erased or overwritten unless special precautions are taken.
- May be less efficient in the use of storage space than a sequentially organized file.
- Expensive hardware and software resources are required.
- Relative complexity of programming.
- System design around it is complex and costly.
- File updating is more difficult than with sequential files.
- Special security measures are necessary for online direct files that are accessible from several stations.

Magnetic Disk

Magnetic disk is a storage device also known as a random access device, as it permits direct addressing of data locations. An individual disk is a thin circular metal plate, or platter, coated on both sides with a ferrous oxide material. Data is recorded in the form of magnetized spots on the tracks, the presence of a spot representing 1 and its absence 0, enabling data to be represented in binary. The surface of a magnetic disk is divided into a number of invisible concentric circles, called tracks, ranging from 200 to 1,500 per surface. Data per track can be 4 Kbytes to 200 Kbytes, depending on the recording system. The tracks are further subdivided into sectors, blocks, etc., each with its own unique address. The disks are mounted on a vertical rotating shaft, or spindle, rotating at speeds of 2,400 to 3,600 revolutions per minute.
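The transform idea, R(key) → address, can be sketched as follows in C for a relative file of fixed-length records: a hypothetical transform maps the key to a relative record number, and the record is then read directly by seeking to the computed byte offset rather than scanning the file. The record layout, file name and the transform itself are illustrative assumptions.

    #include <stdio.h>

    #define RECORDS 1000              /* assumed file capacity */

    struct record {
        long key;
        char data[60];
    };

    /* Hypothetical transform: map a key to a relative record number 1..RECORDS. */
    static long transform(long key)
    {
        return (key % RECORDS) + 1;
    }

    /* Read one record directly, without examining any other record. */
    int read_direct(const char *path, long key, struct record *out)
    {
        FILE *fp = fopen(path, "rb");
        if (fp == NULL)
            return 0;

        long rrn = transform(key);                    /* relative record number   */
        long offset = (rrn - 1) * (long)sizeof *out;  /* byte address in the file */

        int ok = fseek(fp, offset, SEEK_SET) == 0 &&
                 fread(out, sizeof *out, 1, fp) == 1 &&
                 out->key == key;                     /* confirm it is the wanted record */
        fclose(fp);
        return ok;
    }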

Many systems provide for a pack, or cartridge, of disks, referred to as a disk pack or module, to be mounted on the disk drive. There are spaces between the disks. Reading and writing are accomplished by means of a series of read/write heads fitted on the access arms. Two types of head arrangement are used: the moving-head system and the head-per-track (HPT) system. In the moving-head system, only one head is provided for each surface, and it floats on an access arm; to provide access to a particular track, the arm moves the head so that it is positioned over that track. The actual reading or writing takes place only when the desired sector passes under the head as the disk rotates. The HPT system is similar, except that a read/write head is provided for each track of each surface, so no head movement is required: the heads are fixed. There are two types of disk drive on which magnetic disks are mounted: exchangeable (or replaceable) disk storage and fixed disk storage.

In most moving-head systems the stack of disks can be taken out for off-line storage of information; such a removable stack of disks is known as a disk pack. Data on a magnetic disk can be read again and again, and new data can be recorded over it, erasing the older information. Each access arm has at least two read/write heads, to read or write data on either side of a disk. The top and bottom surfaces of a disk pack are not used, so a pack of 6 disks has only 10 usable surfaces. Magnetic disk drives in which one or more disks are permanently mounted on the spindle are termed fixed disk storage, or Winchester disks. Some devices have two spindles and can read or write with two disk packs simultaneously; these are termed twin (or dual) exchangeable disk storage (TEDS or DEDS).


The capacity of a disk pack may vary from 20 megabytes to about 1,000 megabytes of data. The total number of bytes on a disk pack is:

    number of cylinders × tracks per cylinder × sectors per track × bytes per sector

Indexed Sequential Access Organization

The sequential and direct access organizations are in a sense opposites of each other. The indexed sequential file combines the positive aspects of both. In an indexed sequential file, records are stored sequentially on a direct access device (e.g. a magnetic disk), and the data is accessible either randomly or sequentially. Sequential access proceeds one record at a time until the desired item of data is found. It is best suited where both batch and online processing are to be supported: the records are organized in sequence for the efficient processing of large batch jobs, but an index is also maintained to speed up access to individual records.

Records are stored sequentially by a key field on a DASD. At the time of periodic updating during a batch run, the direct access capability is not used: only the first record need be accessed directly, and all other records are read in sequence. For individual enquiries, however, the indexes permit access to selected records without searching the entire file. This technique is known as the indexed sequential access method (ISAM).

Advantages:
- Permits the efficient and economical use of sequential processing techniques when the activity rate is high.
- Permits quick access to records in a relatively efficient way when this activity is only a small fraction of the workload.

Drawbacks:
- Slower retrieval than direct (random) access, since searching the index takes time.
- Less efficient in the use of storage space than some other alternatives.
- Relatively expensive hardware and software resources are required.

Applications: materials accounting, the banking industry.
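A minimal sketch of the ISAM idea, assuming a small in-memory index that records the highest key stored in each block: a lookup searches the index first and then reads the selected block sequentially. The block size and index layout are simplifying assumptions, not a description of any particular ISAM implementation.

    #include <stdio.h>

    #define BLOCK_RECORDS 16          /* assumed records per block */

    struct record { long key; char data[56]; };

    struct index_entry {              /* one entry per block: highest key in the block */
        long high_key;
        long block_no;
    };

    /* Search the (sorted) index for the first block whose highest key >= wanted,
       then scan that block sequentially. */
    int isam_lookup(FILE *fp, const struct index_entry *idx, int nblocks,
                    long wanted, struct record *out)
    {
        int b = 0;
        while (b < nblocks && idx[b].high_key < wanted)   /* index search */
            b++;
        if (b == nblocks)
            return 0;                                     /* key beyond the last block */

        long offset = idx[b].block_no * BLOCK_RECORDS * (long)sizeof *out;
        if (fseek(fp, offset, SEEK_SET) != 0)
            return 0;

        for (int i = 0; i < BLOCK_RECORDS; i++) {         /* sequential scan of the block */
            if (fread(out, sizeof *out, 1, fp) != 1)
                return 0;
            if (out->key == wanted)
                return 1;
        }
        return 0;
    }

The same file can still be processed in key sequence simply by reading the blocks one after another, which is what makes the organization attractive for mixed batch and online workloads.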


Garbage Collection

A number of methods are available for allocating storage and for freeing it when it is no longer in use. Allocating storage is simple: the programmer can request it by declaring a structure at program-block entry or by invoking a routine that creates a specific structure at run time. Freeing the allocated storage is the difficult task. At block exit, the storage allocated at entry for local variables can all be freed; the difficult case is storage that has been allocated dynamically.

One method is to give the responsibility for releasing storage to the programmer. Some languages support statements such as free, which can be used to do this. Most languages, however, reserve the task of storage release for themselves, and the problem then becomes one of determining by what means the system decides to free storage. There are several methods. One is to free no storage at all until there is almost none left; then all the allocated blocks are checked, and those that are no longer being used are freed. This method is called garbage collection. Blocks of storage that were once needed but that at some later point in the program's execution became unnecessary and unused are called garbage; a garbage collector simply goes through memory and recovers these garbage blocks.

Two problems arise in the context of storage release. One is the accumulation of garbage, which decreases the amount of free storage available and consequently increases the chance of having to refuse a request for storage. The other is dangling references. A dangling reference is a pointer in the program that still refers to a block that has already been freed. If the block is ever reallocated and the dangling pointer is then used, the program once again has access to a block that is now being used for a completely different purpose.

Garbage collection makes use of a special routine which is invoked whenever the available storage is almost exhausted, whenever a particular request cannot be met, or perhaps whenever the amount of available storage has decreased beyond a certain predefined point. Normal program execution is interrupted while this routine frees garbage blocks and is resumed when the garbage collector has finished its work. The garbage collection algorithm normally has two phases. The first phase traces all the access paths from all the program and system variables through the allocated blocks, marking each block that is reached. The second phase moves through the entire segment of memory, resetting the marks of the marked blocks and returning to the free list every allocated block that has not been marked. This method also solves the problem of dangling references.
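The two phases described above can be sketched in C roughly as follows; the fixed-size heap, the block layout and the representation of the access paths as an array of root pointers are simplifying assumptions made only for illustration.

    #include <stddef.h>

    #define HEAP_BLOCKS 1024

    struct block {
        int    marked;                 /* set during the tracing (mark) phase */
        int    allocated;              /* block is currently handed out       */
        struct block *next;            /* link used on the free list          */
        struct block *refs[2];         /* pointers this block holds, if any   */
    };

    static struct block heap[HEAP_BLOCKS];
    static struct block *free_list = NULL;

    /* Phase one: trace every access path from the roots and mark reached blocks. */
    static void mark(struct block *b)
    {
        if (b == NULL || b->marked)
            return;
        b->marked = 1;
        for (int i = 0; i < 2; i++)
            mark(b->refs[i]);
    }

    /* Phase two: sweep the whole heap, unmarking survivors and
       returning every allocated but unmarked block to the free list. */
    static void sweep(void)
    {
        for (int i = 0; i < HEAP_BLOCKS; i++) {
            if (heap[i].marked) {
                heap[i].marked = 0;            /* reset the mark for next time */
            } else if (heap[i].allocated) {
                heap[i].allocated = 0;         /* garbage: reclaim the block   */
                heap[i].next = free_list;
                free_list = &heap[i];
            }
        }
    }

    void garbage_collect(struct block **roots, int nroots)
    {
        for (int i = 0; i < nroots; i++)
            mark(roots[i]);
        sweep();
    }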


Compaction

Compaction is another technique for reclaiming storage. It works by actually moving blocks of data from one location in memory to another so as to collect all the free space into one large block. The allocation problem then becomes completely simple: allocation consists merely of moving a pointer that marks the top of this successively shrinking block of free storage. Once this single block becomes too small, the compaction mechanism is invoked again to reclaim whatever unused storage may now exist among the allocated blocks.

There is generally no separate storage release mechanism. Instead, a marking algorithm is used to mark the blocks that are still in use. Then, instead of freeing each unmarked block by calling a release mechanism to put it on a free list, the compacter simply collects all the unmarked storage into one large block at one end of the memory segment, by moving the marked blocks to the other end. The only problem with this method is that of redefining the pointers into the moved blocks, which is solved by making extra passes through memory. After the blocks have been marked, the entire memory is stepped through and the new address of each marked block is determined; this new address is stored in the block itself. Another pass over memory is then made, during which every pointer that points to a marked block is reset to point to where that block will be after compaction, the new addresses being available from the blocks themselves. After all pointers have been reset, the marked blocks are moved to their new locations.

Addressing Techniques in Direct Access Files

There are three fundamental techniques for implementing the mapping function R, where R(key value) → address:

1. Direct mapping
2. Directory lookup
3. Calculation

Direct Mapping Techniques

The simplest technique for translating a record key to a storage address is direct mapping.

Absolute addressing: One simple approach to implementing R(key value) → address is to have key value = address. This mapping function is called absolute addressing: the key value supplied by a human or program user is the same as the record's actual address. Processing time is minimal, but the method is device dependent and address-space dependent, and the user has to know how the records are physically stored.

Relative addressing: Another simple approach, called relative addressing, is to have key value = relative address. A relative address can be supplied to a program for translation to an absolute address. The relative address of a record is the record's ordinal number in the file; a file with space for N records has records with relative addresses drawn from the set {1, 2, 3, ..., N-1, N}. The method is still storage-space dependent.

Directory Lookup Techniques

After direct mapping, the next simplest approach to implementing R(key value) → address is directory lookup. This method retains the advantages of direct mapping while eliminating its disadvantages.


The basic idea of the directory lookup approach is to keep a table, or directory, of key value : address pairs (or key value : relative address pairs). To find a record in a relative file, the key value is located in the directory and the indicated address is then used to find the record on storage. Directory entries are kept sorted so that the directory can be searched rapidly.
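A hypothetical C sketch of directory lookup: the directory is held as an array of key : relative-address pairs sorted by key, and a binary search exploits that ordering to locate a key quickly. The entries in the example are illustrative only.

    #include <stdio.h>

    struct dir_entry {
        long key;           /* record key value             */
        long rel_addr;      /* relative address in the file */
    };

    /* Binary search of a directory sorted by key.
       Returns the relative address, or -1 if the key is not present. */
    long directory_lookup(const struct dir_entry *dir, int n, long key)
    {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (dir[mid].key == key)
                return dir[mid].rel_addr;
            if (dir[mid].key < key)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1;
    }

    int main(void)
    {
        /* Illustrative directory entries only. */
        struct dir_entry dir[] = {
            { 1001L, 7L }, { 1005L, 2L }, { 1010L, 4L }, { 1042L, 1L }
        };
        printf("key 1010 -> relative address %ld\n",
               directory_lookup(dir, 4, 1010L));
        return 0;
    }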

Address Calculation Techniques

Another common approach to implementing R(key value) → address is to perform a calculation on the key value such that the result is a relative address. One problem that may be encountered is the situation where R(K1) = R(K2) but K1 ≠ K2; this is called a collision: two unequal keys have been calculated to have the same address. Address calculation techniques are also referred to as:

- Scatter storage techniques
- Randomizing techniques
- Key-to-address transformation methods
- Direct addressing techniques
- Hash table methods
- Hashing

We will use the term hashing here; the calculation applied to a key value to obtain an address is called a hash function. The primary goal of a hash function is to generate relatively few collisions. Hashing can also be used in conjunction with directory lookup. The most common hash functions are:

- Division-remainder
- Mid-square
- Folding

Division-Remainder Hashing

This hash function uses a simple division method. The basic idea of the approach is to divide the key value by an appropriate number, then to use the remainder of the division as the relative address for the record.


For example, let div be the divisor, key the key value, and addr the resulting address. The function R(key) → address can be implemented as:

    addr = key % div;

Several factors should be considered in selecting the divisor.

Mid-Square Hashing

In this technique the key is squared, and specified digits are then extracted from the middle of the result to yield the relative address.

Example:
    Key value:        123456789
    Key squared:      15241578750190521
    Relative address: 8750

Hashing by Folding

The key value is partitioned into a number of parts, each of which has the same number of digits as the target relative address. These partitions are then folded over each other and summed. The result, with its highest-order digit truncated if necessary, is the relative address.

Example: key 123456789, with a 4-digit target address. The key is partitioned into the 4-digit chunks

    0001   2345   6789

which are folded (the outer chunks being reversed onto the middle one) to give

    1000   2345   9876

and summed to 13221. The high-order digit is truncated, giving a relative address of 3221.
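The three hash functions can be sketched in C as follows. The divisor, the choice of "middle" digits and the 4-digit chunk width are illustrative assumptions; the folding routine follows the convention of the worked example above (alternate chunks reversed, short chunks treated as zero-padded to 4 digits).

    #include <stdio.h>

    /* Division-remainder: the remainder of key / div is the relative address. */
    long hash_division(long key, long div)
    {
        return key % div;
    }

    /* Mid-square: square the key and extract 4 digits from the middle of the result. */
    long hash_midsquare(long key)
    {
        unsigned long long sq = (unsigned long long)key * (unsigned long long)key;
        char digits[32];
        int n = sprintf(digits, "%llu", sq);      /* decimal digits of the square     */
        if (n <= 4)
            return (long)sq;                      /* square already fits the address  */
        int start = n / 2 - 1;                    /* an assumed choice of "middle"    */
        long addr = 0;
        for (int i = start; i < start + 4; i++)
            addr = addr * 10 + (digits[i] - '0');
        return addr;
    }

    /* Folding: split the key into 4-digit chunks from the right, reverse the digits
       of alternate chunks (starting with the rightmost, zero-padding short chunks),
       sum, and keep the low 4 digits, i.e. truncate any high-order overflow digit. */
    long hash_folding(long key)
    {
        long sum = 0;
        int flip = 1;
        while (key > 0) {
            long chunk = key % 10000;
            key /= 10000;
            if (flip) {                           /* reverse this chunk's 4 digits */
                long rev = 0, c = chunk;
                for (int i = 0; i < 4; i++) { rev = rev * 10 + c % 10; c /= 10; }
                chunk = rev;
            }
            sum += chunk;
            flip = !flip;
        }
        return sum % 10000;
    }

    int main(void)
    {
        /* For key 123456789, mid-square yields 8750 and folding yields 3221,
           matching the worked examples above; 4093 is just an illustrative divisor. */
        printf("division  : %ld\n", hash_division(123456789L, 4093L));
        printf("mid-square: %ld\n", hash_midsquare(123456789L));
        printf("folding   : %ld\n", hash_folding(123456789L));
        return 0;
    }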
