Csi 10
FILE STRUCTURE
Content
▪ Define two categories of access methods: sequential access and random access.
▪ Understand the structure of sequential files and how they are updated.
▪ Understand the structure of indexed files and the relation between the index and the
data file.
▪ Understand the idea behind hashed files and describe some hashing methods.
▪ Files are stored on auxiliary or secondary storage devices. The two most common forms of secondary
storage are disk and tape. Files in secondary storage can be both read from and written to. Files can also
exist in forms that the computer can write to but not read. For example, the display of information on
the system monitor is a form of file, as is data sent to a printer. In a general sense, the keyboard is also a
file, although it cannot store data.
▪ A sequential file is one in which records can only be accessed one after another
from beginning to end. Figure 13.2 shows the layout of a sequential file. Records are
stored one after another in auxiliary storage, such as tape or disk, and there is an
EOF (end-of-file) marker after the last record.
▪ Algorithm 10.1 shows how records in a sequential file are processed. We process
the records one by one. After the operating system processes the last record, the
EOF is detected and the loop is exited.
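▪ The loop described above can be sketched in Python. This is only an illustration, not Algorithm 10.1 itself: the function name process_sequential, the one-record-per-line layout, and the sample records are all assumptions, and an in-memory StringIO object stands in for a file on tape or disk.

```python
import io

def process_sequential(file, handle_record):
    """Read records one after another until EOF; return how many were read."""
    count = 0
    while True:
        line = file.readline()
        if line == "":              # readline() returns "" only at end of file
            break                   # EOF detected: exit the loop
        handle_record(line.rstrip("\n"))
        count += 1
    return count

# Hypothetical sample data: one record per line.
sample = io.StringIO("1001,Ann\n1002,Bob\n1003,Cho\n")
seen = []
total = process_sequential(sample, seen.append)
```

Note that the third record can only be reached by first reading the two records before it, which is what makes the access sequential.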
▪ To access a record in a file randomly, we need to know the address of the record.
For example, suppose a customer wants to check their bank account. Neither the
customer nor the teller knows the address of the customer’s record. The customer
can only give the teller their account number (key). Here, an indexed file can relate
the account number (key) to the record address (Figure 10.3).
Figure 10.3 Mapping in an indexed file
Figure 10.4 Logical view of an indexed file
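▪ A minimal Python sketch of the idea, under stated assumptions: records have a fixed length (RECORD_SIZE), the index is an ordinary dictionary from key to byte offset, and the account numbers and names are made up for illustration. A real indexed file would keep the index on disk as well.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes

def build_file(records):
    """Write records to a data file and build an index of key -> address."""
    data = io.BytesIO()             # in-memory stand-in for the data file
    index = {}
    for key, name in records:
        index[key] = data.tell()    # address where this record starts
        data.write(name.encode("ascii").ljust(RECORD_SIZE))
    return data, index

def lookup(data, index, key):
    """Use the index to find the address, then read the record directly."""
    address = index[key]
    data.seek(address)              # jump straight to the record
    return data.read(RECORD_SIZE).decode("ascii").rstrip()

# Hypothetical account numbers (keys) and customer names.
data_file, index = build_file([(4520, "Ann"), (1187, "Bob"), (9033, "Cho")])
```

Calling lookup(data_file, index, 1187) retrieves the second record without reading the first, because the index supplies its address.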
4. HASHED FILES
▪ In an indexed file, the index maps the key to the address. A hashed file uses a
mathematical function to accomplish this mapping. The user gives the key, the
function maps the key to the address and passes it to the operating system, and the
record is retrieved (Figure 10.5).
▪ Direct hashing
▪ In direct hashing, the key is the data file address without any algorithmic manipulation. The file must
therefore contain a record for every possible key. Although situations suitable for direct hashing are
limited, it can be very powerful because, unlike other methods, it guarantees that there are no synonyms
or collisions (discussed later in this chapter).
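▪ As an illustration only, direct hashing might look like this in Python. The scenario (one sales total per day of the month) and the names direct_hash and daily_sales are assumptions chosen because the keys 1 to 31 can serve directly as addresses.

```python
# Hypothetical example: one record per day of the month, so every key 1..31
# has its own slot and no two keys can share an address.
DAYS = 31

def direct_hash(key):
    """Direct hashing: the key itself is the address (no calculation)."""
    if not 1 <= key <= DAYS:
        raise KeyError(key)
    return key

# Slot 0 is unused so that addresses match the keys 1..31.
daily_sales = [0.0] * (DAYS + 1)
daily_sales[direct_hash(5)] += 120.0
daily_sales[direct_hash(5)] += 30.0
daily_sales[direct_hash(17)] += 99.5
```

The list needs a slot for every possible key even if most days have no sales, which is why this method suits only small, dense key ranges.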
▪ Modulo division hashing
▪ Also known as division remainder hashing, the modulo division method divides the key by the file size
and uses the remainder plus 1 for the address. This gives the simple hashing algorithm that follows,
where list_size is the number of elements in the file. The reason for adding 1 to the mod operation
result is that our list starts with 1 instead of 0:
address = key mod list_size + 1
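▪ A minimal Python sketch of the modulo division method described above. The function name modulo_division_hash, the file size of 307, and the three sample keys are assumptions chosen for illustration.

```python
def modulo_division_hash(key, list_size):
    """Return an address in the range 1..list_size for an integer key."""
    return key % list_size + 1      # remainder is 0..list_size-1, so add 1

# Hypothetical keys hashed into a file with 307 addresses.
addresses = [modulo_division_hash(k, 307) for k in (121267, 45128, 379452)]
# addresses == [3, 307, 1]
```

The three results show the lowest address (1), the highest address (307), and a value in between, so every result stays within the file.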
▪ Two terms are used to categorize files: text files and binary files. A file stored on a
storage device is a sequence of bits that can be interpreted by an application
program as a text file or a binary file, as shown in Figure 13.15.
▪ A binary file is a collection of data stored in the internal format of the computer. In
this definition, data can be an integer (including other data types represented as
unsigned integers, such as image, audio, or video), a floating-point number, or any
other structured data (except a file).
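▪ The difference can be seen by storing the same integer both ways. In this Python sketch the value 258 and the 4-byte little-endian layout are assumptions; the actual internal format depends on the computer.

```python
import struct

value = 258

# Text form: one character code per decimal digit ("2", "5", "8").
as_text = str(value).encode("ascii")

# Binary form: the integer as a 4-byte little-endian signed value
# (an assumed internal format; real machines may differ).
as_binary = struct.pack("<i", value)

# Reading the binary form back requires knowing the format it was written in.
restored = struct.unpack("<i", as_binary)[0]
```

Here as_text is the three bytes b"258", while as_binary is the four bytes 02 01 00 00 in hexadecimal. The same bits are meaningful only when the program interprets them in the way they were written.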