File Management
• File concepts
1. File naming
2. File attributes
3. File operation
4. File types
5. File system
File
• A file can be defined as a data structure that
stores a sequence of records.
• Files are stored in a file system, which may exist
on a disk or in the main memory.
• Files can be simple (plain text) or complex
(specially-formatted).
• A collection of files is known as a directory.
• The collection of directories at different levels
is known as the file system.
FILE NAMING
• A file is given a name by its creator for convenience of use.
• A name is attached to every file so that it can be uniquely identified and accessed
through its name.
• The naming rules differ from OS to OS.
• In some older systems (MS-DOS, for example), names are generally 1 to 8 characters long,
combining letters, digits and a few special characters.
• Letters can be uppercase or lowercase; whether case is significant differs
between operating systems.
• E.g.: in UNIX, the names HELLO, hello, Hello and HeLlO refer to different files, whereas in
MS-DOS they all refer to the same file.
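The two naming rules above can be sketched in Python. The function `lookup` and the sample directory are hypothetical, introduced only for illustration; a real file system performs this matching internally.

```python
def lookup(directory, name, case_sensitive):
    """Return the stored name that matches `name`, or None if there is none."""
    for stored in directory:
        if case_sensitive:
            if stored == name:              # UNIX-like rule: exact match only
                return stored
        elif stored.upper() == name.upper():  # MS-DOS-like rule: case ignored
            return stored
    return None


names = ["HELLO"]
print(lookup(names, "hello", case_sensitive=True))   # None
print(lookup(names, "hello", case_sensitive=False))  # HELLO
```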
Functions of file system
• It enables users to give user defined names, to create, to modify and delete files.
• It provides users with a uniform logical view of data, rather than the physical view (i.e. the
internal structure), through a user-friendly interface.
• It enables users to structure their files in a way most appropriate for each application.
• It controls the transferring of data blocks between secondary storage and main
memory and also between different files.
• It provides semantics or the rules for file sharing among different processes and users.
• It also allocates and manages space for files on secondary storage devices such as
magnetic tapes and disks.
• It protects the file from system failures and applies measures for recovery and backup.
• It provides security measures for confidential data such as electronic funds or criminal
records.
• It also provides encryption and decryption facilities to the users.
2. ACCESS METHODS
• Files are used to store data. The information in a file can be
accessed by various methods.
• The way data is retrieved from a file is known as the access method.
• The various access methods are:
1. Sequential access
2. Direct access
3. Indexed access
SEQUENTIAL ACCESS
• Information in the file is accessed in the order in which it is stored, i.e. one record after the other.
• For example, reading the 34th record, then the 5th record and then the 1st record is not possible with
this access method.
• A read command moves the file pointer ahead by one record.
• A write command allocates space for the record and moves the pointer to the new end of file.
• Such a method suits tape rather than disk.
• Advantages:
1. Simple and easy to implement
2. No need for any storage space identification
3. Uses memory efficiently
• Disadvantages:
1. Searching is time consuming
2. New records can only be added at the end of the file.
3. High data redundancy.
4. Not possible to handle the random enquiries.
• Example operations: reset, read next, write next.
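The three operations named above can be sketched as a toy in-memory file. The class name `SequentialFile` and the use of a Python list as storage are assumptions made for illustration only.

```python
class SequentialFile:
    """Toy in-memory file that supports sequential access only."""

    def __init__(self):
        self.records = []   # records in the order they were written
        self.pos = 0        # file pointer (index of the next record to read)

    def reset(self):
        """Move the pointer back to the first record."""
        self.pos = 0

    def read_next(self):
        """Return the record at the pointer and advance the pointer by one."""
        if self.pos >= len(self.records):
            return None     # end of file
        record = self.records[self.pos]
        self.pos += 1
        return record

    def write_next(self, record):
        """Append a record and move the pointer to the new end of file."""
        self.records.append(record)
        self.pos = len(self.records)


f = SequentialFile()
for r in ("r1", "r2", "r3"):
    f.write_next(r)
f.reset()
first, second = f.read_next(), f.read_next()
```

Reaching the third record requires reading the first two; there is no operation that jumps to an arbitrary record.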
DIRECT ACCESS/RANDOM ACCESS/ RELATIVE ACCESS
• Records can be read or written in any order. There is no restriction on the order of reading or
writing for a direct access file.
• It uses direct access storage devices (DASD) such as disks rather than magnetic tapes.
• The records or blocks of a file are numbered for reference purposes.
• To perform a read or write operation, we specify the block number where the operation is
to be performed.
• Example: an instruction “read n” will read block number n.
• Application example: in a banking application, a customer may want to look up the current balance. This
can be done by locating the customer’s record using the account number as a key, rather than
sequentially reading the records of thousands of customers before the required record is located
and read.
• Advantages:
1. Faster access
2. No sorting
3. Random record insertion and deletion
• Disadvantages:
1. Requires backup facility as records are directly updated.
2. Requires expensive DASD to store records
3. Less efficient in terms of memory usage
4. Data may be accidentally erased or overwritten
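A minimal sketch of "read n" and "write n" using Python's standard `io.BytesIO` as a stand-in for a file on a direct access device. The block size and the helper names `read_block` and `write_block` are assumptions chosen for illustration.

```python
import io

BLOCK_SIZE = 16  # assumed fixed block size in bytes


def read_block(f, n):
    """Read block number n (0-based) without reading the blocks before it."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


def write_block(f, n, data):
    """Overwrite block number n in place, padded or cut to the block size."""
    f.seek(n * BLOCK_SIZE)
    f.write(data.ljust(BLOCK_SIZE, b" ")[:BLOCK_SIZE])


disk_file = io.BytesIO(b" " * BLOCK_SIZE * 4)   # a file of 4 blank blocks
write_block(disk_file, 2, b"balance=100")       # blocks written in any order
write_block(disk_file, 0, b"balance=7")
block2 = read_block(disk_file, 2).rstrip()
```

Because the write updates the block in place, the old contents are lost, which is why the disadvantages above mention backups and accidental overwrites.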
INDEXED ACCESS
• It combines the sequential and random access methods.
• An index is created for the file.
• The index contains pointers to the various blocks of the file, just like the index at the back of a book.
• The index is searched sequentially, whereas its pointers are used to access the file directly.
• Records are stored randomly on a DASD such as a disk, using a primary key. The data
is accessed either sequentially or randomly using the index.
• To find a record of a file, the index is searched first and then the pointer from the index
is used to access the file directly. In this way, the required record is found.
• When a file is large, its index also becomes large. In such a situation, a further
index is created for the index file.
• Advantages:
1. Records can be processed both sequentially and randomly.
2. Faster.
• Disadvantages:
1. Requires lots of storage space because of the presence of index.
2. Indexed files have to be reorganised from time to time to get rid of deleted records
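The lookup procedure described above can be sketched as follows. The account keys, the `blocks` list and the function names are hypothetical; the index is a plain sorted list searched sequentially, as in the description.

```python
# Data blocks stored in arbitrary order; each holds (primary key, balance).
blocks = [("acct-300", 75), ("acct-100", 20), ("acct-200", 50)]

# The index: (key, block number) pairs kept in key order.
index = sorted((key, n) for n, (key, _) in enumerate(blocks))


def find(key):
    """Search the index sequentially, then follow the pointer to the block."""
    for k, n in index:
        if k == key:
            return blocks[n]    # direct access through the pointer
    return None


def scan():
    """Sequential processing in key order, driven by the index."""
    return [blocks[n] for _, n in index]
```

Here `find` gives random access to one record, while `scan` gives sequential access to all of them, which is the combination the method is named for.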
3. DIRECTORY
• Operations
• Directory structure:
1. Single level
2. Two level
3. Tree
4. Acyclic graph
5. General graph
FOLDER vs DIRECTORY
• Directory is a file system concept. In a GUI, a directory is represented as a folder.
• Example 1: On Unix systems, /usr/bin/ is usually referred to as a directory when viewed in a
command line console, but if accessed through a graphical file manager, users may
sometimes call it a folder.
• Example 2: Windows uses both terms: directory in the command line (e.g. mkdir, cd) and folder in the GUI.
• Analogy:
• A Folder is like a room.
• A Directory is like a hotel keeper who knows all the rooms.
DIRECTORY OPERATIONS
• A directory is a symbol table of files: it stores the related
information about each file it holds.
• Thus, a directory is a list of files.
• Each entry of a directory defines a file.
• A typical directory entry may contain information such as the file name,
type, version number, size, owner and access rights.
LOGICAL STRUCTURE OF DIRECTORY
SINGLE LEVEL DIRECTORY
• The simplest method is to have one big list of all the files on the disk. The entire system contains only one directory, which
lists all the files present in the file system. The directory contains one entry for each file in the file system.
• Advantages
1. Implementation is very simple.
2. If the number of files is small, searching is fast.
3. File creation, searching, deletion is very simple since we have only one directory.
• Disadvantages
1. We cannot have two files with the same name.
2. The directory may become very large, so searching for a file may take a long time.
3. Protection cannot be implemented for multiple users.
4. There is no way to group files of the same kind.
5. Choosing a unique name for every file is difficult, and it limits the number of files in the system because most
operating systems limit the number of characters in a file name.
TWO LEVEL DIRECTORY
• In two level directory systems, we can create a separate directory for each user.
• There is one master directory which contains separate directories dedicated to
each user.
• For each user, there is a separate directory at the second level,
containing that user's files.
• The system does not let a user enter another user's directory without
permission.
• A two-level directory can be thought of as a tree, or an inverted tree, of height 2.
1. The root of the tree is the master file directory (MFD).
2. Its direct descendants are the user file directories (UFDs).
3. The descendants of the UFDs are the files themselves. The files are the leaves of the tree.
• Although the two-level directory structure solves the name-collision problem, it still has
disadvantages.
• This structure effectively isolates one user from another.
• Isolation is an advantage when the users are completely independent but is a
disadvantage when the users want to cooperate on some task and to access one
another's files.
• Specifying a user name and a file name defines a path in the tree from the root (the
MFD) to a leaf (the specified file).
• Thus, a user name and a file name define a path name. To name a file uniquely, a user
must know the path name of the file desired.
• EXAMPLE: C:\userb\test
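A two-level directory can be modelled as a dictionary of dictionaries. This is only a sketch: the user names, file contents and the function `open_file` are hypothetical, and permission checks are left out.

```python
# MFD: one entry per user; each entry is that user's UFD (file name -> contents).
mfd = {
    "usera": {"test": "contents of usera's test"},
    "userb": {"test": "contents of userb's test"},
}


def open_file(user, filename):
    """Follow the path (user name, file name) from the MFD down to a file."""
    ufd = mfd.get(user)
    if ufd is None:
        raise FileNotFoundError(f"no such user: {user}")
    if filename not in ufd:
        raise FileNotFoundError(f"{user} has no file named {filename}")
    return ufd[filename]
```

Both users own a file called `test` without any collision, because the user name is part of the path.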
TREE STRUCTURE DIRECTORY
• It is an extension of the two-level directory. The major difference is that each UFD
can in turn have subdirectories. Thus each user directory can have files or further
subdirectories as branches and leaves.
• Users can create their own subdirectories to organize files of
different types, such as separate subdirectories for graphic files, text files, etc.
• One bit in each directory entry defines the entry:
1. as a file (0),
2. as a subdirectory (1).
• Path names can be of two types: absolute and relative
1. An absolute path name begins at the root and follows a path down to the specified file,
giving the directory names on the path.
2. A relative path name defines a path from the current directory.
• With a tree-structured directory system, users can be allowed to access, in addition to their
own files, the files of other users.
1. For example, user B can access a file of user A by specifying its path name.
2. User B can specify either an absolute or a relative path name.
3. Alternatively, user B can change her current directory to be user A's directory and access the file by its file
name.
• Advantages:
1. Very general, since a full path name can be given.
2. Very scalable; the probability of name collision is low.
3. Searching is easy; both absolute and relative paths can be used.
• Disadvantages:
1. Not every file fits the hierarchical model; a file may need to appear in multiple directories.
2. Files cannot be shared, i.e. the same file cannot appear in more than one directory.
3. It can be inefficient, because accessing a file may require going through multiple directories.
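Absolute and relative path names can be illustrated with nested dictionaries standing in for directories. The tree contents and the function `resolve` are hypothetical, and special components such as `..` are deliberately not handled in this sketch.

```python
# Nested dicts are directories; strings are file contents.
root = {
    "usera": {"docs": {"notes.txt": "A's notes"}},
    "userb": {"pics": {}},
}


def resolve(path, cwd="/"):
    """Resolve an absolute path, or a path relative to the current directory."""
    full = path if path.startswith("/") else cwd.rstrip("/") + "/" + path
    node = root
    for name in full.split("/"):
        if name:                # skip empty components produced by "/"
            node = node[name]   # KeyError if a component does not exist
    return node


# User B reads user A's file by absolute path, then by changing directory.
by_absolute = resolve("/usera/docs/notes.txt")
by_relative = resolve("notes.txt", cwd="/usera/docs")
```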
ACYCLIC GRAPH DIRECTORY
• It allows different user directories to be organised within master file directory just like tree
structure.
• The additional feature provided by the acyclic graph is that the subdirectories or files of different
users can be shared.
• A shared directory or file will exist in the file system in two or more places simultaneously.
• In such a case, a shared file or directory does not mean that two copies of that file or directory
exist.
• With a shared file, only one actual file exists, so a change made by one user is immediately visible
to the others.
• In the case of shared subdirectories, a new file created by one user automatically appears in all the
shared subdirectories.
• Because of sharing, a file may have multiple absolute path names. As a result distinct file
names may refer to the same file.
• If a file is deleted in an acyclic-graph directory system, then:
1. In the case of a soft link, the file is simply deleted and the link is left as a dangling pointer.
2. In the case of a hard link, the actual file is deleted only when all the references to it have been
deleted.
• Advantage: it is easy to traverse the graph in order to find a particular file.
• Disadvantage: although there are no cycles, shared sections (files and directories) may be
traversed more than once. This wastes time.
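The difference between the two kinds of link on deletion can be sketched with a reference count. The class `LinkedFS`, its file ids and the sample paths are assumptions made for this illustration and do not correspond to any particular operating system's API.

```python
class LinkedFS:
    """Toy model of shared files using hard and soft links."""

    def __init__(self):
        self.files = {}    # file id -> [data, number of hard links]
        self.names = {}    # path -> ("hard", file id) or ("soft", target path)
        self.next_id = 1

    def create(self, path, data):
        self.files[self.next_id] = [data, 1]
        self.names[path] = ("hard", self.next_id)
        self.next_id += 1

    def hard_link(self, existing, new):
        kind, file_id = self.names[existing]
        assert kind == "hard"           # sketch only: link to a real file name
        self.files[file_id][1] += 1     # one more reference to the same file
        self.names[new] = ("hard", file_id)

    def soft_link(self, target, new):
        self.names[new] = ("soft", target)   # stores only the target's path

    def delete(self, path):
        kind, ref = self.names.pop(path)
        if kind == "hard":
            self.files[ref][1] -= 1
            if self.files[ref][1] == 0:      # last reference gone
                del self.files[ref]

    def read(self, path):
        kind, ref = self.names[path]
        if kind == "soft":
            if ref not in self.names:
                raise FileNotFoundError(f"dangling link: {path} -> {ref}")
            return self.read(ref)
        return self.files[ref][0]
```

For example, after creating `/usera/report`, hard-linking it as `/userb/report` and soft-linking it as `/userc/report`, deleting `/usera/report` leaves the data readable through `/userb/report`, while reading `/userc/report` fails because its target name no longer exists.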
GENERAL GRAPH DIRECTORY
• In a general graph directory structure, cycles are allowed within the directory structure,
and a directory can have more than one parent directory.
• The main problem with this kind of directory structure is calculating the total size or space
taken up by the files and directories.
• Advantages:
• It allows cycles.
• It is more flexible than the other directory structures.
• Disadvantages:
• It is more costly than others.
• It needs garbage collection.
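One common way to compute a total size when cycles are present is to remember which nodes have already been visited. The sketch below assumes a hypothetical adjacency-list representation (`graph`) and a table of file sizes (`sizes`); neither comes from a real file system.

```python
def total_size(graph, sizes, start):
    """Sum the sizes of all files reachable from `start`, visiting each node once."""
    seen = set()
    stack = [start]
    total = 0
    while stack:
        node = stack.pop()
        if node in seen:        # already counted: avoids loops and double counting
            continue
        seen.add(node)
        total += sizes.get(node, 0)          # directories contribute 0 here
        stack.extend(graph.get(node, []))    # children of a directory
    return total


# Directories "a" and "b" refer to each other, forming a cycle.
graph = {"root": ["a", "b"], "a": ["f1", "b"], "b": ["f2", "a"]}
sizes = {"f1": 10, "f2": 5}
print(total_size(graph, sizes, "root"))  # 15
```

Without the `seen` set, the traversal would follow the cycle between `a` and `b` forever.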
4. FILE PROTECTION
• TYPES OF ACCESS
• ACCESS CONTROL & PASSWORD PROTECTION
PASSWORD PROTECTION
• The owner of a file can protect it from unauthorized access by
assigning a password. Thus, only those users who know the
password can access that file.
• Disadvantages:
1. If a separate password is associated with each file, the user will have to
remember too many passwords.
2. If only one password is used to protect all files, then cracking that one
password will give an unauthorized user access to all the files.
5. DIRECTORY IMPLEMENTATION
• Directories can be implemented using a number of algorithms,
and the choice of directory implementation algorithm may
significantly affect the performance of the system.
• The directory implementation algorithms are classified according to
the data structure they use. Two algorithms are mainly
in use today.
1. Linear List
• In this algorithm, all the files in a directory are maintained as a singly linked list.
Each entry contains pointers to the data blocks assigned to the file and a pointer to the
next entry in the directory.
• Characteristics
• When a new file is created, the entire list is checked to see whether the new file
name matches an existing file name. If it does not, the file can
be created at the beginning or at the end of the list. Checking for a unique name
is therefore a big concern, because traversing the whole list takes time.
• The list needs to be traversed for every operation (creation, deletion,
updating, etc.) on the files, so the system becomes inefficient.
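A minimal sketch of a directory kept as a singly linked list. The class names `Entry` and `LinearDirectory` are hypothetical, and new entries are inserted at the beginning of the list.

```python
class Entry:
    """One directory entry: file name, its data blocks, and the next entry."""

    def __init__(self, name, blocks, next_entry=None):
        self.name = name
        self.blocks = blocks
        self.next = next_entry


class LinearDirectory:
    def __init__(self):
        self.head = None

    def find(self, name):
        """Walk the list from the head until the name is found."""
        entry = self.head
        while entry is not None:
            if entry.name == name:
                return entry
            entry = entry.next
        return None

    def create(self, name, blocks):
        if self.find(name) is not None:     # full traversal to check uniqueness
            raise FileExistsError(name)
        self.head = Entry(name, blocks, self.head)

    def delete(self, name):
        prev, entry = None, self.head
        while entry is not None and entry.name != name:
            prev, entry = entry, entry.next
        if entry is None:
            raise FileNotFoundError(name)
        if prev is None:
            self.head = entry.next          # removing the first entry
        else:
            prev.next = entry.next          # unlink from the middle or end
```

Every operation starts with a walk along the list, which is the inefficiency noted above.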
2. Hash Table
• To overcome the drawbacks of the singly linked list implementation of directories,
an alternative approach is to use a hash table along with the linked
lists.
• A key-value pair is generated for each file in the directory and stored in the
hash table. The key is determined by applying the hash function to the file
name, while the value points to the corresponding file stored in the directory.
• Searching becomes efficient because the entire list is no longer
searched on every operation. Only the hash table entries are checked using the key,
and if an entry is found, the corresponding file is fetched using the value.
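A sketch of the hash table approach with chaining. It uses Python's built-in `hash` as the hash function and short Python lists as the chains in each bucket; both choices, and the class name `HashedDirectory`, are assumptions made for illustration.

```python
class HashedDirectory:
    def __init__(self, buckets=8):
        self.table = [[] for _ in range(buckets)]   # one chain per bucket

    def _bucket(self, name):
        """Apply the hash function to the file name to pick a bucket."""
        return self.table[hash(name) % len(self.table)]

    def create(self, name, blocks):
        bucket = self._bucket(name)
        if any(n == name for n, _ in bucket):   # only one chain is checked
            raise FileExistsError(name)
        bucket.append((name, blocks))

    def find(self, name):
        for n, blocks in self._bucket(name):
            if n == name:
                return blocks
        return None
```

Only the entries that hash to the same bucket are compared, rather than every entry in the directory.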
6. ALLOCATION METHODS
• The allocation method defines how files are stored in disk blocks. The
direct-access nature of disks gives flexibility in how files are implemented. In
most cases, many files are stored on the same disk.
• The main problem is how to allocate space
to these files so that disk space is used efficiently and files can be
accessed quickly. There are mainly three methods of file allocation on
disk.
1. Contiguous allocation
2. Linked allocation
3. Indexed allocation
• The main idea behind these allocation methods is to provide
1. Efficient disk space utilization
2. Fast access to the file blocks
CONTIGUOUS ALLOCATION
• In this scheme, each file occupies a contiguous set of blocks on the disk. For
example, if a file requires n blocks and is given block b as the starting location,
then the blocks assigned to the file will be: b, b+1, b+2, ..., b+n-1.
• This means that given the starting block address and the length of the file (in
terms of blocks required), we can determine the blocks occupied by the file.
• The directory entry for a file with contiguous allocation contains the address of the starting block and the length of the file (in blocks).
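With contiguous allocation, a starting block and a length are enough to compute every block of the file. The function names and the default block size of 512 bytes below are assumptions chosen for illustration.

```python
def contiguous_blocks(start, length):
    """Blocks occupied by a file that starts at `start` and spans `length` blocks."""
    return list(range(start, start + length))   # b, b+1, ..., b+n-1


def block_of_offset(start, length, offset, block_size=512):
    """Disk block that holds the byte at `offset` within the file."""
    index = offset // block_size        # which block of the file, counting from 0
    if index >= length:
        raise ValueError("offset is past the end of the file")
    return start + index


print(contiguous_blocks(14, 3))         # [14, 15, 16]
print(block_of_offset(14, 3, 1030))     # 16
```

The second function shows why this scheme supports fast direct access: locating a byte takes one division and one addition, with no pointers to follow.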
• Disadvantages of indexed allocation:
1. The pointer overhead is relatively greater than with linked
allocation of the file.
2. Indexed allocation suffers from wasted space.
3. For a very large file, it is difficult for a single
index block to hold all the pointers.
4. For very small files, say files that span only 2-3
blocks, indexed allocation still keeps an entire
block for the pointers, which is inefficient in terms of
memory utilization.
7. FREE SPACE MANAGEMENT
• Disk space is limited, so the space of deleted files must be reused
for new files. Write-once optical disks are an exception: they allow only one
write to any given sector, so it is not physically possible to reuse it for
other files.
• The system keeps track of free disk space by maintaining a free-space list. The free-space
list records all the free disk blocks. Free blocks are
those which are not allocated to any file or directory.
• To create a file, the system searches the free-space list for the required
amount of space. If the
space is available, it is allocated to the new file and then
removed from the free-space list. Whenever a file is deleted, its
disk space is added back to the free-space list.
• The process of looking after and managing the free blocks of the disk is called free
space management. There are some methods or techniques to implement a free space
list. These are as follows:
1. Bitmap
2. Linked list
3. Grouping
4. Counting
BIT MAP/BIT VECTOR
• A bitmap or bit vector is a series of bits in which each bit corresponds
to a disk block. The bit can take two values, 0 and 1:
• 0 indicates that the block is allocated and 1 indicates a free block.
• For Example: Apple Macintosh operating system uses the bitmap method to
allocate the disk space.
• Advantages:
• This technique is relatively simple.
• This technique is very efficient to find the free space on the disk.
• Disadvantages:
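• Finding the first 1 bit in a word that is not 0 efficiently
requires special hardware support.
• This technique is less suitable for larger disks, since the bitmap grows with the number of blocks and should be kept in main memory to be efficient.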
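The search for a free block can be sketched in software as follows, using the convention above (1 = free, 0 = allocated). The word size of 8 bits and the function names are assumptions; bit 0 of word 0 is taken to represent block 0.

```python
BITS_PER_WORD = 8  # assumed word size, kept small for readability


def first_free(words):
    """Return the number of the first free block (first 1 bit), or None."""
    for i, word in enumerate(words):
        if word != 0:                   # skip words with no free block
            lowest = word & -word       # isolate the lowest set bit
            return i * BITS_PER_WORD + lowest.bit_length() - 1
    return None


def allocate(words):
    """Mark the first free block as allocated (bit set to 0) and return it."""
    n = first_free(words)
    if n is None:
        raise RuntimeError("no free blocks")
    words[n // BITS_PER_WORD] &= ~(1 << (n % BITS_PER_WORD))
    return n


def release(words, n):
    """Mark block n as free again (bit set to 1)."""
    words[n // BITS_PER_WORD] |= 1 << (n % BITS_PER_WORD)


bitmap = [0b00000000, 0b00010100]   # blocks 10 and 12 are free
```

The block number is computed as (number of bits per word) x (number of all-zero words skipped) + (offset of the first 1 bit).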
LINKED LIST(FREE LIST)
• This is another technique for free space management. A linked list of all the free
blocks is maintained. A head pointer, kept in a special location on the disk,
points to the first free block of the list.
• This block contains a pointer to the next free block, which contains a pointer
to the one after it, and so on.
• Traversing the list is not efficient, because each disk
block must be read, which requires I/O time. Fortunately, traversing the free list is not a frequent action.
• For example, if block 2 is the first free block, it contains a pointer to
block 3, which contains a pointer to block 4, which contains a
pointer to block 5, and so on until the
last free block.
• Advantages:
• Whenever a file is to be allocated a free block, the operating system can simply
allocate the first block in free space list and move the head pointer to the next
free block in the list.
• Disadvantages:
• Searching the free space list is very time consuming; each block has to
be read from the disk, which is very slow compared to main
memory.
• Not efficient when fast access is needed.
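A sketch of the linked free list. The dictionary `next_ptr` simulates the pointer stored inside each free block on disk; the class name and this representation are assumptions made for illustration.

```python
class FreeList:
    def __init__(self, free_blocks):
        self.next_ptr = {}      # block number -> next free block (or None)
        self.head = None        # first free block of the list
        for b in reversed(free_blocks):
            self.next_ptr[b] = self.head
            self.head = b

    def allocate(self):
        """Hand out the first free block and advance the head pointer."""
        if self.head is None:
            raise RuntimeError("disk full")
        block = self.head
        self.head = self.next_ptr.pop(block)
        return block

    def release(self, block):
        """Put a freed block at the front of the list."""
        self.next_ptr[block] = self.head
        self.head = block

    def count_free(self):
        """Traverse the whole list; on a real disk each step costs one read."""
        n, b = 0, self.head
        while b is not None:
            n += 1
            b = self.next_ptr[b]
        return n
```

Allocation touches only the head of the list, while counting or searching requires following every pointer.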
GROUPING(LINKED LIST OF INDICES)
• A modification of the free-list approach is to store the addresses of n free blocks in the first free block.
• The first n-1 of these blocks are actually free; the last one contains the addresses of another n free blocks, and so on.
• The importance of this implementation is that the addresses of a large number of free blocks can be
found quickly.
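A sketch of how grouping reduces the number of disk reads. The layout is an assumption made for illustration: `disk[b]` is the list of addresses stored in group block b, the last address in each list points to the next group block, and `None` marks the end of the chain.

```python
def all_free_blocks(disk, head):
    """Collect all free block numbers and count the disk reads needed."""
    free, reads = [], 0
    group_block = head
    while group_block is not None:
        addresses = disk[group_block]   # one read yields a whole group
        reads += 1
        free.append(group_block)        # the group block itself is free
        free.extend(addresses[:-1])     # free blocks listed in this group
        group_block = addresses[-1]     # last address: next group block
    return free, reads


# n = 3: block 2 lists blocks 3 and 4 and points to group block 7.
disk = {2: [3, 4, 7], 7: [8, 9, None]}
print(all_free_blocks(disk, 2))   # ([2, 3, 4, 7, 8, 9], 2)
```

Six free blocks are found with two reads, where a plain linked list would need one read per block.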
COUNTING
• We can keep the address of the first free block and the number n of free
contiguous blocks that follow the first block.
• Each entry in the free-space list then consists of a disk address and a count.
• Although each entry requires more space than a simple disk address would, the
overall list will be shorter, as long as the count is generally greater than 1.
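The counting representation can be sketched by compressing a list of free block numbers into (address, count) entries. The function name `to_extents` and the sample block numbers are hypothetical.

```python
def to_extents(free_blocks):
    """Turn free block numbers into (first block, count) entries."""
    extents = []
    for b in sorted(free_blocks):
        if extents and extents[-1][0] + extents[-1][1] == b:
            first, count = extents[-1]
            extents[-1] = (first, count + 1)    # b extends the current run
        else:
            extents.append((b, 1))              # b starts a new run
    return extents


print(to_extents([2, 3, 4, 5, 8, 9, 17]))   # [(2, 4), (8, 2), (17, 1)]
```

Seven block addresses shrink to three entries here; the saving disappears only when most runs have a count of 1.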