Linux Programming UNIT-II IV B.Tech II Sem (KR21)
Unit II – Files and Directories
Working with Files
In this chapter we learn how to create, open, read, write, and close files.
UNIX File Structure
In UNIX, everything is a file.
Programs can use disk files, serial ports, printers and other devices in exactly the same
way as they would use a file.
Directories, too, are special sorts of files.
Directories
As well as its contents, a file has a name and 'administrative information', i.e. the file's
creation/modification date and its permissions.
The permissions are stored in the inode, which also contains the length of the file and
where on the disk it's stored.
A directory is a file that holds the inode numbers and names of other files. Files
are arranged in directories, which may also contain subdirectories.
A user, neil, usually has his files stored in a 'home' directory, perhaps /home/neil.
Files and Devices
Even hardware devices are represented (mapped) by files in UNIX. For example, as
root, you mount a CD-ROM drive as a file,
$ mount -t iso9660 /dev/hdc /mnt/cd_rom
$ cd /mnt/cd_rom
Department of CSE 1|Page
/dev/console - This device represents the system console.
/dev/tty - This special file is an alias (logical device) for the controlling terminal
(keyboard and screen, or window) of a process.
/dev/null - This is the null device. All output written to this device is discarded.
System Calls and Device Drivers
System calls are provided by UNIX to access and control files and devices. A
number of device drivers are part of the kernel.
The system calls to access the device drivers include open, read, write, close and ioctl.
Library Functions
To provide a higher-level interface to device and disk files, UNIX provides a number of
standard libraries.
Low-level File Access
Each running program, called a process, has associated with it a number of file
descriptors.
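When a program starts, it usually has three of these descriptors already opened. These are:
0 - Standard input
1 - Standard output
2 - Standard error
write
The write system call arranges for the first nbytes bytes from buf to be written to the file
associated with the file descriptor fildes. It returns the number of bytes actually written,
which may be less than nbytes, or -1 on error.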
With this knowledge, let's write our first program, simple_write.c:
Here is how to run the program and its output.
$ simple_write
Here is some data
$
read
The read system call reads up to nbytes bytes of data from the file associated with the file descriptor
fildes and places them in the data area buf. It returns the number of bytes actually read, 0 at end
of file, or -1 on error.
This program, simple_read.c, copies the first 128 bytes of the standard input to the standard
output.
If you run the program, you should see:
$ echo hello there | simple_read
hello there
$ simple_read < [Link]
Files
open
To create a new file descriptor we need to use the open system call.
open establishes an access path to a file or device.
The name of the file or device to be opened is passed as a parameter, path, and the
oflags parameter is used to specify actions to be taken on opening the file.
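The oflags are specified as a bitwise OR of a mandatory file access mode and other optional
modes. The open call must specify one of the following file access modes:
O_RDONLY - Open for read-only
O_WRONLY - Open for write-only
O_RDWR - Open for reading and writing
The call may also include a combination (bitwise OR) of the following optional modes in
the oflags parameter:
O_APPEND - Place written data at the end of the file
O_TRUNC - Set the length of the file to zero, discarding existing contents
O_CREAT - Create the file, if necessary, with permissions given in mode
O_EXCL - Used with O_CREAT; the call fails if the file already exists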
Initial Permissions
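When we create a file using the O_CREAT flag with open, we must use the three-parameter
form. mode, the third parameter, is made from a bitwise OR of the flags defined in the
header file sys/stat.h. These are:
S_IRUSR - Read permission, owner
S_IWUSR - Write permission, owner
S_IXUSR - Execute permission, owner
S_IRGRP - Read permission, group
S_IWGRP - Write permission, group
S_IXGRP - Execute permission, group
S_IROTH - Read permission, others
S_IWOTH - Write permission, others
S_IXOTH - Execute permission, others
For example, the call
open("myfile", O_CREAT, S_IRUSR|S_IXOTH);
has the effect of creating a file called myfile, with read permission for the owner and execute
permission for others, and only those permissions (subject to the umask).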
umask
The umask is a system variable that encodes a mask for file permissions to be used when a file is
created.
You can change the variable by executing the umask command to supply a new value.
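The value is a three-digit octal value. Each digit is the result of ORing values from 1, 2, or 4.
The first digit refers to user (owner) permissions, the second to group and the third to other;
4 blocks read, 2 blocks write and 1 blocks execute.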
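For example, to block 'group' write and execute, and 'other' write, the umask would be:
Digit 1 (user) - 0, nothing blocked
Digit 2 (group) - 2 (write) and 1 (execute)
Digit 3 (other) - 2 (write)
Values for each digit are ORed together; so digit 2 will have 2 | 1, giving 3. The resulting
umask is 032.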
close
We use close to terminate the association between a file descriptor, fildes, and its file.
ioctl
ioctl is a bit of a rag-bag of things. It provides an interface for controlling the behavior of
devices and their descriptors, and for configuring underlying services.
ioctl performs the function indicated by cmd on the object referenced by the descriptor fildes.
Try It Out - A File Copy Program
We now know enough about the open, read and write system calls to write a low-level
program, copy_system.c, to copy one file to another, character by character.
Running the program will give the following:
We used the UNIX time facility to measure how long the program takes to run. It took two
and a half minutes to copy the 1 MB file.
We can improve by copying in larger blocks. Here is the improved
copy_block.c program.
Now try the program, first removing the old output file:
The revised program took under two seconds to do the copy.
Other System Calls for Managing Files
Here are some system calls that operate on these low-level file descriptors.
lseek
The lseek system call sets the read/write pointer of a file descriptor, fildes. You use it to set
where in the file the next read or write will occur.
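The offset parameter is used to specify the position and the whence parameter specifies
how the offset is used.
whence can be one of the following:
SEEK_SET - offset is an absolute position
SEEK_CUR - offset is relative to the current position
SEEK_END - offset is relative to the end of the file
lseek returns the resulting offset, measured in bytes from the beginning of the file, or -1 on failure.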
fstat, stat and lstat
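The fstat system call returns status information about the file associated with an open file
descriptor. The related functions stat and lstat return status information for a named file;
lstat reports on a symbolic link itself rather than on the file the link refers to.
The members of the structure, stat, may vary between UNIX systems, but will include:
st_mode - File permissions and file-type information
st_ino - The inode associated with the file
st_dev - The device the file resides on
st_uid - The user identity of the file owner
st_gid - The group identity of the file owner
st_atime - The time of last access
st_ctime - The time of last change to permissions, owner, group or content
st_mtime - The time of last modification to contents
st_nlink - The number of hard links to the file
The permissions flags are the same as for the open system call above. File-type flags include:
S_IFBLK - Entry is a block special device
S_IFDIR - Entry is a directory
S_IFCHR - Entry is a character special device
S_IFIFO - Entry is a FIFO (named pipe)
S_IFREG - Entry is a regular file
S_IFLNK - Entry is a symbolic link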
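Other mode flags include:
S_ISUID - Entry has setUID on execution
S_ISGID - Entry has setGID on execution
Masks to interpret the st_mode flags include:
S_IFMT - File type
S_IRWXU - User read/write/execute permissions
S_IRWXG - Group read/write/execute permissions
S_IRWXO - Others read/write/execute permissions
There are some macros defined to help with determining file types. These include:
S_ISBLK - Test for block special file
S_ISCHR - Test for character special file
S_ISDIR - Test for directory
S_ISFIFO - Test for FIFO
S_ISREG - Test for regular file
S_ISLNK - Test for symbolic link
To test that a file doesn't represent a directory and has execute permission set for the owner and
no other permissions, we can use the test: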
dup and dup2
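The dup system calls provide a way of duplicating a file descriptor, giving two or more
different descriptors that access the same file. dup returns the lowest-numbered free
descriptor; dup2 copies the descriptor onto a descriptor number that you specify.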
The Standard I/O Library
The standard I/O library and its header file, stdio.h, provide a versatile interface to low-level
I/O system calls.
Three file streams are automatically opened when a program is started. They are stdin,
stdout, and stderr.
Now, let's look at fopen, fread, fwrite, fclose, fflush, fseek, the character and string I/O
functions, and the printf and scanf families.
fopen
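The fopen library function is the analog of the low-level open system call.
fopen opens the file named by the filename parameter and associates a stream with it. The mode
parameter specifies how the file is to be opened. It's one of the following strings:
"r" or "rb" - Open for reading only
"w" or "wb" - Open for writing, truncate to zero length
"a" or "ab" - Open for writing, append to end of file
"r+" or "rb+" or "r+b" - Open for update (reading and writing)
"w+" or "wb+" or "w+b" - Open for update, truncate to zero length
"a+" or "ab+" or "a+b" - Open for update, append to end of file
If successful, fopen returns a non-null FILE * pointer. If it fails, it returns NULL.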
fread
The fread library function is used to read data from a file stream. Data is read into a data buffer
given by ptr from the stream, stream.
fwrite
The fwrite library call has a similar interface to fread. It takes data records from the specified
data buffer and writes them to the output stream. It returns the number of records successfully written.
fclose
The fclose library function closes the specified stream, causing any unwritten data to be written.
fflush
The fflush library function causes all outstanding data on a file stream to be written immediately.
fseek
The fseek function is the file stream equivalent of the lseek system call. It sets the position
in the stream for the next read or write on that stream.
fgetc, getc, getchar
The fgetc function returns the next byte, as a character, from a file stream. When it reaches
the end of file, it returns EOF.
The getc function is equivalent to fgetc, except that it may be implemented as a macro.
The getchar function is equivalent to getc(stdin) and reads the next character from the
standard input.
fputc, putc, putchar
The fputc function writes a character to an output file stream. It returns the value it has written,
or EOF on failure.
The function putc is equivalent to fputc, but it may be implemented as a macro.
The putchar function is equivalent to putc(c,stdout), writing a single character to the standard
output.
fgets, gets
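The fgets function reads a string from an input file stream. It writes characters to the string pointed to
by s until a newline is encountered, n-1 characters have been transferred or the end of file is reached.
The gets function reads from the standard input with no limit on the number of characters transferred,
so it can overrun the buffer; it was removed from the C standard in C11, and fgets should be used instead.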
Formatted Input and Output
There are library functions for producing output in a controlled fashion.
printf, fprintf and sprintf
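The printf family of functions format and output a variable number of arguments of different
types. printf writes to the standard output, fprintf to a specified stream, and sprintf into a
character string. Ordinary characters are passed unchanged into the output. Conversion specifiers cause
printf to fetch and format additional arguments passed as parameters. They always start with a %.
For example, the call
printf("Some numbers: %d, %d, and %d\n", 1, 2, 3);
produces, on the standard output:
Some numbers: 1, 2, and 3
Here are some of the most commonly used conversion specifiers:
%d, %i - Print an integer in decimal
%o, %x - Print an integer in octal, hexadecimal
%c - Print a character
%s - Print a string
%f - Print a floating-point number in fixed notation
%e - Print a floating-point number in exponent notation
%g - Print a floating-point number in a general format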
Here's another example:
printf("Hello Miss %c %s, aged %g\n", 'A', "Mathew", 6.5);
This produces:
Hello Miss A Mathew, aged 6.5
Field specifiers are given as numbers immediately after the % character in a
conversion specifier. They set the minimum field width and, after a period, the precision,
which helps when lining up columns of output.
The printf function returns an integer, the number of characters written.
scanf, fscanf and sscanf
The scanf family of functions work in a similar way to the printf group, except that they read
items from a stream and place values into variables.
The format string for scanf and friends contains both ordinary characters and
conversion specifiers.
Here is a simple example:
The call to scanf will succeed and place 1234 into the variable num given either of the following
inputs.
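Other conversion specifiers are:
%d - Scan a decimal integer
%o, %x - Scan an octal, hexadecimal integer
%f, %e, %g - Scan a floating-point number
%c - Scan a character (whitespace is not skipped)
%s - Scan a string
%[] - Scan a set of characters
%% - Scan a % character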
Given the input line,
this call to scanf will correctly scan four items:
In general, scanf and friends are not highly regarded, for three reasons:
Other Stream Functions
Other library functions use either stream parameters or the standard streams stdin, stdout,
stderr:
You can use the file stream functions to re-implement the file copy program, by using library
functions.
Try It Out - Another File Copy Program
This program does the character-by-character copy using calls to the functions
declared in stdio.h.
Running this program as before, we get:
$ time copy_stdio
1.69user 0.78system 0:03.70elapsed 66%CPU
This time, the program runs in 3.7 seconds.
Stream Errors
To indicate an error, many of the stdio library functions return out of range values, such as
null pointers or the constant EOF.
In these cases, the error is indicated in the external variable errno, which is declared in the header file errno.h.
You can also interrogate the state of a file stream to determine whether an error has occurred,
or the end of file has been reached.
The ferror function tests the error indicator for a stream and returns non-zero if it is set, zero otherwise.
The feof function tests the end-of-file indicator within a stream and returns non-zero if it is set,
zero otherwise.
You use it like this, where some_stream is an open FILE pointer: if (feof(some_stream)) { /* we're at the end */ }
The clearerr function clears the end-of-file and error indicators for the stream to which stream
points.
Streams and File Descriptors
Each file stream is associated with a low level file descriptor.
You can mix low-level input and output operations with higher level stream operations, but this
is generally unwise.
The effects of buffering can be difficult to predict.
File and Directory Maintenance
The standard libraries and system calls provide complete control over the creation and
maintenance of files and directories.
chmod
You can change the permissions on a file or directory using the chmod system call. This forms
the basis of the chmod shell program.
chown
A superuser can change the owner of a file using the chown system call.
unlink, link, symlink
We can remove a file using unlink.
The unlink system call decrements the link count on a file. When the count reaches zero and no
process has the file open, the file is deleted.
The link system call creates a new link to an existing file.
The symlink system call creates a symbolic link to an existing file.
mkdir, rmdir
We can create and remove directories using the mkdir and rmdir system calls.
The mkdir system call makes a new directory with path as its name.
The rmdir system call removes an empty directory.
chdir, getcwd
A program can navigate directories using the chdir system call.
A program can determine its current working directory by calling the getcwd library function.
The getcwd function writes the name of the current directory into the given buffer, buf.
Scanning Directories
The directory functions are declared in a header file, dirent.h. They use a structure, DIR, as a
basis for directory manipulation.
Here are these functions: opendir, readdir, telldir, seekdir and closedir.
opendir
The opendir function opens a directory and establishes a directory stream.
readdir
The readdir function returns a pointer to a structure detailing the next directory entry in the
directory stream dirp.
The dirent structure containing directory entry details includes the following entries:
ino_t d_ino - The inode of the file
char d_name[] - The name of the file
telldir
The telldir function returns a value that records the current position in a directory stream.
seekdir
The seekdir function sets the directory entry pointer in the directory stream given by dirp.
closedir
The closedir function closes a directory stream and frees up the resources associated with it.
Try It Out - A Directory Scanning Program
1. The printdir function prints out the contents of the current directory. It recurses into
subdirectories, using the depth parameter for indentation.
2. Now we move onto the main function:
The program produces output like this (edited for brevity):
How It Works
After some initial error checking, using opendir, to see that the directory
exists, printdir makes a call to chdir to the directory specified. While the entries returned
by readdir aren't null, the program checks to see whether the entry is a directory. If it isn't,
it prints the file entry with indentation depth.
Here is one way to make the program more general.
You can run it using the command:
$ printdir /usr/local | more
Errors
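System calls and functions can fail. When they do, they indicate the reason for their failure by
setting the value of the external variable errno.
The values and meanings of the errors are listed in the header file errno.h. They include:
EPERM - Operation not permitted
ENOENT - No such file or directory
EINTR - Interrupted system call
EIO - I/O error
EBUSY - Device or resource busy
EEXIST - File exists
EINVAL - Invalid argument
EMFILE - Too many open files
ENODEV - No such device
EISDIR - Is a directory
ENOTDIR - Not a directory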
There are a couple of useful functions for reporting errors when they occur:
strerror and perror.
The strerror function maps an error number into a string describing the type of error that has
occurred.
The perror function also maps the current error, as reported in errno, into a string and
prints it on the standard error stream.
It's preceded by the message given in the string s (if not null), followed by a colon and a
space. For example, the call
perror("program");
might give the following on the standard error output:
program: Too many open files
Advanced Topics
fcntl
The fcntl system call provides further ways to manipulate low-level file descriptors.
It can perform miscellaneous operations on open file descriptors. The call,
fcntl(fildes, F_DUPFD, newfd);
returns a new file descriptor with a numerical value equal to or greater than the integer
newfd.
The call,
fcntl(fildes, F_GETFD);
returns the file descriptor flags as defined in fcntl.h. The call,
fcntl(fildes, F_SETFD, flags);
is used to set the file descriptor flags, usually just FD_CLOEXEC.
The calls,
fcntl(fildes, F_GETFL);
fcntl(fildes, F_SETFL, flags);
respectively get and set the file status flags and access modes.
mmap
The mmap function creates a pointer to a region of memory associated with the contents of the
file accessed through an open file descriptor.
You can use the addr parameter to request a particular memory address.
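The prot parameter is used to set access permissions for the memory segment. This is a bitwise
OR of the following constant values.
PROT_READ - The segment can be read
PROT_WRITE - The segment can be written
PROT_EXEC - The segment can be executed
PROT_NONE - The segment can't be accessed
The flags parameter controls how changes made to the segment by the program are reflected
elsewhere.
MAP_PRIVATE - The segment is private; changes are local
MAP_SHARED - The segment changes are made in the file
MAP_FIXED - The segment must be at the given address, addr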
The msync function causes the changes in part or all of the memory segment to be written back
to (or read from) the mapped file.
The part of the segment to be updated is given by the passed start address, addr, and length,
len. The flags parameter controls how the update should be performed.
The munmap function releases the memory segment.
Try It Out - Using mmap
1. The following program, mmap_eg.c, shows a file of structures being updated using
mmap and array-style accesses.
Here is the definition of the RECORD structure and the code that creates NRECORDS versions, each
recording its number.
2. We now change the integer value of record 43 to 143, and write this to the 43rd record's
string:
3. We now map the records into memory and access the 43rd record in order to change
the integer to 243 (and update the record string), again using memory mapping:
Summary
This chapter showed how Linux provides direct access to files and devices.