Linux File and Directory Management

This document covers Unit II of Linux Programming, focusing on files and directories, including how to create, open, read, write, and close files in UNIX. It explains the UNIX file structure, system calls, device drivers, and various library functions for file operations. Additionally, it discusses file permissions, file descriptor management, and directory manipulation functions.


Linux Programming UNIT-II IV B.Tech II Sem (KR21)

Unit II – Files and Directories


Working with Files

In this chapter we learn how to create, open, read, write, and close files.

UNIX File Structure

In UNIX, everything is a file.


Programs can use disk files, serial ports, printers and other devices in exactly the same
way as they would use a file.
Directories, too, are special sorts of files.
Directories

As well as its contents, a file has a name and 'administrative information', i.e. the file's
creation/modification date and its permissions.

The permissions are stored in the inode, which also contains the length of the file and
where on the disk it's stored.

A directory is a file that holds the inode numbers and names of other files. Files
are arranged in directories, which may also contain subdirectories.
A user, neil, usually has his files stored in a 'home' directory, perhaps /home/neil.

Files and Devices

Even hardware devices are represented (mapped) by files in UNIX. For example, as
root, you mount a CD-ROM drive as a file,

$ mount -t iso9660 /dev/hdc /mnt/cd_rom


$ cd /mnt/cd_rom


/dev/console - this device represents the system console.


/dev/tty - this special file is an alias (logical device) for the controlling terminal
(keyboard and screen, or window) of a process.
/dev/null - This is the null device. All output written to this device is discarded.

System Calls and Device Drivers

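System calls are provided by UNIX to access and control files and devices. A
number of device drivers are part of the kernel.
The system calls to access the device drivers include:

open - open a file or device
read - read from an open file or device
write - write to a file or device
close - close the file or device
ioctl - pass control information to a device driver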

Library Functions
To provide a higher-level interface to devices and disk files, UNIX provides a number of
standard libraries.


Low-level File Access

Each running program, called a process, has associated with it a number of file
descriptors.
When a program starts, it usually has three of these descriptors already opened. These are:

0 - standard input
1 - standard output
2 - standard error

write

The write system call arranges for the first nbytes bytes from buf to be written to the file
associated with the file descriptor fildes.

With this knowledge, let's write our first program, simple_write.c:

Here is how to run the program and its output.

$ simple_write
Here is some data
$
read

The read system call reads up to nbytes of data from the file associated with the file descriptor
fildes and places them in the data area buf.
This program, simple_read.c, copies the first 128 bytes of the standard input to the standard
output.


If you run the program, you should see:

$ echo hello there | simple_read


hello there
$ simple_read < [Link]
Files
open
To create a new file descriptor we need to use the open system call.

open establishes an access path to a file or device.


The name of the file or device to be opened is passed as a parameter, path, and the
oflags parameter is used to specify actions to be taken on opening the file.

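The oflags are specified as a bitwise OR of a mandatory file access mode and other optional
modes. The open call must specify one of the following file access modes:

O_RDONLY - open for read-only
O_WRONLY - open for write-only
O_RDWR - open for reading and writing

The call may also include a combination (bitwise OR) of the following optional modes in
the oflags parameter:

O_APPEND - place written data at the end of the file
O_TRUNC - set the length of the file to zero, discarding existing contents
O_CREAT - create the file, if necessary, with permissions given in mode
O_EXCL - used with O_CREAT, ensures that the caller creates the file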


Initial Permissions

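When we create a file using the O_CREAT flag with open, we must use the three-parameter
form. mode, the third parameter, is made from a bitwise OR of the flags defined in the
header file sys/stat.h. These are:

S_IRUSR - read permission, owner
S_IWUSR - write permission, owner
S_IXUSR - execute permission, owner
S_IRGRP - read permission, group
S_IWGRP - write permission, group
S_IXGRP - execute permission, group
S_IROTH - read permission, others
S_IWOTH - write permission, others
S_IXOTH - execute permission, others

For example,

open("myfile", O_CREAT, S_IRUSR|S_IXOTH);

has the effect of creating a file called myfile, with read permission for the owner and execute
permission for others, and only those permissions.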

umask

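The umask is a system variable that encodes a mask for file permissions to be used when a file is
created.
You can change the variable by executing the umask command to supply a new value.
The value is a three-digit octal value. Each digit is the result of ORing values from 1, 2, or 4,
where 4 blocks read, 2 blocks write and 1 blocks execute permission. The first digit applies to
the user (owner), the second to the group and the third to others.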


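For example, to block 'group' write and execute, and 'other' write, the umask would be:

digit 1 (user) - 0, nothing blocked
digit 2 (group) - 2 (write) and 1 (execute)
digit 3 (other) - 2 (write)

Values for each digit are ORed together; so digit 2 will have 2 | 1, giving 3. The resulting
umask is 032.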
close

We use close to terminate the association between a file descriptor, fildes, and its file.
ioctl

ioctl is a bit of a rag-bag of things. It provides an interface for controlling the behavior of
devices, their descriptors and configuring underlying services.


ioctl performs the function indicated by cmd on the object referenced by the descriptor fildes.

Try It Out - A File Copy Program

We now know enough about the open, read and write system calls to write a low-level
program, copy_system.c, to copy one file to another, character by character.

Running the program will give the following:

We used the UNIX time facility to measure how long the program takes to run. It took two
and a half minutes to copy the 1 MB file.


We can improve on this by copying in larger blocks. Here is the improved
copy_block.c program.

Now try the program, first removing the old output file:

The revised program took under two seconds to do the copy.


Other System Calls for Managing Files
Here are some system calls that operate on these low-level file descriptors.
lseek

The lseek system call sets the read/write pointer of a file descriptor, fildes. You use it to set
where in the file the next read or write will occur.


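The offset parameter is used to specify the position and the whence parameter specifies
how the offset is used.
whence can be one of the following:

SEEK_SET - offset is an absolute position
SEEK_CUR - offset is relative to the current position
SEEK_END - offset is relative to the end of the file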

fstat, stat and lstat

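The fstat system call returns status information about the file associated with an open file
descriptor. The related functions stat and lstat return status information for a named file.
They give the same results except when the file is a symbolic link: lstat reports on the link
itself, while stat reports on the file the link refers to.
The members of the structure, stat, may vary between UNIX systems, but will include:

st_mode - file permissions and file-type information
st_ino - the inode associated with the file
st_dev - the device the file resides on
st_uid - the user identity of the file owner
st_gid - the group identity of the file owner
st_atime - the time of last access
st_ctime - the time of last change to permissions, owner, group or content
st_mtime - the time of last modification to contents
st_nlink - the number of hard links to the file

The permissions flags are the same as for the open system call above. File-type flags include:

S_IFBLK - entry is a block special device
S_IFDIR - entry is a directory
S_IFCHR - entry is a character special device
S_IFIFO - entry is a FIFO (named pipe)
S_IFREG - entry is a regular file
S_IFLNK - entry is a symbolic link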


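Other mode flags include:

S_ISUID - entry has setUID on execution
S_ISGID - entry has setGID on execution

Masks to interpret the st_mode flags include:

S_IFMT - file type
S_IRWXU - user read/write/execute permissions
S_IRWXG - group read/write/execute permissions
S_IRWXO - others' read/write/execute permissions

There are some macros defined to help with determining file types. These include:

S_ISBLK - test for block special file
S_ISCHR - test for character special file
S_ISDIR - test for directory
S_ISFIFO - test for FIFO
S_ISREG - test for regular file
S_ISLNK - test for symbolic link

To test that a file doesn't represent a directory and has execute permission set for the owner and
no other permissions, we can use the test: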


dup and dup2

The dup system calls provide a way of duplicating a file descriptor, giving two or more,
different descriptors that access the same file.

The Standard I/O Library

The standard I/O library and its header file stdio.h, provide a versatile interface to low-level
I/O system calls.

Three file streams are automatically opened when a program is started. They are stdin,
stdout, and stderr.

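Now, let's look at the following functions: fopen, fread, fwrite, fclose, fflush, fseek, fgetc,
getc, getchar, fputc, putc, putchar, fgets and gets.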


fopen

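The fopen library function is the analog of the low-level open system call.

fopen opens the file named by the filename parameter and associates a stream with it. The mode
parameter specifies how the file is to be opened. It's one of the following strings:

"r" or "rb" - open for reading only
"w" or "wb" - open for writing, truncate to zero length
"a" or "ab" - open for writing, append to end of file
"r+" or "rb+" or "r+b" - open for update (reading and writing)
"w+" or "wb+" or "w+b" - open for update, truncate to zero length
"a+" or "ab+" or "a+b" - open for update, append to end of file

If successful, fopen returns a non-null FILE * pointer. If it fails, it returns NULL.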


fread

The fread library function is used to read data from a file stream. Data is read into a data buffer
given by ptr from the stream, stream.
fwrite

The fwrite library call has a similar interface to fread. It takes data records from the specified
data buffer and writes them to the output stream.


fclose

The fclose library function closes the specified stream, causing any unwritten data to be written.

fflush

The fflush library function causes all outstanding data on a file stream to be written immediately.
fseek

The fseek function is the file stream equivalent of the lseek system call. It sets the position
in the stream for the next read or write on that stream.
fgetc, getc, getchar

The fgetc function returns the next byte, as a character, from a file stream. When it reaches
the end of file, or an error occurs, it returns EOF.
The getc function is equivalent to fgetc, except that it may be implemented as a macro.
The getchar function is equivalent to getc(stdin) and reads the next character from the
standard input.
fputc, putc, putchar


The fputc function writes a character to an output file stream. It returns the value it has written,
or EOF on failure.
The function putc is equivalent to fputc, but it may be implemented as a macro.
The putchar function is equivalent to putc(c,stdout), writing a single character to the standard
output.
fgets, gets

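The fgets function reads a string from an input file stream. It writes characters to the string pointed to
by s until a newline is encountered, n-1 characters have been transferred or the end of file is reached.
The gets function is similar, except that it reads from the standard input and discards the newline.
Because gets cannot limit the number of characters it transfers, it can overrun the buffer and should be avoided.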
Formatted Input and Output
There are library functions for producing output in a controlled fashion.
printf, fprintf and sprintf

The printf family of functions format and output a variable number of arguments of different
types. Ordinary characters are passed unchanged into the output. Conversion specifiers cause
printf to fetch and format additional arguments passed as parameters. They always start with a %.
For example,

printf("Some numbers: %d, %d, and %d\n", 1, 2, 3);

produces, on the standard output:

Some numbers: 1, 2, and 3

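Here are some of the most commonly used conversion specifiers:

%d, %i - print an integer in decimal
%o, %x - print an integer in octal, hexadecimal
%c - print a character
%s - print a string
%f - print a floating-point number in fixed notation
%e - print a floating-point number in exponent notation
%g - print a floating-point number in a general format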


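Here's another example, combining a character, a string and a floating-point value:

printf("Hello Miss %c %s, aged %g\n", 'A', "Mathew", 6.5);

This produces: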

Hello Miss A Mathew, aged 6.5


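Field specifiers are given as numbers immediately after the % character in a
conversion specifier. They set the minimum width of the printed field and, after a decimal
point, the precision, which makes it possible to line output up in columns.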

The printf function returns an integer, the number of characters written.

scanf, fscanf and sscanf

The scanf family of functions work in a similar way to the printf group, except that they read
items from a stream and place values into variables.

The format string for scanf and friends contains both ordinary characters and
conversion specifiers.

Here is a simple example:

int num;
scanf("Hello %d", &num);


The call to scanf will succeed and place 1234 into the variable num given either of the following
inputs:

Hello 1234
Hello1234

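Other conversion specifiers are:

%d - scan a decimal integer
%o, %x - scan an octal, hexadecimal integer
%f, %e, %g - scan a floating-point number
%c - scan a character (white space is not skipped)
%s - scan a string
%[] - scan a set of characters
%% - scan a % character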

Given the input line,

this call to scanf will correctly scan four items:

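In general, scanf and friends are not highly regarded, for three reasons:

Traditionally, the implementations have been buggy.
They're inflexible to use.
They lead to code where it's very difficult to work out what is being parsed.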

Other Stream Functions


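Other library functions use either stream parameters or the standard streams stdin, stdout,
stderr:

fgetpos - get the current position in a file stream
fsetpos - set the current position in a file stream
ftell - return the current file offset in a stream
rewind - reset the file position in a stream
freopen - reuse a file stream
setvbuf - set the buffering scheme for a stream
remove - equivalent to unlink, unless the path is a directory, in which case it is equivalent to rmdir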

You can use the file stream functions to re-implement the file copy program, by using library
functions.
Try It Out - Another File Copy Program
In this program the character-by-character copy is accomplished using calls to the functions
declared in stdio.h.

Running this program as before, we get:

$ time copy_stdio
1.69user 0.78system 0:03.70elapsed 66%CPU

This time, the program runs in 3.7 seconds.

Stream Errors
To indicate an error, many of the stdio library functions return out of range values, such as
null pointers or the constant EOF.


In these cases, the error is indicated in the external variable errno, which is declared in the
header file errno.h.

You can also interrogate the state of a file stream to determine whether an error has occurred,
or the end of file has been reached.
The ferror function tests the error indicator for a stream and returns non-zero if it is set, zero otherwise.

The feof function tests the end-of-file indicator within a stream and returns non-zero if it is set,
zero otherwise.
You use it like this, where some_stream is any open stream:

if (feof(some_stream)) { /* we're at the end */ }

The clearerr function clears the end-of-file and error indicators for the stream to which stream
points.
Streams and File Descriptors
Each file stream is associated with a low level file descriptor.
You can mix low-level input and output operations with higher level stream operations, but this
is generally unwise.
The effects of buffering can be difficult to predict.

File and Directory Maintenance


The standard libraries and system calls provide complete control over the creation and
maintenance of files and directories.

chmod


You can change the permissions on a file or directory using the chmod system call. This forms
the basis of the chmod shell program.

chown
A superuser can change the owner of a file using the chown system call.

unlink, link, symlink


We can remove a file using unlink.

The unlink system call decrements the link count on a file. When the count reaches zero and no
process has the file open, the file is deleted.
The link system call creates a new link to an existing file.
The symlink system call creates a symbolic link to an existing file.

mkdir, rmdir
We can create and remove directories using the mkdir and rmdir system calls.

The mkdir system call makes a new directory with path as its name.
The rmdir system call removes an empty directory.

chdir, getcwd

A program can navigate directories using the chdir system call.


A program can determine its current working directory by calling the getcwd library function.

The getcwd function writes the name of the current directory into the given buffer, buf.
Scanning Directories
The directory functions are declared in a header file, dirent.h. They use a structure, DIR, as a
basis for directory manipulation.
These functions are opendir, readdir, telldir, seekdir and closedir.

opendir
The opendir function opens a directory and establishes a directory stream.

readdir

The readdir function returns a pointer to a structure detailing the next directory entry in the
directory stream dirp.
The dirent structure containing directory entry details includes the following entries:

ino_t d_ino - the inode of the file
char d_name[] - the name of the file

telldir


The telldir function returns a value that records the current position in a directory stream.

seekdir
The seekdir function sets the directory entry pointer in the directory stream given by dirp.

closedir

The closedir function closes a directory stream and frees up the resources associated with it.
Try It Out - A Directory Scanning Program
1. The printdir function prints out the contents of the current directory. It recurses into
subdirectories.


2. Now we move onto the main function:


The program produces output like this (edited for brevity):


How It Works
After some initial error checking, using opendir, to see that the directory
exists, printdir makes a call to chdir to the directory specified. While the entries returned
by readdir aren't null, the program checks to see whether the entry is a directory. If it isn't,
it prints the file entry with indentation depth.

Here is one way to make the program more general.

You can run it using the command:

$ printdir /usr/local | more

Errors

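System calls and functions can fail. When they do, they indicate the reason for their failure by
setting the value of the external variable errno.

The values and meanings of the errors are listed in the header file errno.h. They include:

EPERM - Operation not permitted
ENOENT - No such file or directory
EINTR - Interrupted system call
EIO - I/O error
EBUSY - Device or resource busy
EEXIST - File exists
EINVAL - Invalid argument
EMFILE - Too many open files
ENODEV - No such device
EISDIR - Is a directory
ENOTDIR - Not a directory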


There are a couple of useful functions for reporting errors when they occur:
strerror and perror.

The strerror function maps an error number into a string describing the type of error that has
occurred.

The perror function also maps the current error, as reported in errno, into a string and
prints it on the standard error stream.
The string is preceded by the message given in the string s (if not null), followed by a colon and a
space. For example,

perror("program");

might give the following on the standard error output:

program: Too many open files

Advanced Topics


fcntl
The fcntl system call provides further ways to manipulate low level file descriptors.

It can perform miscellaneous operations on open file descriptors. The call,

fcntl(fildes, F_DUPFD, newfd);

returns a new file descriptor with a numerical value equal to or greater than the integer
newfd. The call,

fcntl(fildes, F_GETFD);

returns the file descriptor flags as defined in fcntl.h. The call,

fcntl(fildes, F_SETFD, flags);

is used to set the file descriptor flags, usually just FD_CLOEXEC. The calls,

fcntl(fildes, F_GETFL);
fcntl(fildes, F_SETFL, flags);

respectively get and set the file status flags and access modes.
mmap

The mmap function creates a pointer to a region of memory associated with the contents of the
file accessed through an open file descriptor.

You can use the addr parameter to request a particular memory address.
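The prot parameter is used to set access permissions for the memory segment. This is a bitwise
OR of the following constant values:

PROT_READ - the segment can be read
PROT_WRITE - the segment can be written
PROT_EXEC - the segment can be executed
PROT_NONE - the segment can't be accessed

The flags parameter controls how changes made to the segment by the program are reflected
elsewhere:

MAP_PRIVATE - the segment is private, changes are local
MAP_SHARED - the segment changes are made in the file
MAP_FIXED - the segment must be at the given address, addr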


The msync function causes the changes in part or all of the memory segment to be written back
to (or read from) the mapped file.

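The part of the segment to be updated is given by the passed start address, addr, and length,
len. The flags parameter controls how the update should be performed:

MS_ASYNC - perform asynchronous writes
MS_SYNC - perform synchronous writes
MS_INVALIDATE - read data back in from the file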

The munmap function releases the memory segment.

Try It Out - Using mmap


1. The following program, mmap_eg.c, shows a file of structures being updated using
mmap and array-style accesses.
Here is the definition of the RECORD structure and the code that creates NRECORDS versions,
each recording their number.


2. We now change the integer value of record 43 to 143, and write this to the 43rd record's
string:


3. We now map the records into memory and access the 43rd record in order to change
the integer to 243 (and update the record string), again using memory mapping:

Summary

This chapter showed how Linux provides direct access to files and devices.
