Linux Case Study

Linux began in 1991 as a small kernel written by Linus Torvalds. It has since grown through collaboration to include much of UNIX functionality. The kernel provides processes, virtual memory, and device drivers. Key components also include system libraries and utilities. Loadable kernel modules allow new drivers and functionality to be added dynamically. The kernel provides process and resource management to share the CPU and arbitrate hardware access between components.

LINUX Operating System - A Case Study

Introduction
The development of Linux began in 1991, when a Finnish student, Linus Torvalds, wrote and
christened Linux, a small but self-contained kernel for the 80386 processor, the first true 32-bit
processor in Intel’s range of PC-compatible CPUs. Early in its development, the Linux source
code was made available for free on the internet. As a result, Linux’s history has been one of
collaboration by many users from all around the world, corresponding almost exclusively over
the internet. From an initial kernel that partially implemented a small subset of the UNIX system
services, the Linux system has grown to include much UNIX functionality.

The Linux Kernel


The first Linux kernel released to the public was version 0.01, dated May 24, 1991. It had no
networking, ran on only 80386-compatible Intel processors and PC hardware, and had extremely
limited device-driver support. The virtual memory subsystem was also fairly basic and included
no support for memory-mapped files; however, even this early incarnation supported shared
pages with copy-on-write. The only file system supported was the Minix file system. However,
the kernel did implement proper UNIX processes with protected address spaces.

Linux 2.0 was given a major version number increment on account of two major new
capabilities: support for multiple architectures, including a fully 64-bit native Alpha port, and
support for multiprocessor architectures. The memory management code was substantially
improved to provide a unified cache for file-system data independent of the caching of block
devices. As a result of this change, the kernel offered greatly increased file-system and virtual
memory performance. The 2.0 kernel also included much improved TCP/IP performance, and a
number of new networking protocols were added.

In January 1999, Linux 2.2 was released, continuing the improvements added by Linux
2.0. Networking was enhanced with more flexible firewalling, better routing and traffic
management, as well as support for TCP large windows. Signal handling, interrupts and some I/O
were now locked at a finer level than before to improve SMP performance.

Components of a Linux System


Linux is composed of three main bodies of code, in line with most traditional UNIX
implementations:
1. Kernel: The kernel is responsible for maintaining all the important abstractions of the
operating system, including such things as virtual memory and processes.
2. System libraries: The system libraries define a standard set of functions through which
applications can interact with the kernel, and that implement much of the operating
system functionality that does not need the full privilege of kernel code.
3. System utilities: The system utilities are programs that perform individual, specialized
management tasks. Some system utilities may be invoked just once to initialize and
configure some aspect of the system; others – known as daemons in UNIX terminology –
may run permanently, handling such tasks as responding to incoming network
connections, accepting logon requests from terminals, or updating log files.

Figure 1. Components of the Linux operating system. System-management programs, user
processes, user utility programs, and compilers sit above the system shared libraries; beneath
the libraries lie the Linux kernel and its loadable kernel modules.

Kernel Modules
The Linux kernel has the ability to load and unload arbitrary sections of kernel code on demand.
These loadable kernel modules run in privileged kernel mode, and as a consequence have full
access to all the hardware capabilities of the machine on which they run. The module support
under Linux has three components:
1. Module management allows modules to be loaded into memory and to talk to the rest
of the kernel.
2. Driver registration allows modules to tell the rest of the kernel that a new driver has
become available.
3. The conflict-resolution mechanism allows different device drivers to reserve hardware
resources and to protect those resources from accidental use by another driver.

Module Management
Loading a module requires more than just loading its binary contents into kernel memory. The
system must also make sure that any references the module makes to kernel symbols or entry
points get updated to point to the correct locations in the kernel’s address space. Linux deals with
this reference updating by splitting the job of module loading into two separate sections: the
management of sections of module code in kernel memory, and the handling of symbols that
modules are allowed to reference.

Linux maintains an internal symbol table in the kernel. The set of exported symbols constitutes a
well-defined interface by which a module may interact with the kernel. The loading of modules
is performed in two stages. First, the module-loader utility asks the kernel to reserve a
contiguous area of virtual kernel memory for the module. The kernel returns the address of the
memory allocated, and the loader utility can use this address to relocate the module’s machine
code to the correct loading address. A second call then passes the module, plus any symbol table
that the new module wants to export, to the kernel. The module itself is now copied into the
previously allocated space, and the kernel’s symbol table is updated with the new symbols for
possible use by other modules not yet loaded.
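On a running system the result of this process can be observed from user space: Linux exports the kernel’s table of loaded modules through /proc/modules. A minimal sketch, assuming a Linux system (the file does not exist elsewhere):

```python
# Sketch: observe the kernel's loaded-module table via /proc/modules.
# Each line lists (at least) a module's name, memory size, and reference count.
import os

def parse_modules(text):
    """Parse /proc/modules-style lines into (name, size, refcount) tuples."""
    entries = []
    for line in text.strip().splitlines():
        fields = line.split()
        entries.append((fields[0], int(fields[1]), int(fields[2])))
    return entries

if os.path.exists("/proc/modules"):
    with open("/proc/modules") as f:
        for name, size, refs in parse_modules(f.read())[:5]:
            print(f"{name}: {size} bytes, used by {refs}")
```

Each /proc/modules line also lists the names of dependent modules and the module's state, which the parser above simply ignores.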

Driver Registration
Once a module is loaded, it remains no more than an isolated region of memory unless it lets the
rest of the kernel know what new functionality it provides. The kernel maintains dynamic tables
of all known drivers, and provides a set of routines to allow drivers to be added to or removed
from these tables at any time. A module may register many types of drivers. The kernel makes
sure that it calls a module’s startup routine when that module is loaded, and calls the module’s
cleanup routine before that module is unloaded. These routines are responsible for registering the
module’s functionality.

Conflict Resolution
Linux provides a central conflict-resolution mechanism to help arbitrate access to certain
hardware resources. Its aims are as follows:
• To prevent modules from clashing over access to hardware resources
• To prevent autoprobes – device driver probes that auto-detect device configuration –
from interfering with existing device drivers
• To resolve conflicts among multiple drivers trying to access the same hardware

To these ends, the kernel maintains lists of allocated hardware resources. The PC has a limited
number of possible I/O ports, interrupt lines, and DMA channels; when any device driver wants
to access such a resource, it is expected to reserve the resource with the kernel database first.
This requirement incidentally allows the system administrator to determine exactly which
resources have been allocated by which driver at any given point. A module is expected to use
this mechanism to reserve in advance any hardware resources that it expects to use.
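The reservation discipline can be sketched as a toy model. This is illustrative only: the real kernel exposes reservation through C interfaces such as request_region() and request_irq(), not through any structure like the one below, and all names here are made up.

```python
# Toy model of the kernel's resource-reservation database (illustrative only).
class ResourceTable:
    def __init__(self):
        self._owners = {}  # resource key -> name of the driver holding it

    def reserve(self, resource, driver):
        """Reserve a resource; fail if another driver already holds it."""
        holder = self._owners.get(resource)
        if holder is not None and holder != driver:
            return False  # conflict: resource already claimed by someone else
        self._owners[resource] = driver
        return True

    def release(self, resource, driver):
        """Release a resource, but only if this driver actually holds it."""
        if self._owners.get(resource) == driver:
            del self._owners[resource]

table = ResourceTable()
assert table.reserve(("irq", 5), "soundcard")    # first claim succeeds
assert not table.reserve(("irq", 5), "netcard")  # conflicting claim fails
```

The table also answers the administrator's question from the text: at any moment it records exactly which resources are held by which driver.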

Process Management
A process is the basic context within which all user-requested activity is serviced by the
operating system. Linux uses a process model similar to those of other versions of UNIX.

The Fork/Exec Process Model


The basic principle of process management is to separate two distinct operations: the creation of
processes and the running of a new program. A new process is created by the ‘fork’ system call,
and a new program is run after a call to ‘execve’. This model has the advantage of great
simplicity. Under Linux, we can break down the context into a number of specific sections.
Broadly, process properties fall into three groups: the process identity, environment, and context.
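The fork/exec split can be demonstrated from user space. In the sketch below, the child process replaces itself with a new program; a fresh Python interpreter is used purely so the example is self-contained.

```python
# Sketch of the fork/exec model: 'fork' creates the process, 'execv' runs a
# new program inside it.
import os, sys

def run_program(argv):
    """Fork a child, exec argv inside it, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        os.execv(argv[0], argv)    # child: replace this process image
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)  # parent: collect the child's exit code

code = run_program([sys.executable, "-c", "import sys; sys.exit(7)"])
print(code)  # the child's exit status, 7
```

Note that the parent continues past fork while the child never returns from execv: the two operations really are independent.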

Process Identity
A process identity consists mainly of the following items:
• Process ID (PID): Each process has a unique identifier. PIDs are used to specify
processes to the operating system when an application makes a system call to signal,
modify, or wait for another process.
• Credentials: Each process must have an associated user ID and one or more group IDs
that determine the rights of a process to access system resources and files.
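Both identity items are directly visible from user space through standard system calls:

```python
# A process's identity as seen from user space: its PID and its credentials.
import os

pid = os.getpid()      # the unique process identifier
uid = os.getuid()      # the user-ID half of the credentials
gids = os.getgroups()  # the supplementary group IDs
print(f"PID {pid}, UID {uid}, GIDs {gids}")
```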

Processes and Threads


Processes represent the execution of single programs, whereas threads represent separate,
concurrent execution contexts within a single process running a single program. The Linux
kernel deals simply with the difference between processes and threads: it uses exactly the same
internal representation for each. A thread is just a new process that happens to share the same
address space as its parent. The distinction between a process and a thread is made only when a
new thread is created, by the ‘clone’ system call. Whereas ‘fork’ creates a new process that has
its own entirely new process context, ‘clone’ creates a new process that has its own identity but
is allowed to share the data structures of its parent.
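The difference in address-space sharing can be observed directly: a write made by a thread is visible to its creator, while a write made by a forked child is not, because the child received its own (copy-on-write) copy. A small sketch:

```python
# Threads share their creator's address space; forked processes do not.
import os, threading

shared = []

def worker():
    shared.append("from-thread")

t = threading.Thread(target=worker)
t.start(); t.join()
# The thread's write is visible: same address space.
assert shared == ["from-thread"]

pid = os.fork()
if pid == 0:
    shared.append("from-child")  # modifies the child's private copy only
    os._exit(0)
os.waitpid(pid, 0)
# The child's write is NOT visible: fork gave it a copy of the data.
assert shared == ["from-thread"]
print(shared)
```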

Process Scheduling
Scheduling is the job of allocating CPU time to different tasks within an operating system. In
the case of Linux, another aspect of scheduling is also important: the running of the various kernel
tasks. Kernel tasks encompass both tasks that are requested by a running process and tasks that
execute internally on behalf of a device driver.

Linux has two separate process-scheduling algorithms. One is a time-sharing algorithm for fair
preemptive scheduling among multiple processes; the other is designed for real-time tasks where
absolute priorities are more important than fairness.

For time-sharing processes, Linux uses a prioritized, credit-based algorithm. Each process
possesses a certain number of scheduling credits; when a new task must be chosen to run, the
process with the most credits is selected. Every time that a timer interrupt occurs, the currently
running process loses one credit; when its credits reach zero, it is suspended and another
process is chosen.

Linux’s real-time scheduling is simpler still. Linux implements two real-time scheduling classes:
first-come, first-served and round-robin. In both cases, each process has a priority in addition to
its scheduling class.
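The credit-based algorithm described above can be sketched as a toy simulation. The real scheduler also recredits suspended processes (so no task starves forever), which this sketch deliberately omits:

```python
# Toy simulation of credit-based time-sharing scheduling: the runnable
# process with the most credits runs, and each timer tick costs one credit.
def pick_next(credits):
    """Return the PID with the most scheduling credits, or None if all are 0."""
    runnable = {pid: c for pid, c in credits.items() if c > 0}
    if not runnable:
        return None
    return max(runnable, key=runnable.get)

def timer_tick(credits, running):
    """The currently running process loses one credit per timer interrupt."""
    credits[running] -= 1

credits = {"A": 3, "B": 1}
order = []
while (pid := pick_next(credits)) is not None:
    order.append(pid)
    timer_tick(credits, pid)
print(order)  # A runs while it out-credits B, then B gets its turn
```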

Memory Management
Memory management under Linux has two components. The first deals with allocating and
freeing physical memory. The second handles virtual memory, which is memory mapped into the
address space of running processes.

Physical Memory
The primary physical memory manager in the Linux kernel is the page allocator. It is responsible
for allocating and freeing all physical pages. A buddy-heap allocator pairs adjacent units of
allocable memory together. Each allocable memory region has an adjacent partner, and whenever
two adjacent partner regions are freed, they are combined to form a larger region. The other
three main subsystems that do their own management of physical pages are closely related.
These are the buffer cache, the page cache, and the virtual memory system.
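The buddy discipline can be sketched as a toy allocator: allocation splits a larger power-of-two block in half as needed, and freeing coalesces a block with its equally sized neighbour (its "buddy") whenever both are free. A minimal sketch, not the kernel's actual data structures:

```python
# Toy buddy allocator over a power-of-two region of abstract "units".
class BuddyAllocator:
    def __init__(self, total):          # total must be a power of two
        self.total = total
        self.free = {total: [0]}        # block size -> list of free offsets

    def alloc(self, size):
        """Allocate a block of `size`, splitting larger blocks as needed."""
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                      # find the smallest splittable block
        if s > self.total:
            return None                 # out of memory
        off = self.free[s].pop()
        while s > size:                 # split down, keeping the upper halves
            s //= 2
            self.free.setdefault(s, []).append(off + s)
        return off

    def free_block(self, off, size):
        """Free a block, coalescing with its buddy whenever possible."""
        while size < self.total:
            buddy = off ^ size          # a buddy's address differs in one bit
            if buddy not in self.free.get(size, []):
                break
            self.free[size].remove(buddy)
            off = min(off, buddy)
            size *= 2
        self.free.setdefault(size, []).append(off)

heap = BuddyAllocator(16)
a = heap.alloc(4)                       # splits 16 -> 8+8, then 8 -> 4+4
heap.free_block(a, 4)                   # coalesces back into one 16-unit block
print(sorted(heap.free[16]))
```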

The buffer cache is the kernel’s main cache for block-oriented devices such as disk drives, and is
the main mechanism through which I/O to these devices is performed. The page cache caches
entire pages of file contents, and is not limited to block devices; it can also cache networked
data. The virtual memory system manages the contents of each process’ virtual address space.

Virtual Memory
The Linux virtual memory system is responsible for maintaining the address space visible to
each process. It creates pages of virtual memory on demand, and manages the loading of those
pages from disk or their swapping back out to disk as required. Under Linux, the virtual memory
manager maintains two separate views of a process’ address space: as a set of separate regions,
and as a set of pages.

The first view of an address space is the logical view, describing instructions that the virtual
memory system has received concerning the layout of the address space. The regions for each
address space are linked into a balanced binary tree to allow fast lookup of the region
corresponding to any virtual address. The kernel also maintains a second, physical view of each
address space. This view is stored in the hardware page table for the process. The page-table
entries determine the exact current location of each page of virtual memory, whether it is on disk
or in physical memory.
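The logical, region-based view is visible from user space on Linux: each line of /proc/self/maps describes one region of the calling process's address space. A sketch assuming a Linux system:

```python
# The "logical view": a process address space as a set of regions, as
# exported through /proc/self/maps on Linux.
import os

def parse_region(line):
    """Parse one maps line into (start, end, permissions)."""
    addr, perms = line.split()[:2]
    start, end = (int(x, 16) for x in addr.split("-"))
    return start, end, perms

if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        for line in list(f)[:5]:
            start, end, perms = parse_region(line)
            print(f"{perms} region of {end - start} bytes")
```

The remaining fields of each line (offset, device, inode, path), which the parser ignores, tie a region back to the file it maps, if any.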

Swapping and Paging


Linux does not implement whole-process swapping; it uses the newer paging mechanism
exclusively. The paging system can be divided into two sections. First, the policy algorithm
decides which pages to write out to disk, and when to write them. Second, the paging mechanism
carries out the transfer and pages data back into physical memory when they are needed again.
The policy pages out the least frequently used pages. The paging mechanism supports paging
both to dedicated swap devices and partitions and to normal files, although swapping to a file is
significantly slower because of the extra overhead incurred by the file system.
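At its core, the page-out policy reduces to choosing a victim: under the least-frequently-used policy described above, that is simply the page with the lowest use count. A toy sketch with made-up page names and counts:

```python
# Toy page-out policy: evict the least frequently used page.
def pick_victim(use_counts):
    """Return the page with the lowest use count (the LFU victim)."""
    return min(use_counts, key=use_counts.get)

counts = {"page-A": 12, "page-B": 3, "page-C": 7}
print(pick_victim(counts))  # page-B has been used least, so it is paged out
```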

File System
The standard on-disk file system used by Linux is called extfs (extended file system). A later
redesign of this file system to improve performance and scalability and to add a few missing
features led to the second extended file system (ext2fs). The latest iteration of this file system
in widespread use is ext4. It can support volumes with sizes up to 1 exabyte and files with sizes up to
16 terabytes. The ext4 file system is backward compatible with ext3 and ext2, making it possible
to mount ext3 and ext2 file systems as ext4. It allows for pre-allocation of on-disk space for a
file.
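The pre-allocation feature can be exercised from user space through the POSIX interface; os.posix_fallocate is available in Python on Linux. Whether blocks are truly reserved up front depends on the underlying file system (glibc falls back to writing zeros where the file system lacks native support):

```python
# Pre-allocating on-disk space for a file, as ext4 allows.
import os, tempfile

with tempfile.NamedTemporaryFile() as f:
    os.posix_fallocate(f.fileno(), 0, 65536)  # reserve 64 KiB up front
    size = os.fstat(f.fileno()).st_size
    print(size)
```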

Input and Output


In Linux, all device drivers appear as normal files. A user can open an access channel to a device
in the same way as she can open any other file – devices appear as objects within the file system.
Linux splits all devices into three classes: block devices, character devices and network devices.
Block devices include all devices that allow random access to completely independent, fixed-
sized blocks of data, including hard disks, floppy disks and CD-ROMs. Applications can also
access these block devices directly if they wish; for example, a database application may prefer
to perform its own, fine-tuned laying out of data onto the disk, rather than using the general-
purpose file system.

Character devices include most other devices, with the main exception of network devices.
These devices do not need to support all the functionality of regular files. For example, a
loudspeaker device would allow data to be written to it, but it would not support reading of data
back from it.

Network devices are dealt with differently from block and character devices. Users cannot
directly transfer data to network devices; instead, they must communicate indirectly by opening a
connection to the kernel’s networking subsystem.
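A device's class is recorded in the mode bits of its file-system object, so it can be checked with an ordinary stat call. The sketch assumes /dev/null exists, as it does on any normal Linux system:

```python
# Classifying device nodes from their mode bits: block vs. character devices.
import os, stat

def device_class(path):
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "char"
    return "not a device"

if os.path.exists("/dev/null"):
    print("/dev/null is a", device_class("/dev/null"), "device")
```

Network devices, consistent with the paragraph above, have no file-system node at all: there is no /dev entry for an Ethernet interface.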

Inter-process Communication
Linux offers several mechanisms for passing data among processes. The standard pipe
mechanism allows a child process to inherit a communication channel from its parent; data written
to one end of the pipe can be read at the other. Under Linux, pipes appear as just another type of inode
to virtual file system software, and each pipe has a pair of wait_queues to synchronize the reader
and writer. Linux also defines a set of networking facilities that can send streams of data to both
local and remote processes.
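A pipe inherited across fork can be sketched in a few lines: the child writes to one end, and the parent reads the same bytes back from the other:

```python
# A pipe shared between a parent and a forked child.
import os

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(r)                        # child keeps only the write end
    os.write(w, b"hello from child")
    os._exit(0)
os.close(w)                            # parent keeps only the read end
data = os.read(r, 1024)
os.waitpid(pid, 0)
print(data.decode())
```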

Two other methods of sharing data among processes are available. First, shared memory offers
an extremely fast way to communicate large or small amounts of data; any data written by one
process to a shared-memory region can be read immediately by any other process that has
mapped that region into its address space. The main disadvantage of shared memory is that, on
its own, it offers no synchronization. Shared memory becomes particularly powerful when used
in conjunction with another inter-process-communication mechanism that provides the missing
synchronization.
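A sketch of both points at once: an anonymous mmap is a shared mapping by default on Unix, so a write by a forked child is immediately visible to the parent, but the parent must supply its own synchronization (here, simply waiting for the child to exit):

```python
# Shared memory between processes via an anonymous shared mapping.
import mmap, os

buf = mmap.mmap(-1, 4096)   # anonymous mapping, MAP_SHARED by default on Unix
pid = os.fork()
if pid == 0:
    buf[:5] = b"hello"      # child writes directly into the shared region
    os._exit(0)
os.waitpid(pid, 0)          # the synchronization shared memory itself lacks
print(bytes(buf[:5]))
```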

Networking
Linux not only supports the standard Internet protocols used for most UNIX-to-UNIX
communications but it also implements a number of protocols native to other, non-UNIX
operating systems. Internally, networking in the Linux kernel is implemented by three layers of
software:
1. Socket interface
2. Protocol drivers
3. Network device drivers

User applications perform all networking requests through the socket interface. The next layer of
software is the protocol stack. Whenever any networking data arrive at this layer, either from an
application’s socket or from a network device driver, the data are expected to have been tagged
with an identifier specifying which network protocol they contain. The protocol layer may
rewrite packets, create new packets, split or reassemble packets into fragments, or simply discard
incoming data. Ultimately, once it has finished processing a set of packets, it passes them on, up
to the socket interface if the data is destined for a local connection or downward to a device
driver if the packet needs to be transmitted remotely.
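The socket interface at the top of this stack can be exercised directly; a socketpair gives two connected endpoints without involving a remote machine:

```python
# All networking requests pass through the socket interface; a socketpair
# provides two connected endpoints for a quick round trip.
import socket

a, b = socket.socketpair()
a.sendall(b"ping")          # one endpoint sends a request...
request = b.recv(1024)      # ...which the other receives,
b.sendall(b"pong")          # then answers,
response = a.recv(1024)     # and the first endpoint reads the reply.
a.close(); b.close()
print(request, response)
```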

Security
Linux’s security model is closely related to typical UNIX security mechanisms. The security
concerns can be classified in two groups:
1. Authentication: Making sure that nobody can access the system without first proving
that he has entry rights
2. Access control: Providing a mechanism for checking whether a user has the right to
access a certain object, and preventing access to objects as required

Authentication
Authentication has typically been performed through the use of a publicly readable password
file. A user’s password is combined with a random value, and the result is encoded with a one-
way transformation function and stored in the password file. The use of the one-way function
means that the original password cannot be deduced from the password file except by trial and
error. When a user presents a password to the system, the password is recombined with the value
stored in the password file and passed through the same one-way transformation. If the result
matches the contents of the password file, then the password is accepted.
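The salt-and-one-way-function scheme can be sketched as follows, using SHA-256 as the one-way transformation purely for illustration (the classic UNIX implementation used crypt(3); the principle is the same):

```python
# Sketch of salted one-way password storage and checking.
import hashlib, os

def store(password):
    salt = os.urandom(16)  # the "random value" combined with the password
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt, digest    # what the password file would hold

def check(password, salt, digest):
    """Recombine and re-transform; compare against the stored result."""
    return hashlib.sha256(salt + password.encode()).hexdigest() == digest

salt, digest = store("s3cret")
print(check("s3cret", salt, digest), check("wrong", salt, digest))  # True False
```

Because the transformation is one-way, possession of salt and digest reveals the password only to trial-and-error guessing, exactly as the text describes.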

Access Control
Access control under Linux is performed through the use of unique numeric identifiers. A user
identifier (UID) identifies a single user or a single set of access rights. A group identifier (GID)
is an extra identifier that can be used to identify rights belonging to more than one user.

Every object in the system under user and group access control has a single UID and a single
GID associated with it. If a process’ UID matches the UID of an object, then the process has user
rights or owner rights to that object. If the UIDs do not match but any of the process’ GIDs
match the object’s GID, then group rights are conferred; otherwise, the process has world rights
to the object.
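The UID/GID check described above is a small pure function over the process's credentials and the object's owner (the privileged root UID, which bypasses these checks, is omitted from the sketch):

```python
# The UID/GID rights check: owner rights if UIDs match, group rights if any
# process GID matches the object's GID, world rights otherwise.
def access_class(proc_uid, proc_gids, obj_uid, obj_gid):
    if proc_uid == obj_uid:
        return "owner"
    if obj_gid in proc_gids:
        return "group"
    return "world"

print(access_class(1000, [1000, 27], 1000, 1000))  # owner
print(access_class(1001, [27], 1000, 27))          # group
print(access_class(1001, [50], 1000, 27))          # world
```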

The only exception is the privileged root UID. A process with this special UID is granted
automatic access to any object in the system, bypassing normal access checks. Most of the
kernel’s key internal resources are implicitly owned by the root UID.
