Chap18 Linux System
Chapter 18: The Linux System

Linux History
Design Principles
Kernel Modules
Operating System Concepts – 9th Edition Silberschatz, Galvin and Gagne ©2013
Objectives

To explore the history of the UNIX operating system from which Linux is derived and the principles upon which Linux's design is based
To examine the Linux process model and illustrate how Linux schedules processes and provides interprocess communication
To look at memory management in Linux
To explore how Linux implements file systems and manages I/O devices

History

Linux is a modern, free operating system based on UNIX standards
First developed as a small but self-contained kernel in 1991 by Linus Torvalds, with the major design goal of UNIX compatibility; released as open source
Its history has been one of collaboration by many users from all around the world, corresponding almost exclusively over the Internet
It has been designed to run efficiently and reliably on common PC hardware, but also runs on a variety of other platforms
The core Linux operating system kernel is entirely original, but it can run much existing free UNIX software, resulting in an entire UNIX-compatible operating system free from proprietary code
A Linux system comprises many varying Linux distributions, each including the kernel, applications, and management tools
The Linux Kernel

Version 0.01 (May 1991) had no networking, ran only on 80386-compatible Intel processors and on PC hardware, had extremely limited device-driver support, and supported only the Minix file system
Linux 1.0 (March 1994) included these new features:
    Support for UNIX's standard TCP/IP networking protocols
    BSD-compatible socket interface for networking programming
    Device-driver support for running IP over an Ethernet
    Enhanced file system
    Support for a range of SCSI controllers for high-performance disk access
    Extra hardware support
Version 1.2 (March 1995) was the final PC-only Linux kernel
Kernels with odd version numbers are development kernels; those with even numbers are production kernels

Linux 2.0

Released in June 1996, 2.0 added two major new capabilities:
    Support for multiple architectures, including a fully 64-bit native Alpha port
    Support for multiprocessor architectures
Other new features included:
    Improved memory-management code
    Improved TCP/IP performance
    Support for internal kernel threads, for handling dependencies between loadable modules, and for automatic loading of modules on demand
    Standardized configuration interface
Available for Motorola 68000-series processors, Sun Sparc systems, and for PC and PowerMac systems
2.4 and 2.6 increased SMP support, added a journaling file system, preemptive kernel, and 64-bit memory support
3.0, released in 2011 for the 20th anniversary of Linux, brought improved virtualization support, a new page write-back facility, improved memory management, and the new Completely Fair Scheduler
Linux Licensing

The Linux kernel is distributed under the GNU General Public License (GPL), the terms of which are set out by the Free Software Foundation
Not public domain, in that not all rights are waived
Anyone using Linux, or creating their own derivative of Linux, may not make the derived product proprietary; software released under the GPL may not be redistributed as a binary-only product
Can sell distributions, but must offer the source code too

Design Principles

Linux is a multiuser, multitasking system with a full set of UNIX-compatible tools
Its file system adheres to traditional UNIX semantics, and it fully implements the standard UNIX networking model
Main design goals are speed, efficiency, and standardization
Linux is designed to be compliant with the relevant POSIX documents; at least two Linux distributions have achieved official POSIX certification
    Supports Pthreads and a subset of POSIX real-time process control
The Linux programming interface adheres to the SVR4 UNIX semantics, rather than to BSD behavior
Components of a Linux System (Cont.)

The system libraries define a standard set of functions through which applications interact with the kernel, and which implement much of the operating-system functionality that does not need the full privileges of kernel code
The system utilities perform individual specialized management tasks
User-mode programs are rich and varied, including multiple shells such as the Bourne-Again shell (bash)

Kernel Modules

Sections of kernel code that can be compiled, loaded, and unloaded independently of the rest of the kernel
A kernel module may typically implement a device driver, a file system, or a networking protocol
The module interface allows third parties to write and distribute, on their own terms, device drivers or file systems that could not be distributed under the GPL
Kernel modules allow a Linux system to be set up with a standard, minimal kernel, without any extra device drivers built in
Four components to Linux module support:
    module-management system
    module loader and unloader
    driver-registration system
    conflict-resolution mechanism
Module Management

Supports loading modules into memory and letting them talk to the rest of the kernel
Module loading is split into two separate sections:
    Managing sections of module code in kernel memory
    Handling symbols that modules are allowed to reference
The module requestor manages loading requested, but currently unloaded, modules; it also regularly queries the kernel to see whether a dynamically loaded module is still in use, and will unload it when it is no longer actively needed

Driver Registration

Allows modules to tell the rest of the kernel that a new driver has become available
The kernel maintains dynamic tables of all known drivers, and provides a set of routines to allow drivers to be added to or removed from these tables at any time
Registration tables include the following items:
    Device drivers
    File systems
    Network protocols
    Binary formats
Conflict Resolution

A mechanism that allows different device drivers to reserve hardware resources and to protect those resources from accidental use by another driver
The conflict-resolution module aims to:
    Prevent modules from clashing over access to hardware resources
    Prevent autoprobes from interfering with existing device drivers
    Resolve conflicts among multiple drivers trying to access the same hardware:
        1. Kernel maintains a list of allocated HW resources

Process Management

UNIX process management separates the creation of processes and the running of a new program into two distinct operations
    The fork() system call creates a new process
    A new program is run after a call to exec()
Under UNIX, a process encompasses all the information that the operating system must maintain to track the context of a single execution of a single program
Under Linux, process properties fall into three groups: the process's identity, environment, and context
Process Context

The (constantly changing) state of a running program at any point in time
The scheduling context is the most important part of the process context; it is the information that the scheduler needs to suspend and restart the process
The kernel maintains accounting information about the resources currently being consumed by each process, and the total resources consumed by the process in its lifetime so far
The file table is an array of pointers to kernel file structures
    When making file I/O system calls, processes refer to files by their index into this table, the file descriptor (fd)

Process Context (Cont.)

Whereas the file table lists the existing open files, the file-system context applies to requests to open new files
    The current root and default directories to be used for new file searches are stored here
The signal-handler table defines the routine in the process's address space to be called when specific signals arrive
The virtual-memory context of a process describes the full contents of its private address space
CFS

Eliminates the traditional, common idea of a time slice
Instead, all tasks are allocated a portion of the processor's time
CFS calculates how long a process should run as a function of the total number of tasks
    N runnable tasks means each gets 1/N of the processor's time
Then weights each task with its nice value
    Smaller nice value -> higher weight (higher priority)

CFS (Cont.)

Each task then runs for a time proportional to the task's weight divided by the total weight of all runnable tasks
The configurable variable target latency is the desired interval during which each task should run at least once
    Consider the simple case of 2 runnable tasks with equal weight and a target latency of 10ms – each then runs for 5ms
    If there are 10 runnable tasks, each runs for 1ms
Minimum granularity ensures each run has a reasonable amount of time (which actually violates the fairness idea)
Kernel Synchronization (Cont.)

To avoid performance penalties, Linux's kernel uses a synchronization architecture that allows long critical sections to run without having interrupts disabled for the critical section's entire duration
Interrupt service routines are separated into a top half and a bottom half
    The top half is a normal interrupt service routine, and runs with recursive interrupts disabled
    The bottom half is run, with all interrupts enabled, by a miniature scheduler that ensures that bottom halves never interrupt themselves
This architecture is completed by a mechanism for disabling selected bottom halves while executing normal, foreground kernel code

Interrupt Protection Levels

Each level may be interrupted by code running at a higher level, but will never be interrupted by code running at the same or a lower level
User processes can always be preempted by another process when a time-sharing scheduling interrupt occurs
Managing Physical Memory

The page allocator allocates and frees all physical pages; it can allocate ranges of physically contiguous pages on request
The allocator uses a buddy-heap algorithm to keep track of available physical pages
    Each allocatable memory region is paired with an adjacent partner
    Whenever two allocated partner regions are both freed up, they are combined to form a larger region
    If a small memory request cannot be satisfied by allocating an existing small free region, then a larger free region will be subdivided into two partners to satisfy the request

Managing Physical Memory (Cont.)

Memory allocations in the Linux kernel occur either statically (drivers reserve a contiguous area of memory during system boot time) or dynamically (via the page allocator)
Also uses a slab allocator for kernel memory
The page cache and virtual memory system also manage physical memory
    The page cache is the kernel's main cache for files and the main mechanism for I/O to block devices
    The page cache stores entire pages of file contents for local and network file I/O
Virtual Memory

The VM system maintains the address space visible to each process: it creates pages of virtual memory on demand, and manages the loading of those pages from disk or their swapping back out to disk as required
The VM manager maintains two separate views of a process's address space:
    A logical view describing instructions concerning the layout of the address space
        The address space consists of a set of non-overlapping regions, each representing a continuous, page-aligned subset of the address space
    A physical view of each address space, which is stored in the hardware page tables for the process

Virtual Memory (Cont.)

Virtual memory regions are characterized by:
    The backing store, which describes from where the pages for a region come; regions are usually backed by a file or by nothing (demand-zero memory)
    The region's reaction to writes (page sharing or copy-on-write)
The kernel creates a new virtual address space:
    1. When a process runs a new program with the exec() system call
    2. Upon creation of a new process by the fork() system call
Kernel Virtual Memory

The Linux kernel reserves a constant, architecture-dependent region of the virtual address space of every process for its own internal use
This kernel virtual-memory area contains two regions:
    A static area that contains page-table references to every available physical page of memory in the system, so that there is a simple translation from physical to virtual addresses when running kernel code
    The remainder of the reserved section is not reserved for any specific purpose; its page-table entries can be modified to point to any other areas of memory

Executing and Loading User Programs

Linux maintains a table of functions for loading programs; it gives each function the opportunity to try loading the given file when an exec system call is made
The registration of multiple loader routines allows Linux to support both the ELF and a.out binary formats
Initially, binary-file pages are mapped into virtual memory
    Only when a program tries to access a given page will a page fault result in that page being loaded into physical memory
An ELF-format binary file consists of a header followed by several page-aligned sections
The ELF loader works by reading the header and mapping the sections of the file into separate regions of virtual memory
Static and Dynamic Linking (Cont.)

Linux implements dynamic linking in user mode through a special linker library
Every dynamically linked program contains a small statically linked function called when the process starts
    Maps the link library into memory
The link library determines the dynamic libraries required by the process and the names of the variables and functions needed
Maps libraries into the middle of virtual memory and resolves references to symbols contained in the libraries
Shared libraries are compiled to be position-independent code (PIC) so they can be loaded anywhere

File Systems

To the user, Linux's file system appears as a hierarchical directory tree obeying UNIX semantics
Internally, the kernel hides implementation details and manages the multiple different file systems via an abstraction layer, that is, the virtual file system (VFS)
The Linux VFS is designed around object-oriented principles and is composed of four components:
    A set of definitions that define what a file object is allowed to look like
    The inode object structure represents an individual file
    The file object represents an open file
    The superblock object represents an entire file system
    A dentry object represents an individual directory entry
The Linux ext3 File System (Cont.)

The main differences between ext2fs and FFS concern their disk-allocation policies
    In FFS, the disk is allocated to files in blocks of 8Kb, with blocks being subdivided into fragments of 1Kb to store small files or partially filled blocks at the end of a file
    ext3 does not use fragments; it performs its allocations in smaller units
        The default block size on ext3 varies as a function of the total size of the file system, with support for 1, 2, 4, and 8 KB blocks
    ext3 uses cluster allocation policies designed to place logically adjacent blocks of a file into physically adjacent blocks on disk, so that it can submit an I/O request for several disk blocks as a single operation on a block group
Maintains a bit map of free blocks in a block group; searches for a free byte to allocate at least 8 blocks at a time

Ext2fs Block-Allocation Policies
Input and Output

The Linux device-oriented file system accesses disk storage through two caches:
    Data is cached in the page cache, which is unified with the virtual memory system
    Metadata is cached in the buffer cache, a separate cache indexed by the physical disk block
Linux splits all devices into three classes:
    block devices allow random access to completely independent, fixed-size blocks of data
    character devices include most other devices; they don't need to support the functionality of regular files
    network devices are interfaced via the kernel's networking subsystem

Block Devices

Provide the main interface to all disk devices in a system
The block buffer cache serves two main purposes:
    it acts as a pool of buffers for active I/O
    it serves as a cache for completed I/O
The request manager manages the reading and writing of buffer contents to and from a block-device driver
Kernel 2.6 introduced Completely Fair Queueing (CFQ)
    Now the default scheduler
    Fundamentally different from elevator algorithms
    Maintains a set of lists, one for each process by default
    Uses the C-SCAN algorithm, with round robin between all outstanding I/O from all processes
    Four blocks from each process put on at once
Character Devices (Cont.)

A line discipline is an interpreter for the information from the terminal device
The most common line discipline is the tty discipline, which glues the terminal's data stream onto the standard input and output streams of the user's running processes, allowing processes to communicate directly with the user's terminal
Several processes may be running simultaneously; the tty line discipline is responsible for attaching and detaching the terminal's input and output from the various processes connected to it as processes are suspended or awakened by the user
Other line disciplines are also implemented that have nothing to do with I/O to a user process, e.g., the PPP and SLIP networking protocols

Interprocess Communication

Like UNIX, Linux informs processes that an event has occurred via signals
There is a limited number of signals, and they cannot carry information: only the fact that a signal occurred is available to a process
The Linux kernel does not use signals to communicate with processes that are running in kernel mode; rather, communication within the kernel is accomplished via scheduling states and wait_queue structures
Also implements System V UNIX semaphores
    A process can wait for a signal or a semaphore
    Semaphores scale better
    Operations on multiple semaphores can be atomic
Security

The pluggable authentication modules (PAM) system is available under Linux
PAM is based on a shared library that can be used by any system component that needs to authenticate users
Access control under UNIX systems, including Linux, is performed through the use of unique numeric identifiers (uid and gid)
Access control is performed by assigning objects a protection mask, which specifies which access modes (read, write, or execute) are to be granted to processes with owner, group, or world access

Security (Cont.)

Linux augments the standard UNIX setuid mechanism in two ways:
    It implements the POSIX specification's saved user-id mechanism, which allows a process to repeatedly drop and reacquire its effective uid
    It has added a process characteristic that grants just a subset of the rights of the effective uid
Linux provides another mechanism that allows a client to selectively pass access to a single file to some server process without granting it any other privileges
End of Chapter 18